Įvadas¶

Tikrinama hipotezė: Ar pilnatis turi įtakos eismo įvykiams Jungtinėse valstijose 2021 metais

Duomenų paėmimas¶

US_accidents valymas¶

In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
In [2]:
pd.set_option('max_rows', 100)
pd.set_option('max_columns', 100)
In [3]:
us_original = pd.read_csv(r"C:\Users\asein\OneDrive\Documents\US_Accidents_Dec21_updated.csv")
In [4]:
us_original.head(5)
Out[4]:
ID Severity Start_Time End_Time Start_Lat Start_Lng End_Lat End_Lng Distance(mi) Description Number Street Side City County State Zipcode Country Timezone Airport_Code Weather_Timestamp Temperature(F) Wind_Chill(F) Humidity(%) Pressure(in) Visibility(mi) Wind_Direction Wind_Speed(mph) Precipitation(in) Weather_Condition Amenity Bump Crossing Give_Way Junction No_Exit Railway Roundabout Station Stop Traffic_Calming Traffic_Signal Turning_Loop Sunrise_Sunset Civil_Twilight Nautical_Twilight Astronomical_Twilight
0 A-1 3 2016-02-08 00:37:08 2016-02-08 06:37:08 40.108910 -83.092860 40.112060 -83.031870 3.230 Between Sawmill Rd/Exit 20 and OH-315/Olentang... NaN Outerbelt E R Dublin Franklin OH 43017 US US/Eastern KOSU 2016-02-08 00:53:00 42.1 36.1 58.0 29.76 10.0 SW 10.4 0.00 Light Rain False False False False False False False False False False False False False Night Night Night Night
1 A-2 2 2016-02-08 05:56:20 2016-02-08 11:56:20 39.865420 -84.062800 39.865010 -84.048730 0.747 At OH-4/OH-235/Exit 41 - Accident. NaN I-70 E R Dayton Montgomery OH 45424 US US/Eastern KFFO 2016-02-08 05:58:00 36.9 NaN 91.0 29.68 10.0 Calm NaN 0.02 Light Rain False False False False False False False False False False False False False Night Night Night Night
2 A-3 2 2016-02-08 06:15:39 2016-02-08 12:15:39 39.102660 -84.524680 39.102090 -84.523960 0.055 At I-71/US-50/Exit 1 - Accident. NaN I-75 S R Cincinnati Hamilton OH 45203 US US/Eastern KLUK 2016-02-08 05:53:00 36.0 NaN 97.0 29.70 10.0 Calm NaN 0.02 Overcast False False False False True False False False False False False False False Night Night Night Day
3 A-4 2 2016-02-08 06:51:45 2016-02-08 12:51:45 41.062130 -81.537840 41.062170 -81.535470 0.123 At Dart Ave/Exit 21 - Accident. NaN I-77 N R Akron Summit OH 44311 US US/Eastern KAKR 2016-02-08 06:54:00 39.0 NaN 55.0 29.65 10.0 Calm NaN NaN Overcast False False False False False False False False False False False False False Night Night Day Day
4 A-5 3 2016-02-08 07:53:43 2016-02-08 13:53:43 39.172393 -84.492792 39.170476 -84.501798 0.500 At Mitchell Ave/Exit 6 - Accident. NaN I-75 S R Cincinnati Hamilton OH 45217 US US/Eastern KLUK 2016-02-08 07:53:00 37.0 29.8 93.0 29.69 10.0 WSW 10.4 0.01 Light Rain False False False False False False False False False False False False False Day Day Day Day
In [5]:
us_original.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2845342 entries, 0 to 2845341
Data columns (total 47 columns):
 #   Column                 Dtype  
---  ------                 -----  
 0   ID                     object 
 1   Severity               int64  
 2   Start_Time             object 
 3   End_Time               object 
 4   Start_Lat              float64
 5   Start_Lng              float64
 6   End_Lat                float64
 7   End_Lng                float64
 8   Distance(mi)           float64
 9   Description            object 
 10  Number                 float64
 11  Street                 object 
 12  Side                   object 
 13  City                   object 
 14  County                 object 
 15  State                  object 
 16  Zipcode                object 
 17  Country                object 
 18  Timezone               object 
 19  Airport_Code           object 
 20  Weather_Timestamp      object 
 21  Temperature(F)         float64
 22  Wind_Chill(F)          float64
 23  Humidity(%)            float64
 24  Pressure(in)           float64
 25  Visibility(mi)         float64
 26  Wind_Direction         object 
 27  Wind_Speed(mph)        float64
 28  Precipitation(in)      float64
 29  Weather_Condition      object 
 30  Amenity                bool   
 31  Bump                   bool   
 32  Crossing               bool   
 33  Give_Way               bool   
 34  Junction               bool   
 35  No_Exit                bool   
 36  Railway                bool   
 37  Roundabout             bool   
 38  Station                bool   
 39  Stop                   bool   
 40  Traffic_Calming        bool   
 41  Traffic_Signal         bool   
 42  Turning_Loop           bool   
 43  Sunrise_Sunset         object 
 44  Civil_Twilight         object 
 45  Nautical_Twilight      object 
 46  Astronomical_Twilight  object 
dtypes: bool(13), float64(13), int64(1), object(20)
memory usage: 773.4+ MB
In [6]:
us_original.columns
Out[6]:
Index(['ID', 'Severity', 'Start_Time', 'End_Time', 'Start_Lat', 'Start_Lng',
       'End_Lat', 'End_Lng', 'Distance(mi)', 'Description', 'Number', 'Street',
       'Side', 'City', 'County', 'State', 'Zipcode', 'Country', 'Timezone',
       'Airport_Code', 'Weather_Timestamp', 'Temperature(F)', 'Wind_Chill(F)',
       'Humidity(%)', 'Pressure(in)', 'Visibility(mi)', 'Wind_Direction',
       'Wind_Speed(mph)', 'Precipitation(in)', 'Weather_Condition', 'Amenity',
       'Bump', 'Crossing', 'Give_Way', 'Junction', 'No_Exit', 'Railway',
       'Roundabout', 'Station', 'Stop', 'Traffic_Calming', 'Traffic_Signal',
       'Turning_Loop', 'Sunrise_Sunset', 'Civil_Twilight', 'Nautical_Twilight',
       'Astronomical_Twilight'],
      dtype='object')
In [7]:
us_original.isna().sum()
Out[7]:
ID                             0
Severity                       0
Start_Time                     0
End_Time                       0
Start_Lat                      0
Start_Lng                      0
End_Lat                        0
End_Lng                        0
Distance(mi)                   0
Description                    0
Number                   1743911
Street                         2
Side                           0
City                         137
County                         0
State                          0
Zipcode                     1319
Country                        0
Timezone                    3659
Airport_Code                9549
Weather_Timestamp          50736
Temperature(F)             69274
Wind_Chill(F)             469643
Humidity(%)                73092
Pressure(in)               59200
Visibility(mi)             70546
Wind_Direction             73775
Wind_Speed(mph)           157944
Precipitation(in)         549458
Weather_Condition          70636
Amenity                        0
Bump                           0
Crossing                       0
Give_Way                       0
Junction                       0
No_Exit                        0
Railway                        0
Roundabout                     0
Station                        0
Stop                           0
Traffic_Calming                0
Traffic_Signal                 0
Turning_Loop                   0
Sunrise_Sunset              2867
Civil_Twilight              2867
Nautical_Twilight           2867
Astronomical_Twilight       2867
dtype: int64
In [8]:
us_new = us_original[['ID', 'Severity', 'Start_Time', 'Start_Lat', 'Start_Lng', 'Side', 'City', 'County', 'State', 'Timezone']]
us_new = us_new.rename(columns={"Start_Lat":"Latitude","Start_Lng":"Longitude","Start_Time":"Event_Time"})
us_new.head()
Out[8]:
ID Severity Event_Time Latitude Longitude Side City County State Timezone
0 A-1 3 2016-02-08 00:37:08 40.108910 -83.092860 R Dublin Franklin OH US/Eastern
1 A-2 2 2016-02-08 05:56:20 39.865420 -84.062800 R Dayton Montgomery OH US/Eastern
2 A-3 2 2016-02-08 06:15:39 39.102660 -84.524680 R Cincinnati Hamilton OH US/Eastern
3 A-4 2 2016-02-08 06:51:45 41.062130 -81.537840 R Akron Summit OH US/Eastern
4 A-5 3 2016-02-08 07:53:43 39.172393 -84.492792 R Cincinnati Hamilton OH US/Eastern
In [9]:
us_new["Event_Time"] = pd.to_datetime(us_new["Event_Time"])
In [10]:
us_new.shape
Out[10]:
(2845342, 10)
In [11]:
# Parodo, kad daugiau nei 50% duomenų sudaro iš 2021 metų laikotarpio.
us_new["Event_Time"].dt.year.value_counts(normalize=True)
Out[11]:
2021    0.531305
2020    0.219961
2019    0.090891
2017    0.057609
2018    0.057348
2016    0.042886
Name: Event_Time, dtype: float64
In [12]:
us2021 = us_new[(us_new["Event_Time"] >= "2021-01-01") & (us_new["Event_Time"] < "2022-01-01")]
In [13]:
us2021.isna().sum()
Out[13]:
ID               0
Severity         0
Event_Time       0
Latitude         0
Longitude        0
Side             0
City            61
County           0
State            0
Timezone      1588
dtype: int64
In [14]:
us2021.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1511745 entries, 224945 to 2069159
Data columns (total 10 columns):
 #   Column      Non-Null Count    Dtype         
---  ------      --------------    -----         
 0   ID          1511745 non-null  object        
 1   Severity    1511745 non-null  int64         
 2   Event_Time  1511745 non-null  datetime64[ns]
 3   Latitude    1511745 non-null  float64       
 4   Longitude   1511745 non-null  float64       
 5   Side        1511745 non-null  object        
 6   City        1511684 non-null  object        
 7   County      1511745 non-null  object        
 8   State       1511745 non-null  object        
 9   Timezone    1510157 non-null  object        
dtypes: datetime64[ns](1), float64(2), int64(1), object(6)
memory usage: 126.9+ MB
In [15]:
#Koordinates pakeičiama į str tipą, kad vėliau būtų galima naudoti "location" funkcijoje.
us2021["Latitude"] = us2021["Latitude"].astype(str)
us2021["Longitude"] = us2021["Longitude"].astype(str)
us2021.info()
C:\Users\asein\AppData\Local\Temp/ipykernel_3516/1686928660.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  us2021["Latitude"] = us2021["Latitude"].astype(str)
C:\Users\asein\AppData\Local\Temp/ipykernel_3516/1686928660.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  us2021["Longitude"] = us2021["Longitude"].astype(str)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1511745 entries, 224945 to 2069159
Data columns (total 10 columns):
 #   Column      Non-Null Count    Dtype         
---  ------      --------------    -----         
 0   ID          1511745 non-null  object        
 1   Severity    1511745 non-null  int64         
 2   Event_Time  1511745 non-null  datetime64[ns]
 3   Latitude    1511745 non-null  object        
 4   Longitude   1511745 non-null  object        
 5   Side        1511745 non-null  object        
 6   City        1511684 non-null  object        
 7   County      1511745 non-null  object        
 8   State       1511745 non-null  object        
 9   Timezone    1510157 non-null  object        
dtypes: datetime64[ns](1), int64(1), object(8)
memory usage: 126.9+ MB
In [16]:
pip install geopy
Requirement already satisfied: geopy in c:\users\asein\anaconda3\lib\site-packages (2.2.0)
Requirement already satisfied: geographiclib<2,>=1.49 in c:\users\asein\anaconda3\lib\site-packages (from geopy) (1.52)
Note: you may need to restart the kernel to use updated packages.
In [17]:
#Pagal koordinates ištraukiama mietos pavadinimas, jei neranda gražina Nan.
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="geoapiExercises")

def location(Latitude, Longitude):
    location = geolocator.reverse(f'{Latitude},{Longitude}')
    address = location.raw['address']
    town = address.get('town', '')
    city = address.get('city', '')
    if city != '': 
        result = city 
    elif town != '': 
        result = town
    else: 
        result = np.nan
    return result
In [18]:
#Susikuriamas naujas dataframe kur nėra miestų
us2021_city_na = us2021[us2021["City"].isna()]
In [19]:
us2021.dropna(inplace=True)
C:\Users\asein\anaconda3\lib\site-packages\pandas\util\_decorators.py:311: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return func(*args, **kwargs)
In [20]:
us2021_city_na["New_City"] = us2021_city_na.apply(lambda x:location(x["Latitude"],x['Longitude']), axis=1)
C:\Users\asein\AppData\Local\Temp/ipykernel_3516/2543745419.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  us2021_city_na["New_City"] = us2021_city_na.apply(lambda x:location(x["Latitude"],x['Longitude']), axis=1)
In [21]:
us2021_city_na["City"].fillna(us2021_city_na["New_City"], inplace=True)
C:\Users\asein\anaconda3\lib\site-packages\pandas\core\generic.py:6392: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return self._update_inplace(result)
In [22]:
us2021_city_na.drop('New_City', axis=1, inplace=True)
C:\Users\asein\anaconda3\lib\site-packages\pandas\core\frame.py:4906: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().drop(
In [23]:
us2021_city_na.dropna(inplace=True)
In [24]:
us2021 = us2021.append(us2021_city_na)
In [25]:
us2021
Out[25]:
ID Severity Event_Time Latitude Longitude Side City County State Timezone
224945 A-224946 2 2021-03-10 19:57:00 42.382359 -71.022318 R East Boston Suffolk MA US/Eastern
224946 A-224947 2 2021-07-30 23:37:00 40.908676 -123.70711599999998 L Salyer Humboldt CA US/Pacific
224947 A-224948 2 2021-10-15 16:42:36 38.97062800000001 -77.116333 R Bethesda Montgomery MD US/Eastern
224948 A-224949 2 2021-12-21 11:42:00 47.642651 -122.31859 R Seattle King WA US/Pacific
224949 A-224950 2 2021-12-09 08:51:00 37.419536 -122.09064 R Mountain View Santa Clara CA US/Pacific
... ... ... ... ... ... ... ... ... ... ...
1501812 A-1501813 2 2021-05-07 05:33:30 38.912285 -76.934168 R Washington District of Columbia DC US/Eastern
1521239 A-1521240 2 2021-02-10 13:30:00 38.912285 -76.934168 R Washington District of Columbia DC US/Eastern
1588891 A-1588892 2 2021-01-11 16:09:00 39.72126 -77.735338 R Antrim Township Washington MD US/Eastern
1598427 A-1598428 2 2021-02-18 00:31:52 38.912285 -76.934168 R Washington District of Columbia DC US/Eastern
1617075 A-1617076 2 2021-02-04 13:34:00 38.912285 -76.934168 R Washington District of Columbia DC US/Eastern

1510143 rows × 10 columns

In [26]:
#Konvertuoja US laika į CET
def Timezone_to_CET(row):
    if row["Timezone"] == "US/Eastern":
        return row["Event_Time"] - pd.DateOffset(hours=6)
    elif row["Timezone"] == "US/Pacific":
        return row["Event_Time"] - pd.DateOffset(hours=9)
    elif row["Timezone"] == "US/Central":
        return row["Event_Time"] - pd.DateOffset(hours=7)
    elif row["Timezone"] == "US/Mountain":
        return row["Event_Time"] - pd.DateOffset(hours=9)
    else:
        return np.nan
In [27]:
us2021 = us2021.assign(CET_Time=us2021.apply(Timezone_to_CET, axis=1))
In [28]:
us2021.head(5)
Out[28]:
ID Severity Event_Time Latitude Longitude Side City County State Timezone CET_Time
224945 A-224946 2 2021-03-10 19:57:00 42.382359 -71.022318 R East Boston Suffolk MA US/Eastern 2021-03-10 13:57:00
224946 A-224947 2 2021-07-30 23:37:00 40.908676 -123.70711599999998 L Salyer Humboldt CA US/Pacific 2021-07-30 14:37:00
224947 A-224948 2 2021-10-15 16:42:36 38.97062800000001 -77.116333 R Bethesda Montgomery MD US/Eastern 2021-10-15 10:42:36
224948 A-224949 2 2021-12-21 11:42:00 47.642651 -122.31859 R Seattle King WA US/Pacific 2021-12-21 02:42:00
224949 A-224950 2 2021-12-09 08:51:00 37.419536 -122.09064 R Mountain View Santa Clara CA US/Pacific 2021-12-08 23:51:00

Moon_data valymas¶

In [29]:
import mysql.connector

mydb = mysql.connector.connect(
        host="127.0.0.1",
        port="3306",
        user="root",
        password="root"
)
cursor = mydb.cursor()
cursor.execute("USE my_db") 
cursor.execute("SHOW TABLES") 
for i in cursor:
    print(i)
('full_moon',)
('moon_data',)
('us_accidents',)
In [30]:
moon = pd.read_sql("SELECT * FROM my_db.moon_data", con=mydb)
In [31]:
moon
Out[31]:
Date Time
0 15 January 1900 08:07:30 pm
1 14 February 1900 02:50:12 pm
2 16 March 1900 09:11:48 am
3 15 April 1900 02:02:06 am
4 14 May 1900 04:36:36 pm
... ... ...
1863 1 September 2050 10:30:54 am
1864 30 September 2050 06:31:48 pm
1865 30 October 2050 04:16:00 am
1866 28 November 2050 04:09:48 pm
1867 28 December 2050 06:15:36 am

1868 rows × 2 columns

In [32]:
moon.columns = moon.columns.str.strip()
In [33]:
moon["Date_Time"] = moon["Date"] + moon["Time"]
In [34]:
moon.head(5)
Out[34]:
Date Time Date_Time
0 15 January 1900 08:07:30 pm 15 January 1900 08:07:30 pm
1 14 February 1900 02:50:12 pm 14 February 1900 02:50:12 pm
2 16 March 1900 09:11:48 am 16 March 1900 09:11:48 am
3 15 April 1900 02:02:06 am 15 April 1900 02:02:06 am
4 14 May 1900 04:36:36 pm 14 May 1900 04:36:36 pm
In [35]:
moon[moon['Date_Time'].str.contains("]")].count()
Out[35]:
Date         23
Time         23
Date_Time    23
dtype: int64
In [36]:
moon["Date_Time"] = moon["Date_Time"].str.split("[").str[0].str.replace("m ", "m")
In [37]:
moon["Date_Time"] = pd.to_datetime(moon['Date_Time'])
In [38]:
moon.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1868 entries, 0 to 1867
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Date       1868 non-null   object        
 1   Time       1868 non-null   object        
 2   Date_Time  1868 non-null   datetime64[ns]
dtypes: datetime64[ns](1), object(2)
memory usage: 43.9+ KB

Apjungimas duomenų¶

In [39]:
us2021["Full_moon_CET"] = us2021["CET_Time"].dt.date.isin(moon["Date_Time"].dt.date)
In [40]:
# Duomenys PowerBI vizualizacijai
us2021.to_csv("PowerBI_us_2021_data.csv")
In [41]:
us2021.head(5)
Out[41]:
ID Severity Event_Time Latitude Longitude Side City County State Timezone CET_Time Full_moon_CET
224945 A-224946 2 2021-03-10 19:57:00 42.382359 -71.022318 R East Boston Suffolk MA US/Eastern 2021-03-10 13:57:00 False
224946 A-224947 2 2021-07-30 23:37:00 40.908676 -123.70711599999998 L Salyer Humboldt CA US/Pacific 2021-07-30 14:37:00 False
224947 A-224948 2 2021-10-15 16:42:36 38.97062800000001 -77.116333 R Bethesda Montgomery MD US/Eastern 2021-10-15 10:42:36 False
224948 A-224949 2 2021-12-21 11:42:00 47.642651 -122.31859 R Seattle King WA US/Pacific 2021-12-21 02:42:00 False
224949 A-224950 2 2021-12-09 08:51:00 37.419536 -122.09064 R Mountain View Santa Clara CA US/Pacific 2021-12-08 23:51:00 False
In [42]:
us2021['Full_moon_CET'].value_counts(normalize=True)
Out[42]:
False    0.966549
True     0.033451
Name: Full_moon_CET, dtype: float64
In [43]:
# Išskiriami įvykiai pilanties metu
us_2021_FullMoon = us2021[us2021["Full_moon_CET"]]
In [44]:
us_2021_FullMoon.head(5)
Out[44]:
ID Severity Event_Time Latitude Longitude Side City County State Timezone CET_Time Full_moon_CET
225093 A-225094 2 2021-12-19 15:15:00 38.555563 -121.441143 R Sacramento Sacramento CA US/Pacific 2021-12-19 06:15:00 True
225145 A-225146 2 2021-10-21 06:06:00 34.076742 -118.230768 R Los Angeles Los Angeles CA US/Pacific 2021-10-20 21:06:00 True
225159 A-225160 2 2021-12-19 17:22:00 39.275191 -121.015845 R Nevada City Nevada CA US/Pacific 2021-12-19 08:22:00 True
225179 A-225180 2 2021-07-24 21:47:00 28.449998 -81.469499 R Orlando Orange FL US/Eastern 2021-07-24 15:47:00 True
225230 A-225231 2 2021-11-19 19:23:07 44.495381 -93.870447 R Le Sueur Le Sueur MN US/Central 2021-11-19 12:23:07 True
In [45]:
# Išskiriami įvykiai ne pilnaties metu
us_2021_NonFullMoon = us2021[us2021["Full_moon_CET"] == False]
In [46]:
us_2021_NonFullMoon.head(5)
Out[46]:
ID Severity Event_Time Latitude Longitude Side City County State Timezone CET_Time Full_moon_CET
224945 A-224946 2 2021-03-10 19:57:00 42.382359 -71.022318 R East Boston Suffolk MA US/Eastern 2021-03-10 13:57:00 False
224946 A-224947 2 2021-07-30 23:37:00 40.908676 -123.70711599999998 L Salyer Humboldt CA US/Pacific 2021-07-30 14:37:00 False
224947 A-224948 2 2021-10-15 16:42:36 38.97062800000001 -77.116333 R Bethesda Montgomery MD US/Eastern 2021-10-15 10:42:36 False
224948 A-224949 2 2021-12-21 11:42:00 47.642651 -122.31859 R Seattle King WA US/Pacific 2021-12-21 02:42:00 False
224949 A-224950 2 2021-12-09 08:51:00 37.419536 -122.09064 R Mountain View Santa Clara CA US/Pacific 2021-12-08 23:51:00 False

Analizė¶

In [47]:
import seaborn as sns
In [48]:
# Bus naudojami grafikų X ašiai
Months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
Days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
In [49]:
fig, (ax1,ax2) = plt.subplots(1,2,figsize=(15,4))

sns.countplot(x=us_2021_NonFullMoon['Event_Time'].dt.day_name().str[:3], data=us_2021_NonFullMoon, order=Days, ax=ax1, color = '#5FD4EF')
ax1.set(ylabel=None, xlabel=None)
ax1.set_title("Accidents by Weekday during Non Full Moon")


sns.countplot(x=us_2021_FullMoon['Event_Time'].dt.day_name().str[:3], data=us_2021_FullMoon, order=Days, ax=ax2, color= '#0A536A') 
ax2.set(ylabel=None, xlabel=None)
ax2.set_title("Accidents by Weekday during Full Moon")
Out[49]:
Text(0.5, 1.0, 'Accidents by Weekday during Full Moon')

Tiek pilnaties, tiek nepilnaties metu daugiausiai įvykių penktadieniais. Galima teigti, kad penktadienis pavojingiausia diena vairuotojams.

In [50]:
fig, (ax1,ax2) = plt.subplots(1,2,figsize=(15,4))

sns.countplot(x=us_2021_NonFullMoon['Event_Time'].dt.month_name().str[:3], data=us_2021_NonFullMoon, order=Months, ax=ax1,color = '#5FD4EF')
ax1.set(ylabel=None, xlabel=None)
ax1.set_title("Accidents by Month during Non Full Moon")


sns.countplot(x=us_2021_FullMoon['Event_Time'].dt.month_name().str[:3], data=us_2021_FullMoon, order=Months, ax=ax2,color= '#0A536A') 
ax2.set(ylabel=None, xlabel=None)
ax2.set_title("Accidents by Month during Full Moon")
Out[50]:
Text(0.5, 1.0, 'Accidents by Month during Full Moon')

Bendrai įvykių kiekis išauga žiemos sezonu. Pilnaties metu ypač išsiskiria lapkritis.

In [51]:
fig, (ax1,ax2) = plt.subplots(1,2,figsize=(15,4))

sns.countplot(x=us_2021_NonFullMoon['Event_Time'].dt.hour, data=us_2021_NonFullMoon, ax=ax1,color = '#5FD4EF')
ax1.set(ylabel=None, xlabel="Hours")
ax1.set_title("Accidents by Hour during Non Full Moon")


sns.countplot(x=us_2021_FullMoon['Event_Time'].dt.hour, data=us_2021_FullMoon, ax=ax2,color= '#0A536A') 
ax2.set(ylabel=None, xlabel="Hours")
ax2.set_title("Accidents by Hour during Full Moon")
Out[51]:
Text(0.5, 1.0, 'Accidents by Hour during Full Moon')

Žiūrint į valandinį pasiskirystymą matosi beveik įdentiškas išsidėstymas, dominuoja darbo pradžios ir pabaigos valandos, tai įrodo eismo piką, nepaisant ar pilnatis ar ne.

In [52]:
fig, (ax1,ax2) = plt.subplots(1,2,figsize=(15,4))

sns.countplot(x=us_2021_NonFullMoon['Event_Time'].dt.day, data=us_2021_NonFullMoon, ax=ax1,color = '#5FD4EF')
ax1.set(ylabel=None, xlabel="Hours")
ax1.set_title("Accidents by day during Non Full Moon")


sns.countplot(x=us_2021_FullMoon['Event_Time'].dt.day, data=us_2021_FullMoon, ax=ax2,color= '#0A536A') 
ax2.set(ylabel=None, xlabel="Hours")
ax2.set_title("Accidents by day during Full Moon")
Out[52]:
Text(0.5, 1.0, 'Accidents by day during Full Moon')

Dienų paskirstymas per daug nieko nesakantis, nėra kažkokių akivaizdžių išsišokimų. Pilnaties dienos parodo, 2021 metais pilnas būdavo mėnesio gale.

In [53]:
Accicent_per_day_nonFullMoon =us_2021_NonFullMoon.groupby([us_2021_NonFullMoon["Event_Time"].dt.month,us_2021_NonFullMoon["Event_Time"].dt.day])["ID"].count()
Accicent_per_day_nonFullMoon = Accicent_per_day_nonFullMoon.groupby("Event_Time")
In [54]:
Accicent_per_day_FullMoon = us_2021_FullMoon.groupby([us_2021_FullMoon["Event_Time"].dt.month,us_2021_FullMoon["Event_Time"].dt.day])["ID"].count()
Accicent_per_day_FullMoon = Accicent_per_day_FullMoon.groupby("Event_Time")
In [55]:
plt.figure(figsize=(10, 5))
plt.style.use('default')
X_axis = np.arange(len(Months))
plt.bar(X_axis - 0.2, Accicent_per_day_FullMoon.sum(), width = 0.4, label = "Full moon",color = '#0A536A')
plt.bar(X_axis + 0.2, Accicent_per_day_nonFullMoon.mean(), width = 0.4, label = "Non full moon",color= '#5FD4EF')
plt.xticks(X_axis, Months)
plt.legend()
plt.title("Average number of accidents per day")
Out[55]:
Text(0.5, 1.0, 'Average number of accidents per day')

Net 8 mėnesiai, kai pilnaties dieną įvykusių įvykių skaičius lenkia likusių mėnesio dienų vidurkį. Ypač išsiskiria lapkritis.

In [56]:
plt.figure(figsize=(10, 5))
#plt.style.use("fivethirtyeight")
X_axis = np.arange(len(Months))
plt.bar(X_axis - 0.2, Accicent_per_day_FullMoon.sum(), width = 0.4, label = "Full moon",color = '#0A536A')
plt.bar(X_axis + 0.2, Accicent_per_day_nonFullMoon.max(), width = 0.4, label = "Non full moon",color= '#5FD4EF')
plt.xticks(X_axis, Months)
plt.legend()
plt.title("Max number of accidents per day in month")
Out[56]:
Text(0.5, 1.0, 'Max number of accidents per day in month')

Žiurint į daugiausiai dienos įvykių per mėnesį, vaizdas visai kitas, nei su mėnesių vidurkiais. Lapkritį per pilnatį buvo daugiau įvykių.

In [57]:
# Metinis dienos įvykių vidurkis ne per pilnatį
Yearly_nonFullMoonAVG = us_2021_NonFullMoon.groupby([us_2021_NonFullMoon["Event_Time"].dt.month,us_2021_NonFullMoon["Event_Time"].dt.day])["ID"].count().mean()
Yearly_nonFullMoonAVG
Out[57]:
4021.0110192837465
In [58]:
# Metinis dienos įvykių vidurkis per pilnatį
Yearly_FullMoonAVG = Accicent_per_day_FullMoon.sum().mean()
Yearly_FullMoonAVG
Out[58]:
4209.666666666667
In [59]:
# 4,7% daugiau įvykių įvyksta pilnaties metu.
(Yearly_FullMoonAVG/Yearly_nonFullMoonAVG-1)*100
Out[59]:
4.691746590053492
In [60]:
us_2021_FullMoon["Severity"].value_counts(normalize=True)
Out[60]:
2    0.983965
4    0.016035
Name: Severity, dtype: float64
In [61]:
us_2021_NonFullMoon["Severity"].value_counts(normalize=True)
Out[61]:
2    0.98411
4    0.01589
Name: Severity, dtype: float64

Išvada¶

Buvo tikrinama ar pilnatis turi įtakos eismo įvykiams Jungitnėse valstijose. Nors pilnaties metu 2021 metais įvyko 4,7% daugiau eismo įvykių lyginant su kitomis dienomis, sunku įrodyti ar tai tikrai pilnaties efektas, reiktų daugiau faktorių tikrinti ar lokacijų. Kadangi mėnulio fazė buvo europos laiku, konvertuojant dalis įvykių per pilnatį pasiskirtė per kelias dienas, tai galimai galėjo paveiktį galutinį rezultatą.

Visgi hipotezė nei paneigta, nei įrodyta.