This page presents the work related to the assignments for the course Social Data Analysis and Visualization.

These exercises have been performed by DTU students:

Project Assignment A

Here we have a short introduction video that shows the idea behind our data analysis.

Project Assignment B

As explained in the video above, this page looks into vehicle collisions in NYC and explores where, how and why they happen.

One of our hypotheses is that when the weather in NYC is bad (e.g. snowy), there are more collisions. We will investigate whether this is actually true. Additionally, we want to investigate whether some places in NYC are more affected by bad weather than others.

The Cold Hard Truth (Data)

Our main dataset is the NYPD Motor Vehicle Collisions dataset. It contains a record for every reported incident in the NYC area, from July 2012 until today. Each record specifies where the collision took place, its cause, injuries, fatalities and more.

The dataset consists of more than:

769,054 collision records

The secondary dataset is an extract of weather conditions for the NYC area, pulled from Weather Underground using their CSV service for hourly observations. For each hour, there are measurements of temperature, visibility and wind speed, as well as the overall weather condition, such as Rain, Snow or Clear. You can download our scrape from the service here.

This gives us more than:

32,448 hours of weather data

For the analysis, the two datasets were combined. The combined dataset can be downloaded here.
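The page does not show how the two datasets were joined. A minimal sketch, assuming each collision's date and time are truncated to the start of the hour and matched against exactly one weather row per hour. The field names (`date`, `time`, `hour`, `conditions`) and the toy rows are hypothetical, and real Weather Underground observations may not fall exactly on the hour, so a rounding step would likely be needed before this lookup.

```python
from datetime import datetime

# Toy rows for illustration only; field names are assumptions, not the real schema.
collisions = [
    {"date": "01/26/2015", "time": "14:35", "borough": "BROOKLYN"},
    {"date": "01/26/2015", "time": "14:05", "borough": "QUEENS"},
    {"date": "01/26/2015", "time": "15:10", "borough": "BRONX"},
]
# Assumed: one weather observation per hour, already keyed to the full hour.
weather = [
    {"hour": datetime(2015, 1, 26, 14), "conditions": "Light Snow"},
    {"hour": datetime(2015, 1, 26, 15), "conditions": "Snow"},
]

def floor_to_hour(date_str, time_str):
    """Parse a collision timestamp and truncate it to the start of its hour."""
    ts = datetime.strptime(f"{date_str} {time_str}", "%m/%d/%Y %H:%M")
    return ts.replace(minute=0, second=0, microsecond=0)

def combine(collisions, weather):
    """Attach the weather condition observed in the same hour to each collision."""
    by_hour = {w["hour"]: w for w in weather}
    combined = []
    for c in collisions:
        hour = floor_to_hour(c["date"], c["time"])
        w = by_hour.get(hour)
        if w is None:
            continue  # no weather observation for that hour: drop the collision
        combined.append({**c, "hour": hour, "conditions": w["conditions"]})
    return combined

rows = combine(collisions, weather)
for r in rows:
    print(r["borough"], r["hour"].isoformat(), r["conditions"])
```

Dropping unmatched collisions is one design choice; keeping them with an "Unknown" condition would be another.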

Explainer Notebook

If you want to know how we extracted all of the information from the two datasets, you can take a look backstage in our explainer notebook.

Explore Your Daily Commute

NYC is a big city. A lot of traffic means a high chance of collisions. Over 500 incidents are reported every day. We therefore give you the opportunity to discover the number of vehicle collisions that have happened throughout the city in the period 2012-2015. See if you can find the safer commute. Go discover!

The size of each point indicates the total number of incidents at that intersection. Red points mark intersections with at least one fatality. Click on a point to see the exact numbers.

Note that all intersections with fewer than 10 collisions and no fatalities have been omitted to improve performance.
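The omission rule above amounts to keeping an intersection if it has at least 10 collisions or any fatality. A minimal sketch, assuming an intersection is identified by its pair of street names (sorted, so "A & B" and "B & A" count as the same intersection). The helper names and toy rows are hypothetical and the counts are not real.

```python
from collections import defaultdict

def intersection_key(on_street, cross_street):
    """Normalise a street pair so 'A & B' and 'B & A' map to one intersection."""
    return tuple(sorted((on_street.strip().upper(), cross_street.strip().upper())))

def summarize_intersections(collisions):
    """Total collisions and persons killed per intersection."""
    stats = defaultdict(lambda: {"collisions": 0, "killed": 0})
    for c in collisions:
        s = stats[intersection_key(c["on_street"], c["cross_street"])]
        s["collisions"] += 1
        s["killed"] += c["killed"]
    return dict(stats)

def keep_for_map(stats, min_collisions=10):
    """Drop intersections with fewer than min_collisions collisions AND no fatalities."""
    return {k: v for k, v in stats.items()
            if v["collisions"] >= min_collisions or v["killed"] > 0}

# Toy rows for illustration only (not real counts).
sample = (
    [{"on_street": "TILLARY STREET", "cross_street": "FLATBUSH AVENUE EXTENSION", "killed": 0}] * 12
    + [{"on_street": "FLATBUSH AVENUE EXTENSION", "cross_street": "TILLARY STREET", "killed": 0}]
    + [{"on_street": "HEATH AVENUE", "cross_street": "WEST KINGSBRIDGE ROAD", "killed": 0}] * 2
    + [{"on_street": "3 AVENUE", "cross_street": "EAST 58 STREET", "killed": 1}]
)

stats = summarize_intersections(sample)
shown = keep_for_map(stats)
# Sorting the same counts gives the data behind a top 10 bar chart.
top10 = sorted(stats.items(), key=lambda kv: kv[1]["collisions"], reverse=True)[:10]
print(len(shown), top10[0])
```

Here the Heath Avenue pair is dropped (2 collisions, no fatalities), while the 3 Avenue pair is kept despite a single collision because it involved a fatality.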

A Closer Look

Looking at the worst offenders in terms of number of collisions, can we find out why they are worse than the others? Is there an underlying reason?

Let's begin by locating the top 10 intersections with the most collisions (click a bar to view the intersection).

By hovering over each bar in the top 10 bar chart, we can see that Tillary Street / Flatbush Avenue Extension is, by some margin, the most dangerous intersection, with 585 collisions.

For now we can only guess, but by clicking on the Tillary Street & Flatbush Avenue Extension bar, or looking at the view below, one could assume that this intersection is a perfect storm: it lies right before a highway access/exit ramp, and the road continues into Manhattan via the Manhattan Bridge.

TILLARY STREET & FLATBUSH AVENUE EXTENSION


What we see in the top 10 is that most of the intersections have something in common: they are next to a highway access/exit ramp, or one block from one.

This means that a high throughput of cars passes through them every day, resulting in a high number of collisions.

Traffic and Weather in New York City

We all assume that weather has a big impact on traffic anywhere, and that collisions are far more likely to happen in bad weather. But what is bad weather, really? Is any condition worse than the others? And if so, can we find out which, so that you can be more careful in traffic when it occurs?

By combining the Motor Vehicle Collisions dataset with weather data from Weather Underground, we can find out which weather condition, on average, produces the most collisions per hour. The data has been normalized relative to the most common weather condition, Mostly Cloudy, but you can change the baseline yourself in the dropdown below to better understand our data.
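The normalization can be made concrete with a small sketch: for each condition, divide the number of collisions that happened during hours with that condition by the number of such hours, then divide every rate by the baseline's rate. The input shapes (one dict per weather hour, one dict per combined collision) and the numbers are made up, chosen only to illustrate the arithmetic behind "60% more" and "40% less".

```python
from collections import Counter

def collisions_per_hour(combined, weather):
    """Average number of collisions per hour observed under each condition."""
    hours = Counter(w["conditions"] for w in weather)    # hours spent in each condition
    hits = Counter(c["conditions"] for c in combined)    # collisions during those hours
    return {cond: hits[cond] / n for cond, n in hours.items()}

def normalize(rates, baseline="Mostly Cloudy"):
    """Express each rate relative to the baseline (1.0 = same as the baseline)."""
    base = rates[baseline]
    return {cond: rate / base for cond, rate in rates.items()}

# Made-up data: 8 weather hours and 42 collisions, for illustration only.
weather = ([{"conditions": "Mostly Cloudy"}] * 4
           + [{"conditions": "Light Rain"}] * 2
           + [{"conditions": "Clear"}] * 2)
combined = ([{"conditions": "Mostly Cloudy"}] * 20
            + [{"conditions": "Light Rain"}] * 16
            + [{"conditions": "Clear"}] * 6)

rates = collisions_per_hour(combined, weather)
relative = normalize(rates)
for cond, r in sorted(relative.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cond}: {r:.2f}")
```

With these toy numbers, Light Rain comes out at 1.60 (60% more collisions per hour than Mostly Cloudy) and Clear at 0.60 (40% fewer). Passing only the collisions whose cause is slippery pavement as `combined` would give the same kind of comparison for that single cause.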

From the visualization we see a clear uptick in collision frequency in rainy and snowy conditions, with up to 60% more collisions than in Mostly Cloudy conditions. In completely Clear weather, the frequency hovers around 40% lower. This does, however, raise some questions: why are Blowing Snow and Heavy Snow so far down the list?

In early 2015 and 2016, NYC experienced two blizzards. In both cases, NYPD issued travel bans, meaning that only emergency vehicles were allowed on the roads. This could explain why we do not see an uptick in collision frequency in the very worst weather.

At 11pm tonight, streets will only be available to emergency vehicles. #SafeNYC

— NYPD NEWS (@NYPDnews) January 26, 2015

After 2:30 p.m and you're on the road, we will arrest you @NYPDChiefofDept says

— NYPD NEWS (@NYPDnews) January 23, 2016

Now that we have confirmed that the frequency of collisions is higher for some conditions relative to Mostly Cloudy, can we find any causes in the Motor Vehicle Collisions dataset that might be affected by the weather?

There seem to be a lot of collisions caused by slippery pavement. Let's look at the weather conditions for those collisions and see if they match what we have already found.

As the chart above makes clear, accidents caused by slippery pavement were, as predicted, more frequent in light rain and snow. This shows that some types of accidents are sensitive to the weather. However, this is the city-wide picture, and there might be more information hidden here. For instance, are there particular intersections in NYC that are especially sensitive to this condition?

Weather Sensitive Intersections

If we look at collisions caused by slippery pavement, we can find some interesting information. Looking at the frequency of these accidents, we found some particularly sensitive intersections.

Heatmap of slippery pavement accidents

As we see in the heat map above, the spread is very broad, and in fact very similar to the spread of all incidents. However, looking at individual intersections, we can make some interesting discoveries.

As shown above, some intersections are particularly prone to accidents on slippery pavement.

But why do people have accidents here? Let's take a look at the intersections through the wonders of Street View:

RICHMOND HILL ROAD & OLD MILL ROAD

As we can see in the image above, the top intersection is right by the graveyard at the church of St. Andrew. This can be no coincidence!

As the image shows, this intersection lies at the bottom of a hill, at a sharp turn. We believe this is why there are so many accidents here. Looking further up the road, it is clear that the city council has taken some measures to increase safety; however, these do not seem to be sufficient. We would recommend lowering the speed limit.

HEATH AVENUE & WEST KINGSBRIDGE ROAD

The next intersection on the list is located in the Bronx rather than Staten Island. Here again, oncoming traffic is led down a steep hill into a small but significant turn. Because of this turn, drivers cannot see the traffic light beforehand, and once it is visible they might be unable to stop because of the slope. Additionally, we can see that the pavement here is made of concrete, meaning that it will be extra slippery even in light rain or snow.

Clustering to Find Sensitive Intersections

Let's try a different approach for finding these sensitive intersections.

Intersections can sometimes be located very close together, so how do you determine which intersection a collision happened in? To deal with this, we apply a clustering method to locate small groups of intersections that might be extra sensitive.

A particularly good method for creating this kind of cluster is Density-Based Spatial Clustering of Applications with Noise (DBSCAN). It clusters intersections by distance and removes noise (intersections with few collisions).
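To show the idea, here is a small self-contained sketch of DBSCAN on intersection coordinates, using great-circle (haversine) distance. As an assumption on our part, each intersection is weighted by its collision count, so a point counts as "core" when the total number of collisions within `eps_m` metres reaches `min_weight`; sparse intersections then end up as noise (label -1). The coordinates, weights and parameter values are made up. This O(n²) version is only for illustration; a library implementation such as scikit-learn's `DBSCAN` (which accepts `metric="haversine"` on coordinates in radians and a `sample_weight` argument to `fit`) would be the practical choice.

```python
import math

EARTH_RADIUS_M = 6_371_000

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def weighted_dbscan(points, weights, eps_m, min_weight):
    """DBSCAN where a point is core if the summed weight of its eps-neighbourhood
    (itself included) is at least min_weight. Returns one label per point; -1 = noise."""
    n = len(points)
    neighbours = [[j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]
                  for i in range(n)]
    is_core = [sum(weights[j] for j in neighbours[i]) >= min_weight for i in range(n)]
    labels = [None] * n          # None = not yet assigned to any cluster
    cluster = -1
    for i in range(n):
        if labels[i] is not None or not is_core[i]:
            continue
        cluster += 1             # start a new cluster from an unassigned core point
        labels[i] = cluster
        frontier = [i]
        while frontier:
            p = frontier.pop()
            if not is_core[p]:
                continue         # border points join the cluster but do not expand it
            for q in neighbours[p]:
                if labels[q] is None:
                    labels[q] = cluster
                    frontier.append(q)
    return [-1 if lab is None else lab for lab in labels]

# Made-up coordinates: three close points in Midtown Manhattan, one in Staten Island.
points = [(40.7600, -73.9680), (40.7607, -73.9675), (40.7613, -73.9670), (40.5900, -74.1000)]
collision_counts = [4, 5, 3, 2]
print(weighted_dbscan(points, collision_counts, eps_m=150, min_weight=8))
```

The three Midtown points chain into one cluster, because each is within 150 m of a neighbour and the local collision totals reach 8, while the isolated Staten Island point is labeled as noise.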

We ran this cluster analysis for each weather condition. Below is a snippet of what we found:

The dendrogram shows the clusters found for each condition, as well as the intersections in each cluster. Try clicking an intersection (the points in the outer circle).

 

Intersections Sensitive to Heavy Rain

The three intersections on 3rd Ave, Manhattan, seem to be very sensitive to heavy rainfall. Looking at Street View (click the intersection node and choose Street View), it is not hard to see why, especially at 3rd Ave & E 58th St:

The intersection has a number of steel road plates laid in it and in the streets leading to it. These plates get very slippery in wet conditions, which could be the reason so many accidents happen there.

Intersections Sensitive to Snow

As we found with collisions caused by slippery pavement, sloped roads do have an impact on the collision rate. The intersection Groove St & Hygeia Pl, Staten Island, appears to be sensitive to snowy conditions:

Again, it is not hard to see why this intersection is sensitive to snowy conditions. This further supports the observation that sloped and/or curved roads leading to an intersection seem to produce more accidents in snowy conditions.

Predicting Accidents

Looking at the historical data is great, as it can tell us a lot of interesting facts. However, could we also use it to say something about the future?

Looking into the future is hard, but thanks to machine learning it is not completely impossible. We have applied some machine learning techniques to our data in order to predict where collisions will happen. Given an hour of the day, a weather condition and an approximate location, we can predict the specific postal area with up to 35% precision.

This means that if you know the current weather condition in NYC (just look outside) and the current hour (just look at the clock on your computer or phone), we can say where collisions might happen with up to 35% precision.

By applying K-means clustering and random forest classification, as explained in detail in the explainer notebook, we were able to train a model that makes these predictions with this level of precision.
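Our reading of the text is that K-means groups collision locations into k areas, and the resulting cluster labels become the target that a random forest classifier predicts from hour, weather condition and approximate location. The sketch below covers only the K-means step (plain Lloyd's algorithm); the random forest step is not shown and would typically come from a library such as scikit-learn's `RandomForestClassifier`. Treating latitude/longitude as flat coordinates is a simplifying assumption that is roughly acceptable at city scale, and the points are made up.

```python
import random

def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's K-means on (lat, lon) pairs; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k of the input points

    def sq_dist(p, c):
        # Squared Euclidean distance, treating degrees as flat coordinates.
        return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append((sum(p[0] for p in members) / len(members),
                            sum(p[1] for p in members) / len(members)))
            else:
                new.append(centroids[c])  # keep an empty cluster's centroid in place
        if new == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new
    return centroids, labels

# Made-up collision locations: three in Midtown Manhattan, three in Staten Island.
pts = [(40.760, -73.968), (40.761, -73.967), (40.759, -73.969),
       (40.590, -74.100), (40.591, -74.101), (40.589, -74.099)]
centroids, labels = kmeans(pts, k=2)
print(labels)
```

Each collision's label would then serve as the class to predict; trying different values of `k` is what changes the number of areas in the visualization.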

Below is a visualization of the different clusters we used in our analysis. Here you can see for yourself how the number of clusters affects the precision of the model.

prediction score

Note that the shapes shown are the convex hulls of the collision clusters, hence the gaps between shapes in the visualization.
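To illustrate why the hulls leave gaps: each shape only spans the points of its own cluster, so areas between clusters stay uncovered. A minimal sketch using Andrew's monotone chain algorithm (our choice; the page does not say how its hulls were computed). Points are (x, y) pairs, e.g. (lon, lat), and the helper names and example points are hypothetical.

```python
from collections import defaultdict

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts  # degenerate cluster: a point or a segment

    def cross(o, a, b):
        # > 0 for a counter-clockwise turn o -> a -> b
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop the endpoints shared by both chains

def hulls_by_cluster(points, labels):
    """One hull polygon per cluster label; noise (label -1) is skipped."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        if lab != -1:
            groups[lab].append(p)
    return {lab: convex_hull(g) for lab, g in groups.items()}

square = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
print(convex_hull(square))
```

Interior and collinear points such as (1, 1) and (1, 0) are dropped, leaving only the four corners of the square.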

Summary