Introduction

As the second-largest city in the United States, Los Angeles is known for its heavy reliance on vehicles, its large population, and its commuting culture. Downtown LA, where many commuters head every morning for work, has long been a case study in urban planning because of its complex, outdated street layout and its high concentration of crime. With thousands of commuters, residents, and visitors traveling through this dense neighborhood every day, street parking plays a crucial role in keeping the area's ground transportation running. Being able to identify areas that are both safe and accessible is therefore essential for anyone who, for any reason, has to park in downtown LA.

This study aims to assist drivers who need street parking in downtown LA by suggesting the most suitable parking areas using a "Recommended Parking Score". The score takes into account not only the safety of the streets but also the price of the available parking meters. To build it, we take a comprehensive look at the neighborhood's demographics, the trend in car-related crimes, and the price distribution of parking meters in the area. In the end, we hope to draw insights that may also help planners and policymakers address some of the problems identified in this study.
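As a preview of how safety and price can be combined into a single score, here is a minimal sketch assuming hypothetical per-street crime_count and meter_rate columns, min-max normalization, and equal weights; the exact inputs and weights used later in the analysis may differ.

import pandas as pd

def recommended_parking_score(df, w_safety=0.5, w_price=0.5):
    # Hypothetical scoring sketch: both inputs are min-max normalized so that
    # safety (fewer nearby crimes) and price (cheaper meters) are comparable
    crime = (df['crime_count'] - df['crime_count'].min()) / (df['crime_count'].max() - df['crime_count'].min())
    rate = (df['meter_rate'] - df['meter_rate'].min()) / (df['meter_rate'].max() - df['meter_rate'].min())
    # Safer and cheaper streets score closer to 1
    return w_safety * (1 - crime) + w_price * (1 - rate)

# Toy usage: three streets with made-up crime counts and meter rates
streets = pd.DataFrame({'crime_count': [12, 3, 30], 'meter_rate': [2.0, 1.0, 6.0]})
streets['score'] = recommended_parking_score(streets)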
File setup and data collection
The first step of this analysis covers the essentials: loading the necessary packages, configuring the APIs used for data collection, and managing global environment settings.
# Import packages
import altair as alt
import geopandas as gpd
import pandas as pd
import numpy as np
import hvplot.pandas
# import seaborn as sns
from matplotlib import pyplot as plt
import holoviews as hv
from shapely.geometry import Polygon, MultiPolygon, Point
import requests
import geoviews as gv
import geoviews.tile_sources as gvts
import folium
from folium import plugins
import xyzservices
import osmnx as ox
import networkx as nx
import pygris
import cenpy

%matplotlib inline

# See lots of columns
pd.options.display.max_rows = 9999
pd.options.display.max_colwidth = 200

# Hide warnings due to an issue in the shapely package
# See: https://github.com/shapely/shapely/issues/1345
np.seterr(invalid="ignore");
Data Wrangling
This step involves gathering parking data for 2023 and performing preliminary cleaning on the large dataset. All geospatial datasets are set to a uniform coordinate reference system, and boundary shapefiles are primed for use with the OSM street network API.
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
import numpy as np

# Suppress warnings for invalid operations
np.seterr(invalid="ignore")

# Load parking meter data
meters = pd.read_csv('/Users/bin/Downloads/LADOT_Metered_Parking_Inventory___Policies_20241222.csv')

# Remove parentheses and split LatLng into separate Latitude and Longitude columns
meters['LatLng'] = meters['LatLng'].str.strip("()")
meters[['LATITUDE', 'LONGITUDE']] = meters['LatLng'].str.split(', ', expand=True).astype(float)

# Convert to GeoDataFrame
geometry = [Point(xy) for xy in zip(meters['LONGITUDE'], meters['LATITUDE'])]
meters = gpd.GeoDataFrame(meters, geometry=geometry)
meters.crs = 'EPSG:4326'

# Reproject to Web Mercator
meters = meters.to_crs('EPSG:3857')

# Check the first few rows of the GeoDataFrame
# print(meters.head())
Parking meters in LA
Here is an overview of all parking meters in the city of Los Angeles and the adjacent cities. The concentration of parking meters in central Los Angeles is significantly higher than in the surrounding neighborhoods. As the map shows, most parking meters are located in downtown LA, followed by Fairfax and Koreatown, where demand for street parking is high. However, this map alone does not provide enough information to draw useful conclusions in such a complex urbanized area, since the distribution of parking meters may be shaped by population, profitability, parking demand, planning, and policy. We therefore need to examine the problem through additional lenses.
import folium
from folium.plugins import FastMarkerCluster
import xyzservices.providers

# Extract LATITUDE and LONGITUDE columns as a list of (lat, lon) pairs
coords = meters[["LATITUDE", "LONGITUDE"]].values.tolist()

# Create a map centered on Los Angeles with light-mode tiles
m = folium.Map(
    location=[34.05, -118.25],  # Center on Los Angeles
    zoom_start=12,
    tiles=xyzservices.providers.CartoDB.Positron,  # Light mode tiles
)

# Add a FastMarkerCluster with the meter locations
FastMarkerCluster(data=coords).add_to(m)

# Display the map
m
OpenStreetMap Data
To streamline the workflow with this large dataset, relevant OSM data is refined by excluding highways, where parking is not allowed. This ensures the dataset focuses solely on accessible areas with available parking spaces. A new graph is created and plotted to reflect only the non-highway streets.
import osmnx as ox

# Define the area of interest
city_name = 'Los Angeles, California, USA'

# Retrieve the graph with simplification and the largest component only
G = ox.graph.graph_from_place(city_name, network_type='drive', simplify=True, retain_all=False)

# Optional: Plot the graph
ox.plot_graph(G, bgcolor='k', node_color='w', node_size=5, edge_color='w', edge_linewidth=0.5)
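Since the citywide download is large, one optional speed-up (a sketch, not part of the original pipeline) is to pull a smaller drivable network around a point in downtown LA; the center point and 3 km radius below are illustrative assumptions.

import osmnx as ox

# Grab only the drivable network within ~3 km of a downtown LA point
# (the coordinates and radius are assumptions, not from the original analysis)
G_dtla = ox.graph_from_point((34.045, -118.25), dist=3000, network_type='drive')
ox.plot_graph(G_dtla, bgcolor='k', node_color='w', node_size=5, edge_color='w', edge_linewidth=0.5)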
Number of Parking Meters per Unit Street Length
The following map shows the number of parking meters per street segment in the city of Los Angeles, with a focus on the downtown LA area. Interestingly, even though most metered streets are in downtown, the segments with the highest density are dispersed across the area. Beyond downtown, metered segments also appear in Hollywood, Beverly Hills, Culver City, and around Santa Monica; the closer they are to downtown LA, the more dispersed they seem to be. Noticeable segments also extend northwest from the city of Los Angeles toward Calabasas and Burbank, while very few segments lie to the east and south of downtown LA.
# Filter out highway edges (e.g., motorways), where street parking is not allowed
non_highway_edges = [
    (u, v, key)
    for u, v, key, data in G.edges(keys=True, data=True)
    if data.get('highway') != 'motorway'
]

# Create a new graph containing only the non-highway streets
G = G.edge_subgraph(non_highway_edges)

# Plot the non-highway street network
# ox.plot_graph(G, bgcolor='k', node_color='w', edge_color='w', node_size=5, edge_linewidth=0.5)
# Step 1: Convert the graph to a GeoDataFrame of edges
# (reset_index exposes the u, v, key identifiers as columns for merging)
la_edges = ox.graph_to_gdfs(G, edges=True, nodes=False).reset_index()

# Step 2: Project the graph to the EPSG:3857 coordinate reference system
G_projected = ox.project_graph(G, to_crs='EPSG:3857')

# Step 3: Extract projected coordinates from the 'meters' GeoDataFrame
# (already in EPSG:3857, so these are Web Mercator meters, not degrees)
x_coords = meters['geometry'].x
y_coords = meters['geometry'].y

# Step 4: Find the nearest edge for each parking meter
nearest_edges = ox.distance.nearest_edges(G_projected, X=x_coords, Y=y_coords)

# Step 5: Create a DataFrame of edge identifiers, one row per meter
meters_nodes = pd.DataFrame(nearest_edges, columns=['u', 'v', 'key'])
meters_nodes['Count'] = 1

# Step 6: Group by edge identifiers and calculate total counts
grouped_counts = meters_nodes.groupby(['u', 'v'])['Count'].sum().reset_index()

# Step 7: Merge edge counts with the edges GeoDataFrame
merged_gdf = la_edges.merge(grouped_counts, on=['u', 'v'], how='left')

# Step 8: Keep only edges with at least one meter
merged_gdf = merged_gdf[merged_gdf['Count'] > 0]

# Step 9: Drop unnecessary columns for cleaner data
columns_to_remove = ['u', 'v', 'osmid', 'oneway', 'lanes', 'ref', 'maxspeed',
                     'reversed', 'access', 'bridge', 'junction', 'width', 'tunnel']
merged_gdf = merged_gdf.drop(columns=columns_to_remove)

# Step 10: Calculate the length-normalized count ('truecount')
merged_gdf['truecount'] = merged_gdf['Count'] / merged_gdf['length']

# Step 11: Keep only edges with lengths in the range [10, 100] meters
length_filter = (merged_gdf['length'] >= 10) & (merged_gdf['length'] <= 100)
merged_gdf = merged_gdf[length_filter]
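The interactive map referenced above is not preserved in this export. As a minimal sketch of one way to render it, assuming GeoPandas >= 0.10 (which provides the .explore() method):

# Render the metered street segments, colored by meters per unit length
merged_gdf.explore(
    column='truecount',
    cmap='viridis',
    tiles='CartoDB positron',
    legend=True,
)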
Parking Price in LA
Taking a closer look at street parking prices in the city of Los Angeles, we find the most expensive parking in downtown LA, where rates can be as much as six times those of adjacent areas. This is not surprising: downtown LA draws the largest share of commuting workers, whose demand for parking makes the area the most profitable.
import pandas as pd
import folium
from folium.plugins import MarkerCluster
import xyzservices.providers

# Example data with RateRange added
data = {
    "SpaceID": ["WW516", "CB3034", "BH398"],
    "MeteredTimeLimit": ["2HR", "2HR", "1HR"],
    "RateRange": ["$1.00", "$1.00 - $6.00", "$1.00"],
    "LATITUDE": [34.060385, 34.047109, 34.045795],
    "LONGITUDE": [-118.304103, -118.245841, -118.21555],
}

# Convert to DataFrame
meters = pd.DataFrame(data)

# Convert MeteredTimeLimit to numeric hours (e.g., "2HR" -> 2)
meters['TimeLimit'] = meters['MeteredTimeLimit'].str.extract(r'(\d+)').astype(float)

# Convert RateRange to numeric (use the midpoint for ranges)
def extract_rate(rate_range):
    rates = [float(r.replace("$", "")) for r in rate_range.split(" - ")]
    return sum(rates) / len(rates)  # Use midpoint

meters['RateValue'] = meters['RateRange'].apply(extract_rate)

# Create a map centered on Los Angeles with light mode
m = folium.Map(
    location=[34.05, -118.25],  # Center on Los Angeles
    zoom_start=12,
    tiles=xyzservices.providers.CartoDB.Positron,  # Light mode tiles
)

# Add CircleMarkers to the map with size based on RateValue
for _, row in meters.iterrows():
    folium.CircleMarker(
        location=[row['LATITUDE'], row['LONGITUDE']],
        radius=row['RateValue'] * 5,  # Scale the size
        color="blue",
        fill=True,
        fill_opacity=0.6,
        popup=(f"Metered Time Limit: {row['MeteredTimeLimit']}<br>"
               f"Rate Range: {row['RateRange']}<br>"
               f"Rate Value: ${row['RateValue']:.2f}"),
    ).add_to(m)

# Display the map
m
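Since extract_rate takes the midpoint of a range, a single rate passes through unchanged while a range like "$1.00 - $6.00" becomes its average:

extract_rate("$1.00")          # -> 1.0
extract_rate("$1.00 - $6.00")  # -> 3.5, the midpoint of the range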
# Parking meters heat map
import folium
from folium.plugins import HeatMap
import pandas as pd

# Sample data as a DataFrame (replace with your actual data)
data = {
    "SpaceID": ["WW516", "CB3034", "BH398", "UC8", "CB2345"],
    "LATITUDE": [34.060385, 34.047109, 34.045795, 34.136733, 34.030958],
    "LONGITUDE": [-118.304103, -118.245841, -118.215550, -118.363025, -118.255362],
}
meters = pd.DataFrame(data)

# Extract coordinates for the heatmap
coordinates = meters[["LATITUDE", "LONGITUDE"]].values.tolist()

# Create a map centered on Los Angeles with a light mode basemap
m = folium.Map(location=[34.05, -118.25], zoom_start=12, tiles="CartoDB Positron")

# Add heatmap layer
HeatMap(coordinates).add_to(m)

# Display the map
m
import pandas as pd

# API URL for the CSV data
url = "https://data.lacity.org/resource/2nrs-mtv8.csv"

# Read the CSV file from the API
crime_data = pd.read_csv(url)
crime_data.head()
dr_no | date_rptd | date_occ | time_occ | area | area_name | rpt_dist_no | part_1_2 | crm_cd | crm_cd_desc | ... | status | status_desc | crm_cd_1 | crm_cd_2 | crm_cd_3 | crm_cd_4 | location | cross_street | lat | lon
0 | 190326475 | 2020-03-01T00:00:00.000 | 2020-03-01T00:00:00.000 | 2130 | 7 | Wilshire | 784 | 1 | 510 | VEHICLE - STOLEN | ... | AA | Adult Arrest | 510 | 998.0 | NaN | NaN | 1900 S LONGWOOD AV | NaN | 34.0375 | -118.3506
1 | 200106753 | 2020-02-09T00:00:00.000 | 2020-02-08T00:00:00.000 | 1800 | 1 | Central | 182 | 1 | 330 | BURGLARY FROM VEHICLE | ... | IC | Invest Cont | 330 | 998.0 | NaN | NaN | 1000 S FLOWER ST | NaN | 34.0444 | -118.2628
2 | 200320258 | 2020-11-11T00:00:00.000 | 2020-11-04T00:00:00.000 | 1700 | 3 | Southwest | 356 | 1 | 480 | BIKE - STOLEN | ... | IC | Invest Cont | 480 | NaN | NaN | NaN | 1400 W 37TH ST | NaN | 34.0210 | -118.3002
3 | 200907217 | 2023-05-10T00:00:00.000 | 2020-03-10T00:00:00.000 | 2037 | 9 | Van Nuys | 964 | 1 | 343 | SHOPLIFTING-GRAND THEFT ($950.01 & OVER) | ... | IC | Invest Cont | 343 | NaN | NaN | NaN | 14000 RIVERSIDE DR | NaN | 34.1576 | -118.4387
4 | 200412582 | 2020-09-09T00:00:00.000 | 2020-09-09T00:00:00.000 | 630 | 4 | Hollenbeck | 413 | 1 | 510 | VEHICLE - STOLEN | ... | IC | Invest Cont | 510 | NaN | NaN | NaN | 200 E AVENUE 28 | NaN | 34.0820 | -118.2130

5 rows × 28 columns
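One caveat: reading the Socrata CSV endpoint directly returns only the API's default page of records (1,000 rows), so the quick read above sees just a slice of the dataset. Here is a sketch of paging through more of it with the standard SoQL $limit/$offset parameters; note that an app token may be required for heavy use.

import pandas as pd

url = "https://data.lacity.org/resource/2nrs-mtv8.csv"
chunks, offset, page_size = [], 0, 50000

# Request successive pages until an empty page comes back
while True:
    page = pd.read_csv(f"{url}?$limit={page_size}&$offset={offset}")
    if page.empty:
        break
    chunks.append(page)
    offset += page_size

crime_data_full = pd.concat(chunks, ignore_index=True)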
Vehicle Stolen Crime Points
In this section, we analyze car-related crimes in the city of Los Angeles, focusing on vehicle theft incidents. According to the 2020 crime data, most vehicles are stolen in central Los Angeles, with relatively smaller clusters around San Fernando and Long Beach.
# Vehicle-related crime map
import pandas as pd
import folium

# API URL for the CSV data
url = "https://data.lacity.org/resource/2nrs-mtv8.csv"

# Read the CSV file from the API
crime_data = pd.read_csv(url)

# Filter for "VEHICLE - STOLEN" cases (copy to avoid SettingWithCopyWarning)
vehicle_stolen_data = crime_data[crime_data['crm_cd_desc'] == "VEHICLE - STOLEN"].copy()

# Check for valid lat/lon data and drop rows with missing coordinates
vehicle_stolen_data = vehicle_stolen_data.dropna(subset=['lat', 'lon'])

# Convert lat/lon to numeric (in case they are read as strings)
vehicle_stolen_data['lat'] = pd.to_numeric(vehicle_stolen_data['lat'])
vehicle_stolen_data['lon'] = pd.to_numeric(vehicle_stolen_data['lon'])

# Create a map centered on Los Angeles with a light mode basemap
m = folium.Map(location=[34.05, -118.25], zoom_start=12, tiles="CartoDB Positron")

# Add a marker for each "VEHICLE - STOLEN" case
for _, row in vehicle_stolen_data.iterrows():
    folium.Marker(
        location=[row['lat'], row['lon']],
        popup=f"Location: {row['location']}<br>Date: {row['date_occ']}",
    ).add_to(m)

# Display the map
m
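Adding one folium.Marker per incident becomes slow with thousands of points; FastMarkerCluster (used earlier for the meters map) renders the same points much faster. A sketch of the alternative:

from folium.plugins import FastMarkerCluster

# Cluster the stolen-vehicle points instead of adding individual markers
coords = vehicle_stolen_data[['lat', 'lon']].values.tolist()
m = folium.Map(location=[34.05, -118.25], zoom_start=12, tiles="CartoDB Positron")
FastMarkerCluster(data=coords).add_to(m)
m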
Vehicle Stolen Heat Map
This trend is better represented by a heatmap of the locations where vehicles were stolen. For the purposes of this study, we focus on the city of Los Angeles in particular. As the map shows, vehicle theft incidents are densest right around downtown LA and extend toward surrounding neighborhoods such as Culver City, Inglewood, and parts of Pasadena. With downtown LA at the center of the heat, it is reasonable to conclude that downtown LA demands the most attention and is the most dangerous area for vehicle-related crimes.
import pandas as pd
import folium
from folium.plugins import HeatMap

# API URL for the CSV data
url = "https://data.lacity.org/resource/2nrs-mtv8.csv"

# Read the CSV file from the API
crime_data = pd.read_csv(url)

# Filter for "VEHICLE - STOLEN" cases (copy to avoid SettingWithCopyWarning)
vehicle_stolen_data = crime_data[crime_data['crm_cd_desc'] == "VEHICLE - STOLEN"].copy()

# Drop rows with missing coordinates
vehicle_stolen_data = vehicle_stolen_data.dropna(subset=['lat', 'lon'])

# Convert lat/lon to numeric
vehicle_stolen_data['lat'] = pd.to_numeric(vehicle_stolen_data['lat'])
vehicle_stolen_data['lon'] = pd.to_numeric(vehicle_stolen_data['lon'])

# Extract latitude and longitude as a list of [lat, lon] pairs
heat_data = vehicle_stolen_data[['lat', 'lon']].values.tolist()

# Create a map centered on Los Angeles with a light mode basemap
m = folium.Map(location=[34.05, -118.25], zoom_start=12, tiles="CartoDB Positron")

# Add the heatmap layer
HeatMap(heat_data, radius=10).add_to(m)

# Display the map
m
available = cenpy.explorer.available()
available.head()  # Return a DataFrame of datasets available through the Census API
/Users/bin/miniforge3/envs/musa-550-fall-2023/lib/python3.10/site-packages/cenpy/explorer.py:70: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
listcols = raw_table.applymap(lambda x: isinstance(x, list)).any()
[Output: the first rows of the available-datasets table. Each row describes an Annual Business Survey (ABS) "Characteristics of Businesses" release (2017–2021), with its description ("The Annual Business Survey (ABS) provides information on selected economic and demographic characteristics for businesses and business owners by sex, ethnicity, race, and veteran status..."), the code 006:07, the access level "public", and assorted metadata columns.]
# Connect to the Census API
acs = cenpy.remote.APIConnection("ACSDT5Y2021")

# Define variables of interest
variables = [
    "NAME",
    "B19013_001E",  # Median income
    "B03002_001E",  # Total population
    "B03002_003E",  # Not Hispanic, White
    "B03002_012E",  # Hispanic or Latino
    "B08301_001E",  # Means of transportation to work
    "B08301_010E",  # Public transportation
]

# Los Angeles County and California codes
la_county_code = "037"
ca_state_code = "06"

# Query ACS data for Los Angeles block groups
la_inc_data = acs.query(
    cols=variables,
    geo_unit="block group:*",
    geo_filter={"state": ca_state_code, "county": la_county_code, "tract": "*"},
)

# Convert numerical columns to float
for variable in variables:
    if variable != "NAME":
        la_inc_data[variable] = la_inc_data[variable].astype(float)
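The raw ACS counts are easier to interpret as shares. Here is a small sketch deriving a few from the variables queried above (the pct_* column names are our own labels, not ACS variables; note also that the ACS publishes large negative sentinel values for suppressed medians, which should be masked before use):

# Derive interpretable shares from the raw counts queried above
la_inc_data['pct_white_nh'] = la_inc_data['B03002_003E'] / la_inc_data['B03002_001E']
la_inc_data['pct_hispanic'] = la_inc_data['B03002_012E'] / la_inc_data['B03002_001E']
la_inc_data['pct_transit'] = la_inc_data['B08301_010E'] / la_inc_data['B08301_001E']

# Mask ACS sentinel values in median income (e.g., -666666666 = suppressed)
la_inc_data.loc[la_inc_data['B19013_001E'] < 0, 'B19013_001E'] = float('nan')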
# Create a cleaned DataFrame from la_inc_data
la_final = la_inc_data.copy()

# Columns to drop
columns_to_drop = [
    "STATEFP", "COUNTYFP", "TRACTCE", "BLKGRPCE", "GEOID", "NAMELSAD",
    "MTFCC", "FUNCSTAT", "ALAND", "AWATER", "INTPTLAT", "INTPTLON",
]

# Drop the unnecessary columns if they are all present
if all(col in la_final.columns for col in columns_to_drop):
    la_final.drop(columns=columns_to_drop, inplace=True)
else:
    missing_cols = [col for col in columns_to_drop if col not in la_final.columns]
    print(f"Warning: The following columns are missing and cannot be dropped: {missing_cols}")

# Verify the structure of the cleaned DataFrame
print(la_final.columns)
la_final.head()
Warning: The following columns are missing and cannot be dropped: ['STATEFP', 'COUNTYFP', 'TRACTCE', 'BLKGRPCE', 'GEOID', 'NAMELSAD', 'MTFCC', 'FUNCSTAT', 'ALAND', 'AWATER', 'INTPTLAT', 'INTPTLON']
Index(['NAME', 'B19013_001E', 'B03002_001E', 'B03002_003E', 'B03002_012E',
'B08301_001E', 'B08301_010E', 'state', 'county', 'tract',
'block group'],
dtype='object')
NAME | B19013_001E | B03002_001E | B03002_003E | B03002_012E | B08301_001E | B08301_010E | state | county | tract | block group
0 | Block Group 1, Census Tract 1011.10, Los Angeles County, California | 63242.0 | 1630.0 | 932.0 | 571.0 | 697.0 | 13.0 | 06 | 037 | 101110 | 1
1 | Block Group 2, Census Tract 1011.10, Los Angeles County, California | 56250.0 | 1492.0 | 864.0 | 314.0 | 772.0 | 47.0 | 06 | 037 | 101110 | 2
2 | Block Group 3, Census Tract 1011.10, Los Angeles County, California | 99567.0 | 757.0 | 509.0 | 120.0 | 468.0 | 0.0 | 06 | 037 | 101110 | 3
3 | Block Group 1, Census Tract 1011.22, Los Angeles County, California | 120833.0 | 2608.0 | 1879.0 | 117.0 | 1195.0 | 0.0 | 06 | 037 | 101122 | 1
4 | Block Group 2, Census Tract 1011.22, Los Angeles County, California | …