NYC Citi Bike Data Analysis Dashboard Case Study

This case study presents a Python and Streamlit dashboard project built to analyse NYC Citi Bike trip data, identify station demand patterns, explore high-volume routes, and understand how weather conditions may influence bike usage across New York City.

The project uses Citi Bike trip data from 2022, enriched with NOAA weather data, to create a practical urban mobility analysis dashboard. The goal was to transform raw trip records into clear visual insights that could support operational decisions around bike redistribution, station planning, demand monitoring, and service optimisation.

The final result is a deployed NYC Citi Bike data analysis dashboard built with Python, Streamlit, Plotly, pandas, and Kepler.gl.

Project Overview

Bike-sharing systems generate large volumes of trip data every day. Each trip record captures where a rider starts and ends, how long they ride, what type of bike they use, and when the trip happens.

The challenge is that raw trip data is difficult to interpret without cleaning, aggregation, and visualisation. A business or operations team needs a clear way to identify demand patterns, high-traffic stations, popular routes, and possible service pressure points.

This project was designed to answer those questions through a dashboard that makes the data easier to explore and understand.

Business Problem

A bike-sharing company needs to make sure bikes are available in the right places at the right times.

If some stations experience high departures but fewer arrivals, bikes may become unavailable during peak periods. If other stations receive more bikes than they lose, they may become overcrowded. These imbalances can affect customer experience, operational efficiency, and long-term service planning.
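The imbalance described above can be sketched as a per-station net flow (arrivals minus departures). This is a minimal illustration rather than the project's actual code: the column names `start_station_name` and `end_station_name` are assumed to match the public Citi Bike trip files, and the sample rows are made up.

```python
import pandas as pd


def station_net_flow(trips: pd.DataFrame) -> pd.DataFrame:
    """Return departures, arrivals, and net flow (arrivals - departures) per station."""
    departures = trips["start_station_name"].value_counts().rename("departures")
    arrivals = trips["end_station_name"].value_counts().rename("arrivals")
    # Align both counts on station name; a station missing from one side gets 0.
    flow = pd.concat([departures, arrivals], axis=1).fillna(0).astype(int)
    flow.index.name = "station"
    flow["net_flow"] = flow["arrivals"] - flow["departures"]
    # Most negative first: these stations lose bikes fastest.
    return flow.sort_values("net_flow")


# Made-up sample trips for illustration only.
trips = pd.DataFrame({
    "start_station_name": ["A", "A", "A", "B"],
    "end_station_name": ["B", "B", "C", "A"],
})
flow = station_net_flow(trips)
print(flow)
```

A strongly negative `net_flow` points to a station that tends to run empty, while a strongly positive one points to a station that tends to fill up.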

The key business problem was:

How can Citi Bike trip data be used to identify high-demand stations, popular routes, peak usage periods, and operational patterns that could support better bike redistribution and station planning?

Project Goals

The main goals of this project were to:

Identify the most active Citi Bike stations across New York City.

Analyse popular trip routes between start and end stations.

Explore hourly, daily, and seasonal bike usage patterns.

Understand whether temperature and weather conditions appear to influence trip volume.

Build an interactive dashboard that communicates insights clearly.

Support data-driven decisions around station planning, bike redistribution, and operational strategy.

Data Sources

The project used two main data sources.

The first source was Citi Bike’s public trip data for 2022. This dataset included trip-level information such as ride ID, ride type, start time, end time, start station, end station, station coordinates, and rider type.

The second source was NOAA weather data, collected through the NOAA API. This data was used to add daily average temperature information and explore the relationship between weather conditions and Citi Bike usage.

Combining trip data with weather data created a more complete view of demand patterns across the city.
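One way to combine the two sources is to count trips per day and join the result to the daily weather table on the date. The sketch below assumes a `started_at` timestamp column in the trip data and hypothetical `date` and `avg_temp` columns in the weather data; the temperature values are placeholders, not NOAA readings.

```python
import pandas as pd

# Made-up sample rows; real data would be loaded from the trip and weather files.
trips = pd.DataFrame({
    "ride_id": ["r1", "r2", "r3"],
    "started_at": ["2022-01-01 08:15:00", "2022-01-01 17:40:00", "2022-01-02 09:05:00"],
})
weather = pd.DataFrame({
    "date": ["2022-01-01", "2022-01-02"],
    "avg_temp": [11.6, 1.4],  # placeholder values, not real measurements
})

# Reduce each trip timestamp to its calendar day so both tables share a key.
trips["date"] = pd.to_datetime(trips["started_at"]).dt.normalize()
weather["date"] = pd.to_datetime(weather["date"])

# One row per day with the number of trips that started on that day.
daily = trips.groupby("date").size().rename("trip_count").reset_index()

# Left join keeps every trip day even if a weather reading is missing;
# validate guards against duplicate dates on either side.
daily = daily.merge(weather, on="date", how="left", validate="one_to_one")
print(daily)
```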

Tools Used

This project was completed using:

Python for data analysis and dashboard development.

pandas for data cleaning, transformation, and aggregation.

NumPy for numerical operations.

Plotly for interactive charts.

Matplotlib and Seaborn for exploratory visualisation.

Kepler.gl for geospatial route mapping.

Streamlit for dashboard deployment.

GitHub for project version control and documentation.

Data Preparation Process

Before building the dashboard, the raw data needed to be cleaned and prepared.

The data preparation process included checking for missing values, reviewing data types, removing unnecessary fields, creating new time-based variables, aggregating station and route activity, and preparing smaller datasets suitable for dashboard performance.

New fields were created to support the analysis, including trip date, trip hour, daily trip counts, station-level totals, route-level totals, and temperature-linked trip summaries.
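The preparation steps above might look roughly like the following. It is a sketch under assumptions: the column names follow the public Citi Bike trip files, `prepare_trips` is a hypothetical helper name, and treating the station ID columns as unnecessary is an illustrative choice rather than the project's documented one.

```python
import pandas as pd


def prepare_trips(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean raw trip rows and add the time-based fields used for aggregation."""
    # Rows without a start or end station cannot be attributed to a station or route.
    trips = raw.dropna(subset=["start_station_name", "end_station_name"]).copy()
    # Illustrative choice: drop the station ID columns as unnecessary for the dashboard.
    trips = trips.drop(columns=["start_station_id", "end_station_id"], errors="ignore")
    # Parse the timestamp so date and hour can be derived from it.
    trips["started_at"] = pd.to_datetime(trips["started_at"])
    trips["trip_date"] = trips["started_at"].dt.date
    trips["trip_hour"] = trips["started_at"].dt.hour
    return trips


# Made-up sample rows; the third row has no end station and is dropped.
raw = pd.DataFrame({
    "ride_id": ["r1", "r2", "r3"],
    "started_at": ["2022-06-01 08:05:00", "2022-06-01 17:30:00", "2022-06-02 12:00:00"],
    "start_station_name": ["A", "B", "C"],
    "start_station_id": ["1", "2", "3"],
    "end_station_name": ["B", "A", None],
    "end_station_id": ["2", "1", None],
})

print(raw.isna().sum())  # missing-value check per column
prepared = prepare_trips(raw)
print(prepared.dtypes)   # data type review after parsing
```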

The final prepared datasets allowed the dashboard to focus on the most important operational patterns without overwhelming the user with unnecessary raw data.

Analysis Approach

The analysis focused on practical questions that a transport or operations team would care about.

The first part of the analysis looked at station demand. This helped identify which start stations generated the highest number of trips and where usage was most concentrated.
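A station ranking of this kind can be produced with a simple count of trips per start station. The function name `top_start_stations` is hypothetical and the sample data is made up; only the counting technique is the point.

```python
import pandas as pd


def top_start_stations(trips: pd.DataFrame, n: int = 20) -> pd.DataFrame:
    """Return the n start stations with the most trips, busiest first."""
    return (
        trips["start_station_name"]
        .value_counts()  # sorted in descending order by default
        .head(n)
        .rename_axis("start_station_name")
        .reset_index(name="trip_count")
    )


# Made-up sample trips for illustration only.
trips = pd.DataFrame({"start_station_name": ["A", "B", "A", "C", "A", "B"]})
top = top_start_stations(trips, n=2)
print(top)
```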

The second part looked at daily bike trips compared with average temperature. This helped explore whether warmer or colder conditions appeared to influence total ride volume.
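A quick numeric check to go with the visual comparison is the Pearson correlation between daily trip counts and average temperature. The sketch assumes a daily table with hypothetical `trip_count` and `avg_temp` columns, and the numbers are invented for illustration, not results from the project.

```python
import pandas as pd

# Invented daily values for illustration; not taken from the project data.
daily = pd.DataFrame({
    "avg_temp": [0, 5, 10, 15, 20],
    "trip_count": [100, 150, 210, 240, 300],
})

# Pearson correlation: +1 means the two series rise together, -1 means they move oppositely.
r = daily["trip_count"].corr(daily["avg_temp"])
print(f"Correlation between daily trips and temperature: {r:.2f}")
```

A high correlation only shows that the two series move together. Temperature is tied to season, daylight, and holidays, so it should be read as an indicator rather than proof of cause.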

The third part focused on geospatial trip movement. Using start and end station coordinates, the dashboard visualised aggregated routes across New York City to show where high-volume bike movement was happening.
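Route aggregation of this kind can be sketched by grouping trips on the start and end station pair and keeping one set of coordinates per pair. Column names are assumed to follow the public Citi Bike trip files, `aggregate_routes` is a hypothetical helper, and the coordinates below are placeholders.

```python
import pandas as pd


def aggregate_routes(trips: pd.DataFrame) -> pd.DataFrame:
    """Collapse individual trips into one row per start/end station pair."""
    return (
        trips.groupby(["start_station_name", "end_station_name"], as_index=False)
        .agg(
            trip_count=("ride_id", "count"),
            # Station coordinates are assumed constant per station, so take the first.
            start_lat=("start_lat", "first"),
            start_lng=("start_lng", "first"),
            end_lat=("end_lat", "first"),
            end_lng=("end_lng", "first"),
        )
        .sort_values("trip_count", ascending=False)
        .reset_index(drop=True)
    )


# Made-up sample trips with placeholder coordinates.
trips = pd.DataFrame({
    "ride_id": ["r1", "r2", "r3"],
    "start_station_name": ["A", "A", "B"],
    "end_station_name": ["B", "B", "A"],
    "start_lat": [40.75, 40.75, 40.76],
    "start_lng": [-73.99, -73.99, -73.98],
    "end_lat": [40.76, 40.76, 40.75],
    "end_lng": [-73.98, -73.98, -73.99],
})
routes = aggregate_routes(trips)
print(routes)
```

A table with one row per route and paired latitude and longitude columns is much smaller than the raw trip log, which keeps the map responsive.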

Together, these views created a clearer picture of station activity, demand timing, and geographic trip behaviour.

Key Findings

The analysis showed that Citi Bike usage was highly concentrated in central Manhattan and other commuter-heavy areas.

Peak usage appeared around commuting periods, especially during morning and evening travel times.
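Commuting peaks like these can be checked by counting trips per hour, split into weekdays and weekends. The sketch assumes a `started_at` timestamp column; the `hour` and `day_type` names are introduced here for illustration and the sample rows are made up.

```python
import pandas as pd

# Made-up sample trips: 2022-03-07 is a Monday, 2022-03-05 is a Saturday.
trips = pd.DataFrame({
    "started_at": [
        "2022-03-07 08:10:00",
        "2022-03-07 08:45:00",
        "2022-03-07 18:05:00",
        "2022-03-05 14:30:00",
    ],
})

started = pd.to_datetime(trips["started_at"])
# dayofweek runs from 0 (Monday) to 6 (Sunday), so 5 and 6 are the weekend.
day_type = started.dt.dayofweek.map(lambda d: "weekend" if d >= 5 else "weekday")

hourly = (
    trips.assign(hour=started.dt.hour, day_type=day_type)
    .groupby(["hour", "day_type"])
    .size()
    .unstack(fill_value=0)  # one column per day type, one row per hour
)
print(hourly)
```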

Some stations had much higher trip activity than others, suggesting that bike availability and redistribution may need closer monitoring in those locations.

Popular routes often connected business districts, transport hubs, and dense urban areas.

Weekday usage patterns in these areas suggested strong commuter-driven demand, while other parts of the city showed more irregular usage that may be linked to tourism, leisure, or local events.

The weather analysis suggested that temperature is associated with bike usage, with daily trip volumes moving alongside seasonal and day-to-day temperature changes.

The geospatial map made it easier to identify high-volume movement patterns and potential operational pressure points across the city.

Dashboard Features

The final Streamlit dashboard includes several core visual sections.

The first section shows the top 20 most popular Citi Bike start stations in New York City. This helps quickly identify where trip demand is strongest.

The second section compares daily bike trips with average daily temperature. This allows users to explore the relationship between weather conditions and trip activity.

The third section displays an interactive geospatial map of aggregated bike trips across New York City. This view shows route movement between start and end stations and helps reveal high-volume travel patterns.

These dashboard elements were designed to make the analysis easy to understand for both technical and non-technical stakeholders.

Business Value

This project demonstrates how urban mobility data can be transformed into practical business insight.

A bike-sharing company could use this type of dashboard to monitor demand, identify high-pressure stations, plan bike redistribution, understand commuter behaviour, and support station expansion decisions.

Instead of relying only on raw trip logs, the dashboard gives stakeholders a clearer view of where and when demand occurs.

This makes it easier to move from data collection to operational decision-making.

Recommendations

Based on the analysis, Citi Bike or a similar bike-sharing operator could benefit from monitoring high-demand stations more closely during peak commuting periods.

Stations where departures consistently exceed arrivals may require more frequent bike rebalancing to avoid shortages.

Popular commuter routes could be prioritised for service reliability, especially around business districts and transport hubs.

Weather and seasonality should also be considered when planning fleet availability, since demand can change significantly depending on temperature and time of year.

Further analysis could include predictive demand forecasting, real-time station availability, customer segmentation, and automated alerts for stations at risk of imbalance.

Final Dashboard

The final project was deployed as an interactive NYC Citi Bike data analysis dashboard.

The dashboard allows users to explore station demand, daily trip activity, temperature patterns, and geographic route movement across New York City.

You can view the deployed NYC Citi Bike data analysis dashboard here:

NYC Citi Bike data analysis dashboard

GitHub Repository

The full project files are available on GitHub, including the Python dashboard file, prepared datasets, mapping configuration, notebooks, and documentation.

You can view the NYC Citi Bike Python dashboard GitHub repository here:

NYC Citi Bike Python dashboard GitHub repository

About This Project

This project was created as part of my data analytics portfolio to demonstrate practical skills in Python, data cleaning, geospatial analysis, dashboard development, and business-focused data storytelling.

It shows how raw transport data can be turned into a clear dashboard that supports real-world decision-making.

About Me

I’m Elia Lanzuise, a Melbourne-based data analyst focused on transforming raw data into clear, actionable insights using Python, SQL, Tableau, Power BI, Excel, and dashboard development.

My work focuses on business performance analysis, customer behaviour, operational insights, data visualisation, and turning complex datasets into practical decisions.

You can explore more of my work on my data analyst portfolio.

If your business needs help turning raw data into dashboards, reports, and clear insights, you can learn more about my business dashboard services.