Citi Bike usage analysis built for operational decisions
An end-to-end Python and Streamlit project that turned a year of NYC Citi Bike trip data, around 30 million rides, into an interactive operational dashboard: daily demand against weather, hourly usage rhythm, the busiest stations, and a geospatial map of the highest volume routes that tells an operations team where and when to focus bikes.
From raw trip records to operational priorities
The project was built around a practical operational question: where, and at what times, does Citi Bike run into bike availability problems across New York, and which stations and routes are worth focusing on first?
Business problem
A bike share operator running around 30 million annual trips cannot make fleet and rebalancing decisions from a monthly report. Customers complain that bikes are not available at certain times and places. The goal was to convert a full year of trip records into clear signals of demand by weather, hour, station, and route that support rebalancing, station planning, and service decisions.
Final output
The final deliverable was a five page Streamlit dashboard fed by a Python data pipeline: cleaned trip data, a NOAA weather merge, and aggregated station and route tables feeding the charts and map.
The dashboard helps stakeholders see the relationship between weather and demand, the daily usage rhythm, the busiest stations by season, and the high volume corridors that matter most for rebalancing and expansion.
How the analysis was built
The Python work converted raw, messy trip records into a clean, dashboard ready dataset enriched with weather, then aggregated it into the station and route tables that drive the charts and the map.
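The cleaning and feature engineering step could look roughly like the sketch below. It is not the project's code. It assumes the column names used in the public Citi Bike trip files (`started_at`, `ended_at`, `start_station_name`, `end_station_name`). The duration bounds, the helper name `clean_trips`, and the season mapping are hypothetical choices made for illustration.

```python
import pandas as pd

# Hypothetical season mapping (meteorological seasons); the project's exact
# season boundaries are not stated.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def clean_trips(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop unusable trips and add the time features the dashboard groups by.

    Assumes Citi Bike style columns: started_at, ended_at,
    start_station_name, end_station_name.
    """
    df = raw.copy()
    df["started_at"] = pd.to_datetime(df["started_at"], errors="coerce")
    df["ended_at"] = pd.to_datetime(df["ended_at"], errors="coerce")
    # Trips without timestamps or station names cannot be placed on a chart or map.
    df = df.dropna(
        subset=["started_at", "ended_at", "start_station_name", "end_station_name"]
    )
    # Keep plausible durations only (hypothetical bounds: 1 minute to 24 hours).
    duration = df["ended_at"] - df["started_at"]
    keep = (duration >= pd.Timedelta(minutes=1)) & (duration <= pd.Timedelta(hours=24))
    df = df[keep].copy()
    # Feature engineering used by the later aggregations.
    df["date"] = df["started_at"].dt.normalize()
    df["hour"] = df["started_at"].dt.hour
    df["is_weekend"] = df["started_at"].dt.dayofweek >= 5
    df["season"] = df["started_at"].dt.month.map(SEASON_BY_MONTH)
    return df.reset_index(drop=True)


# Toy input: one valid summer trip, one 30-second trip, one valid winter trip,
# and one trip with a missing start time.
raw = pd.DataFrame({
    "started_at": ["2022-07-01 08:05:00", "2022-07-02 17:30:00",
                   "2022-12-03 09:00:00", None],
    "ended_at": ["2022-07-01 08:20:00", "2022-07-02 17:30:30",
                 "2022-12-03 09:12:00", "2022-12-03 10:00:00"],
    "start_station_name": ["Station A", "Station B", "Station C", "Station D"],
    "end_station_name": ["Station B", "Station A", "Station D", "Station C"],
})
clean = clean_trips(raw)
print(clean[["start_station_name", "hour", "is_weekend", "season"]])
```

The sketch parses bad timestamps to `NaT` and then drops them, so one malformed row does not stop the whole year's load.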
Demand by weather, hour, station, and route
Rather than a single headline number, the analysis looks at demand from four angles that an operations team can act on: how it moves with weather, how it moves through the day, where it concentrates, and which routes carry it.
How the analysis reads demand
Each view answers a different operational question, and the geospatial map is filtered so it shows recurring high volume routes rather than one-off journeys.
Daily bike trips follow temperature closely, rising into spring, peaking June to August, and dropping to their lowest in winter.
Usage is lowest overnight, climbs from around 06:00, and peaks in the late afternoon and early evening, 16:00 to 18:00.
The busiest start stations cluster in central Manhattan and commuter heavy zones, led by W 21 St & 6 Ave.
The map keeps only station pairs with 750 or more trips, so the view shows real corridors, not isolated rides.
The network at a glance
Headline cuts of the 2022 Citi Bike network, from the busiest stations to the daily and hourly demand shape.
Top stations
Busiest start stations by trips.
- W 21 St & 6 Ave: 1,461
- West St & Chambers St: 1,309
- Broadway & W 58 St: 1,296
- 1 Ave & E 68 St: 1,239
Daily rhythm
Trips by time of day.
- Overnight: Lowest
- Morning rise: From 06:00
- Peak window: 16:00 to 18:00
Seasonal demand
How usage moves with weather.
- Peak months: Jun to Aug
- Lowest: Winter
- Driver: Temperature
Routes
Geospatial map focus.
- Filter: 750+ trips
- Concentration: Manhattan
- Pattern: Commuter
Dashboard pages built around stakeholder questions
The app moves from an introduction into weather, hourly, station, map, and recommendation detail, each page answering a specific operational question.
Weather and Bike Trips
Daily trips plotted against daily temperature, showing the strong seasonal relationship between weather and demand.
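A chart like this needs a table with one row per day that holds the trip count and the temperature. The sketch below shows one way to build it. It assumes the trips already carry a datetime `date` column. It also assumes the NOAA daily summary has `DATE` and `TAVG` columns, with `TAVG` in whatever unit was requested from NOAA. The function name is hypothetical.

```python
import pandas as pd


def daily_trips_with_weather(trips: pd.DataFrame, weather: pd.DataFrame) -> pd.DataFrame:
    """One row per day: trip count plus average temperature.

    Assumes trips has a datetime 'date' column and weather follows a NOAA
    daily-summary layout with 'DATE' and 'TAVG' columns.
    """
    daily = trips.groupby("date").size().rename("trips").reset_index()
    w = weather.rename(columns={"DATE": "date", "TAVG": "avg_temp"})
    w["date"] = pd.to_datetime(w["date"])
    # Left join keeps every day with trips even if the weather reading is missing;
    # validate= raises if either table has duplicate dates.
    return daily.merge(w[["date", "avg_temp"]], on="date", how="left",
                       validate="one_to_one")


trips = pd.DataFrame({"date": pd.to_datetime(
    ["2022-01-10", "2022-01-10", "2022-07-15", "2022-07-15", "2022-07-15"])})
weather = pd.DataFrame({"DATE": ["2022-01-10", "2022-07-15"], "TAVG": [1.5, 27.0]})
daily = daily_trips_with_weather(trips, weather)
print(daily)
```

Aggregating trips to days before the join keeps the weather merge small. The join is one weather row per day, not one per trip.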
Bike Trips by Hours
The daily usage curve, with demand lowest overnight, rising from 06:00, and peaking in the late afternoon.
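A minimal sketch of the hourly curve averages trips per hour of day across all days. It then finds the busiest run of consecutive hours. It assumes `date` and `hour` columns like those added during cleaning. The three-hour window width and the function names are assumptions for illustration.

```python
import pandas as pd


def hourly_profile(trips: pd.DataFrame) -> pd.Series:
    """Average trips per hour of day, over all days in the data.

    Assumes 'date' and 'hour' columns such as those added during cleaning.
    """
    n_days = trips["date"].nunique()
    per_hour = trips.groupby("hour").size() / n_days
    # Hours with no trips at all still appear, as 0, so the curve has 24 points.
    return per_hour.reindex(range(24), fill_value=0.0).rename("avg_trips")


def peak_window(profile: pd.Series, width: int = 3) -> tuple[int, int]:
    """First and last hour of the busiest run of `width` consecutive hours."""
    window_totals = profile.rolling(width).sum()  # labelled by the window's last hour
    end = int(window_totals.idxmax())
    return end - width + 1, end


# Toy single-day data, not project figures.
hours = [2] + [8] * 3 + [16] * 5 + [17] * 6 + [18] * 4
trips = pd.DataFrame({"date": pd.Timestamp("2022-07-05"), "hour": hours})
profile = hourly_profile(trips)
print(peak_window(profile))
```

A rolling window picks out a sustained peak rather than a single spike. That fits a staffing decision better than the single top hour.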
Most Popular Stations
The top 20 start stations by trip volume, with a season filter so demand can be compared across the year.
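The season filter can be sketched as a subset followed by a count. In the app, the season would presumably come from a Streamlit widget such as `st.selectbox`. That wiring is assumed and not shown here. The column names and the function name are assumptions.

```python
from typing import Optional

import pandas as pd


def top_start_stations(trips: pd.DataFrame, season: Optional[str] = None,
                       n: int = 20) -> pd.Series:
    """Trip counts for the n busiest start stations, optionally for one season.

    Assumes 'start_station_name' and 'season' columns; season=None means all year.
    """
    subset = trips if season is None else trips[trips["season"] == season]
    # value_counts already sorts from most to least frequent.
    return subset["start_station_name"].value_counts().head(n)


# Toy data with hypothetical station labels.
trips = pd.DataFrame({
    "start_station_name": ["A", "A", "A", "B", "B", "C", "B", "B", "B", "B", "A"],
    "season": ["summer"] * 6 + ["winter"] * 5,
})
print(top_start_stations(trips, season="summer", n=2))
```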
Interactive Map with Bike Trips
A Kepler.gl map of aggregated trips, filtered to station pairs with 750 or more trips to surface the busiest corridors.
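The aggregation behind the map can be sketched as a group by station pair, a count, and a threshold filter. The sketch assumes trip rows carry station names plus `start_lat`, `start_lng`, `end_lat`, and `end_lng` columns. Averaging the coordinates per route is an assumption, since recorded positions can vary slightly between trips.

```python
import pandas as pd


def busy_routes(trips: pd.DataFrame, min_trips: int = 750) -> pd.DataFrame:
    """Station-to-station routes with at least `min_trips` trips, busiest first.

    Assumes start/end station names plus start_lat, start_lng, end_lat, end_lng.
    """
    keys = ["start_station_name", "end_station_name"]
    grouped = trips.groupby(keys)
    counts = grouped.size().rename("trips")
    # Average coordinates per route, since recorded positions can vary slightly.
    coords = grouped[["start_lat", "start_lng", "end_lat", "end_lng"]].mean()
    routes = coords.join(counts).reset_index()
    return (routes[routes["trips"] >= min_trips]
            .sort_values("trips", ascending=False, ignore_index=True))


# Toy data: 5 trips A -> B and 2 trips B -> A, with a lowered threshold of 3.
toy = pd.DataFrame({
    "start_station_name": ["A"] * 5 + ["B"] * 2,
    "end_station_name": ["B"] * 5 + ["A"] * 2,
    "start_lat": [40.74] * 5 + [40.72] * 2,
    "start_lng": [-73.99] * 5 + [-74.01] * 2,
    "end_lat": [40.72] * 5 + [40.74] * 2,
    "end_lng": [-74.01] * 5 + [-73.99] * 2,
})
routes = busy_routes(toy, min_trips=3)
print(routes)
```

The result has one row per route with source and target coordinates, which is the shape an arc or line layer in Kepler.gl draws from. Filtering before export also keeps the map light enough to render in the browser.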
Recommendations
Practical actions on seasonal scaling and waterfront station planning, drawn from the patterns in the data.
What the data made clear
The dashboard turns a year of trip records into a small number of patterns an operations team can actually act on.
Daily bike trips follow temperature closely across the year, rising into spring, peaking June to August, and falling to their lowest in winter. Temperature is a clear driver of demand, which makes seasonality something the operation can plan around.
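One way to put a number on "follows temperature closely" is a correlation between daily trips and daily temperature. The sketch assumes a per-day table with `trips` and `avg_temp` columns. It reports Pearson correlation, and Spearman correlation computed as Pearson on ranks. Either measures association only and cannot separate temperature from other seasonal effects. The values in the example are toy numbers, not project results.

```python
import pandas as pd


def temperature_association(daily: pd.DataFrame) -> dict[str, float]:
    """How closely daily trips move with daily average temperature.

    Assumes columns 'trips' and 'avg_temp'. Spearman is computed as Pearson
    on ranks, which avoids needing SciPy.
    """
    pearson = daily["trips"].corr(daily["avg_temp"])
    spearman = daily["trips"].rank().corr(daily["avg_temp"].rank())
    return {"pearson": float(pearson), "spearman": float(spearman)}


toy_daily = pd.DataFrame({
    "avg_temp": [-2, 5, 12, 20, 27],
    "trips": [8000, 15000, 30000, 52000, 61000],
})
print(temperature_association(toy_daily))
```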
Trips are lowest overnight, climb sharply from around 06:00, and peak in the late afternoon and early evening, 16:00 to 18:00. Weekday usage runs above weekend usage, pointing to commuter driven demand.
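The weekday versus weekend comparison only holds up if it uses average trips per day. Raw totals would favour weekdays simply because there are five of them. A minimal sketch, assuming `date` and `is_weekend` columns; the function name and toy counts are hypothetical:

```python
import pandas as pd


def weekday_vs_weekend(trips: pd.DataFrame) -> pd.Series:
    """Average trips per day for weekdays versus weekend days.

    Averaging per day matters: a week has five weekdays and two weekend days,
    so comparing raw totals would overstate weekday demand.
    Assumes 'date' and 'is_weekend' columns.
    """
    daily = trips.groupby(["date", "is_weekend"]).size().rename("trips").reset_index()
    daily["day_type"] = daily["is_weekend"].map({False: "weekday", True: "weekend"})
    return daily.groupby("day_type")["trips"].mean()


# Toy week: Mon 4 July and Tue 5 July 2022 vs Sat 9 July and Sun 10 July 2022.
dates = ["2022-07-04"] * 10 + ["2022-07-05"] * 8 + ["2022-07-09"] * 6 + ["2022-07-10"] * 4
trips = pd.DataFrame({"date": pd.to_datetime(dates)})
trips["is_weekend"] = trips["date"].dt.dayofweek >= 5
print(weekday_vs_weekend(trips))
```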
The busiest start stations cluster in central Manhattan and commuter heavy zones, led by W 21 St & 6 Ave. Some stations also show an imbalance between departures and arrivals, a signal for rebalancing.
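A departure and arrival imbalance can be sketched as net flow per station: arrivals minus departures. A strongly negative value marks a station that drains and may need restocking. The sketch assumes `start_station_name` and `end_station_name` columns. It computes flow over the whole period, which can hide intraday swings, so grouping by hour as well would be a natural extension.

```python
import pandas as pd


def station_imbalance(trips: pd.DataFrame) -> pd.DataFrame:
    """Departures, arrivals, and net flow per station, most drained first.

    net = arrivals - departures; a negative net means the station loses bikes
    over the period and is a candidate for restocking. Assumes
    'start_station_name' and 'end_station_name' columns.
    """
    departures = trips["start_station_name"].value_counts()
    arrivals = trips["end_station_name"].value_counts()
    flow = (pd.concat({"departures": departures, "arrivals": arrivals}, axis=1)
            .fillna(0)
            .astype(int))
    flow["net"] = flow["arrivals"] - flow["departures"]
    return flow.sort_values("net")


# Toy data: three trips A -> B, one B -> C, one C -> A.
trips = pd.DataFrame({
    "start_station_name": ["A", "A", "A", "B", "C"],
    "end_station_name": ["B", "B", "B", "C", "A"],
})
print(station_imbalance(trips))
```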
Filtering the map to station pairs with 750 or more trips shows demand running in dense, recurring corridors along Manhattan’s waterfront and its north to south routes, rather than spread evenly across the network.
Built on real data, with an honest scope
An operational dashboard only holds up if the data is cleaned properly and the conclusions stay within what the data can actually show. Both were built in from the start.
Conclusions you can defend
The findings rest on cleaned, real world trip data enriched with independent weather data, and the scope is stated plainly.
Cleaning, the NOAA weather merge, feature engineering, and station and route aggregation, all in pandas.
A Kepler.gl map of aggregated station to station trips, filtered to 750 or more trips to surface the busiest routes.
Weather, hourly usage, top stations, the interactive map, and recommendations, deployed live on Streamlit Cloud.
Daily trips against temperature, the hourly demand curve, and a station ranking with a season filter for drill-down.
How an operations team could use the dashboard
Scale bike availability seasonally: because demand tracks temperature so closely, Citi Bike could reduce its active fleet and rebalancing effort by roughly 30 to 40% from November to April, lowering operating costs during a predictable low demand window.
Plan stations around real corridors: new stations and rebalancing effort along the waterfront should follow the dense, recurring high volume routes the map highlights, rather than even spacing.
Match staffing to the daily peak: concentrate availability and redistribution around the late afternoon peak, 16:00 to 18:00, when demand is highest.
Stay in scope: capital decisions and exact station counts were left to the network operator; the analysis points the direction rather than costing the build.
What this project demonstrates
Working with real world API data, data cleaning at scale, joining external datasets, feature engineering and aggregation in pandas, geospatial visualization with Kepler.gl, and the ability to turn raw trip records into an interactive, deployed dashboard with clear operational recommendations.