# Case Study: Public Transport Demand Forecasting

| Attribute | Detail |
|---|---|
| Agency Type | Transport Authority |
| Domain | Public Transport |
| Challenge | Optimizing service planning and resource allocation |
| AI Approach | Time-series forecasting with external data integration |
## Executive Summary
A metropolitan transport authority implemented an AI-powered demand forecasting system to predict passenger volumes across its network. The system enabled dynamic service adjustments, lifting service reliability from 88% to 92%, cutting overcrowding incidents by 35%, and improving staff rostering accuracy from 72% to 89%.
## The Challenge

### Situation
- 1.2 million daily passenger trips
- 450 bus routes, 15 train lines, 3 light rail corridors
- Service planning based on historical averages
- Significant demand variability (events, weather, holidays)
- Limited ability to respond to demand changes
### Problems
- Fixed schedules didn't match actual demand patterns
- Overcrowding during peaks affecting customer experience
- Empty services during off-peak wasting resources
- Event-driven demand spikes poorly managed
- Difficulty planning for new routes or changes
### Business Impact
- Customer satisfaction declining (4.1 → 3.6/5 over 3 years)
- Complaints about crowding increased 40%
- $8M annual cost from inefficient resource allocation
- Staff rostering challenges
- Political pressure on service quality
## The Solution

### AI Approach

- **Model type:** Time-series forecasting with external factors
- **Architecture:** Ensemble (Prophet + LSTM + Gradient Boosting)
- **Integration:** Operations planning system

### System Design
```mermaid
flowchart LR
    subgraph IN["<strong>Data Inputs</strong>"]
        I1[Historical Ridership]
        I2[Real-time Counts]
        I3[External Factors]
    end
    subgraph FE["<strong>Feature Engineering</strong>"]
        F1[Time Features]
        F2[Calendar Events]
        F3[Weather & Special Factors]
    end
    subgraph MOD["<strong>Forecasting Models</strong>"]
        M1[Prophet - Trend]
        M2[LSTM - Sequential]
        M3[XGBoost - Boosting]
    end
    subgraph OUT["<strong>Operations Integration</strong>"]
        O1[Service Planning]
        O2[Rostering System]
        O3[Real-time Display]
    end
    IN --> FE --> MOD --> ENS[Ensemble Forecast]
    ENS --> OUT
    style IN fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style FE fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style MOD fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style OUT fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style ENS fill:#e0f2f1,stroke:#00796b,stroke-width:2px
```

### Forecasting Horizons
| Horizon | Use Case | Update Frequency |
|---|---|---|
| 5 years | Network planning | Quarterly |
| 12 months | Budget and procurement | Monthly |
| 3 months | Service planning | Weekly |
| 1 week | Rostering | Daily |
| Next day | Operations | Hourly |
| 2 hours | Real-time response | 15 minutes |
### External Factors Integrated

**Calendar:**

- Day of week, time of day
- Public holidays
- School terms
- Major events (sports, concerts)

**Weather:**

- Temperature
- Rainfall
- Severe weather warnings
- Heat/cold waves

**Special Factors:**

- Construction/disruptions
- Fare changes
- COVID/pandemic effects
- Economic indicators
## Implementation

### Timeline
| Phase | Duration | Activities |
|---|---|---|
| Discovery | 6 weeks | Requirements, data assessment |
| Data engineering | 12 weeks | Data pipeline, feature store |
| Model development | 16 weeks | Model training, ensemble design |
| Integration | 10 weeks | Planning system integration |
| Pilot | 8 weeks | Bus network pilot |
| Full rollout | 12 weeks | All modes deployment |
| Total | 64 weeks | |
### Team
| Role | FTE | Responsibility |
|---|---|---|
| Product Owner | 0.5 | Requirements, stakeholder management |
| Data Scientist | 2.0 | Model development |
| ML Engineer | 1.0 | Model serving, infrastructure |
| Data Engineer | 1.5 | Data pipelines, integration |
| Planning Analyst | 1.0 | Domain expertise, validation |
| Software Developer | 1.0 | UI, system integration |
### Data Preparation

**Data Sources:**

- Automatic Fare Collection (tap on/off)
- Automatic Passenger Counters (buses)
- Train car weight sensors
- Weather bureau API
- Events calendar (venues, sports)
- School term calendar
- Historical service changes

**Data Volume:**

- 5 years of ridership data
- ~2 billion journey records
- 15-minute aggregation granularity
- Stop/station level detail

**Feature Engineering:**

- 156 features per stop-time combination
- Lagged values (same time yesterday, last week, last year)
- Rolling statistics (7-day, 28-day averages)
- Fourier features for cyclical patterns
- Holiday and event embeddings
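To make the lagged, rolling, and Fourier features concrete, here is a minimal plain-Python sketch. It assumes the 15-minute granularity described above (96 intervals per day, 672 per week); the function names and the `series` layout are hypothetical, not the authority's actual feature store code.

```python
import math


def fourier_features(t: int, period: int, order: int) -> list[float]:
    """Sin/cos pairs encoding a cyclical pattern of the given period.

    `t` is the time index in the same units as `period`
    (e.g. interval-of-week with period=672).
    """
    feats = []
    for k in range(1, order + 1):
        angle = 2 * math.pi * k * t / period
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats


def lag_and_rolling_features(series: list[float], t: int) -> dict[str, float]:
    """Lagged values and rolling means for a 15-minute ridership series.

    96 intervals per day, 672 per week; `series` holds one stop's counts
    in chronological order and `t` must leave at least 28 days of history.
    """
    day = 96
    week = 672
    return {
        "lag_1d": series[t - day],          # same interval yesterday
        "lag_1w": series[t - week],         # same interval last week
        "roll_7d": sum(series[t - 7 * day:t]) / (7 * day),
        "roll_28d": sum(series[t - 28 * day:t]) / (28 * day),
    }
```

The "same time last year" lag and the holiday/event embeddings listed above would follow the same pattern, with longer history and a lookup table respectively.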
## Results

### Forecast Accuracy
| Horizon | MAPE | R² | Notes |
|---|---|---|---|
| Next day | 8.2% | 0.94 | Very high accuracy |
| 1 week | 11.5% | 0.89 | Good accuracy |
| 1 month | 15.3% | 0.82 | Acceptable |
| 3 months | 18.7% | 0.75 | Useful for planning |

Next-day accuracy by mode:

| Mode | MAPE (next day) | Accuracy Driver |
|---|---|---|
| Train | 6.8% | Regular commuter patterns |
| Bus (urban) | 9.4% | More variability |
| Bus (suburban) | 12.1% | Event-sensitive |
| Light rail | 7.2% | Regular patterns |
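The MAPE and R² figures in the tables above follow their standard definitions; a minimal sketch (not the authority's actual evaluation code) is:

```python
def mape(actual: list[float], forecast: list[float]) -> float:
    """Mean absolute percentage error, in percent, over nonzero actuals."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def r_squared(actual: list[float], forecast: list[float]) -> float:
    """Coefficient of determination: 1 minus residual over total variance."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

Note that MAPE penalises relative error, which is why the low-volume, event-sensitive suburban bus routes score worst even when their absolute errors are small.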
### Operational Impact
| Metric | Before | After | Improvement |
|---|---|---|---|
| Service reliability | 88% | 92% | +4.5% |
| Overcrowding incidents | 850/mo | 552/mo | -35% |
| Empty service runs | 12% | 8% | -33% |
| Forecast-based planning | 0% | 78% | New capability |
| Staff rostering accuracy | 72% | 89% | +24% |
### Customer Experience
| Metric | Before | After | Change |
|---|---|---|---|
| Customer satisfaction | 3.6/5 | 4.1/5 | +14% |
| Crowding complaints | 2,400/mo | 1,560/mo | -35% |
| Journey time reliability | 82% | 88% | +7% |
### Financial Impact
| Item | Annual Value |
|---|---|
| Reduced overtime (better rostering) | $1,200,000 |
| Fuel savings (optimized services) | $800,000 |
| Maintenance savings | $400,000 |
| Reduced complaint handling | $200,000 |
| Total Savings | $2,600,000 |
## Challenges and Lessons Learned

### Challenge 1: Data Quality

**Issue:** Fare collection data had gaps and errors.

**Solution:**

- Data quality pipeline with automated cleaning
- Imputation for missing values
- Confidence scoring for forecasts

**Lesson:** Invest in data quality monitoring.
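One simple form the imputation step might take is a seasonal-naive fill, borrowing the value from the same interval one week earlier. This is an illustrative assumption; the source does not specify the imputation method, and `impute_gaps` is a hypothetical name.

```python
def impute_gaps(counts: list, week: int = 672) -> list:
    """Fill missing 15-minute counts (None) with the value from the
    same interval one week earlier, when available.

    Gaps with no prior-week value are left as None for downstream
    handling (e.g. flagged via the confidence score on the forecast).
    """
    out = list(counts)
    for i, value in enumerate(out):
        if value is None and i >= week and out[i - week] is not None:
            out[i] = out[i - week]
    return out
```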
### Challenge 2: Event Handling

**Issue:** One-off events (new stadium, major concert) were hard to predict.

**Solution:**

- Similar-event matching algorithm
- Manual adjustment capability for planners
- Rapid learning from actual outcomes

**Lesson:** Combine AI with human expertise for unusual events.
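A similar-event matcher can be as simple as scoring past events by shared attributes and reusing the demand uplift observed at the closest match as a prior for planners to adjust. The attributes and names below are illustrative assumptions, not the authority's actual algorithm.

```python
def similarity(a: dict, b: dict) -> float:
    """Fraction of matching attributes between two events
    (venue, event type, and attendance bucket are assumed features)."""
    keys = ("venue", "event_type", "attendance_bucket")
    return sum(a[k] == b[k] for k in keys) / len(keys)


def estimated_uplift(new_event: dict, history: list[dict]) -> float:
    """Return the demand uplift observed at the most similar past event,
    as a starting estimate that planners can override."""
    best = max(history, key=lambda past: similarity(new_event, past))
    return best["observed_uplift"]
```

The "rapid learning" step then amounts to appending each event's actual uplift to `history` once outcomes are known.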
### Challenge 3: COVID Disruption

**Issue:** The pandemic completely changed demand patterns.

**Solution:**

- Regime detection for structural breaks
- Weighted training with recent data
- Scenario-based forecasting

**Lesson:** Build for adaptability, not just accuracy.
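"Weighted training with recent data" commonly means exponential-decay sample weights, so post-break observations dominate the fit. The half-life value and function name below are illustrative assumptions.

```python
def recency_weights(n_days: int, half_life_days: float = 90.0) -> list[float]:
    """Exponential-decay training weights, oldest day first: the most
    recent day gets weight 1.0, and weight halves every `half_life_days`.

    After a detected structural break (e.g. the pandemic), shortening
    the half-life shifts the model toward post-break behaviour.
    """
    return [0.5 ** ((n_days - 1 - i) / half_life_days) for i in range(n_days)]
```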
### Challenge 4: Real-time Integration

**Issue:** Difficult to update forecasts fast enough for operations.

**Solution:**

- Pre-computed forecasts adjusted with real-time signals
- Edge computing for rapid updates
- Graceful degradation when systems lag

**Lesson:** Architect for speed and resilience.
### Challenge 5: Planner Trust

**Issue:** Planners didn't trust AI forecasts initially.

**Solution:**

- Side-by-side comparison with traditional methods
- Explainability showing key factors
- Allowing overrides with a feedback loop

**Lesson:** Trust is earned through transparency and performance.
## Governance and Compliance

### Governance Structure
- Executive sponsor: Chief Operating Officer
- Steering committee: Monthly review
- Technical governance: Data and analytics team
- Risk tier: Tier 1 (Low) - No individual decisions
### Data Considerations
- No personal data used (aggregated counts only)
- Fare collection data de-identified
- Weather and event data from public sources
- Privacy impact assessment: Not required (aggregate data)
### Model Monitoring
- Daily accuracy tracking by route/line
- Alert on accuracy degradation >5%
- Weekly performance review
- Monthly model refresh evaluation
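The daily accuracy check can be sketched as a comparison of each route's MAPE against its baseline. The source states a ">5%" degradation alert without specifying relative or absolute; treating it as percentage points is an assumption here, as are the function and parameter names.

```python
def degraded_routes(baseline_mape: dict[str, float],
                    today_mape: dict[str, float],
                    threshold_pp: float = 5.0) -> list[str]:
    """Routes whose daily MAPE has worsened by more than `threshold_pp`
    percentage points against their baseline; routes with no reading
    today are skipped rather than alerted."""
    return [route for route, base in baseline_mape.items()
            if today_mape.get(route, base) - base > threshold_pp]
```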
## Technical Details

### Model Architecture

**Prophet (Trend & Seasonality):**

- Daily, weekly, yearly seasonality
- Holiday effects
- Trend with changepoint detection
- Base weight: 40% of ensemble

**LSTM (Sequence Learning):**

- 2-layer bidirectional LSTM
- 168-hour (1-week) input sequence
- Multi-step output (up to 7 days)
- Base weight: 35% of ensemble

**XGBoost (Feature-Rich):**

- 156 engineered features
- Weather and event integration
- Handles non-linear relationships
- Base weight: 25% of ensemble

**Ensemble:**

- Weighted average based on recent performance
- Weights adjusted dynamically per route/time
- Confidence intervals from all models
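The dynamic weighting might look like the sketch below: inverse-error weighting by recent MAPE is one common scheme, chosen here as an assumption since the source does not specify the exact formula.

```python
def ensemble_weights(recent_mape: dict[str, float]) -> dict[str, float]:
    """Per-route/time-slot weights, normalised to sum to 1: models with
    lower recent MAPE get proportionally more weight. The base weights
    (40/35/25) would serve as the starting point before any history
    accumulates."""
    inverse = {model: 1.0 / err for model, err in recent_mape.items()}
    total = sum(inverse.values())
    return {model: w / total for model, w in inverse.items()}


def ensemble_forecast(forecasts: dict[str, float],
                      weights: dict[str, float]) -> float:
    """Weighted average of the individual model forecasts."""
    return sum(weights[m] * f for m, f in forecasts.items())
```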
### Infrastructure
- Training: Cloud HPC (weekly retrain)
- Serving: Kubernetes cluster
- Feature store: Real-time feature computation
- Cache: Pre-computed forecasts
- API: RESTful + streaming
### Performance
- Throughput: 50,000 forecasts/minute
- Latency: <100ms per forecast
- Availability: 99.95%
- Storage: 2TB feature store
## Recommendations for Similar Projects

### Do
- Invest heavily in data quality
- Build for multiple forecast horizons
- Integrate external factors (weather, events)
- Plan for regime changes (COVID taught us)
- Combine AI with human expertise
- Start with high-volume, stable routes
### Don't
- Expect 100% automation of planning decisions
- Ignore data quality issues
- Over-fit to historical patterns
- Deploy without planner buy-in
- Neglect model monitoring
- Assume patterns are static
## Cost-Benefit Summary

### Costs (First Year)
| Item | Cost |
|---|---|
| Discovery & planning | $60,000 |
| Data engineering | $180,000 |
| Model development | $280,000 |
| Integration | $160,000 |
| Infrastructure | $120,000 |
| Pilot | $80,000 |
| Total Year 1 | $880,000 |
### Ongoing Costs (Annual)
| Item | Cost |
|---|---|
| Infrastructure | $180,000 |
| Model maintenance | $150,000 |
| Support | $70,000 |
| Total Annual | $400,000 |
### Benefits (Annual)
| Item | Value |
|---|---|
| Operational savings | $2,600,000 |
| Customer experience (est. value) | $500,000 |
| Annual Benefit | $3,100,000 |
### ROI: 252% | Payback: 5 months
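One way the headline ROI and payback reconcile with the tables above is shown below. The treatment of ongoing costs (excluded from ROI, included in the payback outlay) and the assumption that benefits accrue evenly through the year are interpretations, not stated by the source.

```python
year1_cost = 880_000        # total year-1 build cost, from the cost table
ongoing_annual = 400_000    # total ongoing annual cost
annual_benefit = 3_100_000  # total annual benefit

# Year-1 ROI against the build cost: net benefit over cost
roi = (annual_benefit - year1_cost) / year1_cost  # ~2.52, i.e. 252%

# Payback: months until cumulative benefit covers the year-1 outlay
# (build cost plus ongoing costs), with benefits accruing evenly
months_to_payback = (year1_cost + ongoing_annual) / (annual_benefit / 12)
```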
## Contact
For more information about this case study, contact the AI Toolkit team.
Related documents: AI Use Case Template | How to AI Monitoring