
Case Study: Public Transport Demand Forecasting


Key Result: 12% improvement in service reliability, 35% reduction in overcrowding incidents, optimized staff rostering and resource allocation.

Agency Type: Transport Authority
Domain: Public Transport
Challenge: Optimizing service planning and resource allocation
AI Approach: Time-series forecasting with external data integration

Executive Summary

A metropolitan transport authority implemented an AI-powered demand forecasting system to predict passenger volumes across the network. The system enabled dynamic service adjustments, improving service reliability by 12%, reducing overcrowding incidents by 35%, and optimizing staff rostering.


The Challenge

Situation

  • 1.2 million daily passenger trips
  • 450 bus routes, 15 train lines, 3 light rail corridors
  • Service planning based on historical averages
  • Significant demand variability (events, weather, holidays)
  • Limited ability to respond to demand changes

Problems

  • Fixed schedules didn't match actual demand patterns
  • Overcrowding during peaks affecting customer experience
  • Empty services during off-peak wasting resources
  • Event-driven demand spikes poorly managed
  • Difficulty planning for new routes or changes

Business Impact

  • Customer satisfaction declining (4.1 → 3.6/5 over 3 years)
  • Complaints about crowding increased 40%
  • $8M annual cost from inefficient resource allocation
  • Staff rostering challenges
  • Political pressure on service quality

The Solution

AI Approach

Model Type: Time-series forecasting with external factors
Architecture: Ensemble (Prophet + LSTM + Gradient Boosting)
Integration: Operations planning system
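As a sketch of how such an ensemble combines its members: the model names and 40/35/25 base weights come from this case study, but the function and forecast values below are illustrative.

```python
def ensemble_forecast(forecasts, weights):
    """Weighted average of per-model forecasts (weights normalised to sum to 1)."""
    total = sum(weights.values())
    return sum(forecasts[m] * w for m, w in weights.items()) / total

# Base weights as described in the Technical Details section.
base_weights = {"prophet": 0.40, "lstm": 0.35, "xgboost": 0.25}

# Hypothetical next-day passenger forecasts from each model for one route.
next_day = {"prophet": 1200.0, "lstm": 1150.0, "xgboost": 1300.0}

print(round(ensemble_forecast(next_day, base_weights), 1))  # -> 1207.5
```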

System Design

```mermaid
flowchart LR
    subgraph IN["<strong>Data Inputs</strong>"]
        I1[Historical Ridership]
        I2[Real-time Counts]
        I3[External Factors]
    end

    subgraph FE["<strong>Feature Engineering</strong>"]
        F1[Time Features]
        F2[Calendar Events]
        F3[Weather & Special Factors]
    end

    subgraph MOD["<strong>Forecasting Models</strong>"]
        M1[Prophet - Trend]
        M2[LSTM - Sequential]
        M3[XGBoost - Boosting]
    end

    subgraph OUT["<strong>Operations Integration</strong>"]
        O1[Service Planning]
        O2[Rostering System]
        O3[Real-time Display]
    end

    IN --> FE --> MOD --> ENS[Ensemble Forecast]
    ENS --> OUT

    style IN fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style FE fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style MOD fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style OUT fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style ENS fill:#e0f2f1,stroke:#00796b,stroke-width:2px
```

Forecasting Horizons

| Horizon | Use Case | Update Frequency |
|---|---|---|
| 5 years | Network planning | Quarterly |
| 12 months | Budget and procurement | Monthly |
| 3 months | Service planning | Weekly |
| 1 week | Rostering | Daily |
| Next day | Operations | Hourly |
| 2 hours | Real-time response | Every 15 minutes |

External Factors Integrated

Calendar:
  • Day of week, time of day
  • Public holidays
  • School terms
  • Major events (sports, concerts)

Weather:
  • Temperature
  • Rainfall
  • Severe weather warnings
  • Heat/cold waves

Special Factors:
  • Construction/disruptions
  • Fare changes
  • COVID/pandemic effects
  • Economic indicators


Implementation

Timeline

| Phase | Duration | Activities |
|---|---|---|
| Discovery | 6 weeks | Requirements, data assessment |
| Data engineering | 12 weeks | Data pipeline, feature store |
| Model development | 16 weeks | Model training, ensemble design |
| Integration | 10 weeks | Planning system integration |
| Pilot | 8 weeks | Bus network pilot |
| Full rollout | 12 weeks | All modes deployment |
| **Total** | **64 weeks** | |

Team

| Role | FTE | Responsibility |
|---|---|---|
| Product Owner | 0.5 | Requirements, stakeholder management |
| Data Scientist | 2.0 | Model development |
| ML Engineer | 1.0 | Model serving, infrastructure |
| Data Engineer | 1.5 | Data pipelines, integration |
| Planning Analyst | 1.0 | Domain expertise, validation |
| Software Developer | 1.0 | UI, system integration |

Data Preparation

Data Sources:
  • Automatic Fare Collection (tap on/off)
  • Automatic Passenger Counters (buses)
  • Train car weight sensors
  • Weather bureau API
  • Events calendar (venues, sports)
  • School term calendar
  • Historical service changes

Data Volume:
  • 5 years of ridership data
  • ~2 billion journey records
  • 15-minute aggregation granularity
  • Stop/station level detail

Feature Engineering:
  • 156 features per stop-time combination
  • Lagged values (same time yesterday, last week, last year)
  • Rolling statistics (7-day, 28-day averages)
  • Fourier features for cyclical patterns
  • Holiday and event embeddings
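The lag, rolling-statistic, and Fourier feature types listed above can be sketched in plain Python. The function names and toy ridership series are illustrative; the production pipeline computes these in the feature store.

```python
import math

def lag(series, k):
    """Value k steps back (e.g. same slot yesterday); None while history is short."""
    return [None] * k + series[:-k]

def rolling_mean(series, window):
    """Trailing average over `window` steps; None until enough history exists."""
    return [None if i + 1 < window
            else sum(series[i + 1 - window:i + 1]) / window
            for i in range(len(series))]

def fourier_features(t, period, order=1):
    """sin/cos pairs encoding cyclical patterns such as time of day."""
    return [f(2 * math.pi * k * t / period)
            for k in range(1, order + 1) for f in (math.sin, math.cos)]

daily_riders = [100, 120, 90, 110, 130, 95, 105]
print(lag(daily_riders, 1))          # 1-step lag: same slot one day earlier
print(rolling_mean(daily_riders, 7)) # 7-day rolling average
print(fourier_features(6, 24))       # hour 6 of a 24-hour cycle: sin ~1, cos ~0
```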


Results

Forecast Accuracy

| Horizon | MAPE | R² | Notes |
|---|---|---|---|
| Next day | 8.2% | 0.94 | Very high accuracy |
| 1 week | 11.5% | 0.89 | Good accuracy |
| 1 month | 15.3% | 0.82 | Acceptable |
| 3 months | 18.7% | 0.75 | Useful for planning |

| Mode | MAPE (next day) | Accuracy Driver |
|---|---|---|
| Train | 6.8% | Regular commuter patterns |
| Bus (urban) | 9.4% | More variability |
| Bus (suburban) | 12.1% | Event-sensitive |
| Light rail | 7.2% | Regular patterns |
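For reference, MAPE (mean absolute percentage error) is computed from forecast and actual ridership as below; the sample values are illustrative, not from the case study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

actual   = [1000, 1200, 800]   # observed passenger counts
forecast = [ 950, 1250, 820]   # model forecasts for the same slots
print(round(mape(actual, forecast), 2))  # -> 3.89
```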

Operational Impact

| Metric | Before | After | Improvement |
|---|---|---|---|
| Service reliability | 88% | 92% | +4.5% |
| Overcrowding incidents | 850/mo | 552/mo | -35% |
| Empty service runs | 12% | 8% | -33% |
| Forecast-based planning | 0% | 78% | New capability |
| Staff rostering accuracy | 72% | 89% | +24% |

Customer Experience

| Metric | Before | After | Change |
|---|---|---|---|
| Customer satisfaction | 3.6/5 | 4.1/5 | +14% |
| Crowding complaints | 2,400/mo | 1,560/mo | -35% |
| Journey time reliability | 82% | 88% | +7% |

Financial Impact

| Item | Annual Value |
|---|---|
| Reduced overtime (better rostering) | $1,200,000 |
| Fuel savings (optimized services) | $800,000 |
| Maintenance savings | $400,000 |
| Reduced complaint handling | $200,000 |
| **Total Savings** | **$2,600,000** |

Challenges and Lessons Learned

Challenge 1: Data Quality

Issue: Fare collection data had gaps and errors

Solution:
  • Data quality pipeline with automated cleaning
  • Imputation for missing values
  • Confidence scoring for forecasts

Lesson: Invest in data quality monitoring
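A minimal sketch of the imputation step, assuming missing 15-minute counts are backfilled from the same slot one week earlier and flagged with lower confidence. The backfill rule and confidence values are assumptions, not the authority's actual pipeline.

```python
SLOTS_PER_WEEK = 7 * 24 * 4  # 15-minute slots in one week

def impute(series, week=SLOTS_PER_WEEK):
    """Fill None gaps from the same slot one week earlier; return values + confidence."""
    filled, confidence = [], []
    for i, value in enumerate(series):
        if value is not None:
            filled.append(value)
            confidence.append(1.0)
        elif i >= week and filled[i - week] is not None:
            filled.append(filled[i - week])   # backfill from the previous week
            confidence.append(0.7)            # flag imputed slots as less certain
        else:
            filled.append(None)               # no history yet: leave the gap
            confidence.append(0.0)
    return filled, confidence

# Short `week` for demonstration only.
print(impute([10, 20, None, 5], week=2))  # -> ([10, 20, 10, 5], [1.0, 1.0, 0.7, 1.0])
```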

Challenge 2: Event Handling

Issue: One-off events (new stadium, major concert) hard to predict

Solution:
  • Similar event matching algorithm
  • Manual adjustment capability for planners
  • Rapid learning from actual outcomes

Lesson: Combine AI with human expertise for unusual events

Challenge 3: COVID Disruption

Issue: Pandemic completely changed demand patterns

Solution:
  • Regime detection for structural breaks
  • Weighted training with recent data
  • Scenario-based forecasting

Lesson: Build for adaptability, not just accuracy
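One common way to implement "weighted training with recent data" is exponentially decaying sample weights, so post-disruption observations dominate the fit. A sketch; the 90-day half-life is an assumption, not a figure from the case study.

```python
def recency_weights(ages_in_days, half_life=90):
    """Sample weight halves every `half_life` days; today's data gets weight 1.0."""
    return [0.5 ** (age / half_life) for age in ages_in_days]

print(recency_weights([0, 90, 180]))  # -> [1.0, 0.5, 0.25]
```

These weights would be passed to the model's training routine (e.g. a `sample_weight` argument) so that pre-pandemic history still informs seasonality without dominating the level.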

Challenge 4: Real-time Integration

Issue: Difficult to update forecasts fast enough for operations

Solution:
  • Pre-computed forecasts adjusted with real-time signals
  • Edge computing for rapid updates
  • Graceful degradation when systems lag

Lesson: Architect for speed and resilience
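The "pre-computed forecasts adjusted with real-time signals" pattern can be sketched as a simple blend toward the latest observed count, degrading gracefully to the cached value when no signal arrives. The blend factor is an assumption.

```python
def adjust_forecast(cached, realtime_count=None, alpha=0.3):
    """Nudge a pre-computed forecast toward the latest observed passenger count."""
    if realtime_count is None:   # graceful degradation: serve the cache as-is
        return cached
    return (1 - alpha) * cached + alpha * realtime_count

print(adjust_forecast(1000, realtime_count=1200))  # 0.7 * 1000 + 0.3 * 1200
print(adjust_forecast(1000))                       # no live signal -> cached value
```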

Challenge 5: Planner Trust

Issue: Planners didn't trust AI forecasts initially

Solution:
  • Side-by-side comparison with traditional methods
  • Explainability showing key factors
  • Allowing overrides with feedback loop

Lesson: Trust is earned through transparency and performance


Governance and Compliance

Governance Structure

  • Executive sponsor: Chief Operating Officer
  • Steering committee: Monthly review
  • Technical governance: Data and analytics team
  • Risk tier: Tier 1 (Low), as the system makes no decisions about individuals

Data Considerations

  • No personal data used (aggregated counts only)
  • Fare collection data de-identified
  • Weather and event data from public sources
  • Privacy impact assessment: Not required (aggregate data)

Model Monitoring

  • Daily accuracy tracking by route/line
  • Alert on accuracy degradation >5%
  • Weekly performance review
  • Monthly model refresh evaluation
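The ">5% accuracy degradation" alert could look like the following, assuming the threshold is measured in MAPE percentage points against a trailing baseline (the case study does not specify points vs. relative change).

```python
def should_alert(todays_mape, baseline_mape, threshold_pts=5.0):
    """Alert when route-level MAPE worsens by more than `threshold_pts` points."""
    return (todays_mape - baseline_mape) > threshold_pts

print(should_alert(14.0, 8.2))  # -> True (degraded well past the threshold)
print(should_alert(9.0, 8.2))   # -> False (within normal variation)
```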

Technical Details

Model Architecture

Prophet (Trend & Seasonality):
  • Daily, weekly, yearly seasonality
  • Holiday effects
  • Trend with changepoint detection
  • Base ensemble weight: 40%

LSTM (Sequence Learning):
  • 2-layer bidirectional LSTM
  • 168-hour (1 week) input sequence
  • Multi-step output (up to 7 days)
  • Base ensemble weight: 35%

XGBoost (Feature-Rich):
  • 156 engineered features
  • Weather and event integration
  • Handles non-linear relationships
  • Base ensemble weight: 25%

Ensemble:
  • Weighted average based on recent performance
  • Weights adjusted dynamically per route/time
  • Confidence intervals from all models
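One plausible reading of "weights adjusted dynamically per route/time" is inverse-error weighting on each model's recent MAPE; the formula below is an assumption, not the documented method.

```python
def dynamic_weights(recent_mape):
    """Weight each model by the inverse of its recent error, normalised to sum to 1."""
    inverse = {model: 1.0 / err for model, err in recent_mape.items()}
    total = sum(inverse.values())
    return {model: v / total for model, v in inverse.items()}

# Hypothetical recent per-model MAPE for one route/time bucket.
weights = dynamic_weights({"prophet": 8.0, "lstm": 10.0, "xgboost": 16.0})
print({m: round(w, 3) for m, w in weights.items()})  # lowest error -> largest weight
```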

Infrastructure

  • Training: Cloud HPC (weekly retrain)
  • Serving: Kubernetes cluster
  • Feature store: Real-time feature computation
  • Cache: Pre-computed forecasts
  • API: RESTful + streaming

Performance

  • Throughput: 50,000 forecasts/minute
  • Latency: <100ms per forecast
  • Availability: 99.95%
  • Storage: 2TB feature store

Recommendations for Similar Projects

Do

  • Invest heavily in data quality
  • Build for multiple forecast horizons
  • Integrate external factors (weather, events)
  • Plan for regime changes (COVID taught us)
  • Combine AI with human expertise
  • Start with high-volume, stable routes

Don't

  • Expect 100% automation of planning decisions
  • Ignore data quality issues
  • Over-fit to historical patterns
  • Deploy without planner buy-in
  • Neglect model monitoring
  • Assume patterns are static

Cost-Benefit Summary

Costs (First Year)

| Item | Cost |
|---|---|
| Discovery & planning | $60,000 |
| Data engineering | $180,000 |
| Model development | $280,000 |
| Integration | $160,000 |
| Infrastructure | $120,000 |
| Pilot | $80,000 |
| **Total Year 1** | **$880,000** |

Ongoing Costs (Annual)

| Item | Cost |
|---|---|
| Infrastructure | $180,000 |
| Model maintenance | $150,000 |
| Support | $70,000 |
| **Total Annual** | **$400,000** |

Benefits (Annual)

| Item | Value |
|---|---|
| Operational savings | $2,600,000 |
| Customer experience (est. value) | $500,000 |
| **Annual Benefit** | **$3,100,000** |

ROI: 252% | Payback: 5 months


Contact

For more information about this case study, contact the AI Toolkit team.


Related documents: AI Use Case Template | How to AI Monitoring