Choosing the right database is crucial for any project, yet performance benchmarks often feel detached from practical application. Our recent university study, leveraging actual Disneyland visitor data, offers a compelling, real-world comparison between MongoDB and MySQL, revealing significant insights into their respective strengths and weaknesses.
The Disneyland Dataset: A Perfect Testbed
Our research utilized a comprehensive Disneyland Paris dataset, ideal for stress-testing databases with complex analytical challenges. This dataset included:
- Attraction wait times: 37 attractions (2018-2019)
- Daily park attendance: (2018-2022)
- Historical weather conditions: (1999-2022)
- Park operational schedules: (2018-2022)
This rich data demanded high-frequency writes (hourly weather, 15-minute wait times), intricate aggregations, time-series analysis, and join-like operations, making it an excellent candidate for a rigorous performance showdown.
Ensuring a Level Playing Field
To guarantee an unbiased comparison, we meticulously set up both databases under identical conditions:
- Hardware: A single server (20GB RAM, 6 CPU cores, HDD storage)
- Indexing: Equivalent indexes applied to both systems
- Schema: Normalized structure for MySQL; embedded documents for MongoDB
- Queries: Identical business logic translated into SQL and MongoDB aggregation pipelines
The primary schema distinction lay in how weather data was handled: MySQL used a traditional normalized approach with foreign keys, while MongoDB opted for embedded weather objects within documents, sidestepping the need for explicit joins.
Performance Unveiled: The Numbers Speak
We executed each query 100 times sequentially to gather robust average performance metrics.
Write Operations (Data Insertion)
Operation | MySQL | MongoDB | MongoDB Advantage |
---|---|---|---|
Weather Data Insert | 56.7ms | 19.6ms | 65.4% faster |
Wait Times Insert | 59.2ms | 19.3ms | 67.4% faster |
Attendance Insert | 57.2ms | 19.3ms | 66.2% faster |
Schedule Insert | 57.7ms | 28.4ms | 50.7% faster |
Conclusion: MongoDB consistently demonstrated superior performance in write operations, often achieving speeds more than double that of MySQL.
Read Operations (Analytical Queries)
Query Type | MySQL | MongoDB | Winner |
---|---|---|---|
RQ1: Avg. wait times by attraction | 52.6 seconds | 1.0 seconds | MongoDB (98.1% faster) |
RQ2: Attendance by weather | 413ms | 20.7 seconds | MySQL (98% faster) |
RQ3: Wait times vs. weather | 1h 12m 37s | 5ms* | MongoDB |
RQ4: Holiday vs. regular attendance | 66.3ms | 58.5ms | MongoDB (11.7% faster) |
*_MySQL query failed after 4+ hours; MongoDB completed swiftly._
Interpreting the Results
MongoDB’s Dominance
- Streamlined Aggregations: The staggering 98% advantage in RQ1 highlights MongoDB’s efficiency with straightforward grouping and averaging via its aggregation pipeline.
- Complex Temporal Analysis: RQ3’s dramatic outcome underscores MongoDB’s prowess in handling complex time-based correlations, completing tasks in milliseconds where MySQL struggled for hours.
- Write-Intensive Workloads: With 50-67% faster write performance, MongoDB emerges as a strong contender for applications requiring rapid data ingestion.
MySQL’s Unexpected Triumph
MySQL unexpectedly outpaced MongoDB by 98% in RQ2, which involved analyzing attendance based on weather conditions. This query necessitated complex temporary table operations and multi-source correlations—a domain where SQL’s sophisticated optimization capabilities genuinely shine.
The Technical Rationale
Why MongoDB Excels:
- No JOIN Overhead: Embedded documents inherently eliminate the need for costly table joins.
- Optimized Aggregation: Its pipeline is purpose-built for efficient analytical workloads.
- Flexible Indexing: Better tailored for document-centric queries.
- Schema Adaptability: Offers greater flexibility during data writes.
Why MySQL Holds Its Ground:
- Advanced Query Optimization: Benefits from decades of SQL optimizer enhancements.
- Efficient Temporary Operations: Superior handling of intricate intermediate query results.
- Memory Management: Particularly effective for specific multi-table operations.
Real-World Guidance for Database Selection
Based on our findings:
Opt for MongoDB when:
- You have high-frequency data ingestion (e.g., IoT, real-time analytics).
- Your workload involves extensive time-series data analysis.
- Your aggregation needs are simple to moderate.
- Your application demands schema flexibility.
Choose MySQL when:
- Your queries involve complex correlations across multiple data sources.
- You heavily rely on temporary tables and intricate JOINs.
- Your ecosystem is already SQL-centric.
- Strict data consistency is a paramount requirement.
The Critical Takeaway: Context is King
The most profound lesson from our study is that no single database is universally superior. Instead, the specific characteristics of your workload dictate the optimal choice. MongoDB shone in write-heavy scenarios and simpler aggregations, while MySQL proved its mettle in complex relational operations.
The significant performance disparities observed, such as the 98% differences, are not theoretical edge cases. They represent real-world implications that can drastically impact application responsiveness and user experience.
This performance analysis was a component of a Modern Database Systems course at TH Köln. The complete benchmark code and query implementations are publicly available here for those interested in validating or expanding upon these tests.