π§ From 500 Million Orders to 5ms Queries
A Deep Dive into MongoDB B-Tree Indexes, Compound Indexes, Disk Pages, and Real-World Query Optimization

Modern applications rely heavily on databases. When a system is small, almost every query feels fast. But as your product grows, database performance becomes one of the biggest engineering challenges.
Imagine an e-commerce platform with:
500,000,000 orders
10,000,000 users
real-time dashboards
admin analytics queries
At this scale, even a simple query can become painfully slow.
For example:
db.orders.find({ status: "delivered" })
Why does a simple query sometimes take seconds instead of milliseconds?
The answer lies in how indexes work internally inside MongoDB.
This article dives deep into:
B-Tree index internals
Disk page layout
How MongoDB stores index entries
Why compound indexes matter
Page splits and index growth
Real-world e-commerce examples
By the end, youβll understand exactly how MongoDB finds your data among hundreds of millions of documents.
Real-World Scenario: Large E-Commerce Platform
Consider an orders collection.
Example document:
{
_id: ObjectId("6651a1..."),
user_id: 784321,
product_id: 99231,
status: "delivered",
price: 1499,
created_at: ISODate("2025-03-10T10:21:00Z")
}
Collection size:
orders collection
-----------------
500 million documents
Order distribution:
delivered β 420M
pending β 60M
cancelled β 20M
What Happens Without an Index
Suppose the backend runs this query:
db.orders.find({ status: "delivered" })
Without an index, MongoDB performs a collection scan.
Execution plan:
COLLSCAN
This means MongoDB checks every document.
doc1 β check
doc2 β check
doc3 β check
...
doc500,000,000 β check
Even if only one result is needed, MongoDB still scans everything.
This is extremely expensive.
Creating an Index
To speed things up we create an index:
db.orders.createIndex({ status: 1 })
MongoDB builds a B-Tree index.
Important fact:
Indexes do not store full documents.
Instead they store:
index key
+
pointer to the document
Example index entries:
cancelled β pointer
delivered β pointer
delivered β pointer
delivered β pointer
pending β pointer
These entries are stored in sorted order.
MongoDB Storage Engine
MongoDB uses the WiredTiger storage engine.
WiredTiger stores data in disk pages.
Typical page layout:
collection file
βββ page 1
βββ page 2
βββ page 3
βββ ...
Indexes are stored separately:
orders_status_index
βββ root page
βββ internal pages
βββ leaf pages
Each page contains many index entries.
The Structure of a MongoDB B-Tree
A B-Tree is a balanced search tree.
Structure:
Root node
Internal nodes
Leaf nodes
Visual example:
ROOT
[ delivered ]
/ \
cancelled pending
In real production systems the tree may contain thousands of nodes.
Leaf Nodes: Where Index Data Lives
Leaf nodes store the actual index entries.
Example leaf page:
cancelled β doc pointer
delivered β doc pointer
delivered β doc pointer
delivered β doc pointer
pending β doc pointer
Each entry contains:
indexed field value
+
pointer to the document
Because entries are sorted, searching becomes extremely fast.
Step-by-Step: How MongoDB Searches the Index
Query:
db.orders.find({ status: "delivered" })
Step 1 β Start at the root
ROOT
[ delivered ]
MongoDB compares the search key with root entries.
Step 2 β Traverse internal nodes
ROOT
[ delivered ]
/ \
< delivered β₯ delivered
The tree guides MongoDB toward the correct branch.
Step 3 β Reach leaf node
Eventually MongoDB finds the leaf node containing the key.
Example:
delivered β doc pointer
delivered β doc pointer
delivered β doc pointer
Step 4 β Fetch documents
MongoDB retrieves the documents using the stored pointers.
This avoids scanning unrelated data.
The Hidden Problem With Single Indexes
Single-field indexes are not always enough.
Consider this admin dashboard query:
db.orders.find({
status: "delivered"
}).sort({ created_at: -1 }).limit(20)
What happens internally?
Find all
"delivered"entriesSort them by date
Return latest 20
But remember:
delivered orders = 420 million
Even with an index, MongoDB still needs to process hundreds of millions of entries.
Compound Index to the Rescue
Solution:
db.orders.createIndex({
status: 1,
created_at: -1
})
Now the index stores entries like:
(delivered, Mar 10) β pointer
(delivered, Mar 9) β pointer
(delivered, Mar 8) β pointer
(pending, Mar 10) β pointer
(cancelled, Mar 4) β pointer
Data is sorted by:
status β created_at
Now the query becomes extremely efficient.
db.orders.find({
status: "delivered"
})
.sort({ created_at: -1 })
.limit(20)
MongoDB can:
jump directly to delivered
read newest entries
stop after 20 results
No sorting required.
Understanding the Left-Prefix Rule
Compound indexes follow a rule called the left-prefix rule.
If the index is:
{ status: 1, created_at: -1 }
It supports queries like:
status
status + created_at
But not efficiently:
created_at only
Because the index is sorted first by status.
What Happens When an Index Page Fills
Index pages have limited size.
When a page becomes full, MongoDB performs a page split.
Before split:
PAGE A
cancelled
delivered
delivered
delivered
pending
After split:
PAGE A PAGE B
cancelled delivered
delivered delivered
pending
Parent nodes are updated with new pointers.
How the Tree Grows
After many inserts, the tree grows.
Example structure:
ROOT
[ delivered ]
/ \
INTERNAL INTERNAL
/ \ / \
LEAF LEAF LEAF LEAF
Despite growing larger, the tree remains balanced.
This ensures predictable performance.
Time Complexity
B-Tree search complexity:
O(log n)
Even with 500 million documents, MongoDB may only read:
3-4 index pages
to locate matching documents.
Index Design in Real Production Systems
Large systems often maintain indexes like:
User order history:
{ user_id: 1, created_at: -1 }
Latest delivered orders:
{ status: 1, created_at: -1 }
Fast product analytics:
{ product_id: 1, created_at: -1 }
These power:
user dashboards
admin analytics
order history
real-time monitoring
The Most Important Lesson
Indexes are not just about adding fields.
They are about matching how your application queries data.
A poorly designed index may still scan millions of records.
But a well-designed compound index can reduce query time from:
seconds β milliseconds
Final Thoughts
Understanding how MongoDB indexes actually work internallyβincluding B-Trees, disk pages, compound indexes, and page splitsβgives engineers a huge advantage when building scalable systems.
Many database performance problems are not caused by infrastructure.
They are caused by misunderstanding how indexes organize data.
Once you understand the internals, database performance stops feeling mysterious and starts becoming predictable.



