Skip to main content

Command Palette

Search for a command to run...

🧠 From 500 Million Orders to 5ms Queries

A Deep Dive into MongoDB B-Tree Indexes, Compound Indexes, Disk Pages, and Real-World Query Optimization

Published
β€’6 min read
🧠 From 500 Million Orders to 5ms Queries

Modern applications rely heavily on databases. When a system is small, almost every query feels fast. But as your product grows, database performance becomes one of the biggest engineering challenges.

Imagine an e-commerce platform with:

500,000,000 orders
10,000,000 users
real-time dashboards
admin analytics queries

At this scale, even a simple query can become painfully slow.

For example:

db.orders.find({ status: "delivered" })

Why does a simple query sometimes take seconds instead of milliseconds?

The answer lies in how indexes work internally inside MongoDB.

This article dives deep into:

  • B-Tree index internals

  • Disk page layout

  • How MongoDB stores index entries

  • Why compound indexes matter

  • Page splits and index growth

  • Real-world e-commerce examples

By the end, you’ll understand exactly how MongoDB finds your data among hundreds of millions of documents.


Real-World Scenario: Large E-Commerce Platform

Consider an orders collection.

Example document:

{
  _id: ObjectId("6651a1..."),
  user_id: 784321,
  product_id: 99231,
  status: "delivered",
  price: 1499,
  created_at: ISODate("2025-03-10T10:21:00Z")
}

Collection size:

orders collection
-----------------
500 million documents

Order distribution:

delivered β†’ 420M
pending β†’ 60M
cancelled β†’ 20M

What Happens Without an Index

Suppose the backend runs this query:

db.orders.find({ status: "delivered" })
Without an index, MongoDB performs a collection scan.

Execution plan:

COLLSCAN

This means MongoDB checks every document.

doc1 β†’ check
doc2 β†’ check
doc3 β†’ check
...
doc500,000,000 β†’ check

Even if only one result is needed, MongoDB still scans everything.

This is extremely expensive.


Creating an Index

To speed things up we create an index:

db.orders.createIndex({ status: 1 })

MongoDB builds a B-Tree index.

Important fact:

Indexes do not store full documents.

Instead they store:

index key
+
pointer to the document

Example index entries:

cancelled β†’ pointer
delivered β†’ pointer
delivered β†’ pointer
delivered β†’ pointer
pending   β†’ pointer

These entries are stored in sorted order.


MongoDB Storage Engine

MongoDB uses the WiredTiger storage engine.

WiredTiger stores data in disk pages.

Typical page layout:

collection file
 β”œβ”€β”€ page 1
 β”œβ”€β”€ page 2
 β”œβ”€β”€ page 3
 └── ...

Indexes are stored separately:

orders_status_index
 β”œβ”€β”€ root page
 β”œβ”€β”€ internal pages
 └── leaf pages

Each page contains many index entries.


The Structure of a MongoDB B-Tree

A B-Tree is a balanced search tree.

Structure:

Root node
Internal nodes
Leaf nodes

Visual example:

              ROOT
           [ delivered ]
           /          \
      cancelled      pending

In real production systems the tree may contain thousands of nodes.


Leaf Nodes: Where Index Data Lives

Leaf nodes store the actual index entries.

Example leaf page:

cancelled β†’ doc pointer
delivered β†’ doc pointer
delivered β†’ doc pointer
delivered β†’ doc pointer
pending   β†’ doc pointer

Each entry contains:

indexed field value
+
pointer to the document

Because entries are sorted, searching becomes extremely fast.


Step-by-Step: How MongoDB Searches the Index

Query:

db.orders.find({ status: "delivered" })

Step 1 β€” Start at the root

ROOT
[ delivered ]

MongoDB compares the search key with root entries.


Step 2 β€” Traverse internal nodes

         ROOT
      [ delivered ]
       /         \
 < delivered   β‰₯ delivered

The tree guides MongoDB toward the correct branch.


Step 3 β€” Reach leaf node

Eventually MongoDB finds the leaf node containing the key.

Example:

delivered β†’ doc pointer
delivered β†’ doc pointer
delivered β†’ doc pointer

Step 4 β€” Fetch documents

MongoDB retrieves the documents using the stored pointers.

This avoids scanning unrelated data.


The Hidden Problem With Single Indexes

Single-field indexes are not always enough.

Consider this admin dashboard query:

db.orders.find({
  status: "delivered"
}).sort({ created_at: -1 }).limit(20)

What happens internally?

  1. Find all "delivered" entries

  2. Sort them by date

  3. Return latest 20

But remember:

delivered orders = 420 million

Even with an index, MongoDB still needs to process hundreds of millions of entries.


Compound Index to the Rescue

Solution:

db.orders.createIndex({
  status: 1,
  created_at: -1
})

Now the index stores entries like:

(delivered, Mar 10) β†’ pointer
(delivered, Mar 9) β†’ pointer
(delivered, Mar 8) β†’ pointer
(pending, Mar 10) β†’ pointer
(cancelled, Mar 4) β†’ pointer

Data is sorted by:

status β†’ created_at

Now the query becomes extremely efficient.

db.orders.find({
  status: "delivered"
})
.sort({ created_at: -1 })
.limit(20)

MongoDB can:

jump directly to delivered
read newest entries
stop after 20 results

No sorting required.


Understanding the Left-Prefix Rule

Compound indexes follow a rule called the left-prefix rule.

If the index is:

{ status: 1, created_at: -1 }

It supports queries like:

status
status + created_at

But not efficiently:

created_at only

Because the index is sorted first by status.


What Happens When an Index Page Fills

Index pages have limited size.

When a page becomes full, MongoDB performs a page split.

Before split:

PAGE A
cancelled
delivered
delivered
delivered
pending

After split:

PAGE A           PAGE B
cancelled        delivered
delivered        delivered
                 pending

Parent nodes are updated with new pointers.


How the Tree Grows

After many inserts, the tree grows.

Example structure:

                ROOT
           [ delivered ]
           /           \
      INTERNAL       INTERNAL
       /     \        /     \
    LEAF   LEAF   LEAF   LEAF

Despite growing larger, the tree remains balanced.

This ensures predictable performance.


Time Complexity

B-Tree search complexity:

O(log n)

Even with 500 million documents, MongoDB may only read:

3-4 index pages

to locate matching documents.


Index Design in Real Production Systems

Large systems often maintain indexes like:

User order history:

{ user_id: 1, created_at: -1 }

Latest delivered orders:

{ status: 1, created_at: -1 }

Fast product analytics:

{ product_id: 1, created_at: -1 }

These power:

user dashboards
admin analytics
order history
real-time monitoring

The Most Important Lesson

Indexes are not just about adding fields.

They are about matching how your application queries data.

A poorly designed index may still scan millions of records.

But a well-designed compound index can reduce query time from:

seconds β†’ milliseconds

Final Thoughts

Understanding how MongoDB indexes actually work internallyβ€”including B-Trees, disk pages, compound indexes, and page splitsβ€”gives engineers a huge advantage when building scalable systems.

Many database performance problems are not caused by infrastructure.

They are caused by misunderstanding how indexes organize data.

Once you understand the internals, database performance stops feeling mysterious and starts becoming predictable.