MongoDB is a popular NoSQL database that stores data in JSON-like documents. These documents are stored in collections, which are similar to tables in relational databases.
A key aspect of working with any database is the ability to query and retrieve data. MongoDB provides a powerful aggregation framework that allows complex data processing and analytics directly on the database server.
The aggregation framework consists of stages that process documents as they pass through the pipeline. One of the most useful stages for analytics and joining data is the $lookup stage.
What is $lookup?
The $lookup stage performs a left outer join between two collections in MongoDB. Documents from the "left" (input) collection are enriched with matching documents from the "right" (joined) collection, and input documents without a match are still returned.
For example, consider two collections, orders and customers:
orders:
{
  _id: 1,
  customerId: 123,
  items: [...],
  total: 99
}
customers:
{
  _id: 123,
  name: 'John Doe',
  email: '[email protected]'
}
Using $lookup, we can attach the matching customer document to each order, providing easy access to joined data. The matches are placed in an array field (named customer here):
{
  _id: 1,
  customerId: 123,
  items: [...],
  total: 99,
  customer: [
    {
      _id: 123,
      name: 'John Doe',
      email: '[email protected]'
    }
  ]
}
The joined data is embedded directly within the source document, avoiding complex application-side joins.
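To make the join semantics concrete, here is a minimal in-memory sketch in plain JavaScript of what an equality left outer join does. The lookup helper and the second order (customerId 999) are hypothetical and exist only for illustration; this emulates the basic behaviour for scalar fields and is not how MongoDB implements the stage. Matches are collected into an array, and an order without a matching customer is kept with an empty array. A later stage such as $unwind can turn that array into a single embedded document.

```javascript
// Hypothetical helper: emulates an equality left outer join in memory.
function lookup(leftDocs, rightDocs, localField, foreignField, as) {
  return leftDocs.map((left) => ({
    ...left,
    // Collect every right-side document whose foreignField equals the left value.
    [as]: rightDocs.filter((right) => right[foreignField] === left[localField]),
  }));
}

const orders = [
  { _id: 1, customerId: 123, total: 99 },
  { _id: 2, customerId: 999, total: 15 }, // illustrative order with no matching customer
];
const customers = [{ _id: 123, name: "John Doe" }];

const joined = lookup(orders, customers, "customerId", "_id", "customer");
console.log(JSON.stringify(joined, null, 2));
```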
When to Use $lookup
$lookup is useful when you need to access or aggregate data from related collections. Some examples include:
- Embedding data from another collection for easier access in application code. This avoids complex joins in your code.
- Aggregating data from multiple collections, such as total order value per customer.
- Looking up additional data based on foreign keys, such as finding product details for each item in an order.
- Joining collections to incorporate additional filtering, sorting, limits, etc.
In summary, anytime you need to access or process data from more than one collection, $lookup provides a simple way to join the data right in MongoDB.
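As an illustration of the aggregation use case, the following sketch computes the total order value per customer by starting from customers and joining in their orders. It assumes the orders and customers collections shown earlier; the output field names orders, orderCount and totalSpent are made up for this example. The guard at the end lets the snippet run both in mongosh (where db is defined) and as plain JavaScript.

```javascript
// Sketch: total order value per customer (field names are illustrative).
const totalPerCustomer = [
  {
    // Attach every order whose customerId equals this customer's _id.
    $lookup: {
      from: "orders",
      localField: "_id",
      foreignField: "customerId",
      as: "orders",
    },
  },
  {
    // Reduce the joined array to a count and a sum.
    $project: {
      name: 1,
      orderCount: { $size: "$orders" },
      totalSpent: { $sum: "$orders.total" },
    },
  },
];

// Only runs inside mongosh, where the global `db` handle exists.
if (typeof db !== "undefined") {
  printjson(db.customers.aggregate(totalPerCustomer).toArray());
}
```

Customers without any orders still appear in the output, with orderCount and totalSpent both 0, because the join is a left outer join.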
$lookup Syntax and Parameters
The syntax for the $lookup stage is:
{
  $lookup: {
    from: <collection>,
    localField: <field from input document>,
    foreignField: <field from "from" collection>,
    as: <output array field>
  }
}
It takes the following parameters:
- from – The target collection to join
- localField – The field from the input documents to match
- foreignField – The field from the from collection to match
- as – The name of the output array field that will contain the joined documents
For example, to join orders with customers on customerId:
{
  $lookup: {
    from: "customers",
    localField: "customerId",
    foreignField: "_id",
    as: "customerDoc"
  }
}
This will add a customerDoc field to each input document, containing an array of the matching documents from customers. If no document matches, the array is empty.
$lookup Examples
Let's look at some examples of using $lookup for common use cases:
Joining Orders and Customers
Given these collections:
orders:
{
  _id: 1,
  customerId: 123,
  items: [...],
  total: 99
}
customers:
{
  _id: 123,
  name: 'John Doe'
}
We can join the data as follows:
db.orders.aggregate([
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customerDoc"
    }
  }
])
The result will be:
{
  _id: 1,
  customerId: 123,
  items: [...],
  total: 99,
  customerDoc: [
    {
      _id: 123,
      name: 'John Doe'
    }
  ]
}
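Because customerDoc is an array, application code has to read customerDoc[0]. If each order has at most one customer, a common follow-up is a $unwind stage that replaces the array with the single matched document. The following is a sketch under that assumption, using the same collections as above; preserveNullAndEmptyArrays keeps orders that have no matching customer, which preserves the left outer join behaviour.

```javascript
// Sketch: join customers, then flatten the one-element array into an object.
const ordersWithCustomer = [
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customerDoc",
    },
  },
  {
    // Without preserveNullAndEmptyArrays, orders with no match would be dropped.
    $unwind: { path: "$customerDoc", preserveNullAndEmptyArrays: true },
  },
];

// Only runs inside mongosh, where the global `db` handle exists.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(ordersWithCustomer).toArray());
}
```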
Embedding Product Details
If we have a products collection, we can also look up the product details for each order item:
orders:
{
  items: [
    {productId: 456, qty: 2},
    {productId: 457, qty: 1}
  ]
}
products:
{
  _id: 456,
  name: 'T-Shirt',
  price: 19.99
}
Join the data as follows:
db.orders.aggregate([
  {
    $unwind: "$items"
  },
  {
    $lookup: {
      from: "products",
      localField: "items.productId",
      foreignField: "_id",
      as: "productDetails"
    }
  }
])
The result contains one document per order item, with the matching products in a top-level productDetails array:
{
  items: {
    productId: 456,
    qty: 2
  },
  productDetails: [ {...} ]
}
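If one document per order is preferred over one document per item, the unwound items can be regrouped after the join. The sketch below is one possible approach rather than the only one: it writes the matches into a nested field by using a dotted as path (items.productDetails), then uses $group to rebuild the items array. It assumes the orders carry the _id and customerId fields from the earlier examples.

```javascript
// Sketch: join products per item, then rebuild one document per order.
const itemsWithProducts = [
  { $unwind: "$items" },
  {
    $lookup: {
      from: "products",
      localField: "items.productId",
      foreignField: "_id",
      // Dotted path: store the matches inside the (now unwound) item object.
      as: "items.productDetails",
    },
  },
  {
    $group: {
      _id: "$_id",
      // Fields not listed here are dropped; carry them along with $first.
      customerId: { $first: "$customerId" },
      items: { $push: "$items" },
    },
  },
];

// Only runs inside mongosh, where the global `db` handle exists.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(itemsWithProducts).toArray());
}
```

Note that $group does not guarantee the original order of the items unless the documents are sorted beforehand.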
Lookup with Matching
We can limit which orders are joined by adding a $match stage before $lookup:
db.orders.aggregate([
  {
    $match: {
      "items.qty": { $gte: 2 }
    }
  },
  {
    $lookup: {...}
  }
])
This will only join data for orders where at least one item has qty >= 2.
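Note that a $match placed before $lookup filters the input orders, not the joined documents. To filter or trim the joined documents themselves, $lookup also has a form that takes let and pipeline (available since MongoDB 3.6). The following is a sketch, assuming the customers documents shown earlier; the variable name cid is arbitrary.

```javascript
// Sketch: filter input orders first, then shape the joined customers.
const ordersWithCustomerName = [
  // Runs on orders: keep only orders with an item of qty >= 2.
  { $match: { "items.qty": { $gte: 2 } } },
  {
    $lookup: {
      from: "customers",
      // Expose the order's customerId to the sub-pipeline as $$cid.
      let: { cid: "$customerId" },
      pipeline: [
        // Runs on customers: the join condition itself.
        { $match: { $expr: { $eq: ["$_id", "$$cid"] } } },
        // Keep only the name of each joined customer.
        { $project: { _id: 0, name: 1 } },
      ],
      as: "customerDoc",
    },
  },
];

// Only runs inside mongosh, where the global `db` handle exists.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(ordersWithCustomerName).toArray());
}
```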
Multiple Lookups
To perform multiple joins, simply add additional $lookup stages:
db.orders.aggregate([
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customerDoc"
    }
  },
  {
    $unwind: "$items"
  },
  {
    $lookup: {
      from: "products",
      localField: "items.productId",
      foreignField: "_id",
      as: "productDetails"
    }
  }
])
This allows joining data from many collections in one query.
Lookup Optimization
There are a few things to keep in mind when using $lookup for optimum performance:
- Index the join fields – Make sure the foreignField is indexed in the from collection so that each lookup can use the index. Index the localField as well if you filter on it.
- Perform filtering early – Add $match before $lookup (together with $sort and $limit where applicable) so that fewer documents need to be joined.
- Limit unneeded data – Use $project after $lookup to only retain needed fields.
- Denormalize where possible – For frequently joined data, denormalize by embedding related data to avoid joins.
With proper indexing and stage optimization, $lookup can perform well even for large datasets.
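Putting the first three tips together might look like the sketch below. The minTotal threshold is an arbitrary example value. The _id field used as foreignField in the orders-to-customers join is always indexed, so the createIndex call here is only an assumption about a second workload: it would support joins in the other direction, from customers into orders on customerId.

```javascript
const minTotal = 50; // arbitrary example threshold

const optimizedPipeline = [
  // 1. Filter early so fewer orders reach the join.
  { $match: { total: { $gte: minTotal } } },
  // 2. Join on customers._id, which is indexed by default.
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customerDoc",
    },
  },
  // 3. Keep only the fields the application needs.
  { $project: { total: 1, "customerDoc.name": 1 } },
];

// Only runs inside mongosh, where the global `db` handle exists.
if (typeof db !== "undefined") {
  // Assumed extra workload: supports $lookup from customers into orders.
  db.orders.createIndex({ customerId: 1 });
  printjson(db.orders.aggregate(optimizedPipeline).toArray());
}
```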
When Not To Use $lookup
While $lookup is very useful, it is not the right fit for every case. Some situations where you may want to avoid it:
- When related data needs to be queried independently – Embedding data duplicates data across documents.
- For high-volume writes – Embedded data needs to be updated in multiple places.
- When joins are infrequent – Lookup on query may be easier than always embedding.
- To avoid large documents – Embedded data increases document size.
For these cases, keeping the collections separate and linking documents using references or DBRefs may be more optimal.
Summary
To recap:
- $lookup performs a left outer join between two collections in MongoDB
- It allows embedding documents from one collection into another for easy access
- This avoids complex application-side joins and allows aggregating data across collections
- Make sure to index join fields and optimize pipeline stages for performance
- Avoid using $lookup when embedding data is not optimal
$lookup is a powerful tool for analytics and joining related data right within your MongoDB database. Properly leveraging it will help you better work with your document data.