
How to Use $lookup in MongoDB


MongoDB is a popular NoSQL database that stores data in JSON-like documents. These documents are stored in collections, which are similar to tables in relational databases.

A key aspect of working with any database is the ability to query and retrieve data. MongoDB provides a powerful aggregation framework that allows complex data processing and analytics directly on the database server.

The aggregation framework consists of stages that process documents as they pass through the pipeline. One of the most useful stages for analytics and joining data is the $lookup stage.

What is $lookup?

The $lookup stage performs a left outer join between two collections in MongoDB. This allows documents from one "left" collection to be enriched with data from a "right" collection.

For example, consider two collections – orders and customers:

orders:
{
  _id: 1,
  customerId: 123,
  items: [...],
  total: 99 
}

customers: 
{
  _id: 123,
  name: "John Doe",
  email: "[email protected]"
}

Using $lookup, we can attach the matching customer document to each order. $lookup writes its matches into an array field, so the joined order looks like this:

{
  _id: 1,
  customerId: 123,
  items: [...],
  total: 99,
  customer: [
    {
      _id: 123,
      name: "John Doe",
      email: "[email protected]"
    }
  ]
}

The joined data is embedded directly within the output document (the stored order is not modified), avoiding complex application-side joins.
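To make the left outer join behaviour concrete, here is a minimal in-memory sketch in plain JavaScript. It is not how MongoDB implements the stage: lookupSketch is a hypothetical helper, the second order is invented for illustration, and the sketch only compares top-level scalar fields.

```javascript
// Hypothetical helper that mimics $lookup's left outer join on plain arrays.
// Simplification: only top-level scalar fields are compared with ===.
function lookupSketch(leftDocs, rightDocs, { localField, foreignField, as }) {
  return leftDocs.map((left) => ({
    ...left,
    // Every left document is kept; one without a match gets an empty array.
    [as]: rightDocs.filter((right) => right[foreignField] === left[localField]),
  }));
}

const orders = [
  { _id: 1, customerId: 123, total: 99 },
  { _id: 2, customerId: 999, total: 15 }, // invented order with no matching customer
];
const customers = [{ _id: 123, name: "John Doe" }];

const joined = lookupSketch(orders, customers, {
  localField: "customerId",
  foreignField: "_id",
  as: "customer",
});

console.log(JSON.stringify(joined, null, 2));
```

Both orders appear in the output. The first carries a one-element customer array, and the second carries an empty array, which is what "left outer" means here.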

When to Use $lookup

$lookup is useful when you need to access or aggregate data from related collections. Some examples include:

  • Embedding data from another collection for easier access in application code. This avoids complex joins in your code.

  • Aggregating data from multiple collections, such as total order value per customer.

  • Looking up additional data based on foreign keys, such as finding product details for each item in an order.

  • Joining collections so that later stages can filter, sort, or limit on fields from the related collection.

In summary, anytime you need to access or process data from more than one collection, $lookup provides a simple way to join the data right in MongoDB.
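As one illustration of the aggregation use case above, total order value per customer could be sketched as a $group followed by a $lookup. The output field names totalSpent, orderCount, and customer are assumptions chosen for this example, and the collections are assumed to have the shape shown earlier.

```javascript
// Sketch: sum order totals per customer, then attach the customer document.
const totalPerCustomer = [
  {
    $group: {
      _id: "$customerId",            // one output document per customer
      totalSpent: { $sum: "$total" }, // assumed output field name
      orderCount: { $sum: 1 },        // assumed output field name
    },
  },
  {
    $lookup: {
      from: "customers",
      localField: "_id",   // after $group, _id holds the customerId
      foreignField: "_id",
      as: "customer",
    },
  },
];

// In mongosh: db.orders.aggregate(totalPerCustomer)
console.log(JSON.stringify(totalPerCustomer, null, 2));
```

Grouping before the join means $lookup runs once per customer rather than once per order.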

$lookup Syntax and Parameters

The syntax for the $lookup stage is:

{
  $lookup: {
    from: <collection>,
    localField: <field from input document>, 
    foreignField: <field from "from" collection>,
    as: <output array field>  
  }
}

It takes the following parameters:

  • from – The target collection to join
  • localField – The field from the input documents to match
  • foreignField – The field from the from collection to match
  • as – The name of the output array field that will contain the joined documents

For example, to join orders with customers on customerId:

{
  $lookup: {
    from: "customers",
    localField: "customerId",
    foreignField: "_id",
    as: "customerDoc"
  }
} 

This will add a customerDoc field to each input document, containing an array of the matching documents from customers. If no document matches, the array is empty.

$lookup Examples

Let's look at some examples of using $lookup for common use cases:

Joining Orders and Customers

Given these collections:

orders:
{
  _id: 1,
  customerId: 123,
  items: [...],
  total: 99
}

customers:
{
  _id: 123,
  name: "John Doe"
}

We can join the data as follows:

db.orders.aggregate([
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customerDoc"  
    }
  }
]) 

The result will be:

{
  _id: 1,
  customerId: 123,
  items: [...],
  total: 99,
  customerDoc: [    
    {
      _id: 123,
      name: "John Doe"
    }
  ]
}
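Because customerDoc is an array, code that expects a single embedded customer usually adds an $unwind stage after the join. The following is a minimal sketch of that pattern. Setting preserveNullAndEmptyArrays to true is a choice made here so that orders without a matching customer stay in the output.

```javascript
// Sketch: join customers, then flatten the one-element array into an object.
const ordersWithCustomer = [
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customerDoc",
    },
  },
  {
    $unwind: {
      path: "$customerDoc",
      // Keep orders whose customerDoc array is empty instead of dropping them.
      preserveNullAndEmptyArrays: true,
    },
  },
];

// In mongosh: db.orders.aggregate(ordersWithCustomer)
console.log(JSON.stringify(ordersWithCustomer, null, 2));
```

Without preserveNullAndEmptyArrays, $unwind drops documents whose array is empty, which would turn the left outer join into an inner join.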

Embedding Product Details

If we have a products collection, we can also attach product details to each order item:

orders:
{
  items: [
    {productId: 456, qty: 2},
    {productId: 457, qty: 1}  
  ]
}

products: 
{
  _id: 456,
  name: "T-Shirt",
  price: 19.99
}

Join the data as follows:

db.orders.aggregate([
  { 
    $unwind: "$items" 
  },
  {
    $lookup: {
      from: "products",
      localField: "items.productId",  
      foreignField: "_id",
      as: "productDetails"    
    }
  } 
])

The result will be one document per order item, with the matching products in a top-level productDetails array:

{
  items: {
    productId: 456,
    qty: 2
  },
  productDetails: [ {...} ]
}
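If the goal is one document per order again, with the product nested inside each item, the unwound documents can be regrouped. This is a sketch under assumptions: the nested field name items.product is hypothetical, and the $group stage as written keeps only _id and items, so any other order fields would need to be carried through explicitly.

```javascript
// Sketch: unwind items, join products, nest the match in the item, regroup.
const ordersWithProducts = [
  { $unwind: "$items" },
  {
    $lookup: {
      from: "products",
      localField: "items.productId",
      foreignField: "_id",
      as: "productDetails",
    },
  },
  {
    // Copy the first matched product into the item (hypothetical field name).
    // If nothing matched, $arrayElemAt yields no value and the field is not set.
    $addFields: { "items.product": { $arrayElemAt: ["$productDetails", 0] } },
  },
  {
    // Reassemble one document per order; other order fields are dropped here.
    $group: { _id: "$_id", items: { $push: "$items" } },
  },
];

// In mongosh: db.orders.aggregate(ordersWithProducts)
console.log(JSON.stringify(ordersWithProducts, null, 2));
```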

Lookup with Matching

We can limit which orders are joined by adding a $match stage before $lookup:

db.orders.aggregate([
  { 
    $match: {
      "items.qty": { $gte: 2 }  
    }
  },
  { 
    $lookup: {...}
  }
])

This will only join data for orders where at least one item has qty >= 2.
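The $match above filters the input orders. To filter or trim the joined documents themselves, one option is the pipeline form of $lookup with let and $expr, available in MongoDB 3.6 and later. The sketch below assumes the orders and customers collections shown earlier. The variable name cid and the projected fields are choices made for this example.

```javascript
// Sketch: filter orders first, then run a sub-pipeline on the joined collection.
const ordersWithTrimmedCustomer = [
  { $match: { "items.qty": { $gte: 2 } } },
  {
    $lookup: {
      from: "customers",
      let: { cid: "$customerId" }, // expose the order's customerId as $$cid
      pipeline: [
        // Equivalent of localField/foreignField, written as an expression.
        { $match: { $expr: { $eq: ["$_id", "$$cid"] } } },
        // Keep only the fields the application needs from each customer.
        { $project: { name: 1, email: 1 } },
      ],
      as: "customerDoc",
    },
  },
];

// In mongosh: db.orders.aggregate(ordersWithTrimmedCustomer)
console.log(JSON.stringify(ordersWithTrimmedCustomer, null, 2));
```

Further stages inside the sub-pipeline, such as an additional $match on customer fields, would restrict which customers end up in customerDoc without removing any orders.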

Multiple Lookups

To perform multiple joins, simply add additional $lookup stages:

db.orders.aggregate([
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customerDoc"
    }
  },
  {
    $unwind: "$items"
  },
  {
    $lookup: {
      from: "products",
      localField: "items.productId",
      foreignField: "_id",
      as: "productDetails"
    }
  }
])

This allows joining data from many collections in one query.

Lookup Optimization

There are a few things to keep in mind when using $lookup for optimum performance:

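  • Index the join fields – Make sure the foreignField in the from collection is indexed so that each lookup can use the index. _id is always indexed; an index on localField does not speed up the join itself.

  • Perform filtering early – Add $match (and $limit, where appropriate) before $lookup so that fewer documents are joined.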

  • Limit unneeded data – Use $project after $lookup to only retain needed fields.

  • Denormalize where possible – For frequently joined data, denormalize by embedding related data to avoid joins.

With proper indexing and stage optimization, $lookup can perform well even for large datasets.
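The first three tips can be sketched together. The example assumes a join on a field other than _id: customerEmail on orders and email on customers are hypothetical field names, and the total threshold is arbitrary.

```javascript
// Index specification for the foreign field in the "from" collection.
// `email` as a join key is a hypothetical choice for this sketch.
const customerEmailIndex = { email: 1 };
// In mongosh: db.customers.createIndex(customerEmailIndex)

const optimizedPipeline = [
  // 1. Filter early so fewer orders reach the join.
  { $match: { total: { $gte: 50 } } },
  // 2. Join on the indexed foreign field.
  {
    $lookup: {
      from: "customers",
      localField: "customerEmail", // hypothetical field on orders
      foreignField: "email",
      as: "customerDoc",
    },
  },
  // 3. Keep only the fields that are needed downstream.
  { $project: { total: 1, "customerDoc.name": 1 } },
];

// In mongosh: db.orders.aggregate(optimizedPipeline)
console.log(JSON.stringify(optimizedPipeline, null, 2));
```

Running the aggregation with explain in mongosh is one way to check whether the index is actually used.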

When Not To Use $lookup

While $lookup is very useful, it is not required in all cases, and permanently embedding the data it joins has costs of its own. Some situations where you may want to avoid embedding joined data:

  • When related data needs to be queried independently – Embedding data duplicates data across documents.

  • For high-volume writes – Embedded data needs to be updated in multiple places.

  • When joins are infrequent – Lookup on query may be easier than always embedding.

  • To avoid large documents – Embedded data increases document size.

For these cases, keeping the collections normalized and linking documents using references (manual references or DBRefs) may be a better fit.

Summary

To recap:

  • $lookup performs a left outer join between two collections in MongoDB

  • It allows embedding documents from one collection into another for easy access

  • This avoids complex application-side joins and allows aggregating data across collections

  • Make sure to index join fields and optimize pipeline stages for performance

  • Avoid permanently embedding joined data when duplication, write volume, or document size make it a poor fit

$lookup is a powerful tool for analytics and joining related data right within your MongoDB database. Properly leveraging it will help you better work with your document data.
