What Is MongoDB Data Federation? How to Import Data from an AWS S3 Bucket to a MongoDB Atlas Cluster with Data Federation


What Is MongoDB Data Federation?

MongoDB Data Federation is a feature of MongoDB Atlas that lets you query data across multiple data sources as if they were one MongoDB database, without moving or duplicating the data.

Think of it as a virtual MongoDB layer on top of different storage systems.


What problem does it solve?

Normally, MongoDB queries only work on data stored inside MongoDB clusters.
Data Federation lets you run MongoDB Query Language (MQL) on:

  • MongoDB Atlas clusters

  • Data stored in cloud object storage (AWS S3, Azure Blob Storage, Google Cloud Storage)

  • Multiple clusters and storage sources at once

    👉 No ETL (Extract – Transform – Load). No data replication.


Key features

1. Query across multiple sources
  • Join Atlas collections with S3 data

  • Combine historical cold data + hot operational data

2. Schema-on-read
  • No need to predefine schema

  • MongoDB infers structure at query time

3. Cost-efficient analytics
  • Keep old data in cheap object storage

  • Query only when needed

4. Read-only (important)
  • Data Federation is query-only

  • You cannot write/update/delete data through it

Import Data from an S3 Bucket to a MongoDB Atlas Cluster, Step by Step


Goal: import multiple .json / .json.gz files from S3 → Atlas cluster.

Case study: import the files s3://import-bucket/data/user-devices/filexxxx.json.gz into the MongoDB Atlas collection "user-devices".
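Before wiring anything up, it helps to see how Data Federation maps S3 objects to a collection: every object under the configured prefix whose name looks like JSON (here .json or .json.gz) backs the collection. A minimal sketch of that selection rule, in plain JavaScript with made-up object keys for illustration:

```javascript
// Sketch: which S3 object keys would back the "user-devices" collection,
// given the prefix from this case study. The key list below is hypothetical.
const prefix = "data/user-devices/";

function backsCollection(key) {
  // An object backs the collection when it sits under the prefix
  // and is a JSON or gzipped JSON file.
  return key.startsWith(prefix) &&
    (key.endsWith(".json") || key.endsWith(".json.gz"));
}

const keys = [
  "data/user-devices/file0001.json.gz",
  "data/user-devices/file0002.json",
  "data/orders/file0001.json.gz",   // different prefix → different data source
  "data/user-devices/readme.txt"    // not JSON → ignored
];

console.log(keys.filter(backsCollection));
```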

STEP 1 — Create a Federated Database Instance




  1. Go to your project in the Atlas console.

  2. In the left menu, select Data Federation.

  3. Click “Create a Federated Database Instance”.

  4. Select “Set up manually”.

  5. Under “Cloud provider & data source”, choose AWS as the provider, enter an instance name
    (for example "FederatedDatabaseInstance0"), and add a data source.



Click "Next" to add Atlas to the trust relationship of your AWS IAM role.
You will see two fields:

Atlas AWS account ARN: #yourAccountARN
Your unique external ID: #yourExternalID

We will use these values to create an AWS IAM role: its trust policy allows Atlas to assume the role, and an attached policy grants it S3 access.

Create IAM Policy

File: s3-mongo-policy.json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::<your bucket>"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::<your bucket>/*"]
    }
  ]
} 

Create the policy using the AWS CLI:

aws iam create-policy \
  --policy-name MongoAtlasDataFederationS3 \
  --policy-document file://s3-mongo-policy.json

Create IAM Role for Atlas

File: atlas-trust.json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "#yourAccountARN"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "#yourExternalID"
        }
      }
    }
  ]
}
Create the IAM role using the AWS CLI:

aws iam create-role \
  --role-name MongoAtlasFederationRole \
  --assume-role-policy-document file://atlas-trust.json

Attach policy:


aws iam attach-role-policy \
  --role-name MongoAtlasFederationRole \
  --policy-arn arn:aws:iam::<YOUR_AWS_ID>:policy/MongoAtlasDataFederationS3

After the role is created successfully, you will get a role ARN; Atlas will use this ARN to access AWS S3 in the next step.

Configure the S3 data source of the Federated Database Instance

Enter your bucket name and prefix path.

Finally, the storage configuration JSON looks like this:

{
  "databases": [
    {
      "name": "your-federation-db",
      "collections": [
        {
          "name": "user-devices",
          "dataSources": [
            {
              "storeName": "s3_store",
              "path": "/"
            }
          ]
        }
      ],
      "views": []
    }
  ],
  "stores": [
    {
      "name": "s3_store",
      "provider": "s3",
      "bucket": "import-bucket",
      "region": "us-east-1",
      "prefix": "data/user-devices/",
      "delimiter": "/"
    }
  ]
}

Here "your-federation-db" is your federated database name, and "prefix" is the path to the .json / .json.gz files.
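If you keep the storage configuration in version control, it can be convenient to generate it from a few parameters instead of editing JSON by hand. A minimal sketch in plain JavaScript; the bucket, prefix, and names mirror the case-study values, not anything Atlas requires:

```javascript
// Sketch: build the federated storage configuration programmatically.
// All names here (s3_store, your-federation-db, ...) are the example values.
function buildStorageConfig({ bucket, region, prefix, db, collection }) {
  const storeName = "s3_store";
  return {
    databases: [
      {
        name: db,
        collections: [
          { name: collection, dataSources: [{ storeName, path: "/" }] }
        ],
        views: []
      }
    ],
    stores: [
      { name: storeName, provider: "s3", bucket, region, prefix, delimiter: "/" }
    ]
  };
}

const config = buildStorageConfig({
  bucket: "import-bucket",
  region: "us-east-1",
  prefix: "data/user-devices/",
  db: "your-federation-db",
  collection: "user-devices"
});

console.log(JSON.stringify(config, null, 2));
```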
Once the configuration is saved, we can connect to the Federated Database Instance through its connection string and run a query to verify the data:

db.getCollection("user-devices").countDocuments();

STEP 2 — Run a pipeline to import from the federated collection (S3) into the cluster database

Connect to the Federated Database Instance and run the following pipelines.

Import with $out: replaces all data in the target collection

db.getCollection("user-devices").aggregate([
  {
    $out: {
      atlas: {
        clusterName: "your-cluster-name",
        db: "your-real-db",
        coll: "your-real-collection"
      }
    }
  }
]);
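To make the $out semantics concrete: the target collection ends up containing exactly the pipeline output, and whatever was there before is gone. In plain JavaScript, treating collections as arrays purely for illustration, the effect is roughly:

```javascript
// Sketch: $out replaces the target collection with the pipeline output.
// "Collections as arrays" is a simplification for illustration only.
function outStage(pipelineOutput, targetCollection) {
  // Previous contents of the target are discarded entirely.
  return [...pipelineOutput];
}

const target = [{ _id: 1, stale: true }];          // pre-existing data
const federated = [{ _id: "a" }, { _id: "b" }];    // documents read from S3

const result = outStage(federated, target);
console.log(result); // only the federated documents remain
```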

Import with $merge: merges / upserts into the target collection

  
  db.getCollection("user-devices").aggregate([
  { $project: { _id: 0 } },

  // set a customId is _id
  { $set: { _id: "$customId" }},

  // remove customId use _id 
  { $unset: "customId" },

  // merge vào cluster khác
  {
    $merge: {
      into: {
        atlas: {
          clusterName: "your-cluster-name",
          db: "your-real-db",
          coll: "your-real-collection"
        }
      },
      on: "_id",              // phải dùng _id
      whenMatched: "replace", // replace toàn document
      whenNotMatched: "insert"
    }
  }
],
{
  pipelineOptions: { batchSize: 100 }
});
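The reshaping stages above ($project, $set, $unset) plus the $merge upsert can be mimicked in plain JavaScript to see what happens to each document. This illustrates the semantics only; the field names (customId, os) are made up for the example:

```javascript
// Sketch: what the pipeline does to one document, and how $merge upserts.
function reshape(doc) {
  const { _id, customId, ...rest } = doc; // $project: drop the federated _id
  return { _id: customId, ...rest };      // $set _id from customId, $unset customId
}

// $merge on _id: replace when matched, insert when not matched.
function mergeInto(target, docs) {
  const byId = new Map(target.map(d => [d._id, d]));
  for (const d of docs) byId.set(d._id, d); // replace or insert
  return [...byId.values()];
}

const fromS3 = [{ _id: "objid1", customId: "dev-1", os: "ios" }];
const cluster = [
  { _id: "dev-1", os: "android" }, // will be replaced
  { _id: "dev-2", os: "ios" }      // untouched
];

const merged = mergeInto(cluster, fromS3.map(reshape));
console.log(merged);
```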
