10

I am trying to dump a collection to a .json file, but after looking through the pymongo tutorial I cannot find anything that relates to it.

Tutorial link: https://api.mongodb.com/python/current/tutorial.html

Dharman
AnhNg
  • https://docs.mongodb.com/manual/reference/program/mongoexport/ – Alex Blex Mar 07 '18 at 13:33
  • Does this answer your question? [PyMongo/Mongoengine equivalent of mongodump](https://stackoverflow.com/questions/24610484/pymongo-mongoengine-equivalent-of-mongodump) – Phoenix Dec 14 '19 at 09:59

5 Answers

11

Just fetch all documents and save them to a file, e.g.:

from bson.json_util import dumps
from pymongo import MongoClient

if __name__ == '__main__':
    client = MongoClient()
    db = client.db_name
    collection = db.collection_name
    cursor = collection.find({})
    with open('collection.json', 'w') as file:
        file.write('[')
        for document in cursor:
            file.write(dumps(document))
            file.write(',')
        file.write(']')
swateek
kamillitw
  • How to save them to file exactly? – AnhNg Mar 07 '18 at 14:52
  • @AnhNg I've added an example, take a look. – kamillitw Mar 08 '18 at 08:52
  • 11
    This gives errors like this: `TypeError: Object of type 'ObjectId' is not JSON serializable` – JCGB Sep 10 '19 at 10:35
  • 2
    Got the same `TypeError`. You can solve it by importing `from bson.json_util import dumps` and replacing `file.write(json.dumps(document))` with `file.write(dumps(document))`. [Learn more](https://stackoverflow.com/questions/16586180/typeerror-objectid-is-not-json-serializable) – darorck May 05 '20 at 22:21
  • 3
    This actually produces invalid JSON: the last `file.write(',')` before `file.write(']')` makes the file end with `,]`, which is invalid. – garyj Oct 02 '21 at 02:55
5

The accepted solution produces invalid JSON: it leaves a trailing comma , before the closing square bracket ]. The JSON spec does not allow trailing commas. See this answer and this reference.
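A quick way to see the problem is to feed both shapes to Python's standard json parser. This is only an illustrative sketch; the sample strings are made up and stand in for the contents of the exported file:

```python
import json

# Made-up sample payloads standing in for the exported file contents.
valid = '[{"test": "test1"}, {"test": "test2"}]'
invalid = '[{"test": "test1"},]'  # trailing comma before the closing bracket

# The well-formed array parses into a list of dicts.
parsed = json.loads(valid)

# The array with the trailing comma is rejected by the parser.
try:
    json.loads(invalid)
    trailing_comma_accepted = True
except json.JSONDecodeError:
    trailing_comma_accepted = False
```

Any strict JSON consumer is likely to reject the file in the same way, so the trailing comma is worth fixing at write time.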

To build on the accepted solution I used the following:

from bson.json_util import dumps
from pymongo import MongoClient
import json

if __name__ == '__main__':
    client = MongoClient()
    db = client.db_name
    collection = db.collection_name
    cursor = collection.find({})
    with open('collection.json', 'w') as file:
        json.dump(json.loads(dumps(cursor)), file)
garyj
2

Here's another way to avoid writing a , before the closing square bracket. It also uses with open to save some space.

import json

# db is assumed to be an existing pymongo Database object.
query = {"type": "something"}
type_documents = db['cluster'].find(query)
type_documents_count = db['cluster'].count_documents(query)

with open("type_documents.json", "w") as file:
    file.write('[')
    # Start from one as type_documents_count also starts from 1.
    for i, document in enumerate(type_documents, 1):
        file.write(json.dumps(document, default=str))
        if i != type_documents_count:
            file.write(',')
    file.write(']')

It simply skips the comma when the iteration index equals the number of documents, that is, after the last document it saves.
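If the extra count query is undesirable, the same result can be had by writing the comma before every document except the first, so the total never needs to be known. A minimal sketch, assuming the documents come from any iterable of dicts (a pymongo cursor should work the same way); write_json_array is a hypothetical helper name, and default=str is a simplification that turns values such as ObjectId into plain strings:

```python
import io
import json


def write_json_array(documents, fp):
    """Stream an iterable of dicts to fp as a single JSON array.

    Hypothetical helper, not part of pymongo.
    """
    fp.write('[')
    first = True
    for document in documents:
        # Separator goes before every document except the first,
        # so no count of the documents is needed.
        if not first:
            fp.write(',')
        # default=str converts non-JSON types to strings (an assumption
        # made here for simplicity).
        fp.write(json.dumps(document, default=str))
        first = False
    fp.write(']')


# Plain dicts stand in for a cursor; an in-memory buffer stands in for a file.
buffer = io.StringIO()
write_json_array(iter([{"test": "test1"}, {"test": "test2"}]), buffer)
result = json.loads(buffer.getvalue())

# An empty iterable still yields a valid (empty) array.
empty_buffer = io.StringIO()
write_json_array(iter([]), empty_buffer)
empty_result = json.loads(empty_buffer.getvalue())
```

Because only one document is serialized at a time, this also keeps memory use flat for large collections.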

robscott
1

Complementing @kamilitw's answer, I use the number of documents to write the separators so that the JSON file is valid. Cursor.count() was removed in PyMongo 4, so this version uses count_documents() and a single if:

import json

from bson import json_util


def writeToJSONFile(collection):
    # Count once up front instead of on every iteration.
    num_max = collection.count_documents({})
    cursor = collection.find({})
    # The with block closes the file even if serialization fails.
    with open("collection.json", "w") as file:
        file.write('[')
        for qnt_cursor, document in enumerate(cursor, 1):
            file.write(json.dumps(document, indent=4, default=json_util.default))
            # Write a separator after every document except the last.
            if qnt_cursor < num_max:
                file.write(',')
        file.write(']')

So the JSON file will be correct in the end: before, it wrote [{"test": "test"},]; now it writes [{"test":"test1"},{"test":"test2"}].

Z4-tier
0
"""
@Author: Aseem Jain
@profile: https://www.linkedin.com/in/premaseem/

"""
import os

import pymongo
from bson.json_util import dumps

# configure credentials / db name
db_user = os.environ["MONGO_ATLAS_USER"]
db_pass = os.environ["MONGO_ATLAS_PASSWORD"]
db_name = "sample_mflix"

connection_string = f"mongodb+srv://{db_user}:{db_pass}@sharedcluster.lv3wx.mongodb.net/{db_name}?retryWrites=true&w=majority"

client = pymongo.MongoClient(connection_string)
db = client[db_name]

# create the backup directory, named after the database
os.makedirs(db_name, exist_ok=True)

# list all collections ("tables") in the database
tables = db.list_collection_names()

# dump every collection in the database
for table in tables:
    print("exporting data for table", table)
    data = list(db[table].find())
    # write the data as JSON; str(data) would emit a Python repr, not valid JSON
    with open(f"{db.name}/{table}.json", "w") as writer:
        writer.write(dumps(data))

exit(0)
Aseem Jain