We're using Google's Firestore for embedded machine configuration data. Because this data controls a configurable pageflow and many other things, it is segmented into lots of subcollections, and each machine has its own top-level document. However, adding machines to the fleet takes forever because we have to manually copy all this data across multiple documents. Does anyone know how to recursively copy a Firestore document in Python, including all its subcollections, their documents, their subcollections, and so on? You'd have a document reference to the top-level document as well as a name for the new top-level doc.
Hi there, can you elaborate on how you are updating all the documents in your Firestore, i.e. how you have structured your data? For example, is your data structured to use lookups? [1] Are you updating all the documents within a single procedure? If so, have you tried decoupling it with the help of Cloud Firestore function triggers? [2] With these you can define asynchronous functions that listen for document changes and divide the computing workload. [1] https://www.youtube.com/watch?v=i1n9Kw3AORw&t=438s [2] https://firebase.google.com/docs/functions/firestore-events#function_triggers – Antonio Ramirez May 05 '21 at 21:30
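For reference, a trigger of the kind the comment describes looks roughly like this in the Python firebase-functions SDK. This is a minimal sketch: the "machines/{machine_id}" path and the handler body are hypothetical, not part of the question or comment.

    from firebase_functions import firestore_fn

    # Hypothetical path: fires whenever a machine's top-level document is written.
    @firestore_fn.on_document_written(document="machines/{machine_id}")
    def on_machine_written(event: firestore_fn.Event[firestore_fn.Change]) -> None:
        # Each invocation handles a single document, so heavy update work is
        # spread across many small, independent function executions.
        after = event.data.after
        if after is not None:
            print("machine %s changed" % event.params["machine_id"])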
1 Answer
You can use something like this to recursively copy every document in a collection, together with all of its subcollections, into another collection:

    import logging

    from google.cloud import firestore

    log = logging.getLogger(__name__)

    db_client = firestore.Client()
    batch_nr = 0

    def read_recursive(
        source: firestore.CollectionReference,
        target: firestore.CollectionReference,
        batch: firestore.WriteBatch,
    ) -> None:
        global batch_nr
        for source_doc_ref in source.list_documents():
            document_data = source_doc_ref.get().to_dict()
            target_doc_ref = target.document(source_doc_ref.id)
            # A batch holds at most 500 operations, so commit and reset the
            # counter before the limit is reached.
            if batch_nr == 500:
                log.info("committing %s batched operations...", batch_nr)
                batch.commit()
                batch_nr = 0
            # Documents that exist only as parents of subcollections hold no
            # data (to_dict() returns None); skip the write but still recurse.
            if document_data is not None:
                batch.set(
                    reference=target_doc_ref,
                    document_data=document_data,
                    merge=False,
                )
                batch_nr += 1
            # Recurse into every subcollection of this document.
            for source_coll_ref in source_doc_ref.collections():
                target_coll_ref = target_doc_ref.collection(source_coll_ref.id)
                read_recursive(
                    source=source_coll_ref,
                    target=target_coll_ref,
                    batch=batch,
                )

    batch = db_client.batch()
    read_recursive(
        source=db_client.collection("src_collection_name"),
        target=db_client.collection("target_collection_name"),
        batch=batch,
    )
    batch.commit()  # commit the remaining operations
Writes go out in batches of up to 500 operations, which saves a lot of time (in my case the copy finished in half the time it took with individual set() calls).
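To match the question's exact shape (a reference to one top-level document plus a name for the copy), the recursion above can be wrapped as follows. This is a sketch built on the answer's read_recursive, not part of the original answer; copy_machine_doc and the "machines/machine-001" path are hypothetical names chosen for illustration.

    def copy_machine_doc(
        source_doc_ref: firestore.DocumentReference,
        new_doc_id: str,
    ) -> None:
        # Write the top-level document itself under the new id...
        target_doc_ref = source_doc_ref.parent.document(new_doc_id)
        batch = db_client.batch()
        batch.set(
            reference=target_doc_ref,
            document_data=source_doc_ref.get().to_dict(),
            merge=False,
        )
        # ...then recursively copy every subcollection beneath it.
        for source_coll_ref in source_doc_ref.collections():
            read_recursive(
                source=source_coll_ref,
                target=target_doc_ref.collection(source_coll_ref.id),
                batch=batch,
            )
        batch.commit()

    # e.g. copy_machine_doc(db_client.document("machines/machine-001"), "machine-002")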
– cristi