
I'm facing an issue where I need to save some data (str and np.array values) to a JSON file (or XML, I still haven't decided) really fast.

To better understand, let's assume that I'm running this deep learning code using OpenCV and YOLOv4:

https://gist.github.com/YashasSamaga/e2b19a6807a13046e399f4bc3cca3a49

This gives me the position np.array([x, y, w, h]) of different str labels, each associated with a certain float weight.

I want to save to a file, for each frame, the exact position, weight and label of everything present in it, without hurting performance, and in a way that still works for a video longer than one hour.

I tried this:

import cv2
import xml.etree.ElementTree as ET   # cElementTree is deprecated
from lxml import etree               # only used for pretty-printing at the end

# create root for the XML
root = ET.Element("root")

# create iteration counter
nbriteration = 0

while video_is_running:
    doc = ET.SubElement(root, 'frame', Frame=str(nbriteration))

    # Use the OpenCV YOLOv4 code here, which yields detected_elements
    # (each element carries a position, a label and a weight)
    for element in detected_elements:
        # "save" each element in the XML under the branch for this frame
        ET.SubElement(doc, element.label,
                      x=str(element.pos_x), y=str(element.pos_y),
                      weight=str(element.weight))
    nbriteration += 1

# write the XML
tree = ET.ElementTree(root)
tree.write(r"D:\output\filename.xml")

# pretty display for Notepad++ (I don't mind the time taken here)
doc = etree.parse(r"D:\output\filename.xml")
doc.write(r"D:\output\filename.xml", encoding='UTF-8',
          xml_declaration=True, pretty_print=True)

The problem is that, for a long video, this hurts my program's performance.

I took a look here or here, but in both cases my need is the same: write a lot of data as fast as possible for a real-time application.

Is there a better way to write all this data without affecting performance?

Panda50

  • "I have several questions which are all related"—[please ask only one question at a time](https://meta.stackexchange.com/q/39223/248627). – Chris Apr 24 '21 at 21:43
  • Performance-optimized file formats are generally specialized. See [msgpack](https://msgpack.org/index.html) for a schemaless example. – Charles Duffy Apr 24 '21 at 21:47
  • ...that said, numpy arrays have their own optimized serialization and deserialization code, and you should be using that instead of XML or JSON when you care more about performance and compactness than about ease of implementing consumers in non-Python languages. https://numpy.org/doc/stable/reference/generated/numpy.save.html is a place you might start. – Charles Duffy Apr 24 '21 at 21:48
  • Also, if your data is small enough that you can afford to build it into one big in-memory ElementTree structure, that means it should also be small enough to store it as native Python data structures (lists/dicts/etc) and not do any serialization at all until after you've finished all the analysis. – Charles Duffy Apr 24 '21 at 21:58
  • BTW, if you want to quickly hand off your data for a separate process to serialize, https://stackoverflow.com/questions/34824382/sharing-numpy-arrays-between-multiple-processes-without-inheritance or https://stackoverflow.com/questions/7894791/use-numpy-array-in-shared-memory-for-multiprocessing might be of interest to you. – Charles Duffy Apr 24 '21 at 22:04
  • The problem here is that I sometimes have more than 50 elements to write per frame; it's a mix of arrays, floats and strs, but I'll take a look at np.save, thanks for this! As I initially mentioned, I'm not in love with XML, which is heavy in some ways, but I know how to code it and it's easy to read with Notepad++. – Panda50 Apr 24 '21 at 22:04
  • Personally, btw, I don't use Python for anything where performance is important. I'd probably consider OpenCV's Julia bindings if I wanted to do something that had to be fast with its output. – Charles Duffy Apr 24 '21 at 22:07
  • I tried to serialize using ```ThreadPool``` but it wasn't really good... I'll take a look at what you gave me. Thanks a lot Charles, and sorry for the mess. I'd love to write it in C++ but I have to use Python (not my choice). – Panda50 Apr 24 '21 at 22:08
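
Following the np.save suggestion above, here is a minimal sketch of that route: accumulate the detections in plain Python structures while the video runs (a list append per detection is cheap) and do a single binary write at the end. The element attributes (label, pos_x, pos_y, weight) mirror the question's code; the structured dtype, field names and output path are illustrative assumptions, not tested against the real detector.

import numpy as np

# Illustrative record layout -- field names and sizes are assumptions
det_dtype = np.dtype([("frame", np.int32),
                      ("label", "U32"),   # fixed-width unicode label
                      ("x", np.float32),
                      ("y", np.float32),
                      ("weight", np.float32)])

rows = []          # appending to a list is cheap; no I/O inside the loop
nbriteration = 0
while video_is_running:                 # same loop as in the question
    for element in detected_elements:   # yielded by the YOLOv4 code
        rows.append((nbriteration, element.label,
                     element.pos_x, element.pos_y, element.weight))
    nbriteration += 1

# one fast, compact binary write once the video is done
np.save(r"D:\output\detections.npy", np.array(rows, dtype=det_dtype))

np.load then returns the whole structured array in one call; the trade-off is that the file is no longer human-readable in Notepad++.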
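
If the data has to reach the disk incrementally instead (so an hour-long run cannot lose everything on a crash), the msgpack format mentioned above can be streamed one frame at a time. This is a sketch assuming the msgpack-python package; the record keys and output path are illustrative:

import msgpack

with open(r"D:\output\detections.msgpack", "wb") as f:
    nbriteration = 0
    while video_is_running:
        # one compact binary record per frame, no pretty-printing cost
        record = {"frame": nbriteration,
                  "detections": [{"label": element.label,
                                  "x": float(element.pos_x),
                                  "y": float(element.pos_y),
                                  "weight": float(element.weight)}
                                 for element in detected_elements]}
        f.write(msgpack.packb(record))
        nbriteration += 1

Reading it back is a loop over msgpack.Unpacker(open(path, "rb")), which yields one dict per frame.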

0 Answers