I am trying to export part of an XML file to a multi-level DataFrame because I find it more convenient to work with. An example of the file would be:
<file filename="stack_example" created="today">
  <unit time="day" volume="cm3" surface="cm2"/>
  <zone z_id="10">
    <surfacehistory type="calculation">
      <surfacedata time-begin="1" time-end="2">
        <thing identity="1">
          <location l-identity="2"> 1.256</location>
          <location l-identity="45"> 2.3</location>
        </thing>
        <thing identity="3">
          <location l-identity="2"> 1.6</location>
          <location l-identity="4"> 2.5</location>
          <location l-identity="17"> 2.4</location>
        </thing>
      </surfacedata>
      <surfacedata time-begin="2" time-end="3">
        <thing identity="1">
          <location l-identity="78"> 3.2</location>
        </thing>
        <thing identity="5">
          <location l-identity="2"> 1.7</location>
          <location l-identity="7"> 4.5</location>
        </thing>
      </surfacedata>
    </surfacehistory>
  </zone>
</file>
The ideal output for this example would be a pandas DataFrame similar to this:
time-begin  time-end  thing  location  surface
1           2         1      2         1,256
                             45        2,3
                      3      2         1,6
                             4         2,5
                             17        2,4
2           3         1      78        3,2
                      5      2         1,7
                             7         4,5
Here is the code I have written so far:
import pandas as pd
from bs4 import BeautifulSoup
import lxml  # not used directly; the "lxml" parser below needs it installed

with open("stack_example.xml", "r") as datas:
    doc = BeautifulSoup(datas.read(), "lxml")

doc.unit.get("surface")  # unit of the surface values ("cm2")

l = []     # single-row DataFrames, concatenated at the end
temp = {}  # current record, overwritten as the loops advance

surfacedatas = doc.surfacehistory.find_all("surfacedata")
for surfacedata in surfacedatas:
    time_begin = surfacedata.get("time-begin")
    time_end = surfacedata.get("time-end")
    temp["time_begin"] = [time_begin]
    temp["time_end"] = [time_end]
    things = surfacedata.find_all("thing", recursive=False)
    for thing in things:
        identity = thing.get("identity")
        temp["thing"] = [identity]
        locations = thing.find_all("location", recursive=False)
        for location in locations:
            l_identity = location.get("l-identity")
            surface = location.getText()
            temp["surface"] = [surface]
            temp["location"] = [l_identity]
        # appended once per thing, so only its last location is kept
        l.append(pd.DataFrame(temp))

res = pd.concat(l, ignore_index=True).fillna(0.)
It only keeps the last location of each thing, because the location values are overwritten on every pass through the inner loop, but I'm not sure how to get the desired result from this point.
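For reference, one possible direction is to build a fresh record for every location instead of reusing a single dictionary, then turn the identifying columns into the index levels. The sketch below is only an illustration under a few assumptions: it swaps in the standard library's `xml.etree.ElementTree` for BeautifulSoup, inlines the sample file as a string, and the names `XML` and `surface_frame` are hypothetical, not part of the code above.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Copy of the sample file, inlined so the sketch runs on its own (assumption)
XML = """<file filename="stack_example" created="today">
<unit time="day" volume="cm3" surface="cm2"/>
<zone z_id="10">
<surfacehistory type="calculation">
<surfacedata time-begin="1" time-end="2">
<thing identity="1">
<location l-identity="2"> 1.256</location>
<location l-identity="45"> 2.3</location>
</thing>
<thing identity="3">
<location l-identity="2"> 1.6</location>
<location l-identity="4"> 2.5</location>
<location l-identity="17"> 2.4</location>
</thing>
</surfacedata>
<surfacedata time-begin="2" time-end="3">
<thing identity="1">
<location l-identity="78"> 3.2</location>
</thing>
<thing identity="5">
<location l-identity="2"> 1.7</location>
<location l-identity="7"> 4.5</location>
</thing>
</surfacedata>
</surfacehistory>
</zone>
</file>"""


def surface_frame(xml_text):
    """Hypothetical helper: one row per <location>, indexed by four levels."""
    root = ET.fromstring(xml_text)
    rows = []
    for surfacedata in root.iter("surfacedata"):
        for thing in surfacedata.findall("thing"):
            for location in thing.findall("location"):
                # A new dict per location, so earlier values are not overwritten
                rows.append({
                    "time_begin": int(surfacedata.get("time-begin")),
                    "time_end": int(surfacedata.get("time-end")),
                    "thing": int(thing.get("identity")),
                    "location": int(location.get("l-identity")),
                    "surface": float(location.text),
                })
    frame = pd.DataFrame(rows)
    # The identifying columns become the levels of a MultiIndex
    return frame.set_index(["time_begin", "time_end", "thing", "location"])


res = surface_frame(XML)
print(res)
```

With a MultiIndex, pandas leaves repeated outer-level values blank when printing, which gives a layout close to the table above. A single value can then be looked up with the full key, for example `res.loc[(1, 2, 3, 17), "surface"]`.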