0

There are a lot of related questions in SO. But the problem I'm trying to solve looks a bit different.

The requirements:

  1. A large xml input string with unknown data structure
  2. with a unknown number of children
  3. the elements have no attributes
  4. flatten to a csv file with the element names as column names

An example input looks like this:

<ARandomRoot>
  <ARandomLOne>
    <Id>12</Id>
    <OtherId>34</OtherId>    
  </ARandomLOne>
  <AnotherRandomLOne>
    <ARandomLTwo>
      <ARandomLTree>
        <NumberOfElements>2</NumberOfElements>
        <ARandomLFour>
          <RandomDataOne>R1</RandomDataOne>
          <RandomDataTwo>10.12</RandomDataTwo>          
        </ARandomLFour>
        <ARandomLFour>
          <RandomDataOne>R2</RandomDataOne>
          <RandomDataTwo>9.8</RandomDataTwo>          
        </ARandomLFour>
      </ARandomLTree>
    </ARandomLTwo>
  </AnotherRandomLOne>
</ARandomRoot>

And the output should be:

ARandomRoot-ARandomLOne-Id,ARandomRoot-ARandomLOne-OtherId,ARandomRoot-AnotherRandomLOne-ARandomLTwo-ARandomLTree-NumberOfElements,ARandomRoot-AnotherRandomLOne-ARandomLTwo-ARandomLTree-ARandomLFour-RandomDataOne,ARandomRoot-AnotherRandomLOne-ARandomLTwo-ARandomLTree-ARandomLFour-RandomDataTwo
12,34,2,R1,10.12
12,34,2,R2,9.8

Code I've got so far from another SO question with minor change

        var xml = XDocument.Parse(input);

        Func<string, string> csvFormat = t => String.Format("\"{0}\"", t.Replace("\"", "\"\""));

        Func<XDocument, IEnumerable<string>> getFields =
            xd =>
                xd
                    .Descendants()
                    .SelectMany(d => d.Elements())
                    .Select(e => e.Name.ToString());

        Func<XDocument, IEnumerable<string>> getHeaders =
            xd =>
                xd
                    .Descendants()
                    .SelectMany(d => d.Elements())
                    .Select(e => e.Name.ToString())
                    .Distinct();

        var headers =
            String.Join(",",
                getHeaders(xml)
                    .Select(f => csvFormat(f)));

        var query =
            from elements in xml.Descendants()
            select string.Join(",",
                getFields(xml)
                    .Select(f => elements.Elements(f).Any()
                        ? elements.Element(f).Value
                        : "")
                    .Select(x => csvFormat(x)));

        var csv =
            String.Join(Environment.NewLine,
                new[] { headers }.Concat(query));

While this produces the desired headers, but the data is not flattened.

Can someone please point me to the correct direction?

Allen Zhang
  • 2,132
  • 1
  • 19
  • 30
  • It looks like you need a kind of combination of https://stackoverflow.com/questions/8795479/using-linq-to-xml-method-to-get-path-to-all-leaves and https://stackoverflow.com/questions/451950/get-the-xpath-to-an-xelement – Renat May 28 '19 at 00:07
  • @Renat, thanks for the comment. A dictionary produces the output all in one line. But I need a pivot table to be the output, i.e. with duplicated values – Allen Zhang May 28 '19 at 01:19

0 Answers0