
I'm trying to build a binary tree with the anytree module (https://anytree.readthedocs.io/en/latest/) from a dataframe in which items from Column A appear inside the string text of Column B:

|   | Column A    |          Column B          |
|---|-------------|----------------------------|
| 0 | foo         | prelim_foo                 |
| 1 | bar         | nz(0, prelim_bar)          |
| 2 | beyond      | iif(foo>1, foo - bar, bar) |
| 3 | recognition | bar - beyond               |

I want to add a column holding, for each row, a list of the Column A items that appear in that row's Column B. The desired output is something like:

|   | Column A    |          Column B          |  Column C     |
|---|-------------|----------------------------|---------------|
| 0 | foo         | prelim_foo                 | [foo]         |
| 1 | bar         | nz(0, prelim_bar)          | [bar]         |
| 2 | beyond      | iif(foo>1, foo - bar, bar) | [foo, bar]    |
| 3 | recognition | bar - beyond               | [bar, beyond] |

I referenced these articles ("Read data from a file and create a tree using anytree in python" and "Read data from a pandas DataFrame and create a tree using anytree in python") to create a preliminary tree node structure. However, I'm having trouble distilling the contents of Column B into usable nodes for branches beyond the 2nd level.

I can detect if Column B contains an item from Column A:

df['AinB'] = df['Column B'].str.contains('|'.join(df['Column A']), case=False)

but I cannot find a way to look back through the Column A series and place the matching items in a Python list on the same row as Column B.
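
For reference, here is a minimal sketch of one way the lists in Column C might be collected, using `Series.str.findall` with the same joined pattern. It assumes plain substring matching (so `prelim_foo` counts as containing `foo`, as in the desired output) and case-sensitive comparison. The helper `unique_in_order` is a hypothetical name introduced only for this illustration.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Column A": ["foo", "bar", "beyond", "recognition"],
    "Column B": [
        "prelim_foo",
        "nz(0, prelim_bar)",
        "iif(foo>1, foo - bar, bar)",
        "bar - beyond",
    ],
})

# Escape each name so it is matched literally, and try longer names
# first so a short name cannot shadow a longer one that contains it.
names = sorted(df["Column A"], key=len, reverse=True)
pattern = "|".join(re.escape(name) for name in names)


def unique_in_order(matches):
    """Drop repeated matches while keeping first-seen order (hypothetical helper)."""
    return list(dict.fromkeys(matches))


# findall returns every match per row; the helper removes duplicates.
df["Column C"] = df["Column B"].str.findall(pattern).apply(unique_in_order)
print(df)
```

If whole-word matching were wanted instead, the pattern could be wrapped in `\b` word boundaries, but `prelim_foo` would then no longer match `foo`.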

Ultimately I'd like to use these lists to build a tree similar to this:

foo
├── foo
bar
├── bar
beyond
├── foo
└── bar
     └── recognition

Or maybe I'm not thinking properly about where recognition fits into a parent/child node structure, and it should be organized like this:

foo
├── foo
bar
├── bar
beyond
├── foo
└── bar
recognition
├── bar
└── beyond
