I'm trying to build a binary tree with the anytree module (https://anytree.readthedocs.io/en/latest/) from a dataframe in which items from Column A appear inside the string text of Column B:
| | Column A | Column B |
|---|-------------|----------------------------|
| 0 | foo | prelim_foo |
| 1 | bar | nz(0, prelim_bar) |
| 2 | beyond | iif(foo>1, foo - bar, bar) |
| 3 | recognition | bar - beyond |
For each row, I want to create a list of the Column A items that appear in that row's Column B. The desired output is something like:
| | Column A | Column B | Column C |
|---|-------------|----------------------------|---------------|
| 0 | foo | prelim_foo | [foo] |
| 1 | bar | nz(0, prelim_bar) | [bar] |
| 2 | beyond | iif(foo>1, foo - bar, bar) | [foo, bar] |
| 3 | recognition | bar - beyond | [bar, beyond] |
I referenced these articles (Read data from a file and create a tree using anytree in python, Read data from a pandas DataFrame and create a tree using anytree in python) to create a preliminary tree node structure. However, I'm having trouble distilling the contents of Column B into usable nodes for branches beyond the 2nd level.
I can detect if Column B contains an item from Column A:
df['AinB'] = df['Column B'].str.contains('|'.join(df['Column A']), case=False)
but I cannot find a way to look back through the Column A series, pick out the items that matched, and place them in a Python list on the same row as Column B.
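One possible way to get those lists, shown here as a minimal sketch rather than a definitive solution: build a single regex from the Column A items and collect every match per row, dropping repeats. The helper name `find_items` and the plain lists `column_a`, `column_b`, `column_c` are hypothetical stand-ins for the dataframe columns. The sketch assumes plain substring matching is wanted (so `foo` is found inside `prelim_foo`, as in the desired output).

```python
import re

def find_items(text, items):
    """Return the items found in text, in order of first appearance, without repeats."""
    # Try longer names first so a short name cannot shadow a longer one sharing its prefix.
    ordered = sorted(items, key=len, reverse=True)
    # re.escape guards against item names that contain regex metacharacters.
    pattern = re.compile("|".join(re.escape(i) for i in ordered), flags=re.IGNORECASE)
    found = []
    for match in pattern.findall(text):
        # With IGNORECASE, a match keeps the casing used in the text, not in the item list.
        if match not in found:
            found.append(match)
    return found

# Hypothetical stand-ins for df['Column A'] and df['Column B'].
column_a = ["foo", "bar", "beyond", "recognition"]
column_b = ["prelim_foo", "nz(0, prelim_bar)", "iif(foo>1, foo - bar, bar)", "bar - beyond"]

column_c = [find_items(text, column_a) for text in column_b]
print(column_c)
```

On the dataframe itself, the same helper could probably be applied row by row, for example `df['Column C'] = df['Column B'].apply(lambda s: find_items(s, df['Column A']))`.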
Ultimately I'd like to use these lists to build a tree similar to this:
foo
├── foo
bar
├── bar
beyond
├── foo
└── bar
└── recognition
Or maybe I'm not thinking properly about where recognition fits into a parent/child node structure, and it should be organized like this:
foo
├── foo
bar
├── bar
beyond
├── foo
└── bar
recognition
├── bar
└── beyond
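For illustration, the second layout (one root per row, with the matched items as its direct children) can be sketched in plain Python without anytree. The function name `render_forest` and the lists `column_a` and `column_c` are hypothetical, and the lists are typed in by hand from the desired-output table above.

```python
def render_forest(names, children_lists):
    """Render one root per name, with its listed children one level below it."""
    lines = []
    for name, children in zip(names, children_lists):
        lines.append(name)
        for i, child in enumerate(children):
            # The last child of a root gets the closing connector.
            connector = "└── " if i == len(children) - 1 else "├── "
            lines.append(connector + child)
    return "\n".join(lines)

# Hand-copied from the desired-output table (Column A and Column C).
column_a = ["foo", "bar", "beyond", "recognition"]
column_c = [["foo"], ["bar"], ["foo", "bar"], ["bar", "beyond"]]

print(render_forest(column_a, column_c))
```

With anytree, the equivalent would presumably be one `Node(name)` per row and one `Node(child, parent=root)` per list item, printed with `RenderTree`. Since an anytree node has a single parent, an item such as `bar` that appears under several roots would need a separate `Node` object under each of them.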