I have two columns in a pandas DataFrame that need to be converted to a bipartite graph: the entries of the columns should be vertices, with one edge per row joining the two vertices in that row. I am having trouble doing that.
My data looks like this:
Col1      Col2
9d7051e2  da48d749
611cebdb  93ef5eb4
758f1c7b  6acae826
d09360ac  a33fe922
I tried two methods to convert this DataFrame to a bipartite graph:
Method #1
    import networkx as nx
    from networkx.algorithms import bipartite

    # One edge per row; the remaining columns become edge attributes.
    G = nx.from_pandas_edgelist(df_train, source='Donor ID', target='Project ID',
                                edge_attr=True, create_using=None)
Method #2
    G = nx.Graph()
    G.add_nodes_from(df_train['Donor ID'], bipartite=0)
    G.add_nodes_from(df_train['Project ID'], bipartite=1)
    # One edge per row, pairing the two columns positionally.
    G.add_edges_from(zip(df_train['Donor ID'], df_train['Project ID']))
In either case, bipartite.is_bipartite(G) returns False.
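For reference, the check can be reproduced on the four sample rows alone. This is only a minimal sketch: it assumes that Col1 and Col2 in the sample correspond to the 'Donor ID' and 'Project ID' columns used in the code, and the names `sample` and `G_sample` are made up for illustration.

```python
import networkx as nx
import pandas as pd
from networkx.algorithms import bipartite

# The four sample rows shown above. The column names are taken from the
# code in this post and are assumed to correspond to Col1 and Col2.
sample = pd.DataFrame({
    "Donor ID": ["9d7051e2", "611cebdb", "758f1c7b", "d09360ac"],
    "Project ID": ["da48d749", "93ef5eb4", "6acae826", "a33fe922"],
})

# Same construction as Method #1, without edge attributes.
G_sample = nx.from_pandas_edgelist(sample, source="Donor ID", target="Project ID")

# Four disjoint edges on eight distinct IDs, so the graph is two-colorable.
sample_is_bipartite = bipartite.is_bipartite(G_sample)
print(G_sample.number_of_nodes(), G_sample.number_of_edges(), sample_is_bipartite)
```

On these rows alone the check passes, so the sample does not reproduce the failure seen on the full data.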
I am not sure what I am missing here. Any help would be appreciated.
A few things I tried in order to resolve the issue:
- Found duplicate records (same Col1 + Col2 pair) and removed them.
- Made sure there are no missing values in either column, so that every record yields an edge.
- Tried a subset of the roughly 1 million records, and it worked. So, as I understand it, the problem is caused by some faulty data, but finding all the faulty records and fixing them is the challenging part.
- The columns have dtype object. I tried to convert them to string but was not able to do so.
None of the above helped.
Please tell me what I am missing here, or how I can debug the issue further.
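A possible next debugging step, as a minimal sketch rather than a confirmed diagnosis: if no value occurs in both columns, the two columns themselves form a valid two-coloring, so a graph that fails the check must contain at least one ID that appears on both sides. The sketch below looks for such IDs and then builds a graph whose nodes are tagged with their side so the two ID spaces cannot collide. The helper names (`shared_ids`, `rows_with_shared_ids`, `build_tagged_graph`) and the toy data are hypothetical; the column names are the ones used in the code above.

```python
import networkx as nx
import pandas as pd
from networkx.algorithms import bipartite


def shared_ids(df, left="Donor ID", right="Project ID"):
    """Return the set of values that occur in both columns."""
    return set(df[left]) & set(df[right])


def rows_with_shared_ids(df, left="Donor ID", right="Project ID"):
    """Return the rows in which either endpoint is an ID seen in both columns."""
    shared = list(shared_ids(df, left, right))
    mask = df[left].isin(shared) | df[right].isin(shared)
    return df[mask]


def build_tagged_graph(df, left="Donor ID", right="Project ID"):
    """Build a graph whose nodes are (side, id) tuples, so that an ID
    present in both columns becomes two distinct nodes."""
    left_nodes = [("left", v) for v in df[left]]
    right_nodes = [("right", v) for v in df[right]]
    G = nx.Graph()
    G.add_nodes_from(left_nodes, bipartite=0)
    G.add_nodes_from(right_nodes, bipartite=1)
    G.add_edges_from(zip(left_nodes, right_nodes))
    return G


# Hypothetical toy data: every ID appears in both columns, and the three
# rows form a triangle a-b, b-c, c-a, which is an odd cycle.
toy = pd.DataFrame({"Donor ID": ["a", "b", "c"], "Project ID": ["b", "c", "a"]})

G_raw = nx.from_pandas_edgelist(toy, source="Donor ID", target="Project ID")
G_tagged = build_tagged_graph(toy)

print(sorted(shared_ids(toy)))
print(len(rows_with_shared_ids(toy)))
print(bipartite.is_bipartite(G_raw), bipartite.is_bipartite(G_tagged))
```

Running `shared_ids(df_train)` on the real data would show whether this situation occurs there at all. Whether tagging the nodes is the right fix depends on whether a donor ID and a project ID with the same value are meant to be the same entity, which only the data owner can decide.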