I have a dataset of binary (0/1) indicators. Each Id is a sample number, and a 0 or 1 indicates whether a keyword (the columns on the left: Water, Soil, etc.) appears in the publication. The regional columns on the right (e.g. Africa, Asia) say where the paper was published from; however, regions overlap (e.g. the same publication can have affiliations in several countries).
1. What kind of statistical tool do I need to find the correlation between the regions (Europe, Africa, Asia) and the keywords (e.g. Water, Soil, Waste, etc.)?
2. What kind of statistical tool do I need to find out whether region influences the keywords?

For each country, count the number of papers that address each topic. This gives a contingency table with countries as columns and topics as rows. A chi-square test of independence can then be run on that table to check whether topic and country are dependent.
Another method is to compute the mutual information between topics and countries from their joint and marginal probabilities.
– Curious Dec 15 '23 at 14:17
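A minimal sketch of both suggestions, under some assumptions that are not in the original post: the data is held as one dict of 0/1 values per paper, the toy rows and the helper names (`two_by_two`, `chi_square`, `mutual_information`) are made up for illustration, and every row and column of the table has at least one paper. Because regions overlap, this variant builds a separate 2x2 table per region-keyword pair (in region or not, keyword present or not) rather than one large countries-by-topics table, so each paper is counted exactly once per test.

```python
import math

# Hypothetical toy data: one dict per paper, 0/1 indicators as in the question.
# Note the second paper is affiliated with both Europe and Africa (overlap).
papers = [
    {"Water": 1, "Europe": 1, "Africa": 0},
    {"Water": 1, "Europe": 1, "Africa": 1},
    {"Water": 1, "Europe": 1, "Africa": 0},
    {"Water": 0, "Europe": 1, "Africa": 0},
    {"Water": 1, "Europe": 0, "Africa": 1},
    {"Water": 0, "Europe": 0, "Africa": 1},
    {"Water": 0, "Europe": 0, "Africa": 1},
    {"Water": 0, "Europe": 0, "Africa": 1},
]


def two_by_two(rows, keyword, region):
    """Count papers per cell; table[r][k] with r = in region, k = has keyword."""
    table = [[0, 0], [0, 0]]
    for row in rows:
        table[row[region]][row[keyword]] += 1
    return table


def margins(table):
    """Return (n, row totals, column totals) of a 2x2 table."""
    row_tot = [sum(r) for r in table]
    col_tot = [table[0][j] + table[1][j] for j in range(2)]
    return sum(row_tot), row_tot, col_tot


def chi_square(table):
    """Pearson chi-square statistic and p-value (1 dof, no continuity correction)."""
    n, row_tot, col_tot = margins(table)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n  # count expected under independence
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the chi-square tail probability is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def mutual_information(table):
    """Mutual information in bits from joint and marginal probabilities."""
    n, row_tot, col_tot = margins(table)
    mi = 0.0
    for i in range(2):
        for j in range(2):
            if table[i][j] > 0:  # empty cells contribute nothing
                p_joint = table[i][j] / n
                p_indep = (row_tot[i] / n) * (col_tot[j] / n)
                mi += p_joint * math.log2(p_joint / p_indep)
    return mi


table = two_by_two(papers, "Water", "Europe")
stat, p_value = chi_square(table)
print(table, stat, p_value, mutual_information(table))
```

Repeating this for every region-keyword pair means many tests, so some multiple-comparison correction would probably be needed. If SciPy is available, `scipy.stats.chi2_contingency(table, correction=False)` should give the same statistic and p-value, and it also handles larger tables.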