I have a large dataset of country names and musician names, with more than 50,000 rows, like this:

| Country | Musician |
|---|---|
| australia | Jimmy Barnes |
| australia | Grinspoon |
| england | Giles |
| united states of america | Bob Dylan |
| united states of america | Hamlet |
| united states of america | Rick Astley |
| sweden | Judith |
| united states of america | The Beatles |
| jamaica | JPM |
| germany | Ruslana |
| russia | Ruslana |
| ukraine | Ruslana |
| united states of america | Possessed |
| france | Georges Brassens |
| greece | Jacques Brel |
| france | Dionysis Savvopoulos |
| greece | Dionysis Savvopoulos |
| france | Léo Ferré |
| greece | Léo Ferré |
| united states of america | Ulali |
| united states of america | Zozobra |
| colombia | Aterciopelados |
| colombia | Carlos Vives |
| colombia | Shakira |
| united kingdom | The Smiths |
| united kingdom | Morrissey |

I would like to use pandas (the data is in a DataFrame) to determine whether there is a correlation between the two columns, i.e. whether the country suggests which musician is named. Is this possible at all, or am I on the wrong track? In case it is relevant, the contingency table is 11949 rows × 190 columns. Thanks!
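
To make the question concrete, here is a minimal sketch of the kind of thing I have in mind, using only a few rows from the table above. The contingency table is built with `pd.crosstab`. Cramér's V (derived from the chi-squared statistic) is only my assumption of a suitable association measure for two categorical columns, and `cramers_v` is a helper name I made up for illustration, not a pandas function.

```python
import numpy as np
import pandas as pd

# Small illustrative sample taken from the table above (not the full dataset).
df = pd.DataFrame({
    "Country": ["australia", "australia", "germany", "russia",
                "ukraine", "colombia", "colombia", "colombia"],
    "Musician": ["Jimmy Barnes", "Grinspoon", "Ruslana", "Ruslana",
                 "Ruslana", "Aterciopelados", "Carlos Vives", "Shakira"],
})

# Contingency table: one row per musician, one column per country.
table = pd.crosstab(df["Musician"], df["Country"])


def cramers_v(table: pd.DataFrame) -> float:
    """Hypothetical helper: Cramér's V from a contingency table of counts."""
    observed = table.to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence: (row total * column total) / n.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    n_rows, n_cols = observed.shape
    return float(np.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1))))


v = cramers_v(table)
print(table)
print("Cramér's V:", v)
```

A value near 0 would mean the two columns look independent and a value near 1 would mean a strong association. With most musicians appearing only once or twice, I am not sure how much such a number can be trusted, which is part of what I am asking.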