I have a data frame comprising more than two dozen variables, all of which are binary (0/1) with <5% missing data. These variables can be classified into groups that pertain to different aspects of health. For one group (disease, for example), 0/1 represents the answer to a yes/no question: do you have a given disease? Another group is based on ordinal questions (e.g. how many days of the week do you perform a given activity: 1, 2, 3, 4 or 5+?), which are transformed into binary variables (e.g. 0 = answer 1-3; 1 = answer 4-5+). A third group are based on physical measures, and are transformed into binary variables based on established cutpoints or an arbitrary one (e.g. within the 1st quartile or not).
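For concreteness, the recoding described above might look like the following minimal Python sketch. The function names, the choice of `None` for missing values, the coding of "5+" as the integer 5, and the example data are all assumptions made for illustration; they are not taken from the actual data set.

```python
from statistics import quantiles


def dichotomize_ordinal(answer, cutoff=3):
    """Map an ordinal answer to 0 (answer <= cutoff) or 1 (answer > cutoff).

    Missing answers (None) stay missing. "5+" is assumed to be coded as 5.
    """
    if answer is None:
        return None
    return 1 if answer > cutoff else 0


def flag_first_quartile(values):
    """Return 1 for values at or below the first quartile, else 0.

    The quartile is computed from the observed (non-missing) values only,
    using the default method of statistics.quantiles.
    """
    observed = [v for v in values if v is not None]
    q1 = quantiles(observed, n=4)[0]
    return [None if v is None else int(v <= q1) for v in values]


# Hypothetical ordinal item: days per week of a given activity.
days_active = [1, 4, None, 5, 3, 2]
active_binary = [dichotomize_ordinal(x) for x in days_active]

# Hypothetical physical measure: the values 1.0 .. 100.0.
measure = [float(x) for x in range(1, 101)]
in_q1 = flag_first_quartile(measure)
```

Both recodings discard information that the original scale carried, which matters for the choice of correlation coefficient afterwards.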
I would like to perform some exploratory analyses (e.g. partial correlation analysis, factor analysis) to look at the relationships among these variables. My understanding is that a phi correlation would be more appropriate for the first group type described, while a tetrachoric correlation would be more appropriate for the latter two. For generating a correlation matrix on all of my variables, is one more appropriate than the other, or should I be considering a different approach? Preliminary partial correlation networks based on a phi correlation matrix look much closer to what I would expect (disease groups cluster together, biologically similar variables are connected) than those based on a tetrachoric correlation matrix (more of a hairball in which seemingly everything is connected).
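To make the difference between the two coefficients concrete, here is a minimal Python sketch that computes both from a single 2x2 table. It is not the code used for the networks above: the function names and the example counts are made up for illustration, and the tetrachoric estimate uses a simple approach (thresholds from the margins, then bisection on the correlation) rather than the full maximum-likelihood procedure found in dedicated packages. It also assumes no marginal proportion is exactly 0 or 1.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def phi_coefficient(a, b, c, d):
    """Phi for the 2x2 table [[a, b], [c, d]] (rows: x = 0/1, columns: y = 0/1)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def bvn_density(h, k, r):
    """Standard bivariate normal density at (h, k) with correlation r."""
    q = 1.0 - r * r
    expo = -(h * h - 2.0 * r * h * k + k * k) / (2.0 * q)
    return math.exp(expo) / (2.0 * math.pi * math.sqrt(q))


def upper_orthant(h, k, rho, steps=200):
    """P(X > h, Y > k) for a standard bivariate normal with correlation rho.

    Uses the identity dP/d(rho) = density at (h, k): start from the
    independence value and integrate the density from 0 to rho with
    Simpson's rule (steps must be even).
    """
    independent = (1.0 - STD_NORMAL.cdf(h)) * (1.0 - STD_NORMAL.cdf(k))
    w = rho / steps
    total = bvn_density(h, k, 0.0) + bvn_density(h, k, rho)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * bvn_density(h, k, i * w)
    return independent + total * w / 3.0


def tetrachoric(a, b, c, d, tol=1e-10):
    """Correlation of the assumed latent bivariate normal behind the table."""
    n = a + b + c + d
    h = STD_NORMAL.inv_cdf((a + b) / n)  # latent threshold for x, from P(x = 0)
    k = STD_NORMAL.inv_cdf((a + c) / n)  # latent threshold for y, from P(y = 0)
    target = d / n                       # observed P(x = 1, y = 1)
    lo, hi = -0.999, 0.999
    while hi - lo > tol:                 # upper_orthant increases with rho
        mid = (lo + hi) / 2.0
        if upper_orthant(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Made-up table: both variables split at the median.
phi = phi_coefficient(200, 100, 100, 200)   # approximately 0.333
tet = tetrachoric(200, 100, 100, 200)       # approximately 0.5
```

For this table the tetrachoric estimate is noticeably larger than phi, because it estimates the correlation of the assumed underlying continuous variables rather than of the observed 0/1 codes.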
"A third group are based on physical measures, and are transformed into binary variables" If you have those original unbinned scale variables why wouldn't you just compute Pearson correlation for that group of variables? Without binning them. Tetrachoric correlation is the (inferred) Pearson correlation, after all. https://stats.stackexchange.com/a/186026/3277 – ttnphns Sep 11 '17 at 13:59