What are the standard methods for training machine learning algorithms on data obtained from multiple studies?
In conventional meta-analysis one usually uses a weighted average to combine estimates obtained in separate studies, $$ M=\frac{\sum_{i=1}^kW_iY_i}{\sum_{i=1}^kW_i}, $$ where the weights are determined by the number of samples in the study or its variance (fixed-effect model), or may additionally incorporate possible uncontrolled variation between studies (random-effects model). See, e.g., *A basic introduction to fixed-effect and random-effects models for meta-analysis* or *Introduction to Meta-Analysis*.
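For concreteness, a minimal numerical sketch of the two averages, using the usual inverse-variance weights and the DerSimonian-Laird estimate of the between-study variance (the per-study estimates `y` and variances `v` below are made up for illustration):

```python
import numpy as np

def fixed_effect(y, v):
    """Inverse-variance weighted average (fixed-effect model)."""
    w = 1.0 / v
    return np.sum(w * y) / np.sum(w)

def random_effects(y, v):
    """DerSimonian-Laird random-effects average: the estimated between-study
    variance tau^2 is added to each study's within-study variance."""
    w = 1.0 / v
    m_fe = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - m_fe) ** 2)               # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)       # between-study variance estimate
    w_star = 1.0 / (v + tau2)
    return np.sum(w_star * y) / np.sum(w_star)

# Toy per-study estimates and their within-study variances
y = np.array([0.30, 0.45, 0.10, 0.55])
v = np.array([0.010, 0.020, 0.015, 0.030])
print(fixed_effect(y, v), random_effects(y, v))
```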
Suppose now that I have data obtained from different studies (e.g., the sequences of measurements obtained for each sample), and I want to train an ML algorithm on these data. The simplest option would be to pool all the sample data into a single dataset, which is analogous to the fixed-effect model for the average. However, how would one proceed in order to take heterogeneity into account? (I.e., what would be the analogue of the random-effects model?)
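To make the pooling baseline concrete, here is a minimal sketch; the per-study arrays and the scikit-learn classifier are placeholders, and the open question is what one would add to this when the studies are heterogeneous:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-study data: X_i is (samples x measurements), y_i the labels.
rng = np.random.default_rng(0)
studies = [(rng.normal(size=(50, 10)), rng.integers(0, 2, 50)) for _ in range(3)]

# "Fixed-effect"-style pooling: stack every study's samples into one dataset
X_pooled = np.vstack([X for X, _ in studies])
y_pooled = np.concatenate([y for _, y in studies])

clf = LogisticRegression(max_iter=1000).fit(X_pooled, y_pooled)
```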
Remark
An alternative approach is to perform ML for each study separately and then carry out a meta-analysis of the results; e.g., in classification tasks one could use the false discovery rate as the analysis variable (for which the weighted averages are calculated). A sketch is given below.
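A minimal sketch of this alternative, assuming the same kind of placeholder data as above: each study gets its own classifier, the per-study false discovery rate is computed on a held-out split, and the per-study estimates are combined with inverse-variance weights (the binomial variance approximation is an assumption made for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
studies = [(rng.normal(size=(80, 10)), rng.integers(0, 2, 80)) for _ in range(3)]

estimates, variances = [], []
for X, y in studies:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
    pred = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict(X_te)

    # Per-study analysis variable: false discovery rate = FP / (FP + TP)
    fp = np.sum((pred == 1) & (y_te == 0))
    tp = np.sum((pred == 1) & (y_te == 1))
    fdr = fp / (fp + tp) if (fp + tp) > 0 else 0.0

    estimates.append(fdr)
    # Crude binomial approximation of the per-study variance (an assumption)
    variances.append(fdr * (1 - fdr) / max(fp + tp, 1) + 1e-9)

# Inverse-variance weighted average of the per-study FDRs
w = 1.0 / np.array(variances)
print("meta-analytic FDR estimate:", np.sum(w * np.array(estimates)) / np.sum(w))
```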