It often happens, in practical applications, that two or more sample variances are available and it is desired to add them. For simplicity, assume just two sample variances and assume they are as you describe in your first sentence. Also assume the respective population variances are unknown. (If they were known, there would be no reason to bother with the sample variances.) Each sample variance is gamma distributed:
$$s_i^2 \thicksim \Gamma(k_i,\theta_i)$$
where the $k_i = \nu_i/2$ are the respective shape parameters and the $\theta_i = 2\sigma_i^2/\nu_i$ are the respective scale parameters.
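As a brief check on where these parameters come from, here is a sketch of the derivation. It assumes normally distributed populations and that $\nu_i$ is the degrees of freedom of the $i$-th sample variance; neither assumption is spelled out above.
$$\frac{\nu_i s_i^2}{\sigma_i^2} \thicksim \chi^2_{\nu_i} = \Gamma\!\left(\frac{\nu_i}{2},\, 2\right) \quad\Longrightarrow\quad s_i^2 \thicksim \Gamma\!\left(\frac{\nu_i}{2},\, \frac{2\sigma_i^2}{\nu_i}\right)$$
Multiplying a gamma variate by a constant multiplies only its scale, which gives the stated $k_i$ and $\theta_i$. The first two moments follow directly:
$$E[s_i^2] = k_i\theta_i = \sigma_i^2, \qquad \operatorname{Var}[s_i^2] = k_i\theta_i^2 = \frac{2\sigma_i^4}{\nu_i}$$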
The sum will be gamma distributed iff the two scale parameters are equal. Since the population variances are unknown, so are the scale parameters. Hence the sum is either a sample from a finite mixture of gamma distributions, as @whuber elegantly showed, or a sample from an infinite series of gamma distributions, as per Moschopoulos’s paper (P.G. Moschopoulos, The Distribution of the Sum of Independent Gamma Random Variates, Ann. Inst. Statist. Math. 37 (1985), Part A, 541-544). The latter would likely be the case. Both results are very interesting, but not so useful in practical applications.
So the Welch-Satterthwaite approximation consists of treating the sum of the two sample variances as a sample from a fictitious gamma distribution, with its shape and scale parameters computed from those of the two constituent sample-variance gamma distributions. But you do not know the two population variances, so you substitute the respective sample variances. The degrees of freedom are then computed from one of several equations; I have seen at least three inequivalent ones.
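To make the recipe concrete, here is a minimal Python sketch of one common moment-matching version. It is only one of the several variants mentioned and not necessarily the one any given paper uses. The function name `ws_gamma_params` and its arguments are hypothetical. It matches the mean and variance of the fictitious gamma distribution to those of the sum, with the sample variances plugged in for the unknown population variances.

```python
from typing import Sequence, Tuple


def ws_gamma_params(s2: Sequence[float], nu: Sequence[float]) -> Tuple[float, float, float]:
    """Moment-matched gamma parameters for a sum of sample variances.

    s2 : sample variances (plug-in estimates of the unknown sigma_i^2)
    nu : degrees of freedom of each sample variance
    Returns (shape, scale, effective degrees of freedom).
    Hypothetical helper for illustration; assumes independent samples.
    """
    if len(s2) != len(nu) or len(s2) == 0:
        raise ValueError("s2 and nu must be non-empty and of equal length")

    # Mean of the sum: sum of k_i * theta_i = sum of sigma_i^2 (estimated by s_i^2).
    mean = sum(s2)
    # Variance of the sum: sum of k_i * theta_i^2 = sum of 2 * sigma_i^4 / nu_i.
    var = sum(2.0 * v * v / n for v, n in zip(s2, nu))

    shape = mean * mean / var   # k = mean^2 / variance
    scale = var / mean          # theta = variance / mean
    nu_eff = 2.0 * shape        # since k = nu / 2
    return shape, scale, nu_eff


if __name__ == "__main__":
    # Equal sample variances: effective df is close to nu_1 + nu_2.
    print(ws_gamma_params([4.0, 4.0], [10, 10]))
    # Very unequal sample variances: effective df is pulled toward
    # the df of the dominant term.
    print(ws_gamma_params([1.0, 9.0], [5, 20]))
```

With this variant the effective degrees of freedom reduce to $(\sum s_i^2)^2 / \sum (s_i^4/\nu_i)$, which lies between the smallest $\nu_i$ and $\sum \nu_i$.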
So does it work? Yes and no. The intention is to obtain a gamma distributed sum, so that its square root would be (scaled) $\chi$ distributed, like our customary sample standard deviations, and it would then be feasible to use critical $t$ values, construct confidence intervals, etc. The approximation fails if the sum is essentially just the larger summand. I have seen a paper where one population variance is known and is used in the Welch-Satterthwaite approximation along with the sample variance for the other variate. I find this puzzling, and the associated degrees of freedom can be rather large. However, used reasonably, with sample variances that are not far different, the Welch-Satterthwaite approximation has some utility.
For more information, go to http://www-personal.umd.umich.edu/~fmassey/gammaRV/
and click on subsection 4.1 under the Welch-Satterthwaite approximation. This will download a three-page document that shows how the approximation arises.
But the WS formula stated there is still weird for other reasons...
– Andrew NC Aug 24 '17 at 14:31