
In the lab, we have cell cultures growing in bioreactors. Every day we measure the concentration of a set of metabolites and amino acids in the media with a cell culture analyser. Along with this, we also count the number of cells in the sample every day to calculate the cell growth dynamics.

Some metabolites are consumed by the cells and some are produced during the time-course of the culture. We wish to assess the relationship between the consumption or production of any of these compounds and cell growth. For example, we would like to know whether the consumption of a specific metabolite from the media has a significant effect on increasing the cell count.

Is there any way to do this in order to find the most significant compounds associated with cell growth?

It has been suggested to me that this should be a machine-learning application, but I do not really know how to proceed from there or what libraries or workflow I need to use. I have also read about partial least squares regression, using cell count as the response variable, but I am not sure whether it applies here.

Any suggestions or comments that can help me would be much appreciated.


As suggested in the comments, I am adding some information that might help to explain what I am trying to do.

The aim of this study is to improve the growth, viability and adhesion of the cells in the media by changing its formulation.

For any experiment related to this, we will always have 2 treatments: a control medium and a test medium. The test medium will have changes in the recipe that we want to evaluate. We have 6 bioreactors available for this and only one cell type. The cells are in culture for 5 days only and during the exponential growth phase. The bioreactors are closed systems and the medium does not change during the culture. Every day, we get the concentration of gluc, lac, gln, glu, NH4+, Na+, K+, and Ca++, and also values of pH, pO2 and pCO2. The same sample also goes to an HPLC to get the concentration of all the amino acids in the media. The number of cells is counted in a nucleocounter.

This is a summary of what we can do in the lab now. Before proceeding with the experimental design of a bigger experiment, I would like to understand what statistical analysis I should keep in mind for this and what can or can't be done to get significant and valuable results here.

Would it be correct to consider the number of cells as a dependent variable in this case? If so, could it be possible to assess the relationship between the number of cells and any of the compounds measured in the media?

For example, since glucose is consumed by the cells constantly, I would expect the recorded changes in glucose concentration to be significantly associated with cell growth. If I have a reasonable number of replicates, I believe that it would be possible to find out, for instance, which amino acids are more important.

Probably this is not straightforward for me since I could not find a similar analysis or something easy to understand. But any comments about this could answer some of my questions and help me to build a method for the analysis of this kind of data.

  • Please edit the question to add some information that could help you get a useful answer. In particular: How many bioreactors are there? Are there different treatments or cell types associated with the bioreactors? How many metabolites and amino acids are you measuring? Over how many days? Are the cells always in an exponential growth phase (sometimes called "log phase"), or do they reach some type of stationary/saturation level (or even die off) during the course of the study? Please provide that information by editing the question, as comments are easy to overlook and can be deleted. – EdM Feb 11 '23 at 20:28
  • Thanks for updating the question. Please also say more about the nature of the bioreactors. Are these closed cell-culture systems like flasks where the medium is introduced at the beginning and only changed occasionally if at all over the 5 days? Or are they circulating systems where cells are in the reactor and some fresh medium is always being provided? – EdM Feb 13 '23 at 14:53
  • Thanks to you for taking the time to answer. These are closed cell-culture systems and the medium does not change during the 5 days. I guess you are asking because if the medium changes, the analysis would be different. If we have a bigger process in the future, it would be very interesting to know how to take that into account too – CorteZero Feb 14 '23 at 13:52

1 Answer


The statistical issues here are relatively minor once you take into account subject-matter knowledge. What you are mainly looking for are effects of experimental changes in medium components upon cell proliferation, and hints from changes in medium composition about what might be limiting proliferation.

I'd recommend focusing on intelligently chosen experimental manipulations, for which the outcome variable would be the cell proliferation rate or the number of cells after your 5 days. That is simple to evaluate with standard statistical methods, as there is one major outcome and only a few experimental manipulations. The Technical Perspective by the Pollards, Molecular Biology of the Cell 2019; 30: 1359-1368, should point you in the correct direction for those statistical analyses.

The changes in medium composition over time, however, can help design those experimental manipulations. Let's examine them in detail.

Let's say that you have a system with mammalian cells attached to beads suspended in culture medium, similar to Chen et al., Cytotherapy 2015; 17: 163-173. Based on data similar to what's in that paper and basic principles of cell biology, you can make some simplifying assumptions that allow for analysis of changes of different types of components in the culture medium over time. Let's assume that the cultures are humidified and gassed at constant oxygen and carbon dioxide pressures.

Volume

Even at cell confluence you are unlikely to have more than $10^6$ cells per milliliter of medium. If individual cells have volumes of $10^{-11}$ liter (10 picoliters), then the total volume within cells per milliliter of total fluid is 10 microliters, only about 1% of total volume. Within about 1%, we can ignore changes in extracellular volume as cells increase in number.

Cell number

Let $N(t)$ be the total number of cells per milliliter at time $t$. If the cells are in an exponential proliferation phase as you indicate, then comparing time $t_1$ against $t_0$ gives:

$$N(t_1)=N(t_0) 2^{(t_1-t_0)/\tau}, $$

where $\tau$ is the cell-number doubling time. Based on cell-number values from consecutive sampling, you can estimate the value of $\tau$:

$$\tau = \frac{t_1-t_0}{\log_2 N(t_1)-\log_2 N(t_0)}.$$
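As a minimal sketch of that calculation, assuming daily samples and using made-up cell counts (the function name `doubling_time` and all numbers are hypothetical, for illustration only):

```python
import math

def doubling_time(t0, n0, t1, n1):
    """Doubling time tau from two (time, cells per mL) samples,
    assuming exponential proliferation between t0 and t1."""
    return (t1 - t0) / (math.log2(n1) - math.log2(n0))

# Hypothetical daily counts in cells per milliliter (not real data).
days = [0, 1, 2, 3]
cells_per_ml = [5.0e4, 1.0e5, 2.0e5, 4.0e5]

# One tau estimate per pair of consecutive samples, in days.
taus = [
    doubling_time(days[i], cells_per_ml[i], days[i + 1], cells_per_ml[i + 1])
    for i in range(len(days) - 1)
]
print(taus)
```

Computing one estimate per interval, rather than a single overall value, makes it easy to see whether the doubling time drifts over the course of the culture.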

In practice, the cell-doubling times aren't always constant through 5 days. They tend to be longer at early times as cultures become established, shorten at intermediate times in true exponential proliferation phase, then lengthen as cells approach confluence or limits of nutrients.

Don't over-react to small variations in estimates of $\tau$ between time periods, however. Although the estimates of total numbers of cells per milliliter can be large, those estimates are typically based on actual counts of just a few dozen to a few hundred cells. The actual number of cells counted limits the precision of those estimates. If counts are Poisson-distributed, then the variance in the number of counts $n$ equals the mean and the coefficient of variation (standard deviation divided by the mean) can be estimated as $1/\sqrt n$. An estimate of $N$ based on only 100 actual cell counts thus has an inherent 10% standard error.
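The Poisson argument can be sketched as follows. The function names are hypothetical, and the sketch assumes that counting noise is the only source of error, which will understate the real uncertainty if dilution or sampling errors also matter:

```python
import math

def count_cv(n_counted):
    """Coefficient of variation of a Poisson count of n_counted cells."""
    return 1.0 / math.sqrt(n_counted)

def concentration_standard_error(cells_per_ml, n_counted):
    """Standard error of a cells-per-mL estimate that was scaled up
    from n_counted cells actually counted (Poisson noise only)."""
    return cells_per_ml * count_cv(n_counted)

# 100 counted cells give a 10% relative error, whatever the scaled-up value.
print(count_cv(100))
print(concentration_standard_error(1.0e6, 100))
```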

Inorganic components

Sodium, potassium, calcium and chloride are taken up by cells from the culture medium but can't be converted to other forms. Intracellular sodium and chloride concentrations tend to be lower than extracellular, and total intracellular calcium (as opposed to free cytoplasmic calcium) is typically not too far from extracellular, so there shouldn't be much change in your sodium and calcium measurements.

Intracellular potassium concentration, however, is typically 30 to 50 times that of extracellular (about 150 mmol/liter intracellular, versus 3 to 5 extracellular). At confluence with a total cell volume of 10 microliters per milliliter of culture medium, the cells will contain 1.5 micromole of potassium, out of 3 to 5 micromole initially in the milliliter of medium. Depending on details of your system there could be substantial changes in extracellular medium potassium over time, particularly at late times.
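The volume and potassium figures above are simple arithmetic; a short check, using only the round numbers already assumed in the text (the variable names are illustrative):

```python
# Round-number assumptions taken from the text above.
CELLS_PER_ML = 1.0e6                 # cells per milliliter near confluence
CELL_VOLUME_L = 1.0e-11              # 10 picoliters per cell
K_INTRACELLULAR_MMOL_PER_L = 150.0   # typical intracellular potassium

# Total intracellular volume per milliliter of culture, in microliters.
cell_volume_ul = CELLS_PER_ML * CELL_VOLUME_L * 1.0e6

# Fraction of the 1 mL (1000 microliters) occupied by cells.
volume_fraction = cell_volume_ul / 1000.0

# Potassium held inside the cells per milliliter of culture, in micromoles:
# (mmol/L) * (L) gives mmol, then * 1000 converts to micromoles.
k_in_cells_umol = K_INTRACELLULAR_MMOL_PER_L * (cell_volume_ul * 1.0e-6) * 1.0e3

print(cell_volume_ul, volume_fraction, k_in_cells_umol)
```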

For changes over time, assume that intracellular potassium concentration $K_c$ and average cell volume $v_c$ are constant over time. The extracellular potassium as a function of time, $K_e(t)$, is then:

$$K_e(t) = K_e(0) - K_c v_c N(t),$$

giving:

$$K_e(0) - K_e(t) = K_c v_c N(t).$$

As you are measuring $N(t)$ with an expected exponential increase, a log-log plot of the two sides of that last equation should give a reasonable estimate of intracellular potassium per cell, $K_c v_c$:

$$ \log(K_e(0) - K_e(t)) =\log K_c v_c + \log N(t). $$

The errors in these estimates might be large, particularly at early times. The variance in $\log K_c v_c$ is:

$$\text{Var} (\log K_c v_c) = \text{Var} (\log(K_e(0) - K_e(t))) + \text{Var}(\log N(t))$$

The variance of the log of a random variable can often be approximated by the ratio of the variance of the variable to the square of its mean; see this page. If $N(t)$ is based on $n_t$ counts, then the second term in that equation is approximately $1/n_t$; see above. If the variance in an estimate of $K_e$ is $v_K$, the first term is approximately:

$$\frac{2v_K}{(K_e(0) - K_e(t))^2}, $$

which can be pretty large if the change in $K_e$ is small relative to the measurement error. If you want to estimate $\log K_c v_c $ this way via a regression, it would be wise to do a variance-weighted regression.
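One way to sketch that weighting: because the equation fixes the slope of the log-log relationship at 1, the regression reduces to an inverse-variance weighted mean of $\log(K_e(0) - K_e(t)) - \log N(t)$. The function name, the example data and the value of $v_K$ below are all hypothetical, and the sketch ignores the correlation between time points that comes from sharing the same $K_e(0)$ measurement:

```python
import math

def weighted_log_k_per_cell(k0, k_t, cells_per_ml, n_counted, v_k):
    """Inverse-variance weighted estimate of log(K_c * v_c).

    k0, k_t      : extracellular potassium at time 0 and at later times
    cells_per_ml : N(t) at those later times
    n_counted    : cells actually counted for each N(t)
    v_k          : variance of a single potassium measurement
    Returns (estimate, approximate variance of the estimate).
    """
    num = 0.0
    den = 0.0
    for k, n_ml, n in zip(k_t, cells_per_ml, n_counted):
        delta = k0 - k
        if delta <= 0:
            continue  # log undefined; change is lost in measurement noise
        # Delta-method variance of the two log terms, as in the text.
        var = 2.0 * v_k / delta**2 + 1.0 / n
        num += (math.log(delta) - math.log(n_ml)) / var
        den += 1.0 / var
    return num / den, 1.0 / den

# Hypothetical noise-free data built from K_c * v_c = 1.5e-6 micromole per cell.
k0 = 5.0                                   # micromole per mL at time 0
cells = [1.25e5, 2.5e5, 5.0e5, 1.0e6]      # cells per mL
k_t = [k0 - 1.5e-6 * n for n in cells]     # micromole per mL
estimate, variance = weighted_log_k_per_cell(k0, k_t, cells, [100, 200, 300, 400], 0.01)
print(math.exp(estimate), variance)
```

Early time points, where the potassium change is small compared with the measurement error, get little weight, which is the intended effect.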

You don't mention phosphate, which can be limiting. Inorganic phosphate from the medium is incorporated by cells into intracellular nucleotides and nucleic acids, phospholipids and phosphoproteins, leading to surprisingly high total cellular phosphorus equivalent to nearly 100 mmol/liter. An equation like that for potassium above could be informative, as extracellular phosphate concentration must be kept low to avoid precipitation of calcium phosphate.

Organic components

Organic components of the medium can be metabolized into different forms, leading to decreases or increases in extracellular concentrations over time. Say that an organic metabolite $m$ in the medium has a conversion rate of $\gamma_m$ per unit time per cell (positive for production, negative for consumption). Then the instantaneous rate of conversion of $m$ is

$$\frac{dm}{dt}=\gamma_m N(t).$$

Integrating for the change in extracellular $m$ between time $t_0$ and $t_1$ gives:

$$m_e(t_1)-m_e(t_0) = \int_{t_0}^{t_1} \gamma_m N(t_0)2^{(t-t_0)/\tau} dt= \frac{\gamma_m \tau}{\log 2} N(t_0) \left(2^{(t_1-t_0)/\tau}-1\right)= \frac{\gamma_m \tau \left(N(t_1)-N(t_0)\right)}{\log 2} ,$$

where $\log$ is the natural logarithm and we assume a constant cell-number doubling time $\tau$ over that time interval. When $N(t_1) \gg N(t_0)$, as over the full 5 days, the right side is approximately $\gamma_m \tau N(t_1)/\log 2$; between consecutive daily samples $N(t_0)$ is not negligible and should be kept. As you are already estimating the doubling time $\tau$ for that time interval, that equation provides an estimate of the conversion rate of metabolite $m$ per cell per unit time, $\gamma_m$. If metabolism per cell is in a steady state throughout your 5 days, that conversion rate should be relatively constant across your 5 time intervals.

The principles above show what limits the precision of a $\gamma_m$ estimate. The actual number of cell counts limits the precision of $N(t)$ and $\tau$. The error in measuring $m_e$ relative to the observed difference limits the precision of the left side of the equation.
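A minimal sketch of the $\gamma_m$ estimate for one sampling interval. Integrating $dm/dt = \gamma_m N(t)$ between exact limits gives $\gamma_m \tau (N(t_1) - N(t_0))/\log 2$, which approaches $\gamma_m \tau N(t_1)/\log 2$ when $N(t_1) \gg N(t_0)$. Substituting the estimate of $\tau$ turns the integral of $N(t)$ into $(N(t_1)-N(t_0))(t_1-t_0)/\ln(N(t_1)/N(t_0))$, which the code uses directly. Function names and numbers are hypothetical:

```python
import math

def integral_of_cells(t0, n0, t1, n1):
    """Integral of N(t) from t0 to t1, assuming exponential change
    between the two samples (units: cells per mL times time)."""
    if n1 == n0:
        return n0 * (t1 - t0)  # limiting case of no net growth
    return (n1 - n0) * (t1 - t0) / math.log(n1 / n0)

def conversion_rate(m0, m1, t0, n0, t1, n1):
    """Per-cell conversion rate gamma_m over one sampling interval
    (positive for production, negative for consumption)."""
    return (m1 - m0) / integral_of_cells(t0, n0, t1, n1)

# Hypothetical interval from day 4 to day 5 with one doubling, and a
# glucose drop constructed to match -5e-6 micromole per cell per day.
n0, n1 = 5.0e5, 1.0e6
m0 = 20.0                                        # micromole per mL
m1 = m0 + (-5.0e-6) * (n1 - n0) / math.log(2)    # micromole per mL
gamma = conversion_rate(m0, m1, 4, n0, 5, n1)
print(gamma)
```

Repeating this for each interval and each measured component gives the per-interval rates to compare over time.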

Look carefully at those conversion rates over time for all of the organic medium components that you are measuring. Pay particular attention to components whose consumption per day per milliliter is a large fraction of the original amount per milliliter. Those components of the medium might become close to undetectable at later times unless their consumption drops as the cells turn to alternate sources. Those data should provide the most useful clues about what components might be limiting proliferation, which you then can investigate in defined experiments.

I don't think that much would be gained by trying to use "machine learning" tools on these extracellular concentration measurements. There are only a few dozen components you are evaluating; the values and plots should be informative on their own and guide you toward defined, definitive experimental manipulations with straightforward interpretations in terms of effects on cell proliferation.

Some examples

With rough numbers close to what Chen et al. reported, here's what you might find after 5 days without a medium change. Assume that $\tau$ is 1 day, a typical value for mammalian cells adapted to culture (one cell division per day), and is constant through the time period, and that you end up with 1 million cells per milliliter of medium. Then, with time in days and neglecting the initial cell number (about 3% of the final number after 5 doublings), the change in extracellular concentration of metabolite $m$ based on the above equation is approximately:

$$m_e(5)-m_e(0) \approx \frac{\gamma_m \times 1 \times 10^6}{0.693}.$$

Glucose. Glucose conversion is about -5 micromoles per million cells per day. Extracellular glucose concentration would drop by over 7 micromoles/milliliter, or 7 millimol/liter. That's an appreciable fraction even of a high-glucose Dulbecco medium.

Lactate. Depending on the cell type and culture conditions, much of that glucose can be metabolized to lactate, with 2 lactate per glucose. At 7.5 micromoles lactate production per million cells per day, you could end up with over 10 millimol/liter lactate. The associated fixed (non-carbonic) acid production would acidify the medium to an extent depending on the medium's acid/base buffering capacity.

Glutamine. Chen et al. reported glutamine conversion of -2 micromoles per million cells per day, for a drop under the above assumptions of 3 millimol/liter in extracellular glutamine over 5 days. Much of that will show up as extracellular ammonium ion. Be aware that glutamine can be converted spontaneously to pyrrolidonecarboxylic acid and ammonium even without cells under culture conditions, so you might need to have a cell-free control to evaluate the cellular contribution to glutamine and ammonia changes.

Other amino acids. "Essential" amino acids can't be synthesized by mammalian cells. Follow them carefully. If your culture medium contains serum then this gets more complicated, as the cells in principle could hydrolyze the serum proteins to acquire those amino acids.
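The glucose, lactate and glutamine figures above can be reproduced with a few lines, using the same assumptions as the text ($\tau$ of 1 day, 1 million cells per milliliter at day 5, initial cell number neglected). The per-cell rates are the rough values quoted above; the names are illustrative:

```python
import math

TAU_DAYS = 1.0                      # assumed doubling time
FINAL_CELLS_MILLIONS_PER_ML = 1.0   # assumed cell number at day 5

# Rough conversion rates in micromoles per million cells per day
# (negative for consumption, positive for production).
rates = {"glucose": -5.0, "lactate": 7.5, "glutamine": -2.0}

def five_day_change(gamma):
    """Approximate change in extracellular concentration in micromoles
    per mL (numerically equal to mmol per liter)."""
    return gamma * TAU_DAYS * FINAL_CELLS_MILLIONS_PER_ML / math.log(2)

changes = {name: five_day_change(gamma) for name, gamma in rates.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.1f} mmol/liter")
```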

EdM