
First, please know that I am a novice in this field, so it is quite likely that you will regard this question as basic or flat-out stupid. Please be gentle, but don't be afraid to criticize. I am asking this so I can learn. Also, feel free to discuss in the comments if you can't give a straight answer; all input is useful.

I am currently working on a project where we have fractionated samples in order to dig deeper into the proteome. The fractionated samples have been analysed with shotgun LC-MS and then searched with contemporary software against a whole-proteome database and the reversed database, in order to estimate the FDR. The search result is cut off so that the highest-scoring sequences are included at an estimated FDR of 1%. In this project we have 150 samples, in which we can expect to find certain proteins in almost all samples, while others are unlikely to be found in any of them (but we don't know which yet). If we were to reduce the search space to include only relevant proteins (which, as previously stated, are not known to us yet), we would boost the number of true positives included after the FDR cut-off, since there would be fewer sequences that noise could randomly match with a high score (right?). As we can't know which proteins are relevant beforehand, would it be possible to determine the relevant proteins with a preliminary search? That is, to do as follows:

  1. Search against the whole proteome with a certain FDR cut-off (possibly larger than 1%).
  2. Make a new database including only proteins that have been found in any of the samples.
  3. Search the data against the new database.
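To make step 2 concrete, here is a minimal sketch of how I imagine building the reduced database. It is only an illustration: the helper names (`read_fasta`, `accession`, `subset_fasta`), the UniProt-style headers, and the toy sequences and accessions are all my own assumptions, not anything taken from our actual data or from the search software.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, seq = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def accession(header):
    """Assumes UniProt-style headers ('sp|P12345|NAME ...').

    Falls back to the first whitespace-separated token otherwise.
    """
    parts = header.split("|")
    return parts[1] if len(parts) >= 3 else header.split()[0]


def subset_fasta(text, identified):
    """Return FASTA text with only the records whose accession was identified."""
    kept = [(h, s) for h, s in read_fasta(text) if accession(h) in identified]
    return "".join(f">{h}\n{s}\n" for h, s in kept)


# Toy whole-proteome database (made-up entries, for illustration only).
full_db = """>sp|P00001|PROT_A example
MKTAYIAK
QRQISFVK
>sp|P00002|PROT_B example
MSHFSRQL
>sp|P00003|PROT_C example
MGDVEKGK
"""

# Hypothetical accessions identified in any sample during the first-pass search.
first_pass_hits = {"P00001", "P00003"}

reduced_db = subset_fasta(full_db, first_pass_hits)
print(reduced_db)
```

The reduced database would then be used, together with its own reversed decoys, for the second search in step 3.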

To me it seems like there is a possibility that this could increase the number of true positives that are found, but I also feel like I might be missing something crucial. Please enlighten me: is this methodology unbiased and valid? What risks might there be? Is there anything more one has to think about before attempting something like this?

The LC-MS was a top-10 shotgun MS/MS run on a quadrupole-Orbitrap instrument. The search was done using MaxQuant, where fragmentation spectra are used to identify MS1 peaks, whose intensity is recorded as the quantitative datum. All files were searched together but are configured to be separated in the results file by both sample and fraction. We will likely not perform additional analysis of these samples.
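For reference, this is how I understand the target-decoy FDR cut-off in its simplest form. It is a sketch under my own assumptions (FDR estimated as decoy hits divided by target hits above a score threshold, with made-up scores), and the function name `fdr_threshold` is hypothetical; I am not claiming this is how MaxQuant computes it internally.

```python
def fdr_threshold(psms, max_fdr=0.01):
    """Find the lowest score cut-off with estimated FDR <= max_fdr.

    psms: list of (score, is_decoy) tuples, higher score = better match.
    The FDR at a cut-off is estimated as (#decoys) / (#targets) among all
    matches scoring at or above it. Returns None if no cut-off qualifies.
    Ties between equal scores are not treated specially in this sketch.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best


# Made-up matches: (score, is_decoy). True marks a hit in the reversed database.
psms = [
    (10, False), (9, False), (8, False), (7, False),
    (6, True), (5, False), (4, True), (3, False),
]

# A loose 25% cut-off is used here only because the toy list is so short.
cutoff = fdr_threshold(psms, max_fdr=0.25)
accepted = [s for s, is_decoy in psms if not is_decoy and s >= cutoff]
print(cutoff, accepted)
```

My question is essentially whether shrinking the target database in a data-dependent way makes the decoy count in this kind of estimate too optimistic.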

gringer