I have some data on Copy Number Variation (SNP chip) for a population of samples.

In particular, I have a set of samples (the cases) that display a specific disease phenotype, and another set (the controls) that do not. The cases do not have matched controls; all the controls are a random sample from the population that does not present the disease.

For both cases and controls I have the copy number of some regions. How can I compute the copy number change in the cases, as is done in GDC?

Should I use the approach described on the page linked above:

"performing tangent normalization, which subtracts variation that is found in a set of normal samples"

And do you know of any tool that performs this computation?

The data I have is formatted in this way:

Sample_ID   chrom      start           end  CN
Sample11       19   11991477      12133823   1
Sample11        2   52260564      52431658   1
Sample12        7    5721757       5896192   3
Sample13       10    2269963       2473585   3
gc5

1 Answer

GISTIC does exactly this, and is very possibly the tool used in your link above.

The input files need to be in a different format than what you currently have, but if you're using a SNP6 array, there are guides out there that will tell you how to get the proper files from your .cel files.

Jared Andrews
  • Thanks. Unfortunately the data has been presented to me in this format. I could try to get the original .cel files but I think they'll be unavailable. Do you know of any methodology to get the same result but with the data format I provided in the question? – gc5 May 24 '18 at 14:13
  • Well, you could hack together something simple in Python or R that takes all the controls, creates lists of amplified/deleted regions, then removes the CNVs in the cases that intersect those regions (maybe by a minimum amount, like 30%). It kind of depends on how perfect you want your analysis to be. Removing the regions that are "variable" in the controls is pretty easy. – Jared Andrews May 24 '18 at 15:40
  • I think it may be the only solution in this case. Thanks – gc5 May 24 '18 at 15:52
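
The filtering idea from the comments could be sketched in plain Python roughly as follows. This is only an illustration under several assumptions, not tangent normalization and not what GISTIC does: a copy number of 2 is treated as neutral (which does not hold for sex chromosomes), the overlap is measured as a fraction of the case segment's length, amplifications and deletions are handled separately, and the 30% threshold is simply the value floated above. The function names and the control rows (`Ctrl1` to `Ctrl3`) are hypothetical; the case rows are the ones from the question.

```python
from collections import defaultdict

NEUTRAL_CN = 2      # assumption: diploid baseline (autosomes only)
MIN_OVERLAP = 0.30  # threshold suggested in the comments; tune as needed


def cnv_type(cn):
    """Classify a copy number as 'amp', 'del', or None (neutral)."""
    if cn > NEUTRAL_CN:
        return "amp"
    if cn < NEUTRAL_CN:
        return "del"
    return None


def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def build_control_regions(control_segments):
    """Map (chrom, 'amp'/'del') to merged regions that vary in the controls."""
    regions = defaultdict(list)
    for _sample, chrom, start, end, cn in control_segments:
        kind = cnv_type(cn)
        if kind is not None:
            regions[(chrom, kind)].append((start, end))
    return {key: merge_intervals(ivs) for key, ivs in regions.items()}


def overlap_fraction(start, end, intervals):
    """Fraction of [start, end) covered by a list of disjoint intervals."""
    length = end - start
    if length <= 0:
        return 0.0
    covered = sum(max(0, min(end, e) - max(start, s)) for s, e in intervals)
    return covered / length


def filter_case_segments(case_segments, control_regions, min_overlap=MIN_OVERLAP):
    """Keep non-neutral case segments not explained by same-type control CNVs."""
    kept = []
    for seg in case_segments:
        _sample, chrom, start, end, cn = seg
        kind = cnv_type(cn)
        if kind is None:
            continue  # neutral segments carry no copy number change
        same_type = control_regions.get((chrom, kind), [])
        if overlap_fraction(start, end, same_type) < min_overlap:
            kept.append(seg)
    return kept


# Case rows from the question: (Sample_ID, chrom, start, end, CN)
cases = [
    ("Sample11", "19", 11991477, 12133823, 1),
    ("Sample11", "2", 52260564, 52431658, 1),
    ("Sample12", "7", 5721757, 5896192, 3),
    ("Sample13", "10", 2269963, 2473585, 3),
]
# Hypothetical control rows, invented purely for illustration
controls = [
    ("Ctrl1", "19", 12000000, 12200000, 1),  # large deletion overlapping a case
    ("Ctrl2", "7", 5890000, 5900000, 3),     # small amplification, minor overlap
    ("Ctrl3", "2", 52260564, 52431658, 3),   # amplification where the case has a deletion
]

kept = filter_case_segments(cases, build_control_regions(controls))
for seg in kept:
    print(*seg, sep="\t")
```

Merging the control regions first avoids counting the same bases twice when several controls carry overlapping CNVs. Whether to require the same CNV type, and whether to measure overlap relative to the case segment or reciprocally, are design choices you would want to revisit for a real analysis.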