I am looking for a publicly available matched tumor-normal sample. I need Illumina fastq reads (or an aligned bam file, since I could extract the reads from it) from a tumor and a matching, non-tumor control set from the same patient. Ideally, I would also have a truth set of known variants in the sample, but I can live without that.

I checked the Next Generation Cancer Knowledge Network, but although they do seem to have appropriate bam files, which I could convert to fastq, all of the ones I found were "controlled access", and getting access seems very complicated.
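
For reference, here is a minimal sketch of the bam-to-fastq conversion mentioned above. It assumes paired-end reads and that `samtools` is installed. The function names, file names and output prefix are hypothetical, and the code only builds the command line rather than running it.

```python
import shlex


def bam_to_fastq_commands(bam_path, out_prefix):
    """Build the two samtools commands that extract paired reads from a BAM.

    The file naming scheme (<prefix>_R1/_R2.fastq.gz) is an assumption made
    for illustration only.
    """
    # Group mates together first; -O writes to stdout, -u skips compression.
    collate = ["samtools", "collate", "-u", "-O", bam_path]
    # Read the collated stream from stdin ("-") and split it into R1/R2.
    # Singletons and unflagged reads are discarded via /dev/null;
    # -n keeps read names unchanged (no /1 or /2 suffix).
    fastq = [
        "samtools", "fastq",
        "-1", f"{out_prefix}_R1.fastq.gz",
        "-2", f"{out_prefix}_R2.fastq.gz",
        "-0", "/dev/null",
        "-s", "/dev/null",
        "-n",
        "-",
    ]
    return collate, fastq


def as_shell_pipeline(collate, fastq):
    """Render both commands as a single shell pipeline string."""
    return " | ".join(shlex.join(cmd) for cmd in (collate, fastq))


if __name__ == "__main__":
    # Hypothetical input name; run the same thing for the matched normal.
    print(as_shell_pipeline(*bam_to_fastq_commands("tumor.bam", "tumor")))
```

The printed pipeline can be pasted into a shell, once for the tumor bam and once for the matched normal.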

Is there something like the public NA12878 sample for tumor-normal analyses?

terdon
  • The TCGA does provide the raw fastq too, and for some patients there are samples from both the tumor and a healthy region. But I am not sure it is publicly available without registration. – llrs Apr 20 '18 at 15:11
  • @Llopis as far as I can tell, the registration process is incredibly convoluted and involves getting the "Signing officer" of the organization (PI, CEO, whatever) to register the organization with two separate governmental bodies (eRA and then dbGaP) and then ask for access. It might be possible for me to ask my boss to do this, but I would much prefer to find more easily accessible data. – terdon Apr 20 '18 at 15:37
  • Most patient data is stored in dbGaP because of privacy issues, so you'll struggle to find anything that's unrestricted. – heathobrien Apr 20 '18 at 15:49
  • @heathobrien yeah, which is why I've been struggling :) I was hoping there was a Henrietta Lacks analogue for the modern day or that NIST had released a tumor-normal pair or something. – terdon Apr 20 '18 at 15:54
  • I thought the HeLa genome got pulled down for exactly that reason. Is it available again? – heathobrien Apr 20 '18 at 16:03
  • Not that I know of, no. Sorry, I was thinking of her cells and how they've been used for so many years, not about her genome. I come from the era before NGS. – terdon Apr 20 '18 at 16:08
  • If you work at a university, getting access to dbGaP-protected data is nowhere near as difficult as it seems. Getting registered on the system originally took about an hour of my time and about half a week of waiting. When I apply for a new dataset it takes me about half an hour per project. – Ian Sudbery Apr 24 '18 at 16:47
  • @IanSudbery unfortunately, I am no longer in academia, so I would need to ask the CEO of the small startup I work at to do this, and he barely has enough time to breathe as it is. – terdon Apr 24 '18 at 16:58

1 Answer

Please take a look at

An open access pilot freely sharing cancer genomic data from participants in Texas

I have not worked with this paper myself, but the authors state that both tumor and normal data are publicly available. If you still need something more, you could work with virtual normal data. There are also several papers that describe machine learning techniques to compensate for the lack of normal data (they are not very efficient, but they still give reasonable results). I hope this information helps. Thanks

Lot_to_learn
  • Argh, this looks perfect, but the site seems to be down! I have emailed the author of the paper to see if it's just a temporary glitch. Thanks! – terdon Apr 25 '18 at 13:43
  • The site is no longer maintained, but I contacted the authors and they very helpfully put it back up for me! – terdon Apr 26 '18 at 23:35
  • @terdon: Then you should look for a virtual normal database or, if interested, machine learning techniques. – Lot_to_learn Apr 30 '18 at 06:22