What is the ICGC normalized_read_count?

Question

I downloaded gene expression data (exp_seq) from the ICGC file browser.

For each sample and gene, the file contains a normalized_read_count.

What is that value? I couldn't find any information on the ICGC website. The values are definitely too low to be TPM.

score 5 · Accepted Answer · answered Jan 10 '18 at 14:57


By reading this thread on seqanswers and by comparing the data to TCGA, I figured out the following:

raw_read_count is the read count that you would use as input for, e.g., DESeq2. It was estimated using RSEM.
normalized_read_count is equivalent to the scaled_estimate from TCGA. This is the fraction of transcripts made up by a given gene, as estimated by RSEM. Multiplying this value by 1e6 yields the TPM.
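For illustration, the conversion could be sketched in Python roughly as follows. This is a minimal sketch, not an official ICGC recipe: it assumes the exp_seq file is tab-separated, and the column names icgc_sample_id and gene_id as well as the example values are assumptions made up for this sketch (only normalized_read_count and raw_read_count are named above).

```python
import csv
import io
from collections import defaultdict

TPM_SCALE = 1e6  # fraction of transcripts -> transcripts per million


def load_rows(handle):
    """Read a tab-separated exp_seq-like table into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def add_tpm(rows):
    """Return copies of the rows with a 'tpm' field added.

    Assumes normalized_read_count is the RSEM transcript fraction,
    so TPM = normalized_read_count * 1e6.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        new_row["tpm"] = float(row["normalized_read_count"]) * TPM_SCALE
        out.append(new_row)
    return out


def fraction_sum_by_sample(rows):
    """Sum normalized_read_count per sample (assumed column: icgc_sample_id).

    If the values really are fractions over all genes, each sum
    should be close to 1.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row["icgc_sample_id"]] += float(row["normalized_read_count"])
    return dict(totals)


# Made-up two-gene example in the assumed layout (not real ICGC data).
EXAMPLE_TSV = (
    "icgc_sample_id\tgene_id\tnormalized_read_count\traw_read_count\n"
    "S1\tGENE_A\t0.25\t500\n"
    "S1\tGENE_B\t0.75\t1500\n"
)

rows = add_tpm(load_rows(io.StringIO(EXAMPLE_TSV)))
sums = fraction_sum_by_sample(rows)
```

On a real file, passing an open file handle to load_rows instead of the io.StringIO object should work the same way. Checking that the per-sample sums are close to 1 is a quick way to see whether the values behave like fractions rather than TPM.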


Gregor Sturm


Glad you could find it yourself! Many thanks for posting the answer – llrs Jan 10 '18 at 15:36
Thank you for your answer. This is essential information for any user of this data. I wonder why this information is not on the ICGC website itself, though. I found a page describing the columns on github https://github.com/icgc-dcc/dcc-docs/blob/master/docs/dictionary/release-20/sequencing-based-gene-expression-expseq-primary-file-p.md. It does not explicitly say what normalized_read_count is. Also, the page corresponds to one of the older releases (20), not the current one (28). – user345394 Dec 30 '21 at 17:48
