I am new to Bioinformatics and I am exploring the refGene.txt files from the UCSC genome annotation database for several species.

My question concerns the Dec. 2011 (GRCm38/mm10) assembly of the mouse genome. I have seen that the human refGene.txt (hg38) contains both coding and non-coding transcripts, but the mouse (mm10) refGene.txt contains only coding transcripts.

Why is that?

Source of the file: http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/refGene.txt.gz

jorvaor
  • Is there a reason you're using refseq rather than the more complete annotations from Ensembl/Gencode? – Devon Ryan Apr 02 '19 at 11:31
  • Yes, it is an exercise in a course. I am to use the bash shell and gawk to extract some information from several refGene.txt files. The question about the absence of non-coding transcripts in mouse is not part of the exercise, but it has puzzled me. I have been looking for information about it, but I haven't found a clear explanation. – jorvaor Apr 02 '19 at 12:02

1 Answer

I wasn't using the correct file. I downloaded the file again from UCSC, compared it with the one I had been using, and they were different. The newly downloaded file contains information from both coding and non-coding transcripts.

To all the kind people who took an interest in my problem, I am sorry.

Edited 30/12/2020: To add a bit of clarity, the link in the question is the correct one for downloading the file. The file I had been working with from the beginning was the wrong one, and I don't really know where I got it the first time.
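
For anyone who wants to check their own copy, here is a minimal sketch of one way to see whether a refGene.txt file includes non-coding transcripts. It assumes the usual UCSC layout, where the second tab-separated column holds the RefSeq accession, and relies on the RefSeq convention that `NM_` accessions are protein-coding mRNAs and `NR_` accessions are non-coding RNAs. The function names are my own, chosen for illustration; this is not the method I used at the time.

```python
import gzip
from collections import Counter


def count_transcript_classes(lines):
    """Count transcripts by RefSeq accession prefix (e.g. NM, NR).

    Assumes each line is a tab-separated refGene.txt row whose second
    column is the transcript accession.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        accession = fields[1]
        prefix = accession.split("_", 1)[0]  # "NM_001011874" -> "NM"
        counts[prefix] += 1
    return counts


def count_from_file(path):
    """Open a plain or gzip-compressed refGene file and count prefixes."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_transcript_classes(handle)
```

Calling `count_from_file("refGene.txt.gz")` on a local download returns a counter keyed by prefix. If there is no `NR` entry at all, the file is probably incomplete or not the one you think it is, which is what happened to me.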

jorvaor