I am using STAR to align RNA-seq reads to a reference genome. Before the alignment, I need to generate an index of the reference genome. The following command generates the index successfully:
STAR --runThreadN 8 --runMode genomeGenerate --genomeDir output/index/star --genomeFastaFiles ref.fa --sjdbGTFfile ref.gtf --sjdbOverhang 100
This works fine. However, I would like to keep my reference genome compressed to save disk space. So I am trying the following command:
STAR --runThreadN 8 --runMode genomeGenerate --genomeDir output/index/star --genomeFastaFiles ref.fa.gz --readFilesCommand "gunzip -c" --sjdbGTFfile ref.gtf --sjdbOverhang 100
but I get the following error:
EXITING because of INPUT ERROR: the file format of the genomeFastaFile: ref.fa.gz is not fasta: the first character is '' (31), not '>'.
Solution: check formatting of the fasta file. Make sure the file is uncompressed (unzipped).
I already use --readFilesCommand successfully with compressed RNA-seq FASTQ files, but judging by the error it does not apply to --genomeFastaFiles. Does anybody know if there is a similar option for compressed references? Is there a workaround using Unix commands (piping, perhaps), or do I need to decompress the reference, index it, and then compress it again?
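For reference, the fallback I have in mind (decompress, index, discard the uncompressed copy) could be sketched roughly as below. This is only a sketch under assumptions: `with_uncompressed` is a hypothetical helper name of my own, not part of STAR, and it needs enough temporary disk space to hold the uncompressed FASTA while the index is built. The original ref.fa.gz is never modified, so nothing has to be recompressed afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of STAR): decompress a .gz file into a
# temporary directory, run the given command with the uncompressed path
# appended as its last argument, then delete the temporary copy.
with_uncompressed() {
    local gz="$1"
    shift
    local tmp fa status=0
    tmp="$(mktemp -d)" || return 1
    # Keep the original base name, minus the .gz suffix (ref.fa.gz -> ref.fa).
    fa="$tmp/$(basename "${gz%.gz}")"
    if ! gunzip -c "$gz" > "$fa"; then
        rm -rf "$tmp"
        return 1
    fi
    # Run the command and remember its exit status before cleaning up.
    "$@" "$fa" || status=$?
    rm -rf "$tmp"
    return "$status"
}
```

Because the uncompressed path is appended last, --genomeFastaFiles has to be the final option on the command line:

with_uncompressed ref.fa.gz STAR --runThreadN 8 --runMode genomeGenerate --genomeDir output/index/star --sjdbGTFfile ref.gtf --sjdbOverhang 100 --genomeFastaFiles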