
I’m trying to create a CRAM file that stores the path to its FASTA reference as a relative path rather than an absolute one, so that I can move the files around. Unfortunately I can’t get this to work. I expected the following to do it:

⟩⟩⟩ samtools view -C -T ../reference/ref.fa -o output.cram input.bam

However, the resulting file contains an absolute path in its header:

⟩⟩⟩ samtools view -H output.cram
…
@SQ     SN:1    LN:249250621    M5:hash     UR:/absolute/path/to/data/mapped/../reference/ref.fa
…

As a result, I can’t open the file from a different mount point where the absolute paths differ, and I can’t move the file (and its reference) around or copy them to other machines.

I know that I could set the REF_PATH environment variable or specify -T when reading the file but I would like to avoid this (the result file needs to be readable by IGV, launched by users who don’t know how to set environment variables).

Is there a way of creating a CRAM file that stores a relative path to its reference?

Konrad Rudolph
  • Can you reheader the file? I'm curious if stripping the absolute path works that way. – Devon Ryan Aug 03 '18 at 20:03
  • BTW, ideally the hash should suffice for looking up the reference (assuming IGV supports this). – Devon Ryan Aug 03 '18 at 20:09
    I have a feeling this should be raised on hts-specs then. What you’re trying to do should really be supported. – Devon Ryan Aug 06 '18 at 08:52
    This would seem like a rather glaring oversight, which won't help the adoption of CRAM at all: it effectively makes CRAM unportable, which somewhat limits its utility as an archive format. – Matt Bashton Aug 06 '18 at 11:07
    The primary way of finding the right reference sequences for a CRAM file is via the M5 hash, which is far more useful in an archival context than a file path on someone else's potentially long-gone filesystem.

    If there's an issue to be raised, it's against htslib which unconditionally makes the file path absolute when adding UR from a -T argument. You would indeed be able to set it arbitrarily when reheadering.

    – John Marshall Oct 23 '18 at 09:49
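Following the reheadering suggestion in the comments, here is a minimal bash sketch that rewrites the UR: field of every @SQ line to a path you choose and then calls samtools reheader. It assumes samtools is on PATH and that your samtools reheader accepts CRAM input; the function names relativize_ur and reheader_cram and all file names are hypothetical. The M5 hashes are left untouched, so hash-based lookup still works. Whether htslib or IGV resolves a relative UR: against the CRAM's own directory or against the current working directory is not something to assume; test it with your readers.

```shell
#!/usr/bin/env bash
# Sketch only: relativize UR: paths in a CRAM header via samtools reheader.
# Assumes samtools is installed; names below are hypothetical.
set -euo pipefail

# Read a SAM header on stdin and replace the UR: value on every @SQ line
# with the path given in $1. POSIX bracket expressions keep this portable
# between GNU and BSD sed. The path must not contain '#' or '&'.
relativize_ur() {
  local rel=$1
  sed "/^@SQ/s#UR:[^[:space:]]*#UR:${rel}#"
}

# Write a copy of CRAM $1 to $2 whose @SQ UR: fields all point to $3.
reheader_cram() {
  local in=$1 out=$2 rel=$3
  local hdr="${out}.header.sam"
  samtools view -H "$in" | relativize_ur "$rel" > "$hdr"
  # reheader writes the new file to stdout (it may also append a @PG line)
  samtools reheader "$hdr" "$in" > "$out"
  rm -f "$hdr"
}
```

For example, reheader_cram output.cram output.rel.cram ../reference/ref.fa would produce a copy whose header reads UR:../reference/ref.fa.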

1 Answer


In your CRAM file, make all of the reference sequences point to the ./ directory. Then write a bash script that creates links to your reference for whatever mount you are using at the time, i.e. ln -s TARGET LINK_NAME.

Now your CRAM is portable and you just have to tailor the local working dir to it.
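A minimal sketch of the link-creating script described above, assuming the CRAM header has already been set to something like UR:./ref.fa. The function name link_reference and all paths are hypothetical. It links the FASTA and, if present, its .fai index into the directory the CRAM is read from. Note that ln -s stores TARGET verbatim, so a relative TARGET is resolved relative to the link's directory, not to where you ran the script.

```shell
#!/usr/bin/env bash
# Sketch only: point ./<name>.fa at the reference on the current mount.
# Assumes the CRAM's UR: fields refer to ./<name>.fa; names are hypothetical.
set -euo pipefail

link_reference() {
  local target=$1          # reference FASTA as seen on this machine/mount
  local link_dir=${2:-.}   # directory the CRAM will be read from
  local name
  name=$(basename "$target")
  # -f replaces a stale link from a previous mount; -n stops ln from
  # descending into an existing link that points at a directory.
  ln -sfn "$target" "$link_dir/$name"
  # Link the FASTA index too, if one exists next to the reference.
  if [ -e "$target.fai" ]; then
    ln -sfn "$target.fai" "$link_dir/$name.fai"
  fi
}
```

On each machine you would run something like link_reference /mnt/whatever/reference/ref.fa . before opening the CRAM.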

conchoecia