
I found some Nanopore MinION data on SRA, which I would like to investigate. I use sratoolkit for Illumina data all the time, but I am not sure how to get the fast5 file from the sra file.

What I have tried is to prefetch it, but then I don't know which command to use for the .sra to .fast5 conversion. Not fastq-dump, I assume?

prefetch -v SRR5286963 --max-size 50000000
benn

2 Answers


When I was trying to upload MinION data to SRA, they wouldn't allow me to upload only the raw signal data (i.e. FAST5 files). Every file needed to contain a FASTQ sequence, and this sequence was extracted from the file and stored in the archive. Given this, I don't expect it will be possible to retrieve signal or event information from SRA. I don't expect this to become possible in the future either, given that the SRA database expects every record to have a FASTQ-like sequence attached.

However, EBI/ENA do allow the upload of raw FAST5 files to the database (e.g. here), but there are no FAST5 reads for the specific single-cell B1a study you have indicated. If you're interested specifically in that study, then I recommend that you ask Miten Jain directly for access to those reads; he should have another method to give you access to them.

ENA is "ready" for ONT reads, but researchers need to upload their reads to ENA in order for that database to be useful.

gringer

I would suggest trying an alternative way, as the FAST5 is a fairly new entry in SRA - 6 public datasets as of today.

You can search ENA for the ERX/ERR entry you got from SRA and either (1) download it directly via FTP or (2) redirect it to Galaxy.
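As a rough sketch of option (1), the file listing for a run can be requested from the ENA Portal API "filereport" endpoint before downloading anything. The helper names `ena_filereport_url` and `filter_fast5` are hypothetical, and the chosen field list (`run_accession`, `submitted_ftp`, `fastq_ftp`) is my assumption about what is useful here. The idea is that any FAST5 data would show up among the submitted files rather than the generated FASTQ files.

```shell
#!/usr/bin/env bash

# Hypothetical helper: build the ENA Portal API file report URL for one run
# accession. The field list is an assumption, not something from the thread.
ena_filereport_url() {
    local accession="$1"
    printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=run_accession,submitted_ftp,fastq_ftp&format=tsv\n' "$accession"
}

# Hypothetical helper: read a TSV file report on stdin and print only the
# entries whose name mentions "fast5". Entries within one column are
# separated by ';', so tabs and semicolons are both split onto new lines.
filter_fast5() {
    tr '\t;' '\n\n' | grep -i 'fast5' || true
}

# Example usage (needs network access, so it is left commented out):
#   curl -s "$(ena_filereport_url SRR5286963)" | filter_fast5
# An empty result would mean no FAST5 files are listed for that run.
```

If the filter prints nothing, only FASTQ or SRA files are listed for that run, and the raw signal data would have to come from the authors.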

Here's a tutorial that might be interesting for you: basically, you first get the archive from ENA, then find the FAST5 files in the top level of the downloaded archive, and finally process them in R using Fast5Summary objects from the IONiseR package.
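Before moving to R, it can help to check what a downloaded archive actually contains. The following is a minimal sketch that assumes the download is a tar archive (possibly compressed); `list_fast5_in_archive` is a hypothetical helper name and the archive path in the usage line is a placeholder.

```shell
#!/usr/bin/env bash

# Hypothetical helper: list the members of a tar archive whose names end in
# ".fast5", without extracting anything. Assumes a tar implementation that
# detects compression automatically when listing (GNU tar and bsdtar do).
list_fast5_in_archive() {
    local archive="$1"
    tar -tf "$archive" | grep -i '\.fast5$' || true
}

# Example usage (placeholder path):
#   list_fast5_in_archive downloaded_archive.tar.gz | head
```

An empty listing suggests the archive holds only basecalled sequences, in which case there is nothing for the FAST5-based steps of the tutorial to read.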

aechchiki
  • Thanks, great idea! However, on ENA I can only download fastq or the SRA file or am I missing something? – benn Nov 27 '17 at 15:46
  • I guess there are no fast5 files associated with SRR5286963 (simple filtered search), so maybe the easiest solution would be to ask the authors directly. There is a possibility that they uploaded only the fastq (e.g. from poretools extraction) instead of the fast5, because of the archive size. – aechchiki Nov 27 '17 at 16:16