Most Popular

1500 questions
5 votes, 1 answer

Is there a standard tool used to convert a VCF to a BEDPE?

Many popular SV callers output a VCF. Unfortunately, there is currently no unified system for labeling events with the same notation. However, is there a standard method for converting these VCFs to BEDPEs? svtools comes to mind:…
EB2127
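To make the conversion concrete, here is a minimal sketch for the simplest case only. It is not svtools' implementation. It handles an SV record such as a deletion that carries SVTYPE and END in its INFO field. Breakend (BND) notation and caller-specific tags are what real converters struggle with, and those are deliberately not handled. The function name is hypothetical, and so is the choice of putting ID in the BEDPE name column and QUAL in the score column.

```python
def vcf_record_to_bedpe(line):
    """Convert one simple SV line from a VCF (e.g. a DEL with an END tag) to BEDPE.

    Hypothetical helper for illustration: it ignores CIPOS/CIEND confidence
    intervals, leaves both strand columns as ".", and rejects BND records.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, sv_id, qual = fields[0], int(fields[1]), fields[2], fields[5]
    info = {}
    for item in fields[7].split(";"):
        key, _, value = item.partition("=")  # flags such as IMPRECISE get ""
        info[key] = value
    if info.get("SVTYPE") == "BND" or "END" not in info:
        raise ValueError("this sketch only handles SVs described by POS and END")
    end = int(info["END"])
    # VCF positions are 1-based; BEDPE intervals are 0-based and half-open,
    # so the 1-based base p becomes the interval [p - 1, p).
    columns = [chrom, pos - 1, pos, chrom, end - 1, end, sv_id, qual, ".", "."]
    return "\t".join(str(c) for c in columns)
```

Each breakpoint is reported as a single-base interval. A fuller converter would widen these intervals using CIPOS/CIEND.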
5 votes, 1 answer

NaN values after ComBat analysis on TCGA COAD RNAseq

I have FPKM-UQ data from COAD-TCGA. I generated an expression set of this data using:
> edata = log2(data + 1)
> edata[1:2,1:2]
X01240896.3f3f.4bf9.9799.55c87bfacf36 1 8.540967 10 …
Martin
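ComBat often returns NaN when a gene is constant (zero variance) within one of the batches. One way this happens is a gene with zero FPKM-UQ in every sample of a batch, which stays 0 after log2(data + 1). The excerpt doesn't show whether that applies here. So the following is only a hedged pre-check, and it is written in Python rather than the R used in the question. It flags such genes so they can be removed before running ComBat. It does not run ComBat itself. The function name and the dict-of-lists input layout are assumptions.

```python
from statistics import pvariance


def constant_within_batch(expr, batches):
    """Return the genes whose values are constant in at least one batch.

    expr: dict mapping gene name -> list of log2 expression values, one per sample
    batches: list of batch labels, in the same sample order as the value lists
    Batches with a single sample are skipped, since a variance cannot be
    meaningfully estimated from one value.
    """
    flagged = []
    for gene, values in expr.items():
        by_batch = {}
        for value, batch in zip(values, batches):
            by_batch.setdefault(batch, []).append(value)
        if any(len(v) > 1 and pvariance(v) == 0 for v in by_batch.values()):
            flagged.append(gene)
    return flagged
```

In R, the same filter would be applied to the rows of edata before ComBat is called.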
5 votes, 2 answers

HDF5 and BioSQL solutions

I'm looking at better database/storage solutions for NCBI virus data, with all attributes, particularly year and country of isolation, together with structural data, possible antibody data, T-cell data and bioinformatics data such as %GC, etc. The…
M__
5 votes, 1 answer

PCA vs tSNE in single cell RNA-seq

What makes tSNE preferable to PCA as the dimensionality reduction method for visualization in single-cell RNA-seq? I am aware that tSNE is better at showing local structure but fails to capture the global structure of the data. But I think I don't fully…
plat
5 votes, 3 answers

How can I calculate coverage at single bases using a BAM file?

I'm looking for a way to input a VCF or BED file (with specific base positions) and a BAM file, and get the coverage at each base position (i.e. single-base bins) from the BAM file. I also want the strand information, so the ideal output would be…
Frances K
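Below is a minimal sketch of the per-base, per-strand counting logic. It assumes the reads have already been loaded from the BAM file, for example with pysam. There, AlignedSegment.get_blocks() returns the aligned 0-based (start, end) blocks of a read and .is_reverse gives its strand. To keep the example self-contained, reads are plain tuples here. Positions taken from a VCF are 1-based, so subtract 1 first. Deleted and spliced positions are not counted because they fall between blocks. No base-quality or mapping-quality filtering is applied. The function name is hypothetical.

```python
from collections import Counter


def strand_coverage(reads, positions):
    """Count forward/reverse coverage at each requested 0-based position.

    reads: iterable of (blocks, is_reverse), where blocks is a list of 0-based,
           half-open (start, end) aligned intervals, mirroring what pysam's
           get_blocks() and is_reverse would provide for each read.
    positions: iterable of 0-based reference positions (e.g. from a BED file).
    Returns {position: (forward_count, reverse_count)}.
    """
    wanted = set(positions)
    fwd, rev = Counter(), Counter()
    for blocks, is_reverse in reads:
        counter = rev if is_reverse else fwd
        for start, end in blocks:
            for pos in wanted:
                if start <= pos < end:
                    counter[pos] += 1
    return {pos: (fwd[pos], rev[pos]) for pos in sorted(wanted)}
```

The loop checks every requested position against every block. This is fine for a handful of sites. For many sites, fetching only the reads overlapping each site from an indexed BAM would scale better.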
5 votes, 3 answers

Best distance parameter for estimating physical interaction between residues in a PDB file

We can calculate the distance between residues in a PDB file using different parameters, such as closest atoms, alpha carbon, beta carbon, centroid, etc. Which of these parameters is best for showing physical interaction between residues in a…
Sara
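The distance definitions listed in the question can be compared side by side with a small helper. This sketch does not say which definition is best. It only shows how the definitions are computed and how they can disagree for the same pair of residues. It assumes each residue's atoms have already been parsed from the PDB file into a dict of atom name to coordinates, using any PDB parser. The helper name is hypothetical.

```python
import math


def residue_distances(res_a, res_b):
    """Compare several residue-residue distance definitions (in angstroms).

    res_a, res_b: dicts mapping atom name -> (x, y, z) coordinates.
    Entries are None when the needed atom is absent (glycine, for
    instance, has no CB atom).
    """
    def centroid(res):
        coords = list(res.values())
        return tuple(sum(c[i] for c in coords) / len(coords) for i in range(3))

    def named_atom_distance(name):
        if name in res_a and name in res_b:
            return math.dist(res_a[name], res_b[name])
        return None

    return {
        # smallest distance over all atom pairs between the two residues
        "closest_atoms": min(math.dist(p, q)
                             for p in res_a.values() for q in res_b.values()),
        "CA": named_atom_distance("CA"),
        "CB": named_atom_distance("CB"),
        # unweighted geometric centre of all listed atoms of each residue
        "centroid": math.dist(centroid(res_a), centroid(res_b)),
    }
```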
5 votes, 2 answers

What does PCA mean in GWAS?

I understand what GWAS is and I'm able to perform certain tests with the p-values, etc. But what I am having a hard time wrapping my head around is what PCA in GWAS means. So let's say I have 100,000 individuals and genotype data for 10 million…
Jonathan
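In GWAS, PCA is normally run on the individuals × SNPs genotype matrix of 0/1/2 allele counts. Each individual gets a score on each component. The top components tend to reflect population structure (ancestry) and are typically included as covariates in the association test. The toy sketch below computes only the first component for a small in-memory matrix, using power iteration. It applies one common per-SNP standardization, (g - 2p) / sqrt(2p(1 - p)). It is meant to show what "PCA on genotypes" computes. For 100,000 individuals you would use dedicated tools such as PLINK's --pca instead. The function name is hypothetical.

```python
import math
import random


def first_pc_scores(genotypes, iterations=200, seed=0):
    """Scores of each individual on the first principal component.

    genotypes: list of rows, one per individual, each a list of 0/1/2 allele
    counts per SNP. Toy-sized, in-memory illustration only.
    """
    n = len(genotypes)
    columns = []  # standardized SNP columns of X (individuals x SNPs)
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2 * n)  # frequency of the counted allele
        if 0 < p < 1:  # monomorphic SNPs carry no information; skip them
            scale = math.sqrt(2 * p * (1 - p))
            columns.append([(g - 2 * p) / scale for g in col])
    if not columns:
        raise ValueError("no polymorphic SNPs")

    # Power iteration on X X^T (n x n): v converges to its leading eigenvector,
    # i.e. the unit-normalized PC1 scores of the individuals.
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iterations):
        u = [sum(c[i] * v[i] for i in range(n)) for c in columns]  # X^T v
        w = [sum(c[i] * u[k] for k, c in enumerate(columns))        # X u
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v
```

The sign of a principal component is arbitrary. What matters is that individuals with similar ancestry get similar scores.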
5 votes, 4 answers

Range overlap python error with genomic regions

I have two files:

s3.txt:
1 10 20
1 5 20
2 20 30
2 25 30
1 10 50
2 20 60
1 14 17

s4.txt:
1 10 20
2 20 30

I am trying to match col0 of both files and get the rows that fall within the range (endpoints inclusive) 10-20…
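The excerpt is cut off, so this sketch assumes "fall between" means the s3 interval lies completely inside an s4 interval with the same col0 (chromosome), with both endpoints inclusive. The function names are hypothetical. The chromosome column is kept as a string, so labels like chr1 also work.

```python
def read_rows(path):
    """Read whitespace-separated chrom/start/end rows from a text file."""
    rows = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 3:
                rows.append((fields[0], int(fields[1]), int(fields[2])))
    return rows


def rows_within(query_rows, target_rows):
    """Keep query rows whose [start, end] lies inside a target range on the same chromosome.

    Boundaries are inclusive, so a row identical to a target range is kept.
    """
    kept = []
    for chrom, start, end in query_rows:
        for t_chrom, t_start, t_end in target_rows:
            if chrom == t_chrom and t_start <= start and end <= t_end:
                kept.append((chrom, start, end))
                break  # avoid duplicates when several target ranges match
    return kept
```

With the files above, the call would be rows_within(read_rows("s3.txt"), read_rows("s4.txt")). It keeps 1 10 20, 2 20 30, 2 25 30 and 1 14 17. For large files, an interval tree or bedtools intersect would avoid the nested loop.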
5 votes, 1 answer

Duplicate long hits from PSI-BLAST

I had a protein RefSeq ID and PSI-BLASTed this sequence against the RefSeq database. We all know that RefSeq is a reference sequence database and shouldn't have redundancy. After BLASTing my sequence, at the first iteration I got 1000 hits and…
Sara
5 votes, 2 answers

What does "fetching by region is not available for SAM files" mean?

I am used to gzip/Biopython solutions when dealing with sequencing data, but now I wish to switch to the more elegant pysam. So I looked at the manual, but ran into quite bizarre trouble with the first couple of lines using my BAM file import…
Kamil S Jaron
5 votes, 1 answer

How can I ask snakemake to produce a DAG where each node represents a rule?

I am using snakemake to create workflows. It is very convenient to visualise my DAG using snakemake --dag target{sampleA,sampleB,sampleC}File | dot -Tpdf > dag.pdf. The resulting pdf shows all the rules' dependencies to get to the target files.…
Biomagician
5 votes, 3 answers

Access base aligned to particular reference position

The short version: If I have a SAM record, is there any simple way to retrieve the base aligned to a particular reference position without computing a pileup? The long version: I'm using pysam to write some genotyping code. I have a BAM file with…
Daniel Standage
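One way to get the aligned base without a pileup is to walk the record's CIGAR string. The sketch below is a plain-Python version. It assumes a 0-based reference start, as in pysam's reference_start (SAM POS - 1), the CIGAR as a string, and the stored query sequence. The stored sequence includes soft-clipped bases but not hard-clipped ones. pysam's AlignedSegment.get_aligned_pairs() returns (query position, reference position) pairs and could serve the same lookup. The helper name here is hypothetical.

```python
import re

# CIGAR operations that consume the query and/or the reference (SAM spec).
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REF = set("MDN=X")


def base_at_ref_pos(ref_start, cigar, seq, ref_pos):
    """Return the read base aligned to 0-based ref_pos, or None.

    None is returned when ref_pos falls in a deletion (D), a skipped
    region (N), or outside the aligned part of the read.
    """
    r, q = ref_start, 0  # current reference and query offsets
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op in CONSUMES_REF and op in CONSUMES_QUERY:
            if r <= ref_pos < r + length:
                return seq[q + ref_pos - r]
        elif op in CONSUMES_REF and r <= ref_pos < r + length:
            return None  # reference base with no aligned read base
        if op in CONSUMES_REF:
            r += length
        if op in CONSUMES_QUERY:
            q += length
    return None
```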
5 votes, 1 answer

Displaying soft-clipped nucleotides in samtools tview

There are nicer genomics visualization tools available, but the samtools tview command is almost always my go-to for a quick first look at read alignments. I just brought up the following locus in tview. 81055951 81055961 81055971 81055981 …
Daniel Standage
5 votes, 1 answer

Splitting fasta file into smaller files based on header pattern

I have to split these FASTA files into smaller files and write them into individual files. My files look like this: >lcl|CP000522.1_prot_ABO13860.1_1 [locus_tag=A1S_3471] [protein=hypothetical protein] [protein_id=ABO13860.1] [location=1..957]…
kcm
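The header pattern to split on is cut off in the excerpt. Purely for illustration, the following sketch assumes records should be grouped by the value of the [protein_id=...] tag. Any regular expression with one capture group can be swapped in. The function names are hypothetical. Group keys are used directly as file names, so they may need sanitizing if they contain characters such as "|" or "/".

```python
import re
from collections import defaultdict


def group_fasta(lines, pattern=r"\[protein_id=([^\]]+)\]"):
    """Group FASTA records by the first capture group of `pattern` in each header.

    Records whose header does not match go under the key "unmatched";
    any lines before the first header are ignored.
    Returns {key: [lines of all records with that key]}.
    """
    regex = re.compile(pattern)
    groups = defaultdict(list)
    key = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            match = regex.search(line)
            key = match.group(1) if match else "unmatched"
        if key is not None:
            groups[key].append(line)
    return dict(groups)


def write_groups(groups, suffix=".fasta"):
    """Write each group to its own file named <key><suffix>."""
    for key, record_lines in groups.items():
        with open(key + suffix, "w") as out:
            out.write("\n".join(record_lines) + "\n")
```

Typical use would be to open the input file and pass the file handle to group_fasta, then pass the result to write_groups.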
5 votes, 1 answer

Looking for a tool to calculate phase/switch error rate

I'm looking for a tool which, given a truth VCF file and a test VCF file, calculates the phase/switch error rate. I performed phasing of a VCF using WhatsHap and want to compare the outcome to a ground-truth phased VCF I have. I can't find a…
Wouter De Coster
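WhatsHap itself includes a compare subcommand for comparing phasings, which may be the easiest route. To show what the metric counts, here is a minimal sketch. It assumes phased genotypes have already been pulled from both VCFs into dicts mapping position to GT string (e.g. "0|1"). A switch error is counted wherever the relative phase of two consecutive compared heterozygous sites differs between truth and test. The rate uses one common normalization, switches / (compared sites - 1). Phase sets (PS tags) are ignored, so the whole chromosome is treated as one block. The function name is hypothetical.

```python
def switch_error_rate(truth, test):
    """Switch errors between two phasings of the same heterozygous sites.

    truth, test: dicts mapping position -> phased genotype string such as "0|1".
    Only sites phased and heterozygous in both inputs, with the same pair of
    alleles, are compared.
    Returns (switch_errors, compared_sites, rate); rate is None if < 2 sites.
    """
    orientations = []
    for pos in sorted(set(truth) & set(test)):
        if "|" not in truth[pos] or "|" not in test[pos]:
            continue  # unphased in at least one input
        t, s = truth[pos].split("|"), test[pos].split("|")
        if t[0] == t[1] or sorted(t) != sorted(s):
            continue  # homozygous, or the two calls disagree on the alleles
        orientations.append(s == t)  # True: same orientation, False: flipped
    switches = sum(a != b for a, b in zip(orientations, orientations[1:]))
    compared = len(orientations)
    rate = switches / (compared - 1) if compared > 1 else None
    return switches, compared, rate
```

A stretch of sites that is flipped as a block, as at positions 200 and 300 in a call like switch_error_rate({100: "0|1", 200: "0|1", 300: "1|0", 400: "0|1"}, {100: "0|1", 200: "1|0", 300: "0|1", 400: "0|1"}), counts as two switches: one entering the flipped stretch and one leaving it.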