Most Popular
1500 questions
5 votes, 1 answer
Is there a standard tool used to convert a VCF to a BEDPE?
Many popular SV callers output a VCF. Unfortunately, there is currently no unified notation for labeling events.
However, is there a standard method for converting these VCFs to BEDPEs?
svtools comes to mind:…
EB2127
5 votes, 1 answer
NaN values after ComBat analysis on TCGA COAD RNAseq
I have FPKM-UQ data from COAD-TCGA.
I generated an expression set of this data using:
> edata = log2(data + 1)
> edata[1:2,1:2]
X01240896.3f3f.4bf9.9799.55c87bfacf36
1 8.540967
10 …
Martin
5 votes, 2 answers
HDF5 and BioSQL solutions
I'm looking at better database/storage solutions for NCBI virus data, with all attributes (particularly year and country of isolation), together with structural data, possible antibody data, T-cell data and bioinformatics data such as %GC, etc. The…
M__
5 votes, 1 answer
PCA vs tSNE in single cell RNA-seq
What makes tSNE the preferred dimensionality reduction method for visualization in single-cell RNA-seq over PCA?
I am aware that tSNE works better at showing local structures and fails to capture global structures of the data.
But I think I don't fully…
plat
5 votes, 3 answers
How can I calculate coverage at single bases using a bam file?
I'm looking for a way to input a VCF or BED file (with specific base positions) and a BAM file, and get the coverage at each base position (i.e. single-base bins) from the BAM file. I also want the strand information, so the ideal output would be…
Frances K
5 votes, 3 answers
Best distance parameter for estimating physical interaction between residues in a PDB file
We can calculate the distance between residues in a PDB file using different parameters such as closest atoms, alpha carbon, beta carbon, centroid, etc. Which of these parameters is best for showing physical interaction between residues in a…
Sara
5 votes, 2 answers
What does PCA mean in GWAS?
I understand what GWAS is and I'm able to perform certain tests with the p-values, etc. But what I am having a hard time wrapping my head around is what PCA on GWAS means.
So let's say I have 100,000 individuals and genotype data for 10 million…
Jonathan
5 votes, 4 answers
Range overlap python error with genomic regions
I have two files
s3.txt :
1 10 20
1 5 20
2 20 30
2 25 30
1 10 50
2 20 60
1 14 17
s4.txt:
1 10 20
2 20 30
I am trying to match col0 of both files and get the rows that fall within the range (endpoints included) 10-20…
novicebioinforesearcher
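The excerpt above is truncated, so the exact matching rule is an assumption. One reading is: for each interval in s4.txt, keep the s3.txt rows with the same col0 whose start and end both lie inside that interval, endpoints included. A minimal Python sketch of that interpretation follows; the two files are inlined as strings, and the helper names `parse_rows` and `rows_within` are hypothetical.

```python
# Sample data copied from the question; inlined so the sketch is self-contained.
S3 = """\
1 10 20
1 5 20
2 20 30
2 25 30
1 10 50
2 20 60
1 14 17
"""
S4 = """\
1 10 20
2 20 30
"""


def parse_rows(text):
    """Parse whitespace-separated 'col0 start end' lines into tuples."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            chrom, start, end = line.split()
            rows.append((chrom, int(start), int(end)))
    return rows


def rows_within(candidates, targets):
    """Return candidate rows fully contained in some target interval.

    Assumed rule: same col0, and target_start <= start and end <= target_end.
    """
    hits = []
    for chrom, start, end in candidates:
        for t_chrom, t_start, t_end in targets:
            if chrom == t_chrom and t_start <= start and end <= t_end:
                hits.append((chrom, start, end))
                break  # report each candidate row at most once
    return hits


matches = rows_within(parse_rows(S3), parse_rows(S4))
for row in matches:
    print(*row)
```

The nested loop is fine for files of this size; for large inputs, an interval tree or a sorted sweep would be the usual replacement.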
5 votes, 1 answer
Duplicate long hits from PSI-BLAST
I had a protein RefSeq ID and ran PSI-BLAST with this sequence against the RefSeq database. RefSeq is a reference sequence database, so it shouldn't contain redundancy. After the first iteration I got 1000 hits and…
Sara
5 votes, 2 answers
What does "fetching by region is not available for SAM files" mean?
I am used to gzip/biopython solutions when dealing with sequencing data, but now I wish to switch to the more elegant pysam. So I looked at the manual, but ran into some bizarre trouble with the first couple of lines when using my BAM file
import…
Kamil S Jaron
5 votes, 1 answer
How can I ask snakemake to produce a dag where each node represents a rule?
I am using snakemake to create workflows. It is very convenient to visualise my DAG using snakemake --dag target{sampleA,sampleB,sampleC}File | dot -Tpdf > dag.pdf. The resulting pdf shows all the rules' dependencies to get to the target files.…
Biomagician
5 votes, 3 answers
Access base aligned to particular reference position
The short version: If I have a SAM record, is there any simple way to retrieve the base aligned to a particular reference position without computing a pileup?
The long version: I'm using pysam to write some genotyping code. I have a BAM file with…
Daniel Standage
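The lookup asked about above can be sketched without a pileup by walking the record's CIGAR string. The following is a pure-Python illustration, not pysam's API: the function name `base_at_ref_pos` and its arguments (0-based leftmost reference position, CIGAR string, read sequence, target reference position) are assumptions made for the example.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_QUERY = set("MIS=X")  # operations that advance along the read
CONSUMES_REF = set("MDN=X")    # operations that advance along the reference


def base_at_ref_pos(ref_start, cigar, seq, target):
    """Return the read base aligned to reference position `target`.

    Positions are 0-based. Returns None when the read does not cover
    `target`, or covers it only with a deletion or skipped region.
    """
    ref_pos = ref_start
    query_pos = 0
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        in_ref = op in CONSUMES_REF
        in_query = op in CONSUMES_QUERY
        if in_ref and ref_pos <= target < ref_pos + length:
            if in_query:
                return seq[query_pos + (target - ref_pos)]
            return None  # target falls inside a deletion or skip
        if in_ref:
            ref_pos += length
        if in_query:
            query_pos += length
    return None  # target lies outside the aligned span


# Made-up record for illustration: 2 soft-clipped bases, 3 matches,
# 1 inserted base, 2 matches, a 2-base deletion, then 3 matches.
example_base = base_at_ref_pos(100, "2S3M1I2M2D3M", "NNACGTTAGCC", 101)
print(example_base)
```

When working with pysam records directly, `AlignedSegment.get_aligned_pairs()` exposes the same query-to-reference mapping, so a hand-written CIGAR walk like this is mainly useful for understanding what that mapping contains.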
5 votes, 1 answer
Displaying soft-clipped nucleotides in samtools tview
There are nicer genomics visualization tools available, but the samtools tview command is almost always my go-to for a quick first look at read alignments. I just brought up the following locus in tview.
81055951 81055961 81055971 81055981 …
Daniel Standage
5 votes, 1 answer
Splitting fasta file into smaller files based on header pattern
I have to split this FASTA file into smaller files and write each one to an individual file. My file:
>lcl|CP000522.1_prot_ABO13860.1_1 [locus_tag=A1S_3471] [protein=hypothetical protein] [protein_id=ABO13860.1] [location=1..957]…
kcm
5 votes, 1 answer
Searching for a tool to calculate phase/switch error rate
I'm looking for a tool which, given a truth vcf file and a test vcf file, calculates the phase/switch error rate. I performed phasing of a vcf using WhatsHap and want to compare the outcome to some ground truth phased vcf I have. I can't find a…
Wouter De Coster