Most Popular

1500 questions
8
votes
2 answers

What is 'k' in sequencing?

When a DNA sequence is sequenced, I've only ever dealt with A, T, C, G and N, which indicates unidentifiable bases. However, I came across a 'k' recently. I asked another researcher, who gave me an answer for what 'k' represents, but I don't quite…
Jonathan
  • 341
  • 2
  • 10
7
votes
1 answer

How to simulate "base error rate" in art_illumina?

I'd like to simulate a 10% sequencing error rate using art_illumina. The simulator doesn't have a parameter that accepts the 10% directly, but it has these: -qs --qShift the amount to shift every first-read quality score by; -qs2 --qShift2 the amount to…
SmallChess
  • 2,699
  • 3
  • 19
  • 35
7
votes
0 answers

Minimizing particle-flow grid mapping time of a protein surface using spherical-coordinates and differentiation encoding

This is an interesting problem - I was wondering if anyone has a creative solution. I have a vector of vertices representing atoms in a protein, as well as 6 variables containing the absolute minimum/maximum bounds of the set in each direction. I…
7
votes
1 answer

Trouble using biomaRt to retrieve hgnc symbols from Ensembl transcript ids

I have a matrix of gene counts which I'm going to use as input for DESeq. Right now, each gene is labeled by its Ensembl transcript ID, but I'd like to convert these to their HGNC symbols before I input them into DESeq for analysis. I'm attempting…
J0HN_TIT0R
  • 541
  • 1
  • 4
  • 7
7
votes
3 answers

How does one construct a cladogram of intraspecies relationships?

I have SNP data from several cultivars of rice which I have used to produce alignments, but I don't think that the usual models and algorithms used for generating phylogenetic trees are appropriate, because these cultivars are not the result of…
twmccart
  • 123
  • 1
  • 5
7
votes
1 answer

How to set a database other than nr for a remote BLAST+ search

I am attempting to run a BLAST search remotely using BLAST+. I can get the search to work correctly at the command line with the following command: blastp -query proteins.fasta -remote -db nr -out proteins_nr.txt -outfmt 6 -evalue 1e-30 However, I…
bluescholar1212
  • 421
  • 2
  • 10
7
votes
1 answer

What is a good pipeline for using public domain exomes as controls?

I'm currently attempting association analysis with an extremely small set of patient exomes (n=10), with no control or parental exomes available. Downloading the ExAC VCF of variant sites (http://exac.broadinstitute.org/downloads) or the 1000G…
carsweshau
  • 71
  • 2
7
votes
4 answers

How to correlate two zero-inflated bedgraph-like signals?

This question pertains to iCLIP, but it could just as easily be ChIP-seq or ATAC-seq or mutation frequencies. I have iCLIP read counts across the transcriptome and I wish to know if the signals are correlated - that is, where one of them is high,…
Ian Sudbery
  • 3,311
  • 1
  • 11
  • 21
7
votes
2 answers

Finding homologs of a protein sequence

I have a RefSeq ID of a protein from E. coli and I want to find homologs of this protein. I ran BLAST against the RefSeq database, but I got a lot of sequences, most of which were again from E. coli. I decided to run PSI-BLAST to get more divergent species,…
Sara
  • 777
  • 1
  • 6
  • 18
7
votes
3 answers

GRCh38 VCF file with common cancer mutations

Is there a VCF file on the GRCh38 assembly with common cancer mutations that I can download somewhere? Maybe from one of the big international cancer genomics consortia? By common, I mean whichever mutations have been found to be recurrent in different types…
719016
  • 2,324
  • 13
  • 19
7
votes
1 answer

Why are my kallisto and salmon results differing so much just for lncRNA transcripts?

I am running some analysis on an RNA-seq dataset. I have a list of transcripts that are potential lncRNA for which I ran both Kallisto and Salmon aligners. The input data for index building and quantification includes mRNA as well as these potential…
7
votes
2 answers

Are phylogenetic tree construction algorithms any different than general clustering algorithms?

Are phylogenetic tree construction algorithms any different from clustering algorithms? I suspect the answer is no. Of course phylogenetic tree construction uses biological knowledge, e.g. special distance metrics, but does it bring anything new to…
Ahmed Abdullah
  • 367
  • 2
  • 8
7
votes
1 answer

Get RefSeq accession numbers with versions

Google searching for NM_002084 gives the following result: NM_002084.4 This, I assume, is the latest version v4, hence the .4 suffix. Searching for previous versions I get the following results, along with notes saying it was updated or…
zx8754
  • 1,042
  • 8
  • 22
7
votes
2 answers

How to select a cutoff for interaction confidence in STRINGdb?

I have a list of 100 genes that were called as hits in a genetic screen. I want to build a network of the interactions between the proteins of these 100 genes. I am using both the STRINGdb web interface and its R API. In both situations you can select the…
plat
  • 1,032
  • 5
  • 15
7
votes
2 answers

Definition of "seed" in sequence alignment

I would like to know what is meant by "seed" for various sequence aligners. How is it important?
user3138373
  • 420
  • 1
  • 5
  • 13