Most Popular

1500 questions
3
votes
1 answer

Protein SCA and DCA

Could you please suggest suitable R packages for statistical coupling and direct coupling analysis of protein sequences? I know there are Python packages for those: SCA and DCA. I was looking for similar packages in R so that I can continue with my…
Gyrtle
  • 31
  • 5
3
votes
2 answers

How to merge .fastq.gz files with the same ID into a single .fastq.gz, in parallel, without losing any content

I have a large number of .fastq.gz files from different lanes and reads. I have to merge the files of each read group into a single .fastq.gz file. E.g. 1st…
Nitha
  • 73
  • 1
  • 1
  • 6
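One way to approach the merge described in this question is sketched below. It relies on the fact that gzip files concatenated byte for byte form a valid multi-member gzip file, so no decompression is needed. The file naming pattern (`<sample>_L001_R1_001.fastq.gz`) and the function names are assumptions for illustration; the excerpt does not show the actual file names.

```python
import gzip
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: <sample>_L<3-digit lane>_<R1|R2>_001.fastq.gz
LANE_PATTERN = re.compile(r"^(?P<sample>.+)_L\d{3}_(?P<read>R[12])_001\.fastq\.gz$")


def group_by_sample_and_read(paths):
    """Group lane files by (sample, read); files not matching the pattern are skipped."""
    groups = defaultdict(list)
    # Sorting keeps lanes in the same order for R1 and R2, which preserves pairing.
    for path in sorted(Path(p) for p in paths):
        match = LANE_PATTERN.match(path.name)
        if match:
            groups[(match["sample"], match["read"])].append(path)
    return groups


def merge_gzip_files(inputs, output):
    """Concatenate gzip files byte for byte into one multi-member gzip file."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

Each (sample, read) group is independent of the others, so the `merge_gzip_files` calls could be dispatched to separate workers if the merge needs to run in parallel.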
3
votes
1 answer

How to learn/get started with bioinformatics?

I know this is a broad question and I'm sorry if this is not the right place to post (this is my first post ever), but is there some sort of roadmap for bioinformatics? I'm an undergraduate student and I have a strong background in statistics,…
barzilay
  • 31
  • 2
3
votes
1 answer

Database for germline copy number variations in cancer

I am interested in looking at germline copy number variations in individuals that are at high risk of developing cancer. Are there any databases where I can check whether a CNV seen in our test case has been seen/reported earlier in a patient with…
Aprasad
  • 31
  • 2
3
votes
1 answer

Merging transcriptomes coming from different experiments

I'm planning to build a transcriptome by pooling all existing transcriptomes in SRA for a non-model species (which has no reference genome) to study differentially expressed genes and the like. It might be worth mentioning that the transcriptomes I…
LinuxBlanket
  • 309
  • 1
  • 10
3
votes
2 answers

Frequency of specific viral sequence in .BAM or .fastq

I was wondering if anyone had any experience 'counting' the frequency of a particular viral sequence in an individual's fastq sequence. So basically, counting the number of occurrences of a particular sequence given someone's fastq file. The thing…
h3ab74
  • 836
  • 5
  • 14
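A minimal sketch of the counting described in this question, assuming plain four-line FASTQ records (no wrapped sequence lines) and exact matching on either strand. The function names are hypothetical, and exact substring matching will miss reads that carry mismatches, so an aligner may be more appropriate for real data.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def count_reads_with_sequence(fastq_handle, target):
    """Count reads containing `target` or its reverse complement.

    Returns (reads_with_hit, total_reads). Assumes each record spans
    exactly four lines, with the sequence on the second line.
    """
    target = target.upper()
    target_rc = reverse_complement(target)
    hits = 0
    total = 0
    for line_number, line in enumerate(fastq_handle):
        if line_number % 4 == 1:  # second line of each record is the sequence
            total += 1
            read = line.strip().upper()
            if target in read or target_rc in read:
                hits += 1
    return hits, total
```

For a gzipped file, the handle could be opened with `gzip.open(path, "rt")` and passed to `count_reads_with_sequence` unchanged.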
3
votes
3 answers

CDS length for each human gene

Does anyone know where and how I could download a list of all human genes and the length of the coding sequence for each gene? Is it possible to do this on the NCBI site or Ensembl?
3
votes
2 answers

What is the difference between local identity and global identity for homolog finding?

We believe that if, after running BLAST, the global identity between a resulting sequence and our query is at least 30%, we can say that the two sequences are homologs. What is the difference between local identity and global identity and how can we…
Sara
  • 777
  • 1
  • 6
  • 18
3
votes
2 answers

How can I find mutations associated with disease in human histone residues?

I would like to check whether there are mutations in residues of human histones associated with any disease. For instance, if a mutation in residue K6 (lysine 6) of histone H2A1A is associated with any human disease (with PubMed evidence, preferably),…
plat
  • 1,032
  • 5
  • 15
3
votes
2 answers

Dataset: Locations of regulatory sequences in the human genome?

I am looking for the positions of annotated regulatory sequences (promoters, enhancers and suppressors) in the human genome. I looked at the Ensembl Regulatory Build and PAZAR, but I am not used to looking for datasets and I failed to find what I was…
Remi.b
  • 203
  • 1
  • 8
3
votes
2 answers

Handling sample identity in aggregated 10x libraries?

cellranger aggr can combine multiple libraries (samples), and appends an integer to each barcode (e.g. AGACCATTGAGACTTA-1). The sample identity is not recorded in the combined matrix.mtx…
Peter
  • 2,634
  • 15
  • 33
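The barcode suffix described in this question can be turned back into a sample label with a small lookup, sketched below. It assumes the integer suffix is the 1-based position of the library in the list given to `cellranger aggr`; that mapping, the function name, and the sample labels are assumptions for illustration and should be checked against the aggregation input actually used.

```python
def sample_for_barcode(barcode, sample_ids):
    """Split 'SEQUENCE-N' and look up the sample at 1-based position N.

    `sample_ids` is assumed to list the samples in the same order as the
    libraries were passed to the aggregation step.
    """
    sequence, separator, suffix = barcode.rpartition("-")
    if not separator or not suffix.isdigit():
        raise ValueError(f"barcode has no integer suffix: {barcode!r}")
    index = int(suffix) - 1
    if not 0 <= index < len(sample_ids):
        raise ValueError(f"suffix {suffix} is outside the sample list")
    return sequence, sample_ids[index]
```

For example, with the hypothetical list `["control", "treated"]`, the barcode `AGACCATTGAGACTTA-1` would be assigned to `control`.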
3
votes
1 answer

Techniques for analyzing and quantifying sample bleed through in genotyping with Illumina

I am looking at the presence of viral genotypes within individual samples within an assay. Often there is a sample whose read counts are off the charts, and this sample tends to "bleed through" to the other samples. I have recently…
quantik
  • 255
  • 1
  • 9
3
votes
2 answers

How do you download stuff from NCBI fast?

I have 2 .csv files that contain a list of accession codes. For example, for this experiment a .csv file will…
0x90
  • 1,437
  • 9
  • 18
3
votes
1 answer

Proteins with one SS bond?

I would like to find proteins with exactly one SS bond. Is there a database where I can search for this? I've tried the advanced search on https://www.rcsb.org/, but there is no such option, or at least I could not find it.
Jake B.
  • 205
  • 1
  • 4
3
votes
0 answers

What could cause differing counts of R1 and R2 in paired-end sequencing (RNA-seq)?

I recently finished mapping an RNA-seq run using STAR 2.3.0 and noticed the read1 and read2 counts are different, according to samtools flagstat. The map% is ~80% but the R1 and R2 counts are: R1=139283692, R2=137472495. This usually does not happen for PE…
d_kennetz
  • 631
  • 5
  • 17