Most Popular
1500 questions
4
votes
3 answers
Finding orthologues using BLAST on the NCBI database
I'm an informatics student with essentially zero knowledge of biology. I BLASTed my gene and got thousands of results with very low E-values. Where do I go from here if I want to find orthologues?
dark1thought
4
votes
1 answer
Compare two networks of the same genes between two species
I had a set of genes for honey bees. I found the homologues of those genes in the house fly and created a network of the honey bee genes and a network of the homologous fly genes using STRING.
I know that I could do a topological comparison between…
The Last Word
4
votes
1 answer
What is a "block-compressed" file in samtools?
SAMtools returned an error message for my gzipped genome FASTA:
Indexed block-compressed FASTA file cannot be handled
The source code for the message is here.
// check if the it is a valid block-compressed file
if (IOUtil.isBlockCompressed(path,…
SmallChess
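For context, "block-compressed" in this ecosystem usually refers to BGZF, a gzip-compatible format made of independently compressed blocks. The sketch below shows one way to tell a BGZF block header from a plain gzip header. It assumes the header layout described in the BGZF section of the SAM specification (a gzip extra subfield tagged `B`, `C`); the function name `is_bgzf_header` and the constant `BGZF_EOF` are illustrative names, not part of any library.

```python
import struct

# The 28-byte empty BGZF block that conventionally terminates a BGZF file.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600"
    "424302001b00" "0300" "00000000" "00000000"
)


def is_bgzf_header(header: bytes) -> bool:
    """Return True if `header` looks like the start of a BGZF block."""
    if len(header) < 12:
        return False
    # gzip magic bytes, deflate method (8), and the FEXTRA flag bit (4) set
    if header[0] != 0x1F or header[1] != 0x8B or header[2] != 8:
        return False
    if not header[3] & 4:
        return False
    # XLEN: length of the gzip extra field, little-endian, at offset 10
    (xlen,) = struct.unpack_from("<H", header, 10)
    extra = header[12:12 + xlen]
    # Walk the extra subfields looking for SI1='B' (66), SI2='C' (67), SLEN=2
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if si1 == 66 and si2 == 67 and slen == 2:
            return True
        pos += 4 + slen
    return False
```

To check a real file, one could pass its first few dozen bytes, e.g. `is_bgzf_header(open(path, "rb").read(64))`, where `path` is a placeholder. A file written by ordinary `gzip` has no such extra subfield and would return `False`.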
4
votes
1 answer
Problem: "Pair-end" reads scRNA seq data (Drop-seq)
In the case of Drop-seq, we have paired-end data.
Read 1: Cell code + UMI (unique molecule identifier)
Read 2: The transcript information
But I have a problem/doubt with the sample I am working on.
The sample I am using is the following (Check the…
jayesh kumar
4
votes
1 answer
Error correction within the long read
I am going to get some data from plasmid sequencing to identify SNPs on the plasmids. What is done in the lab is the following:
The plasmids are purified by size.
We amplify the plasmids using the phi29 polymerase. The polymerase will go through…
Praderas
4
votes
2 answers
Group genes by functional categories, summing expression values
Using the RPKM values of genes in a metagenome sample, I want to group these genes into established categories (for example KEGG or COG). For each sample, my goal is to determine which categories are better represented in each…
F.Lira
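The grouping described in this question can be sketched as a simple aggregation. This is a minimal illustration, assuming each gene's RPKM value and its category annotations are already available as dictionaries; the function name `sum_by_category`, the `"unassigned"` label, and the example identifiers are hypothetical. It also assumes a gene annotated with several categories contributes its full value to each of them, which is a design choice rather than an established rule.

```python
from collections import defaultdict


def sum_by_category(expression, gene_to_categories):
    """Sum per-gene expression values into per-category totals.

    expression: dict mapping gene ID -> expression value (e.g. RPKM)
    gene_to_categories: dict mapping gene ID -> list of category IDs
    """
    totals = defaultdict(float)
    for gene, value in expression.items():
        # Genes without an annotation are pooled under "unassigned"
        for category in gene_to_categories.get(gene, ["unassigned"]):
            totals[category] += value
    return dict(totals)


# Hypothetical toy data for one sample
sample = {"geneA": 10.0, "geneB": 5.0, "geneC": 2.5}
annotation = {"geneA": ["K1"], "geneB": ["K1", "K2"]}
category_totals = sum_by_category(sample, annotation)
```

Running the function once per sample yields one category-by-sample table, which can then be compared across samples.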
4
votes
2 answers
Collapse cell barcodes distribution within 1 Hamming distance
I have a barcode distribution from single-cell data, e.g:
11612552 TCCTGAGCACTGCATAACTCAA
9349711 GCTACGCTACTGCATAAGTCCA
8343678 CAGAGAGGCTAAGCCTGCACAT
8161950 CGTACTAGTCTCTCCGCGGCTA
8102383 TCCTGAGCGTAAGGAGCAGATC
7110298…
gc5
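One possible way to collapse barcodes within 1 Hamming distance is a greedy pass from the most to the least abundant barcode, merging each barcode into an already accepted neighbour if one exists. The sketch below assumes equal-length barcodes over the alphabet `ACGTN`; the function names and the toy counts are hypothetical, and the greedy, non-transitive merging is an assumption about the desired behaviour, not the asker's method.

```python
def one_mismatch_variants(barcode, alphabet="ACGTN"):
    """Yield every sequence at Hamming distance exactly 1 from `barcode`."""
    for i, base in enumerate(barcode):
        for sub in alphabet:
            if sub != base:
                yield barcode[:i] + sub + barcode[i + 1:]


def collapse_barcodes(counts):
    """Greedily merge barcodes into more abundant neighbours.

    counts: dict mapping barcode -> read count
    Returns a dict mapping surviving barcode -> summed count.
    """
    collapsed = {}
    # Visit barcodes from most to least abundant (ties broken alphabetically)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for barcode, count in ordered:
        neighbours = [v for v in one_mismatch_variants(barcode) if v in collapsed]
        if neighbours:
            # Merge into the currently most abundant accepted neighbour
            parent = max(neighbours, key=lambda v: (collapsed[v], v))
            collapsed[parent] += count
        else:
            # No accepted neighbour within distance 1: keep as its own barcode
            collapsed[barcode] = count
    return collapsed


# Hypothetical toy counts
toy = {"AAAA": 100, "AAAT": 5, "AATT": 3, "CCCC": 50}
result = collapse_barcodes(toy)
```

Note that merging is not transitive here: `AATT` is one mismatch from `AAAT` but two from `AAAA`, so it stays separate once `AAAT` has been absorbed. Generating the variants and looking them up in a dictionary avoids comparing every pair of barcodes.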
4
votes
3 answers
How to convert a BED file to a FASTA file?
I have a BED file containing the start and end coordinates of a sequence, and I need to convert it to FASTA format. Any recommendations?
Nour El-Islam Awad
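A dedicated tool such as bedtools (its `getfasta` subcommand) is a common route for this conversion. To show what the conversion involves, here is a minimal pure-Python sketch. It assumes the reference sequences are already loaded into a dictionary keyed by chromosome name, and it relies on BED coordinates being 0-based with an exclusive end; the function name `bed_to_fasta` and the `chrom:start-end` header style are illustrative choices.

```python
def bed_to_fasta(bed_lines, genome):
    """Extract the interval of each BED record from `genome` as FASTA text.

    bed_lines: iterable of BED lines (chrom, start, end, ...)
    genome: dict mapping chromosome name -> sequence string
    """
    records = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the optional header/comment lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # BED is 0-based, half-open, which matches Python slicing directly
        sequence = genome[chrom][start:end]
        records.append(f">{chrom}:{start}-{end}\n{sequence}\n")
    return "".join(records)


# Hypothetical toy reference and BED record
toy_genome = {"chr1": "ACGTACGTAC"}
fasta_text = bed_to_fasta(["chr1\t2\t6"], toy_genome)
```

For real genomes, loading every chromosome into memory may be impractical, so an indexed reference would be the more realistic choice.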
4
votes
1 answer
Codon usage analysis for whole genomes
I am new to bioinformatics, so please forgive me if these questions seem a bit childish.
I have two queries.
I am intending to perform a codon usage analysis followed by correspondence analyses for multiple microbial whole genomes of one…
Furqan
4
votes
1 answer
Error given while trying to index a BAM file with Samtools Index - NO COOR?
I am currently working on my own metagenomic pipeline, using Bowtie 2 for mapping. Bowtie 2 outputs a SAM file, which I convert to BAM and sort using SAMtools. When I try to index the BAM file with SAMtools, it gives me this…
Haley
4
votes
2 answers
How to align output of grep --color=always? (To QC fasta/fastq files)
Grepping out short sequences from a fasta or fastq file is a really useful way to look at sequencing data. Using the option --color=always makes this even more useful, as you can visualize where the sequences appear in sequencing reads. For…
conchoecia
4
votes
1 answer
In calculating the retention index, why do we use the character state with the lowest frequency?
To calculate the retention index for a phylogenetic tree, we use the following formula:
$$\frac{\text{maximum number of steps on tree} - \text{number of steps on the tree}}
{\text{maximum number of steps on tree} - \text{minimum number of steps in the data}}$$
To…
Namenlos
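The formula quoted in this question translates directly into a small function. This is only a sketch of the arithmetic; the function and parameter names are illustrative, and the step counts in the example are made up to show the calculation.

```python
def retention_index(max_steps, observed_steps, min_steps):
    """Compute (max - observed) / (max - min) for step counts on a tree.

    max_steps: maximum number of steps possible on a tree
    observed_steps: number of steps on the tree being evaluated
    min_steps: minimum number of steps implied by the data
    """
    denominator = max_steps - min_steps
    if denominator == 0:
        # Undefined when the maximum and minimum coincide
        raise ValueError("retention index is undefined when max_steps == min_steps")
    return (max_steps - observed_steps) / denominator


# Hypothetical step counts: maximum 10, observed 6, minimum 4
example_ri = retention_index(10, 6, 4)  # (10 - 6) / (10 - 4)
```

The index reaches 1 when the observed number of steps equals the minimum and 0 when it equals the maximum.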
4
votes
1 answer
Machine learning using protein-sequences
I'm participating in a bioinformatics machine-learning seminar at my university. The main task is predicting binary classification of protein-protein interactions using sequence data as input.
One of the subtasks is familiarization with the dataset…
Olli B.
4
votes
1 answer
Tool to show a DNA sequence and allow upload of own graph data
Background
We want to be able to load (or request) data for a genome including the sequence and gene annotation (bacteria). Then, we want to load our own annotation which should be displayed as a line plot:
position score
1 5
2 …
KingBoomie
4
votes
1 answer
Why are my Chi-squared test results different from those in a published table?
I recently read the paper “A novel long non-coding RNA linc-ZNF469-3 promotes lung metastasis through miR-574-5p-ZEB1 axis in triple negative breast cancer”. In it, Table 1 shows the correlation of Linc-ZNF469-3 with different features in TNBC…
stack_learner