Highest Voted Questions - Bioinformatics Stack Exchange

4 votes, 3 answers

Finding orthologues using BLAST on the NCBI database

I'm an informatics student who has essentially zero knowledge of biology. I BLASTed my gene and have 1000s of results with very low E values. Where do I go from here if I want to find orthologues?

asked Nov 21 '18 at 15:43 by dark1thought (reputation 41; badges: 2)
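One common first heuristic (one of several) is the reciprocal best hit: search your gene against the other species, search that best hit back against your own species, and keep the pair as putative orthologues only if each is the other's top hit. The following is a minimal sketch. It assumes both searches were saved as BLAST tabular output (`-outfmt 6`, default 12 columns) and that hits are ranked by bit score. The function names and gene IDs are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, bit score)


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Read BLAST tabular output (-outfmt 6, default 12 columns); bit score is column 12."""
    hits: List[Hit] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        hits.append((cols[0], cols[1], float(cols[11])))
    return hits


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to the subject with the highest bit score."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Pairs (gene in A, gene in B) where each gene is the other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Hypothetical toy hits: geneA2's best hit (geneB2) points back to geneA1, so it is rejected.
    a_vs_b = [("geneA1", "geneB1", 250.0), ("geneA1", "geneB2", 180.0), ("geneA2", "geneB2", 90.0)]
    b_vs_a = [("geneB1", "geneA1", 248.0), ("geneB2", "geneA1", 175.0)]
    print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('geneA1', 'geneB1')]
```

Ranking by bit score rather than E-value is deliberate. When there are thousands of very low E-values, strong hits can all be reported as 0.0 and cannot be told apart, whereas their bit scores still differ.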

4 votes, 1 answer

Compare two networks of the same genes between two species

I had a set of genes for honey bees. I found the homologues of those genes in the house fly and created a network of the honey bee genes and a network of the homologous fly genes using STRING. I know that I could do a topological comparison between…

networks

asked Nov 14 '18 at 00:45 by The Last Word (reputation 297; badges: 1, 7)
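One simple topological comparison is to map each fly gene back to its honey bee homologue and measure how many interaction edges the two networks share. The Jaccard index of the two edge sets is one way to express this. The following is a minimal sketch. It assumes undirected edges given as gene-ID pairs and a fly-to-bee homologue dictionary. All names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

Edge = FrozenSet[str]


def edge_set(edges: Iterable[Tuple[str, str]], rename: Optional[Dict[str, str]] = None) -> Set[Edge]:
    """Undirected edge set; if `rename` is given, nodes are mapped through it first."""
    result: Set[Edge] = set()
    for u, v in edges:
        if rename is not None:
            if u not in rename or v not in rename:
                continue  # drop edges whose genes have no mapped homologue
            u, v = rename[u], rename[v]
        if u != v:  # ignore self-loops
            result.add(frozenset((u, v)))
    return result


def edge_jaccard(a: Set[Edge], b: Set[Edge]) -> float:
    """Shared edges divided by all edges in either network (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    bee = edge_set([("b1", "b2"), ("b2", "b3")])
    fly = edge_set([("f1", "f2"), ("f1", "f3")], rename={"f1": "b1", "f2": "b2", "f3": "b3"})
    print(edge_jaccard(bee, fly))
```

With a many-to-one mapping (several fly paralogues for one bee gene), renaming collapses nodes and can inflate the overlap. How to treat paralogues is a decision for the analysis; this sketch does not settle it.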

4 votes, 1 answer

What is "block-compressed" file in samtools?

SAMtools returned an error message for my gzipped genome FASTA: Indexed block-compressed FASTA file cannot be handled The source code for the message is here. // check if the it is a valid block-compressed file if (IOUtil.isBlockCompressed(path,…

asked Nov 13 '18 at 10:04 by SmallChess (reputation 2,699; badges: 3, 19, 35)
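"Block-compressed" refers to BGZF, the gzip variant described in the SAM/BAM specification and written by `bgzip`. A BGZF file is a series of independent gzip blocks. Each block header carries an extra `BC` subfield giving the block size, and this is what allows indexed random access. Output from plain `gzip` is valid gzip but not BGZF. The following is a minimal sketch for telling the two apart by inspecting the first block header, following the layout in that specification. The function names are hypothetical.

```python
import gzip
import struct


def is_bgzf(header: bytes) -> bool:
    """True if `header` begins with a BGZF block: a gzip member whose extra field has a 'BC' subfield."""
    # gzip header: ID1=0x1f, ID2=0x8b, CM=8 (deflate); FLG bit 2 (FEXTRA) must be set
    if len(header) < 12 or header[:3] != b"\x1f\x8b\x08" or not header[3] & 0x04:
        return False
    (xlen,) = struct.unpack("<H", header[10:12])  # little-endian length of the extra field
    extra = header[12:12 + xlen]
    pos = 0
    while pos + 4 <= len(extra):  # each subfield: SI1, SI2, SLEN (2 bytes), payload
        si1, si2 = extra[pos], extra[pos + 1]
        (slen,) = struct.unpack("<H", extra[pos + 2:pos + 4])
        if si1 == ord("B") and si2 == ord("C") and slen == 2:
            return True  # payload is BSIZE, the total block size minus 1
        pos += 4 + slen
    return False


def file_is_bgzf(path: str) -> bool:
    """Check only the first block of the file at `path`."""
    with open(path, "rb") as fh:
        return is_bgzf(fh.read(512))


# Empty BGZF block that bgzip writes as an end-of-file marker (28 bytes).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    " 1b 00 03 00 00 00 00 00 00 00 00 00"
)

if __name__ == "__main__":
    print(is_bgzf(BGZF_EOF))                         # True
    print(is_bgzf(gzip.compress(b">chr1\nACGT\n")))  # False: plain gzip has no BC subfield
```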

4 votes, 1 answer

Problem: "Pair-end" reads scRNA seq data (Drop-seq)

In case of Drop-seq, we have paired end data. Read 1: Cell code + UMI (unique molecule identifier) Read 2: The transcript information But I have a problem/doubt with the sample I am working on. The sample I am using is the following (Check the…

asked Nov 13 '18 at 04:45 by jayesh kumar (reputation 51; badges: 2)
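In the original Drop-seq bead design, read 1 starts with a 12-base cell barcode followed by an 8-base UMI, and read 2 holds the transcript sequence. The following is a minimal sketch of splitting read 1 accordingly. The lengths are parameters because other protocols and bead lots can differ, so check them against your own library first. The function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def split_read1(seq: str, barcode_len: int = 12, umi_len: int = 8) -> Tuple[str, str]:
    """Split read 1 into (cell barcode, UMI), assuming the barcode comes first, then the UMI."""
    needed = barcode_len + umi_len
    if len(seq) < needed:
        raise ValueError(f"read 1 has {len(seq)} bases; expected at least {needed}")
    return seq[:barcode_len], seq[barcode_len:needed]


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


if __name__ == "__main__":
    print(split_read1("AAAACCCCGGGGTTTTACGT"))  # ('AAAACCCCGGGG', 'TTTTACGT')
```

Bases after the first 20 (if read 1 is longer) are simply ignored here. Whether they matter depends on the library design.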

4 votes, 1 answer

Error correction within the long read

I am going to get some data from plasmid sequencing to identify SNPs on the plasmids. What is done in the lab is the following: The plasmids are purified by size. We amplify the plasmids using the phi29 polymerase. The polymerase will go through…

asked Nov 08 '18 at 09:11 by Praderas (reputation 143; badges: 6)
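If a single long read contains several tandem copies of the same plasmid, the copies can correct each other. (This is an assumption about how reads from phi29 amplification of a circular template look.) One simple approach is to cut the read into copies and take a per-position majority vote. The following is a minimal sketch. It assumes the copies share orientation and start position and contain no insertions or deletions, whereas real reads would first need the copies aligned to each other. The function names are hypothetical.

```python
from collections import Counter
from typing import List


def split_copies(read: str, unit_len: int) -> List[str]:
    """Cut a read into consecutive chunks of `unit_len` (the last chunk may be partial)."""
    return [read[i:i + unit_len] for i in range(0, len(read), unit_len)]


def tandem_consensus(read: str, unit_len: int) -> str:
    """Per-position majority vote across tandem copies (no indels assumed)."""
    if unit_len <= 0 or len(read) < unit_len:
        raise ValueError("read must contain at least one full copy")
    copies = split_copies(read, unit_len)
    consensus = []
    for pos in range(unit_len):
        column = Counter(copy[pos] for copy in copies if pos < len(copy))
        consensus.append(column.most_common(1)[0][0])  # ties: first base seen wins
    return "".join(consensus)


if __name__ == "__main__":
    # Three copies of ACGT; the second has an error at its last base.
    print(tandem_consensus("ACGTACGAACGT", 4))  # ACGT
```

With only two copies, any disagreement is a tie that voting cannot resolve. Reads with three or more copies are where this kind of correction helps.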

4 votes, 2 answers

Group genes by functional category, summing expression values

Using the count of rpkm values from genes in a metagenome sample, I want to group these genes into established categories (for example KEGG or COG). For each sample, my goal is to determine which categories are better represented in each…

python

asked Nov 07 '18 at 18:11 by F.Lira (reputation 143; badges: 4)
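The grouping itself is a sum over a gene-to-category mapping. The following is a minimal sketch in plain Python. It assumes RPKM values per sample in a nested dict and a mapping from gene ID to one or more category IDs (for example KEGG orthology groups or COG categories). Genes without an annotation are collected under a hypothetical `unassigned` key. A gene assigned to several categories contributes its full value to each.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def sum_by_category(
    rpkm: Mapping[str, Mapping[str, float]],
    gene_to_categories: Mapping[str, Iterable[str]],
    unassigned: str = "unassigned",
) -> Dict[str, Dict[str, float]]:
    """Return {sample: {category: summed RPKM}}."""
    result: Dict[str, Dict[str, float]] = {}
    for sample, genes in rpkm.items():
        totals: Dict[str, float] = defaultdict(float)
        for gene, value in genes.items():
            categories = list(gene_to_categories.get(gene, [])) or [unassigned]
            for category in categories:
                totals[category] += value
        result[sample] = dict(totals)
    return result


def to_fractions(totals: Dict[str, float]) -> Dict[str, float]:
    """Scale one sample's category totals so they sum to 1 (for comparing samples)."""
    grand = sum(totals.values())
    return {cat: value / grand for cat, value in totals.items()} if grand else dict(totals)


if __name__ == "__main__":
    rpkm = {"s1": {"g1": 10.0, "g2": 5.0, "g3": 1.0}, "s2": {"g1": 2.0, "g2": 8.0}}
    mapping = {"g1": ["K1"], "g2": ["K1", "K2"]}
    print(sum_by_category(rpkm, mapping))
    # {'s1': {'K1': 15.0, 'K2': 5.0, 'unassigned': 1.0}, 's2': {'K1': 10.0, 'K2': 8.0}}
```

Converting each sample's totals to fractions makes samples with different sequencing depth easier to compare. Which normalization suits a metagenome is still your call.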

4 votes, 2 answers

Collapse cell barcodes distribution within 1 Hamming distance

I have a barcode distribution from single-cell data, e.g: 11612552 TCCTGAGCACTGCATAACTCAA 9349711 GCTACGCTACTGCATAAGTCCA 8343678 CAGAGAGGCTAAGCCTGCACAT 8161950 CGTACTAGTCTCTCCGCGGCTA 8102383 TCCTGAGCGTAAGGAGCAGATC 7110298…

asked Nov 05 '18 at 18:05 by gc5 (reputation 1,783; badges: 18, 32)
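A common greedy approach is to walk the barcodes from most to least abundant. Each barcode is folded into the most abundant already-kept barcode that differs from it at exactly one position. The following is a minimal sketch. It assumes equal-length barcodes and an `ACGTN` alphabet. Instead of comparing every pair, it generates the single-mismatch neighbours of each barcode and looks them up, which is about 4 × length lookups per barcode. The function name is hypothetical, and this is not the method of any particular tool.

```python
from typing import Dict


def collapse_barcodes(counts: Dict[str, int], alphabet: str = "ACGTN") -> Dict[str, int]:
    """Greedily merge each barcode into the most abundant kept barcode at Hamming distance 1."""
    kept: Dict[str, int] = {}
    # Most abundant first; ties broken alphabetically so the result is deterministic.
    for barcode, n in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        parent = None
        for i, original in enumerate(barcode):
            for base in alphabet:
                if base == original:
                    continue
                neighbour = barcode[:i] + base + barcode[i + 1:]
                if neighbour in kept and (parent is None or kept[neighbour] > kept[parent]):
                    parent = neighbour
        if parent is None:
            kept[barcode] = n  # no kept neighbour: start a new cluster
        else:
            kept[parent] += n  # fold the counts into the more abundant neighbour
    return kept


if __name__ == "__main__":
    counts = {"AAAA": 100, "AAAT": 5, "CCCC": 50, "CCCA": 3, "GGGG": 1}
    print(collapse_barcodes(counts))  # {'AAAA': 105, 'CCCC': 53, 'GGGG': 1}
```

Merged barcodes do not act as seeds themselves. A chain A, B, C of single mismatches therefore does not pull C (two mismatches from A) into A's cluster.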

4 votes, 3 answers

How to convert Bed file to fasta file?

I have Bed file containing start and end of a sequence, and I need to convert it to fasta format, any recommendations?

file-formats

asked Nov 05 '18 at 11:11 by Nour El-Islam Awad (reputation 41; badges: 1, 2)
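A BED file holds only coordinates, so the reference genome FASTA is needed as well. The usual tool for this is `bedtools getfasta`. BED coordinates are 0-based and half-open, so a line `chr1 10 20` maps directly onto the Python slice `seq[10:20]`. The following is a minimal pure-Python sketch of what the conversion does. It assumes the genome fits in memory and ignores the strand column. The function names and the `chrom:start-end` header style are choices made for illustration.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {name: sequence}; the name is the first word of the header."""
    sequences: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            sequences[name] = []
        elif name is not None:
            sequences[name].append(line)
    return {n: "".join(parts) for n, parts in sequences.items()}


def bed_to_fasta(bed_lines: Iterable[str], genome: Dict[str, str]) -> str:
    """Extract each BED interval (0-based, half-open) from `genome` as a FASTA record."""
    records = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        records.append(f">{chrom}:{start}-{end}\n{genome[chrom][start:end]}")
    return "\n".join(records) + "\n" if records else ""


if __name__ == "__main__":
    genome = read_fasta([">chr1 example", "ACGTACGTAC", "GGGG"])
    print(bed_to_fasta(["chr1\t2\t6\n", "chr1\t10\t14\n"], genome), end="")
```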

4 votes, 1 answer

Codon usage analysis for whole genomes

I am new to bioinformatics, so please forgive me if these questions seem a bit childish. I have two queries. I intend to perform a codon usage analysis followed by correspondence analyses for multiple microbial whole genomes of one…

asked Nov 02 '18 at 18:03 by Furqan (reputation 87; badges: 3)
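A common first step is to count codons across all coding sequences of a genome. The counts are then converted to relative synonymous codon usage (RSCU): a codon's count divided by the count expected if all synonymous codons for that amino acid were used equally. A codon-by-genome table of RSCU values is often the input to correspondence analysis. The following is a minimal sketch. It assumes the CDS are already extracted, in frame, and use the standard genetic code (NCBI table 1). Codons with non-ACGT letters and trailing partial codons are skipped. The function names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, List

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons are enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codons(cds_sequences: Iterable[str]) -> Counter:
    """Count in-frame codons; skips partial trailing codons and codons with non-ACGT letters."""
    counts: Counter = Counter()
    for seq in cds_sequences:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    return counts


def rscu(counts: Counter) -> Dict[str, float]:
    """Relative synonymous codon usage: observed count / mean count of the synonymous family."""
    families: Dict[str, List[str]] = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values: Dict[str, float] = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid never observed: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    counts = count_codons(["ATGGCTGCTGCCTAA"])
    print(round(rscu(counts)["GCT"], 3))  # 2.667 (2 observed vs 0.75 expected)
```

Single-codon amino acids (Met, Trp) always get RSCU 1, and stop codons are included here. Many analyses drop these before running the correspondence analysis.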

4 votes, 1 answer

Error given while trying to index a BAM file with Samtools Index - NO COOR?

I am currently working on my own Metagenomic pipeline, utilizing Bowtie 2 to map. Bowtie 2 outputs a SAM file, which I convert to a .BAM and sort it using Samtools. When I try to utilize Samtools to index my .BAM file it gives me this…

asked Sep 21 '18 at 00:39 by Haley

4 votes, 2 answers

How to align output of grep --color=always? (To QC fasta/fastq files)

Grepping out short sequences from a fasta or fastq file is a really useful way to look at sequencing data. Using the option --color=always makes this even more useful, as you can visualize where the sequences appear in sequencing reads. For…

asked Oct 26 '18 at 22:58 by conchoecia (reputation 3,141; badges: 2, 16, 40)
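One way to line up the matches is to shift each matching read so that the first occurrence of the query starts in the same column. That occurrence is then coloured with ANSI escape codes, the same kind of codes grep uses for its colouring. The following is a minimal sketch in Python rather than a grep pipeline. It assumes plain sequence lines as input (for FASTQ, only every fourth line starting with the second) and aligns only the first occurrence per read. The function name is hypothetical.

```python
from typing import Iterable, List

RED, RESET = "\033[1;31m", "\033[0m"


def align_on_motif(reads: Iterable[str], motif: str, color: bool = True) -> List[str]:
    """Return matching reads left-padded so the first motif hit starts in the same column."""
    hits = [(read, read.find(motif)) for read in (r.strip() for r in reads)]
    hits = [(read, pos) for read, pos in hits if pos >= 0]  # keep only reads with a match
    if not hits:
        return []
    column = max(pos for _, pos in hits)
    lines = []
    for read, pos in hits:
        if color:
            read = read[:pos] + RED + motif + RESET + read[pos + len(motif):]
        lines.append(" " * (column - pos) + read)
    return lines


if __name__ == "__main__":
    reads = ["AAGATTC", "GATTCAA", "CCCCC", "TTTGATT"]
    print("\n".join(align_on_motif(reads, "GATT")))
```

Padding to the rightmost first hit keeps every read intact. Very long offsets will still wrap in a narrow terminal, so piping the output to `less -S` can help.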

4 votes, 1 answer

In calculating the retention index, why do we use the character state with the lowest frequency?

To calculate the retention index for a phylogenetic tree, we use the following formula: $$\frac{\text{maximum number of steps on tree - number of steps on the tree}} {\text{maximum number of steps on tree - minimum number of steps in the data}}$$ To…

asked Oct 25 '18 at 03:49 by Namenlos (reputation 317; badges: 1, 8)
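In the usual notation, each character has three counts. g is the greatest number of steps it could need on any tree, s is the number it needs on the tree being scored, and m is the fewest it could need on any tree. The index is then RI = (g - s) / (g - m).

The largest value g is reached on an unresolved (star) tree whose centre carries the most common state. Every taxon with a different state then costs one step, so g = n - (count of the most common state). For a binary character this is simply the count of the less common state, which is where the lowest-frequency state enters the calculation.

The following is a minimal sketch. It assumes an unordered character and a value of s already computed for the tree (for example by Fitch parsimony). The function name is hypothetical.

```python
from collections import Counter
from typing import Sequence


def retention_index(states: Sequence[str], steps_on_tree: int) -> float:
    """Retention index of one unordered character: RI = (g - s) / (g - m)."""
    counts = Counter(states)
    g = len(states) - max(counts.values())  # steps on a star tree centred on the commonest state
    m = len(counts) - 1                     # each additional state needs at least one change
    s = steps_on_tree
    if g == m:
        raise ValueError("g equals m: RI is undefined for this character")
    return (g - s) / (g - m)


if __name__ == "__main__":
    # Binary character: four taxa in state 0, three in state 1, so g = 3 and m = 1.
    print(retention_index("0011100", 2))  # 0.5
```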

4 votes, 1 answer

Machine learning using protein-sequences

I'm participating in a bioinformatics machine-learning seminar at my university. The main task is predicting binary classification of protein-protein interactions using sequence data as input. One of the subtasks is familiarization with the dataset…

asked Oct 13 '18 at 15:13 by Olli B. (reputation 43; badges: 4)
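A common baseline for turning sequences into fixed-length model inputs is amino acid composition: the fraction of each of the 20 standard residues in the sequence. For a candidate interacting pair, the two vectors can be concatenated. Alternatively, they can be combined symmetrically so that the order of the pair does not matter. The following is a minimal sketch. It assumes one-letter sequences and ignores non-standard letters. The function names are hypothetical.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def composition(seq: str) -> List[float]:
    """Fraction of each standard amino acid in `seq`; other letters are ignored."""
    counts = [0] * len(AMINO_ACIDS)
    for residue in seq.upper():
        if residue in INDEX:
            counts[INDEX[residue]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(AMINO_ACIDS)


def pair_features(seq_a: str, seq_b: str, symmetric: bool = True) -> List[float]:
    """Feature vector for a protein pair.

    symmetric=True uses the element-wise sum and absolute difference, so (a, b) and (b, a)
    give the same vector; symmetric=False simply concatenates the two compositions.
    """
    a, b = composition(seq_a), composition(seq_b)
    if not symmetric:
        return a + b
    return [x + y for x, y in zip(a, b)] + [abs(x - y) for x, y in zip(a, b)]


if __name__ == "__main__":
    print(len(pair_features("MKTAYIAK", "GSHMLE")))  # 40
```

Composition discards residue order entirely. It is a sanity-check baseline to compare richer encodings against, not a substitute for them.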

4 votes, 1 answer

Tool to show DNA sequence and allowing upload of own graph data

Background We want to be able to load (or request) data for a genome including the sequence and gene annotation (bacteria). Then, we want to load our own annotation which should be displayed as a line plot: position score 1 5 2 …

asked Oct 11 '18 at 11:40 by KingBoomie (reputation 149; badges: 3)
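Whichever browser is chosen, position/score data like the table in the question is commonly loaded as a bedGraph track. bedGraph is a plain-text format (chromosome, 0-based start, end, value) that genome browsers such as IGV and the UCSC Genome Browser can draw as a line or bar plot. The following is a minimal sketch. It assumes the positions in your table are 1-based single bases and uses a hypothetical chromosome name. Adjacent positions with the same score are merged into one interval.

```python
from typing import Iterable, List, Optional, Tuple


def to_bedgraph(chrom: str, scores: Iterable[Tuple[int, float]]) -> List[str]:
    """Convert (1-based position, score) pairs into bedGraph lines, merging equal-score runs."""
    lines: List[str] = []
    run: Optional[list] = None  # [start (0-based), end (exclusive), score] of the current run
    for pos, score in sorted(scores):
        start = pos - 1  # bedGraph is 0-based, half-open
        if run is not None and run[1] == start and run[2] == score:
            run[1] = pos  # extend the current run by one base
        else:
            if run is not None:
                lines.append(f"{chrom}\t{run[0]}\t{run[1]}\t{run[2]:g}")
            run = [start, pos, score]
    if run is not None:
        lines.append(f"{chrom}\t{run[0]}\t{run[1]}\t{run[2]:g}")
    return lines


if __name__ == "__main__":
    print("\n".join(to_bedgraph("chr_example", [(1, 5), (2, 5), (3, 2), (5, 2)])))
```

The chromosome name must match the sequence name in the genome FASTA that the browser has loaded. For UCSC, a `track type=bedGraph` line is usually placed at the top of the file.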

4 votes, 1 answer

Why are my Chi-squared test results different from those in a published table?

I recently read the paper “A novel long non-coding RNA linc-ZNF469-3 promotes lung metastasis through miR-574-5p-ZEB1 axis in triple negative breast cancer”. In this I see Table1 showing correlation of Linc-ZNF469-3 with different features in TNBC…

asked Oct 09 '18 at 09:30 by stack_learner (reputation 1,262; badges: 14, 26)
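When a recomputed chi-squared test disagrees with a published table, one frequent source of difference for 2x2 tables is Yates' continuity correction. R's `chisq.test` and `scipy.stats.chi2_contingency` both apply it to 2x2 tables by default, while the textbook Pearson formula does not. The following is a minimal sketch computing both versions from the cell counts. It uses the fact that with one degree of freedom the p-value equals erfc(sqrt(chi2 / 2)). The function name is hypothetical, and this is not a reconstruction of the paper's own procedure.

```python
import math
from typing import Sequence, Tuple


def chi2_2x2(table: Sequence[Sequence[int]], yates: bool = False) -> Tuple[float, float]:
    """Pearson chi-squared statistic and p-value (1 degree of freedom) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("a row or column total is zero")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(0.0, diff - 0.5)  # continuity correction (R's default for 2x2 tables)
            stat += diff ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared with 1 df
    return stat, p_value


if __name__ == "__main__":
    table = [[10, 20], [20, 10]]
    print(chi2_2x2(table))              # uncorrected
    print(chi2_2x2(table, yates=True))  # corrected: smaller statistic, larger p-value
```

If neither version reproduces the published numbers, the difference lies elsewhere (for example in the counts used or the choice of test), and this sketch cannot tell which.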
