Most Popular

1500 questions
4
votes
3 answers

Finding orthologues using BLAST on the NCBI database

I'm an informatics student who has essentially zero knowledge of biology. I BLASTed my gene and have thousands of results with very low E-values. Where do I go from here if I want to find orthologues?
4
votes
1 answer

Compare two networks of the same genes between two species

I had a set of genes for honey bees. I found the homologues of those genes in the house fly and created a network of the honey bee genes and a network of the homologous fly genes using STRING. I know that I could do a topological comparison between…
The Last Word
  • 297
  • 1
  • 7
4
votes
1 answer

What is a "block-compressed" file in samtools?

SAMtools returned an error message for my gzipped genome FASTA: "Indexed block-compressed FASTA file cannot be handled". The source code for the message is here: // check if the it is a valid block-compressed file if (IOUtil.isBlockCompressed(path,…
SmallChess
  • 2,699
  • 3
  • 19
  • 35
4
votes
1 answer

Problem: paired-end reads in scRNA-seq data (Drop-seq)

With Drop-seq, we have paired-end data. Read 1: cell code + UMI (unique molecular identifier). Read 2: the transcript information. But I have a problem with the sample I am working on. The sample I am using is the following (Check the…
4
votes
1 answer

Error correction within the long read

I am going to get some data from plasmid sequencing to identify SNPs on the plasmids. What is done in the lab is the following: The plasmids are purified by size. We amplify the plasmids using the phi29 polymerase. The polymerase will go through…
Praderas
  • 143
  • 6
4
votes
2 answers

Group genes by functional categories, summing expression values

Using the RPKM values of genes in a metagenome sample, I want to group these genes into established categories (for example KEGG or COG). For each sample, my goal is to determine which categories are better represented in each…
F.Lira
  • 143
  • 4
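A minimal sketch of the grouping described in the question above, assuming each gene maps to at most one category. The gene names, category labels, and the `unassigned` bucket are hypothetical placeholders, not taken from the question, and real KEGG or COG annotations often assign several categories per gene, which this does not handle.

```python
from collections import defaultdict


def sum_by_category(expression, gene_to_category, fallback="unassigned"):
    """Sum per-gene expression values (e.g. RPKM) within each category.

    expression: dict mapping gene ID -> expression value for one sample.
    gene_to_category: dict mapping gene ID -> category label (hypothetical).
    Genes with no category are pooled under `fallback`.
    """
    totals = defaultdict(float)
    for gene, value in expression.items():
        category = gene_to_category.get(gene, fallback)
        totals[category] += value
    return dict(totals)


# Hypothetical data for one sample.
sample_rpkm = {"geneA": 2.0, "geneB": 3.5, "geneC": 1.0, "geneD": 4.0}
categories = {"geneA": "C", "geneB": "C", "geneC": "E"}
per_category = sum_by_category(sample_rpkm, categories)
```

Running the function once per sample gives one category total per sample, which can then be compared across samples.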
4
votes
2 answers

Collapse cell barcodes distribution within 1 Hamming distance

I have a barcode distribution from single-cell data, e.g.: 11612552 TCCTGAGCACTGCATAACTCAA 9349711 GCTACGCTACTGCATAAGTCCA 8343678 CAGAGAGGCTAAGCCTGCACAT 8161950 CGTACTAGTCTCTCCGCGGCTA 8102383 TCCTGAGCGTAAGGAGCAGATC 7110298…
gc5
  • 1,783
  • 18
  • 32
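One possible way to approach the collapsing described above is a greedy pass: visit barcodes from most to least abundant and merge each one into the first already-kept barcode within Hamming distance 1. This is only a sketch under that assumption, not necessarily what the answers propose. The function names and the toy barcodes are hypothetical, and the pairwise comparison is quadratic, so it would be slow on millions of distinct barcodes.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def collapse_barcodes(counts, max_dist=1):
    """Greedily merge low-count barcodes into nearby high-count ones.

    counts: dict mapping barcode -> read count.
    Returns a dict of kept barcodes with the merged counts.
    """
    collapsed = {}
    # Most abundant first; ties broken alphabetically for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for barcode, n in ordered:
        for kept in collapsed:
            if len(kept) == len(barcode) and hamming(kept, barcode) <= max_dist:
                collapsed[kept] += n
                break
        else:
            # No kept barcode is close enough, so this one starts a new group.
            collapsed[barcode] = n
    return collapsed


# Hypothetical toy input (not the data from the question).
toy = {"AAAA": 10, "AAAT": 3, "CCCC": 5, "CCCG": 1, "GGTT": 2}
merged = collapse_barcodes(toy)
```

Because the pass is greedy, a barcode that is within distance 1 of two kept barcodes is assigned to the more abundant one.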
4
votes
3 answers

How to convert a BED file to a FASTA file?

I have a BED file containing the start and end of a sequence, and I need to convert it to FASTA format. Any recommendations?
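As a rough illustration of what such a conversion involves, here is a small sketch that assumes the reference sequences are already loaded in memory as a dict. The function name, the header format, and the toy genome are hypothetical. It relies on BED coordinates being 0-based and half-open, which matches Python slicing, and it ignores strand and any columns after the third.

```python
def bed_to_fasta(bed_lines, genome):
    """Extract the sequence of each BED interval as a FASTA record.

    bed_lines: iterable of BED lines (chrom, start, end, ...).
    genome: dict mapping chromosome name -> sequence string (assumed in memory).
    """
    records = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines, comments, and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        # BED is 0-based, half-open, so a plain slice gives the interval.
        seq = genome[chrom][start:end]
        records.append(f">{chrom}:{start}-{end}\n{seq}\n")
    return "".join(records)


# Hypothetical toy reference and interval.
toy_genome = {"chr1": "ACGTACGTAC"}
fasta_text = bed_to_fasta(["chr1\t2\t6"], toy_genome)
```

For real genomes, a dedicated tool that reads an indexed reference from disk is likely a better fit than holding everything in memory.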
4
votes
1 answer

Codon usage analysis for whole genomes

I am new to bioinformatics, so if these questions seem a bit childish to you, please forgive me. I have two queries. I am intending to perform a codon usage analysis followed by correspondence analyses for multiple microbial whole genomes of one…
Furqan
  • 87
  • 3
4
votes
1 answer

Error given while trying to index a BAM file with Samtools Index - NO COOR?

I am currently working on my own metagenomic pipeline, using Bowtie 2 for mapping. Bowtie 2 outputs a SAM file, which I convert to BAM and sort using Samtools. When I try to use Samtools to index my BAM file it gives me this…
Haley
4
votes
2 answers

How to align output of grep --color=always? (To QC fasta/fastq files)

Grepping out short sequences from a fasta or fastq file is a really useful way to look at sequencing data. Using the option --color=always makes this even more useful, as you can visualize where the sequences appear in sequencing reads. For…
conchoecia
  • 3,141
  • 2
  • 16
  • 40
4
votes
1 answer

In calculating the retention index, why do we use the character state with the lowest frequency?

To calculate the retention index for a phylogenetic tree, we use the following formula: $$\frac{\text{maximum number of steps on tree} - \text{number of steps on the tree}}{\text{maximum number of steps on tree} - \text{minimum number of steps in the data}}$$ To…
Namenlos
  • 317
  • 1
  • 8
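The formula quoted in the question above can be checked with a few lines of code. This is a minimal sketch; the parameter names are my own labels for the three quantities in the formula, and the step counts in the example are made up for illustration.

```python
def retention_index(max_steps, observed_steps, min_steps):
    """Retention index: (max - observed) / (max - min).

    max_steps: maximum number of steps possible on any tree.
    observed_steps: number of steps on the tree being evaluated.
    min_steps: minimum number of steps required by the data.
    """
    if not (min_steps <= observed_steps <= max_steps):
        raise ValueError("expected min_steps <= observed_steps <= max_steps")
    if max_steps == min_steps:
        # The denominator is zero, so the index is undefined here.
        raise ValueError("retention index is undefined when max equals min")
    return (max_steps - observed_steps) / (max_steps - min_steps)


# Hypothetical step counts: max 10, observed 6, min 4.
ri = retention_index(10, 6, 4)
```

The index is 1 when the tree needs only the minimum number of steps and 0 when it needs the maximum.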
4
votes
1 answer

Machine learning using protein-sequences

I'm participating in a bioinformatics machine-learning seminar at my university. The main task is predicting binary classification of protein-protein interactions using sequence data as input. One of the subtasks is familiarization with the dataset…
Olli B.
  • 43
  • 4
4
votes
1 answer

Tool to show DNA sequence and allowing upload of own graph data

Background: We want to be able to load (or request) data for a genome including the sequence and gene annotation (bacteria). Then, we want to load our own annotation which should be displayed as a line plot: position score 1 5 2 …
KingBoomie
  • 149
  • 3
4
votes
1 answer

Why are my Chi-squared test results different from those in a published table?

I recently read the paper “A novel long non-coding RNA linc-ZNF469-3 promotes lung metastasis through miR-574-5p-ZEB1 axis in triple negative breast cancer”. In it, I see Table 1 showing the correlation of Linc-ZNF469-3 with different features in TNBC…
stack_learner
  • 1,262
  • 14
  • 26