Most Popular

1500 questions
4
votes
1 answer

Biopython: resseq doesn't match pdb file

I have a PDB file, and I need to extract its residue sequence numbers (resSeq's). Based on manual inspection of the first few lines of the PDB file (pasted below), I would think that resSeq's should begin with 22, 23. However, Biopython's PDB module…
GingerBadger
  • 191
  • 1
4
votes
2 answers

Why are Minimap2 alignments different with CIGAR generation flag?

I am using Minimap2 (v2.26-r1175) in Linux to generate a sequence alignment between the Streptomyces coelicolor A3(2) chromosome (ref.fa) and the Mycobacterium tuberculosis chromosome (query.fa). My desired output is a PAF (Pairwise mApping Format)…
Gawain
  • 315
  • 1
  • 10
4
votes
2 answers

BioPython bootstrap is not reliable?

Here I will show you a minimal working example of code; as you can see, the support values for the tree are always 100. I am using synthetic sequences of 100 bp for 6 elements. The sequences were generated at random, choosing from ATCG for each…
Mirko
  • 257
  • 5
4
votes
2 answers

RNAseq: Z score, Intensity, and Resources

I'm very new to bioinformatics in general, and I'm trying to understand some basic concepts. I have RNAseq data, and bioinformatics people tell me that intensities cannot be compared across patients. So there are all of these pipelines to compare…
julianstanley
  • 401
  • 3
  • 9
4
votes
2 answers

How to create a phylogenetic tree from diverse mitochondrial genomes

I would like to create a phylogenetic tree for most of the species in my dataset. I'm starting with around 1200 species, but since it's not good practice to align short and long sequences, I tried filtering only for the species with the [15k-18k bp]…
Mirko
  • 257
  • 5
4
votes
1 answer

How can I incorporate wildcards in Snakemake in an R script?

I am facing the following issue: I have a rule in Snakemake that looks something like this: rule somerule: input: tables = expand("results/{tables}_table.txt", tables = ["1", "2"]), output: edit= expand("results/{tables}_edited.txt", tables…
Classy Q
  • 43
  • 2
4
votes
1 answer

Asterisk (*) calculation method

Does anyone here know how to calculate a value for the asterisk (*) code that appears in substitution matrices? From my observation, all pairs with one asterisk are set to the lowest value in the matrix. For ** it's 1. Example: BLOSUM62. But,…
maciejwww
  • 227
  • 1
  • 14
4
votes
2 answers

Where to download a file with major and minor alleles at every position?

I want a list of all variants, i.e. sites which are known to vary from human to human. For example, it should ideally cover all sites in here, but without samples. I don't want a giant reference panel with many samples, nor a .fa which has no…
BigMistake
  • 543
  • 11
4
votes
0 answers

Linking GenBank records to biosamples (and vice versa) using edirect

Assume that I wish to find all complete human mitochondrial genome records on GenBank (or rather, NCBI nuccore) that also have an entry in NCBI's Biosample database. MYQUERY="(mitochondrion[TITLE] OR mitochondrial[TITLE]) \ AND complete…
4
votes
1 answer

How to get cytoband and gene level copy number from genome wide SNP array copy number data?

I have (human) Illumina genome wide SNP array copy number data. For each SNP genomewide, I have Log R Ratio (LRR) and B Allele Frequency (BAF). What tool(s) can I use to get the integer copy numbers (either -3 to +3 or 0 to inf) for each cytoband…
Sylvia Rodriguez
  • 257
  • 1
  • 10
4
votes
2 answers

How to analyse qualitatively the penetration ability of particles in spheroids using fluorescent z-stacks?

I have to establish the penetration profile of particles in tumour spheroids. For this I have spheroids composed of cells which were exposed to particles, both of which are fluorescently labeled. The spheroids were then imaged using a confocal…
Timi
  • 51
  • 3
4
votes
2 answers

Question about UMAP using different numbers of PCA components as initialization

I am new to the scRNA-seq field and I have been doing some experiments of visualization of UMAP using different numbers of PCA components for initialization. The process involves projecting scRNA-seq data (count matrix) onto various numbers of PCA…
Zack
  • 43
  • 3
4
votes
1 answer

Cannot launch bcftools using Python's subprocess module, as it only accepts the first command of the commands list

I am trying to remove samples from a chromosome VCF file. I wrote a function that takes a chromosome number and a list of samples to remove. When I try to run bcftools using the subprocess module, it only runs bcftools, as if I was running…
YKY
  • 171
  • 5
4
votes
0 answers

MergeBamAlignment error

I am aligning samples following the GATK pipeline and running MergeBamAlignment like this: MergeBamAlignment \ -ALIGNED $path/file.unsorted.bam \ -UNMAPPED $path/file.unmapped.bam \ -O $path/file.merged.bam \ -R…
Rita Soares
  • 101
  • 2
4
votes
1 answer

blasting a refseq protein does not show the protein in the result set

Can anyone explain to me why I can't find a specific protein with BLAST when it was previously taken from the NCBI RefSeq database? Specifically, I was trying to BLAST the protein with the accession number "NP_420767" and its sequence, respectively, however…