Most Popular

1500 questions
4 votes, 2 answers

ROC analysis, CAFA 2 experiment

My question is about the ROC curves used in the CAFA2 experiment. In the CAFA2 paper, ROC analysis was used for the term-centric evaluation. To perform ROC analysis, we need a continuous variable and a categorical (class) variable.…
Amine
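For the general point about pairing a continuous score with a binary class variable, here is a minimal sketch of how an ROC curve and its AUC are built. It assumes one score per protein and a 0/1 label saying whether the protein carries the term being evaluated; the function names are hypothetical and this is not the CAFA2 evaluation code.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping a threshold over the scores.

    scores: continuous predictor values (higher means more likely positive)
    labels: 1 for positive, 0 for negative
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    # Sort by descending score; each distinct score acts as a threshold.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every item tied at this threshold before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(auc(roc_points(scores, labels)))  # approximately 0.889, i.e. 8/9
```

The thresholding is what turns the continuous score into the categorical prediction that gets compared with the class variable; the AUC equals the fraction of positive/negative pairs that the score ranks correctly.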
4 votes, 4 answers

In GFF3, annotating more than one protein-coding gene (i.e. polycistronic) contained in a eukaryotic mRNA

Below is an example of a simple GFF3 file:
1 T1 gene 3631 4605 . + . ID=ATNG01010
1 T1 mRNA 3631 4605 . + . ID=ATNG01010.1;Parent=ATNG01010
1 T1 exon 3631 3913 . + . …
l0110
4 votes, 1 answer

Intersecting two different files with one "master" file based on different columns

I have the following sets of data:
file1:
1 15776220 15776240 GTGACCAGCAGGTGTCTCTG 16855676 16855696 CTGTCCAGCAGAGGGCGGTG
file2:
1 15776231 2 5008 G:5002 A:6
1 15776239 3 5008 C:3358 A:14 G:1636
file3:
1…
rishi
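A minimal sketch of one way to attach several side files to a master file, assuming the master has two intervals per row (columns 2-3 and 5-6, inclusive coordinates) and each side file has chromosome and position in columns 1 and 2. Each side file is indexed per chromosome with sorted positions, and `bisect` pulls out the positions inside each interval. Only file2 from the question is used, because file3 is truncated there; all function names are hypothetical.

```python
import bisect
from collections import defaultdict


def index_positions(lines):
    """Map chromosome -> (sorted positions, rows) for a 'chrom pos ...' file."""
    by_chrom = defaultdict(list)
    for line in lines:
        fields = line.split()
        if fields:
            by_chrom[fields[0]].append((int(fields[1]), fields))
    index = {}
    for chrom, items in by_chrom.items():
        items.sort(key=lambda item: item[0])
        index[chrom] = ([pos for pos, _ in items], [row for _, row in items])
    return index


def rows_in_interval(index, chrom, start, end):
    """Return rows whose position lies in [start, end], inclusive."""
    if chrom not in index:
        return []
    positions, rows = index[chrom]
    lo = bisect.bisect_left(positions, start)
    hi = bisect.bisect_right(positions, end)
    return rows[lo:hi]


def annotate_master(master_lines, side_files):
    """For each master row, collect hits from every side file in both intervals.

    Assumed master columns: chrom, start1, end1, seq1, start2, end2, seq2.
    """
    indexes = {name: index_positions(lines) for name, lines in side_files.items()}
    results = []
    for line in master_lines:
        f = line.split()
        chrom = f[0]
        intervals = [(int(f[1]), int(f[2])), (int(f[4]), int(f[5]))]
        hits = {}
        for name, index in indexes.items():
            hits[name] = [row for s, e in intervals
                          for row in rows_in_interval(index, chrom, s, e)]
        results.append((f, hits))
    return results


master = ["1 15776220 15776240 GTGACCAGCAGGTGTCTCTG 16855676 16855696 CTGTCCAGCAGAGGGCGGTG"]
file2 = ["1 15776231 2 5008 G:5002 A:6",
         "1 15776239 3 5008 C:3358 A:14 G:1636"]
for row, hits in annotate_master(master, {"file2": file2}):
    print(row[0], row[1], [h[1] for h in hits["file2"]])
```

Adding file3 is a matter of passing `{"file2": file2, "file3": file3}`; sorting once per file keeps each lookup logarithmic, which matters for large inputs.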
4 votes, 2 answers

Cross-reference with PDB database

I have a list of several thousand proteins and their UniProt IDs. I'm looking for an efficient method of cross-referencing it against the PDB tertiary structure database and getting a list of those proteins with a tertiary structure in the PDB…
Adrian Smith
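One offline approach is to filter a downloaded UniProt-to-PDB chain mapping table instead of querying per protein. The sketch below assumes a CSV such as SIFTS' chain-level mapping, with '#' comment lines and a header containing `PDB`, `CHAIN` and `SP_PRIMARY` columns; check the header of the file you actually download. The sample rows are made up for illustration, and `uniprot_to_pdb` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def uniprot_to_pdb(mapping_handle, wanted_ids):
    """Return {UniProt accession: sorted list of PDB IDs} for the wanted IDs.

    Assumes a CSV header row with PDB, CHAIN and SP_PRIMARY columns;
    lines starting with '#' are skipped as comments.
    """
    wanted = set(wanted_ids)
    rows = (line for line in mapping_handle if not line.startswith("#"))
    hits = defaultdict(set)
    for record in csv.DictReader(rows):
        acc = record["SP_PRIMARY"]
        if acc in wanted:
            hits[acc].add(record["PDB"].lower())
    return {acc: sorted(pdbs) for acc, pdbs in hits.items()}


# Made-up example rows; a real file would be opened from disk instead.
sample = io.StringIO(
    "# made-up example rows\n"
    "PDB,CHAIN,SP_PRIMARY\n"
    "1AAA,A,PROT_A\n"
    "2BBB,B,PROT_A\n"
    "3CCC,A,PROT_B\n"
)
wanted = ["PROT_A", "PROT_C"]
found = uniprot_to_pdb(sample, wanted)
print(found)
print("no structure:", sorted(set(wanted) - found.keys()))
```

If the mapping file is gzipped, `gzip.open(path, "rt")` gives a text handle that can be passed in directly. A set lookup per row keeps this linear in the size of the mapping file, regardless of how many thousand IDs are in the list.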
4 votes, 1 answer

Inspection of gene expression in scRNA-seq data

I am running the data preprocessing pipeline for scRNA-seq data presented here. From section 3.8.6.1 (Gene expression): In addition to removing cells with poor quality, it is usually a good idea to exclude genes where we suspect that technical artefacts may have…
gc5
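The quoted step boils down to two simple per-gene summaries: how many cells detect the gene, and what share of all counts it takes up. A minimal sketch, assuming counts are held as a dict of gene name to per-cell counts; the thresholds (more than 1 count in at least 2 cells) and gene names are illustrative assumptions, not values confirmed by the truncated excerpt.

```python
def filter_genes(counts, min_count=1, min_cells=2):
    """Keep genes whose count exceeds min_count in at least min_cells cells.

    counts: dict mapping gene name -> list of per-cell counts.
    The default thresholds are illustrative, not taken from the pipeline.
    """
    kept = {}
    for gene, values in counts.items():
        n_cells = sum(1 for v in values if v > min_count)
        if n_cells >= min_cells:
            kept[gene] = values
    return kept


def top_genes(counts, n=10):
    """Return the n genes with the largest share of all counts, for inspection."""
    totals = {gene: sum(values) for gene, values in counts.items()}
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(gene, total / grand_total) for gene, total in ranked[:n]]


# Hypothetical toy matrix: 3 genes x 4 cells.
counts = {
    "GeneA": [0, 5, 3, 7],
    "GeneB": [0, 0, 2, 0],
    "GeneC": [1, 1, 1, 0],
}
print(sorted(filter_genes(counts)))
print(top_genes(counts, n=2))
```

Looking at the top genes first is a quick way to spot dominating mitochondrial or spike-in transcripts before deciding on filtering thresholds.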
4 votes, 1 answer

Replacing SNPs with missing calls with a specific string

I have a big file containing 27 columns and nearly 6 million rows. The following is a small example from the head of my file: data 0.65 0.722222 1.0 0.75 0 0.35 0.277778 0.0 0.25 0 0 0.666667 0.75 0.5 0.5625 0 …
Anna1364
4 votes, 2 answers

Using column 2 of one file to match with two columns of another file, and append

I have file 1 like the following:
1 15776220 15776240 GTGACCAGCAGGTGTCTCTG 16855676 16855696 CTGTCCAGCAGAGGGCGGTG
And file 2 as follows:
1 15776231 2 5008 G:5002 A:6
1 16855677 2 5008 A:5003 C:5
I am…
rishi
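A minimal sketch of matching file 2's column 2 against file 1's two intervals and appending the result. It assumes the layout inferred from the example (file 1: chrom, start1, end1, seq1, start2, end2, seq2 with inclusive coordinates; file 2: chrom, pos, ...), keeps only the first file 2 record per interval, and writes `NA` when nothing falls inside; `append_matches` is a hypothetical name.

```python
def append_matches(file1_lines, file2_lines, missing="NA"):
    """Append the file2 record that falls in each of file1's two intervals.

    Assumed layout (inferred from the example, not a fixed format):
    file1: chrom start1 end1 seq1 start2 end2 seq2 (inclusive coordinates)
    file2: chrom pos ...; only the first file2 record per interval is kept.
    """
    records = [line.split() for line in file2_lines if line.strip()]
    out = []
    for line in file1_lines:
        f = line.split()
        extra = []
        for start, end in ((int(f[1]), int(f[2])), (int(f[4]), int(f[5]))):
            match = next((r for r in records
                          if r[0] == f[0] and start <= int(r[1]) <= end), None)
            # Join the matched record's columns (after chrom) with commas so
            # each appended value stays a single tab-separated column.
            extra.append(",".join(match[1:]) if match else missing)
        out.append("\t".join(f + extra))
    return out


file1 = ["1 15776220 15776240 GTGACCAGCAGGTGTCTCTG 16855676 16855696 CTGTCCAGCAGAGGGCGGTG"]
file2 = ["1 15776231 2 5008 G:5002 A:6",
         "1 16855677 2 5008 A:5003 C:5"]
for line in append_matches(file1, file2):
    print(line)
```

The linear scan over file 2 is fine for small inputs; for millions of rows, indexing file 2 by chromosome with sorted positions would avoid rescanning it for every interval.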
4 votes, 3 answers

How to compare transcriptomic profiles of two cell types (single cell RNA-seq)?

I found this interesting single-cell RNA-seq dataset in GEO, but I am not sure how to analyze it appropriately. They have deposited transcriptomic profiles of human and mouse pancreatic islets (pancreatic cell types: beta cells, delta cells, etc.). The problem I…
MEhsan
4 votes, 1 answer

Installing RnBeads via bioconductor - .onLoad failed in loadNamespace(), call: NULL

I am trying to install RnBeads from Bioconductor, but the installation of its dependency TxDb.Hsapiens.UCSC.hg19.knownGene fails. The error is:
biocLite("RnBeads")
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.6 (BiocInstaller 1.28.0), R…
Bluescreen
4 votes, 1 answer

Using Python, how to convert a pandas DataFrame into a VCF?

Let's say I have a pandas DataFrame with fields CHROM, POS, ALT, REF. In this special case, I also wouldn't care about ID or FILTER, INFO could be blank (or meaningless), and we'll write QUAL as 40 for every record.
import pandas as pd
example_dict =…
ShanZhengYang
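A minimal sketch of writing such a DataFrame as a VCF by hand: a `##fileformat` line, the fixed `#CHROM ... INFO` header, then one tab-separated record per row with `.` (VCF's missing value) for ID, FILTER and INFO and a constant QUAL. The helper name `write_minimal_vcf` and the example rows are hypothetical; some downstream tools may also expect `##contig` header lines, which this sketch does not write.

```python
import io

import pandas as pd


def write_minimal_vcf(df, handle, qual=40):
    """Write a DataFrame with CHROM, POS, REF, ALT columns as a minimal VCF.

    ID, FILTER and INFO are written as '.', and QUAL is the same constant
    for every record. Rows are written in the DataFrame's current order;
    sort them by chromosome and position first if your tools require it.
    """
    handle.write("##fileformat=VCFv4.2\n")
    handle.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
    for row in df.itertuples(index=False):
        fields = [str(row.CHROM), str(row.POS), ".", str(row.REF),
                  str(row.ALT), str(qual), ".", "."]
        handle.write("\t".join(fields) + "\n")


# Hypothetical example rows.
df = pd.DataFrame({"CHROM": ["1", "1"], "POS": [100, 200],
                   "REF": ["A", "C"], "ALT": ["G", "T"]})
buf = io.StringIO()
write_minimal_vcf(df, buf)
print(buf.getvalue())
```

Passing an open file (`open("out.vcf", "w")`) instead of `io.StringIO` writes the same text to disk.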
4 votes, 1 answer

Changing the color key range to a specified range in the heatmap.2 function

I have a tab-separated text file as shown below:
a 0.311 0.510 0.123 0.002 0.001 0.417 0.572 0.074 0.169
b 0.324 0.592 0.070 0.028 0.028 0.535 0.535 0.127 0.113
I am trying to use the heatmap.2 function from gplots…
user3138373
4 votes, 1 answer

Duplicate gene symbol handling in GEO gene expression data

I have downloaded gene expression data from the GEO database (GSE3268) in which some rows have duplicate gene symbols. For example, TP53 appears in two rows with different expression values and different GenBank accession IDs. They have…
Majid
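Two common ways to collapse duplicate symbols are to keep the row with the highest average expression or to average the duplicate rows sample by sample. A minimal sketch of both, assuming rows are (symbol, accession, values) tuples; which rule is appropriate depends on the analysis, and the accessions and values below are made-up placeholders, not GSE3268 data.

```python
from collections import defaultdict
from statistics import mean


def collapse_duplicates(rows, method="max_mean"):
    """Collapse rows that share a gene symbol.

    rows: list of (symbol, accession, [expression values per sample])
    method: "max_mean" keeps the row with the highest average expression;
            "average" takes the per-sample mean across duplicate rows.
    """
    groups = defaultdict(list)
    for symbol, accession, values in rows:
        groups[symbol].append((accession, values))
    collapsed = {}
    for symbol, members in groups.items():
        if method == "max_mean":
            _, best = max(members, key=lambda m: mean(m[1]))
            collapsed[symbol] = list(best)
        elif method == "average":
            columns = zip(*(values for _, values in members))
            collapsed[symbol] = [mean(col) for col in columns]
        else:
            raise ValueError(f"unknown method: {method}")
    return collapsed


# Made-up placeholder rows: two TP53 probes and one MYC probe, 3 samples.
rows = [
    ("TP53", "ACC_1", [5.0, 6.0, 7.0]),
    ("TP53", "ACC_2", [1.0, 2.0, 3.0]),
    ("MYC", "ACC_3", [4.0, 4.0, 4.0]),
]
print(collapse_duplicates(rows, "max_mean"))
print(collapse_duplicates(rows, "average"))
```

Keeping the most highly expressed probe preserves real measured values, while averaging uses every probe but can blur probes that target different transcripts.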
4 votes, 1 answer

Determine if a gene is mitochondrial or not

I need to automatically determine whether a C. elegans gene is mitochondrial from its name, by running a regular expression over tens of thousands of gene names. Currently I have gene symbols like: clec-190, clec-189, 21ur-8912,…
Nikita Vlasenko
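Symbols like those listed do not obviously encode where a gene lies, so an alternative to a name-only regex is to read the chromosome column of a genome annotation. A minimal sketch, assuming a C. elegans GTF in which the mitochondrial genome is named `MtDNA` and attributes contain `gene_name "..."`; check both against the annotation you use. This only finds genes encoded on the mitochondrial genome, not nuclear genes whose products act in mitochondria. The GTF lines below are made up, with placeholder gene names.

```python
import re


def mitochondrial_genes(gtf_lines, mito_seqname="MtDNA"):
    """Return gene names located on the mitochondrial sequence of a GTF.

    Assumes the mitochondrial sequence is called mito_seqname and that the
    9th (attributes) column contains gene_name "...".
    """
    names = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[0] != mito_seqname:
            continue
        match = re.search(r'gene_name "([^"]+)"', fields[8])
        if match:
            names.add(match.group(1))
    return names


# Made-up GTF lines with placeholder IDs, names and coordinates.
gtf = [
    "#!genome-build placeholder",
    'MtDNA\tWormBase\tgene\t100\t500\t.\t+\t.\tgene_id "ID1"; gene_name "geneA";',
    'I\tWormBase\tgene\t100\t500\t.\t+\t.\tgene_id "ID2"; gene_name "geneB";',
]
mito = mitochondrial_genes(gtf)
print(mito)
print({name: name in mito for name in ["geneA", "geneB"]})
```

Building the set once and then testing membership is fast enough for tens of thousands of gene names.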
4 votes, 1 answer

Where can I get the population allele frequency vcf file?

I want to use GATK to estimate cross-sample contamination for whole-genome sequencing data. The specific tool is ContEst, and it is run with:
java -jar GenomeAnalysisTK.jar \
  -T ContEst \
  -R reference.fasta \
  -I:eval tumor.bam \
  -I:genotype…
gc5
4 votes, 2 answers

Finding a single open reading frame with ribosomal binding site, using Biopython

I'm given a FASTA file containing a large DNA sequence (over 115,000 bp long), and I am tasked with finding a single large open reading frame within that sequence using Biopython. I'm aware this has been asked before consistently however…
daenwaels
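A minimal sketch of the search in plain Python string scanning rather than Biopython, so it runs without extra packages: scan all six frames for ATG...stop ORFs, keep the longest, then look for a Shine-Dalgarno-like motif upstream of its start. The motif `AGGAGG`, the 15 nt search window and the 4 nt minimum spacer are illustrative assumptions, and the toy sequence is made up. With Biopython, `Bio.SeqIO.read(path, "fasta")` followed by `str(record.seq).upper()` is one way to get the input string.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Return (strand, start, end) of the longest ATG...stop ORF, or None.

    Coordinates are 0-based and end-exclusive on the strand's own sequence
    (for '-', on the reverse complement). ORFs without a stop are ignored.
    """
    best = None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if best is None or i + 3 - start > best[2] - best[1]:
                        best = (strand, start, i + 3)
                    start = None
    return best


def has_rbs(s, start, motif="AGGAGG", max_upstream=15, min_spacer=4):
    """Check for the motif within max_upstream nt before the start codon,
    leaving at least min_spacer nt between motif and start (assumed values)."""
    upstream = s[max(0, start - max_upstream):max(0, start - min_spacer)]
    return motif in upstream


seq = "CCAGGAGGCCCCATGAAATTTTAGCC"  # made-up toy sequence
strand, start, end = longest_orf(seq)
strand_seq = seq if strand == "+" else reverse_complement(seq)
print(strand, start, end, strand_seq[start:end], has_rbs(strand_seq, start))
```

Checking the RBS on the strand's own sequence matters for ORFs found on the reverse complement, since the upstream region is then also on that strand.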