Most Popular
1500 questions
4 votes, 2 answers
ROC analysis, CAFA 2 experiment
My question is about the ROC curves used in the CAFA2 experiment. In that paper, ROC analysis was used for the term-centric evaluation. To perform ROC curve analysis we need a continuous variable and a binary (categorical) class variable.…
Amine
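A ROC curve needs only two things per item: a continuous score and a binary true/false label. The curve is traced by sweeping a decision threshold down the scores and recording the false-positive and true-positive rates at each step. The following is a generic pure-Python sketch of that idea (labels assumed to be 1/0, tied scores stepped over together); it is not the CAFA2 assessment code.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points by sweeping a threshold from high to low score.

    labels must be 1 (positive) or 0 (negative). Items with tied scores are
    consumed together, so ties produce a single diagonal step.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # count every item scoring exactly `threshold` before adding a point
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

In a term-centric setting one would presumably run this once per GO term, with a predictor's scores for each protein as `scores` and whether the protein is annotated with that term as `labels`; that mapping is an assumption about the setup, not a description of the paper.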
4 votes, 4 answers
In GFF3, annotating more than one protein-coding gene (i.e. polycistronic) contained in a eukaryotic mRNA
Below is an example of a simple GFF3 file:
1 T1 gene 3631 4605 . + . ID=ATNG01010
1 T1 mRNA 3631 4605 . + . ID=ATNG01010.1;Parent=ATNG01010
1 T1 exon 3631 3913 . + . …
l0110
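In GFF3 the hierarchy (gene, then mRNA, then exon/CDS) is expressed only through the `ID` and `Parent` attributes in column 9, and a tag may carry several comma-separated values, so a feature can list more than one parent. The example above appears space-separated, but the GFF3 specification requires nine tab-separated columns. A minimal parser sketch that rebuilds the parent/child links; the function names are hypothetical:

```python
from collections import defaultdict
from urllib.parse import unquote


def parse_attributes(column9):
    """Parse GFF3 column 9 ('ID=a;Parent=b,c') into a dict of value lists.

    Values are split on ',' before percent-decoding, so an encoded comma
    (%2C) inside a value is preserved.
    """
    attrs = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attrs[key.strip()] = [unquote(v) for v in value.split(",")]
    return attrs


def build_hierarchy(lines):
    """Index features by ID and group child features under each Parent ID."""
    features = {}
    children = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, ## pragmas and comments
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns: {line!r}")
        seqid, source, ftype, start, end, score, strand, phase, col9 = cols
        attrs = parse_attributes(col9)
        record = {"seqid": seqid, "type": ftype, "start": int(start),
                  "end": int(end), "strand": strand, "attrs": attrs}
        for fid in attrs.get("ID", []):
            features[fid] = record
        # a feature with several parents is attached under each of them
        for parent in attrs.get("Parent", []):
            children[parent].append(record)
    return features, children
```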
4 votes, 1 answer
Intersecting two different files with one "master" file based on different columns
I have the following sets of data:
file1:
1 15776220 15776240 GTGACCAGCAGGTGTCTCTG 16855676 16855696 CTGTCCAGCAGAGGGCGGTG
file2:
1 15776231 2 5008 G:5002 A:6
1 15776239 3 5008 C:3358 A:14 G:1636
file3:
1…
rishi
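Assuming file1 holds two intervals per row (columns 2-3 and 5-6, both on the chromosome in column 1) and file2 holds one position per row in column 2, here is a minimal sketch that reports which file1 row, and which of its two intervals, each file2 position falls into. Closed intervals and the "first"/"second" labels are assumptions; the linear scan is fine for small files but would need a sorted or indexed interval structure for large ones.

```python
def load_intervals(file1_lines):
    """Assumed layout per row: chrom, start1, end1, seq1, start2, end2, seq2."""
    intervals = []
    for row_no, line in enumerate(file1_lines):
        f = line.split()
        if not f:
            continue
        chrom = f[0]
        intervals.append((chrom, int(f[1]), int(f[2]), row_no, "first"))
        intervals.append((chrom, int(f[4]), int(f[5]), row_no, "second"))
    return intervals


def assign_positions(intervals, file2_lines):
    """Yield (file1_row, which_interval, file2_fields) for every hit.

    Intervals are treated as closed [start, end]; adjust the comparison if
    the files use 0-based half-open coordinates.
    """
    for line in file2_lines:
        f = line.split()
        if not f:
            continue
        chrom, pos = f[0], int(f[1])
        for c, start, end, row_no, which in intervals:
            if c == chrom and start <= pos <= end:
                yield row_no, which, f
```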
4 votes, 2 answers
Cross-reference with PDB database
I have a list of several thousand proteins and their UniProt IDs. I'm looking for an efficient method of cross-referencing it against the PDB tertiary structure database and getting a list of those proteins with a tertiary structure in the PDB…
Adrian Smith
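One offline approach is to filter a precomputed UniProt-to-PDB mapping table instead of querying per protein. The sketch below assumes a tab-separated table with `#` comment lines and a header row containing `PDB` and `SP_PRIMARY` columns, which is the layout of the SIFTS `pdb_chain_uniprot` file from EBI as far as I know; check the header of the file you actually download. The function name is hypothetical.

```python
import csv
from collections import defaultdict


def uniprot_to_pdb(mapping_lines, accessions):
    """Map UniProt accessions to the PDB entries listed for them.

    mapping_lines: lines of a tab-separated table whose header row contains
    'PDB' and 'SP_PRIMARY' columns (assumed SIFTS-style layout). Lines
    starting with '#' are treated as comments and skipped.
    """
    wanted = set(accessions)
    rows = (line for line in mapping_lines if not line.startswith("#"))
    hits = defaultdict(set)
    for row in csv.DictReader(rows, delimiter="\t"):
        if row["SP_PRIMARY"] in wanted:
            # one PDB entry can appear once per chain; a set de-duplicates
            hits[row["SP_PRIMARY"]].add(row["PDB"].lower())
    return {acc: sorted(ids) for acc, ids in hits.items()}
```

With the file opened via `gzip.open(path, "rt")` (if it is gzipped) and passed as `mapping_lines`, the proteins without any listed structure are simply `set(my_ids) - found.keys()`.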
4 votes, 1 answer
Inspection of gene expression in scRNA-seq data
I am running the data preprocessing pipeline for scRNA-seq data presented here.
3.8.6.1 Gene expression
In addition to removing cells with poor quality, it is usually a good idea to exclude genes where we suspect that technical artefacts may have…
gc5
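One common way to exclude genes that are likely dominated by technical noise is to keep only genes detected above a small count in a minimum number of cells. A minimal NumPy sketch, assuming a genes x cells count matrix; both thresholds are hypothetical defaults to tune, not values taken from the linked pipeline:

```python
import numpy as np


def detectable_genes(counts, min_count=1, min_cells=2):
    """Boolean mask over genes (rows): True where the gene has more than
    `min_count` reads/UMIs in at least `min_cells` cells (columns)."""
    counts = np.asarray(counts)
    return (counts > min_count).sum(axis=1) >= min_cells
```

The mask can then index the rows of the matrix (`counts[mask]`) together with the matching gene names.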
4 votes, 1 answer
Replacing SNPs with missing calls with a specific string
I have a big file containing 27 columns and nearly 6 million rows. The following is a small example of my file:
head data
0.65 0.722222 1.0 0.75 0
0.35 0.277778 0.0 0.25 0
0 0.666667 0.75 0.5 0.5625
0 …
Anna1364
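The excerpt does not show how a missing call is written, so the sketch below takes the test for a missing field and the replacement string as parameters (both hypothetical here). Processing the file as a stream of lines keeps memory use flat even for about 6 million rows.

```python
def replace_missing(lines, is_missing, replacement, sep=" "):
    """Yield each line with every field for which is_missing(field) is True
    replaced by `replacement`.

    Lines are processed one at a time, so the whole file never has to be
    held in memory. Pass a tab character as `sep` for tab-separated input.
    """
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        yield sep.join(replacement if is_missing(f) else f for f in fields) + "\n"
```

For example, `sys.stdout.writelines(replace_missing(sys.stdin, lambda f: f == "NA", "MISSING"))` in a small script would rewrite the file from standard input to standard output, with `"NA"` and `"MISSING"` standing in for the real tokens.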
4 votes, 2 answers
Using column 2 of one file to match with two columns of another file, and append
I have file 1 like the following:
1 15776220 15776240 GTGACCAGCAGGTGTCTCTG 16855676 16855696 CTGTCCAGCAGAGGGCGGTG
And file 2 as follows:
1 15776231 2 5008 G:5002 A:6 1 16855677 2 5008 A:5003 C:5
I am…
rishi
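Because the file1 windows in the example are short (21 bp), one straightforward approach is to index file2 by (chromosome, column 2 position) and look up every position of each file1 window, appending the matching file2 row. A minimal sketch under those assumptions (whitespace-separated input, closed intervals, only columns 2-3 of file1 used; the second window in columns 5-6 could be handled the same way):

```python
from collections import defaultdict


def index_by_position(file2_lines):
    """Map (chrom, position from column 2) -> list of file2 rows (as fields)."""
    index = defaultdict(list)
    for line in file2_lines:
        f = line.split()
        if f:
            index[(f[0], int(f[1]))].append(f)
    return index


def append_matches(file1_lines, file2_lines):
    """Yield file1 fields + file2 fields (tab-joined) for every file2 row whose
    column 2 lies within file1's columns 2-3 on the same chromosome."""
    index = index_by_position(file2_lines)
    for line in file1_lines:
        f1 = line.split()
        if not f1:
            continue
        chrom, start, end = f1[0], int(f1[1]), int(f1[2])
        for pos in range(start, end + 1):  # closed interval assumed
            for f2 in index.get((chrom, pos), []):
                yield "\t".join(f1 + f2)
```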
4 votes, 3 answers
How to compare transcriptomic profiles of two cell types (single cell RNA-seq)?
I found this interesting single-cell RNA-seq data set in GEO, but I am not sure how to analyze it appropriately.
They have deposited transcriptomic profiles of human and mouse pancreatic islets (pancreatic cell types: beta cells, delta cells, etc.). The problem I…
MEhsan
4 votes, 1 answer
Installing RnBeads via bioconductor - .onLoad failed in loadNamespace(), call: NULL
I am trying to install RnBeads from Bioconductor, but the installation of its dependency TxDb.Hsapiens.UCSC.hg19.knownGene fails. The error is:
biocLite("RnBeads")
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.6 (BiocInstaller 1.28.0), R…
Bluescreen
4 votes, 1 answer
Using Python, how to convert a pandas DataFrame into a VCF?
Let's say I have a pandas DataFrame with fields CHROM, POS, ALT and REF. In this special case I also don't care about ID or FILTER, INFO could be blank (or meaningless), and we'll write QUAL as 40 for every record.
import pandas as pd
example_dict =…
ShanZhengYang
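A minimal pandas sketch under the question's assumptions (ID and FILTER unused, INFO blank, QUAL fixed at 40), using `.` as the VCF placeholder for missing values. It assumes POS is already 1-based and writes only the `##fileformat` meta line plus the `#CHROM` header line; tools that consume VCF may additionally expect `##contig` lines and coordinate-sorted records, which this sketch does not add. The function name is hypothetical.

```python
import io

import pandas as pd


def dataframe_to_vcf(df, out, qual=40):
    """Write a minimal VCF from a DataFrame with CHROM, POS, REF, ALT columns."""
    body = pd.DataFrame({
        "#CHROM": df["CHROM"].astype(str),
        "POS": df["POS"].astype(int),
        "ID": ".",          # scalars are broadcast to every row
        "REF": df["REF"],
        "ALT": df["ALT"],
        "QUAL": qual,
        "FILTER": ".",
        "INFO": ".",
    })
    out.write("##fileformat=VCFv4.2\n")
    # the DataFrame's column names become the tab-separated #CHROM header line
    body.to_csv(out, sep="\t", index=False)
```

Usage: `buf = io.StringIO(); dataframe_to_vcf(df, buf)`, or pass an open text file handle instead of the buffer.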
4 votes, 1 answer
Changing the color key range to a specified range in the heatmap.2 function
I have a tab-separated text file as shown below:
a 0.311 0.510 0.123 0.002 0.001 0.417 0.572 0.074 0.169
b 0.324 0.592 0.070 0.028 0.028 0.535 0.535 0.127 0.113
I am trying to use the heatmap.2 function from gplots…
user3138373
4 votes, 1 answer
Duplicate gene symbol handling in GEO gene expression data
I have downloaded a gene expression dataset from the GEO database (GSE3268) in which some rows have duplicate gene symbols.
For example TP53 exists in two rows with different expression values and different GenBank Accession IDs.
They have…
Majid
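One commonly used way to get a single row per gene symbol is to keep, for each symbol, the row (probe) with the highest mean expression across samples; averaging the duplicate rows is another option, and which is appropriate depends on the downstream analysis. A minimal sketch of the first strategy, assuming each row is available as a `(symbol, probe_id, values)` tuple; the function name is hypothetical:

```python
from statistics import mean


def collapse_by_max_mean(rows):
    """Keep one row per gene symbol: the one with the highest mean expression.

    rows: iterable of (symbol, probe_id, values) tuples, where values is a
    non-empty list of expression values across samples. Ties keep the row
    seen first.
    """
    best = {}
    for symbol, probe_id, values in rows:
        m = mean(values)
        if symbol not in best or m > best[symbol][0]:
            best[symbol] = (m, probe_id, values)
    # drop the cached mean from the result
    return {sym: (probe, vals) for sym, (_, probe, vals) in best.items()}
```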
4 votes, 1 answer
Determine if a gene is mitochondrial or not
I need to automatically determine whether a C. elegans gene is mitochondrial from its name, by running a regular expression over tens of thousands of gene names. Currently I have gene symbols like:
clec-190, clec-189, 21ur-8912,…
Nikita Vlasenko
4 votes, 1 answer
Where can I get the population allele frequency vcf file?
I want to use GATK to estimate cross-sample contamination for whole-genome sequencing data.
The specific tool is ContEst and it is run with:
java -jar GenomeAnalysisTK.jar \
-T ContEst \
-R reference.fasta \
-I:eval tumor.bam \
-I:genotype…
gc5
4 votes, 2 answers
Finding a single open reading frame with ribosomal binding site, using Biopython
I'm given a FASTA file containing a large DNA sequence (over 115,000 bp long), and I am tasked with finding a single large open reading frame within it using Biopython.
I'm aware this has been asked many times before; however…
daenwaels
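The core of the task is a six-frame scan: three reading frames on the given strand and three on its reverse complement, tracking an ATG start until the next in-frame stop codon. The sketch below is plain Python rather than Biopython, so it runs without extra packages. It covers only the ORF search; checking for a ribosomal binding site upstream would be a separate step. It uses the standard genetic code's stop codons and returns coordinates as 0-based, half-open positions on the forward strand; the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters are kept)."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Return (length, strand, start, end, orf_seq) for the longest ATG...stop
    ORF in all six frames, or None if there is none.

    start/end are 0-based, half-open forward-strand coordinates; orf_seq is
    read on the ORF's own strand and includes the stop codon.
    """
    seq = seq.upper()
    n = len(seq)
    best = None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG since the last stop opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if best is None or end - start > best[0]:
                        # map reverse-strand positions back to the forward strand
                        lo, hi = (start, end) if strand == "+" else (n - end, n - start)
                        best = (end - start, strand, lo, hi, s[start:end])
                    start = None
    return best
```

To apply it to the FASTA file, the single record can be loaded with Biopython, e.g. `str(SeqIO.read(path, "fasta").seq)` after `from Bio import SeqIO`, and the string passed to `longest_orf`.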