Count genomic ranges

Question

I have a set of genomic ranges that are potentially overlapping. I want to count the number of ranges that cover certain positions, using R.

I'm pretty sure there are good solutions, but I can't seem to find them.

Solutions like cut or findInterval don't achieve what I want, as they only count on one vector or accumulate all values <= break.

Also countMatches {GenomicRanges} doesn't seem to cover it.

Probably one could use Bedtools, but I don't want to leave R.

I could only come up with a hilariously slow solution:

# ldply() used below comes from the plyr package
library(plyr)

# generate test data
testdata <- data.frame(chrom = rep(seq(1,10),10),
                       starts = abs(rnorm(100, mean = 1, sd = 1)) * 1000,
                       ends = abs(rnorm(100, mean = 2, sd = 1)) * 2000)

# make sure that all end coordinates are bigger than start
# this is a requirement of the original data
testdata <- testdata[testdata$ends - testdata$starts > 0,]

# count overlapping ranges on certain positions
count.data <- lapply(unique(testdata$chrom), function(chromosome){
    tmp.inner <- lapply(seq(1,10000, by = 120), function(i){
        sum(testdata$chrom == chromosome & testdata$starts <= i & testdata$ends >= i)
    })
    return(unlist(tmp.inner))
})

# generate a data.frame containing all data
df.count.data <- ldply(count.data, rbind)

# ideally the chromosome will be columns and not rows
t(df.count.data)

So you only want overlaps where the start position is not contained within your interval range? What's the actual biological context to this? — Devon Ryan, Jun 13 '17 at 13:58
You could start by optimizing the code by using data.table (and probably using vapply instead of lapply). Also if one nucleotide overlaps between ranges then you count that too? — llrs, Jun 13 '17 at 14:02
@DevonRyan no, what I want is that the respective counting position is within the range, which is a genomic deletion (hence the biological context) — sargas, Jun 13 '17 at 14:55
@Llopis yes, I want that counted as well. The list contains overlapping areas and I want to know how many of them overlap a certain position. This is then used later for plotting. — sargas, Jun 13 '17 at 14:56
Hi sargas, thanks for your question and welcome to Bioinformatics Stack Exchange. If you have any additional story/context associated with your question (such as a desire to count genomic deletions), it can be helpful to include that in your question. This makes it easier to understand the question, and helps answerers to solve the problem you have rather than the question you're asking (see the XY problem for more information). — gringer, Jun 13 '17 at 19:27

score 6 · Accepted Answer · answered Jun 13 '17 at 14:04

GenomicRanges::countOverlaps seems to be what you’re after:

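# assumes `position` is a data frame with columns chrom and pos (hypothetical names),
# and `granges` is a GRanges object holding the ranges to be counted
position_ranges = GRanges(position$chrom, IRanges(position$pos, width = 1))
ranges_at_position = countOverlaps(position_ranges, granges)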
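For the test data in the question, this could look roughly like the sketch below. It is only an illustration under a few assumptions: the seed, the rounding of coordinates to whole base pairs, and the names deletions, queries and count_matrix are mine and do not come from the question. It builds one width-1 query range per chromosome/position pair and reshapes the counts so that chromosomes end up as columns.

```r
library(GenomicRanges)  # Bioconductor package; also attaches IRanges

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible

# Test data as in the question, rounded to whole base-pair coordinates (assumption)
testdata <- data.frame(chrom  = rep(seq(1, 10), 10),
                       starts = as.integer(round(abs(rnorm(100, mean = 1, sd = 1)) * 1000)),
                       ends   = as.integer(round(abs(rnorm(100, mean = 2, sd = 1)) * 2000)))
testdata <- testdata[testdata$ends - testdata$starts > 0, ]

# The ranges to be counted (the deletions)
deletions <- GRanges(seqnames = as.character(testdata$chrom),
                     ranges   = IRanges(start = testdata$starts, end = testdata$ends))

# One width-1 query range for every chromosome/position combination
chroms    <- sort(unique(testdata$chrom))
positions <- seq(1, 10000, by = 120)
queries   <- GRanges(seqnames = rep(as.character(chroms), each = length(positions)),
                     ranges   = IRanges(start = rep(positions, times = length(chroms)),
                                        width = 1))

# Number of deletions covering each query position (both ends inclusive)
counts <- countOverlaps(queries, deletions)

# Rows are positions, columns are chromosomes
count_matrix <- matrix(counts, nrow = length(positions),
                       dimnames = list(positions, chroms))
```

Because the queries are ordered chromosome by chromosome, filling the matrix column-wise puts each chromosome in its own column, so no transpose is needed afterwards.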

Konrad Rudolph
