I have a set of genomic ranges that may overlap, and I want to use R to count how many ranges cover each of a set of positions.
I'm pretty sure good solutions exist, but I haven't been able to find them.
Functions like cut() or findInterval() don't do what I want: they either bin a single vector, or count all values <= each break.
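The cumulative "values <= break" behaviour is in fact enough for this. For closed ranges with start <= end, a position p is covered by every range that has started (start <= p) except those that have already ended (end < p). So the count is #(starts <= p) minus #(ends < p), and findInterval() on the sorted starts and sorted ends returns exactly those two numbers (the second via left.open = TRUE, which needs R >= 3.3.0). Below is a minimal sketch; the helper name count_covering() is hypothetical and not part of any package.

```r
# Hypothetical helper: count how many closed ranges [starts[k], ends[k]]
# cover each position in `pos`. Assumes starts <= ends for every range.
count_covering <- function(starts, ends, pos) {
  # findInterval(p, sorted v) = number of elements of v that are <= p
  n_started <- findInterval(pos, sort(starts))
  # with left.open = TRUE it is the number of elements of v that are < p
  n_ended <- findInterval(pos, sort(ends), left.open = TRUE)
  n_started - n_ended
}

# toy example: three ranges, the first two overlap on 3..5
starts <- c(1, 3, 10)
ends   <- c(5, 8, 12)
count_covering(starts, ends, c(0, 1, 3, 5, 6, 9, 10, 12, 13))
# [1] 0 1 2 2 1 0 1 1 0
```

Because only two sorts and two binary searches are needed, this should scale to many ranges and positions far better than comparing every range with every position.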
Also countMatches {GenomicRanges} doesn't seem to cover it.
I could probably use bedtools, but I don't want to leave R.
The only solution I could come up with is hilariously slow:
```r
library(plyr)  # needed for ldply()

# generate test data
set.seed(1)  # make the random test data reproducible
testdata <- data.frame(chrom  = rep(seq(1, 10), 10),
                       starts = abs(rnorm(100, mean = 1, sd = 1)) * 1000,
                       ends   = abs(rnorm(100, mean = 2, sd = 1)) * 2000)
# make sure that all end coordinates are bigger than start
# this is a requirement of the original data
testdata <- testdata[testdata$ends - testdata$starts > 0, ]

# count overlapping ranges on certain positions
count.data <- lapply(unique(testdata$chrom), function(chromosome) {
  tmp.inner <- lapply(seq(1, 10000, by = 120), function(i) {
    sum(testdata$chrom == chromosome & testdata$starts <= i & testdata$ends >= i)
  })
  return(unlist(tmp.inner))
})

# generate a data.frame containing all data
df.count.data <- ldply(count.data, rbind)
# ideally the chromosomes will be columns and not rows
t(df.count.data)
```
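One way to keep the same brute-force comparison while dropping the nested lapply() calls: for each chromosome, outer() compares every range with every position at once, and vapply() collects the per-chromosome counts directly into a positions x chromosomes matrix, so neither ldply() nor t() is needed. This is a sketch under the question's own test setup (set.seed(1) is an added assumption for reproducibility). Memory grows with (ranges per chromosome) x (positions), so for genome-scale inputs a sort-based method would scale better.

```r
# regenerate the question's test data so this block runs on its own
set.seed(1)
testdata <- data.frame(chrom  = rep(seq(1, 10), 10),
                       starts = abs(rnorm(100, mean = 1, sd = 1)) * 1000,
                       ends   = abs(rnorm(100, mean = 2, sd = 1)) * 2000)
testdata <- testdata[testdata$ends - testdata$starts > 0, ]

positions <- seq(1, 10000, by = 120)
chroms <- sort(unique(testdata$chrom))

# one column per chromosome, one row per position
count.matrix <- vapply(chroms, function(chromosome) {
  sub <- testdata[testdata$chrom == chromosome, ]
  # covered[k, j] is TRUE when range k contains positions[j]
  covered <- outer(sub$starts, positions, "<=") &
             outer(sub$ends, positions, ">=")
  colSums(covered)
}, numeric(length(positions)))

dimnames(count.matrix) <- list(position = positions, chrom = chroms)
head(count.matrix)
```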