6

Let's say I have following example GRanges:

> library(GenomicRanges)
> gr = GRanges(
+     seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
+     ranges = IRanges(101:110, end = 111:120, names = head(letters, 10)),
+     strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
+     score = 1:10,
+     GC = seq(1, 0, length=10))
> gr
GRanges object with 10 ranges and 2 metadata columns:
    seqnames     ranges strand |     score                GC
       <Rle>  <IRanges>  <Rle> | <integer>         <numeric>
  a     chr1 [101, 111]      - |         1                 1
  b     chr2 [102, 112]      + |         2 0.888888888888889
  c     chr2 [103, 113]      + |         3 0.777777777777778
  d     chr2 [104, 114]      * |         4 0.666666666666667
  e     chr1 [105, 115]      * |         5 0.555555555555556
  f     chr1 [106, 116]      + |         6 0.444444444444444
  g     chr3 [107, 117]      + |         7 0.333333333333333
  h     chr3 [108, 118]      + |         8 0.222222222222222
  i     chr3 [109, 119]      - |         9 0.111111111111111
  j     chr3 [110, 120]      - |        10                 0
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

It is straightforward to subset this GRanges as one would subset an R data.frame, e.g.

> gr$score
 [1]  1  2  3  4  5  6  7  8  9 10

However, I am writing a function where the user would pass through a string for the metadata column to subset e.g.

subsetter = function(granges, column){
    return(granges$column)
}

I try to use the function as follows:

> subsetter(gr, score)

which outputs NULL

This obvious doesn't work, as it will subset the gr by column, not score.

> subsetter(gr, score)
NULL

If this was a data.frame, I could subset as follows:

gr[[(score)]]

but this gives the error

Error in gr[[(score)]] : this S4 class is not subsettable

Are there any other ways to subset a GRanges besides using $?

ShanZhengYang
  • 1,691
  • 1
  • 14
  • 20
EB2127
  • 1,413
  • 2
  • 10
  • 23

3 Answers3

6
subsetter = function(gr, cname) {
    return(mcols(gr)[[cname]])
}

You can then use things like subsetter(gr, "GC") and subsetter(gr, "score").

Devon Ryan
  • 19,602
  • 2
  • 29
  • 60
5

In such cases it is best to check the str of an object gr. Then we could see that meta data is just a dataframe inside S4 object:

gr@elementMetadata
# DataFrame with 10 rows and 2 columns
#        score        GC
#    <integer> <numeric>
# 1          1 1.0000000
# 2          2 0.8888889
# ...

Once we know how to access the dataframe, we can then subset the same way as we would subset a dataframe:

gr@elementMetadata[, "score" ]

I don't see the point of wrapping this into a function, but if it is needed:

subsetter <- function(gr, cname) gr@elementMetadata[, cname ]

# test the function
subsetter(gr, "score")
# [1]  1  2  3  4  5  6  7  8  9 10

Edit

Alternatively, we could use GenomicRanges::elementMetadata() function to extract dataframe then subset:

subsetter <- function(gr, cname) elementMetadata(gr)[, cname ]
zx8754
  • 1,042
  • 8
  • 22
  • 1
    While possible, I’d generally avoid accessing S4 object slots: unless explicitly documented, they should be considered an implementation detail that may change without warning. Use the public accessors instead (in this case, mcols). – Konrad Rudolph Jun 04 '18 at 13:58
  • @KonradRudolph good point, added alternative solution. – zx8754 Jun 04 '18 at 14:33
  • 1
    Ah … well for that function the documentation says not to use it. ;-) – Konrad Rudolph Jun 04 '18 at 14:35
  • @KonradRudolph I see, should have read the manuals first. Thank you. – zx8754 Jun 04 '18 at 14:36
4

In addition to Devon’s answer, you can use non-standard evaluation to make this work with R symbol names rather than string arguments:

subsetter = function (gr, cname) {
    cname = as.character(substitute(cname))
    mcols(gr)[[cname]]
}

Now using subsetter(gr, score) works analogously to gr$score, which may be desirable. But, likewise, you’ll no longer be able to use a variable:

var = 'score'
subsetter(gr, var) # fails, same as `gr$var`
Konrad Rudolph
  • 4,845
  • 14
  • 45