
This might seem like a duplicate of this question, but I actually want to expand on the original question.

I want to annotate the boxplot with the number of observations per group AND SUBGROUP in ggplot. Following the example of the original post, here is my minimal example:

require(ggplot2)

give.n <- function(x){
  return(c(y = median(x)*1.05, label = length(x))) 
  # experiment with the multiplier to find the perfect position
}

ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
  geom_boxplot() +
  # fun.data alone is enough here; fun.y is deprecated in recent ggplot2 versions
  stat_summary(fun.data = give.n, geom = "text")

My problem is that the sample counts all line up in the center of each group instead of sitting on the appropriate boxplot, as the picture below shows:

[Image: annotations centered in the middle of each group rather than plotted on the appropriate boxplot]

Ratnanil

1 Answer

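Is this what you want? Passing position = position_dodge(width = 0.75) to stat_summary() dodges the labels the same way as the boxplots, so each count sits over its own box.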

require(ggplot2)

give.n <- function(x){
  return(c(y = median(x)*1.05, label = length(x))) 
  # experiment with the multiplier to find the perfect position
}

ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
  geom_boxplot() +
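  # position_dodge() moves each label over its own subgroup's box;
  # fun.data alone is enough, so the deprecated fun.y argument is dropped
  stat_summary(fun.data = give.n, geom = "text",
               position = position_dodge(width = 0.75))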

MLavoie
  • Exactly what I needed, thank you! Can I post a follow-up question? Please request deletion of this comment if this is bad practice. Otherwise: using a factor to determine the position of the text relative to the group median leads to unwanted behaviour if the values are far apart. Using a fixed value (e.g. median(x)+5) makes the function usable for only one range of values. Is there a way to determine the y value of the text within the stat_summary() command? – Ratnanil Jan 17 '16 at 15:16
  • Thanks for accepting the answer! As for your comment, I am not sure; I personally prefer a more manual approach. If you want to put the label exactly where you want it, geom_text() is probably the best option. – MLavoie Jan 17 '16 at 15:59
  • Alternative: instead of using a multiple of the median per subgroup, is there a way to access the median of the whole dataset and use a multiple of that value? – Ratnanil Jan 19 '16 at 09:00
  • I think it's possible; take a look at http://docs.ggplot2.org/dev/geom_boxplot.html and scroll down. You will see an example where you draw a boxplot with your own computations. – MLavoie Jan 19 '16 at 10:02
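
One possible way to address the follow-up comments, as a rough sketch rather than a definitive answer: build the summary function with a closure, so the label offset is computed once from the whole dataset and then added to each subgroup's median. The helper name give_n_offset and the 3% factor are assumptions made up for this illustration; they are not part of ggplot2 or of the original answer.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): returns a fun.data function that
# places the count a fixed distance above each subgroup's median.
give_n_offset <- function(offset) {
  function(x) {
    data.frame(y = median(x) + offset, label = length(x))
  }
}

# Derive the offset from the whole dataset so it scales with the data.
# Here it is 3% of the overall range of mpg; the 0.03 is an arbitrary choice.
offset <- 0.03 * diff(range(mtcars$mpg))

ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
  geom_boxplot() +
  stat_summary(fun.data = give_n_offset(offset), geom = "text",
               position = position_dodge(width = 0.75))
```

Because the offset is additive and derived from the overall range, the gap between the median line and the label should stay the same for every box, whether the subgroup values are small or large. Replacing the offset line with something based on median(mtcars$mpg) would give the "multiple of the overall median" variant asked about in the comments.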