
I have the following data frame, `df`, for which I want to plot a histogram.

     x
1   -28313937
2   -218616099
3   -18406124
4   20307666
5   31985283
6   41429217
7   46488567
8   47690792
9   51127321
10  53168291
11  55247883
12  -49200409
13  33398814
14  36198419
15  42765257
16  45857195
17  43870899
18  50557988
19  49574516
20  52317786
21  50769743

I use the following code to plot the histogram:

R_hist <- ggplot(df, aes(x=x)) + 
geom_histogram(binwidth=.5, colour="black", fill="white") + 
geom_vline(aes(xintercept=mean(x, na.rm=T)), color="violet", linetype="dashed", size=1)

When I try to print the object `R_hist`, I get:

Error: cannot allocate vector of size 4.1 Gb
In addition: Warning messages:
1: In seq.default(round_any(range[1], size, floor), round_any(range[2], :
  Reached total allocation of 4021Mb: see help(memory.size)

Could someone explain why the histogram is not being plotted as expected?

Thanks.

Amm
  • Can you make your problem reproducible? – Roman Luštrik Jan 29 '14 at 14:56
    You're trying to plot a bar for every value between `-218616099` and `55247883` in 0.5 increments... do you want 21 bars with a height indicated in `x`? ... FWIW, that is a vector of 500 million values, which winds up being too large to allocate. – Justin Jan 29 '14 at 14:57
  • @RomanLuštrik Reproducible in what sense? I tried using a different name for the graph object still got the same error though – Amm Jan 29 '14 at 14:58
  • @Justin Thanks for your comment. Yes, indeed I want 21 bars with height indicated in x – Amm Jan 29 '14 at 14:59
    Give us the data and the code you use to plot. Here are some tips on how to do that: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Roman Luštrik Jan 29 '14 at 15:00
  • @RomanLuštrik see my answer for a method to grab the data provided. – Justin Jan 29 '14 at 15:05
  • @Justin I wanted it to be pedagogical. :) – Roman Luštrik Jan 29 '14 at 16:54
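The arithmetic behind Justin's comment can be checked in a few lines of R. This is a rough sketch: it assumes `geom_histogram()` builds one bin edge every `binwidth` units across the data range and stores those edges as doubles (8 bytes each), which is consistent with the `seq.default()` call named in the warning.

```r
# Values copied from the question's data frame
x <- c(-28313937, -218616099, -18406124, 20307666, 31985283, 41429217,
       46488567, 47690792, 51127321, 53168291, 55247883, -49200409,
       33398814, 36198419, 42765257, 45857195, 43870899, 50557988,
       49574516, 52317786, 50769743)

binwidth <- 0.5                        # the binwidth used in the question
n_bins <- diff(range(x)) / binwidth    # number of 0.5-wide bins from min to max
size_gb <- (n_bins + 1) * 8 / 1024^3   # memory for the bin edges as doubles, in GiB

n_bins
size_gb
```

This gives roughly 548 million bins and about 4.1 GiB of bin edges, in line with the `cannot allocate vector of size 4.1 Gb` error. A binwidth on the scale of the data (for example `1e7`) would give fewer than 30 bins instead.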

1 Answer


As indicated in the comments, with `binwidth=.5` you're asking for one histogram bar every 0.5 units from the minimum to the maximum value of `df$x`, which is hundreds of millions of bars.

Instead, use `geom_bar` with `stat='identity'`:

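library(ggplot2)

# grab the data provided (reading from the clipboard works on Windows;
# elsewhere, use read.table(text = "...") instead)
df <- read.table('clipboard')

# switch the names because it'll bug me
df$y <- df$x
# keep the rows in their original order instead of sorting "1", "10", "11", ...
df$x <- factor(row.names(df), levels = row.names(df))

# plot using some identifier (row.names in this case)
ggplot(df, aes(x=x, y=y)) + geom_bar(stat='identity')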
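A fully reproducible variant of the same idea is sketched below. It reads the data from a string instead of the clipboard, so it runs on any platform, and uses `geom_col()`, which ggplot2 (2.2.0 and later) provides as a shorthand for `geom_bar(stat = 'identity')`. The `id` column name and the horizontal mean line (standing in for the question's `geom_vline`) are illustrative choices, not part of the original answer.

```r
library(ggplot2)

# The question's data, pasted as text so the example runs anywhere
df <- read.table(text = "
     x
1   -28313937
2   -218616099
3   -18406124
4   20307666
5   31985283
6   41429217
7   46488567
8   47690792
9   51127321
10  53168291
11  55247883
12  -49200409
13  33398814
14  36198419
15  42765257
16  45857195
17  43870899
18  50557988
19  49574516
20  52317786
21  50769743
")

# Hypothetical identifier column: row names as an ordered factor
df$id <- factor(row.names(df), levels = row.names(df))

# One bar per row, with the mean drawn as a dashed horizontal line
p <- ggplot(df, aes(x = id, y = x)) +
  geom_col(colour = "black", fill = "white") +
  geom_hline(yintercept = mean(df$x), colour = "violet", linetype = "dashed")
p
```

Because each row gets exactly one bar, the plot has 21 bars no matter how large the values are.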
Roman Luštrik
Justin
  • Thanks for your suggestion – Amm Jan 29 '14 at 15:09
  • How can I make a boxplot of this data in ggplot, excluding the negative values? `boxplot(df)` plots all of the data. – Amm Jan 30 '14 at 10:14
    @Amm I strongly encourage you to read a few intro to R guides. Specifically you want to look into subsetting. However, in this instance, you would use `boxplot(df[df$x>0,])` – Justin Jan 30 '14 at 14:25
  • Thanks for the tip. I will look into subsetting. – Amm Jan 30 '14 at 14:28
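Since the follow-up asked for ggplot specifically, here is a minimal sketch of the same subsetting with `geom_boxplot()`. It assumes the original single-column `df` from the question (values in `x`); `drop = FALSE` keeps the result a data frame so ggplot can use it, and the empty-string `x` aesthetic is just a common way to draw a single box.

```r
library(ggplot2)

# The question's original data frame (values in column x)
df <- data.frame(x = c(-28313937, -218616099, -18406124, 20307666, 31985283,
                       41429217, 46488567, 47690792, 51127321, 53168291,
                       55247883, -49200409, 33398814, 36198419, 42765257,
                       45857195, 43870899, 50557988, 49574516, 52317786,
                       50769743))

# Keep only the positive values, same subsetting as in the comment above
pos <- df[df$x > 0, , drop = FALSE]

# A single box for the remaining values
p <- ggplot(pos, aes(x = "", y = x)) +
  geom_boxplot()
p
```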