I have DNA segment lengths (relative to chromosome arm, 251296 entries), as such:

0.24592963
0.08555043
0.02128725
...

The range goes from 0 to 2, and I would like to make a continuous relative frequency plot. I know that I could bin the values and use a histogram, but I would like to show continuity. Is there a simple strategy? If not, I'll use binning. Thank you!

EDIT:

I have created a binning vector with 40 equally spaced values between 0 and 2 (both included). For simplicity's sake, is there a way to round each of the 251296 entries to the closest value within the binning vector? Thank you!

Johnathan

1 Answer

Given that most of your values are not duplicated, there is no natural count to plot on the y-axis for each value, so I'd probably go for a density plot. This will highlight where segment lengths are dense, i.e. where lots of segment lengths occur near each other.

d <- c(0.24592963, 0.08555043, 0.02128725)
plot(density(d), xlab="DNA Segment Length", xlim=c(0,2))

[Image: density plot of the example values, x-axis labelled "DNA Segment Length"]
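
As for the rounding asked about in the EDIT, here is a minimal sketch of one way it might be done. It assumes the binning vector is built with seq(), and the names x, bins, mids, nearest and rel_freq are placeholders rather than anything from the original post. The idea is to use the midpoints between neighbouring bin values as cut points, so that findInterval() returns the index of the closest bin value.

```r
# Stand-in for the 251296 segment lengths (the three values from the question)
x <- c(0.24592963, 0.08555043, 0.02128725)

# Binning vector as described in the EDIT: 40 equally spaced values on [0, 2]
bins <- seq(0, 2, length.out = 40)

# Midpoints between neighbouring bin values serve as decision boundaries
mids <- (head(bins, -1) + tail(bins, -1)) / 2

# findInterval() counts the midpoints at or below each x;
# adding 1 turns that count into the index of the closest bin value
nearest <- bins[findInterval(x, mids) + 1]

# Relative frequency of each bin value (bins with no entries get 0)
rel_freq <- table(factor(nearest, levels = bins)) / length(x)
```

Because findInterval() is vectorised, this should scale to the full vector without an explicit loop. A value lying exactly on a midpoint is assigned to the upper of the two bin values.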

Nathan S. Watson-Haigh
  • Hi! Thank you for your quick response. I did it and it looks nice. However, I would like some segments near the value to show up. So I think I will use a relative histogram AND plot the density function. – Johnathan May 12 '15 at 03:37
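
The combination mentioned in the comment, a histogram with the density curve drawn on top, could be sketched roughly as follows. This is only an illustration under stated assumptions: it reuses the three example values as d, picks 40 equal-width bins on [0, 2] arbitrarily, and adds rug() as one possible way to show individual values.

```r
# Stand-in data: the three example values; replace with the full vector
d <- c(0.24592963, 0.08555043, 0.02128725)

# freq = FALSE draws the histogram on the density scale (total area 1),
# which puts it on the same y-axis scale as the kernel density estimate
h <- hist(d, breaks = seq(0, 2, length.out = 41), freq = FALSE,
          xlab = "DNA Segment Length", main = "")

# Kernel density estimate evaluated over the stated range of 0 to 2
dens <- density(d, from = 0, to = 2)
lines(dens, lwd = 2)

# Optional: tick marks at the individual values along the x-axis
rug(d)
```

Note that the y-axis here is density, not relative frequency. The relative frequency of a bin is its density multiplied by the bin width, i.e. h$density * diff(h$breaks).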