What would be the best way to sample from Cantor distribution? It only has cdf and we can't invert it.
Asked
Active
Viewed 3,877 times
23
-
5Actually, someone asked it on Mathematics : http://math.stackexchange.com/questions/1115907/can-you-simulate-from-a-cantor-distribution – RUser4512 Aug 12 '16 at 15:25
-
1Here are some interesting follow-up questions: what is the standard deviation? What is the moment-generating function? How do they compare to their counterparts for the Uniform$(0,1)$ distribution? :-) – whuber Aug 12 '16 at 16:20
-
9I like the infinite loop you guys have created by referencing the math.stackexchange post, which links back here :p – Tasos Papastylianou Aug 13 '16 at 02:12
1 Answers
27
Easy: sample from a Uniform$(0,1)$ distribution and recode from binary to ternary, interpreting each "1" as a "2". (This is the inverse probability transform approach: it does indeed invert the CDF!)
Here is an R implementation, written in a way that ought to port readily to almost any computing environment.
binary.to.ternary <- function(x) {
y <- 0
x <- round(2^52 * x)
for (i in 1:52) {
y <- y + 2*(x %% 2)
y <- y/3
x <- floor(x/2)
}
y
}
n <- 1000
x <- runif(n)
y <- binary.to.ternary(x)
plot(ecdf(y), pch=".")
whuber
- 322,774
-
4Earlier this year I started a slightly fuller implementation at https://github.com/Henrygb/CantorDist.R with functions
rCantor(),qCantor(),pCantor()and a less meaningfuldCantor()– Henry Aug 12 '16 at 19:06 -
4@Henry What would
dcantorimplement? As Tim notes, this distribution has no density. It does not have any discrete atoms, either. It's the archetypal example of a continuous but not absolutely continuous distribution. (I like the implementation ofqcantor, BTW--it's likely fast by virtue of its exploitation of matrix multiplication.) – whuber Aug 12 '16 at 19:08 -
It can attempt to give a discrete probability with
dCantor(x, continuous=FALSE)which is the change in the cumulative probability over a step of $2^{-32}$, which usually gives zero but for example positive for $x=1/4$. But withcontinuous=TRUEit attempts a density which is then zero or rarely infinite. I accept neither is meaningful in this context, but I included thedfunction for completeness. For me a more useful topic would be improvingpCantor– Henry Aug 12 '16 at 19:40 -
@Henry I would suggest first improving
qCantor. It loses precision for values near $0$ and $1$. To regain the lost precision, first reduce the calculation to arguments between $0$ and $1/2$. Then scale the argument into the range $[1/2, 1)$ before computing its digits, and scale back when converting to ternary. Once you have that,qCantorwill be an invaluable help in testingpCantor: comparepCantor(qCantor(x))-xto zero for a wide selection of argumentsx. – whuber Aug 12 '16 at 20:54 -
@Henry
pCantorhas a subtle bug: floating point imprecision in computingmcan seriously alter the calculated digits. First adding $3^{-32}$ tomshould solve the problem, as inm <- floor(((as.matrix(q) + 3^(-bits)) %*% 3^(1:bits)) %% 3). This change alone will give you a much improved version. Compare the two versions by plotting the "round-trip" discrepancy:curve(pCantor(qCantor(x)) - x, 0, 1, n=2^9+1). Coupling this with the change I previously recommended will solve the problems for values close to $1$. – whuber Aug 12 '16 at 22:43 -
1We must keep in mind that we're only dealing with a finite approximation to the actual distribution. Let's say we had 10 ternary digit precision numbers (in practice they'll be longer), and we generated 0.0222020002 to "represent" a variable whose digits extend further. While the same comment applies to any real-valued r.v. with a continuous rv all the "represented" values the finite length approximation could stand for are also "in the set". In the actual Cantor distribution, almost all the "continuations" of that ten-digit sequence are not in the set. ...ctd – Glen_b Aug 13 '16 at 03:48
-
ctd... it's a bit like drawing a Koch snowflake -- we can "represent" it by say taking a fourth iterate of the process of putting a triangular bump in each straight line, and it looks kind of neat, but that image we generate isn't the snowflake and doesn't share its properties (e.g. it only has a finite perimeter). Whether this matters for this case depends on what you're trying to do with the numbers you generate; in some situations you may want to consider that when the next digits after that last digit might potentially matter you may need to then generate further digits. – Glen_b Aug 13 '16 at 03:52
-
1@Glen_b The RNG described here generates full double-precision values--indeed, with more precision than most RNGs found in
R. Every method of generating random values from a non-discrete distribution is a "finite approximation." A true random variate from a distribution with no atoms almost surely is not computable or even representable in finite space. What might be of more practical interest in this case is that unsophisticated attempts to implement the PDF and CDF can lose substantial precision in the conversions between binary and ternary. – whuber Aug 13 '16 at 16:50 -
2@whuber I clearly acknowledged that every method of generating random numbers is finite precision in my second sentence. That you chose to repeat it and the emphasis that you gave it suggests you missed my actual point there; when I represent a continuous variate to finite precision, the real values that such a finite approximation could represent are "in the set" we want to generate from. When I represent a variable such as this to finite precision, the real values such a finite approximation could represent are almost all not in the set. It's rather a different case. ... ctd – Glen_b Aug 13 '16 at 22:44
-
2ctd ... no criticism of your post was implied; it was a point that readers might overlook, and may want to consider, particularly if they're trying to infer properties of the Cantor set by simulating from it. – Glen_b Aug 13 '16 at 22:51
