6

This is a statistical version of my Math.SE post.

Given natural numbers $b$ and $r$, uniformly randomly choose $b+r$ points within a unit square. Call the $b$ points the blue points and the $r$ points red points. Let $p(b,r)$ be the probability that $H_r$, the convex hull of red points, overlaps with $H_b$, that of the blue points. How do I efficiently estimate $p(b,r)$ via Monte Carlo (or perhaps better) methods?

I can think of averaging the probability over a large number of randomly chosen test cases, but how can I sample the search space in a way that gives me error bounds?

PKG
  • 461

1 Answers1

1

I know it's been a while since @Ganesh posted this question, but hopefully you're still interested in a response.

I've written some R code that does what you want, I think:

library(ggplot2)
library(MASS)

#################################################
#################################################
## Steps:                                      ##
##  1. draw b 'blue' points and r 'red' points ##
##  2. perform LDA                             ##
##  3. check error rate                        ##
#################################################
#################################################

##################################################
# function: drawPoints                           #
#                                                #
# description: draws b + r points uniformly from #
#              a unit square                     #
#                                                #
# inputs: b - number of 'blue' points to draw    #
#         r - number of 'red' points to draw     #
#                                                #
# outputs: data frame containing points          #
##################################################

drawPoints <- function(b,r) {
    x <- runif(b+r)
    y <- runif(b+r)
    class <- c(rep('b',b),rep('r',r))
    return(data.frame(x = x, y = y, class = class))
}

##################################################
# function: checkOverlap                         #
#                                                #
# description: if the data is linearly separable #
#              there is no overlap in the convex #
#              hulls of the different classes    #
#                                                #
# inputs: df - data frame containing classified  #
#              points                            #
#                                                #
# outputs: FALSE if 0 error rate                 #
#          TRUE otherwise                        #
##################################################

checkOverlap <- function(df) {
    disc.anal <- lda(class ~ x + y, data = df)
    return(!identical(predict(disc.anal)$class,df$class))
}

######################################################
######################################################
## Simulate many trials to estimate rate of overlap ##
######################################################
######################################################

trials <- 1000
performance <- rep(as.numeric(NA),10)
for(i in 1:10) {
    results <- replicate(n = trials, expr = checkOverlap(drawPoints(10,i)))
    performance[i] <- prop.table(table(results))['TRUE']
}
qplot(x = 1:length(performance), y = performance,
  xlab = 'Number of Red Points',
  ylab = 'Proportion of Simulations with Overlap',
  main = 'Proportion of Simulations with Overlap, 10 Blue Points')