1

I currently have a test data set with 500k data points. I have an algorithm that processes that data and returns some information. To establish the statistical significance of the results, I'd like to run a Monte Carlo simulation. I would do this by taking the:

  • Kurtosis
  • Std deviation
  • Mean
  • Skewness

And generating a series of randomized data sets, on which I would run my algorithm again.

How would I generate a data set with the same number of data points that has exactly the same kurtosis, standard deviation, mean, and skewness?

2 Answers

1

If I understand you correctly, you assume a normal distribution (+ skew and kurtosis). If this is correct, you can use Fleishman's method. In R you can use the PoisNonNor package, and for SAS there is also code available online. For further reading I recommend:

Fleishman, A. I. (1978). A method for simulating non-normal distributions. Psychometrika, 43(4), 521-532.

Bishara, A. J., & Hittner, J. B. (2012). Testing the significance of a correlation with nonnormal data: Comparison of Pearson, Spearman, transformation, and resampling approaches. Psychological Methods, 17(3), 399.
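
As a rough illustration of Fleishman's method: it writes the variable as a cubic polynomial of a standard normal draw, Y = -c + bZ + cZ^2 + dZ^3, and solves three equations for b, c, d so that Y has unit variance and the target skewness and excess kurtosis. The sketch below is a minimal pure-Python version under some assumptions: the function names are made up for this example, the Newton solver is just one way to solve the equations, and "kurtosis" is taken to mean excess kurtosis (normal = 0). Not every skewness/kurtosis pair is attainable with this method, and the generated sample matches the targets only approximately, since the moments are matched in the population rather than in each finite sample.

```python
import random


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    det = _det3(m)
    out = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        out.append(_det3(mc) / det)
    return out


def fleishman_coefficients(skew, excess_kurtosis, tol=1e-10, max_iter=100):
    """Solve Fleishman's equations for (b, c, d) with Newton's method."""
    b, c, d = 1.0, 0.0, 0.0  # start from the standard normal case
    for _ in range(max_iter):
        u = b * b + 24 * b * d + 105 * d * d + 2
        t = 1 + b * b + 28 * b * d
        s = 12 + 48 * b * d + 141 * c * c + 225 * d * d
        f = [
            b * b + 6 * b * d + 2 * c * c + 15 * d * d - 1,        # variance = 1
            2 * c * u - skew,                                      # skewness
            24 * (b * d + c * c * t + d * d * s) - excess_kurtosis,  # kurtosis
        ]
        if max(abs(v) for v in f) < tol:
            return b, c, d
        jac = [
            [2 * b + 6 * d, 4 * c, 6 * b + 30 * d],
            [2 * c * (2 * b + 24 * d), 2 * u, 2 * c * (24 * b + 210 * d)],
            [24 * (d + c * c * (2 * b + 28 * d) + 48 * d ** 3),
             24 * (2 * c * t + 282 * c * d * d),
             24 * (b + 28 * b * c * c + 2 * d * s + d * d * (48 * b + 450 * d))],
        ]
        db, dc, dd = _solve3(jac, [-v for v in f])
        b, c, d = b + db, c + dc, d + dd
    raise ValueError("no convergence; the target moments may be unattainable")


def fleishman_sample(n, mean, std, skew, excess_kurtosis, seed=None):
    """Draw n values whose population moments match the four targets."""
    b, c, d = fleishman_coefficients(skew, excess_kurtosis)
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        y = -c + b * z + c * z * z + d * z ** 3  # zero mean, unit variance
        out.append(mean + std * y)
    return out
```

A call such as `fleishman_sample(500_000, mean, std, skew, excess_kurtosis)` with the four statistics measured on the original data would give one randomized data set; repeating it with different seeds gives the series of data sets for the simulation.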

Mr Pi
  • How are you getting a normal distribution with skewness and kurtosis? – Dave Aug 13 '19 at 19:03
  • Maybe I just didn't express myself well. What I meant was a skewed normal distribution, but I did not know the adjective for kurtosis, so I used the parentheses. – Mr Pi Aug 14 '19 at 06:44
0

If you are able to find the cumulative distribution function (CDF) of your data, you can then draw random samples from that distribution using inverse transform sampling.
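
A minimal sketch of inverse transform sampling, assuming the CDF is approximated by the empirical CDF of the observed data (that choice, and the name `make_inverse_cdf_sampler`, are assumptions for illustration and not part of the answer). A uniform draw u in [0, 1) is mapped through the inverse CDF, here by linear interpolation between the sorted observations:

```python
import random


def make_inverse_cdf_sampler(data, seed=None):
    """Return a function that samples from the empirical distribution of data.

    The inverse CDF is approximated by linear interpolation between
    the sorted observations (hypothetical helper for illustration).
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")
    rng = random.Random(seed)

    def draw():
        u = rng.random()           # uniform on [0, 1)
        pos = u * (n - 1)          # fractional index into the sorted data
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        return xs[lo] + frac * (xs[hi] - xs[lo])

    return draw
```

Calling `draw()` as many times as there are points in the original data gives one simulated data set. Its mean, standard deviation, skewness, and kurtosis will be close to those of the original data but not exactly equal, and values never fall outside the observed range.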

HDLX
  • The question is unclear. Your statement is correct, but it is not clear that the OP can determine the exact CDF. – Michael R. Chernick Aug 13 '19 at 15:59
  • While I believe I could in theory calculate the CDF, I'm looking for something simpler. As a clarification, this is to determine the statistical significance of a trading algorithm. I apply the strategy to some data and then I want to use Monte Carlo to determine the statistical significance of the initial result. Maybe my logic is flawed somewhere? – lucas rodriguez Aug 13 '19 at 16:05