Obligatory caveat: Stats neophyte, R neophyte, trying to learn more.
I have daily traffic data for 3k URLs for the entirety of 2016. There is a broad cyclical seasonal trend that the majority of them share, but the extent to which this seasonality is expressed differs. There is a tremendous positive skew to the distribution of traffic in the data (here the x axis shows the total amount of traffic).
I want to create six groups of 500 URLs for testing purposes, and I want to minimize the variance in seasonality between the groups so I can draw conclusions without running the test for a year, but I don't really know how to proceed. I initially tried randomly assigning each URL to one of six groups, but since the skew is so nuts I don't really know how to test that the seasonality of my resulting groups isn't significantly different. Here is a time series showing the variation in seasonality between the randomly generated groups--the y axis is the % difference of each month relative to the month before it, which is why January isn't present.
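For concreteness, here is a minimal sketch of the two steps described above: random assignment into six equal groups, and the month-over-month % change per group. It is written in Python using only the standard library, and it runs on synthetic data. The function names, the `season` factors, and the 600-URL data set are illustrative assumptions, not the data or code behind the plot.

```python
import random
from collections import defaultdict

def assign_groups(urls, n_groups=6, seed=0):
    """Shuffle the URLs, then deal them round-robin into n_groups groups."""
    rng = random.Random(seed)
    shuffled = list(urls)
    rng.shuffle(shuffled)
    return {url: i % n_groups for i, url in enumerate(shuffled)}

def monthly_pct_change(daily, groups):
    """daily maps url -> list of (month, visits) records, one per day.

    Returns {group: {month: % change of that group's monthly total
    relative to the previous month}}. The first month has no predecessor,
    so it is absent from the result.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for url, records in daily.items():
        for month, visits in records:
            totals[groups[url]][month] += visits
    changes = {}
    for group, by_month in totals.items():
        months = sorted(by_month)
        # Assumes every monthly total is positive (no division by zero).
        changes[group] = {
            cur: 100.0 * (by_month[cur] - by_month[prev]) / by_month[prev]
            for prev, cur in zip(months, months[1:])
        }
    return changes

# Synthetic stand-in data: heavily right-skewed URL sizes, shared seasonality.
rng = random.Random(42)
season = [1.0, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.3, 1.2, 1.1, 1.0, 0.9]
daily = {}
for i in range(600):
    scale = rng.lognormvariate(3.0, 1.5)  # a few URLs dominate total traffic
    daily[f"url{i}"] = [
        (month, scale * season[month - 1] * rng.uniform(0.8, 1.2))
        for month in range(1, 13)
        for _ in range(30)  # 30 daily records per month, for simplicity
    ]

groups = assign_groups(daily)
changes = monthly_pct_change(daily, groups)
for group in sorted(changes):
    print(group, [round(changes[group][m], 1) for m in range(2, 13)])
```

Because the group totals are sums, a handful of very large URLs can dominate a group's curve, which is one way the skew can make randomly assigned groups look different.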
To the eye, this looks like the same general trend is occurring, but there seems to be significant variation at certain points. Anyway, I'm hoping to do something reproducible so I don't want to just trust my gut.
I'm basically looking to apply the methodology found here for my own purposes. In this post, they indicated that they used t-tests to reinforce the assumption that the test groups aren't meaningfully different, but due to my data's skew, I don't think t-tests will tell me anything. I read about transforming the data using log transformation, but even after that the distribution wasn't remotely normal. After that I read about Box-Cox transformations, but that got so confusing I couldn't make heads or tails of it--trying to execute it in R assumed I already had a linear model, and so far as I can tell I only have one variable.
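For reference, a Box-Cox transform can be fitted to a single variable: the "model" is then just a constant mean (intercept only), and lambda is chosen to maximize the profile log-likelihood. The sketch below illustrates that idea in plain Python with a grid search over lambda. The helper names and the synthetic lognormal data are assumptions for illustration, and it requires strictly positive values, so zero-traffic URLs would need separate handling.

```python
import math
import random

def boxcox(x, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in x]  # the lambda -> 0 limit is the log
    return [(v ** lam - 1.0) / lam for v in x]

def boxcox_loglik(x, lam):
    """Profile log-likelihood of lambda for an intercept-only normal model."""
    n = len(x)
    y = boxcox(x, lam)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    return (lam - 1.0) * sum(math.log(v) for v in x) - 0.5 * n * math.log(var)

def best_lambda(x, grid=None):
    """Pick the lambda on a grid (default -2.00 .. 2.00) with highest likelihood."""
    grid = grid or [i / 100 for i in range(-200, 201)]
    return max(grid, key=lambda lam: boxcox_loglik(x, lam))

def skewness(y):
    """Population skewness: third central moment over variance^1.5."""
    n = len(y)
    mean = sum(y) / n
    m2 = sum((v - mean) ** 2 for v in y) / n
    m3 = sum((v - mean) ** 3 for v in y) / n
    return m3 / m2 ** 1.5

# Synthetic, heavily right-skewed totals standing in for per-URL traffic.
rng = random.Random(1)
x = [rng.lognormvariate(3.0, 1.5) for _ in range(3000)]

lam = best_lambda(x)
print("lambda:", lam)
print("skewness before:", skewness(x))
print("skewness after:", skewness(boxcox(x, lam)))
```

For data that is exactly lognormal, the selected lambda lands near 0, meaning the log transform is already close to the best Box-Cox choice. If real traffic totals stay far from normal after a log, a Box-Cox fit may not help much either.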
Anyway, I'm really banging my head against a wall at this point. I would seriously appreciate any pointers you have. I'd already poked around CV and found stuff like this that didn't really help for reasons explained above. I'm not looking for a bulletproof solution, just something that can reasonably reduce some of the seasonal ambiguity.