There is a library on GitHub called timeseriescv that implements Combinatorial CV, and I am trying to use it together with GridSearchCV. Unlike the standard sklearn cross-validators, which expose a method called "get_n_splits" that returns the number of splits, this package's CombPurgedKFoldCV class has no such method. The docstring states: "The samples are decomposed into n_splits folds containing equal numbers of samples, without shuffling. In each cross validation round, n_test_splits folds are used as the test set, while the other folds are used as the training set. There are as many rounds as n_test_splits folds among the n_splits folds." If I were to implement get_n_splits myself, how would I compute the number of rounds from n_splits and n_test_splits? Here is a snippet of code:
from timeseriescv.cross_validation import CombPurgedKFoldCV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
cv = CombPurgedKFoldCV(n_splits=5, n_test_splits=2)
cv.X = X
cv.y = y
cv.pred_times = train.Pred_Date
cv.eval_times = train.Eval_Date
print(cv.n_splits, cv.n_test_splits)
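for i, (train_indexes, test_indexes) in enumerate(cv.split(X=X, y=y, pred_times=train.Pred_Date, eval_times=train.Eval_Date)):
    # Placeholder body: report the size of each train/test split
    print(i, len(train_indexes), len(test_indexes))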
param_grid = {
    "C": [0.1, 0.5, 0.75, 1, 1.5],
    "tol": [1e3, 1e4],
    "max_iter": [500, 1000],
}
model = LogisticRegression(fit_intercept=True, class_weight="balanced")
gs = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=cv, verbose=0) # [(train_indexes, test_indexes)]
gs.fit(X, y)
Here is the error:
n_splits = cv_orig.get_n_splits(X, y, groups)
AttributeError: 'CombPurgedKFoldCV' object has no attribute 'get_n_splits'
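
For reference, one way to read the docstring is that each round picks n_test_splits test folds out of the n_splits folds, so the number of rounds would be the binomial coefficient C(n_splits, n_test_splits). The following is a minimal sketch under that assumption, not the library's own implementation. The helper count_comb_splits and the mixin NSplitsMixin are hypothetical names introduced here for illustration; the sketch does not import timeseriescv.

```python
from itertools import combinations
from math import comb  # available in Python 3.8+


def count_comb_splits(n_splits: int, n_test_splits: int) -> int:
    """Hypothetical helper: number of ways to choose the test folds."""
    return comb(n_splits, n_test_splits)


class NSplitsMixin:
    """Hypothetical mixin adding an sklearn-style get_n_splits.

    Assumes the host class stores n_splits and n_test_splits as attributes,
    as the print(cv.n_splits, cv.n_test_splits) call in the question suggests.
    """

    def get_n_splits(self, X=None, y=None, groups=None):
        # X, y and groups are accepted only to match sklearn's call signature.
        return comb(self.n_splits, self.n_test_splits)


# Cross-check the formula by enumerating the fold combinations directly.
n_rounds = count_comb_splits(5, 2)
enumerated = len(list(combinations(range(5), 2)))
print(n_rounds, enumerated)  # prints "10 10"
```

With n_splits=5 and n_test_splits=2 this gives 10 rounds. Separately, GridSearchCV's cv argument also accepts an iterable of (train, test) index pairs, so passing something like list(cv.split(X=X, y=y, pred_times=..., eval_times=...)) may avoid the need for get_n_splits altogether; whether that fits depends on how the extra pred_times and eval_times arguments need to reach split.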