I have a mathematical model with two main "stages" 1. data loading (only a few KB of data) and 2. running a model on the data. I want to implement some tests and checks to ensure the data being fed into the model are valid (correct types, correct values etc.) to ensure the model output is correct.
At what stage in the whole process is the best time to implement these tests - when loading the data or when running the model on the data? Or should tests be run at both stages?
Running tests only at the data loading stage reduces the overall overhead, and guarantees correct model output. Running tests on the model stage results in some duplication of code (as well as extra overhead if tests are already implemented at the data loading stage).
This model will only ever be run once a week by internal users, so there are no external applications or liabilities / dependent parties. However, the model object is available to the user without having to use the (separate) data loading object (so again, data can, but really shouldn't, be passed into the model from the user without having gone through the data loading stage).
Is there a best practice here? Could there be a better way to structure the code to implement the tests?