I'm currently having a heated debate with coworkers on whether it's acceptable to use estimates derived directly from the data as starting parameters for modeling.
For example, if I want to fit a Normal distribution to a dataset, my coworker thinks it's acceptable to compute the mean and standard deviation from the dataset and use these values as starting parameters.
I think that, regardless of the method used, this provides the data to the model twice and is therefore unreasonable. Doesn't it mean the fitting algorithm starts where you would like it to end up, which is a good way to ensure it never explores the parameter space?
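To make the setup concrete, here is a minimal sketch of the two options being debated. Everything in it is an assumption for illustration: the synthetic data, the hand-rolled gradient-descent fit, and the names `gradient` and `fit_normal` are hypothetical and do not come from any particular library. It fits a Normal by minimizing the negative log-likelihood, once starting from the data-derived mean and standard deviation and once from an arbitrary guess.

```python
import math
import random
import statistics


def gradient(mu, log_sigma, data):
    """Per-observation gradient of the Normal negative log-likelihood
    with respect to (mu, log_sigma)."""
    n = len(data)
    sigma2 = math.exp(2.0 * log_sigma)
    d_mu = -sum(x - mu for x in data) / (n * sigma2)
    d_log_sigma = 1.0 - sum((x - mu) ** 2 for x in data) / (n * sigma2)
    return d_mu, d_log_sigma


def fit_normal(data, mu0, sigma0, lr=0.1, tol=1e-6, max_iter=100_000):
    """Hypothetical helper: plain gradient descent from (mu0, sigma0).
    Returns (mu, sigma, number of update steps taken)."""
    mu, log_sigma = mu0, math.log(sigma0)
    for step in range(max_iter):
        g_mu, g_ls = gradient(mu, log_sigma, data)
        if max(abs(g_mu), abs(g_ls)) < tol:
            return mu, math.exp(log_sigma), step
        mu -= lr * g_mu
        log_sigma -= lr * g_ls
    return mu, math.exp(log_sigma), max_iter


# Synthetic data, assumed for illustration only.
rng = random.Random(0)
data = [rng.gauss(5.0, 2.0) for _ in range(500)]

# Option A: starting values computed from the data itself.
# pstdev is the population standard deviation (divides by n).
start_mu = statistics.fmean(data)
start_sigma = statistics.pstdev(data)
fit_from_data = fit_normal(data, start_mu, start_sigma)

# Option B: an arbitrary guess that ignores the data.
fit_from_guess = fit_normal(data, 0.0, 1.0)

print("data-derived start:", fit_from_data)
print("arbitrary start:   ", fit_from_guess)
```

In this particular sketch the Normal likelihood has a closed-form maximum (the sample mean and the population standard deviation), so the data-derived start already satisfies the stopping criterion and the optimizer takes no steps, while the arbitrary start has to travel to reach the same estimates. That is exactly the "start where you want to end up" behavior the question is about.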
Can you point me to a reference for good practice regarding this particular issue?
Is it acceptable to use actual estimates from the data, rather than reasonable guesses, as starting parameters (provided I don't try more than one set of starting values, which I know is already a mistake)?
– Dadabazooka Apr 06 '20 at 16:45