Recently I created a similarity measure specifically designed for high-dimensional time series data containing a low number of observations (measurements). The measure is meant to accurately assess non-linear relationships between variables despite the presence of time lag and high variance. These attributes are inherent, almost omnipresent features of time series in my emerging biological field, and they often limit exploration; as a result, data sets suitable for validation are hard to come by.
The problem: validation in a niche field. Essentially, I have no gold standard to validate against.
External data sets: Few, if any, external data sets resemble both the time series structure and the variable profile I am targeting. This is central, because the relationship between time and the nature of my variables largely drives the investigation (e.g., results from 15-minute sampling intervals vs. the more common 1-hour intervals are completely different and answer different questions).
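To make the interval point concrete, here is a toy sketch in Python (not my data or my measure; the white-noise series, lag, and noise level are assumptions for illustration only): a dependence at a single 15-minute lag is easy to detect at 15-minute sampling, but effectively vanishes once the same series are subsampled hourly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 96 * 4                                           # 4 days at 15-minute resolution
x = rng.normal(size=n)
y = np.roll(x, 1) + rng.normal(scale=0.3, size=n)    # y lags x by one 15-min step

def best_lagged_corr(a, b, max_lag=3):
    """Max |Pearson r| over integer lags 0..max_lag."""
    best = 0.0
    for lag in range(max_lag + 1):
        if lag == 0:
            r = np.corrcoef(a, b)[0, 1]
        else:
            r = np.corrcoef(a[:-lag], b[lag:])[0, 1]
        best = max(best, abs(r))
    return best

print("15-min sampling:  ", round(best_lagged_corr(x, y), 2))         # strong signal
print("hourly subsample: ", round(best_lagged_corr(x[::4], y[::4]), 2))  # near zero
```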
Literature: True-positive control variables supported by the literature are uncommon once the prerequisites above are considered. But my search continues, since finding them would let me use my own data for validation purposes.
Primary question (to be read as one question): How does one get around this issue? Is it possible to validate at all? How can I empirically show that my equation isn't tuned to the quirks of my exact data, but is instead tailored to, and beneficial for, this niche field as a whole? Are there creative ways to validate within one's own data that would support a strong conclusion that the measure can be applied to future external data sets? (The sketch below shows the kind of within-data check I have been considering.)
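One idea along those lines is a surrogate/permutation null: score the real pair, then rebuild the score under circular shifts that preserve each series' autocorrelation but destroy cross-series alignment. Below is a minimal sketch; `similarity` is a hypothetical stand-in for my measure, and the lag window, surrogate count, and toy data are assumptions for illustration, not my actual setup.

```python
import numpy as np

def similarity(x, y):
    """Hypothetical stand-in for the custom measure.
    Here: max |Pearson r| over a small window of lags."""
    scores = []
    for lag in range(-3, 4):
        if lag < 0:
            a, b = x[:lag], y[-lag:]
        elif lag > 0:
            a, b = x[lag:], y[:-lag]
        else:
            a, b = x, y
        scores.append(abs(np.corrcoef(a, b)[0, 1]))
    return max(scores)

def surrogate_null(x, y, n_surrogates=999, rng=None):
    """Null distribution via circular shifts of y: keeps each series'
    autocorrelation intact while destroying cross-series alignment."""
    rng = np.random.default_rng(rng)
    null = np.empty(n_surrogates)
    for i in range(n_surrogates):
        shift = rng.integers(1, len(y))
        null[i] = similarity(x, np.roll(y, shift))
    return null

# Toy example: y is a noisy, lagged, non-linear function of x.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=60))                    # short series, as in my data
y = np.roll(np.tanh(x), 2) + rng.normal(scale=0.5, size=60)

observed = similarity(x, y)
null = surrogate_null(x, y, rng=1)
p = (1 + np.sum(null >= observed)) / (1 + len(null))  # standard permutation p-value
print(f"observed={observed:.3f}, p={p:.3f}")
```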
Of note: I thought about synthesizing data sets, but a few statisticians I trust cautioned that synthetically introducing lags would bias the analysis: time lag is hardly studied in my field, so building lags into the synthetic data would presuppose that lags are important.
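For reference, the kind of synthesis I had in mind looked roughly like the sketch below. The lag and noise settings are mine, not field-derived, which is exactly the statisticians' objection: the "ground truth" is whatever I bake in.

```python
import numpy as np

def synth_panel(n_pairs=50, n=60, lag=2, noise=0.5, rng=None):
    """Synthetic benchmark: half the pairs are true positives (y is a
    lagged non-linear transform of x), half are independent negatives.
    The lag/noise values are my assumptions, not measured field values."""
    rng = np.random.default_rng(rng)
    pairs, labels = [], []
    for i in range(n_pairs):
        x = np.cumsum(rng.normal(size=n))
        if i % 2 == 0:   # positive: lagged non-linear dependence
            y = np.roll(np.tanh(x), lag) + rng.normal(scale=noise, size=n)
            labels.append(1)
        else:            # negative: independent series
            y = np.cumsum(rng.normal(size=n))
            labels.append(0)
        pairs.append((x, y))
    return pairs, labels

pairs, labels = synth_panel(rng=0)  # score each pair with the measure, compare to labels
```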