Take as an example the well-known Boston housing dataset. In the original paper, a regression model was used to estimate how much people were willing to pay for improved air quality.
But the 'people' whose behaviour was being modelled were never specified. Presumably, the population was not limited to 'residents of Greater Boston'. It could hardly be 'humans anywhere on the planet', nor even 'all US residents'.
So what inference are we invited by the authors to draw about the scope they intend for their theory? Perhaps 'all residents of large US metro areas'. But - what about other First World metro areas? The first two sections of the Boston paper scrupulously avoid any reference which might suggest that they had a particular 'population' to which their 'sample' was to apply.
There is also the dimension of time - the data came from the 1970s, whereas the theory has no time limitations spelt out.
Plus - going back to first principles - how could the Boston data be treated as a sample of a population including, say, the Chicago metro area when no data from the Chicago area was considered?
Besides, many of the covariates in the Boston regression weren't samples at all, but population totals - for example, the proportion of Black residents was taken from census statistics.
My question: how can we conceive of the Boston data as supporting the geographically unlimited propositions in the paper? Is there a mathematical way of showing that the Boston analysis is equivalent to, or approximates, what would hold for other cities? Or is it just assumed as a matter of common sense - such that the paper's authors would be amazed to think anyone could have any doubts on the matter?