
My mentor wants me to write and submit an academic paper reporting a predictive model, but without any validation score.

Everything I have read in textbooks or on the Internet says that this is wrong, but is there any case where only reporting a train score makes sense?

Background

The model was fit "by hand" by someone in our team, using a visual inspection of features extracted from our entire dataset. It is a linear model based on hand-crafted features extracted from some very nonlinear and high-dimensional data. The linear model is based on fewer than fifty features, but those features were extracted from thousands. We do not have any data left to use as validation.

user82841
  • Dealing with the request would be better covered on https://academia.stackexchange.com/ - however, the users of this site can point out the technical reasons why publishing training-set-only results is considered weak for assessing models (e.g. trivial models, such as a list of all known inputs and desired outputs, will also fit; see the sketch after these comments). However, there might be exceptions - e.g. if the model is constrained by underlying theory and has too few free parameters to fit arbitrary data, then a low error is still meaningful and there may even be a way to measure that – Neil Slater Sep 26 '19 at 13:07
  • Could you clarify whether the predictive model is driven by some theoretical equation with only a few free parameters? Or is it a statistical model with many parameters (e.g. a neural network)? – Neil Slater Sep 26 '19 at 13:09
  • It is a linear model based on hand-crafted features extracted from some very nonlinear and high-dimensional data. The linear model is based on less than fifty features, but those features were extracted from thousands. – user82841 Sep 26 '19 at 23:07
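To make the "trivial models will also fit" point in the comments concrete, here is a minimal sketch using made-up data (nothing here comes from the actual study): a "model" that simply memorises the training set scores perfectly on that set while telling us nothing about how it would generalise.

```python
# Minimal sketch: a pure lookup-table "model" on random data.
# Hypothetical data; only illustrates why a training score alone is weak evidence.
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))          # 200 arbitrary training inputs
y_train = rng.integers(0, 2, size=200)       # labels unrelated to the inputs

# Trivial model: memorise every (input, label) pair seen during training.
lookup = {tuple(x): y for x, y in zip(X_train, y_train)}

train_preds = np.array([lookup[tuple(x)] for x in X_train])
print("Training accuracy:", (train_preds == y_train).mean())  # 1.0, yet useless on new data
```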

3 Answers


You are right: decent journals/conferences are unlikely to accept a paper without a proper evaluation. Moreover, the model is handcrafted, which probably means that it's hardly reproducible, right? And also that you can't do cross-validation, I guess?
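On the cross-validation point: if the feature-construction step can be automated (as the asker mentions trying in the comments below), one possibility is to put it inside a pipeline so that each fold re-derives features from its own training portion only. The following is only a sketch with simulated data and assumed names; `SelectKBest` stands in for whatever the real extraction step is, which the thread does not specify.

```python
# Hedged sketch: cross-validating the *whole* procedure (feature selection + linear model),
# assuming the hand-crafted extraction could be replaced by an automated step.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Simulated stand-in for "thousands of candidate features"
X, y = make_classification(n_samples=300, n_features=2000, n_informative=10,
                           random_state=0)

# Feature selection lives inside the pipeline, so every CV fold selects features
# using only its own training portion -- nothing leaks from the held-out fold.
model = Pipeline([
    ("select", SelectKBest(f_classif, k=50)),
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(model, X, y, cv=5)
print("Cross-validated accuracy per fold:", scores)
```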

I think one would need pretty strong arguments to justify a contribution which has no scientific validation and cannot be reproduced. The only ways I can think of would be:

  • if the method solves a problem never solved before (or has specific characteristics which make it likely to solve such a problem in the future)
  • if some sort of qualitative analysis demonstrates that the method is much better than state of the art approaches
Erwan
    I actually automated the process as best I could and tried cross-validation. The positive results instantly went away, which could be a problem with the method or my implementation. – user82841 Sep 26 '19 at 22:28
  • @user82841: it looks exactly like the manually crafted model was overfit, and it's not that surprising: that's exactly the reason why proper validation is required in ML fields, otherwise many biases can pollute the results. I agree with your intuition, I'm afraid: this kind of result is unlikely to pass peer-review in a serious ML journal/conf. – Erwan Sep 26 '19 at 23:16
  • Our prediction task is novel, but closely related to already-published results. Unfortunately, the method is really not as groundbreaking as the cases you describe. – user82841 Sep 26 '19 at 23:23
  • Just saw your last comment. Yes, that's what I thought. Thank you for your frank perspective. – user82841 Sep 26 '19 at 23:33

I'm not sure which field you work in. However, there are academic fields where validation is unusual. A prominent example is Econometrics. The reason is that you usually come up with a theoretical model and try to translate this "data generating process" into a model which can be estimated empirically. It is important to say that these models do not aim at making predictions; their purpose is statistical inference. Usually very simple statistical models (linear regression, logit) are used, since it is easy to see "marginal effects" and variances there.
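As a hedged illustration of that inference-oriented workflow (simulated data, hypothetical variable names, not from any real study): the model is fit on the full sample, and the objects of interest are the coefficients and their standard errors rather than a held-out prediction score.

```python
# Sketch of an inference-style analysis: coefficients and uncertainty, no validation split.
# All variables below are simulated placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
education = rng.normal(12, 2, n)      # hypothetical regressor
experience = rng.normal(10, 5, n)     # hypothetical regressor
wage = 1.5 * education + 0.5 * experience + rng.normal(0, 5, n)

X = sm.add_constant(np.column_stack([education, experience]))
result = sm.OLS(wage, X).fit()
print(result.summary())  # coefficients, standard errors, p-values -- the "marginal effects" view
```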

However, with respect to predictive models, the idea of having no validation set sounds a little strange to me. In any case, you should have a look at the relevant literature in your field. A thoughtful literature review will surely give you a good clue.

Peter
  • We are analyzing medical records using machine learning. In the papers I have seen so far, the train-test split is the most common but many papers are unclear about what kind of validation, if any, was used. I will take your advice and review the literature more carefully. – user82841 Sep 26 '19 at 23:23

The most likely issue here is to do with

fifty features, but those features were extracted from thousands

If those features were selected according to a pre-data-analysis theory, and other selections were not considered, then a linear model that fits the data might be strong evidence that the theory is plausible.

However, a linear model that fits well due to selection from a large feature set in order to make it fit is very likely to be overfit. You absolutely need a hold-out test data set in this case, as you have used your initial data to form a hypothesis, and have no proof of validity at all.
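As a hedged demonstration of why a hold-out set matters here (pure noise, illustrative sizes only, not the study's data): selecting the 50 columns most correlated with the target out of thousands makes even random data look predictable on the data used for selection and fitting, while a hold-out set excluded from both steps shows roughly chance-level performance.

```python
# Sketch of selection bias on pure noise: train score looks good, held-out score does not.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2000))   # thousands of candidate features
y = rng.normal(size=300)           # target unrelated to any of them

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Pick the 50 features most correlated with y, using only the training part.
corrs = np.array([abs(np.corrcoef(X_train[:, j], y_train)[0, 1])
                  for j in range(X_train.shape[1])])
top50 = np.argsort(corrs)[-50:]

model = LinearRegression().fit(X_train[:, top50], y_train)
print("R^2 on the data used for selection/fitting:", model.score(X_train[:, top50], y_train))
print("R^2 on held-out data:", model.score(X_test[:, top50], y_test))  # ~0 or negative
```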

I cannot advise you whether to submit the paper or not. There may be ways you can word it to make it clear that the work establishes a hypothesis and does not validate it (but without making a song and dance about the lack of rigour in validation, as then you are undermining your own submission).

I think that as long as you do not try to obfuscate the lack of follow-up work, and present the results so far accurately, it is a fair submission - it may then get rejected if a reviewer wants to see some validation, or it may get accepted, in which case there will need to be follow-up work that either validates or refutes the model in a second paper. That might be your work, or it might be another team's.

How good/bad those scenarios are depends on how your field works in general. Perhaps ask, with some relevant details, on https://academia.stackexchange.com/ to gauge the response, as in some ways this is a people problem - how to please your mentor whilst retaining pride in your work and progressing your career (which in turn depends on a mix of pleasing your supervisor and performing objectively good work).

Your mentor may still be open to discussing the technical merits of the work. Perhaps they have not fully understood the implications that you are seeing for how the model was constructed. However, they might fully understand this, and may be able to explain from their view the merits of publishing at an early pre-validation stage for this project.

Neil Slater
  • Someone in my team invented a new feature extraction strategy, and later on we discovered that the features correlated with the prediction target. The trouble is, the features themselves have no domain explanation. – user82841 Sep 27 '19 at 16:11
  • The features were created only as a first attempt at reducing an extremely large amount of data. To me, it does seem like a coincidence that – user82841 Sep 27 '19 at 16:15
  • they were useful. Should we reach out to a domain expert to find out whether they could be meaningful? – user82841 Sep 27 '19 at 16:18
  • @user82841: Finding an expert seems useful, but I cannot really advise you what to do next. If the features were really extracted using some rule independent of the prediction target, and then seem to have good predictive power, then there is something interesting going on, even if it is just some kind of data leak. One salient question - 50 features would still be a lot if you only had, say, 200 rows. – Neil Slater Sep 27 '19 at 16:36
  • Try comparing your results with 50 random normally distributed variables as features, with the same targets . . . getting a comparable result to your predictive model would be the end of the model. Getting a different result points you to there being something different from randomness in the selected features. Although of course you never quite know the p-value of a coincidence that has impacted the initial study, and have no immediate opportunity to resolve that – Neil Slater Sep 27 '19 at 16:37
  • Also, if you repeat the 50 random uncorrelated features test a few hundred times, you might get some sense of how much of a coincidence it would need to be for the feature reduction to have hit something apparently predictive by chance (a rough sketch of this check follows after these comments). – Neil Slater Sep 27 '19 at 16:41
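Below is a rough sketch of the random-baseline check suggested in the comments above. The targets and the reported training score are hypothetical placeholders; in practice you would substitute the study's actual targets and the hand-built model's training fit.

```python
# Sketch: how often do 50 purely random features match the reported training fit by chance?
# `y` and `reported_train_r2` are placeholders, not values from the actual study.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

y = rng.normal(size=200)        # placeholder: use the real prediction targets here
reported_train_r2 = 0.6         # placeholder: the training score of the hand-built model

n_trials = 500
chance_scores = []
for _ in range(n_trials):
    X_random = rng.normal(size=(len(y), 50))                 # 50 random, uncorrelated features
    r2 = LinearRegression().fit(X_random, y).score(X_random, y)  # training R^2 of the random model
    chance_scores.append(r2)

chance_scores = np.array(chance_scores)
print("Median training R^2 from random features:", np.median(chance_scores))
print("Fraction of trials matching the reported fit:",
      (chance_scores >= reported_train_r2).mean())
```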