
I'm interviewing for quantitative researcher positions at several hedge funds, and I've been told that there will be one interview session focused on stats and one focused on ML, among others. This made me realize that I have a hard time distinguishing between stats and ML because there's so much overlap, although I think I have some idea of which parts of stats are not typically a part of ML.

I took stats courses in college and high school, and never were the words "machine learning" mentioned in those courses. In those courses, I recall more of a focus on things (most of which I have forgotten) like hypothesis testing, basic probability and common distributions, confidence intervals, bar/box plots/histograms, and univariate regression.

Which parts of stats would you consider not typically part of ML, yet important for practical data inference, analysis, and prediction?

  • It may be easier to answer the opposite question: what are the typical tasks, or problems, related to machine learning? – Tim Jun 27 '20 at 16:49
  • @Tim I may not understand, but is that asking what part of stats is also commonly used in ML? And then take the complement of that? – user5965026 Jun 27 '20 at 16:51
  • Simplified view: Supervised ML is a subset of statistical modeling, while unsupervised ML is (mainly) part of multivariate stats. Everything else is stats but not ML, e.g. inferential stats, mathematical statistics, etc. Why didn't you hear about the ML stuff in stats courses? Because talking about algorithms is often considered not too interesting from a math point of view. – Michael M Jun 27 '20 at 17:09
  • Please do report back to us how they draw the line (assuming an NDA doesn’t prevent that). Those topics you say you’ve forgotten (confidence intervals, hypothesis testing, etc.) most certainly will come up in the statistics interview, so do review them. – Dave Jun 27 '20 at 17:37
  • My attempt at definitions delineating the two is at https://fharrell.com/post/stat-ml . You'll see some examples there, e.g. logistic regression and lasso are statistical models, and random forests and neural networks are ML. – Frank Harrell Jun 27 '20 at 17:50
  • @Dave Will do. I know the majority of these types of firms will ask questions about linear regression and OLS. I suppose that's more traditional stats, but it's a heavy part of ML now too. Based on some Glassdoor reviews, there are lots of OLS-style questions, plus bootstrap, cross-validation, R^2, decision trees, multiple testing, and time series. I would probably group bootstrap, R^2, and multiple testing as stats and not ML. Time series is kind of its own thing, but I would group it with stats. – user5965026 Jun 27 '20 at 18:05
  • For one example see https://stats.stackexchange.com/questions/422186/motivations-for-experiment-design-in-statistical-learning – kjetil b halvorsen Jan 02 '22 at 13:09
  • With the interview over, how did they distinguish the two? – Dave Jan 02 '22 at 14:57

0 Answers