We run many kinds of ML experiments using a number of frameworks.

We've ended up writing an awful lot of boilerplate:

  1. read some configuration describing the experiment's parameters, variables, mode and so on
  2. create all the output directories we'll need for every combination of epoch, parameter, variable etc.

And then we traverse those output directories again to collate and summarise the results.

Steps 1 and 2 are fragile (they break when new requirements come in, e.g. a new "mode" for the experiment that alters the directory layout), tedious to write and maintain and, well, boilerplate.
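
To illustrate, the pattern looks roughly like the sketch below. This is a simplified, hypothetical reconstruction rather than our real code: the function names, the `key=value` directory layout and the `metrics.json` file name are all made up for the example.

```python
import itertools
import json
from pathlib import Path


def make_output_dirs(root, grid):
    """Create one directory per combination of the values in `grid`.

    `grid` maps a parameter name to the list of values it can take.
    """
    keys = sorted(grid)
    dirs = []
    for values in itertools.product(*(grid[k] for k in keys)):
        # The layout is hard-coded here: root/key1=value1/key2=value2/...
        run_dir = Path(root).joinpath(*(f"{k}={v}" for k, v in zip(keys, values)))
        run_dir.mkdir(parents=True, exist_ok=True)
        dirs.append(run_dir)
    return dirs


def collate(root):
    """Walk the tree again and gather every metrics.json into a list of rows."""
    rows = []
    for metrics_file in sorted(Path(root).rglob("metrics.json")):
        # The same layout knowledge is duplicated here to recover the parameters.
        # Values recovered from directory names come back as strings.
        parts = metrics_file.parent.relative_to(root).parts
        row = dict(part.split("=", 1) for part in parts)
        row.update(json.loads(metrics_file.read_text()))
        rows.append(row)
    return rows
```

Any new requirement that changes the layout means touching both functions, which is where the fragility comes from.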

Questions:

  • is there some PyPI or conda package that can do some of this drudgery for us?
  • is there a neat design pattern or idiom someone could suggest? E.g. each class that writes to disk knows where it should write to and just creates the directory as needed (like lazy evaluation or something).
  • does anyone use a database or object store (e.g. MongoDB) instead of saving everything to a multitude of directories? How has that been? I can see that having potential.
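
To make the second bullet concrete, here is a minimal sketch of the kind of lazy idiom I have in mind. The `LazyDir` class and its `key=value` layout are hypothetical, invented purely for illustration; it is not taken from any existing package.

```python
from pathlib import Path


class LazyDir:
    """A directory described by parameters, created on disk only on first use."""

    def __init__(self, root, **params):
        self.root = Path(root)
        self.params = params

    def child(self, **params):
        """Return a nested LazyDir with extra parameters; nothing touches the disk."""
        return LazyDir(self.root, **{**self.params, **params})

    @property
    def path(self):
        # The directory layout is decided in this one place only.
        parts = [f"{k}={v}" for k, v in self.params.items()]
        return self.root.joinpath(*parts)

    def file(self, name):
        """Return a file path inside this directory, creating the directory on demand."""
        self.path.mkdir(parents=True, exist_ok=True)
        return self.path / name
```

A writer would then call something like `LazyDir("results", mode="train").child(epoch=3).file("metrics.json")` and the directory would only appear at that point. A new "mode" would change `path` and nothing else, at least in this toy version.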

Many thanks.

sming
  • MLflow's Tracking component seems relevant. – Ben Reiniger Feb 15 '20 at 20:36
  • See also the older question https://datascience.stackexchange.com/q/1214/55122 – Ben Reiniger Feb 15 '20 at 20:38
  • @BenReiniger thanks for the links. I'm not really after experiment tracking (we're using sacred currently for that), but something that does all the path management/manipulation and directory-creating for you.

    I'm beginning to think that, because the "shape" of most experiments on disk is dynamic and bespoke, there isn't a library or tool that can do it for you.

    – sming Feb 16 '20 at 21:20
  • I think part of the expected procedure with experiment trackers is that the filenames and directories don't have to be well-organized: one flat directory with random alphanumeric filenames (or maybe dated filenames) is fine, if you're using the experimenter's database/spreadsheet/whatever to find/sort/load the artifacts. – Ben Reiniger Feb 19 '20 at 16:05
  • @BenReiniger interesting point. Personally I prefer transparent/logical/human-friendly directory structures as opposed to an opaque one like that (e.g. iTunes...). It's like a safety net in case the tool you're using goes belly-up for some reason (e.g. metadata corruption, version upgrade issues, etc.): when the directory structure is logical and clear, i.e. human-friendly, you can always DIY. – sming Feb 19 '20 at 16:46
