Software has code repositories and package managers, so installation and maintenance are done in one line (or click).

What are the practices for managing data instead? Apart from manually organized directories with readme files inside, I found a couple of solutions: drake and CKAN.

However, drake is mostly about workflows, not maintenance, and CKAN is a server-based solution for which I couldn't find a local "package manager".

What would such a package manager ideally do?

  • download datasets from source in one click
  • support multiple formats and conversion between them
  • maintain the directory structure
  • update the data from the repositories
  • support aliases for datasets
  • hold meta information about the datasets
  • support tags
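
To make the wish list concrete, here is a minimal sketch of what a local registry for such a tool might look like. It is only an illustration under my own assumptions, not an existing tool: the `Dataset` and `Registry` names, the `index.json` file, and the `<root>/<name>/<name>.<fmt>` layout are all hypothetical. It covers downloading, directory structure, aliases, meta information, and tags; format conversion and updates are left out.

```python
import json
import urllib.request
from dataclasses import dataclass, field, asdict
from pathlib import Path


@dataclass
class Dataset:
    """One registry entry (hypothetical schema, for illustration only)."""
    name: str
    url: str
    fmt: str = "csv"
    aliases: list = field(default_factory=list)
    tags: list = field(default_factory=list)
    meta: dict = field(default_factory=dict)


class Registry:
    """Keeps dataset descriptions and a fixed directory layout under `root`."""

    def __init__(self, root):
        self.root = Path(root)
        self.datasets = {}

    def add(self, ds):
        self.datasets[ds.name] = ds

    def resolve(self, name_or_alias):
        # Look a dataset up by its canonical name or by any of its aliases.
        for ds in self.datasets.values():
            if name_or_alias == ds.name or name_or_alias in ds.aliases:
                return ds
        raise KeyError(name_or_alias)

    def by_tag(self, tag):
        return [ds for ds in self.datasets.values() if tag in ds.tags]

    def path_for(self, name_or_alias):
        # Assumed layout: <root>/<name>/<name>.<fmt>
        ds = self.resolve(name_or_alias)
        return self.root / ds.name / f"{ds.name}.{ds.fmt}"

    def fetch(self, name_or_alias):
        # Download the dataset from its source URL into the managed directory.
        ds = self.resolve(name_or_alias)
        target = self.path_for(name_or_alias)
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(ds.url, str(target))
        return target

    def save(self):
        # Persist all entries (including meta information) to index.json.
        self.root.mkdir(parents=True, exist_ok=True)
        index = self.root / "index.json"
        entries = [asdict(ds) for ds in self.datasets.values()]
        index.write_text(json.dumps(entries, indent=2))
        return index

    @classmethod
    def load(cls, root):
        reg = cls(root)
        for item in json.loads((reg.root / "index.json").read_text()):
            reg.add(Dataset(**item))
        return reg
```

With something like this, `reg.fetch("world-gdp")` would download a dataset registered under the alias `world-gdp` into its own folder, and `reg.by_tag("economics")` would list everything tagged that way.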

Have you seen anything like this? If not, could you share how you maintain your data manually?

Anton Tarasenko

1 Answer

[Image: ecosystem diagram from the Data Packages website]

Mark Silverberg