
Thanks a lot for your discussions on the original post.

Following your suggestions, let me rephrase a bit:

kdb+ is known for its efficiency, and that efficiency comes at a steep price. However, with computational power so cheap these days, there should be a sweet spot where we can get comparable data-manipulation efficiency at a more reasonable cost.

For instance, if a kdb+ license costs a small shop $200K per year (I don't know how much it actually costs, do you know?), maybe there is a substitute solution: e.g., we pay $50K to build a decent cluster, store all the data on a network file system, and parallelize all the data queries. This solution might not be as fast or as elegant as kdb+, but it would be much more affordable and, most importantly, you take full control of it.

What do you think? Is there anything like this?
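To make the proposal concrete, here is a rough sketch of what I mean by "split the data by day and parallelize the queries". Everything in it (the file layout, the pickle format, the max-price query) is only an illustration, not a recommendation; a temporary directory stands in for the shared network file system:

```python
# Sketch: one file per trading day on a shared file system, queried in
# parallel. Paths, tick format and the query itself are hypothetical.
import os
import pickle
import tempfile
from concurrent.futures import ThreadPoolExecutor

def write_day(root, day, ticks):
    """Store one day of (time, price) ticks as a single file per day."""
    with open(os.path.join(root, day + ".pkl"), "wb") as f:
        pickle.dump(ticks, f)

def query_day(path):
    """Per-day query: the day's maximum trade price."""
    with open(path, "rb") as f:
        ticks = pickle.load(f)
    return max(price for _, price in ticks)

root = tempfile.mkdtemp()  # stands in for the shared network file system
write_day(root, "2012-04-01", [("09:30:00", 100.0), ("09:30:01", 100.5)])
write_day(root, "2012-04-02", [("09:30:00", 101.0), ("09:30:01", 99.5)])

paths = [os.path.join(root, name) for name in sorted(os.listdir(root))]
with ThreadPoolExecutor() as pool:  # threads are fine for I/O-bound reads
    daily_max = list(pool.map(query_day, paths))
print(daily_max)  # [100.5, 101.0]
```

On a real cluster each worker would run on a different machine against the shared file system, but the shape of the solution is the same: independent per-day files, embarrassingly parallel queries.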

SRKX
Peter Peter
    I'm not sure what you mean. The main trick used by KDB is to store the data in columns instead of rows. This has the advantage that if one column is selected all the data can be read in one long read. This can also be done in Python. – Bob Jansen Apr 01 '12 at 10:52
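The column trick mentioned in the comment above can indeed be sketched in pure Python. In the layout below (file names and data are made up; a real system would use numpy, HDF5 or similar) each column lives in its own binary file, so selecting one column is a single sequential read instead of touching every row:

```python
# Sketch of column-oriented storage with the standard library only:
# each column is stored contiguously in its own file.
import os
import tempfile
from array import array

root = tempfile.mkdtemp()

# Row-wise ticks: (timestamp, price, size) -- illustrative data.
ticks = [(1, 100.0, 200), (2, 100.5, 150), (3, 99.9, 300)]

# Write each column to its own binary file (the "columnar" layout).
cols = {
    "time":  array("q", (t for t, _, _ in ticks)),
    "price": array("d", (p for _, p, _ in ticks)),
    "size":  array("q", (s for _, _, s in ticks)),
}
for name, col in cols.items():
    with open(os.path.join(root, name), "wb") as f:
        col.tofile(f)

# Selecting just the price column is one contiguous read.
prices = array("d")
with open(os.path.join(root, "price"), "rb") as f:
    prices.fromfile(f, len(ticks))
print(list(prices))  # [100.0, 100.5, 99.9]
```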
  • Storing each tick as a separate record/row is neither sensible nor feasible (as you already mentioned). The most common approach is to split tick data by day and store each day's data as plain arrays, either in files or in database LOBs. Such an approach makes the amount of data perfectly manageable.
  • possible duplicate of What is the best data structure/implementation for representing a time series? – wburzyns Apr 01 '12 at 18:50
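The "one day = one plain array" layout from the comment above can be sketched like this (file name and prices are illustrative); memory-mapping the file means a query only touches the pages it actually reads:

```python
# Sketch: a day's prices written as a raw array of doubles, then
# memory-mapped for random access without loading the whole day.
import mmap
import os
import struct
import tempfile
from array import array

path = os.path.join(tempfile.mkdtemp(), "2012-04-01.prices")

# Write the whole day as one plain array (8 bytes per price).
day_prices = array("d", [100.0, 100.5, 99.9, 100.2])
with open(path, "wb") as f:
    day_prices.tofile(f)

# Read only the 3rd price: unpack 8 bytes at offset 2 * 8.
with open(path, "rb") as f, \
        mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
    third = struct.unpack_from("d", m, 2 * 8)[0]
print(third)  # 99.9
```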
    The minimal 2-core setup of kdb+ is actually pretty reasonable: $25K per year including maintenance. – user2303 Apr 19 '13 at 01:53
    Arctic could be an option (it's free software, built on top of MongoDB), but I haven't used it, so I cannot give you an opinion on it:

    https://github.com/manahl/arctic

    – Juan Ignacio Gil Jan 24 '18 at 09:46