
Does anyone know of any prior art for non-SQL data structures for high-frequency accounting, whether client, broker, or exchange-side? I'm thinking specifically of the problem of booking individual trade data into proper transactions, with balanced debits and credits. In my own case, I'll be doing this in or directly adjacent to a fast limit-order book, but I can see other reasons for such a beast existing. And yes, I agree that none of the current raft of popular non-ACID nosql engines are at all right for this job. I'm assuming I'm going to need to write this.

A usable answer to this question might be as simple as a link to a paper on the subject of non-SQL or nosql accounting in a high-volume trading context -- I'm obviously using the wrong combinations of search terms, because I'm not finding much yet.

What I'm working on is a project that includes a limit-order book and accounting on each node in a distributed grid or fabric. In my case, the traded instruments could best be described as real options or real derivatives, including some mild exotics. The vast majority of the orders would be initiated by machines, and the data rate looks like it could easily hit 60k trades/sec on each node. (Without going into a longer dissertation, it might help to explain that I'm in Silicon Valley these days; this is obviously for a new market, not any existing one.) See http://en.wikipedia.org/wiki/Real_options_valuation if you haven't run across real options before.

Partial answers, based in part on feedback to this question so far:

  • A purpose-built accounting mechanism would probably be log-structured and append-only, using a non-SQL API for insertion speed. The engine itself might be a hypergraph database. If running on multiple nodes, it would need a way to provide summary transactions to the other nodes in a peer-to-peer fashion. The more I dig into this, the more it looks like a distributed hypergraph.

  • In the HFT world, it sounds like the standard procedure is still: Log but do not index the trades, do simple arithmetic ignoring debits and credits, and then synthesize balanced summary transactions to the accounting RDBMS periodically. Run MTM in batch. Is there anything anyone can say about how that "simple math and local logging" is done? I know how we did this in the derivatives world 15 years ago, but frankly it and MTM were both slow and ugly, and involved NFS servers, flat files, and shell scripts. Has nothing changed? ;-)

  • Okay, removing 'accounting' from the search terms just now found me this -- different question at first glance, covering both tick and financial data, but worth reading through -- looks like he had some of the same thoughts: Usage of NoSQL storage in Finance

    • Looks like it would be worth repeating my searches in google, citeseer, etc., substituting "finance" for "accounting".
  • Complex Event Processing (CEP) tries to solve some of the same problems -- it just occurred to me that including CEP in the same searches might be fruitful. The first thing I found was this (skeptical but humorous) article discussing CEP's slow uptake and some of the nosql hype: http://www.hftreview.com/pg/blog/darkstar/read/32333/whats-wrong-with-complex-event-processing
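To make the first bullet concrete, here is a minimal in-memory sketch of an append-only, double-entry journal. It is written in Python purely for illustration, not as a 60k trades/sec engine. Everything here is an assumption rather than prior art: the account naming scheme (`<owner>:cash`, `<owner>:<instrument>`), the default `currency="USD"`, and the use of integer amounts (ticks or cents) so that balancing is exact. Each execution becomes one balanced transaction, the log is only ever appended to, and balances are derived by folding over the log.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A posting is (account, unit, amount); a positive amount is a debit,
# a negative amount a credit. Integer amounts keep balancing exact.
Posting = Tuple[str, str, int]


@dataclass(frozen=True)
class Transaction:
    seq: int                         # position in the append-only log
    postings: Tuple[Posting, ...]


class Journal:
    """Append-only journal: transactions are validated, appended, never mutated."""

    def __init__(self) -> None:
        self._log: List[Transaction] = []

    def __len__(self) -> int:
        return len(self._log)

    def append(self, postings: List[Posting]) -> Transaction:
        # Double-entry invariant: within each unit, debits equal credits.
        totals: Dict[str, int] = defaultdict(int)
        for _, unit, amount in postings:
            totals[unit] += amount
        unbalanced = {u: t for u, t in totals.items() if t != 0}
        if unbalanced:
            raise ValueError(f"unbalanced transaction: {unbalanced}")
        tx = Transaction(len(self._log), tuple(postings))
        self._log.append(tx)
        return tx

    def book_trade(self, buyer: str, seller: str, instrument: str,
                   qty: int, price: int, currency: str = "USD") -> Transaction:
        # Hypothetical account naming: "<owner>:cash" and "<owner>:<instrument>".
        value = qty * price
        return self.append([
            (f"{buyer}:{instrument}", instrument, qty),    # buyer receives units
            (f"{seller}:{instrument}", instrument, -qty),  # seller delivers units
            (f"{seller}:cash", currency, value),           # seller receives cash
            (f"{buyer}:cash", currency, -value),           # buyer pays cash
        ])

    def balances(self) -> Dict[Tuple[str, str], int]:
        # Derived state is a fold over the log; a real engine would snapshot it.
        out: Dict[Tuple[str, str], int] = defaultdict(int)
        for tx in self._log:
            for account, unit, amount in tx.postings:
                out[(account, unit)] += amount
        return dict(out)
```

For example, `book_trade("alice", "bob", "OPT1", qty=10, price=250)` appends one transaction whose four postings net to zero in both `OPT1` and `USD`. An unbalanced posting list is rejected before it ever reaches the log.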

stevegt
  • Almost all OLTP systems are designed in such a manner that they "synthesize balanced summary transactions to the back office periodically". It's a standard approach. – Alexey Kalmykov Mar 14 '12 at 00:46
  • @AlexeyKalmykov that pretty much sums it up. – pyCthon Mar 14 '12 at 02:02
  • Thanks. Huh -- that could be why I'm not finding much prior art for better ways to do this. I thought for sure that things would have improved in the last 15 years. – stevegt Mar 14 '12 at 03:15
  • On a side note, there are plenty of non-SQL databases that can fulfill the role noted above. – pyCthon Mar 20 '12 at 05:34
  • Given that exchange-traded option markets are slow enough for SQL databases, you can color me skeptical that you are spending brainpower on a problem that will really exist. Obviously I don't know anything about the real-options-like trading you are thinking of, so I could easily be wrong on this topic, but that's my first thought. – Brian B Mar 22 '12 at 16:22
  • Hi, I know this is not directly linked to the question, but you should realize that real options are just corporate bs, and to my knowledge they don't serve any actual purpose besides sustaining said bsters. Indeed, a fundamental flaw of theirs is that if you are long your own option, you are short a put on all of your competitors' moves. First of all, that is never mentioned. And second, I don't see htf you can delta hedge such an option, so option pricing has no say in that. That being said, the metaphor might be useful to convey some ideas to potential customers of your platform. – nicolas Mar 26 '12 at 13:11
  • @nicolas: Yes, real options are interesting but orthogonal to this question. But I'm not working with conventional real options anyway; yes, it's a metaphor. Think high volume, micro sized, standardized, hedgeable, and traded via a limit order book. By contrast, a conventional real option is large value, low volume, often only valued (once) by a potential replicator, often not traded as such, not standardized, not listed on any exchange. Conventional real options don't have enough MTM to keep folks honest in their own valuations, leading to some of the abuses you so colorfully cite. ;-) – stevegt Mar 27 '12 at 19:40
  • @stevegt it sounds interesting. I am convinced there are many venues missing that would make things we use every day tradable and allow for a smoother life and/or fun. The fact that you are in the Valley makes me think it relates to an idea I had ;) – nicolas Mar 28 '12 at 08:43
  • @mepuzza http://quant.stackexchange.com/questions/3718/methods-storing-and-back-testing-tick-data –  Jul 02 '12 at 17:11

2 Answers

I know this is probably a naive answer, but when I started doing data analysis for personal trading, I looked for something much faster than SQL. I program in C++, and I found that HDF5 was the answer to all my problems.

http://www.hdfgroup.org/HDF5/

It's not accounting-oriented, but the nice thing about it is that you can do almost anything with it, and it is very fast. There is a bit of a learning curve, though.

mepuzza
  • Good call -- I've been looking at HDF5 as a possible candidate for the underlying storage layer. – stevegt Mar 25 '12 at 07:21
  • http://quant.stackexchange.com/questions/3718/methods-storing-and-back-testing-tick-data –  Jul 02 '12 at 17:13

"I have to think that there are a lot of very fast, very optimized special-purpose accounting engines out there filling this role."

Yes and no. I do not think you are high volume at all; you just need a corporate-level database server, not cheap low-end hosting. I do about 2,000 transactions per second on SQL Server with a mid-range database.

The core will be:

  • Decouple front and back with a message queue anyway.
  • Take trade executions from a FIX backoffice link that reports from clearing / broker.
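The decoupling in these two bullets can be sketched with an in-process `queue.Queue` standing in for the message queue. That substitution is an assumption: a production setup would more likely use a persistent broker or, as stevegt suggests in the comments, local IPC. The `Execution` record and its fields are hypothetical placeholders for whatever the FIX back-office link reports. The front side only enqueues executions, and a separate consumer thread drains the queue and books them.

```python
import queue
import threading
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Execution:
    # Hypothetical execution record, e.g. parsed from a FIX execution report.
    exec_id: str
    buyer: str
    seller: str
    qty: int
    price: int


def accounting_worker(q: "queue.Queue[Optional[Execution]]",
                      booked: List[Execution]) -> None:
    """Back-office consumer: drains the queue until it sees the None sentinel."""
    while True:
        ex = q.get()
        try:
            if ex is None:
                return
            booked.append(ex)  # stand-in for booking a balanced transaction
        finally:
            q.task_done()


def run(executions: List[Execution]) -> List[Execution]:
    q: "queue.Queue[Optional[Execution]]" = queue.Queue()  # unbounded
    booked: List[Execution] = []
    worker = threading.Thread(target=accounting_worker, args=(q, booked))
    worker.start()
    # Front-office producer: put() on an unbounded queue does not block,
    # so publishing never waits on bookkeeping.
    for ex in executions:
        q.put(ex)
    q.put(None)  # sentinel: no more executions
    worker.join()
    return booked
```

With an unbounded queue, the producer never stalls, but memory grows if booking falls behind. A bounded `queue.Queue(maxsize=...)` gives up that guarantee in exchange for back-pressure.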

"...it seems like a huge waste of data center horsepower when a more modern, purpose-built, probably non-SQL accounting engine might be orders of magnitude faster."

One thing is missing from this picture: SQL databases enforce data integrity, while NoSQL stores are often written ignoring data integrity requirements. You can get away with a lack of data integrity for a LOT of things, but not with accounting.

You are also overlooking that accounting is a standardized commodity function. Large companies run something like SAP and want all their data to be in there, regardless of cost. It is not a waste of time to upgrade the one central system that handles your payroll, all of the organization's invoices, etc., on top of trade accounting.

It is also questionable whether accounting really needs every trade. The back office does, to consolidate and check, but accounting is fine with synthesized balanced summaries. I do not do a lot of trading so far, but I submit monthly PNL totals with the broker statement to my accountant (where they go straight into my monthly profit/loss and tax calculations). I will never do it differently, even when volume ramps up; I will consolidate daily or hourly and correlate INTERNALLY, but not for accounting.
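A minimal sketch of that consolidation, under several stated assumptions: a single instrument, a hypothetical execution tuple `(timestamp_seconds, buyer, seller, qty, price)` with integer prices, and a summary defined as the net change per account per period (hourly by default). The raw executions stay in the internal log, and only one netted entry per period is passed on. Netting cannot break the balance: each execution is balanced on its own, so any sum of executions is balanced too.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical execution record: (timestamp_seconds, buyer, seller, qty, price).
Exec = Tuple[int, str, str, int, int]


def summarize(executions: Iterable[Exec], period: int = 3600
              ) -> Dict[int, Dict[str, int]]:
    """Net cash and position changes per account, bucketed by period start."""
    buckets: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for ts, buyer, seller, qty, price in executions:
        bucket = buckets[ts - ts % period]  # start of the period containing ts
        value = qty * price
        bucket[f"{buyer}:position"] += qty
        bucket[f"{seller}:position"] -= qty
        bucket[f"{buyer}:cash"] -= value
        bucket[f"{seller}:cash"] += value
    # Drop accounts that netted to zero and convert to plain dicts.
    return {start: {acct: amt for acct, amt in sorted(b.items()) if amt}
            for start, b in sorted(buckets.items())}


def is_balanced(summary: Dict[str, int]) -> bool:
    # Cash legs and position legs must each net to zero across accounts.
    cash = sum(a for k, a in summary.items() if k.endswith(":cash"))
    pos = sum(a for k, a in summary.items() if k.endswith(":position"))
    return cash == 0 and pos == 0
```

Shortening `period` (for example to 60) moves toward the "consolidate daily or hourly" end of the trade-off without changing what is sent to accounting: balanced net totals rather than individual trades.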

chrisaycock
TomTom
  • I didn't mention what sort of volume I'm looking at -- but so far, it looks like about 60k trades per second on each node of a distributed fabric, on commodity hardware. And when I say "non-SQL", I'm not thinking of any of the current raft of nosql non-ACID databases; I agree with you that they weren't written with accounting in mind. I'm assuming I'm going to need to write this myself; publicly-available prior art is what I'm looking for here. – stevegt Mar 14 '12 at 07:10
  • 60k trades per second? Are you serious? Not doubting HFT, but if that is more than 2-3 nodes, you're talking ECN-level volume, not trading. You would take over half a stock market. In this case I would likely go with summaries and distribute stocks across nodes, with a local logfile + summarization. – TomTom Mar 14 '12 at 07:13
  • Yep, 60k trades per second per node would be about right. I hear you regarding local logging + summarization: that's three in favor of that now, no other suggestions yet. I've updated the question a bit to clarify some of the points you raised. And who said anything about stocks? ;-) – stevegt Mar 14 '12 at 07:26
  • @stevegt 60K orders or 60K trades? As user2174 says, 60K trades is exchange-level volume. – chrisaycock Mar 14 '12 at 12:35
  • ;) Just activated my name, so user2174 is nettecture. Anyhow, this makes no sense: you cannot run 60,000 executions per second (orders are irrelevant here) on multiple nodes; that is more than all the exchanges of the world combined. For accounting, no one cares about bids / asks / order changes; only executions are relevant. Real-time PNL is NOT entered into the accounting books; that is done "regularly" (daily, hourly) when you mark positions to market. Something is wrong with the 60k number to start with. – TomTom Mar 14 '12 at 12:54
  • I'll update the question with the following details: Being able to get real-time MTM is one of the things I'm hoping for -- I was hoping someone's done it by now. If we were talking stocks, then I'd agree that 60k/sec is not a realistic number. But I've never said I'm trading stocks -- I asked what the HFT guys are doing, which may have led you to believe that. The code I'm working on is for instruments that could best be described as real options or real derivatives. Without going into a long dissertation, it might help to explain that I'm in Silicon Valley these days, not New York. – stevegt Mar 14 '12 at 18:32
  • But real-time MTM is NOT ACCOUNTING. This is risk management and trade control, and it is done in memory, WITHOUT a database. You talked about accounting, not about a real-time trade position overview. – TomTom Mar 14 '12 at 20:03
  • I've just finished updating the question -- I hope it clarifies things somewhat. The real-time accounting, continuous MTM, and real-time counterparty risk management go together. One reason for getting all of this as close to the limit order book as possible is that, if the matching algorithm can see counterparty risk, then it has the ability to use risk exposures as input to the matching rules. I know this is not like anything that existing markets do; this is for use in a new market. – stevegt Mar 14 '12 at 21:34
  • @NetTecture: I do want to tell you how much I appreciate your feedback to date, incredulity and all. ;-) It's helped immensely in refining the text of the question. I'm still thinking about your bullet point about the message queue decoupling. I think the equivalent in the architecture I have in mind is just local IPC, between order book and accounting processes on the same box, if they are separate processes at all. – stevegt Mar 15 '12 at 01:05
  • Okay, forget what I said about real-time MTM; I'm realizing you're right, it's orthogonal to this question, and is only muddying the waters and making my head hurt. I've simplified the question down to "Non-sql methods for high-frequency accounting". – stevegt Mar 15 '12 at 05:45