2

I am having a question about "which storage device to choose" for my database tables. I have a table with 28 million records. I will insert data after creating the table, after that, no insert - update -delete operation will take place. Never. Only select operations.

I have a query like below

SELECT `indexVal`, COUNT(`indexVal`) FROM `key_word` WHERE `hashed_word` IN ('001','01v','0ji','0k9','0vc','0@v','0%d','13#' ,'148' ,'1e1','1sx','1v$','1@c','1?b','1?k','226','2kl','2ue','2*l','2?4','36h','3au','3us','4d~') GROUP BY `indexVal` 

This counts how many number of times a particular result appeared in search. In InnoDB, this operation took 5 seconds. This is too much, because my orifginal dataset will be in billions.

To do this kind of work, which MySQL storage you recommend?

Dongle
  • 600
  • 1
  • 7
  • 18
  • 2
    Which indexes are defined on the table? What is the execution plan? – a_horse_with_no_name Apr 27 '14 at 10:09
  • @a_horse_with_no_name: Hi, I will have indexes as defined in below SO user "Thilows" 's answer. – Dongle Apr 27 '14 at 10:13
  • Is this a flat table (does it have relations with other tables)? If its flat, then the fastest way would be to store it in a cache. If you will only execute one type of query, and your data set is _very large_, then you need to consider an alternative solution with map/reduce; like couchdb which would be a lot faster. – Burhan Khalid Apr 27 '14 at 10:13
  • 1
    If you don't ever update that table, why not store the result of the aggregation in a new table after you inserted everything into the `key_word` table? – a_horse_with_no_name Apr 27 '14 at 10:14
  • Also check out this: [MyISAM versus InnoDB](http://stackoverflow.com/questions/20148/myisam-versus-innodb) – vitro Apr 27 '14 at 10:14
  • @a_horse_with_no_name Could you elaborate on the aggregation idea? I'm not sure how a "pre aggregation" would work when the condition is on hashed_word and the grouping is on indexVal. Changing the list of hashed words would affect the values per indexVal, so I can't see a _great_ way to pre calculate. Do you mean some sort of pre calculated hashed_value count per indexVal so you could use `sum` to aggregate instead of count? – Joachim Isaksson Apr 27 '14 at 10:24

2 Answers2

0

More than the storage engine, having the proper index in place seems important.

In your case, CREATE INDEX idx_1 ON key_word (index_val, hashed_word) should help.

And if the data truly never changes, you could even pre-compute and cache some of those results.

For example

 CREATE TABLE counts  AS SELECT index_val, hashed_word, count(index_val) 
 FROM key_word
 GROUP BY index_val, hashed_word 
Thilo
  • 250,062
  • 96
  • 490
  • 643
0

For SELECT-only queries, ARCHIVE is the fastest storage engine. As it is MyISAM-based, and the following advice is for MyISAM as well, don't use varchar but fixed-size char columns, and you will get better performance.
Sure, even faster if it's the data is loaded in memory, instead read from disk.

Maxim Krizhanovsky
  • 25,260
  • 5
  • 51
  • 86