Fastest MySQL Storage Engine for `Select` queries

Question

I am having a question about "which storage device to choose" for my database tables. I have a table with 28 million records. I will insert data after creating the table, after that, no insert - update -delete operation will take place. Never. Only select operations.

I have a query like below

SELECT `indexVal`, COUNT(`indexVal`) FROM `key_word` WHERE `hashed_word` IN ('001','01v','0ji','0k9','0vc','0@v','0%d','13#' ,'148' ,'1e1','1sx','1v$','1@c','1?b','1?k','226','2kl','2ue','2*l','2?4','36h','3au','3us','4d~') GROUP BY `indexVal`

This counts how many number of times a particular result appeared in search. In InnoDB, this operation took 5 seconds. This is too much, because my orifginal dataset will be in billions.

To do this kind of work, which MySQL storage you recommend?

Which indexes are defined on the table? What is the execution plan? — a_horse_with_no_name, Apr 27 '14 at 10:09
@a_horse_with_no_name: Hi, I will have indexes as defined in below SO user "Thilows" 's answer. — Dongle, Apr 27 '14 at 10:13
Is this a flat table (does it have relations with other tables)? If its flat, then the fastest way would be to store it in a cache. If you will only execute one type of query, and your data set is _very large_, then you need to consider an alternative solution with map/reduce; like couchdb which would be a lot faster. — Burhan Khalid, Apr 27 '14 at 10:13
If you don't ever update that table, why not store the result of the aggregation in a new table after you inserted everything into the `key_word` table? — a_horse_with_no_name, Apr 27 '14 at 10:14
Also check out this: [MyISAM versus InnoDB](http://stackoverflow.com/questions/20148/myisam-versus-innodb) — vitro, Apr 27 '14 at 10:14
@a_horse_with_no_name Could you elaborate on the aggregation idea? I'm not sure how a "pre aggregation" would work when the condition is on hashed_word and the grouping is on indexVal. Changing the list of hashed words would affect the values per indexVal, so I can't see a _great_ way to pre calculate. Do you mean some sort of pre calculated hashed_value count per indexVal so you could use `sum` to aggregate instead of count? — Joachim Isaksson, Apr 27 '14 at 10:24

score 0 · Answer 1 · answered Apr 27 '14 at 10:10

0

More than the storage engine, having the proper index in place seems important.

In your case, CREATE INDEX idx_1 ON key_word (index_val, hashed_word) should help.

And if the data truly never changes, you could even pre-compute and cache some of those results.

For example

 CREATE TABLE counts  AS SELECT index_val, hashed_word, count(index_val) 
 FROM key_word
 GROUP BY index_val, hashed_word

answered Apr 27 '14 at 10:10

Thilo

250,062
96
490
643

1

Wouldn't be `hashed_word, index_val` better? (Although I'm not sure MySQL can use an index for an `OR` condition like that) – a_horse_with_no_name Apr 27 '14 at 10:12
@a_horse_with_no_name: Inside my query, I already use `OR` condition inside the `IN` clause right? Sorry I am not good with SQL. – Dongle Apr 27 '14 at 10:14
@a_horse_with_no_name: Maybe. I'm also worried if MySQL can handle the OR, so I put index_val first. Only one way to tell: measure both. – Thilo Apr 27 '14 at 10:14
@a_horse_with_no_name: The index I have is `ALTER TABLE key_word ADD INDEX (hashed_word, indexVal)` – Dongle Apr 27 '14 at 10:15
@Thilo: Yes, the data never change. if do, we can rebuild the tables. – Dongle Apr 27 '14 at 10:16
1

@Dongle: Try both indexes, and get execution plans. See which one works better. – Thilo Apr 27 '14 at 10:16
@Thilo: `hashed_word, index_val` is better, but it takes 5 seconds. that is slow. – Dongle Apr 27 '14 at 10:17
Try pre-computed aggregates then. – Thilo Apr 27 '14 at 10:23
@Thilo: Your new table generated results only with 1 in "count" column. – Dongle Apr 27 '14 at 13:01

score 0 · Answer 2 · answered Apr 28 '14 at 06:44

0

For SELECT-only queries, ARCHIVE is the fastest storage engine. As it is MyISAM-based, and the following advice is for MyISAM as well, don't use varchar but fixed-size char columns, and you will get better performance.
Sure, even faster if it's the data is loaded in memory, instead read from disk.

answered Apr 28 '14 at 06:44

Maxim Krizhanovsky

25,260
5
51
86

ARCHIVE engine is a bad way to store rows for selecting... All rows are compressed and SELECT query will be slow. – AlexeyGorovoy Feb 08 '19 at 15:58

Fastest MySQL Storage Engine for `Select` queries

2 Answers2