29

This query: select count(*) from planner_event takes a very long time to run - so long, I gave up and killed it before it finished. However, when I run explain select count(*) from planner_event, I can see a column in the output with the number of rows (14m).

How come explain can get the number of rows instantly, but count(*) takes a long time to run?

Benubird
  • 475
  • 1
  • 4
  • 8
  • 1
    COUNT(*) without a WHERE cause will cause a table scan on the InnoDB engine.. MyISAM can delivery the count directly because the COUNT is keept in de header file off the table. – Raymond Nijland Aug 31 '17 at 12:56

2 Answers2

30

Explain is using previously gathered statistics (used by the query optimizer). Doing a select count(*) reads EVERY data block.

Here's a cheap way to get an estimated row count:

SELECT table_rows
FROM information_schema.tables
WHERE table_name='planner_event';

Even if you did select count(id), it might still take a very long time, unless you have a secondary index on id (also assuming id is a PRIMARY KEY). Because all data (including Row Data) is stored in B-Tree indexes, performing a select count(PK_COLUMN) is still a considerable amount of IO (needs to reads all data pages). If you have a secondary index on the PK field, it will be able to perform less IO to perform a count.

ijoseph
  • 103
  • 4
Kevin Bott
  • 608
  • 5
  • 9
  • 1
    I_S.TABLES gives you the same estimate that EXPLAIN gives you. – Rick James Aug 30 '17 at 18:31
  • 2
    The query is missing AND TABLE_SCHEMA='my_database', otherwise you will get multiple results back if you have the a table with the same name in another database. – c z Jul 30 '19 at 08:18
  • Incidentally, MySQL should implement counted B-tree indexing to deal with this. Although I guess that would be tricky given @RickJames ' point below about miscounting due to non-committed operations. – ijoseph Jul 11 '20 at 23:24
  • @ijoseph - The stats come from "random probes" into the BTree. This is faster, but less accurate, than reading the next-to-bottom nodes entirely. MySQL developers made choices and tradeoffs. – Rick James Jul 12 '20 at 01:19
5

Explain gets the number from some "statistics" that are used to estimate things for the Optimizer. That number can be far from correct -- I sometimes see it being more than a factor of 2 (higher or lower) than the exact value.

Performing the COUNT(*) on an InnoDB table must scan the table to avoid miscounting records that are busy being inserted/deleted by other connections but not yet "committed". Actually, it is good enough to do a full scan on some index, not necessarily the whole table (which contains the PRIMARY KEY).

How much RAM do you have? What is the value of innodb_buffer_pool_size? It might help if that were about 70% of RAM.

Rick James
  • 78,038
  • 5
  • 47
  • 113