I am creating indexes on tables in an existing PostgreSQL database instance. I understand that running EXPLAIN ANALYZE on the application's SQL command is the easiest way to know whether my indexes are being used.
For example:
EXPLAIN ANALYZE SELECT A,B,C FROM MY_TABLE WHERE C=123;
Would give me back:
Seq Scan on public.my_table (cost=...) <- No index, BAD
And, after creating the index, it would return:
Index Scan using my_index_name on public.my_table (cost=...) <- Index, GOOD
In my optimisation task, I found a relatively big table that suffered occasional performance problems and created an index for it. That was a perfect example of turning a sequential scan into an index scan, as above. It worked.
However, for the second-slowest query on my list (still under 0.5 seconds), the index made no difference: even after it was created, the planner still chose a Seq Scan. That table only has a few hundred rows, though it is likely to grow by a few thousand a year.
Digging a little deeper into the PostgreSQL documentation, I found that the planner will still use a sequential scan when the index would not offer a significant performance gain.
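To check whether the planner is skipping the index on cost grounds rather than because the index is unusable, one trick is to temporarily disable sequential scans for the current session and re-run the EXPLAIN. This is only a testing aid, not something for production; the sketch below assumes the MY_TABLE / my_index_name names from my example above:

```sql
-- For this session only, make sequential scans look very expensive
-- to the planner (a planner hint, not a hard guarantee).
SET enable_seqscan = off;

EXPLAIN ANALYZE SELECT A, B, C FROM MY_TABLE WHERE C = 123;
-- If the plan now shows "Index Scan using my_index_name", the index
-- is usable and the planner simply judged a Seq Scan cheaper on a
-- small table.

-- Restore the default behaviour afterwards.
RESET enable_seqscan;
```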
Then the scary part: there was a recommendation to run ANALYZE regularly, or to keep the autovacuum daemon enabled, so that the database has up-to-date statistics about table sizes and can choose query plans properly.
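From what I can tell, checking this is straightforward: you can run ANALYZE by hand and then look at the statistics timestamps in the pg_stat_user_tables view to see whether autovacuum has been keeping up. A sketch, using the table from my example:

```sql
-- Refresh the planner's statistics for one table
-- (plain "ANALYZE;" with no table name covers the whole database).
ANALYZE my_table;

-- See when each table was last analysed, either manually
-- (last_analyze) or by the autovacuum daemon (last_autoanalyze).
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY relname;
```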
I understand that using indexes is fairly basic, so is this absolutely necessary in a production environment? In other words, will PostgreSQL use the index when it is time to use it, without my needing to run ANALYZE or VACUUM as an extra task?