34

Say, I have a table ResidentInfo, and in this table I have unique constraints HomeAddress, which is VARCHAR type. For future query, I gonna add an index on this column. The query will only have operation =, and I'll use B-TREE pattern since the Hash pattern is not recommended currently.

Question: From efficiency view, using B-TREE, do you think I should add a new column with numbers 1,2,3....,N corresponding to different homeaddress, and instead of adding index on HomeAddress, I should add index on the number column?

I ask this question because I don't know how index works.

Hao
  • 343
  • 1
  • 3
  • 5
  • 2
    Thanks for @Denis pointing out that unique constraint will establish index automatically. – Hao Jun 04 '13 at 18:42
  • According to performance there is one guideline that always applies: test it. It is impossible to get all your usecases from such vague description so when you are asking about speed, test what is fastest for you. There are cases where theoretically suboptimal approach is faster for data you usually process. – omikron Oct 23 '15 at 10:06

2 Answers2

49

For simple equality checks (=), a B-Tree index on a varchar or text column is simple and the best choice. It certainly helps performance a lot.

Of course, a B-Tree index on a simple integer performs better. For starters, comparing simple integer values is a bit faster. But more importantly, performance is also a function of the size of the index. A bigger column means fewer rows per data page, means more pages have to be read ...

Since the HomeAddress is hardly unique anyway, it's not a good natural primary key. I would strongly suggest to use a surrogate primary key instead. A serial column is the obvious choice for that. Its only purpose is to have a simple, fast primary key to work with.

If you have other tables referencing said table, this becomes even more efficient. Instead of duplicating a lengthy string for the foreign key column, you only need the 4 bytes for an integer column. And you don't need to cascade updates so much, since an address is bound to change, while a surrogate pk can stay the same (but doesn't have to, of course).

Your table could look like this:

CREATE TABLE resident (
   resident_id serial PRIMARY KEY
  ,address text NOT NULL
   -- more columns
);

CREATE INDEX resident_adr_idx ON resident(address);

This results in two B-Tree indexes. A unique index on resident_id and a plain index on address.

More about indexes in the manual.
Postgres offers a lot of options - but you don't need any more for this simple case.

Community
  • 1
  • 1
Erwin Brandstetter
  • 539,169
  • 125
  • 977
  • 1,137
  • Thank you so much! This really helps! So, the two B-Tree indexes gonna speed up queries like "SELECT * FROM resident WHERE resident_id=xxxxx;" and also give me an option in case I have to query using address, am I right? – Hao Jun 06 '13 at 02:35
  • @Hao: Correct. Plus, both indexes support more than simple equality checks. – Erwin Brandstetter Jun 06 '13 at 12:23
  • Thank you! As you mentioned, with regard to B-TREE's operations, EnterpriseDB's Hash Pattern Index still has flaw right now, and I may switch to Hash Pattern once they fix it, since I'm only using "=" operation for query. Takes O(1) for Hash and O(nlogn) for B-Tree. – Hao Jun 06 '13 at 17:08
11

In Postgres, a unique constraint is enforced by maintaining a unique index on the field, so you're covered already.

In the event you decide the unique constraint on the address is bad (which, honestly, it is: what a spouse creating a separate account? about flatshares? etc.), you can create one like so:

create index on ResidentInfo (HomeAddress);
Denis de Bernardy
  • 72,128
  • 12
  • 122
  • 148
  • Oh, thanks for pointing that out! But the question still remains there. Will the query becomes faster if I add a number column and use it instead of address? – Hao Jun 04 '13 at 18:25