When choosing index column order, the overriding concern is:
Are there (equality) predicates against this column in my queries?
If a column never appears in a where clause, it's not worth indexing(1)
OK, so you've got a table and queries against each column. Sometimes more than one.
How do you decide what to index?
Let's look at an example. Here's a table with three columns. One holds 10 values, another 1,000, the last 10,000:
create table t(
few_vals varchar2(10),
many_vals varchar2(10),
lots_vals varchar2(10)
);
insert into t
with rws as (
select lpad(mod(rownum, 10), 10, '0'),
lpad(mod(rownum, 1000), 10, '0'),
lpad(rownum, 10, '0')
from dual connect by level <= 10000
)
select * from rws;
commit;
select count(distinct few_vals),
count(distinct many_vals) ,
count(distinct lots_vals)
from t;
COUNT(DISTINCTFEW_VALS) COUNT(DISTINCTMANY_VALS) COUNT(DISTINCTLOTS_VALS)
10 1,000 10,000
These are numbers left padded with zeros. This will help make the point about compression later.
So you've got three common queries:
select count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where few_vals = '0000000001';
select count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where lots_vals = '0000000001';
select count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where lots_vals = '0000000001'
and few_vals = '0000000001';
What do you index?
An index on just few_vals is only marginally better than a full table scan:
select count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where few_vals = '0000000001';
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST -PREDICATE'));
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 61 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 61 |
| 2 | VIEW | VW_DAG_0 | 1 | 1000 | 1000 |00:00:00.01 | 61 |
| 3 | HASH GROUP BY | | 1 | 1000 | 1000 |00:00:00.01 | 61 |
| 4 | TABLE ACCESS FULL| T | 1 | 1000 | 1000 |00:00:00.01 | 61 |
select /+ index (t (few_vals)) /
count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where few_vals = '0000000001';
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST -PREDICATE'));
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 58 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 58 |
| 2 | VIEW | VW_DAG_0 | 1 | 1000 | 1000 |00:00:00.01 | 58 |
| 3 | HASH GROUP BY | | 1 | 1000 | 1000 |00:00:00.01 | 58 |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1000 | 1000 |00:00:00.01 | 58 |
| 5 | INDEX RANGE SCAN | FEW | 1 | 1000 | 1000 |00:00:00.01 | 5 |
So it's unlikely to be worth indexing on its own. Queries on lots_vals return few rows (just 1 in this case). So this is definitely worth indexing.
But what about the queries against both columns?
Should you index:
( few_vals, lots_vals )
OR
( lots_vals, few_vals )
Trick question!
The answer is neither.
Sure few_vals is a long string. So you can get good compression out of it. And you (might) get an index skip scan for the queries using (few_vals, lots_vals) that only have predicates on lots_vals. But I don't here, even though it performs markedly better than a full scan:
create index few_lots on t(few_vals, lots_vals);
select count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where lots_vals = '0000000001';
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST -PREDICATE'));
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 61 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 61 |
| 2 | VIEW | VW_DAG_0 | 1 | 1 | 1 |00:00:00.01 | 61 |
| 3 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.01 | 61 |
| 4 | TABLE ACCESS FULL| T | 1 | 1 | 1 |00:00:00.01 | 61 |
select /+ index_ss (t few_lots) /count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where lots_vals = '0000000001';
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST -PREDICATE'));
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 13 | 11 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 13 | 11 |
| 2 | VIEW | VW_DAG_0 | 1 | 1 | 1 |00:00:00.01 | 13 | 11 |
| 3 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.01 | 13 | 11 |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1 | 1 |00:00:00.01 | 13 | 11 |
| 5 | INDEX SKIP SCAN | FEW_LOTS | 1 | 40 | 1 |00:00:00.01 | 12 | 11 |
Do you like gambling? (2)
So you still need a an index with lots_vals as the leading column. And at least in this case the compound index (few, lots) does the same amount of work as one on just (lots)
select count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where lots_vals = '0000000001'
and few_vals = '0000000001';
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST -PREDICATE'));
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 3 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 3 |
| 2 | VIEW | VW_DAG_0 | 1 | 1 | 1 |00:00:00.01 | 3 |
| 3 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.01 | 3 |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1 | 1 |00:00:00.01 | 3 |
| 5 | INDEX RANGE SCAN | FEW_LOTS | 1 | 1 | 1 |00:00:00.01 | 2 |
create index lots on t(lots_vals);
select /+ index (t (lots_vals)) /count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where lots_vals = '0000000001'
and few_vals = '0000000001';
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST -PREDICATE'));
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 3 | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 3 | 1 |
| 2 | VIEW | VW_DAG_0 | 1 | 1 | 1 |00:00:00.01 | 3 | 1 |
| 3 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.01 | 3 | 1 |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1 | 1 |00:00:00.01 | 3 | 1 |
| 5 | INDEX RANGE SCAN | LOTS | 1 | 1 | 1 |00:00:00.01 | 2 | 1 |
There will be cases where the compound index saves you 1-2 IOs. But is it worth having two indexes for this saving?
And there's another problem with the composite index. Compare the clustering factor for the three indexes including LOTS_VALS:
create index lots on t(lots_vals);
create index lots_few on t(lots_vals, few_vals);
create index few_lots on t(few_vals, lots_vals);
select index_name, leaf_blocks, distinct_keys, clustering_factor
from user_indexes
where table_name = 'T';
INDEX_NAME LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
FEW_LOTS 47 10,000 530
LOTS_FEW 47 10,000 53
LOTS 31 10,000 53
FEW 31 10 530
Notice that the clustering factor for few_lots is 10x higher than for lots and lots_few! And this is in a demo table with perfect clustering to begin with. In real world databases the effect is likely to be worse.
So what's so bad about that?
The clustering factor is one of the key drivers determining how "attractive" an index is. The higher it is, the less likely the optimizer is to choose it. Particularly if lots_vals aren't actually unique, but still normally have few rows per value. If you're unlucky this could be enough to make the optimizer think a full scan is cheaper...
OK, so composite indexes with few_vals and lots_vals only have edge case benefits.
What about queries filtering few_vals and many_vals?
Single columns indexes only give small benefits. But combined they return few values. So a composite index is a good idea. But which way round?
If you place few first, compressing the leading column will make that smaller
create index few_many on t(many_vals, few_vals);
create index many_few on t(few_vals, many_vals);
select index_name, leaf_blocks, distinct_keys, clustering_factor
from user_indexes
where index_name in ('FEW_MANY', 'MANY_FEW');
INDEX_NAME LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
FEW_MANY 47 1,000 10,000
MANY_FEW 47 1,000 10,000
alter index few_many rebuild compress 1;
alter index many_few rebuild compress 1;
select index_name, leaf_blocks, distinct_keys, clustering_factor
from user_indexes
where index_name in ('FEW_MANY', 'MANY_FEW');
INDEX_NAME LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
MANY_FEW 31 1,000 10,000
FEW_MANY 34 1,000 10,000
With fewer different values in the leading column compresses better. So there's marginally less work to read this index. But only slightly. And both are already a good chunk smaller than the original (25% size decrease).
And you can go further and compress the whole index!
alter index few_many rebuild compress 2;
alter index many_few rebuild compress 2;
select index_name, leaf_blocks, distinct_keys, clustering_factor
from user_indexes
where index_name in ('FEW_MANY', 'MANY_FEW');
INDEX_NAME LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
FEW_MANY 20 1,000 10,000
MANY_FEW 20 1,000 10,000
Now both indexes are back to the same size. Note this takes advantage of the fact there's a relationship between few and many. Again it's unlikely you'll see this kind of benefit in the real world.
So far we've only talked about equality checks. Often with composite indexes you'll have an inequality against one of the columns. e.g. queries such as "get the orders/shipments/invoices for a customer in the past N days".
If you have these kinds of queries, you want the equality against the first column of the index:
select count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where few_vals < '0000000002'
and many_vals = '0000000001';
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST -PREDICATE'));
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 12 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 12 |
| 2 | VIEW | VW_DAG_0 | 1 | 10 | 10 |00:00:00.01 | 12 |
| 3 | HASH GROUP BY | | 1 | 10 | 10 |00:00:00.01 | 12 |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 10 | 10 |00:00:00.01 | 12 |
| 5 | INDEX RANGE SCAN | FEW_MANY | 1 | 10 | 10 |00:00:00.01 | 2 |
select count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where few_vals = '0000000001'
and many_vals < '0000000002';
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST -PREDICATE'));
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 12 | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 12 | 1 |
| 2 | VIEW | VW_DAG_0 | 1 | 2 | 10 |00:00:00.01 | 12 | 1 |
| 3 | HASH GROUP BY | | 1 | 2 | 10 |00:00:00.01 | 12 | 1 |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 2 | 10 |00:00:00.01 | 12 | 1 |
| 5 | INDEX RANGE SCAN | MANY_FEW | 1 | 1 | 10 |00:00:00.01 | 2 | 1 |
Notice they're using the opposite index.
TL;DR
- Columns with equality conditions should go first in index.
- If you have multiple columns with equalities in your query, placing the one with the fewest different values first will give the best compression advantage
- While index skip scans are possible, you need to be confident this will remain a viable option for the foreseeable future
- Composite indexes including near-unique columns give minimal benefits. Be sure you really need to save the 1-2 IOs
1: In some cases it may be worth including a column in an index if this means all the columns in your query are in the index. This enables an index only scan, so you don't need to access the table.
2: If you're licensed for Diagnostics and Tuning, you could force the plan to a skip scan with SQL Plan Management
ADDENDA
PS - the docs you've quoted there are from 9i. That's reeeeeeally old. I'd stick with something more recent