Do both arguments of ST_DWITHIN have to be spatially indexed geographical columns to efficiently check proximity in meters

Question

Here is a snippet from my code:

ST_DWithin(c.convex_hull::geography , cp."geoJson"::geography, 20)

Please note both "convex_hull" and "geoJson" are geometry columns (which is why i am casting to geography). Further, please note that both columns have spatial indexing implemented as follows:

CREATE INDEX idx_bfp_clusters_convex_hull ON public.bfp_clusters USING gist (convex_hull)

and

CREATE INDEX "cp_geoJson_idx" ON public.campaign_pings USING gist ("geoJson")

Do I have to cast geometries to geography in both the CREATE INDEX commands above, or can I cast to geography in just one of the commands (the convex_hull one) to avail the speed benefit of spatial indexing?

Asking this since I would rather not tamper with the INDEX on the "geoJson" since it might affect existing code. Is there a way I can keep the existing index on the "geoJson" column, and add another geography index on the same column?

For more context, here is the aforementioned code snippet in place in a CTE:

clusters_with_coverage AS (
-- Select the cluster_id, convex_hull geometry, and calculate coverage

select

    c.uc_name,

    c.cluster_id,

    c.convex_hull,

    CASE

        -- Use the EXISTS keyword to check if any campaign pings are within 20 meters of the cluster's convex hull

        WHEN EXISTS (

            -- Subquery: Check if there is any campaign ping within 20 meters of the cluster

            SELECT 1

            FROM campaign_ping cp

            -- Use ST_DWithin function to determine if the campaign ping is within 20 meters of the cluster's convex hull

            WHERE ST_DWithin(c.convex_hull::geography , cp.&quot;geoJson&quot;::geography, 20)

        ) THEN 'covered'  -- If campaign pings are within 20 meters, mark coverage as 'covered'

        ELSE 'missed'     -- If no campaign pings are within 20 meters, mark coverage as 'missed'

    END AS coverage

FROM bfp_clusters c  -- Select cluster data from the 'bfp_clusters' table 


)

@IanTurton so it is possible to have two indexes on the same column? one of geometry type, and one of geography type — analyst92, Aug 16 '23 at 11:53
of course, why wouldn't it be? you can have two with the same type postigs doesn't care — Ian Turton, Aug 16 '23 at 12:01
Related: https://gis.stackexchange.com/questions/247113/setting-up-indexes-for-postgis-distance-queries — Vince, Aug 16 '23 at 12:41
Note that "both columns" do NOT have to be indexed -- that's not how indexes work. All you need is the column to be searched to be indexed. — Vince, Aug 16 '23 at 12:58
@Vince I have added a bit of code for context, I think the "column to be searched" in my case is both columns? — analyst92, Aug 16 '23 at 13:02
Nope. JOIN adds TableN to Table0 (which drives the query). If it's a spatial constraint, it's the spatial index on TableN which is leveraged (unless the optimizer rewrites it). The index on Table0 is only used to restrict rows, not in the TableN query (again, unless rewritten) — Vince, Aug 16 '23 at 13:44
@Vince but I have no control over if the optimizer rewrites it, so in that case the safest bet is to include correct spatial indexes on both? Also, in my code above, I am not "joining" in the traditional sense - am I right in understanding that the table "bfp_clusters" = Table0 and "campaign_ping" = TableN according to your naming convention. — analyst92, Aug 16 '23 at 15:47
Your subquery is handled much like a join (it just has less optimization available). Indexes have costs associated with them, in terms of storage and UPDATE overhead. The general rule of thumb is "Don't build unnecessary indexes." — Vince, Aug 16 '23 at 16:20

score 4 · Accepted Answer · edited Aug 17 '23 at 01:04

4

You can test whether an index is being used. The query below will tell you the indexes for the specified table, the number of times they've been scanned, and their size. You can create the geography indexes, run your query, then check the index usage with the following (replace schemaname with your schema and tablename with your table name). If they haven't been scanned, they aren't being used, and you can drop them:

SELECT a.*, pg_size_pretty(pg_relation_size(b.indexrelid)) index_size
FROM   pg_stat_user_indexes a 
JOIN   pg_index b ON a.indexrelid = b.indexrelid 
WHERE  schemaname in ('<schemaname>') AND relname in ('<tablename>');

You can add as many indexes as you like. Note that if you're doing many inserts or updates on the table, the indexes will slow that down, because the updated/inserted row needs to be written to all the indexes.

edited Aug 17 '23 at 01:04

Vince

20,017
15
45
64

answered Aug 16 '23 at 17:21

jbalk

7,505
1
17
39

If I abort a query (due to it taking too long), will index use still be registered? – analyst92 Aug 18 '23 at 09:04
1

That's a good question and I don't know the answer. I would assume that if it was scanned before the query was aborted, it should say it was scanned. You should post this question (after searching for an existing question) in dba stackexchange. It's not a good question for gis stackexchange. – jbalk Aug 18 '23 at 17:35
why can't I simply use EXPLAIN in my query and see which indexes are being used? Why is your code snippet necessary? Thanks. – analyst92 Aug 19 '23 at 04:20
1

It's just a tool for analysis. It's not necessary. If you want to just look at the explain output, do that. – jbalk Aug 21 '23 at 17:14

Do both arguments of ST_DWITHIN have to be spatially indexed geographical columns to efficiently check proximity in meters

1 Answers1