13

It's a table of points. ~1M records

SELECT COUNT(*) as value FROM alasarr_social_mv s; 
Output: 976270

It looks like st_intersects forces to use the spatial indexes but && doesn't.

Sample using ST_Intersects (282ms)

SELECT COUNT(*) as value
FROM alasarr_social_mv 
WHERE ST_Intersects(
  the_geom_webmercator, 
  ST_MakeEnvelope(-410961,4920492,-402305,4926887,3857)
)


Aggregate  (cost=34370.18..34370.19 rows=1 width=0) (actual time=282.715..282.715 rows=1 loops=1)
  ->  Bitmap Heap Scan on alasarr_social_mv s  (cost=5572.17..34339.84 rows=60683 width=0) (actual time=21.574..240.195 rows=178010 loops=1)
        Recheck Cond: (the_geom_webmercator && '0103000020110F0000010000000500000000000000441519C1000000002BC5524100000000441519C1000000C069CB524100000000048E18C1000000C069CB524100000000048E18C1000000002BC5524100000000441519C1000000002BC55241'::geometry)
        Filter: _st_intersects(the_geom_webmercator, '0103000020110F0000010000000500000000000000441519C1000000002BC5524100000000441519C1000000C069CB524100000000048E18C1000000C069CB524100000000048E18C1000000002BC5524100000000441519C1000000002BC55241'::geometry)
        Heap Blocks: exact=4848
        ->  Bitmap Index Scan on alasarr_social_mv_gix  (cost=0.00..5569.13 rows=182050 width=0) (actual time=20.836..20.836 rows=178010 loops=1)
              Index Cond: (the_geom_webmercator && '0103000020110F0000010000000500000000000000441519C1000000002BC5524100000000441519C1000000C069CB524100000000048E18C1000000C069CB524100000000048E18C1000000002BC5524100000000441519C1000000002BC55241'::geometry)
Planning time: 0.192 ms
Execution time: 282.758 ms

Sample using && (414ms)

SELECT COUNT(*) as value
FROM alasarr_social_mv  
WHERE the_geom_webmercator && 
  ST_MakeEnvelope(-410961,4920492,-402305,4926887,3857)

Aggregate  (cost=22535.97..22535.97 rows=1 width=0) (actual time=414.314..414.314 rows=1 loops=1)
  ->  Seq Scan on alasarr_social_mv  (cost=0.00..22444.94 rows=182050 width=0) (actual time=0.017..378.427 rows=178010 loops=1)
        Filter: (the_geom_webmercator && '0103000020110F0000010000000500000000000000441519C1000000002BC5524100000000441519C1000000C069CB524100000000048E18C1000000C069CB524100000000048E18C1000000002BC5524100000000441519C1000000002BC55241'::geometry)
        Rows Removed by Filter: 798260
Planning time: 0.134 ms
Execution time: 414.343 ms

PostGIS version

POSTGIS="2.2.2" GEOS="3.5.0-CAPI-1.9.0 r4084" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.11.0, released 2014/04/16" LIBXML="2.7.8" LIBJSON="UNKNOWN" (core procs from "2.2.2" need upgrade) RASTER (raster procs from "2.2.2" need upgrade) – alasarr 2 mins ago
Evan Carroll
  • 7,071
  • 2
  • 32
  • 58
alasarr
  • 451
  • 4
  • 14
  • 2
    Don't paste screen shots of text into the question. Can you copy and paste those as code? I can't read them to help you. – Evan Carroll Aug 31 '17 at 18:28
  • Done. I think is a bit better now – alasarr Aug 31 '17 at 18:43
  • I tried to backport your new query plans into that. Feel free to upgrade the plans but try to keep the style. – Evan Carroll Aug 31 '17 at 18:49
  • 5
    Have a look at this question. The && operator actually does a bounding box query, whereas, ST_Intersects uses a bounding box query to determine which geometries to test for actual comparison -- so you would expect && to be quicker. However, it is likely that the use of ST_MakeEnvelope to the right of && is making the query planner chose a full table scan for some reason (as is evident from the explain). Try creating the geometry first, in a CTE, and see if you can "fool" the optimizer. – John Powell Aug 31 '17 at 19:19
  • You're right! it works inside a CTE – alasarr Aug 31 '17 at 19:28

1 Answers1

19

This sort of finding comes up fairly often, and it is a bit obscure, so is worth restating. If you define a geometry within a function that uses it, such as ST_Intersects or && (which ST_Intersects uses under the hood), then the query planner chooses a full table scan, as "it" has no knowledge of the outcome of the geometry creation function, ie, ST_MakeEnvelope in this case.. If you define the geometry you want to check for intersection in a CTE, then the optimizer is dealing with a known quantity, and will use a spatial index, if available.

So, rewritting your query as:

WITH test_geom (geom) AS 
   (SELECT ST_MakeEnvelope(-410961,4920492,-402305,4926887,3857))
  SELECT COUNT(*) as value
    FROM alasarr_social_mv mv, test_geom tg 
   WHERE ST_Intersects(mv.the_geom_webmercator, tg.geom)

will now use a spatial index. Likewise, && will now use an index to check for a bounding box, and, (while I can't test against your data), should be faster than ST_Intersects.

Interestingly, in your query, ST_Intersects is using a bitmap scan (not a gist) index, while && is using no index. So, both queries will be quicker with the CTE, but && should now be quicker than ST_Intersects.

There is more explanation of what is going on in this question and its answers/comments.

EDIT: To make this explicit, if you look at the definition of ST_Intersects in postgis.sql (which is called by CREATE EXTENSION postgis and found in the contrib directory of your Postgres install), you will see:

---- Inlines index magic
CREATE OR REPLACE FUNCTION ST_Intersects(geom1 geometry, geom2 geometry)
    RETURNS boolean
    AS 'SELECT $1 OPERATOR(&&) $2 AND _ST_Intersects($1,$2)'
    LANGUAGE 'sql' IMMUTABLE ;

including the comment: inlines index magic.

John Powell
  • 13,649
  • 5
  • 46
  • 62