I have two global PostGIS polyline layers, one of which is a subset of the other. The subset defines where there are issues with the system being mapped.
I am trying to map the likelihood of a polyline being present but with no polylines from the subset, i.e. the likelihood that the system is performing as expected. I've tried kriging, interpolation, zonal stats, etc., and have working code that uses a 1 deg lat/lon grid (I've also tried this using a raster grid).
--create the grid and convert it to vector
create table fishnet (geom geometry, id bigserial primary key);
insert into fishnet (geom)
select rast::geometry
from ST_MakeEmptyCoverage(1, 1, 720, 360, -180, 90, (360)/(720)::float, (-180)/360::float, 0., 0., 4326) rast;
create index on fishnet using gist (geom);
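As a quick sanity check on the grid (the figures here follow from the coverage parameters above, so this is what I'd expect rather than anything special): the coverage is 720 x 360 tiles, so fishnet should contain 259200 cells, which is also the row estimate the planner shows for the group aggregate in the plan below.

select count(*) from fishnet;  -- expect 720 * 360 = 259200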
--The query subtracts the length of the subset polylines in each grid cell from the main
--polylines and divides by the sum of their lengths to get an average (see the worked example
--after the query). This is faster than using
--st_difference to get the length of the main polyline layer that doesn't intersect the subset layer
explain analyse
select greatest(sum(st_length(st_intersection(geom, g_geom)) - st_length(st_intersection(geom, b_geom)))
       / sum(st_length(st_intersection(geom, g_geom)) + st_length(st_intersection(geom, b_geom))), 0.0) as perc_conn,
       geom
from fishnet a
join (
    select a.id,
           st_collect(a.geom) as g_geom,
           st_collect(b.geom) as b_geom
    from (
        select a.geom, a.id
        from good_polylines a
        inner join (
            select tmp.*
            from (
                select a.id, t_id, date_added
                from events a
                inner join (
                    select min(date_added) as date_added, id
                    from events_processed
                    group by id
                ) dp on a.id = dp.id
                union
                select id, t_id, start_event
                from asp_cgi_err
            ) tmp
            inner join t_list t on t.t_id = tmp.t_id
            where sys_type = 'SYS_TYPE'
        ) foo on a.id = foo.id
        where foo.date_added >= '20230201'::timestamp without time zone
          and foo.date_added < '20230217'::timestamp without time zone + interval '1 day'
    ) a
    left join bad_polylines b on a.id = b.id
    group by a.id
) b on st_intersects(geom, g_geom)
group by geom
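Here is the worked example of what perc_conn represents, with illustrative numbers rather than values from my data: if a grid cell contains 10 units of good_polylines length, of which 2 units are also covered by bad_polylines, the cell gets greatest((10 - 2) / (10 + 2), 0.0) = 8 / 12 ≈ 0.67, i.e. roughly two thirds of the measured length in that cell is behaving as expected.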
My problem is that I need to create the map on demand, but the query takes about 30 minutes to run with a small sample of 4k polylines. All tables have spatial indexes and primary keys.
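For reference, the spatial indexes are the usual GiST indexes on the geometry columns; the DDL below is a sketch of their form (names are illustrative) rather than my exact statements.

-- assumed form of the existing spatial indexes
create index on good_polylines using gist (geom);
create index on bad_polylines using gist (geom);
-- fishnet already has its gist index from the setup above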
The output from explain analyse is
"GroupAggregate (cost=16655846.98..590800409.83 rows=259200 width=128) (actual time=1205194.109..1968037.472 rows=17105 loops=1)"
" Group Key: a.geom"
" -> Sort (cost=16655846.98..16670126.96 rows=5711990 width=184) (actual time=1205188.303..1214805.832 rows=41579 loops=1)"
" Sort Key: a.geom"
" Sort Method: external merge Disk: 11969064kB"
" -> Nested Loop (cost=15035.89..14491977.35 rows=5711990 width=184) (actual time=964.756..1085082.337 rows=41579 loops=1)"
" -> GroupAggregate (cost=15035.61..119301.07 rows=22037 width=83) (actual time=961.596..7909.971 rows=764 loops=1)"
" Group Key: a_1.id"
" -> Nested Loop Left Join (cost=15035.61..55812.00 rows=32381 width=17854) (actual time=237.184..7322.351 rows=4067 loops=1)"
" -> Merge Join (cost=15035.19..41816.51 rows=7029 width=17353) (actual time=236.687..7049.449 rows=764 loops=1)"
" Merge Cond: (a_1.id = foo.id)"
" -> Index Scan using good_polylines_id_idx on good_polylines a_1 (cost=0.41..26621.04 rows=22104 width=17353) (actual time=0.839..1663.204 rows=22152 loops=1)"
" -> Sort (cost=15034.78..15052.30 rows=7008 width=32) (actual time=141.302..142.031 rows=782 loops=1)"
" Sort Key: foo.id"
" Sort Method: quicksort Memory: 86kB"
" -> Hash Join (cost=14275.49..14587.15 rows=7008 width=32) (actual time=139.164..139.536 rows=782 loops=1)"
" Hash Cond: (foo.t_id = t.t_id)"
" -> HashAggregate (cost=14260.82..14398.23 rows=13741 width=72) (actual time=139.006..139.197 rows=804 loops=1)"
" Group Key: foo.id, foo.t_id, (min(id_events_processed.date_added))"
" Batches: 1 Memory Usage: 465kB"
" -> Append (cost=13558.01..14157.76 rows=13741 width=72) (actual time=133.985..138.693 rows=804 loops=1)"
" -> Hash Join (cost=13558.01..13896.68 rows=13720 width=72) (actual time=133.984..138.469 rows=786 loops=1)"
" Hash Cond: (foo.id = id_events_processed.id)"
" -> Sort (cost=12064.24..12120.01 rows=22309 width=272) (actual time=112.825..113.940 rows=22129 loops=1)"
" Sort Key: foo.id"
" Sort Method: quicksort Memory: 2497kB"
" -> Subquery Scan on foo (cost=6624.53..7628.44 rows=22309 width=272) (actual time=59.568..75.443 rows=22129 loops=1)"
" -> Unique (cost=6624.53..7405.35 rows=22309 width=272) (actual time=59.564..71.488 rows=22129 loops=1)"
" -> Sort (cost=6624.53..6680.30 rows=22309 width=272) (actual time=59.563..66.408 rows=22256 loops=1)"
" Sort Key: t_list_new.t_id, t_list_new.start_event, t_list_new.end_event, t_list_new.event_time, t_list_new.perc, t_list_new.lc, t_list_new.orig_geom, t_list_new.dest_geom, t_list_new.orig, t_list_new.dest, t_list_new.id, t_list_new.url, t_list_new.last_timestamp"
" Sort Method: external merge Disk: 6128kB"
" -> Append (cost=0.00..2188.73 rows=22309 width=272) (actual time=0.015..8.644 rows=22256 loops=1)"
" -> Seq Scan on t_list_new (cost=0.00..896.65 rows=2365 width=260) (actual time=0.014..1.341 rows=2365 loops=1)"
" -> Seq Scan on t_list_old (cost=0.00..957.44 rows=19944 width=277) (actual time=0.010..5.682 rows=19891 loops=1)"
" -> Hash (cost=1492.23..1492.23 rows=123 width=36) (actual time=21.126..21.129 rows=1080 loops=1)"
" Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 89kB"
" -> HashAggregate (cost=1122.96..1491.00 rows=123 width=36) (actual time=17.378..20.860 rows=1080 loops=1)"
" Group Key: id_events_processed.id"
" Filter: ((min(id_events_processed.date_added) >= '2023-02-01 00:00:00'::timestamp without time zone) AND (min(id_events_processed.date_added) < '2023-02-18 00:00:00'::timestamp without time zone))"
" Batches: 1 Memory Usage: 3601kB"
" Rows Removed by Filter: 23548"
" -> Seq Scan on id_events_processed (cost=0.00..740.48 rows=38248 width=36) (actual time=0.021..3.670 rows=38269 loops=1)"
" -> Seq Scan on t_list_err (cost=0.00..54.97 rows=21 width=48) (actual time=0.074..0.164 rows=18 loops=1)"
" Filter: ((start_event >= '2023-02-01 00:00:00'::timestamp without time zone) AND (start_event < '2023-02-18 00:00:00'::timestamp without time zone))"
" Rows Removed by Filter: 1113"
" -> Hash (cost=13.40..13.40 rows=102 width=6) (actual time=0.144..0.145 rows=102 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 12kB"
" -> Seq Scan on t_ids t (cost=0.00..13.40 rows=102 width=6) (actual time=0.011..0.124 rows=102 loops=1)"
" Filter: (sys_type = 'SYS_TYPE'::text)"
" Rows Removed by Filter: 10"
" -> Index Scan using bad_polylines_id_idx on bad_polylines b (cost=0.42..1.92 rows=7 width=528) (actual time=0.240..0.348 rows=5 loops=764)"
" Index Cond: (id = a_1.id)"
" -> Index Scan using fishnet_geom_idx on fishnet a (cost=0.28..651.94 rows=26 width=120) (actual time=17.677..1409.874 rows=54 loops=764)"
" Index Cond: (geom && (st_collect(a_1.geom)))"
" Filter: st_intersects(geom, (st_collect(a_1.geom)))"
" Rows Removed by Filter: 1023"
"Planning Time: 8.424 ms"
"Execution Time: 1969143.215 ms"
Note: I had to move my Postgres data directory, which is 15 GB, from /root to /home because running the query under explain analyse used all the available space (5-6 GB) on the root partition, presumably because of the ~12 GB on-disk sort shown in the plan. I am using PostgreSQL 13 on RHEL 8.