I am new to PostgreSQL and PostGIS. The first project I am working on is the first I am struggeling with and the one I would like to discuss.
I have 2 stops tables I call for simplicity tbl_stops_1 and tbl_stops_2
tbl_stops_1
feed_index | integer | | not null | | plain | |
stop_id | text | | not null | | extended | |
stop_name | text | | | | extended | |
stop_lat | double precision | | | | plain | |
stop_lon | double precision | | | | plain | |
...
the_geom | geometry(Point,4326) | | | | main | |
Indexes:
"stops_pkey" PRIMARY KEY, btree (feed_index, stop_id)
"stop_geom_idx" gist (the_geom)
Triggers:
stop_geom_trigger BEFORE INSERT OR UPDATE ON gtfs.stops FOR EACH ROW EXECUTE PROCEDURE gtfs.stop_geom_update()
Row count:
select count(1) from gtfs.stops;
count
--------
523887
tbl_stops_2
feed_index | integer | | not null | | plain | |
seqno | bigint | | not null | nextval('gtfs.stops_zhv_seqno_seq'::regclass) | plain | |
...
the_geom | geometry(Point,4326) | | | | main | |
SeqNo | text | | | | extended | |
Type | text | | | | extended | |
DHID | text | | | | extended | |
Parent | text | | | | extended | |
Name | text | | | | extended | |
Latitude | text | | | | extended | |
Longitude | text | | | | extended | |
...
Indexes:
"stops_zhv_pkey" PRIMARY KEY, btree (feed_index, seqno)
Foreign-key constraints:
"stops_zhv_feed_index_fkey" FOREIGN KEY (feed_index) REFERENCES gtfs.feed_info(feed_index) ON DELETE CASCADE
Triggers:
stop_zhv_geom_trigger BEFORE INSERT OR UPDATE ON gtfs.stops_zhv FOR EACH ROW EXECUTE PROCEDURE gtfs.stop_zhv_geom_update()
Row count:
select count(1) from gtfs.stops_zhv;
count
--------
849453
Those 2 stops tables are imported into PostgreSQL from different sources. Though they have similarities, they are not the same. In particular tbl_stops_1 lacks in information in the tbl_stops_1.stop_name column. Those rows missing information I like to combine with information from the column tbl_stops_2.DHID like this.
tbl_stops_1.stop_name = tbl_stops_2.DHID.stringConCat(tbl_stops_1.stop_name);
How to match those two stops tables?
The values of lat and lon are quite similar in those two stops tables. I can match a stop row from table tbl_stops_1 with a stop row in table tbl_stops_2 like this.
SELECT * FROM tbl_stops_1 JOIN tbl_stops_2 ON ST_DWithin(tbl_stops_1.the_geom, tbl_stops_2.the_geom, ???);
What am I aiming for?
At the end, I would like to have a view or table where the above string concatenation is done for stops that match because their geographic coordinates are close to each other.
Edit no. 1
- I updated the definition to that the stops tables are more alike and * added row counts.
- I like to have a
the_geomcolumn in both tables to facilitate this project. Anyhow, the function to import the CSV data into the database does not like to do this yet. I'll touch base.
Edit no. 2
- I updated the definition to that the stops tables are more alike and * added row counts.
- I updated the matching query using
JOIN
Open Issue
- How make the
JOINwork and combine it with the string concatenation?
ST_Distancequery cannot use an index, and will therefore be very inefficient. You especially don't want to join to the table without the geometry column, and you want aJOINto do so. Please [Edit] the Question to contain row counts for each table and explain what the query actually does. – Vince May 22 '23 at 11:13indexand that theST_Distancequery cannot use it? What arerow counts? I'll asap edit the question. Cheers anyways! – dancesWithCycles May 22 '23 at 12:03JOINthe stops tables already now? – dancesWithCycles May 23 '23 at 12:07JOINquery I am using now takes ages and returns zero rows. I am wondering what is the unit of the distance value in the ST_DWidth() function: meter, degree, aso.? Can you suggest anything for optimization? – dancesWithCycles May 25 '23 at 14:06