
I see an ever-increasing list of new functions in PostGIS, some of which involve GEOS (e.g. ST_ClusterKMeans). Some functions (e.g. those in pgRouting) rely on other libraries (e.g. the Boost Graph Library, BGL).

My impression is that many of these underlying libraries (often in C/C++) do not handle buffer management between main memory and secondary storage (disk).

So do the PostGIS functions built on top of them work on large data sets that cannot fit in physical (or virtual) memory?

If so, where do these buffer management capabilities come from (from an implementation point of view)?

Regina Obe
tinlyx

1 Answer


No, most of these "higher order analysis" functions do not have any special handling for data sets that are larger than can fit in memory. If you run them on such data sets, you'll just OOM the backend.

For a while we avoided writing such functions, but as default RAM sizes grew, demand for more analysis increased, and relatively few users ever hit memory limits, the cost/benefit equation shifted in favour of "just do it".

The oldest of these functions, ST_Union(), was originally built not to be memory bound, at a (very high) cost in performance. You can still use the original implementation as ST_MemUnion(), which (confusingly) actually uses less memory, since the "Mem" stands for "memory safe".
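As a sketch of how the two aggregates trade off (the table and column names here are hypothetical), they are drop-in replacements for one another:

```sql
-- ST_Union: the fast cascaded union; builds its working set
-- in memory, so very large inputs can exhaust the backend.
SELECT ST_Union(geom) AS merged
FROM parcels;  -- hypothetical table

-- ST_MemUnion: same result, but unions geometries one at a
-- time, keeping peak memory low at a (very high) cost in speed.
SELECT ST_MemUnion(geom) AS merged
FROM parcels;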

Other functions, like ST_Buffer() and the various clustering functions, will OOM if you feed them enough data.
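For example, ST_ClusterKMeans() is a window function, so PostgreSQL materializes every geometry in the window partition before clustering can begin; the whole input lives in memory at once (table name hypothetical):

```sql
-- Assigns each row to one of 10 clusters. The OVER () window
-- spans the whole table, so all geometries are held in memory.
SELECT id,
       ST_ClusterKMeans(geom, 10) OVER () AS cluster_id
FROM buildings;  -- hypothetical table
```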

Paul Ramsey
    "Out of memory (OOM) is an often undesired state of computer operation where no additional memory can be allocated for use by programs or the operating system." -- wikipedia – Martin F Jan 11 '17 at 18:40