31

Does Postgres have any features to support aging-out old records?

I want to use Postgres for logging, as a sort of queue, where records (log events) older than two weeks are automatically deleted.

GG.
  • 125
  • 1
  • 8
Basil Bourque
  • 10,806
  • 20
  • 59
  • 92
  • Use partitioning. Just deleting records will not reduce the table size on disk. – sivann Jul 14 '15 at 06:19
  • @sivann Doesn't your recommendation assume his issue is disk size? If his problem is query execution time it seems like deletion might be a possibly simpler answer than partitioning, no? (still trying to understand this myself) – stephenmm Jul 14 '15 at 18:13
  • 1
    The problem is simply that log records are no longer interesting after a certain amount of time. Deleting old rows should make room for new fresh rows. – Basil Bourque Jul 14 '15 at 22:57

2 Answers2

25

There is no feature built in to delete rows automatically on a time-based regime (that I would know of).

You could run a daily (you decide) cron-job to schedule simple DELETE commands or use pgAgent for the purpose.

Or you could use partitioning with weekly partitions. That makes deleting very cheap: just keep the latest two weeks and drop older partitions.

Before Postgres 10, partitioning was always based on inheritance. The new declarative partitioning has a lot of advantages and is improved further with every release. Inheritance still provides one key advantage, though. The manual:

Some operations require a stronger lock when using declarative partitioning than when using table inheritance. For example, adding or removing a partition to or from a partitioned table requires taking an ACCESS EXCLUSIVE lock on the parent table, whereas a SHARE UPDATE EXCLUSIVE lock is enough in the case of regular inheritance.

Meaning, adding or removing a partition does not interfere with readers and writers of other partitions with inheritance, while everything is blocked with declarative partitioning (when going through the parent table). It's a very fast operation, though.

With inheritance, create partitions that inherit from a "master" table, lets call it log. Create a RULE or a TRIGGER on the master table that redirects INSERTs to the partition of the current week based on system time.

Either way, you can always log to the master table log. Create partitions ahead of time. Make that several weeks ahead to be sure and run a weekly cron job that adds future child tables ...

There are code examples in the manual for inheritance and declarative partitioning..

Related answer with a plpgsql function creating tables for inheritance automatically:

The related solution recreates a RULE to redirect INSERTs. A trigger function could write to the current partition dynamically ...

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
  • 1
    Update: Postgres 10 and later makes such partitioning easier. The declarative partitioning feature lets us specify the date range when defining the partition. So we no longer need create the RULE or TRIGGER as discussed in the middle of this Answer. To be clear, the cron jobs to create and delete the partitions as discussed in this Answer are still necessary. – Basil Bourque Mar 26 '20 at 00:10
  • This is interesting. It seems to me though the drawbacks to using partitions on smaller tables (e.g. tables that aren't otherwise large enough to warrant partitioning) potentially outweigh deletions being "cheap". – Madbreaks Aug 07 '20 at 21:41
  • 1
    @BasilBourque: Thank you for the update. I integrated declarative partitioning in the answer now. – Erwin Brandstetter Aug 08 '20 at 02:25
  • 1
    @Madbreaks: Yes, the turning point depends on the sum total of typical operations. Very small tables are unlikely candidates. Keeping only one or two partitions at all times might tip the scales towards partitioning, as that negates most of the performance drawbacks. Also, declarative partitioning has become better with every release. It starts to shine with Postgres 12 (better yet, 13). – Erwin Brandstetter Aug 08 '20 at 02:26
  • 1
    @Madbreaks Partitioning has numerous restrictions and trade-offs, and should be used only as a last resort. While it certainly can solve serious problems, use it judiciously. Be sure to do some deep study so you know the pros and cons. And keep up-to-date as this is an area of rapid development amongst the Postgres team. – Basil Bourque Aug 08 '20 at 05:41
  • @ErwinBrandstetter If I want to have a common mechanism for both Windows and Linux, which would be the best option to create one. I am shipping postgres for my application as both exe and rpm. – abyin007 Feb 22 '21 at 09:45
  • @abyin007: Not sure. pgagent is available for Windows and Linux ... – Erwin Brandstetter Feb 24 '21 at 12:24
3

It sounds like you use Postgres as time-series database. In that case, you might want to consider TimescaleDB, which is a time-series DB based on Postgres and is thus fully compatible with Postgres.

In Timescale, you first need to define a hypertable (the primary Timescale data structure), then you can configure a retention policy using add_retention_policy.

theDmi
  • 131
  • 3