1

It looks like I messed up a database migration and while all sequences are there, they would start at 1, which triggers errors like:

django.db.utils.IntegrityError: duplicate key value violates unique constraint "django_admin_log_pkey" DETAIL: Key (id)=(2) already exists.

Is there a query/script I could run that would run across all tables in the database, look at any columns tied to a sequence and reset those sequences to max(column) + 1?

Using PostgreSQL v11.

d33tah
  • 389
  • 1
  • 2
  • 10

1 Answers1

0

I see you are using Django.

I implemented a command to resolve these issues for all databases in your settings file.

import psycopg2
from django.conf import settings
from django.core.management.base import BaseCommand
from django.db import connections

def dictfetchall(cursor): """Return all rows from a cursor as a dict""" columns = [col[0] for col in cursor.description] return [ dict(zip(columns, row)) for row in cursor.fetchall() ]

class Command(BaseCommand): help = "Resets sequencing errors in Postgres which normally occur due to importing/restoring a DB"

def handle(self, *args, **options):
    # loop over all databases in system to figure out the tables that need to be reset
    for name_to_use_for_connection, connection_settings in settings.DATABASES.items():
        table_name = connection_settings['NAME']
        host = connection_settings['HOST']
        user = connection_settings['USER']
        port = connection_settings['PORT']
        password = connection_settings['PASSWORD']

        # connect to this specific DB
        conn_str = f"host={host} port={port} user={user} password={password}"

        conn = psycopg2.connect(conn_str)
        conn.autocommit = True

        select_all_table_statement = f"""SELECT *
                                FROM information_schema.tables
                                WHERE table_schema = 'public'
                                ORDER BY table_name;
                            """
        # just a visual representation of where we are
        print('-' * 20, table_name)
        try:
            not_reset_tables = list()
            # use the specific name for the DB
            with connections[name_to_use_for_connection].cursor() as cursor:
                # using the current db as the cursor connection
                cursor.execute(select_all_table_statement)
                rows = dictfetchall(cursor)
                # will loop over table names in the connected DB
                for row in rows:
                    find_pk_statement = f"""
                        SELECT k.COLUMN_NAME
                        FROM information_schema.table_constraints t
                        LEFT JOIN information_schema.key_column_usage k
                        USING(constraint_name,table_schema,table_name)
                        WHERE t.constraint_type='PRIMARY KEY'
                            AND t.table_name='{row['table_name']}';
                    """
                    cursor.execute(find_pk_statement)
                    pk_column_names = dictfetchall(cursor)
                    for pk_dict in pk_column_names:
                        column_name = pk_dict['column_name']

                    # time to build the reset sequence command for each table
                    # taken from django: https://docs.djangoproject.com/en/3.0/ref/django-admin/#sqlsequencereset
                    # example: SELECT setval(pg_get_serial_sequence('"[TABLE]"','id'), coalesce(max("id"), 1), max("id") IS NOT null) FROM "[TABLE]";
                    try:
                        reset_statement = f"""SELECT setval(pg_get_serial_sequence('"{row['table_name']}"','{column_name}'), 
                                                coalesce(max("{column_name}"), 1), max("{column_name}") IS NOT null) FROM "{row['table_name']}" """
                        cursor.execute(reset_statement)
                        return_values = dictfetchall(cursor)
                        # will be 1 row
                        for value in return_values:
                            print(f"Sequence reset to {value['setval']} for {row['table_name']}")
                    except Exception as ex:
                        # will only fail if PK is not an integer...
                        # currently in my system this is from django.contrib.sessions
                        not_reset_tables.append(f"{row['table_name']} not reset")

        except psycopg2.Error as ex:
            raise SystemExit(f'Error: {ex}')

        conn.close()
        print('-'*5, ' ALL ERRORS ', '-'*5)
        for item_statement in not_reset_tables:
            # shows which tables produced errors, so far I have only
            # seen this with PK's that are not integers because of the MAX() method
            print(item_statement)

        # just a visual representation of where we are
        print('-' * 20, table_name)

ViaTech
  • 101
  • 1
  • in the autocommit mode this script can corrupt the sequences continuity because any other user can get a new value between the SELECT to know the last value and the reseting of these values. The only way to do this is to avoid any user to acces the PG cluster while running this script. However, attempting to recover the "holes" in a sequence is a concern of no practical interest and you should never rely on such a concept when designing a relational database. – SQLpro Sep 14 '20 at 12:05
  • @SQLpro can you show a real use example of how placing autocommit to true here would be bad? I suppose if an entry was being stored while this command was running it could produce errors, but I have used this command in a few postgres projects in Django to fix errors with sequencing and have never run into an issue. I naturally take the server down when commands like this are being run so access is not available, but I suppose I am not fully seeing the issue you describe. – ViaTech Sep 14 '20 at 13:15
  • Here is a solution to a similar question: https://stackoverflow.com/a/34109293 – Laenka-Oss Jul 22 '22 at 02:46