
Edit: filed a bug report at https://jira.mariadb.org/browse/MDEV-14493

MariaDB nodes occasionally freeze with the following error in the logs. Any clues?

(Drupal app; the DB freezes when the application cache is cleared.)

[Warning] WSREP: Failed to apply app buffer: seqno: 903792, status: 1
Nov 23 21:42:55 websrv4 mysqld[1725]: #011 at galera/src/trx_handle.cpp:apply():351
Nov 23 21:42:55 websrv4 mysqld[1725]: Retrying 2th time
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [Warning] WSREP: BF applier failed to open_and_lock_tables: 1927, fatal: 0 wsrep = (exec_mode: 1 conflict_state: 5 seqno: 903792)
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [ERROR] Slave SQL: Error executing row event: 'Connection was killed', Internal MariaDB error code: 1927
  • Just checking, is your application adhering to the Galera limitations such as the requirement that all tables must have an explicit primary key? From experience, Galera will die if you have a table without a primary key, and then delete data from it. "Clearing application cache" sounds like something that could be deleting rows. – dbdemon Nov 26 '17 at 16:21
  • It's Drupal, so probably yes. It happens only under load, and only sometimes (almost daily). The cache is cleared every 3 hours or less anyway. – a_at_cyber_dot_training Nov 26 '17 at 16:32
  • Is your application configured to access more than one DB node, e.g. through a DB proxy (like MaxScale or ProxySQL)? If so, make sure to use a read-write splitter so that writes go to only one node. Writing to more than one node is known to cause deadlocks, which seems to have been the problem in the first crash log you uploaded to the MariaDB Jira. – dbdemon Nov 26 '17 at 21:59
  • dbdemon, If you add your answer separately I'll mark it solved after a few days with no crashes. – a_at_cyber_dot_training Nov 27 '17 at 03:50
  • So did you have any more crashes? – dbdemon Dec 05 '17 at 14:41
  • Unfortunately yes. It was a bug in MariaDB that is supposed to be fixed in 10.2.11; I haven't checked yet. The workaround of disabling Nagios also worked. – a_at_cyber_dot_training Dec 11 '17 at 05:05

1 Answer


I understand from the comments that your application is writing to more than one DB node. This is known to cause deadlocks, and it appears to be the underlying problem in the first crash log you uploaded to the MariaDB Jira. If you're using a DB proxy (like MaxScale or ProxySQL), make sure to use a read-write splitter so that you write to only one node but read from all.
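
To illustrate what "write to only one node, read from all" means in practice, here is a minimal Python sketch of the routing decision a read-write splitter makes. It is only an illustration, not how MaxScale or ProxySQL are implemented; the host names, the `pick_node` function and the statement-matching rules are all assumptions made up for this example. A real proxy would normally handle this for you, including cases this sketch ignores (stored procedures, session state, node health).

```python
import itertools
import re

# Hypothetical node addresses (assumption): replace with your own Galera nodes.
WRITER = "db1.example.internal"
READERS = ["db1.example.internal", "db2.example.internal", "db3.example.internal"]

# Statements that any node can serve. Everything else is treated as a write.
_READ_ONLY = re.compile(r"^\s*(SELECT|SHOW|DESCRIBE|EXPLAIN)\b", re.IGNORECASE)
# Locking reads take row locks, so they are routed like writes.
_LOCKING_READ = re.compile(
    r"\bFOR\s+UPDATE\b|\bLOCK\s+IN\s+SHARE\s+MODE\b", re.IGNORECASE
)

# Simple round-robin over the read nodes.
_reader_cycle = itertools.cycle(READERS)


def pick_node(sql: str, in_transaction: bool = False) -> str:
    """Return the host that should execute `sql`.

    All writes, locking reads, and anything inside an explicit transaction
    go to the single writer node; plain reads are spread over all nodes.
    """
    if in_transaction or not _READ_ONLY.match(sql) or _LOCKING_READ.search(sql):
        return WRITER
    return next(_reader_cycle)
```

With this rule, a cache clear such as `DELETE FROM cache` always goes to the same node, so two nodes never apply conflicting writes to the same rows at the same time, while `SELECT` traffic is still spread across the cluster.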

dbdemon