Server Admin Log/Archive 65

2023-04-30

14:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2184.codfw.wmnet with reason: Host down T335640
14:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2184.codfw.wmnet with reason: Host down T335640
08:06 elukey: powercycle ores1002 (mgmt console tty not usable, host frozen)

2023-04-29

23:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1132.eqiad.wmnet with reason: Maint
23:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1132.eqiad.wmnet with reason: Maint
22:54 rzl@cumin2002: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P47290 and previous config saved to /var/cache/conftool/dbconfig/20230429-225457-rzl.json

2023-04-28

22:46 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:46 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for new frack nodes - pt1979@cumin2002"
22:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for new frack nodes - pt1979@cumin2002"
22:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox
21:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: setup
21:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: setup
20:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit2002.wikimedia.org with reason: setup
20:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit2002.wikimedia.org with reason: setup
20:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1003.wikimedia.org with reason: setup
20:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1003.wikimedia.org with reason: setup
20:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: setup
20:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: setup
19:20 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:20 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:16 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:16 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:10 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:10 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:07 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
19:07 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
17:50 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@d56b7fb]: (no justification provided) (duration: 00m 10s)
17:50 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@d56b7fb]: (no justification provided)
15:39 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
15:23 jynus: update schema for backup1-codfw (mediabackups) T327157
15:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2519
15:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2519
14:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['stat1004']
14:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['stat1004']
14:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['stat1004']
14:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['stat1004']
13:21 vgutierrez: import haproxy 2.7.7 on apt.wm.o thirdparty/haproxy27 for bullseye
12:36 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:35 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:35 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
12:34 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
12:31 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
12:30 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
12:29 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
12:29 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
12:08 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new server sretest1003 - jclark@cumin1001"
12:06 jclark@cumin1001: START - Cookbook sre.dns.netbox
10:43 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache2003.codfw.wmnet with OS bullseye
10:28 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache2003.codfw.wmnet with reason: host reimage
10:25 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
10:25 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache2003.codfw.wmnet with reason: host reimage
10:13 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache2002.codfw.wmnet with OS bullseye
10:11 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-cache2003.codfw.wmnet with OS bullseye
10:01 vgutierrez: restarting varnish on cp5017 and cp5025 to drop port 80 - T322774
09:58 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache2002.codfw.wmnet with reason: host reimage
09:55 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache2002.codfw.wmnet with reason: host reimage
09:42 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-cache2002.codfw.wmnet with OS bullseye
09:31 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache2001.codfw.wmnet with OS bullseye
09:24 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:13 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache2001.codfw.wmnet with reason: host reimage
09:11 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache2001.codfw.wmnet with reason: host reimage
08:57 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-cache2001.codfw.wmnet with OS bullseye
08:47 jnuche@deploy1002: Installing scap version "4.51.0" for 593 hosts
08:29 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache2003.codfw.wmnet with OS buster
08:23 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
08:14 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache2003.codfw.wmnet with reason: host reimage
08:11 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache2003.codfw.wmnet with reason: host reimage
07:57 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-cache2003.codfw.wmnet with OS buster
07:55 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache2002.codfw.wmnet with OS buster
07:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
07:44 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
07:41 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache2002.codfw.wmnet with reason: host reimage
07:37 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache2002.codfw.wmnet with reason: host reimage
07:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
07:23 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-cache2002.codfw.wmnet with OS buster
07:22 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache2001.codfw.wmnet with OS buster
07:04 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache2001.codfw.wmnet with reason: host reimage
07:00 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache2001.codfw.wmnet with reason: host reimage
06:46 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-cache2001.codfw.wmnet with OS buster
05:57 XioNoX: push pfw policies - T335554
05:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112
05:29 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 112
05:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 393731
05:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 393731
04:08 eileen: config revision changed from b33fa934 to 2eef4039
03:16 ejegg: SmashPig upgraded from db9fa965 to a9fa7a2c
03:08 ejegg: payments-wiki upgraded from 91582d93 to 61951572
03:05 eileen: config revision changed from 98f2afbb to b33fa934
02:55 eileen: civicrm upgraded from b4a05476 to e7904ea6
02:13 eileen: civicrm upgraded from 601d223e to b4a05476

2023-04-27

22:17 zabe@deploy1002: Finished scap: T334295 (duration: 06m 58s)
22:10 zabe@deploy1002: Started scap: T334295
20:29 TheresNoTime: close UTC late backport window
20:27 samtar@deploy1002: Finished scap: Backport for [[gerrit:912884|[cawikisource] Add a wordmark (Vector 2022) (T331823)]], [[gerrit:912888|[cawiktionary] Add a wordmark (Vector 2022) (T331823)]] (duration: 07m 19s)
20:21 samtar@deploy1002: superpes and samtar: Backport for [[gerrit:912884|[cawikisource] Add a wordmark (Vector 2022) (T331823)]], [[gerrit:912888|[cawiktionary] Add a wordmark (Vector 2022) (T331823)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
20:20 samtar@deploy1002: Started scap: Backport for [[gerrit:912884|[cawikisource] Add a wordmark (Vector 2022) (T331823)]], [[gerrit:912888|[cawiktionary] Add a wordmark (Vector 2022) (T331823)]]
20:20 samtar@deploy1002: Finished scap: Backport for [[gerrit:912874|[cawikibooks] Add a wordmark (Vector 2022) (T331823)]], [[gerrit:912877|[cawikinews] Add a wordmark (Vector 2022) (T331823)]], [[gerrit:912880|[cawikiquote] Add a wordmark (Vector 2022) (T331823)]] (duration: 09m 43s)
20:11 samtar@deploy1002: samtar and superpes: Backport for [[gerrit:912874|[cawikibooks] Add a wordmark (Vector 2022) (T331823)]], [[gerrit:912877|[cawikinews] Add a wordmark (Vector 2022) (T331823)]], [[gerrit:912880|[cawikiquote] Add a wordmark (Vector 2022) (T331823)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
20:10 samtar@deploy1002: Started scap: Backport for [[gerrit:912874|[cawikibooks] Add a wordmark (Vector 2022) (T331823)]], [[gerrit:912877|[cawikinews] Add a wordmark (Vector 2022) (T331823)]], [[gerrit:912880|[cawikiquote] Add a wordmark (Vector 2022) (T331823)]]
19:27 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@bc37201]: (no justification provided) (duration: 00m 10s)
19:27 ejegg: payments-wiki upgraded from 7fa25437 to 91582d93
19:27 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@bc37201]: (no justification provided)
19:16 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@f162f4d]: Deploying T333001 on platform_eng Airflow instance. (duration: 12m 01s)
19:04 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@f162f4d]: Deploying T333001 on platform_eng Airflow instance.
18:47 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.6 refs T330212
18:37 jhuneidi@deploy1002: Finished scap: Backport for [[gerrit:911804|Replace references to actionsToolbar (T335469)]] (duration: 16m 10s)
18:22 jhuneidi@deploy1002: jhuneidi and jforrester: Backport for [[gerrit:911804|Replace references to actionsToolbar (T335469)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
18:21 jhuneidi@deploy1002: Started scap: Backport for [[gerrit:911804|Replace references to actionsToolbar (T335469)]]
17:51 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=1) for host kafkamon1003.eqiad.wmnet with OS bullseye
17:39 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage
17:35 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage
17:27 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@5a46db1] (releasing): (no justification provided) (duration: 00m 40s)
17:27 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@5a46db1] (releasing): (no justification provided)
17:14 hnowlan@deploy1002: Finished deploy [restbase/deploy@a08f56d]: Deploying new wikis: T333272 T334460 T334741 T335020 (duration: 03m 29s)
17:11 hnowlan@deploy1002: Started deploy [restbase/deploy@a08f56d]: Deploying new wikis: T333272 T334460 T334741 T335020
17:06 mutante: deploy2002 - armed the keyholder (sudo keyholder arm and enter passphrase from deployment-key-passphrase in pwstore) - monitoring alert should resolve - T335435
17:01 herron@cumin1001: START - Cookbook sre.ganeti.reimage for host kafkamon1003.eqiad.wmnet with OS bullseye
16:56 volans: uploaded python3-wmflib_1.2.2 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia
16:20 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host kafkamon1003.eqiad.wmnet
16:20 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kafkamon1003.eqiad.wmnet - herron@cumin1001"
16:19 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kafkamon1003.eqiad.wmnet - herron@cumin1001"
16:05 herron@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafkamon1003.eqiad.wmnet on all recursors
16:05 herron@cumin1001: START - Cookbook sre.dns.wipe-cache kafkamon1003.eqiad.wmnet on all recursors
16:05 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:05 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kafkamon1003.eqiad.wmnet - herron@cumin1001"
16:01 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kafkamon1003.eqiad.wmnet - herron@cumin1001"
15:59 herron@cumin1001: START - Cookbook sre.dns.netbox
15:59 herron@cumin1001: START - Cookbook sre.ganeti.makevm for new host kafkamon1003.eqiad.wmnet
15:58 vgutierrez: restarting varnish on cp5018 and cp5026 to drop port 80 - T322774
15:55 jbond: upload puppetboard_4.3.0-1_all.deb to bookworm-wikimedia
15:37 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
15:35 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
15:35 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
15:35 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
15:34 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
15:34 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
15:34 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
15:33 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
15:33 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
15:32 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
15:29 krinkle@deploy1002: Synchronized wmf-config/mc.php: Ia174ea2b0645 (duration: 06m 05s)
15:25 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
15:23 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:22 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
15:22 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
15:22 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
15:22 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
15:22 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
15:21 claime: repooled mw2331.codfw.wmnet - T335486
15:21 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2331.codfw.wmnet
15:21 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2331.codfw.wmnet
15:21 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
15:21 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
15:21 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
15:20 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
15:20 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
15:18 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
15:17 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
15:14 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
15:13 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
15:10 vgutierrez: restarting varnish on cp5019 and cp5027 to drop port 80 - T322774
15:01 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:59 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
14:58 claime: repooling mw2330.codfw.wmnet - T335487
14:58 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2330.codfw.wmnet
14:58 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2330.codfw.wmnet
14:56 Lucas_WMDE: UTC afternoon backport+config window done
14:55 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:912290|Add language codes cal and tpv to wmgExtraLanguageNames (T308062)]] (duration: 07m 55s)
14:49 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and noa: Backport for [[gerrit:912290|Add language codes cal and tpv to wmgExtraLanguageNames (T308062)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
14:47 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:912290|Add language codes cal and tpv to wmgExtraLanguageNames (T308062)]]
14:46 ejegg: payments-wiki upgraded from f30bc859 to 7fa25437
14:46 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:912815|lowiki: Use Western style (0-9) numerals (T335345)]] (duration: 08m 53s)
14:38 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for [[gerrit:912815|lowiki: Use Western style (0-9) numerals (T335345)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
14:37 ejegg: disabled fundraising job ingenico_recurring_fill_scheme_ids (it's all done)
14:37 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:912815|lowiki: Use Western style (0-9) numerals (T335345)]]
14:36 vgutierrez: restarting varnish on cp5020 and cp5028 to drop port 80 - T322774
14:35 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:911308|Close cnwikimedia (T274083)]] (duration: 11m 05s)
14:29 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
14:28 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
14:28 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
14:28 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
14:27 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
14:27 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
14:27 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
14:26 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
14:26 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
14:26 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
14:25 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
14:25 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for [[gerrit:911308|Close cnwikimedia (T274083)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
14:25 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
14:25 moritzm: restarting apache/FPM on mw canaries to pick up curl update
14:24 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:911308|Close cnwikimedia (T274083)]]
14:20 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:912337|labtestwiki: disable cirrus completion index]] (duration: 09m 31s)
14:13 moritzm: installing curl security updates on buster
14:12 lucaswerkmeister-wmde@deploy1002: dcausse and lucaswerkmeister-wmde: Backport for [[gerrit:912337|labtestwiki: disable cirrus completion index]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
14:11 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:912337|labtestwiki: disable cirrus completion index]]
14:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1003.eqiad.wmnet with OS bullseye
14:05 samtar@deploy1002: Finished scap: Backport for [[gerrit:910056|Enable $wgCampaignEventsEnableMultipleOrganizers in production (T334088)]] (duration: 38m 35s)
14:00 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
13:59 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
13:59 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
13:59 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
13:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
13:45 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
13:35 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
13:33 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1003.eqiad.wmnet with OS bullseye
13:31 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
13:30 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
13:28 samtar@deploy1002: samtar and cmelo: Backport for [[gerrit:910056|Enable $wgCampaignEventsEnableMultipleOrganizers in production (T334088)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
13:26 samtar@deploy1002: Started scap: Backport for [[gerrit:910056|Enable $wgCampaignEventsEnableMultipleOrganizers in production (T334088)]]
13:20 samtar@deploy1002: Finished scap: Backport for [[gerrit:910055|metawiki: Give campaignevents-organize-events to campaignevents-beta-tester only (T334088)]] (duration: 15m 07s)
13:20 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
13:06 samtar@deploy1002: samtar and cmelo: Backport for [[gerrit:910055|metawiki: Give campaignevents-organize-events to campaignevents-beta-tester only (T334088)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
13:05 samtar@deploy1002: Started scap: Backport for [[gerrit:910055|metawiki: Give campaignevents-organize-events to campaignevents-beta-tester only (T334088)]]
13:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1002.eqiad.wmnet with OS bullseye
12:56 vgutierrez: restarting varnish on cp5021 and cp5029 to drop port 80 - T322774
12:43 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
12:40 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
12:29 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
12:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1001.eqiad.wmnet with OS bullseye
12:12 moritzm: imported puppet 5.5.22-2+deb13u3 to bookworm-wikimedia T330495
11:56 jbond: upload python3-pypuppetdb_3.1.0-1_all.deb to bookworm
11:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 23951
11:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 23951
11:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 54994
11:41 krinkle@deploy1002: Synchronized wmf-config/: I195978 (duration: 06m 29s)
11:14 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
11:13 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 54994
11:09 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:09 vgutierrez: restarting varnish on cp5022 and cp5030 to drop port 80 - T322774
11:07 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:03 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
11:00 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
10:59 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
10:59 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
10:33 vgutierrez: restarting varnish on cp5023 and cp5031 to drop port 80 - T322774
10:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
10:20 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
10:09 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1001.eqiad.wmnet with OS bullseye
10:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1002.wikimedia.org
10:04 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host ml-cache1001.eqiad.wmnet with OS bullseye
10:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1002.wikimedia.org
10:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
09:55 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1001.eqiad.wmnet with OS bullseye
09:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
09:54 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache1001.eqiad.wmnet with OS bullseye
09:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:42 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
09:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw2331.codfw.wmnet with reason: PSU failure
09:42 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2331.codfw.wmnet with reason: PSU failure
09:41 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 7 days, 0:00:00 on mw2331.codfw.wmnet with reason: PSU failure
09:41 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2331.codfw.wmnet with reason: PSU failure
09:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw2330.codfw.wmnet with reason: PSU failure
09:41 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2330.codfw.wmnet with reason: PSU failure
09:40 claime: depooling mw2330.codfw.wmnet for HW troubleshooting - T335487
09:39 godog: delete all 2023 replica=unset blocks from thanos - T335406
09:37 claime: depooling mw2331.codfw.wmnet for HW troubleshooting - T335486
09:36 vgutierrez: restarting varnish on cp5024 and cp5032 to drop port 80 - T322774
09:34 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1001.eqiad.wmnet with OS bullseye
09:29 moritzm: imported prometheus-rsyslog-exporter to bookworm-wikimedia T330495
09:29 moritzm: imported wmf-certificates to bookworm-wikimedia T330495
09:14 vgutierrez: restarting varnish on cp4037 and cp4045 to drop port 80 - T322774
09:11 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@fb6f0ea] (releasing): (no justification provided) (duration: 00m 40s)
09:10 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@fb6f0ea] (releasing): (no justification provided)
09:09 godog: restart thanos-compact on thanos-fe2001 - T335406
09:06 moritzm: uploaded debdeploy 0.0.99.13+deb12u1 to bookworm-wikimedia T330495
09:00 godog: delete overlapping block 01GY1CQ4EAKRV9BQ8D9JB1VWGJ from thanos - T335406
08:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112
08:39 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 112
08:24 vgutierrez: restarting varnish on cp4038 and cp4046 to drop port 80 - T322774
08:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 199524
08:17 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 199524
07:50 apergos: UTC morning backport and config training window complete
07:45 jnuche@deploy1002: Finished scap: Backport for [[gerrit:911796|Hide wrong "this reference is used 0 times" in citation dialog (T241885 T335410)]] (duration: 08m 33s)
07:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15169
07:38 jnuche@deploy1002: thiemowmde and jnuche: Backport for gerrit:911796 Hide wrong "this reference is used 0 times" in citation dialog (T241885 T335410) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
07:37 jnuche@deploy1002: Started scap: Backport for gerrit:911796 Hide wrong "this reference is used 0 times" in citation dialog (T241885 T335410)
07:31 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15169
07:23 moritzm: uploaded debmonitor-client 0.3.2-1+deb12u1 to bookworm-wikimedia T330495
05:56 XioNoX: Configure 1:1 NAT for new fr-tech hosts - T335441
05:51 XioNoX: downgrade SGIX RS BGP sessions to non-primary
00:01 zabe@deploy1002: Finished scap: T334295 (duration: 06m 53s)

2023-04-26

23:54 zabe@deploy1002: Started scap: T334295
23:32 zabe@deploy1002: Finished scap: Backport for gerrit:912379 Fix `a.image:not(.noviewer,.metadata),a.thumbimage:not(.noviewer,.metadata)' is not a valid selector` bug (T335451) (duration: 07m 07s)
23:26 zabe@deploy1002: zabe and nray: Backport for gerrit:912379 Fix `a.image:not(.noviewer,.metadata),a.thumbimage:not(.noviewer,.metadata)' is not a valid selector` bug (T335451) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
23:25 zabe@deploy1002: Started scap: Backport for gerrit:912379 Fix `a.image:not(.noviewer,.metadata),a.thumbimage:not(.noviewer,.metadata)' is not a valid selector` bug (T335451)
22:06 samtar@deploy1002: Finished scap: Backport for gerrit:910110 interwiki: update URL to XTools (duration: 09m 43s)
21:57 samtar@deploy1002: musikanimal and samtar: Backport for gerrit:910110 interwiki: update URL to XTools synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
21:56 samtar@deploy1002: Started scap: Backport for gerrit:910110 interwiki: update URL to XTools
21:39 brett: Re-enable Puppet on LVS[4008-4010] - T263797
21:02 bblack@cumin1001: conftool action : set/pooled=yes; selector: service=labweb-ssl
21:00 bblack@cumin1001: conftool action : set/pooled=yes; selector: service=labweb
20:37 jhuneidi@deploy1002: Finished scap: Backport for gerrit:911952 Set Vector 2022 as default skin on Polish Wikipedia (T335311) (duration: 09m 22s)
20:29 jhuneidi@deploy1002: jhuneidi and jdrewniak: Backport for gerrit:911952 Set Vector 2022 as default skin on Polish Wikipedia (T335311) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
20:28 jhuneidi@deploy1002: Started scap: Backport for gerrit:911952 Set Vector 2022 as default skin on Polish Wikipedia (T335311)
19:47 brett: Disable Puppet on LVS[4008-4010] for rollout of LVS maglev hashing scheduler - T263797
19:16 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@eb07d71]: fetch_conda: path globs must not be quoted (duration: 00m 27s)
19:15 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@eb07d71]: fetch_conda: path globs must not be quoted
19:10 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@5f2ec35]: repoint shebang lines of conda env (duration: 00m 23s)
19:10 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@5f2ec35]: repoint shebang lines of conda env
18:34 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@ba52b43]: replace python env deployment method with conda env from gitlab (duration: 00m 24s)
18:33 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@ba52b43]: replace python env deployment method with conda env from gitlab
18:16 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.6 refs T330212 (duration: 06m 04s)
18:10 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.6 refs T330212
17:37 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp5016
17:37 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:37 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5016 decommissioned, removing all IPs except the asset tag one - robh@cumin1001"
17:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36351
17:31 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5016 decommissioned, removing all IPs except the asset tag one - robh@cumin1001"
17:29 robh@cumin1001: START - Cookbook sre.dns.netbox
17:24 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp5016
17:23 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp5015
17:23 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:23 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5015 decommissioned, removing all IPs except the asset tag one - robh@cumin1001"
17:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36351
17:21 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5015 decommissioned, removing all IPs except the asset tag one - robh@cumin1001"
17:17 robh@cumin1001: START - Cookbook sre.dns.netbox
17:13 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp5015
17:12 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp5014
17:12 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:12 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5014 decommissioned, removing all IPs except the asset tag one - robh@cumin1001"
17:11 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5014 decommissioned, removing all IPs except the asset tag one - robh@cumin1001"
16:56 robh@cumin1001: START - Cookbook sre.dns.netbox
16:50 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp5014
16:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp5013
16:48 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:48 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5013 decommissioned, removing all IPs except the asset tag one - robh@cumin1001"
16:46 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5013 decommissioned, removing all IPs except the asset tag one - robh@cumin1001"
16:44 robh@cumin1001: START - Cookbook sre.dns.netbox
16:36 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp5013
16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2002.codfw.wmnet with OS bullseye
16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
16:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
16:29 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5014.eqsin.wmnet with OS bullseye
16:29 robh@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
16:29 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5013.eqsin.wmnet with OS bullseye
16:29 robh@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
16:29 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5015.eqsin.wmnet with OS bullseye
16:29 robh@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
16:29 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5016.eqsin.wmnet with OS bullseye
16:29 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
16:17 vgutierrez: restarting varnish on cp4039 and cp4047 to drop port 80 - T322774
16:10 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
16:08 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
16:05 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
15:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5016.eqsin.wmnet with reason: host reimage
15:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5015.eqsin.wmnet with reason: host reimage
15:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: host reimage
15:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: host reimage
15:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: host reimage
15:43 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: host reimage
15:43 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
15:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2002.codfw.wmnet with reason: host reimage
15:38 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2002.codfw.wmnet with reason: host reimage
15:34 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
15:31 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
15:21 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5013.eqsin.wmnet with reason: host reimage
15:20 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@5061681]: (no justification provided) (duration: 00m 20s)
15:19 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@5061681]: (no justification provided)
15:18 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: host reimage
15:14 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cp5016.eqsin.wmnet with OS bullseye
15:14 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cp5015.eqsin.wmnet with OS bullseye
15:13 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cp5014.eqsin.wmnet with OS bullseye
14:45 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cp5013.eqsin.wmnet with OS bullseye
14:41 vgutierrez: restarting varnish on cp4040 and cp4048 to drop port 80 - T322774
14:34 cgoubert@deploy1002: Finished scap: Backport for gerrit:911792 Revert "debug.json: List primary DC servers first" (duration: 08m 07s)
14:31 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)
14:28 cgoubert@deploy1002: cgoubert: Backport for gerrit:911792 Revert "debug.json: List primary DC servers first" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:26 cgoubert@deploy1002: Started scap: Backport for gerrit:911792 Revert "debug.json: List primary DC servers first"
14:24 cgoubert@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: Datacenter Switchback - T327920 (duration: 69m 03s)
14:16 marostegui: Update dns for parsercache T327920
14:10 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters
14:08 claime: Phase 9.5 Update DNS records for new database masters
14:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0)
14:07 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl
14:07 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
14:05 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
14:05 claime: Restarting maintenance jobs - T327920
14:04 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
14:04 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
14:03 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
14:03 cgoubert@cumin1001: MediaWiki read-only period ends at: 2023-04-26 14:03:01.527715
14:00 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
13:59 claime: Going to read-only for mediawiki datacenter switchback - T327920
13:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
13:55 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
13:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 1239
13:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 1239
13:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 136106
13:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 136106
13:47 cgoubert@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=99)
13:46 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
13:45 cgoubert@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=99)
13:45 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
13:45 claime: Stopping maintenance scripts for datacenter switchback - T327920
13:43 vgutierrez: restarting varnish on cp4041 and cp4049 to drop port 80 - T322774
13:35 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks (exit_code=0)
13:35 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks
13:35 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches (exit_code=0)
13:31 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches
13:31 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
13:25 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
13:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
13:25 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
13:23 claime: Starting mediawiki datacenter switchback preparation - T327920
13:15 cgoubert@deploy1002: Locking from deployment [ALL REPOSITORIES]: Datacenter Switchback - T327920
13:14 claime: Locking scap for datacenter switchback - T327920
13:13 vgutierrez: restarting varnish on cp4042 and cp4050 to drop port 80 - T322774
13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
13:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
13:06 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
12:56 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
12:55 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp5013.mgmt.eqsin.wmnet with reboot policy FORCED
12:52 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
12:49 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
12:49 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
12:49 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
12:40 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp5013.mgmt.eqsin.wmnet with reboot policy FORCED
12:13 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@93a04bd] (releasing): (no justification provided) (duration: 00m 36s)
12:13 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@93a04bd] (releasing): (no justification provided)
12:11 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@93a04bd] (releasing): (no justification provided) (duration: 00m 33s)
12:10 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@93a04bd] (releasing): (no justification provided)
12:10 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@93a04bd] (releasing): (no justification provided) (duration: 01m 15s)
12:09 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@93a04bd] (releasing): (no justification provided)
12:03 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@93a04bd] (releasing): (no justification provided) (duration: 00m 34s)
12:03 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@93a04bd] (releasing): (no justification provided)
11:37 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
11:27 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
11:25 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
11:16 moritzm: import php-excimer 1.0.2-1+wmf3+buster1+icu67 to component/icu67 T332964
11:15 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
10:54 btullis@deploy1002: Finished deploy [analytics/refinery@571f955] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@571f955] (duration: 01m 30s)
10:52 btullis@deploy1002: Started deploy [analytics/refinery@571f955] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@571f955]
10:52 btullis@deploy1002: Finished deploy [analytics/refinery@571f955] (thin): Regular analytics weekly train THIN [analytics/refinery@571f955] (duration: 02m 08s)
10:50 btullis@deploy1002: Started deploy [analytics/refinery@571f955] (thin): Regular analytics weekly train THIN [analytics/refinery@571f955]
10:49 btullis@deploy1002: Finished deploy [analytics/refinery@571f955]: Regular analytics weekly train [analytics/refinery@571f955] (duration: 05m 23s)
10:44 btullis@deploy1002: Started deploy [analytics/refinery@571f955]: Regular analytics weekly train [analytics/refinery@571f955]
10:25 vgutierrez: restarting varnish on cp4043 and cp4051 to drop port 80 - T322774
10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 1828
10:07 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
10:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 1828
09:57 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
09:54 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
09:54 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
09:49 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
09:49 btullis@cumin1001: Added views for new wiki: kbdwiktionary T333270
09:31 vgutierrez: restarting varnish on cp4044 and cp4052 to drop port 80 - T322774
09:26 btullis@deploy1002: Finished deploy [analytics/refinery@571f955] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@571f955] (duration: 00m 04s)
09:26 btullis@deploy1002: Started deploy [analytics/refinery@571f955] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@571f955]
09:25 btullis@deploy1002: Finished deploy [analytics/refinery@571f955] (thin): Regular analytics weekly train THIN [analytics/refinery@571f955] (duration: 00m 05s)
09:25 btullis@deploy1002: Started deploy [analytics/refinery@571f955] (thin): Regular analytics weekly train THIN [analytics/refinery@571f955]
09:24 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
09:20 btullis@deploy1002: Finished deploy [analytics/refinery@571f955]: Regular analytics weekly train [analytics/refinery@571f955] (duration: 00m 46s)
09:19 btullis@deploy1002: Started deploy [analytics/refinery@571f955]: Regular analytics weekly train [analytics/refinery@571f955]
09:05 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2139.codfw.wmnet with reason: T335396
09:05 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2139.codfw.wmnet with reason: T335396
08:53 moritzm: installing golang-1.11 security updates
08:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy2002.codfw.wmnet
08:52 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
08:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32934
08:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
08:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32934
08:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
08:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
08:22 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
08:17 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
08:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
08:17 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
08:16 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
08:12 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
08:12 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
08:00 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
07:41 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
07:39 Emperor: start to load new swift backends, drain old ones T335278 T335279 T335280 T335281
07:39 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
07:35 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: sync
07:34 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: sync
07:33 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
07:33 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
07:32 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: sync
07:32 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: sync
07:28 taavi@deploy1002: Finished scap: Backport for gerrit:911311 Beta-Wikidata: Enable Labels in Wikidata edit summaries (T327062) (duration: 07m 48s)
07:22 taavi@deploy1002: taavi and migr: Backport for gerrit:911311 Beta-Wikidata: Enable Labels in Wikidata edit summaries (T327062) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
07:20 taavi@deploy1002: Started scap: Backport for gerrit:911311 Beta-Wikidata: Enable Labels in Wikidata edit summaries (T327062)
07:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 38082
07:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 38082
07:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 9584
07:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 9584
07:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 4826
07:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4826
07:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 55818
07:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 55818
07:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 49544
07:00 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
06:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 49544
06:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 133840
06:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 133840
06:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4826
06:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4826
06:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 18106
06:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 18106
06:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7552
06:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7552
06:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45796
06:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45796
06:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 140407
06:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 140407
06:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1828
06:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1828
06:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38082
06:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38082
06:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4657
06:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4657
06:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1239
06:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1239
06:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36351
06:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36351
06:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 17676
06:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 17676
06:37 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45498
06:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45498
06:37 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 134823
06:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 134823
06:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9583
06:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9583
06:35 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 24482
06:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 24482
06:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 137831
06:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 137831
06:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9002
06:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9002
06:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 23951
06:31 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 23951
06:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9299
06:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9299
06:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8529
06:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8529
06:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38040
06:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38040
06:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4651
06:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4651
06:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 132132
06:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 132132
06:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58552
06:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58552
06:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 23947
06:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 23947
06:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 17961
06:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 17961
06:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 54994
06:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 54994
06:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 55818
06:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 55818
06:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9009
06:17 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9009
06:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4773
06:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4773
06:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 133840
06:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 133840
06:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 140951
06:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 140951
06:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4761
06:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4761
06:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 49544
06:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 49544
06:12 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'email' for AS: 6939
06:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6939
06:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 136907
06:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 136907
06:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4775
06:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4775
06:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 199524
06:09 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 199524
06:09 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 23824
06:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 23824
06:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 18403
06:07 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 18403
06:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 136106
06:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 136106
06:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35280
06:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35280
06:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 10089
06:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 10089
06:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 906
06:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 906
06:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9584
06:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9584
06:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 139836
05:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 139836
05:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 10030
05:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 10030
05:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38158
05:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38158
05:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 63199
05:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 63199
05:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 131285
05:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 131285
05:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2518
05:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2518
05:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 55967
05:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 55967
05:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2519
05:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2519
05:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45430
05:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45430
05:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 703
05:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 703
05:45 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 703
05:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 703
05:44 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 703
05:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 703
05:33 XioNoX: bounce SGIX RS BGP - T327284
05:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 59369
05:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 59369
05:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 59360
05:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 59360
05:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 59360
05:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 59360
04:55 eileen: civicrm upgraded from 2bc9f372 to 601d223e

2023-04-25

21:40 mutante: gerrit1003 - chown -R gerrit2:gerrit2 /var/lib/gerrit2/review_site/ - T326368
21:19 mutante: gerrit1003 - chown -R gerrit2:gerrit2 /srv/gerrit T333143 T326368
21:17 mutante: gerrit1003 - mv /srv/gerrit/plugins/lfs /srv/gerrit/data/ T333143 T326368
21:14 mutante: gerrit1003 - manually replacing deploy2002 with deploy1002 in /srv/deployment/gerrit/gerrit-cache/.config to fix initial scap deployment T257317 T326368
21:12 mutante: once again running into T257317 when applying gerrit role to new hardware
21:06 mutante: adding production gerrit role to new machine gerrit1003 - monitoring downtimed - but it has a service IP that is going to be added by this and can't be downtimed? (Bug: T326368)
21:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gerrit1003.wikimedia.org with reason: setup
21:04 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gerrit1003.wikimedia.org with reason: setup
19:48 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
19:48 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
19:48 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs2006.codfw.wmnet
19:48 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs2006.codfw.wmnet
19:46 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs2009.codfw.wmnet
19:46 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs2009.codfw.wmnet
19:46 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on wdqs2006.codfw.wmnet with reason: attempting WDQS stack on bullseye
19:46 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on wdqs2006.codfw.wmnet with reason: attempting WDQS stack on bullseye
19:23 inflatador: bking@cumin1001 finishing WDQS deploy...restarting `wdqs-categories` across lvs-managed hosts
18:57 bking@deploy1002: Finished deploy [wdqs/wdqs@0e051d8]: 0.3.123 (duration: 17m 29s)
18:39 bking@deploy1002: Started deploy [wdqs/wdqs@0e051d8]: 0.3.123
18:18 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.6 refs T330212
16:55 ejegg: payments-wiki upgraded from 2a4c450d to f30bc859
15:39 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
15:39 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
15:34 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
15:33 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
15:32 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
15:31 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
15:31 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
15:31 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
15:30 dancy@deploy1002: Installation of scap version "4.50.0" completed for 1 hosts
15:30 dancy@deploy1002: Installing scap version "4.50.0" for 1 hosts
15:28 XioNoX: update cr2-eqsin BBIX interface
15:27 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
15:27 btullis@cumin1001: Added views for new wiki: azwikimedia T330442
15:25 dancy@deploy1002: Installing scap version "4.50.0" for 592 hosts
15:24 cgoubert@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in eqiad: T335015
15:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on irc2002.wikimedia.org with reason: Non-functional, WIP for Bullseye update
15:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on irc2002.wikimedia.org with reason: Non-functional, WIP for Bullseye update
15:22 claime: Datacenter Service Switchback concluded - T335015
15:21 cgoubert@deploy1002: Synchronized README: check the deployment server after switchback - T335015 (duration: 19m 55s)
15:19 cgoubert@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) restbase-async.discovery.wmnet on all recursors
15:19 cgoubert@cumin2002: START - Cookbook sre.dns.wipe-cache restbase-async.discovery.wmnet on all recursors
15:19 cgoubert@cumin2002: START - Cookbook sre.discovery.service-route depool restbase-async in eqiad: T335015
15:19 claime: Restoring restbase-async to codfw only - T335015
15:18 cgoubert@deploy1002: Finished deploy [restbase/deploy@a08f56d]: (no justification provided) (duration: 13m 06s)
15:08 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS bullseye
15:05 cgoubert@deploy1002: Started deploy [restbase/deploy@a08f56d]: (no justification provided)
15:02 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
15:02 inflatador: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
15:02 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
15:02 btullis@cumin1001: Added views for new wiki: vewikimedia T330704
15:01 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
15:01 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
15:01 btullis@cumin1001: Added views for new wiki: ckbwiktionary T331834
15:01 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
15:01 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
15:00 btullis@cumin1001: Added views for new wiki: fatwiki T335018
15:00 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
15:00 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
15:00 btullis@cumin1001: Added views for new wiki: kcgwiktionary T334739
15:00 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
14:59 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
14:59 btullis@cumin1001: Added views for new wiki: guwwikinews T334408
14:59 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
14:58 bking@deploy1002: Finished deploy [wdqs/wdqs@0e051d8]: 0.3.123 (duration: 07m 38s)
14:54 cgoubert@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: Datacenter Service Switchback - T335015 (duration: 81m 19s)
14:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
14:50 bking@deploy1002: Started deploy [wdqs/wdqs@0e051d8]: 0.3.123
14:48 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
14:45 claime: Running authdns-update - T335015
14:45 inflatador: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.123`. Pre-deploy tests passing on canary `wdqs1003`
14:44 claime: Switch deployment server back to eqiad - T335015
14:43 claime: All active/active services repooled in codfw - T335015
14:43 cgoubert@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in codfw: Datacenter Services Switchback - T335015
14:36 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS bullseye
14:35 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS bullseye
14:26 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: Datacenter Services Switchback - T335015
14:26 claime: All services pooled in eqiad, all depooled in codfw, proceeding with repooling active/active services in codfw - T335015
14:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
14:25 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter status all services in all: None - None
14:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all services in codfw: Datacenter Services Switchback - T335015
14:19 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter depool all services in codfw: Datacenter Services Switchback - T335015
14:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
14:18 cgoubert@cumin1001: END (ERROR) - Cookbook sre.discovery.datacenter (exit_code=93) depool all services in codfw: Datacenter Services Switchback - T335015
14:16 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
14:04 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter depool all services in codfw: Datacenter Services Switchback - T335015
14:04 cgoubert@cumin1001: END (ERROR) - Cookbook sre.discovery.datacenter (exit_code=93) depool all services in codfw: Datacenter Services Switchback - T335015
14:02 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS bullseye
14:01 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter depool all services in codfw: Datacenter Services Switchback - T335015
14:00 claime: Starting Datacenter Services Switchback - T335015
13:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1002.eqiad.wmnet
13:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1002.eqiad.wmnet
13:33 cgoubert@deploy2002: Locking from deployment [ALL REPOSITORIES]: Datacenter Service Switchback - T335015
13:30 inflatador: bking@cumin1001 transfer.py wdqs2009.codfw.wmnet:/srv/wdqs wdqs2022.codfw.wmnet:/srv/wdqs
13:26 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs2009.codfw.wmnet with reason: attempting WDQS stack on bullseye
13:26 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2009.codfw.wmnet with reason: attempting WDQS stack on bullseye
13:06 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
13:05 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
13:05 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
13:05 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
13:04 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
13:03 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
13:03 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
13:02 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
11:44 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=0) rolling restart_daemons on A:thanos-fe
11:40 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe
10:58 cgoubert@cumin1001: conftool action : set/weight=20; selector: name=mw2394.codfw.wmnet
10:57 cgoubert@cumin1001: conftool action : set/weight=20; selector: name=mw2395.codfw.wmnet
10:57 cgoubert@cumin1001: conftool action : set/weight=20; selector: name=mw2410.codfw.wmnet
10:56 cgoubert@cumin1001: conftool action : set/weight=20; selector: name=mw2411.codfw.wmnet
10:52 cgoubert@cumin1001: conftool action : set/weight=25; selector: dc=codfw,cluster=videoscaler,service=canary
10:52 cgoubert@cumin1001: conftool action : set/weight=25; selector: dc=codfw,cluster=jobrunner,service=canary
10:21 moritzm: installing libxml2 security updates on bullseye
09:34 moritzm: upgrade php-excimer on remaining mediawiki hosts to 1.0.2-1+wmf3+buster1 (which rebases Excimer to 1.1.1) T332964
08:51 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1003.eqiad.wmnet
08:43 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
07:53 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
07:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
06:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46887
06:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46887
06:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4557
06:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4557
04:08 ejegg: re-enabled fundraising scheduled jobs
04:07 ejegg: civicrm upgraded from fa5265bf to 2bc9f372
03:55 ejegg: civicrm upgraded from 14644f30 to fa5265bf
03:52 mwpresync@deploy2002: Pruned MediaWiki: 1.41.0-wmf.4 (duration: 02m 06s)
03:50 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.6 refs T330212 (duration: 48m 05s)
03:16 eileen: config revision changed from 554bb874 to d1462a30
03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.6 refs T330212

2023-04-24

23:15 eileen: civicrm upgraded from c17c8db2 to 26150ed4
22:00 eileen: civicrm upgraded from 3466c2d3 to c17c8db2
20:53 cjming: end of UTC late backport window
20:52 cjming@deploy2002: Finished scap: Backport for [[gerrit:911366|Fix InvalidCharacterError: Failed to execute 'add' on 'DOMTokenList' (T335149)]] (duration: 11m 25s)
20:42 cjming@deploy2002: cjming and nray: Backport for [[gerrit:911366|Fix InvalidCharacterError: Failed to execute 'add' on 'DOMTokenList' (T335149)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
20:42 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@6e76561]: (no justification provided) (duration: 00m 23s)
20:41 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@6e76561]: (no justification provided)
20:41 cjming@deploy2002: Started scap: Backport for [[gerrit:911366|Fix InvalidCharacterError: Failed to execute 'add' on 'DOMTokenList' (T335149)]]
20:38 cjming@deploy2002: Finished scap: Backport for [[gerrit:910857|[fywiki] Add portal and portal talk namespace (T334807)]] (duration: 07m 26s)
20:32 cjming@deploy2002: cjming and superpes: Backport for [[gerrit:910857|[fywiki] Add portal and portal talk namespace (T334807)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
20:30 cjming@deploy2002: Started scap: Backport for [[gerrit:910857|[fywiki] Add portal and portal talk namespace (T334807)]]
20:28 cjming@deploy2002: Finished scap: Backport for [[gerrit:910604|[guwwikinews] Add a HD logo for vector legacy (T335162)]] (duration: 07m 22s)
20:22 cjming@deploy2002: superpes and cjming: Backport for [[gerrit:910604|[guwwikinews] Add a HD logo for vector legacy (T335162)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
20:21 cjming@deploy2002: Started scap: Backport for [[gerrit:910604|[guwwikinews] Add a HD logo for vector legacy (T335162)]]
20:19 cjming@deploy2002: Finished scap: Backport for [[gerrit:910603|[kcgwiktionary] Add a HD logo for vector legacy (T335162)]] (duration: 07m 51s)
20:13 cjming@deploy2002: superpes and cjming: Backport for [[gerrit:910603|[kcgwiktionary] Add a HD logo for vector legacy (T335162)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
20:11 cjming@deploy2002: Started scap: Backport for [[gerrit:910603|[kcgwiktionary] Add a HD logo for vector legacy (T335162)]]
19:45 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on aphlict1001.eqiad.wmnet with reason: aphlict1002 is now active for testing
19:42 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on aphlict1001.eqiad.wmnet with reason: aphlict1002 is now active for testing
19:29 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aphlict.discovery.wmnet on all recursors
19:29 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache aphlict.discovery.wmnet on all recursors
18:44 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
17:51 wfan: payments-wiki upgraded from a6288840 to 2a4c450d
17:43 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp5013.mgmt.eqsin.wmnet with reboot policy FORCED
17:36 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp5013.mgmt.eqsin.wmnet with reboot policy FORCED
17:35 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp5015.mgmt.eqsin.wmnet with reboot policy FORCED
17:32 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp5015.mgmt.eqsin.wmnet with reboot policy FORCED
17:26 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp5014.mgmt.eqsin.wmnet with reboot policy FORCED
17:20 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp5014.mgmt.eqsin.wmnet with reboot policy FORCED
17:19 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp5013.mgmt.eqsin.wmnet with reboot policy FORCED
17:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
17:04 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp5013.mgmt.eqsin.wmnet with reboot policy FORCED
17:03 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
16:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:39 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp5016
16:37 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp5016
16:37 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp5015
16:36 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp5015
16:36 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp5014
16:35 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp5014
16:35 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp5013
16:34 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp5013
15:48 ejegg: payments-wiki upgraded from 25d867dc to a6288840
15:14 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:14 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: old cp server work - robh@cumin1001"
15:11 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: old cp server work - robh@cumin1001"
15:09 robh@cumin1001: START - Cookbook sre.dns.netbox
15:09 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:09 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: old cp server work - robh@cumin1001"
15:08 vgutierrez: restarting haproxy on cp3064 - T334448
15:07 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: old cp server work - robh@cumin1001"
15:05 robh@cumin1001: START - Cookbook sre.dns.netbox
14:59 eoghan@cumin1001: END (PASS) - Cookbook sre.gitlab.failover (exit_code=0) Failover of gitlab from gitlab1003.wikimedia.org to gitlab1004.wikimedia.org
14:58 inflatador: bking@wdqs1015 repool wdqs1015 as lag is back down
14:56 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
14:56 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
14:47 mutante: DNS - new project language "btm" added - Mandailing language is spoken in Indonesia - https://en.wikipedia.org/wiki/Mandailing_language
14:31 herron: re-enabled icinga meta monitoring on wikitech-static T333837
14:07 herron: disabled icinga meta monitoring on wikitech-static T333837
14:07 herron: beginning alert host failover from alert2001 to alert1001 T333837
13:40 dcausse: repooling wdqs1005
13:32 claime: Deployed push-notifications production for switch to mw-api-int - T334061
13:32 moritzm: installing libxml2 security updates on bullseye
13:27 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:910723|Update InterwikiSortOrders (T335019)]] (duration: 06m 59s)
13:24 eoghan@cumin1001: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab1003.wikimedia.org to gitlab1004.wikimedia.org
13:24 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
13:23 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
13:20 urbanecm@deploy2002: Started scap: Backport for [[gerrit:910723|Update InterwikiSortOrders (T335019)]]
13:15 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:910018|Disable wmgNewUserMessageOnAutoCreate from Extension:NewUserMessage on knwikisource (T335090)]] (duration: 11m 02s)
13:14 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
13:14 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
13:13 claime: Deploying push-notifications production for switch to mw-api-int - T334061
13:05 urbanecm@deploy2002: urbanecm and anzx: Backport for [[gerrit:910018|Disable wmgNewUserMessageOnAutoCreate from Extension:NewUserMessage on knwikisource (T335090)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
13:04 urbanecm@deploy2002: Started scap: Backport for [[gerrit:910018|Disable wmgNewUserMessageOnAutoCreate from Extension:NewUserMessage on knwikisource (T335090)]]
12:56 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
12:29 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
12:28 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
12:28 claime: Deploying push-notifications staging for switch to mw-api-int - T334061
11:23 cgoubert@cumin1001: conftool action : set/weight=30; selector: dc=codfw,cluster=api_appserver,service=canary
11:21 cgoubert@cumin1001: conftool action : set/weight=25; selector: dc=codfw,cluster=appserver,service=canary
11:19 cgoubert@cumin1001: conftool action : set/weight=30; selector: dc=eqiad,cluster=appserver,service=canary
11:18 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:17 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
11:14 cgoubert@cumin1001: conftool action : set/weight=10; selector: dc=codfw,cluster=parsoid,service=canary
11:13 cgoubert@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=parsoid,service=canary
11:13 claime: Fixing appserver clusters canary weights
10:56 jynus: deployed new ssh key for jcrespo on production cluster
10:29 claime: Datacenter switchover live testing setting db to read-only and back in eqiad successful - T327920
10:29 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
10:29 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
10:29 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
10:29 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
10:27 claime: Datacenter switchover live testing setting db to read-only and back in eqiad - T327920
10:26 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ilooremeta out of all services on: 801 hosts
10:26 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ilooremeta out of all services on: 801 hosts
10:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ilooremeta out of all services on: 1262 hosts
10:22 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ilooremeta out of all services on: 1262 hosts
10:22 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hghani out of all services on: 1262 hosts
10:20 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Hghani out of all services on: 1262 hosts
10:18 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hghani out of all services on: 801 hosts
10:18 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Hghani out of all services on: 801 hosts
10:17 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hibashaath out of all services on: 801 hosts
10:17 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Hibashaath out of all services on: 801 hosts
10:16 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hibashaath out of all services on: 1262 hosts
10:14 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Hibashaath out of all services on: 1262 hosts
10:11 marostegui: Enable replication eqiad -> codfw on s1 dbmaint eqiad T335266
10:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 38 hosts with reason: Enabling replication T335266
10:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 38 hosts with reason: Enabling replication T335266
10:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 35 hosts with reason: Enabling replication T335266
10:08 marostegui: Enable replication eqiad -> codfw on s4 dbmaint eqiad T335266
10:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 35 hosts with reason: Enabling replication T335266
10:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 24 hosts with reason: Enabling replication T335266
10:06 marostegui: Enable replication eqiad -> codfw on s3 dbmaint eqiad T335266
10:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 24 hosts with reason: Enabling replication T335266
10:01 moritzm: installing git security updates
09:55 slyngs: Update LDAP schema wmf-user: T148048
09:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 28 hosts with reason: Enabling replication T335266
09:55 marostegui: Enable replication eqiad -> codfw on s7 dbmaint eqiad T335266
09:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 28 hosts with reason: Enabling replication T335266
09:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-worker1110.eqiad.wmnet
09:21 moritzm: upgrade php-excimer on mw canaries to 1.0.2-1+wmf3+buster1 (which rebases Excimer to 1.1.1) T332964
08:45 moritzm: uploaded php-excimer 1.0.2-1+wmf3+buster1 (which rebases Excimer to 1.1.1) to component/php74 for buster-wikimedia T332964
08:44 marostegui: Enable replication eqiad -> codfw on s8 dbmaint eqiad T335266
08:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 34 hosts with reason: Enabling replication T335266
08:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 34 hosts with reason: Enabling replication T335266
08:33 marostegui: Enable replication eqiad -> codfw on s5 dbmaint eqiad T335266
08:32 cgoubert@deploy2002: Finished scap: testing T329857 (duration: 14m 29s)
08:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 26 hosts with reason: Enabling replication T335266
08:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 26 hosts with reason: Enabling replication T335266
08:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 27 hosts with reason: Enabling replication T335266
08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 27 hosts with reason: Enabling replication T335266
08:28 marostegui: Enable replication eqiad -> codfw on s6 dbmaint eqiad T335266
08:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 27 hosts with reason: Enabling replication T335266
08:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 27 hosts with reason: Enabling replication T335266
08:26 marostegui: Enable replication eqiad -> codfw on s2 dbmaint eqiad T335266
08:25 btullis@cumin1001: START - Cookbook sre.hosts.dhcp for host an-worker1110.eqiad.wmnet
08:21 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-worker1110.eqiad.wmnet with reason: Upgrading RAID controller firmware
08:21 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-worker1110.eqiad.wmnet with reason: Upgrading RAID controller firmware
08:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 10 hosts with reason: Enabling replication T335266
08:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 10 hosts with reason: Enabling replication T335266
08:20 marostegui: Enable replication eqiad -> codfw on x1 dbmaint eqiad T335266
08:18 cgoubert@deploy2002: Started scap: testing T329857
08:17 marostegui: Enable replication eqiad -> codfw on es5 dbmaint eqiad T335266
08:14 claime: Deploying 909302 on deploy2002 for T329857
08:10 claime: Disabling puppet on deploy2002 - T329857
08:09 claime: Deploying 909302 on deploy1002 for T329857
08:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 6 hosts with reason: Enabling replication T335266
08:08 marostegui: Enable replication eqiad -> codfw on es4 dbmaint eqiad T335266
08:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 6 hosts with reason: Enabling replication T335266
08:07 marostegui: Enable replication eqiad -> codfw on pc3 dbmaint eqiad T335266
08:06 marostegui: Enable replication eqiad -> codfw on pc2 dbmaint eqiad T335266
08:05 marostegui: Enable replication eqiad -> codfw on pc1 dbmaint eqiad T335266
07:53 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.41 in codfw
07:51 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.41 in codfw
07:45 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1004.wikimedia.org with OS bullseye
07:44 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.59 in codfw
07:42 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.59 in codfw
07:39 dcausse: restarting blazegraph on wdqs1005 (stuck for 3+days)
07:38 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.4a in codfw
07:36 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.4a in codfw
07:24 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
07:21 jelto@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
07:06 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab1004.wikimedia.org with OS bullseye

2023-04-22

05:41 joe: <thumbor/codfw>$ helmfile --state-values-set roll_restart=1 -e codfw sync
05:40 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
05:39 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
05:39 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
05:39 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
05:15 hashar@deploy2002: Finished deploy [integration/docroot@b816911]: Update Grafana URL (duration: 00m 11s)
05:15 hashar@deploy2002: Started deploy [integration/docroot@b816911]: Update Grafana URL
05:10 joe: sudo cumin -b 1 -s 20 'A:swift-fe-codfw' 'systemctl restart swift-proxy.service'
04:33 vgutierrez: restart haproxy on cp1087 - T334448

2023-04-21

18:27 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.remove-ghost-objects (exit_code=99) from container wikipedia-en-local-public.a8 in codfw
18:25 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-en-local-public.a8 in codfw
15:57 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:910780 Set wmgUseGraphWithJsonNamespace = true for mediawikiwiki (T124748 T335130) (duration: 10m 01s)
15:48 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for gerrit:910780 Set wmgUseGraphWithJsonNamespace = true for mediawikiwiki (T124748 T335130) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
15:47 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:910780 Set wmgUseGraphWithJsonNamespace = true for mediawikiwiki (T124748 T335130)
12:18 duesen: reverted monkey-patch, mwdebug2001 and deploy2002 are back to wmf/1.41.0-wmf.5 (T335183)
11:56 duesen: monkey-patching Ib11a871ff on mwdebug2001 to investigate T335183
09:03 Amir1: finish of the wikibase populate sites table
08:35 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https
03:19 eileen: civicrm upgraded from 5b63c2b2 to 0fad720a
03:11 eileen: civicrm upgraded from a2e7c079 to 5b63c2b2
01:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2011.codfw.wmnet with OS bullseye
01:41 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:39 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2011.codfw.wmnet with reason: host reimage
01:19 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2011.codfw.wmnet with reason: host reimage
00:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2010.codfw.wmnet with OS bullseye
00:37 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:35 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2010.codfw.wmnet with reason: host reimage
00:15 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2010.codfw.wmnet with reason: host reimage
00:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host backup2011.codfw.wmnet with OS bullseye

2023-04-20

22:48 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
22:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['backup2011']
22:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2011']
22:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['backup2011']
22:17 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2011']
21:47 zabe@deploy2002: Finished scap: Backport for gerrit:910530 Update interwiki cache (duration: 06m 26s)
21:42 zabe@deploy2002: zabe: Backport for gerrit:910530 Update interwiki cache synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
21:41 zabe@deploy2002: Started scap: Backport for gerrit:910530 Update interwiki cache
21:35 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
21:35 zabe@deploy2002: Finished scap: T334394 (duration: 07m 46s)
21:28 zabe@deploy2002: zabe: T334394 synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
21:28 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
21:27 zabe@deploy2002: Started scap: T334394
21:26 zabe: create Wikinews Gungbe # T334394
21:22 inflatador: bking@cumin1001 repool wdqs2012 T331300
21:19 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
21:19 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
21:18 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
21:18 inflatador: bking@cumin1001 depool wdqs2009 for data xfer T331300
21:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
21:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2011.mgmt.codfw.wmnet with reboot policy FORCED
20:57 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
20:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
20:47 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
20:36 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
20:33 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host sessionstore1001.eqiad.wmnet
20:31 thcipriani@deploy2002: Finished scap: Backport for gerrit:910499 Fix TypeError: trigger.attr is not a function (T335148) (duration: 09m 53s)
20:22 thcipriani@deploy2002: nray and thcipriani: Backport for gerrit:910499 Fix TypeError: trigger.attr is not a function (T335148) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
20:22 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
20:21 thcipriani@deploy2002: Started scap: Backport for gerrit:910499 Fix TypeError: trigger.attr is not a function (T335148)
19:58 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
19:57 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
19:54 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
19:47 zabe@deploy2002: Finished scap: Backport for gerrit:910529 Update interwiki cache (duration: 06m 47s)
19:41 zabe@deploy2002: zabe: Backport for gerrit:910529 Update interwiki cache synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
19:40 zabe@deploy2002: Started scap: Backport for gerrit:910529 Update interwiki cache
19:34 zabe@deploy2002: Finished scap: T333266 (duration: 07m 04s)
19:29 zabe@deploy2002: zabe: T333266 synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
19:28 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
19:27 zabe@deploy2002: Started scap: T333266
19:27 zabe: create Wiktionary Kabardian # T333266
19:16 inflatador: bking@cumin1001 depool wdqs2012.codfw.wmnet for data xfer T331300
19:16 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
19:15 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
19:13 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
18:58 zabe@deploy2002: Finished scap: Backport for gerrit:910569 Disable VE as default editor on kcgwiktionary (T334730), gerrit:910570 db-production: Fix indentation, gerrit:910528 Update interwiki cache (duration: 07m 06s)
18:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host backup2011.mgmt.codfw.wmnet with reboot policy FORCED
18:52 zabe@deploy2002: zabe: Backport for gerrit:910569 Disable VE as default editor on kcgwiktionary (T334730), gerrit:910570 db-production: Fix indentation, gerrit:910528 Update interwiki cache synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add backup2011 DNS entries - pt1979@cumin2002"
18:51 zabe@deploy2002: Started scap: Backport for gerrit:910569 Disable VE as default editor on kcgwiktionary (T334730), gerrit:910570 db-production: Fix indentation, gerrit:910528 Update interwiki cache
18:50 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1003.wikimedia.org with OS bullseye
18:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add backup2011 DNS entries - pt1979@cumin2002"
18:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host backup2010.codfw.wmnet with OS bullseye
18:47 pt1979@cumin2002: START - Cookbook sre.dns.netbox
18:36 zabe@deploy2002: Finished scap: T335016 (duration: 07m 28s)
18:30 zabe@deploy2002: zabe: T335016 synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
18:29 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1003.wikimedia.org with reason: host reimage
18:29 zabe@deploy2002: Started scap: T335016
18:29 zabe: create Wikipedia Fante # T335016
18:26 jelto@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1003.wikimedia.org with reason: host reimage
18:17 zabe@deploy2002: Finished scap: Backport for gerrit:910494 Add messages for Fante Wikipedia (fatwiki) (T335016), gerrit:910496 Localisation updates from https://translatewiki.net., gerrit:910495 Localisation updates from https://translatewiki.net. (duration: 23m 58s)
18:10 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab1003.wikimedia.org with OS bullseye
18:05 zabe@deploy2002: zabe: Backport for gerrit:910494 Add messages for Fante Wikipedia (fatwiki) (T335016), gerrit:910496 Localisation updates from https://translatewiki.net., gerrit:910495 Localisation updates from https://translatewiki.net. synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
18:01 sukhe: enable puppet and run agent in A:lvs and A:eqiad CR 910563
18:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast2003.wikimedia.org with OS bullseye
18:00 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:59 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:54 sukhe: disable puppet in A:lvs and A:eqiad to test CR 910563
17:53 zabe@deploy2002: Started scap: Backport for gerrit:910494 Add messages for Fante Wikipedia (fatwiki) (T335016), gerrit:910496 Localisation updates from https://translatewiki.net., gerrit:910495 Localisation updates from https://translatewiki.net.
17:48 zabe@deploy2002: Finished scap: create kcgwiktionary (T334730) (duration: 08m 08s)
17:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast2003.wikimedia.org with reason: host reimage
17:41 zabe@deploy2002: zabe: create kcgwiktionary (T334730) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
17:39 zabe@deploy2002: Started scap: create kcgwiktionary (T334730)
17:39 zabe: create Wiktionary Tyap # T334730
17:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast2003.wikimedia.org with reason: host reimage
17:24 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host bast2003.wikimedia.org with OS bullseye
17:02 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs10[10,13,16,19].eqiad.wmnet: Testing rolling restart (rack1) — T334754 - eevans@cumin1001
16:31 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs10[10,13,16,19].eqiad.wmnet: Testing rolling restart (rack1) — T334754 - eevans@cumin1001
16:25 SandraEbele: Deployed refinery using scap, then deployed onto hdfs as part of weekly deployment train.
16:23 claime: repooling parse2010 after fix - T335138
16:22 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse2010.codfw.wmnet
16:22 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse2010.codfw.wmnet
16:20 stevemunene@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host an-airflow1006.eqiad.wmnet with OS buster
16:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['bast2003']
16:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['bast2003']
16:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['bast2003']
16:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['bast2003']
16:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['bast2003']
16:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['bast2003']
16:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host bast2003.mgmt.codfw.wmnet with reboot policy FORCED
16:08 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setting sretest2001 back to offline - pt1979@cumin2002"
16:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setting sretest2001 back to offline - pt1979@cumin2002"
16:04 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-airflow1006.eqiad.wmnet with reason: host reimage
16:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
16:01 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-airflow1006.eqiad.wmnet with reason: host reimage
15:59 ebysans@deploy2002: Finished deploy [analytics/refinery@1631dea] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1631dea] (duration: 01m 29s)
15:58 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host bast2003.mgmt.codfw.wmnet with reboot policy FORCED
15:58 ebysans@deploy2002: Started deploy [analytics/refinery@1631dea] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1631dea]
15:57 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:57 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add bast2003 DNS entries - pt1979@cumin2002"
15:56 ebysans@deploy2002: Finished deploy [analytics/refinery@1631dea] (thin): Regular analytics weekly train THIN [analytics/refinery@1631dea] (duration: 00m 08s)
15:56 ebysans@deploy2002: Started deploy [analytics/refinery@1631dea] (thin): Regular analytics weekly train THIN [analytics/refinery@1631dea]
15:55 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add bast2003 DNS entries - pt1979@cumin2002"
15:54 ebysans@deploy2002: Finished deploy [analytics/refinery@1631dea]: Regular analytics weekly train [analytics/refinery@1631dea] (duration: 08m 30s)
15:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2006.wikimedia.org with OS bullseye
15:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
15:48 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-airflow1006.eqiad.wmnet with OS buster
15:47 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
15:47 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:46 ebysans@deploy2002: Started deploy [analytics/refinery@1631dea]: Regular analytics weekly train [analytics/refinery@1631dea]
15:44 SandraEbele: deploying weekly deployment train for analytics refinery.
15:38 sukhe: sudo cumin -b1 -s1200 'A:cp and A:eqsin' 'varnish-frontend-restart'
15:37 stevemunene@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1006.eqiad.wmnet
15:37 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM an-airflow1006.eqiad.wmnet - stevemunene@cumin1001"
15:36 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM an-airflow1006.eqiad.wmnet - stevemunene@cumin1001"
15:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
15:32 bking@cumin1001: START - Cookbook sre.hosts.downtime for 12 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
15:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2006.wikimedia.org with reason: host reimage
15:31 ejegg: payments-wiki upgraded from 66be66e0 to 744d82c6
15:28 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2006.wikimedia.org with reason: host reimage
15:27 sukhe: run puppet manually in A:cp and A:eqsin to pick up CR 910005
15:26 sukhe: re-enable puppet in A:cp and A:eqsin
15:23 sukhe: varnish-frontend-restart cp5022
15:21 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:20 jclark@cumin1001: START - Cookbook sre.dns.netbox
15:15 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dns2006.wikimedia.org with OS bullseye
14:56 sukhe: disable puppet in A:cp and A:eqsin to test CR 910005
14:50 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:910479 Make $wmgUseGraphWithJsonNamespace depend on $wmgUseJsonConfig (T335130) (duration: 07m 40s)
14:49 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-airflow1006.eqiad.wmnet on all recursors
14:49 stevemunene@cumin1001: START - Cookbook sre.dns.wipe-cache an-airflow1006.eqiad.wmnet on all recursors
14:49 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:49 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-airflow1006.eqiad.wmnet - stevemunene@cumin1001"
14:47 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-airflow1006.eqiad.wmnet - stevemunene@cumin1001"
14:45 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
14:45 stevemunene@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1006.eqiad.wmnet
14:43 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for gerrit:910479 Make $wmgUseGraphWithJsonNamespace depend on $wmgUseJsonConfig (T335130) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
14:42 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:910479 Make $wmgUseGraphWithJsonNamespace depend on $wmgUseJsonConfig (T335130)
14:39 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on parse2010.codfw.wmnet with reason: PSU failure
14:39 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on parse2010.codfw.wmnet with reason: PSU failure
14:33 claime: depooling parse2010 for PSU failure
13:35 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.remove-ghost-objects (exit_code=99) from container wikipedia-en-local-public.a8 in codfw
13:33 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-en-local-public.a8 in codfw
12:44 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.remove-ghost-objects (exit_code=99) from container wikipedia-en-local-public.a8 in codfw
12:42 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-en-local-public.a8 in codfw
12:12 ladsgroup@deploy2002: Finished scap: Backport for gerrit:888708 Set wmgUseGraphWithJsonNamespace = false for mediawikiwiki (T124748) (duration: 07m 48s)
12:05 ladsgroup@deploy2002: aklapper and ladsgroup: Backport for gerrit:888708 Set wmgUseGraphWithJsonNamespace = false for mediawikiwiki (T124748) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
12:04 ladsgroup@deploy2002: Started scap: Backport for gerrit:888708 Set wmgUseGraphWithJsonNamespace = false for mediawikiwiki (T124748)
10:57 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
10:57 moritzm: installing openvswitch security updates on bullseye
10:57 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
10:43 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.remove-ghost-objects (exit_code=99) from container wikipedia-en-local-public.a8 in codfw
10:41 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-en-local-public.a8 in codfw
09:43 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:42 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
09:42 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
09:40 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
09:40 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
09:35 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
09:35 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.18 in codfw
09:04 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.18 in codfw
08:57 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
08:17 jnuche@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.5 refs T330211
07:24 moritzm: uploaded imagemagick 8:6.9.10.23+dfsg-2.1+deb10u1+wmf1 to apt.wikimedia.org for buster-wikimedia T328901
06:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593
06:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 14593
06:19 moritzm: installing tomcat9 security updates
06:15 joe: enabled requestctl rule for T332061
06:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on krb2002.codfw.wmnet with reason: Non-functional, WIP for Bullseye update
06:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on krb2002.codfw.wmnet with reason: Non-functional, WIP for Bullseye update
03:49 eileen: civicrm upgraded from efdf9434 to a2e7c079
00:02 mutante: LDAP - adding uid fnavas-foundation to group wmf - T331482

2023-04-19

23:36 zabe@deploy2002: Finished scap: gerrit:910078 (duration: 06m 40s)
23:29 zabe@deploy2002: Started scap: gerrit:910078
23:15 tzatziki: removing 1 file for legal compliance
23:02 tzatziki: removing 3 files for legal compliance
22:34 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2022.codfw.wmnet with OS bullseye
22:10 tzatziki: removing 5 files for legal compliance
21:38 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2022.codfw.wmnet with OS bullseye
20:16 zabe@deploy2002: Finished scap: Backport for gerrit:909884 Revert "Revert "dewiki: Allow 'crats to remove sysopship and manage importers"" (T331921) (duration: 07m 26s)
20:10 zabe@deploy2002: zabe: Backport for gerrit:909884 Revert "Revert "dewiki: Allow 'crats to remove sysopship and manage importers"" (T331921) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
20:09 zabe@deploy2002: Started scap: Backport for gerrit:909884 Revert "Revert "dewiki: Allow 'crats to remove sysopship and manage importers"" (T331921)
19:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns2006.wikimedia.org with OS bullseye
19:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2005.wikimedia.org with OS bullseye
19:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
19:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
19:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye
19:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host dns2005.wikimedia.org with OS bullseye
19:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:02 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
18:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dns2006.wikimedia.org with OS bullseye
18:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2004.wikimedia.org with OS bullseye
18:52 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
18:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
18:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
18:36 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
18:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2004.wikimedia.org with reason: host reimage
18:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2004.wikimedia.org with reason: host reimage
18:28 sukhe@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in eqiad, blocking deploys T321309 (duration: 286m 39s)
18:25 sukhe: restart pybal on lvs1017 to pick up bgp-med change: T321309
18:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye
18:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye
18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage
18:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage
18:00 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dns2004.wikimedia.org with OS bullseye
17:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns2006']
17:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns2006']
17:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns2005']
17:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns2005']
17:57 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns2005']
17:56 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns2005']
17:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns2006']
17:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns2005']
17:55 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns2006']
17:55 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns2005']
17:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns2004']
17:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns2004']
17:46 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye
17:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns2006.mgmt.codfw.wmnet with reboot policy FORCED
17:21 sukhe: stop pybal in lvs1017 for reimaging
17:14 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host dns2006.mgmt.codfw.wmnet with reboot policy FORCED
17:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns2005.mgmt.codfw.wmnet with reboot policy FORCED
17:05 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host dns2005.mgmt.codfw.wmnet with reboot policy FORCED
17:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns2004.mgmt.codfw.wmnet with reboot policy FORCED
16:41 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host dns2004.mgmt.codfw.wmnet with reboot policy FORCED
16:39 sukhe: restart pybal on lvs1018 to remove bgp-med change: T321309
16:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns2004.mgmt.codfw.wmnet with reboot policy FORCED
16:35 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs1018
16:35 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs1018
16:23 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1018.eqiad.wmnet with OS bullseye
16:17 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['backup2010.codfw.wmnet']
16:09 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010.codfw.wmnet']
16:09 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['backup2010.codfw.wmnet']
16:09 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010.codfw.wmnet']
16:06 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['backup2010.codfw.wmnet']
16:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1018.eqiad.wmnet with reason: host reimage
16:06 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010.codfw.wmnet']
16:05 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['backup2010.codfw.wmnet']
16:05 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010.codfw.wmnet']
16:04 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['backup2010.codfw.wmnet']
16:04 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010.codfw.wmnet']
16:04 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['backup2010.codfw.wmnet']
16:04 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010.codfw.wmnet']
16:02 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1018.eqiad.wmnet with reason: host reimage
15:49 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host dns2004.mgmt.codfw.wmnet with reboot policy FORCED
15:48 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1018.eqiad.wmnet with OS bullseye
15:47 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs1018
15:47 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs1018
15:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns2004.mgmt.codfw.wmnet with reboot policy FORCED
15:42 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host dns2004.mgmt.codfw.wmnet with reboot policy FORCED
15:36 mutante: DNS - added new project language "fat" (fat.wikipedia.org) - the "Fante" language, a dialect of Akan, spoken by 2.8 million people in Ghana - https://en.wikipedia.org/wiki/Fante_dialect T335016
15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for dns200[4-6] - pt1979@cumin2002"
15:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for dns200[4-6] - pt1979@cumin2002"
15:30 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:20 sukhe: stop pybal on lvs1018 for reimaging: T321309
14:54 sukhe: restart pybal on lvs1019 to pick up bgp-med change
14:42 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs1019
14:42 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs1019
14:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1019.eqiad.wmnet with OS bullseye
14:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1019.eqiad.wmnet with reason: host reimage
14:19 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1019.eqiad.wmnet with reason: host reimage
14:04 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1019.eqiad.wmnet with OS bullseye
13:41 sukhe@deploy2002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in eqiad, blocking deploys T321309
13:41 sukhe@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in eqiad, blocking deploys T321309 (duration: 00m 16s)
13:41 sukhe@deploy2002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in eqiad, blocking deploys T321309
13:28 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.remove-ghost-objects (exit_code=99) from container wikipedia-en-local-public.a8 in codfw
13:25 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-en-local-public.a8 in codfw
13:16 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
13:16 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
13:14 taavi@deploy2002: Finished scap: Backport for gerrit:909875 cleanup: Remove duplicate permission config of confirmed users (duration: 11m 32s)
13:09 moritzm: installing lldpd security updates
13:04 taavi@deploy2002: func and taavi: Backport for gerrit:909875 cleanup: Remove duplicate permission config of confirmed users synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
13:02 taavi@deploy2002: Started scap: Backport for gerrit:909875 cleanup: Remove duplicate permission config of confirmed users
11:18 hnowlan@puppetmaster1001: conftool action : set/weight=7; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
10:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2165.codfw.wmnet with reason: Maintenance
10:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2165.codfw.wmnet with reason: Maintenance
10:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1126.eqiad.wmnet with reason: Maintenance
10:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1126.eqiad.wmnet with reason: Maintenance
10:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
10:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
10:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
10:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
10:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
10:46 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-en-local-public.1a in codfw
10:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
10:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
10:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
10:43 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-en-local-public.1a in codfw
10:42 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-en-local-public.1a in eqiad
10:40 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-en-local-public.1a in eqiad
10:37 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.e4 in eqiad
10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T333332)', diff saved to https://phabricator.wikimedia.org/P47260 and previous config saved to /var/cache/conftool/dbconfig/20230419-103603-ladsgroup.json
10:34 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.e4 in eqiad
10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P47259 and previous config saved to /var/cache/conftool/dbconfig/20230419-102057-ladsgroup.json
10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
10:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P47258 and previous config saved to /var/cache/conftool/dbconfig/20230419-101614-root.json
10:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1157.eqiad.wmnet with reason: Maintenance
10:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1157.eqiad.wmnet with reason: Maintenance
10:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2140.codfw.wmnet with reason: Maintenance
10:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2140.codfw.wmnet with reason: Maintenance
10:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P47257 and previous config saved to /var/cache/conftool/dbconfig/20230419-100746-root.json
10:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P47256 and previous config saved to /var/cache/conftool/dbconfig/20230419-100550-ladsgroup.json
10:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
10:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1130.eqiad.wmnet with reason: Maintenance
10:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1130.eqiad.wmnet with reason: Maintenance
10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1131.eqiad.wmnet with reason: Maintenance
10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1131.eqiad.wmnet with reason: Maintenance
10:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance
10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance
10:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
10:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P47255 and previous config saved to /var/cache/conftool/dbconfig/20230419-100109-root.json
09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P47254 and previous config saved to /var/cache/conftool/dbconfig/20230419-095807-root.json
09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 100%: Pooling', diff saved to https://phabricator.wikimedia.org/P47253 and previous config saved to /var/cache/conftool/dbconfig/20230419-095316-root.json
09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P47252 and previous config saved to /var/cache/conftool/dbconfig/20230419-095241-root.json
09:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T333332)', diff saved to https://phabricator.wikimedia.org/P47250 and previous config saved to /var/cache/conftool/dbconfig/20230419-095044-ladsgroup.json
09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T333332)', diff saved to https://phabricator.wikimedia.org/P47249 and previous config saved to /var/cache/conftool/dbconfig/20230419-094836-ladsgroup.json
09:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1177.eqiad.wmnet with reason: Maintenance
09:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1177.eqiad.wmnet with reason: Maintenance
09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P47248 and previous config saved to /var/cache/conftool/dbconfig/20230419-094604-root.json
09:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P47247 and previous config saved to /var/cache/conftool/dbconfig/20230419-094302-root.json
09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 75%: Pooling', diff saved to https://phabricator.wikimedia.org/P47246 and previous config saved to /var/cache/conftool/dbconfig/20230419-093812-root.json
09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P47245 and previous config saved to /var/cache/conftool/dbconfig/20230419-093737-root.json
09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P47244 and previous config saved to /var/cache/conftool/dbconfig/20230419-093059-root.json
09:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P47243 and previous config saved to /var/cache/conftool/dbconfig/20230419-092757-root.json
09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 50%: Pooling', diff saved to https://phabricator.wikimedia.org/P47242 and previous config saved to /var/cache/conftool/dbconfig/20230419-092307-root.json
09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P47241 and previous config saved to /var/cache/conftool/dbconfig/20230419-092232-root.json
09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P47240 and previous config saved to /var/cache/conftool/dbconfig/20230419-091554-root.json
09:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P47239 and previous config saved to /var/cache/conftool/dbconfig/20230419-091252-root.json
09:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 25%: Pooling', diff saved to https://phabricator.wikimedia.org/P47238 and previous config saved to /var/cache/conftool/dbconfig/20230419-090802-root.json
09:07 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
09:07 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P47237 and previous config saved to /var/cache/conftool/dbconfig/20230419-090727-root.json
09:07 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
09:07 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
09:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
09:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
09:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
09:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
09:01 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
09:00 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P47236 and previous config saved to /var/cache/conftool/dbconfig/20230419-090050-root.json
09:00 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: sync
08:59 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: sync
08:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
08:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
08:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P47235 and previous config saved to /var/cache/conftool/dbconfig/20230419-085748-root.json
08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 10%: Pooling', diff saved to https://phabricator.wikimedia.org/P47234 and previous config saved to /var/cache/conftool/dbconfig/20230419-085257-root.json
08:52 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P47233 and previous config saved to /var/cache/conftool/dbconfig/20230419-085222-root.json
08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P47232 and previous config saved to /var/cache/conftool/dbconfig/20230419-084545-root.json
08:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
08:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
08:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
08:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P47231 and previous config saved to /var/cache/conftool/dbconfig/20230419-084243-root.json
08:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 9%: Pooling', diff saved to https://phabricator.wikimedia.org/P47230 and previous config saved to /var/cache/conftool/dbconfig/20230419-083753-root.json
08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P47229 and previous config saved to /var/cache/conftool/dbconfig/20230419-083717-root.json
08:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:01:00 on db2185.codfw.wmnet,db[1115,1215].eqiad.wmnet with reason: Test
08:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:01:00 on db2185.codfw.wmnet,db[1115,1215].eqiad.wmnet with reason: Test
08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P47228 and previous config saved to /var/cache/conftool/dbconfig/20230419-083040-root.json
08:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P47227 and previous config saved to /var/cache/conftool/dbconfig/20230419-082738-root.json
08:24 jnuche@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.5 refs T330211 (duration: 05m 43s)
08:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 100%: Pooling', diff saved to https://phabricator.wikimedia.org/P47226 and previous config saved to /var/cache/conftool/dbconfig/20230419-082345-root.json
08:23 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 8%: Pooling', diff saved to https://phabricator.wikimedia.org/P47225 and previous config saved to /var/cache/conftool/dbconfig/20230419-082247-root.json
08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P47224 and previous config saved to /var/cache/conftool/dbconfig/20230419-082213-root.json
08:18 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.5 refs T330211
08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P47223 and previous config saved to /var/cache/conftool/dbconfig/20230419-081535-root.json
08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 75%: Pooling', diff saved to https://phabricator.wikimedia.org/P47222 and previous config saved to /var/cache/conftool/dbconfig/20230419-080841-root.json
08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 7%: Pooling', diff saved to https://phabricator.wikimedia.org/P47221 and previous config saved to /var/cache/conftool/dbconfig/20230419-080742-root.json
08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P47220 and previous config saved to /var/cache/conftool/dbconfig/20230419-080708-root.json
08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P47219 and previous config saved to /var/cache/conftool/dbconfig/20230419-080030-root.json
07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 50%: Pooling', diff saved to https://phabricator.wikimedia.org/P47218 and previous config saved to /var/cache/conftool/dbconfig/20230419-075336-root.json
07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 6%: Pooling', diff saved to https://phabricator.wikimedia.org/P47217 and previous config saved to /var/cache/conftool/dbconfig/20230419-075237-root.json
07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P47216 and previous config saved to /var/cache/conftool/dbconfig/20230419-075203-root.json
07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 25%: Pooling', diff saved to https://phabricator.wikimedia.org/P47215 and previous config saved to /var/cache/conftool/dbconfig/20230419-073831-root.json
07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 5%: Pooling', diff saved to https://phabricator.wikimedia.org/P47214 and previous config saved to /var/cache/conftool/dbconfig/20230419-073732-root.json
07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 10%: Pooling', diff saved to https://phabricator.wikimedia.org/P47213 and previous config saved to /var/cache/conftool/dbconfig/20230419-072326-root.json
07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 4%: Pooling', diff saved to https://phabricator.wikimedia.org/P47212 and previous config saved to /var/cache/conftool/dbconfig/20230419-072228-root.json
07:15 XioNoX: update TLS cert on pfw - T334676
07:13 kartik@deploy2002: Finished scap: Backport for gerrit:909607 Enable Content/Section translation on 6 Wikipedias (T327102) (duration: 09m 33s)
07:10 XioNoX: push pfw policies - T334983
07:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T333332)', diff saved to https://phabricator.wikimedia.org/P47211 and previous config saved to /var/cache/conftool/dbconfig/20230419-070920-ladsgroup.json
07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 5%: Pooling', diff saved to https://phabricator.wikimedia.org/P47210 and previous config saved to /var/cache/conftool/dbconfig/20230419-070822-root.json
07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 3%: Pooling', diff saved to https://phabricator.wikimedia.org/P47209 and previous config saved to /var/cache/conftool/dbconfig/20230419-070723-root.json
07:05 kartik@deploy2002: kartik: Backport for gerrit:909607 Enable Content/Section translation on 6 Wikipedias (T327102) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
07:03 kartik@deploy2002: Started scap: Backport for gerrit:909607 Enable Content/Section translation on 6 Wikipedias (T327102)
06:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P47208 and previous config saved to /var/cache/conftool/dbconfig/20230419-065413-ladsgroup.json
06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 4%: Pooling', diff saved to https://phabricator.wikimedia.org/P47207 and previous config saved to /var/cache/conftool/dbconfig/20230419-065317-root.json
06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 2%: Pooling', diff saved to https://phabricator.wikimedia.org/P47206 and previous config saved to /var/cache/conftool/dbconfig/20230419-065218-root.json
06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 T335011', diff saved to https://phabricator.wikimedia.org/P47205 and previous config saved to /var/cache/conftool/dbconfig/20230419-064122-root.json
06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P47204 and previous config saved to /var/cache/conftool/dbconfig/20230419-063907-ladsgroup.json
06:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
06:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 3%: Pooling', diff saved to https://phabricator.wikimedia.org/P47203 and previous config saved to /var/cache/conftool/dbconfig/20230419-063812-root.json
06:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 1%: Pooling', diff saved to https://phabricator.wikimedia.org/P47202 and previous config saved to /var/cache/conftool/dbconfig/20230419-063713-root.json
06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T333332)', diff saved to https://phabricator.wikimedia.org/P47201 and previous config saved to /var/cache/conftool/dbconfig/20230419-062401-ladsgroup.json
06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 2%: Pooling', diff saved to https://phabricator.wikimedia.org/P47200 and previous config saved to /var/cache/conftool/dbconfig/20230419-062307-root.json
06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113 (s5,s6)', diff saved to https://phabricator.wikimedia.org/P47197 and previous config saved to /var/cache/conftool/dbconfig/20230419-062123-root.json
06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T333332)', diff saved to https://phabricator.wikimedia.org/P47196 and previous config saved to /var/cache/conftool/dbconfig/20230419-062007-ladsgroup.json
06:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2178.codfw.wmnet with reason: Maintenance
06:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2178.codfw.wmnet with reason: Maintenance
06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47195 and previous config saved to /var/cache/conftool/dbconfig/20230419-061944-ladsgroup.json
06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1219 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P47194 and previous config saved to /var/cache/conftool/dbconfig/20230419-061414-marostegui.json
06:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 1%: Pooling', diff saved to https://phabricator.wikimedia.org/P47193 and previous config saved to /var/cache/conftool/dbconfig/20230419-060803-root.json
06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P47192 and previous config saved to /var/cache/conftool/dbconfig/20230419-060437-ladsgroup.json
05:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P47191 and previous config saved to /var/cache/conftool/dbconfig/20230419-054931-ladsgroup.json
05:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47190 and previous config saved to /var/cache/conftool/dbconfig/20230419-053425-ladsgroup.json
05:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47189 and previous config saved to /var/cache/conftool/dbconfig/20230419-053027-ladsgroup.json
05:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
05:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
05:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T333332)', diff saved to https://phabricator.wikimedia.org/P47188 and previous config saved to /var/cache/conftool/dbconfig/20230419-053003-ladsgroup.json
05:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P47187 and previous config saved to /var/cache/conftool/dbconfig/20230419-051457-ladsgroup.json
04:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P47186 and previous config saved to /var/cache/conftool/dbconfig/20230419-045951-ladsgroup.json
04:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T333332)', diff saved to https://phabricator.wikimedia.org/P47185 and previous config saved to /var/cache/conftool/dbconfig/20230419-044445-ladsgroup.json
04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T333332)', diff saved to https://phabricator.wikimedia.org/P47184 and previous config saved to /var/cache/conftool/dbconfig/20230419-044050-ladsgroup.json
04:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2157.codfw.wmnet with reason: Maintenance
04:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2157.codfw.wmnet with reason: Maintenance
04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47183 and previous config saved to /var/cache/conftool/dbconfig/20230419-044027-ladsgroup.json
04:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P47182 and previous config saved to /var/cache/conftool/dbconfig/20230419-042520-ladsgroup.json
04:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P47181 and previous config saved to /var/cache/conftool/dbconfig/20230419-041013-ladsgroup.json
03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47180 and previous config saved to /var/cache/conftool/dbconfig/20230419-035507-ladsgroup.json
03:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47178 and previous config saved to /var/cache/conftool/dbconfig/20230419-035112-ladsgroup.json
03:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2137.codfw.wmnet with reason: Maintenance
03:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2137.codfw.wmnet with reason: Maintenance
03:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T333332)', diff saved to https://phabricator.wikimedia.org/P47177 and previous config saved to /var/cache/conftool/dbconfig/20230419-035048-ladsgroup.json
03:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P47176 and previous config saved to /var/cache/conftool/dbconfig/20230419-033542-ladsgroup.json
03:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P47175 and previous config saved to /var/cache/conftool/dbconfig/20230419-032036-ladsgroup.json
03:12 ejegg: payments-wiki upgraded from a01e5ae8 to 66be66e0
03:11 ejegg: civicrm upgraded from 39bbe8cc to efdf9434
03:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T333332)', diff saved to https://phabricator.wikimedia.org/P47174 and previous config saved to /var/cache/conftool/dbconfig/20230419-030530-ladsgroup.json
03:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T333332)', diff saved to https://phabricator.wikimedia.org/P47173 and previous config saved to /var/cache/conftool/dbconfig/20230419-030234-ladsgroup.json
03:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
03:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
03:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2128.codfw.wmnet with reason: Maintenance
03:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2128.codfw.wmnet with reason: Maintenance
03:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T333332)', diff saved to https://phabricator.wikimedia.org/P47172 and previous config saved to /var/cache/conftool/dbconfig/20230419-030205-ladsgroup.json
02:47 ejegg: civicrm upgraded from dab8912d to 39bbe8cc
02:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P47171 and previous config saved to /var/cache/conftool/dbconfig/20230419-024658-ladsgroup.json
02:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P47170 and previous config saved to /var/cache/conftool/dbconfig/20230419-023152-ladsgroup.json
02:19 cstone: payments-wiki upgraded from c01a32c4 to a01e5ae8
02:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T333332)', diff saved to https://phabricator.wikimedia.org/P47168 and previous config saved to /var/cache/conftool/dbconfig/20230419-021646-ladsgroup.json
02:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T333332)', diff saved to https://phabricator.wikimedia.org/P47167 and previous config saved to /var/cache/conftool/dbconfig/20230419-021051-ladsgroup.json
02:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
02:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
02:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T333332)', diff saved to https://phabricator.wikimedia.org/P47166 and previous config saved to /var/cache/conftool/dbconfig/20230419-021028-ladsgroup.json
02:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1075.eqiad.wmnet with OS bullseye
02:03 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:01 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P47165 and previous config saved to /var/cache/conftool/dbconfig/20230419-015522-ladsgroup.json
01:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1073.eqiad.wmnet with OS bullseye
01:46 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1075.eqiad.wmnet with reason: host reimage
01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P47164 and previous config saved to /var/cache/conftool/dbconfig/20230419-014016-ladsgroup.json
01:38 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1075.eqiad.wmnet with reason: host reimage
01:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1074.eqiad.wmnet with OS bullseye
01:36 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:34 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T333332)', diff saved to https://phabricator.wikimedia.org/P47163 and previous config saved to /var/cache/conftool/dbconfig/20230419-012509-ladsgroup.json
01:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1075.eqiad.wmnet with OS bullseye
01:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T333332)', diff saved to https://phabricator.wikimedia.org/P47162 and previous config saved to /var/cache/conftool/dbconfig/20230419-012114-ladsgroup.json
01:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2111.codfw.wmnet with reason: Maintenance
01:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2111.codfw.wmnet with reason: Maintenance
01:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2101.codfw.wmnet with reason: Maintenance
01:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2101.codfw.wmnet with reason: Maintenance
01:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1072.eqiad.wmnet with OS bullseye
01:18 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
01:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
01:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T333332)', diff saved to https://phabricator.wikimedia.org/P47161 and previous config saved to /var/cache/conftool/dbconfig/20230419-011754-ladsgroup.json
01:16 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1074.eqiad.wmnet with reason: host reimage
01:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1074.eqiad.wmnet with reason: host reimage
01:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1073.eqiad.wmnet with reason: host reimage
01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P47160 and previous config saved to /var/cache/conftool/dbconfig/20230419-010247-ladsgroup.json
01:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1073.eqiad.wmnet with reason: host reimage
00:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1072.eqiad.wmnet with reason: host reimage
00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P47159 and previous config saved to /var/cache/conftool/dbconfig/20230419-004741-ladsgroup.json
00:44 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1072.eqiad.wmnet with reason: host reimage
00:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1074.eqiad.wmnet with OS bullseye
00:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1073.eqiad.wmnet with OS bullseye
00:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T333332)', diff saved to https://phabricator.wikimedia.org/P47158 and previous config saved to /var/cache/conftool/dbconfig/20230419-003235-ladsgroup.json
00:30 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
00:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1210 (T333332)', diff saved to https://phabricator.wikimedia.org/P47157 and previous config saved to /var/cache/conftool/dbconfig/20230419-002952-ladsgroup.json
00:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1210.eqiad.wmnet with reason: Maintenance
00:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1210.eqiad.wmnet with reason: Maintenance
00:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T333332)', diff saved to https://phabricator.wikimedia.org/P47156 and previous config saved to /var/cache/conftool/dbconfig/20230419-002929-ladsgroup.json
00:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1072.eqiad.wmnet with OS bullseye
00:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
00:19 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
00:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P47155 and previous config saved to /var/cache/conftool/dbconfig/20230419-001423-ladsgroup.json
00:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
00:02 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
00:01 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
00:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED

2023-04-18

23:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P47154 and previous config saved to /var/cache/conftool/dbconfig/20230418-235916-ladsgroup.json
23:58 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
23:53 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
23:50 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
23:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
23:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T333332)', diff saved to https://phabricator.wikimedia.org/P47153 and previous config saved to /var/cache/conftool/dbconfig/20230418-234410-ladsgroup.json
23:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T333332)', diff saved to https://phabricator.wikimedia.org/P47152 and previous config saved to /var/cache/conftool/dbconfig/20230418-234032-ladsgroup.json
23:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1200.eqiad.wmnet with reason: Maintenance
23:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1200.eqiad.wmnet with reason: Maintenance
23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T333332)', diff saved to https://phabricator.wikimedia.org/P47151 and previous config saved to /var/cache/conftool/dbconfig/20230418-234008-ladsgroup.json
23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P47150 and previous config saved to /var/cache/conftool/dbconfig/20230418-232502-ladsgroup.json
23:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P47149 and previous config saved to /var/cache/conftool/dbconfig/20230418-230956-ladsgroup.json
22:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T333332)', diff saved to https://phabricator.wikimedia.org/P47148 and previous config saved to /var/cache/conftool/dbconfig/20230418-225449-ladsgroup.json
22:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T333332)', diff saved to https://phabricator.wikimedia.org/P47147 and previous config saved to /var/cache/conftool/dbconfig/20230418-225211-ladsgroup.json
22:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1185.eqiad.wmnet with reason: Maintenance
22:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1185.eqiad.wmnet with reason: Maintenance
22:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T333332)', diff saved to https://phabricator.wikimedia.org/P47146 and previous config saved to /var/cache/conftool/dbconfig/20230418-225148-ladsgroup.json
22:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P47145 and previous config saved to /var/cache/conftool/dbconfig/20230418-223642-ladsgroup.json
22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P47144 and previous config saved to /var/cache/conftool/dbconfig/20230418-222135-ladsgroup.json
22:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T333332)', diff saved to https://phabricator.wikimedia.org/P47143 and previous config saved to /var/cache/conftool/dbconfig/20230418-220629-ladsgroup.json
22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1183 (T333332)', diff saved to https://phabricator.wikimedia.org/P47142 and previous config saved to /var/cache/conftool/dbconfig/20230418-220350-ladsgroup.json
22:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1183.eqiad.wmnet with reason: Maintenance
22:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1183.eqiad.wmnet with reason: Maintenance
22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T333332)', diff saved to https://phabricator.wikimedia.org/P47141 and previous config saved to /var/cache/conftool/dbconfig/20230418-220327-ladsgroup.json
21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P47140 and previous config saved to /var/cache/conftool/dbconfig/20230418-214820-ladsgroup.json
21:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P47139 and previous config saved to /var/cache/conftool/dbconfig/20230418-213314-ladsgroup.json
21:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T333332)', diff saved to https://phabricator.wikimedia.org/P47138 and previous config saved to /var/cache/conftool/dbconfig/20230418-211808-ladsgroup.json
21:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T333332)', diff saved to https://phabricator.wikimedia.org/P47137 and previous config saved to /var/cache/conftool/dbconfig/20230418-211529-ladsgroup.json
21:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
21:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
21:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
21:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
21:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
21:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47136 and previous config saved to /var/cache/conftool/dbconfig/20230418-211354-ladsgroup.json
20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P47134 and previous config saved to /var/cache/conftool/dbconfig/20230418-205848-ladsgroup.json
20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P47133 and previous config saved to /var/cache/conftool/dbconfig/20230418-204339-ladsgroup.json
20:32 TheresNoTime: close UTC late backport window
20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47132 and previous config saved to /var/cache/conftool/dbconfig/20230418-202833-ladsgroup.json
20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47131 and previous config saved to /var/cache/conftool/dbconfig/20230418-202554-ladsgroup.json
20:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
20:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47130 and previous config saved to /var/cache/conftool/dbconfig/20230418-202530-ladsgroup.json
20:25 samtar@deploy2002: Finished scap: Backport for gerrit:905712 Remove weird VisualEditor config hack from 2015, gerrit:909747 Simplify some more VisualEditor configuration (duration: 10m 32s)
20:16 samtar@deploy2002: matmarex and samtar: Backport for gerrit:905712 Remove weird VisualEditor config hack from 2015, gerrit:909747 Simplify some more VisualEditor configuration synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
20:14 samtar@deploy2002: Started scap: Backport for gerrit:905712 Remove weird VisualEditor config hack from 2015, gerrit:909747 Simplify some more VisualEditor configuration
20:13 samtar@deploy2002: Finished scap: Backport for gerrit:909710 Enable visual enhancements on pages using __NEWSECTIONLINK__ on dewiki (T318596) (duration: 07m 49s)
20:13 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1003.eqiad.wmnet with OS bullseye
20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P47129 and previous config saved to /var/cache/conftool/dbconfig/20230418-201024-ladsgroup.json
20:06 samtar@deploy2002: matmarex and samtar: Backport for gerrit:909710 Enable visual enhancements on pages using __NEWSECTIONLINK__ on dewiki (T318596) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
20:05 samtar@deploy2002: Started scap: Backport for gerrit:909710 Enable visual enhancements on pages using __NEWSECTIONLINK__ on dewiki (T318596)
19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P47126 and previous config saved to /var/cache/conftool/dbconfig/20230418-195518-ladsgroup.json
19:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T333332)', diff saved to https://phabricator.wikimedia.org/P47125 and previous config saved to /var/cache/conftool/dbconfig/20230418-194401-ladsgroup.json
19:43 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
19:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47124 and previous config saved to /var/cache/conftool/dbconfig/20230418-194012-ladsgroup.json
19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47123 and previous config saved to /var/cache/conftool/dbconfig/20230418-193832-ladsgroup.json
19:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
19:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T333332)', diff saved to https://phabricator.wikimedia.org/P47122 and previous config saved to /var/cache/conftool/dbconfig/20230418-193809-ladsgroup.json
19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P47121 and previous config saved to /var/cache/conftool/dbconfig/20230418-192855-ladsgroup.json
19:24 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1003.eqiad.wmnet with OS bullseye
19:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P47120 and previous config saved to /var/cache/conftool/dbconfig/20230418-192302-ladsgroup.json
19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P47119 and previous config saved to /var/cache/conftool/dbconfig/20230418-191348-ladsgroup.json
19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P47118 and previous config saved to /var/cache/conftool/dbconfig/20230418-190756-ladsgroup.json
19:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T333332)', diff saved to https://phabricator.wikimedia.org/P47117 and previous config saved to /var/cache/conftool/dbconfig/20230418-185842-ladsgroup.json
18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T333332)', diff saved to https://phabricator.wikimedia.org/P47116 and previous config saved to /var/cache/conftool/dbconfig/20230418-185627-ladsgroup.json
18:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
18:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T333332)', diff saved to https://phabricator.wikimedia.org/P47115 and previous config saved to /var/cache/conftool/dbconfig/20230418-185604-ladsgroup.json
18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T333332)', diff saved to https://phabricator.wikimedia.org/P47114 and previous config saved to /var/cache/conftool/dbconfig/20230418-185250-ladsgroup.json
18:51 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudswift1002.mgmt.eqiad.wmnet with reboot policy FORCED
18:51 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudswift1001.mgmt.eqiad.wmnet with reboot policy FORCED
18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T333332)', diff saved to https://phabricator.wikimedia.org/P47113 and previous config saved to /var/cache/conftool/dbconfig/20230418-185010-ladsgroup.json
18:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudswift100[1-2] - pt1979@cumin2002"
18:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudswift100[1-2] - pt1979@cumin2002"
18:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
18:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox
18:44 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
18:43 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs1020
18:43 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs1020
18:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P47112 and previous config saved to /var/cache/conftool/dbconfig/20230418-184058-ladsgroup.json
18:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
18:28 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
18:26 taavi@deploy2002: Finished scap: 909693 and 909700 (duration: 07m 36s)
18:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P47111 and previous config saved to /var/cache/conftool/dbconfig/20230418-182551-ladsgroup.json
18:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
18:19 taavi@deploy2002: taavi: 909693 and 909700 synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
18:18 taavi@deploy2002: Started scap: 909693 and 909700
18:15 taavi@deploy2002: Finished scap: Backport for gerrit:909639 Add temporary message for Graph being disabled (T334895), gerrit:909640 Add temporary message for Graph being disabled (T334895), gerrit:909641 Add temporary tracking category for Graph being disabled (T334895), gerrit:909642 Add temporary tracking category for Graph being disabled (T334895) (duration: 37m 33s)
18:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T333332)', diff saved to https://phabricator.wikimedia.org/P47110 and previous config saved to /var/cache/conftool/dbconfig/20230418-181045-ladsgroup.json
18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T333332)', diff saved to https://phabricator.wikimedia.org/P47109 and previous config saved to /var/cache/conftool/dbconfig/20230418-180830-ladsgroup.json
18:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
18:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T333332)', diff saved to https://phabricator.wikimedia.org/P47108 and previous config saved to /var/cache/conftool/dbconfig/20230418-180807-ladsgroup.json
17:59 taavi@deploy2002: taavi: Backport for gerrit:909639 Add temporary message for Graph being disabled (T334895), gerrit:909640 Add temporary message for Graph being disabled (T334895), gerrit:909641 Add temporary tracking category for Graph being disabled (T334895), gerrit:909642 Add temporary tracking category for Graph being disabled (T334895) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1
17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P47107 and previous config saved to /var/cache/conftool/dbconfig/20230418-175301-ladsgroup.json
17:48 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:47 jclark@cumin1001: START - Cookbook sre.dns.netbox
17:47 jclark@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
17:46 jclark@cumin1001: START - Cookbook sre.dns.netbox
17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P47106 and previous config saved to /var/cache/conftool/dbconfig/20230418-173754-ladsgroup.json
17:37 taavi@deploy2002: Started scap: Backport for gerrit:909639 Add temporary message for Graph being disabled (T334895), gerrit:909640 Add temporary message for Graph being disabled (T334895), gerrit:909641 Add temporary tracking category for Graph being disabled (T334895), gerrit:909642 Add temporary tracking category for Graph being disabled (T334895)
17:26 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@3b8ab60]: (no justification provided) (duration: 00m 12s)
17:26 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@3b8ab60]: (no justification provided)
17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T333332)', diff saved to https://phabricator.wikimedia.org/P47105 and previous config saved to /var/cache/conftool/dbconfig/20230418-172247-ladsgroup.json
17:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T333332)', diff saved to https://phabricator.wikimedia.org/P47104 and previous config saved to /var/cache/conftool/dbconfig/20230418-172032-ladsgroup.json
17:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
17:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
17:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
17:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
17:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T333332)', diff saved to https://phabricator.wikimedia.org/P47103 and previous config saved to /var/cache/conftool/dbconfig/20230418-171951-ladsgroup.json
17:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P47102 and previous config saved to /var/cache/conftool/dbconfig/20230418-170445-ladsgroup.json
16:57 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host htmldumper1001.eqiad.wmnet with OS bullseye
16:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P47101 and previous config saved to /var/cache/conftool/dbconfig/20230418-164939-ladsgroup.json
16:44 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T333332)', diff saved to https://phabricator.wikimedia.org/P47100 and previous config saved to /var/cache/conftool/dbconfig/20230418-163432-ladsgroup.json
16:33 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on htmldumper1001.eqiad.wmnet with reason: host reimage
16:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T333332)', diff saved to https://phabricator.wikimedia.org/P47099 and previous config saved to /var/cache/conftool/dbconfig/20230418-163217-ladsgroup.json
16:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
16:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T333332)', diff saved to https://phabricator.wikimedia.org/P47098 and previous config saved to /var/cache/conftool/dbconfig/20230418-163154-ladsgroup.json
16:29 ariel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on htmldumper1001.eqiad.wmnet with reason: host reimage
16:23 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:22 sukhe@cumin2002: START - Cookbook sre.dns.netbox
16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P47097 and previous config saved to /var/cache/conftool/dbconfig/20230418-161648-ladsgroup.json
16:14 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in codfw: Depool from primary DC following network maintenance
16:09 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) restbase-async.discovery.wmnet on all recursors
16:09 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache restbase-async.discovery.wmnet on all recursors
16:09 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool restbase-async in codfw: Depool from primary DC following network maintenance
16:08 claime: depooling restbase-async from codfw
16:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: End of maintenance - T333377
16:08 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: End of maintenance - T333377
16:04 cgoubert@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in eqiad: End of maintenance - T333377
16:03 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: End of maintenance - T333377
16:03 cgoubert@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in eqiad: End of maintenance - T333377
16:03 ariel@cumin1001: START - Cookbook sre.hosts.reimage for host htmldumper1001.eqiad.wmnet with OS bullseye
16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P47095 and previous config saved to /var/cache/conftool/dbconfig/20230418-160141-ladsgroup.json
16:00 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: End of maintenance - T333377
16:00 cgoubert@cumin1001: END (ERROR) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in eqiad: End of maintenance - T333377
15:54 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: End of maintenance - T333377
15:54 cgoubert@cumin1001: END (ERROR) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in eqiad: End of maintenance - T333377
15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T333332)', diff saved to https://phabricator.wikimedia.org/P47093 and previous config saved to /var/cache/conftool/dbconfig/20230418-154635-ladsgroup.json
15:45 sukhe: enable puppet in A:lvs and A:codfw to test CR 908909
15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T333332)', diff saved to https://phabricator.wikimedia.org/P47092 and previous config saved to /var/cache/conftool/dbconfig/20230418-154219-ladsgroup.json
15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
15:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T333332)', diff saved to https://phabricator.wikimedia.org/P47091 and previous config saved to /var/cache/conftool/dbconfig/20230418-154156-ladsgroup.json
15:38 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: End of maintenance - T333377
15:38 cgoubert@cumin1001: END (ERROR) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in eqiad: End of maintenance - T333377
15:37 sukhe: disable puppet in A:lvs and A:codfw to test CR 908909
15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P47090 and previous config saved to /var/cache/conftool/dbconfig/20230418-152649-ladsgroup.json
15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P47089 and previous config saved to /var/cache/conftool/dbconfig/20230418-151143-ladsgroup.json
15:07 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: End of maintenance - T333377
15:07 claime: repooling all eqiad active active services post T333377
14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T333332)', diff saved to https://phabricator.wikimedia.org/P47088 and previous config saved to /var/cache/conftool/dbconfig/20230418-145637-ladsgroup.json
14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T333332)', diff saved to https://phabricator.wikimedia.org/P47087 and previous config saved to /var/cache/conftool/dbconfig/20230418-145422-ladsgroup.json
14:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
14:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T333332)', diff saved to https://phabricator.wikimedia.org/P47086 and previous config saved to /var/cache/conftool/dbconfig/20230418-145359-ladsgroup.json
14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P47085 and previous config saved to /var/cache/conftool/dbconfig/20230418-143852-ladsgroup.json
14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P47084 and previous config saved to /var/cache/conftool/dbconfig/20230418-142346-ladsgroup.json
14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T333332)', diff saved to https://phabricator.wikimedia.org/P47083 and previous config saved to /var/cache/conftool/dbconfig/20230418-140840-ladsgroup.json
14:06 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1018.eqiad.wmnet
14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T333332)', diff saved to https://phabricator.wikimedia.org/P47082 and previous config saved to /var/cache/conftool/dbconfig/20230418-140626-ladsgroup.json
14:06 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase102[5-7].eqiad.wmnet
14:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
14:06 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase103[03].eqiad.wmnet
14:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T333332)', diff saved to https://phabricator.wikimedia.org/P47081 and previous config saved to /var/cache/conftool/dbconfig/20230418-140602-ladsgroup.json
14:04 sukhe: running authdns-update to repool eqiad after switch maint: T333377
13:57 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts an-worker1110.eqiad.wmnet
13:57 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-worker1110.eqiad.wmnet
13:55 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for asw2-d-eqiad
13:55 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for asw2-d-eqiad
13:52 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 270 hosts
13:51 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P47080 and previous config saved to /var/cache/conftool/dbconfig/20230418-135056-ladsgroup.json
13:49 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for 270 hosts
13:41 elukey: restart etcdmirror on conf2005 (down due to conf1009 under maintenance)
13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P47079 and previous config saved to /var/cache/conftool/dbconfig/20230418-133549-ladsgroup.json
13:25 topranks: Rebooting asw2-d-eqiad virtual-chassis (all row D top-of-rack switches) to upgrade JunOS. Row D going down T333377
13:22 xSavitar: RESTBase/Proton deployment complete
13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T333332)', diff saved to https://phabricator.wikimedia.org/P47078 and previous config saved to /var/cache/conftool/dbconfig/20230418-132042-ladsgroup.json
13:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T333332)', diff saved to https://phabricator.wikimedia.org/P47076 and previous config saved to /var/cache/conftool/dbconfig/20230418-131827-ladsgroup.json
13:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
13:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
13:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T333332)', diff saved to https://phabricator.wikimedia.org/P47075 and previous config saved to /var/cache/conftool/dbconfig/20230418-131738-ladsgroup.json
13:16 derick@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
13:15 derick@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
13:15 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on asw2-d-eqiad with reason: eqiad row D upgrade
13:15 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on asw2-d-eqiad with reason: eqiad row D upgrade
13:14 derick@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
13:13 derick@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
13:12 derick@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
13:12 jbond: disable puppet fleet wide T333377
13:11 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 270 hosts with reason: eqiad row D upgrade
13:10 derick@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
13:06 topranks: disabling ping offload on cr1-eqiad and cr2-eqiad in advance of row D switch upgrade T333377
13:06 jbond: upload libapache2-mod-auth-cas_1.2-1+wmf12u1
13:04 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 270 hosts with reason: eqiad row D upgrade
13:03 derick@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P47074 and previous config saved to /var/cache/conftool/dbconfig/20230418-130231-ladsgroup.json
13:02 derick@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P47073 and previous config saved to /var/cache/conftool/dbconfig/20230418-124724-ladsgroup.json
12:40 sukhe: run authdns-update to depool eqiad for switch upgrade
12:39 moritzm: imported puppet 5.5.22-2+deb12u2 for bookworm-wikimedia T330495
12:36 jiji@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
12:36 jiji@cumin1001: START - Cookbook sre.discovery.datacenter status all services in all: None - None
12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T333332)', diff saved to https://phabricator.wikimedia.org/P47072 and previous config saved to /var/cache/conftool/dbconfig/20230418-123218-ladsgroup.json
12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T333332)', diff saved to https://phabricator.wikimedia.org/P47071 and previous config saved to /var/cache/conftool/dbconfig/20230418-122903-ladsgroup.json
12:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
12:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T333332)', diff saved to https://phabricator.wikimedia.org/P47070 and previous config saved to /var/cache/conftool/dbconfig/20230418-122839-ladsgroup.json
12:27 jiji@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all active/active services in eqiad: eqiad row D switches upgrade - T333377
12:27 jiji@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in eqiad: eqiad row D switches upgrade - T333377
12:26 jiji@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) depool all active/active services in eqiad: eqiad row D switches upgrade - T333377
12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P47069 and previous config saved to /var/cache/conftool/dbconfig/20230418-121333-ladsgroup.json
11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P47068 and previous config saved to /var/cache/conftool/dbconfig/20230418-115827-ladsgroup.json
11:57 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase103[03].eqiad.wmnet
11:57 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase102[5-7].eqiad.wmnet
11:57 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1018.eqiad.wmnet
11:50 jiji@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in eqiad: eqiad row D switches upgrade - T333377
11:49 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1102.eqiad.wmnet
11:49 jynus@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:49 jynus@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1102.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1001"
11:48 effie: depooling eqiad due to eqiad row D switches upgrade - T333377
11:46 jynus@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1102.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1001"
11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T333332)', diff saved to https://phabricator.wikimedia.org/P47067 and previous config saved to /var/cache/conftool/dbconfig/20230418-114320-ladsgroup.json
11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T333332)', diff saved to https://phabricator.wikimedia.org/P47066 and previous config saved to /var/cache/conftool/dbconfig/20230418-114106-ladsgroup.json
11:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
11:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T333332)', diff saved to https://phabricator.wikimedia.org/P47065 and previous config saved to /var/cache/conftool/dbconfig/20230418-114042-ladsgroup.json
11:39 jynus@cumin1001: START - Cookbook sre.dns.netbox
11:34 jynus@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1102.eqiad.wmnet
11:32 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts an-worker1110.eqiad.wmnet
11:30 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-worker1110.eqiad.wmnet
11:27 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1116.eqiad.wmnet
11:27 jynus@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:27 jynus@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1116.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1001"
11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P47064 and previous config saved to /var/cache/conftool/dbconfig/20230418-112536-ladsgroup.json
11:24 jynus@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1116.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1001"
11:22 jynus@cumin1001: START - Cookbook sre.dns.netbox
11:22 taavi@deploy2002: Finished scap: Backport for gerrit:909623 Hide raw Graph tags (T334895) (duration: 07m 09s)
11:16 jynus@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1116.eqiad.wmnet
11:16 taavi@deploy2002: taavi: Backport for gerrit:909623 Hide raw Graph tags (T334895) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
11:14 taavi@deploy2002: Started scap: Backport for gerrit:909623 Hide raw Graph tags (T334895)
11:10 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:908365|[Growth] Prepare for a Personalized praise config variable change (T334630)]] (duration: 06m 43s)
11:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P47063 and previous config saved to /var/cache/conftool/dbconfig/20230418-111029-ladsgroup.json
11:03 urbanecm@deploy2002: Started scap: Backport for [[gerrit:908365|[Growth] Prepare for a Personalized praise config variable change (T334630)]]
11:00 elukey: puppet cert clean kafka_jumbo-eqiad_broker on puppetmaster1001 - remove old certificate (not used anymore)
10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T333332)', diff saved to https://phabricator.wikimedia.org/P47062 and previous config saved to /var/cache/conftool/dbconfig/20230418-105523-ladsgroup.json
10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T333332)', diff saved to https://phabricator.wikimedia.org/P47061 and previous config saved to /var/cache/conftool/dbconfig/20230418-105308-ladsgroup.json
10:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
10:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2102.codfw.wmnet with reason: Maintenance
10:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2102.codfw.wmnet with reason: Maintenance
10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
10:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T333332)', diff saved to https://phabricator.wikimedia.org/P47060 and previous config saved to /var/cache/conftool/dbconfig/20230418-105131-ladsgroup.json
10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P47059 and previous config saved to /var/cache/conftool/dbconfig/20230418-103625-ladsgroup.json
10:25 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1004.wikimedia.org
10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P47058 and previous config saved to /var/cache/conftool/dbconfig/20230418-102119-ladsgroup.json
10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T333332)', diff saved to https://phabricator.wikimedia.org/P47057 and previous config saved to /var/cache/conftool/dbconfig/20230418-100612-ladsgroup.json
10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1218 (T333332)', diff saved to https://phabricator.wikimedia.org/P47056 and previous config saved to /var/cache/conftool/dbconfig/20230418-100359-ladsgroup.json
10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1218.eqiad.wmnet with reason: Maintenance
10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1218.eqiad.wmnet with reason: Maintenance
08:38 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.5 refs T330211
08:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-worker1110.eqiad.wmnet with reason: Upgrading RAID controller firmware
08:37 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-worker1110.eqiad.wmnet with reason: Upgrading RAID controller firmware
08:12 zabe@deploy2002: Finished scap: Backport for [[gerrit:909603|Add separate config for enabling JsonConfig]] (duration: 07m 43s)
08:08 dcausse: repooling wdqs2011
08:06 zabe@deploy2002: zabe: Backport for [[gerrit:909603|Add separate config for enabling JsonConfig]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
08:04 zabe@deploy2002: Started scap: Backport for [[gerrit:909603|Add separate config for enabling JsonConfig]]
07:51 cgoubert@deploy2002: Finished scap: Forcing redeploy (duration: 02m 31s)
07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1212 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P47055 and previous config saved to /var/cache/conftool/dbconfig/20230418-075032-marostegui.json
07:48 cgoubert@deploy2002: Started scap: Forcing redeploy
07:41 zabe@deploy2002: Finished scap: T334895 (duration: 06m 42s)
07:35 zabe@deploy2002: Started scap: T334895
07:30 zabe@deploy2002: Finished scap: T334895 (duration: 06m 37s)
07:24 zabe@deploy2002: Started scap: T334895
07:20 zabe@deploy2002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki=aawiki --force-version "1.41.0-wmf.4" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.8ZJFnr01rx"' returned non-zero exit status 255. (duration: 00m 00s)
07:20 zabe@deploy2002: Started scap: T334895
07:18 zabe@deploy2002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki=aawiki --force-version "1.41.0-wmf.4" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.c2xgrltrG8"' returned non-zero exit status 255. (duration: 00m 01s)
07:18 zabe@deploy2002: Started scap: T334895
07:16 joe: added requestctl rule for T332061 in logging mode
07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1109.eqiad.wmnet
07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1109.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
07:05 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1109.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
07:03 marostegui@cumin1001: START - Cookbook sre.dns.netbox
06:59 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1109.eqiad.wmnet
06:11 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2142 to x2 primary T334821', diff saved to https://phabricator.wikimedia.org/P47054 and previous config saved to /var/cache/conftool/dbconfig/20230418-061101-root.json
06:06 marostegui: Starting x2 codfw failover from db2144 to db2142 - T334821
06:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover x2 T334821
06:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 16591
06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover x2 T334821
06:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 16591
03:53 mwpresync@deploy2002: Pruned MediaWiki: 1.41.0-wmf.3 (duration: 02m 08s)
03:51 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.5 refs T330211 (duration: 49m 03s)
03:30 eileen: civicrm upgraded from 0b8e303d to dab8912d
03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.5 refs T330211
01:38 eileen: civicrm upgraded from cd0f886d to 0b8e303d
00:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cassandra-dev2001.codfw.wmnet
00:54 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for cassandra-dev2001.codfw.wmnet
00:28 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cassandra-dev2001.codfw.wmnet with reason: testing systemd unit changes — T327954
00:28 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cassandra-dev2001.codfw.wmnet with reason: testing systemd unit changes — T327954
00:26 eileen: config revision changed from 7da418a4 to f25cb7cc

2023-04-17

22:00 zabe@deploy2002: Finished scap: Backport for [[gerrit:909277|Fix infinite loop for self-redirects with variants conversion (T333050)]] (duration: 06m 52s)
22:00 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 13 hosts with reason: T333377 maint
21:59 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 13 hosts with reason: T333377 maint
21:54 zabe@deploy2002: zabe: Backport for [[gerrit:909277|Fix infinite loop for self-redirects with variants conversion (T333050)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:53 zabe@deploy2002: Started scap: Backport for [[gerrit:909277|Fix infinite loop for self-redirects with variants conversion (T333050)]]
21:45 zabe@deploy2002: Finished scap: Backport for [[gerrit:909276|RC: Handle deleted story (T334829)]] (duration: 07m 01s)
21:39 zabe@deploy2002: zabe: Backport for [[gerrit:909276|RC: Handle deleted story (T334829)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
21:38 zabe@deploy2002: Started scap: Backport for [[gerrit:909276|RC: Handle deleted story (T334829)]]
21:20 sbassett: Deployed updated mitigation for T333140
21:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T333332)', diff saved to https://phabricator.wikimedia.org/P47053 and previous config saved to /var/cache/conftool/dbconfig/20230417-211909-ladsgroup.json
21:17 inflatador: bking@cumin1001 ban cloudelastic1004 for upcoming switch maintenance T333377
21:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P47052 and previous config saved to /var/cache/conftool/dbconfig/20230417-210403-ladsgroup.json
20:52 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:908972|[trwikiquote] Add a HD logo for Vector legacy (T334732)]] (duration: 07m 02s)
20:50 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
20:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P47051 and previous config saved to /var/cache/conftool/dbconfig/20230417-204856-ladsgroup.json
20:48 joal@deploy2002: Started restart [analytics/aqs/deploy@d273fde]: Restarting AQS to pick up new druid datasource
20:46 urbanecm@deploy2002: urbanecm and superpes: Backport for [[gerrit:908972|[trwikiquote] Add a HD logo for Vector legacy (T334732)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
20:44 urbanecm@deploy2002: Started scap: Backport for [[gerrit:908972|[trwikiquote] Add a HD logo for Vector legacy (T334732)]]
20:35 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:909274|Mobile editor: Don't try to take over if the form has already been submitted (T334794 T334797 T334877)]], [[gerrit:909275|Mobile editor: Don't try to take over on non-wikitext content (T334799)]] (duration: 09m 14s)
20:35 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T333332)', diff saved to https://phabricator.wikimedia.org/P47049 and previous config saved to /var/cache/conftool/dbconfig/20230417-203350-ladsgroup.json
20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T333332)', diff saved to https://phabricator.wikimedia.org/P47048 and previous config saved to /var/cache/conftool/dbconfig/20230417-203108-ladsgroup.json
20:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2175.codfw.wmnet with reason: Maintenance
20:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2175.codfw.wmnet with reason: Maintenance
20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P47047 and previous config saved to /var/cache/conftool/dbconfig/20230417-203056-ladsgroup.json
20:27 urbanecm@deploy2002: urbanecm and matmarex: Backport for [[gerrit:909274|Mobile editor: Don't try to take over if the form has already been submitted (T334794 T334797 T334877)]], [[gerrit:909275|Mobile editor: Don't try to take over on non-wikitext content (T334799)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
20:26 urbanecm@deploy2002: Started scap: Backport for [[gerrit:909274|Mobile editor: Don't try to take over if the form has already been submitted (T334794 T334797 T334877)]], [[gerrit:909275|Mobile editor: Don't try to take over on non-wikitext content (T334799)]]
20:25 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:905711|Stop using redundant $wmg variables for VisualEditor extension (T119117)]] (duration: 08m 19s)
20:18 urbanecm@deploy2002: urbanecm and matmarex: Backport for [[gerrit:905711|Stop using redundant $wmg variables for VisualEditor extension (T119117)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
20:17 urbanecm@deploy2002: Started scap: Backport for [[gerrit:905711|Stop using redundant $wmg variables for VisualEditor extension (T119117)]]
20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P47046 and previous config saved to /var/cache/conftool/dbconfig/20230417-201549-ladsgroup.json
20:14 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:908945|ruwiki: Allow sysop to add/remove confirmed group (T334780)]] (duration: 07m 31s)
20:08 urbanecm@deploy2002: urbanecm and stang: Backport for [[gerrit:908945|ruwiki: Allow sysop to add/remove confirmed group (T334780)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
20:06 urbanecm@deploy2002: Started scap: Backport for [[gerrit:908945|ruwiki: Allow sysop to add/remove confirmed group (T334780)]]
20:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P47045 and previous config saved to /var/cache/conftool/dbconfig/20230417-200043-ladsgroup.json
19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P47044 and previous config saved to /var/cache/conftool/dbconfig/20230417-194537-ladsgroup.json
19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P47043 and previous config saved to /var/cache/conftool/dbconfig/20230417-194253-ladsgroup.json
19:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
19:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T333332)', diff saved to https://phabricator.wikimedia.org/P47042 and previous config saved to /var/cache/conftool/dbconfig/20230417-194229-ladsgroup.json
19:32 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye
19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P47041 and previous config saved to /var/cache/conftool/dbconfig/20230417-192723-ladsgroup.json
19:16 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
19:13 jelto@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P47040 and previous config saved to /var/cache/conftool/dbconfig/20230417-191217-ladsgroup.json
19:00 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T333332)', diff saved to https://phabricator.wikimedia.org/P47039 and previous config saved to /var/cache/conftool/dbconfig/20230417-185710-ladsgroup.json
18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T333332)', diff saved to https://phabricator.wikimedia.org/P47038 and previous config saved to /var/cache/conftool/dbconfig/20230417-184525-ladsgroup.json
18:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2148.codfw.wmnet with reason: Maintenance
18:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2148.codfw.wmnet with reason: Maintenance
18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P47037 and previous config saved to /var/cache/conftool/dbconfig/20230417-184502-ladsgroup.json
18:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P47036 and previous config saved to /var/cache/conftool/dbconfig/20230417-182956-ladsgroup.json
18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P47035 and previous config saved to /var/cache/conftool/dbconfig/20230417-181449-ladsgroup.json
17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P47034 and previous config saved to /var/cache/conftool/dbconfig/20230417-175943-ladsgroup.json
17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P47033 and previous config saved to /var/cache/conftool/dbconfig/20230417-175700-ladsgroup.json
17:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
17:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T333332)', diff saved to https://phabricator.wikimedia.org/P47032 and previous config saved to /var/cache/conftool/dbconfig/20230417-175636-ladsgroup.json
17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P47031 and previous config saved to /var/cache/conftool/dbconfig/20230417-174130-ladsgroup.json
17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P47030 and previous config saved to /var/cache/conftool/dbconfig/20230417-172623-ladsgroup.json
17:26 SandraEbele: restarted turnilo with ‘sudo systemctl restart turnilo’
17:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['backup2010']
17:18 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010']
17:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['backup2010']
17:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010']
17:14 jhancock@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['backup2010']
17:14 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010']
17:13 SandraEbele: restarted Oozie pageview-druid-daily job 0174450-220913162928808-oozie-oozi-C
17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T333332)', diff saved to https://phabricator.wikimedia.org/P47029 and previous config saved to /var/cache/conftool/dbconfig/20230417-171117-ladsgroup.json
17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T333332)', diff saved to https://phabricator.wikimedia.org/P47028 and previous config saved to /var/cache/conftool/dbconfig/20230417-170838-ladsgroup.json
17:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
17:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
17:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2126.codfw.wmnet with reason: Maintenance
17:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2126.codfw.wmnet with reason: Maintenance
17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T333332)', diff saved to https://phabricator.wikimedia.org/P47027 and previous config saved to /var/cache/conftool/dbconfig/20230417-170757-ladsgroup.json
17:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['backup2010']
17:03 volans: installed spicerack_6.4.2 on cumin1001
17:01 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010']
16:59 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@f8dad05]: analytics: deploy Airflow ArchiveOperator should have a number of retries of 0. T332216 (duration: 00m 12s)
16:59 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@f8dad05]: analytics: deploy Airflow ArchiveOperator should have a number of retries of 0. T332216
16:56 SandraEbele: restarted Oozie pageview-druid-hourly job 0174449-220913162928808-oozie-oozi-C
16:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P47026 and previous config saved to /var/cache/conftool/dbconfig/20230417-165251-ladsgroup.json
16:49 volans: installed spicerack_6.4.2 on cumin2002
16:46 volans: uploaded spicerack_6.4.2 to apt.wikimedia.org bullseye-wikimedia
16:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2010.mgmt.codfw.wmnet with reboot policy FORCED
16:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P47025 and previous config saved to /var/cache/conftool/dbconfig/20230417-163744-ladsgroup.json
16:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T333332)', diff saved to https://phabricator.wikimedia.org/P47024 and previous config saved to /var/cache/conftool/dbconfig/20230417-162238-ladsgroup.json
16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T333332)', diff saved to https://phabricator.wikimedia.org/P47023 and previous config saved to /var/cache/conftool/dbconfig/20230417-161955-ladsgroup.json
16:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2125.codfw.wmnet with reason: Maintenance
16:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2125.codfw.wmnet with reason: Maintenance
16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T333332)', diff saved to https://phabricator.wikimedia.org/P47022 and previous config saved to /var/cache/conftool/dbconfig/20230417-161931-ladsgroup.json
16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P47021 and previous config saved to /var/cache/conftool/dbconfig/20230417-160425-ladsgroup.json
16:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2010.mgmt.codfw.wmnet with reboot policy FORCED
15:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P47020 and previous config saved to /var/cache/conftool/dbconfig/20230417-155654-root.json
15:53 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: gerrit:909303 Bumping portals to master (T128546) (duration: 05m 30s)
15:50 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked backup2010 hosts in codfw - jhancock@cumin2002"
15:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked backup2010 hosts in codfw - jhancock@cumin2002"
15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P47019 and previous config saved to /var/cache/conftool/dbconfig/20230417-154918-ladsgroup.json
15:48 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: gerrit:909303 Bumping portals to master (T128546) (duration: 05m 59s)
15:42 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P47018 and previous config saved to /var/cache/conftool/dbconfig/20230417-154149-root.json
15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T333332)', diff saved to https://phabricator.wikimedia.org/P47017 and previous config saved to /var/cache/conftool/dbconfig/20230417-153412-ladsgroup.json
15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T333332)', diff saved to https://phabricator.wikimedia.org/P47016 and previous config saved to /var/cache/conftool/dbconfig/20230417-153134-ladsgroup.json
15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
15:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
15:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
15:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
15:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1225.eqiad.wmnet with reason: Maintenance
15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1225.eqiad.wmnet with reason: Maintenance
15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T333332)', diff saved to https://phabricator.wikimedia.org/P47015 and previous config saved to /var/cache/conftool/dbconfig/20230417-152916-ladsgroup.json
15:27 urbanecm@deploy2002: Finished scap: Expose the sfsblock-bypass right so it can be assigned to global groups (T334856; second try) (duration: 06m 22s)
15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P47014 and previous config saved to /var/cache/conftool/dbconfig/20230417-152644-root.json
15:21 urbanecm@deploy2002: Started scap: Expose the sfsblock-bypass right so it can be assigned to global groups (T334856; second try)
15:20 urbanecm@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: LVS Maint - Outage (duration: 23m 03s)
15:18 sukhe: run authdns-update and repool eqiad
15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P47013 and previous config saved to /var/cache/conftool/dbconfig/20230417-151409-ladsgroup.json
15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P47012 and previous config saved to /var/cache/conftool/dbconfig/20230417-151138-root.json
15:09 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs1020
15:09 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs1020
15:07 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host lvs1020.eqiad.wmnet with OS bullseye
15:07 vgutierrez: rolling restart of HAProxy in the text cluster - T334448
14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P47011 and previous config saved to /var/cache/conftool/dbconfig/20230417-145902-ladsgroup.json
14:57 urbanecm@deploy2002: Locking from deployment [ALL REPOSITORIES]: LVS Maint - Outage
14:57 urbanecm@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: LVS Maint - Outage (duration: 00m 01s)
14:57 urbanecm@deploy2002: Locking from deployment [ALL REPOSITORIES]: LVS Maint - Outage
14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P47010 and previous config saved to /var/cache/conftool/dbconfig/20230417-145633-root.json
14:55 claime: repooled mw1375.eqiad.wmnet
14:54 claime: depooling mw1375.eqiad.wmnet
14:53 ladsgroup@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: LVS Maint - Outage (T334703) (duration: 13m 39s)
14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T333332)', diff saved to https://phabricator.wikimedia.org/P47009 and previous config saved to /var/cache/conftool/dbconfig/20230417-144356-ladsgroup.json
14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1222 (T333332)', diff saved to https://phabricator.wikimedia.org/P47008 and previous config saved to /var/cache/conftool/dbconfig/20230417-144133-ladsgroup.json
14:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P47007 and previous config saved to /var/cache/conftool/dbconfig/20230417-144128-root.json
14:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1222.eqiad.wmnet with reason: Maintenance
14:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1222.eqiad.wmnet with reason: Maintenance
14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T333332)', diff saved to https://phabricator.wikimedia.org/P47006 and previous config saved to /var/cache/conftool/dbconfig/20230417-144109-ladsgroup.json
14:40 ladsgroup@deploy2002: Locking from deployment [ALL REPOSITORIES]: LVS Maint - Outage (T334703)
14:31 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=parsoid
14:31 claime: repooling parsoid in eqiad
14:31 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver
14:31 claime: repooling appserver in eqiad
14:30 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=api_appserver
14:30 claime: repooling api_appserver in eqiad
14:30 sukhe: running authdns-update to depool eqiad
14:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P47005 and previous config saved to /var/cache/conftool/dbconfig/20230417-142623-root.json
14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P47004 and previous config saved to /var/cache/conftool/dbconfig/20230417-142603-ladsgroup.json
14:25 urbanecm@deploy2002: Finished scap: Backport for gerrit:909267 Expose the 'sfsblock-bypass' right so it can be assigned to global groups (T334856) (duration: 07m 36s)
14:24 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1020.eqiad.wmnet with reason: host reimage
14:21 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1020.eqiad.wmnet with reason: host reimage
14:19 urbanecm@deploy2002: urbanecm and maurelio: Backport for gerrit:909267 Expose the 'sfsblock-bypass' right so it can be assigned to global groups (T334856) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:17 urbanecm@deploy2002: Started scap: Backport for gerrit:909267 Expose the 'sfsblock-bypass' right so it can be assigned to global groups (T334856)
14:14 elukey: upload amd-k8s-device-plugin deb (1.25.2.3-1) to bullseye-wikimedia - T333009
14:12 claime: Migrated linkrecommendation to mw-api-int - T334060
14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P47003 and previous config saved to /var/cache/conftool/dbconfig/20230417-141056-ladsgroup.json
14:10 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
14:09 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
14:08 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
14:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1020.eqiad.wmnet with OS bullseye
14:07 claime: Migrating linkrecommendation to mw-api-int - T334060
14:06 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T333332)', diff saved to https://phabricator.wikimedia.org/P47002 and previous config saved to /var/cache/conftool/dbconfig/20230417-135550-ladsgroup.json
13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T333332)', diff saved to https://phabricator.wikimedia.org/P47001 and previous config saved to /var/cache/conftool/dbconfig/20230417-135334-ladsgroup.json
13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1197.eqiad.wmnet with reason: Maintenance
13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1197.eqiad.wmnet with reason: Maintenance
13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T333332)', diff saved to https://phabricator.wikimedia.org/P47000 and previous config saved to /var/cache/conftool/dbconfig/20230417-135311-ladsgroup.json
13:47 moritzm: installing mariadb-10.3 security updates (Debian packaged version, not the wmf-mariadb packages)
13:39 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.e4 in codfw
13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P46999 and previous config saved to /var/cache/conftool/dbconfig/20230417-133804-ladsgroup.json
13:37 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.e4 in codfw
13:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1132.eqiad.wmnet
13:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1132.eqiad.wmnet
13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P46998 and previous config saved to /var/cache/conftool/dbconfig/20230417-132258-ladsgroup.json
13:12 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
13:10 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
13:10 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
13:09 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
13:08 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
13:08 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T333332)', diff saved to https://phabricator.wikimedia.org/P46997 and previous config saved to /var/cache/conftool/dbconfig/20230417-130751-ladsgroup.json
13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T333332)', diff saved to https://phabricator.wikimedia.org/P46996 and previous config saved to /var/cache/conftool/dbconfig/20230417-130535-ladsgroup.json
13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1188.eqiad.wmnet with reason: Maintenance
13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1188.eqiad.wmnet with reason: Maintenance
13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T333332)', diff saved to https://phabricator.wikimedia.org/P46995 and previous config saved to /var/cache/conftool/dbconfig/20230417-130512-ladsgroup.json
12:59 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
12:59 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
12:59 claime: Migrating linkrecommendation staging to mw-api-int - T334060
12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P46994 and previous config saved to /var/cache/conftool/dbconfig/20230417-125006-ladsgroup.json
12:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P46993 and previous config saved to /var/cache/conftool/dbconfig/20230417-123500-ladsgroup.json
12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T333332)', diff saved to https://phabricator.wikimedia.org/P46992 and previous config saved to /var/cache/conftool/dbconfig/20230417-121953-ladsgroup.json
12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T333332)', diff saved to https://phabricator.wikimedia.org/P46991 and previous config saved to /var/cache/conftool/dbconfig/20230417-121734-ladsgroup.json
12:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
12:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P46990 and previous config saved to /var/cache/conftool/dbconfig/20230417-121710-ladsgroup.json
12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P46989 and previous config saved to /var/cache/conftool/dbconfig/20230417-120204-ladsgroup.json
11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 T326669', diff saved to https://phabricator.wikimedia.org/P46987 and previous config saved to /var/cache/conftool/dbconfig/20230417-115847-marostegui.json
11:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P46986 and previous config saved to /var/cache/conftool/dbconfig/20230417-114658-ladsgroup.json
11:33 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1132.eqiad.wmnet
11:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P46985 and previous config saved to /var/cache/conftool/dbconfig/20230417-113152-ladsgroup.json
11:30 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1132.eqiad.wmnet
11:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P46984 and previous config saved to /var/cache/conftool/dbconfig/20230417-113031-ladsgroup.json
11:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
11:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
11:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T333332)', diff saved to https://phabricator.wikimedia.org/P46983 and previous config saved to /var/cache/conftool/dbconfig/20230417-113008-ladsgroup.json
11:23 kamila@deploy2002: conftool action : set/pooled=yes:weight=10; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
11:17 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1109 from dbctl T334820', diff saved to https://phabricator.wikimedia.org/P46981 and previous config saved to /var/cache/conftool/dbconfig/20230417-111724-marostegui.json
11:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P46980 and previous config saved to /var/cache/conftool/dbconfig/20230417-111501-ladsgroup.json
11:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1132.eqiad.wmnet with OS buster
10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P46979 and previous config saved to /var/cache/conftool/dbconfig/20230417-105955-ladsgroup.json
10:59 ladsgroup@deploy2002: Finished scap: Backport for gerrit:908959 filebackend: Find thumbnails from all backends in FileBackendMultiWrite (T331138) (duration: 07m 16s)
10:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage
10:53 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.98 in codfw
10:53 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:908959 filebackend: Find thumbnails from all backends in FileBackendMultiWrite (T331138) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
10:51 ladsgroup@deploy2002: Started scap: Backport for gerrit:908959 filebackend: Find thumbnails from all backends in FileBackendMultiWrite (T331138)
10:51 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage
10:50 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.98 in codfw
10:49 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.98 in eqiad
10:46 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.98 in eqiad
10:45 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-en-local-public.1a in eqiad
10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T333332)', diff saved to https://phabricator.wikimedia.org/P46978 and previous config saved to /var/cache/conftool/dbconfig/20230417-104449-ladsgroup.json
10:42 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-en-local-public.1a in eqiad
10:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T333332)', diff saved to https://phabricator.wikimedia.org/P46977 and previous config saved to /var/cache/conftool/dbconfig/20230417-104229-ladsgroup.json
10:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
10:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
10:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
10:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P46976 and previous config saved to /var/cache/conftool/dbconfig/20230417-104144-ladsgroup.json
10:32 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS buster
10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P46974 and previous config saved to /var/cache/conftool/dbconfig/20230417-102637-ladsgroup.json
10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P46973 and previous config saved to /var/cache/conftool/dbconfig/20230417-101131-ladsgroup.json
10:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46972 and previous config saved to /var/cache/conftool/dbconfig/20230417-100003-root.json
09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P46971 and previous config saved to /var/cache/conftool/dbconfig/20230417-095625-ladsgroup.json
09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P46970 and previous config saved to /var/cache/conftool/dbconfig/20230417-095404-ladsgroup.json
09:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
09:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
09:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
09:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T333332)', diff saved to https://phabricator.wikimedia.org/P46969 and previous config saved to /var/cache/conftool/dbconfig/20230417-095311-ladsgroup.json
09:48 ladsgroup@deploy2002: Finished scap: Backport for gerrit:905626 Also broadcast RCFeed/IRC events to irc1002/irc2002 (T331702) (duration: 44m 21s)
09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46968 and previous config saved to /var/cache/conftool/dbconfig/20230417-094459-root.json
09:38 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1116.eqiad.wmnet with reason: T334066
09:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1116.eqiad.wmnet with reason: T334066
09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P46967 and previous config saved to /var/cache/conftool/dbconfig/20230417-093804-ladsgroup.json
09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46966 and previous config saved to /var/cache/conftool/dbconfig/20230417-092954-root.json
09:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P46965 and previous config saved to /var/cache/conftool/dbconfig/20230417-092258-ladsgroup.json
09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46964 and previous config saved to /var/cache/conftool/dbconfig/20230417-091449-root.json
09:12 ladsgroup@deploy2002: jmm and ladsgroup: Backport for gerrit:905626 Also broadcast RCFeed/IRC events to irc1002/irc2002 (T331702) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
09:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T333332)', diff saved to https://phabricator.wikimedia.org/P46963 and previous config saved to /var/cache/conftool/dbconfig/20230417-090751-ladsgroup.json
09:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T333332)', diff saved to https://phabricator.wikimedia.org/P46962 and previous config saved to /var/cache/conftool/dbconfig/20230417-090535-ladsgroup.json
09:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
09:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
09:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T333332)', diff saved to https://phabricator.wikimedia.org/P46961 and previous config saved to /var/cache/conftool/dbconfig/20230417-090512-ladsgroup.json
09:04 ladsgroup@deploy2002: Started scap: Backport for gerrit:905626 Also broadcast RCFeed/IRC events to irc1002/irc2002 (T331702)
09:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host netflow6002.drmrs.wmnet
09:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow6002.drmrs.wmnet on all recursors
09:04 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netflow6002.drmrs.wmnet on all recursors
09:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM netflow6002.drmrs.wmnet - jmm@cumin2002"
09:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM netflow6002.drmrs.wmnet - jmm@cumin2002"
08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46960 and previous config saved to /var/cache/conftool/dbconfig/20230417-085944-root.json
08:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow6002.drmrs.wmnet on all recursors
08:59 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netflow6002.drmrs.wmnet on all recursors
08:58 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:58 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow6002.drmrs.wmnet - jmm@cumin2002"
08:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow6002.drmrs.wmnet - jmm@cumin2002"
08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P46959 and previous config saved to /var/cache/conftool/dbconfig/20230417-085623-ladsgroup.json
08:55 kamila@deploy2002: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
08:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:55 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host netflow6002.drmrs.wmnet
08:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
08:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
08:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host netflow6002.drmrs.wmnet
08:52 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow6002.drmrs.wmnet on all recursors
08:52 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netflow6002.drmrs.wmnet on all recursors
08:52 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:52 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM netflow6002.drmrs.wmnet - jmm@cumin2002"
08:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM netflow6002.drmrs.wmnet - jmm@cumin2002"
08:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P46958 and previous config saved to /var/cache/conftool/dbconfig/20230417-085005-ladsgroup.json
08:48 kamila@deploy2002: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46957 and previous config saved to /var/cache/conftool/dbconfig/20230417-084439-root.json
08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P46956 and previous config saved to /var/cache/conftool/dbconfig/20230417-084118-ladsgroup.json
08:39 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:39 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
08:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P46955 and previous config saved to /var/cache/conftool/dbconfig/20230417-083459-ladsgroup.json
08:34 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:33 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow6002.drmrs.wmnet - jmm@cumin2002"
08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46954 and previous config saved to /var/cache/conftool/dbconfig/20230417-082934-root.json
08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P46953 and previous config saved to /var/cache/conftool/dbconfig/20230417-082613-ladsgroup.json
08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T333332)', diff saved to https://phabricator.wikimedia.org/P46952 and previous config saved to /var/cache/conftool/dbconfig/20230417-081953-ladsgroup.json
08:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1122 (T333332)', diff saved to https://phabricator.wikimedia.org/P46951 and previous config saved to /var/cache/conftool/dbconfig/20230417-081732-ladsgroup.json
08:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1122.eqiad.wmnet with reason: Maintenance
08:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1122.eqiad.wmnet with reason: Maintenance
08:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
08:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P46950 and previous config saved to /var/cache/conftool/dbconfig/20230417-081108-ladsgroup.json
08:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1207.eqiad.wmnet with reason: Maintenance
08:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1207.eqiad.wmnet with reason: Maintenance
07:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1100.eqiad.wmnet
07:58 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:58 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1100.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 100%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46948 and previous config saved to /var/cache/conftool/dbconfig/20230417-075818-root.json
07:57 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1100.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
07:55 marostegui@cumin1001: START - Cookbook sre.dns.netbox
07:54 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow6002.drmrs.wmnet - jmm@cumin2002"
07:49 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1100.eqiad.wmnet
07:49 vgutierrez: restart haproxy on cp3054 - T334448
07:44 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow6002.drmrs.wmnet on all recursors
07:44 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netflow6002.drmrs.wmnet on all recursors
07:44 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow6002.drmrs.wmnet - jmm@cumin2002"
07:43 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
07:43 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow6002.drmrs.wmnet - jmm@cumin2002"
07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 75%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46946 and previous config saved to /var/cache/conftool/dbconfig/20230417-074313-root.json
07:36 jmm@cumin2002: START - Cookbook sre.dns.netbox
07:36 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host netflow6002.drmrs.wmnet
07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
07:30 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 50%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46945 and previous config saved to /var/cache/conftool/dbconfig/20230417-072809-root.json
07:13 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 25%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46944 and previous config saved to /var/cache/conftool/dbconfig/20230417-071304-root.json
06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 10%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46943 and previous config saved to /var/cache/conftool/dbconfig/20230417-065759-root.json
06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 T334820', diff saved to https://phabricator.wikimedia.org/P46942 and previous config saved to /var/cache/conftool/dbconfig/20230417-064525-marostegui.json
06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 5%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46941 and previous config saved to /var/cache/conftool/dbconfig/20230417-064254-root.json
06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 4%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46940 and previous config saved to /var/cache/conftool/dbconfig/20230417-062749-root.json
06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 3%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46939 and previous config saved to /var/cache/conftool/dbconfig/20230417-061244-root.json
05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 2%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46938 and previous config saved to /var/cache/conftool/dbconfig/20230417-055739-root.json
05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Change db1152 weight', diff saved to https://phabricator.wikimedia.org/P46937 and previous config saved to /var/cache/conftool/dbconfig/20230417-055721-root.json
05:56 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1152 to x2 primary T334663', diff saved to https://phabricator.wikimedia.org/P46936 and previous config saved to /var/cache/conftool/dbconfig/20230417-055644-root.json
05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 1%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46935 and previous config saved to /var/cache/conftool/dbconfig/20230417-054235-root.json
05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1214 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P46934 and previous config saved to /var/cache/conftool/dbconfig/20230417-054154-marostegui.json
05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1100 from dbctl T329352', diff saved to https://phabricator.wikimedia.org/P46933 and previous config saved to /var/cache/conftool/dbconfig/20230417-053310-marostegui.json
05:32 marostegui: Stop MariaDB on db1112 to clone db1212 - this will generate lag on s3 wiki replicas
05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 T326669', diff saved to https://phabricator.wikimedia.org/P46931 and previous config saved to /var/cache/conftool/dbconfig/20230417-051903-marostegui.json
04:48 phedenskog@deploy2002: Finished deploy [performance/navtiming@e21f08f]: (no justification provided) (duration: 00m 06s)
04:48 phedenskog@deploy2002: Started deploy [performance/navtiming@e21f08f]: (no justification provided)

2023-04-16

07:54 vgutierrez: restart haproxy on cp2033 to clear unexpected service restart alerts - T334448
01:49 legoktm: legoktm@mwmaint2002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki commonswiki "Commons:Picture of the Year/2021/Help" "Commons:Picture of the Year/Help" "Legoktm" --reason "make non-year specific" --skip-talkpages

2023-04-15

07:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T333332)', diff saved to https://phabricator.wikimedia.org/P46929 and previous config saved to /var/cache/conftool/dbconfig/20230415-071327-ladsgroup.json
06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P46928 and previous config saved to /var/cache/conftool/dbconfig/20230415-065821-ladsgroup.json
06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P46927 and previous config saved to /var/cache/conftool/dbconfig/20230415-064314-ladsgroup.json
06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T333332)', diff saved to https://phabricator.wikimedia.org/P46926 and previous config saved to /var/cache/conftool/dbconfig/20230415-062808-ladsgroup.json
06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T333332)', diff saved to https://phabricator.wikimedia.org/P46925 and previous config saved to /var/cache/conftool/dbconfig/20230415-062558-ladsgroup.json
06:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2181.codfw.wmnet with reason: Maintenance
06:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2181.codfw.wmnet with reason: Maintenance
06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T333332)', diff saved to https://phabricator.wikimedia.org/P46924 and previous config saved to /var/cache/conftool/dbconfig/20230415-062534-ladsgroup.json
06:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P46923 and previous config saved to /var/cache/conftool/dbconfig/20230415-061028-ladsgroup.json
05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P46922 and previous config saved to /var/cache/conftool/dbconfig/20230415-055521-ladsgroup.json
05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T333332)', diff saved to https://phabricator.wikimedia.org/P46921 and previous config saved to /var/cache/conftool/dbconfig/20230415-054015-ladsgroup.json
05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T333332)', diff saved to https://phabricator.wikimedia.org/P46920 and previous config saved to /var/cache/conftool/dbconfig/20230415-053804-ladsgroup.json
05:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
05:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
05:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T333332)', diff saved to https://phabricator.wikimedia.org/P46919 and previous config saved to /var/cache/conftool/dbconfig/20230415-053752-ladsgroup.json
05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P46918 and previous config saved to /var/cache/conftool/dbconfig/20230415-052246-ladsgroup.json
05:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P46917 and previous config saved to /var/cache/conftool/dbconfig/20230415-050739-ladsgroup.json
04:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T333332)', diff saved to https://phabricator.wikimedia.org/P46916 and previous config saved to /var/cache/conftool/dbconfig/20230415-045233-ladsgroup.json
04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T333332)', diff saved to https://phabricator.wikimedia.org/P46915 and previous config saved to /var/cache/conftool/dbconfig/20230415-045023-ladsgroup.json
04:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
04:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T333332)', diff saved to https://phabricator.wikimedia.org/P46914 and previous config saved to /var/cache/conftool/dbconfig/20230415-044959-ladsgroup.json
04:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P46913 and previous config saved to /var/cache/conftool/dbconfig/20230415-043453-ladsgroup.json
04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P46912 and previous config saved to /var/cache/conftool/dbconfig/20230415-041947-ladsgroup.json
04:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T333332)', diff saved to https://phabricator.wikimedia.org/P46911 and previous config saved to /var/cache/conftool/dbconfig/20230415-040440-ladsgroup.json
04:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T333332)', diff saved to https://phabricator.wikimedia.org/P46910 and previous config saved to /var/cache/conftool/dbconfig/20230415-040230-ladsgroup.json
04:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2166.codfw.wmnet with reason: Maintenance
04:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2166.codfw.wmnet with reason: Maintenance
04:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T333332)', diff saved to https://phabricator.wikimedia.org/P46909 and previous config saved to /var/cache/conftool/dbconfig/20230415-040207-ladsgroup.json
03:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P46908 and previous config saved to /var/cache/conftool/dbconfig/20230415-034700-ladsgroup.json
03:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P46907 and previous config saved to /var/cache/conftool/dbconfig/20230415-033154-ladsgroup.json
03:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T333332)', diff saved to https://phabricator.wikimedia.org/P46906 and previous config saved to /var/cache/conftool/dbconfig/20230415-031648-ladsgroup.json
03:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T333332)', diff saved to https://phabricator.wikimedia.org/P46905 and previous config saved to /var/cache/conftool/dbconfig/20230415-031437-ladsgroup.json
03:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
03:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
03:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2164.codfw.wmnet with reason: Maintenance
03:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2164.codfw.wmnet with reason: Maintenance
03:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T333332)', diff saved to https://phabricator.wikimedia.org/P46904 and previous config saved to /var/cache/conftool/dbconfig/20230415-031356-ladsgroup.json
02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P46903 and previous config saved to /var/cache/conftool/dbconfig/20230415-025850-ladsgroup.json
02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P46902 and previous config saved to /var/cache/conftool/dbconfig/20230415-024344-ladsgroup.json
02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T333332)', diff saved to https://phabricator.wikimedia.org/P46901 and previous config saved to /var/cache/conftool/dbconfig/20230415-022837-ladsgroup.json
02:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T333332)', diff saved to https://phabricator.wikimedia.org/P46900 and previous config saved to /var/cache/conftool/dbconfig/20230415-022627-ladsgroup.json
02:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2163.codfw.wmnet with reason: Maintenance
02:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2163.codfw.wmnet with reason: Maintenance
02:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T333332)', diff saved to https://phabricator.wikimedia.org/P46899 and previous config saved to /var/cache/conftool/dbconfig/20230415-022604-ladsgroup.json
02:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P46898 and previous config saved to /var/cache/conftool/dbconfig/20230415-021057-ladsgroup.json
01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P46897 and previous config saved to /var/cache/conftool/dbconfig/20230415-015551-ladsgroup.json
01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T333332)', diff saved to https://phabricator.wikimedia.org/P46896 and previous config saved to /var/cache/conftool/dbconfig/20230415-014045-ladsgroup.json
01:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T333332)', diff saved to https://phabricator.wikimedia.org/P46895 and previous config saved to /var/cache/conftool/dbconfig/20230415-013835-ladsgroup.json
01:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2162.codfw.wmnet with reason: Maintenance
01:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2162.codfw.wmnet with reason: Maintenance
01:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T333332)', diff saved to https://phabricator.wikimedia.org/P46894 and previous config saved to /var/cache/conftool/dbconfig/20230415-013811-ladsgroup.json
01:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T333332)', diff saved to https://phabricator.wikimedia.org/P46893 and previous config saved to /var/cache/conftool/dbconfig/20230415-012753-ladsgroup.json
01:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P46892 and previous config saved to /var/cache/conftool/dbconfig/20230415-012305-ladsgroup.json
01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P46891 and previous config saved to /var/cache/conftool/dbconfig/20230415-011246-ladsgroup.json
01:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P46890 and previous config saved to /var/cache/conftool/dbconfig/20230415-010759-ladsgroup.json
00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P46889 and previous config saved to /var/cache/conftool/dbconfig/20230415-005740-ladsgroup.json
00:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T333332)', diff saved to https://phabricator.wikimedia.org/P46888 and previous config saved to /var/cache/conftool/dbconfig/20230415-005252-ladsgroup.json
00:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2161 (T333332)', diff saved to https://phabricator.wikimedia.org/P46887 and previous config saved to /var/cache/conftool/dbconfig/20230415-005042-ladsgroup.json
00:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2161.codfw.wmnet with reason: Maintenance
00:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2161.codfw.wmnet with reason: Maintenance
00:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T333332)', diff saved to https://phabricator.wikimedia.org/P46886 and previous config saved to /var/cache/conftool/dbconfig/20230415-005019-ladsgroup.json
00:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T333332)', diff saved to https://phabricator.wikimedia.org/P46885 and previous config saved to /var/cache/conftool/dbconfig/20230415-004233-ladsgroup.json
00:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P46884 and previous config saved to /var/cache/conftool/dbconfig/20230415-003512-ladsgroup.json
00:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T333332)', diff saved to https://phabricator.wikimedia.org/P46883 and previous config saved to /var/cache/conftool/dbconfig/20230415-003315-ladsgroup.json
00:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2182.codfw.wmnet with reason: Maintenance
00:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2182.codfw.wmnet with reason: Maintenance
00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46882 and previous config saved to /var/cache/conftool/dbconfig/20230415-003251-ladsgroup.json
00:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P46881 and previous config saved to /var/cache/conftool/dbconfig/20230415-002006-ladsgroup.json
00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P46880 and previous config saved to /var/cache/conftool/dbconfig/20230415-001745-ladsgroup.json
00:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T333332)', diff saved to https://phabricator.wikimedia.org/P46879 and previous config saved to /var/cache/conftool/dbconfig/20230415-000500-ladsgroup.json
00:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T333332)', diff saved to https://phabricator.wikimedia.org/P46878 and previous config saved to /var/cache/conftool/dbconfig/20230415-000249-ladsgroup.json
00:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2154.codfw.wmnet with reason: Maintenance
00:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P46877 and previous config saved to /var/cache/conftool/dbconfig/20230415-000239-ladsgroup.json
00:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2154.codfw.wmnet with reason: Maintenance
00:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T333332)', diff saved to https://phabricator.wikimedia.org/P46876 and previous config saved to /var/cache/conftool/dbconfig/20230415-000226-ladsgroup.json

2023-04-14

23:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46875 and previous config saved to /var/cache/conftool/dbconfig/20230414-234732-ladsgroup.json
23:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P46874 and previous config saved to /var/cache/conftool/dbconfig/20230414-234720-ladsgroup.json
23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46873 and previous config saved to /var/cache/conftool/dbconfig/20230414-234516-ladsgroup.json
23:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
23:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
23:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46872 and previous config saved to /var/cache/conftool/dbconfig/20230414-234453-ladsgroup.json
23:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P46871 and previous config saved to /var/cache/conftool/dbconfig/20230414-233213-ladsgroup.json
23:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P46870 and previous config saved to /var/cache/conftool/dbconfig/20230414-232946-ladsgroup.json
23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T333332)', diff saved to https://phabricator.wikimedia.org/P46869 and previous config saved to /var/cache/conftool/dbconfig/20230414-231707-ladsgroup.json
23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T333332)', diff saved to https://phabricator.wikimedia.org/P46868 and previous config saved to /var/cache/conftool/dbconfig/20230414-231557-ladsgroup.json
23:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2152.codfw.wmnet with reason: Maintenance
23:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2152.codfw.wmnet with reason: Maintenance
23:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2100.codfw.wmnet with reason: Maintenance
23:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2100.codfw.wmnet with reason: Maintenance
23:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2098.codfw.wmnet with reason: Maintenance
23:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2098.codfw.wmnet with reason: Maintenance
23:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
23:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
23:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1214.eqiad.wmnet with reason: Maintenance
23:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1214.eqiad.wmnet with reason: Maintenance
23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T333332)', diff saved to https://phabricator.wikimedia.org/P46867 and previous config saved to /var/cache/conftool/dbconfig/20230414-231440-ladsgroup.json
23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P46866 and previous config saved to /var/cache/conftool/dbconfig/20230414-231440-ladsgroup.json
22:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P46865 and previous config saved to /var/cache/conftool/dbconfig/20230414-225934-ladsgroup.json
22:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46864 and previous config saved to /var/cache/conftool/dbconfig/20230414-225934-ladsgroup.json
22:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46863 and previous config saved to /var/cache/conftool/dbconfig/20230414-225717-ladsgroup.json
22:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
22:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
22:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T333332)', diff saved to https://phabricator.wikimedia.org/P46862 and previous config saved to /var/cache/conftool/dbconfig/20230414-225654-ladsgroup.json
22:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P46861 and previous config saved to /var/cache/conftool/dbconfig/20230414-224428-ladsgroup.json
22:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P46860 and previous config saved to /var/cache/conftool/dbconfig/20230414-224147-ladsgroup.json
22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T333332)', diff saved to https://phabricator.wikimedia.org/P46859 and previous config saved to /var/cache/conftool/dbconfig/20230414-222921-ladsgroup.json
22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1211 (T333332)', diff saved to https://phabricator.wikimedia.org/P46858 and previous config saved to /var/cache/conftool/dbconfig/20230414-222814-ladsgroup.json
22:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1211.eqiad.wmnet with reason: Maintenance
22:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1211.eqiad.wmnet with reason: Maintenance
22:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T333332)', diff saved to https://phabricator.wikimedia.org/P46857 and previous config saved to /var/cache/conftool/dbconfig/20230414-222750-ladsgroup.json
22:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P46856 and previous config saved to /var/cache/conftool/dbconfig/20230414-222641-ladsgroup.json
22:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P46855 and previous config saved to /var/cache/conftool/dbconfig/20230414-221244-ladsgroup.json
22:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T333332)', diff saved to https://phabricator.wikimedia.org/P46854 and previous config saved to /var/cache/conftool/dbconfig/20230414-221134-ladsgroup.json
22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T333332)', diff saved to https://phabricator.wikimedia.org/P46853 and previous config saved to /var/cache/conftool/dbconfig/20230414-220918-ladsgroup.json
22:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
22:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2159.codfw.wmnet with reason: Maintenance
22:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2159.codfw.wmnet with reason: Maintenance
22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T333332)', diff saved to https://phabricator.wikimedia.org/P46852 and previous config saved to /var/cache/conftool/dbconfig/20230414-220838-ladsgroup.json
21:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P46851 and previous config saved to /var/cache/conftool/dbconfig/20230414-215738-ladsgroup.json
21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P46850 and previous config saved to /var/cache/conftool/dbconfig/20230414-215331-ladsgroup.json
21:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T333332)', diff saved to https://phabricator.wikimedia.org/P46849 and previous config saved to /var/cache/conftool/dbconfig/20230414-214231-ladsgroup.json
21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1209 (T333332)', diff saved to https://phabricator.wikimedia.org/P46848 and previous config saved to /var/cache/conftool/dbconfig/20230414-214123-ladsgroup.json
21:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1209.eqiad.wmnet with reason: Maintenance
21:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1209.eqiad.wmnet with reason: Maintenance
21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T333332)', diff saved to https://phabricator.wikimedia.org/P46847 and previous config saved to /var/cache/conftool/dbconfig/20230414-214100-ladsgroup.json
21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P46846 and previous config saved to /var/cache/conftool/dbconfig/20230414-213825-ladsgroup.json
21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P46845 and previous config saved to /var/cache/conftool/dbconfig/20230414-212554-ladsgroup.json
21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T333332)', diff saved to https://phabricator.wikimedia.org/P46844 and previous config saved to /var/cache/conftool/dbconfig/20230414-212319-ladsgroup.json
21:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T333332)', diff saved to https://phabricator.wikimedia.org/P46843 and previous config saved to /var/cache/conftool/dbconfig/20230414-212102-ladsgroup.json
21:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2150.codfw.wmnet with reason: Maintenance
21:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2150.codfw.wmnet with reason: Maintenance
21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T333332)', diff saved to https://phabricator.wikimedia.org/P46842 and previous config saved to /var/cache/conftool/dbconfig/20230414-212039-ladsgroup.json
21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P46841 and previous config saved to /var/cache/conftool/dbconfig/20230414-211048-ladsgroup.json
21:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P46840 and previous config saved to /var/cache/conftool/dbconfig/20230414-210533-ladsgroup.json
20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T333332)', diff saved to https://phabricator.wikimedia.org/P46838 and previous config saved to /var/cache/conftool/dbconfig/20230414-205541-ladsgroup.json
20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T333332)', diff saved to https://phabricator.wikimedia.org/P46837 and previous config saved to /var/cache/conftool/dbconfig/20230414-205333-ladsgroup.json
20:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1203.eqiad.wmnet with reason: Maintenance
20:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1203.eqiad.wmnet with reason: Maintenance
20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T333332)', diff saved to https://phabricator.wikimedia.org/P46836 and previous config saved to /var/cache/conftool/dbconfig/20230414-205310-ladsgroup.json
20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P46835 and previous config saved to /var/cache/conftool/dbconfig/20230414-205026-ladsgroup.json
20:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P46834 and previous config saved to /var/cache/conftool/dbconfig/20230414-203804-ladsgroup.json
20:36 papaul: rebooting labstore1004 for mgmt interface issue
20:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T333332)', diff saved to https://phabricator.wikimedia.org/P46833 and previous config saved to /var/cache/conftool/dbconfig/20230414-203520-ladsgroup.json
20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T333332)', diff saved to https://phabricator.wikimedia.org/P46832 and previous config saved to /var/cache/conftool/dbconfig/20230414-203304-ladsgroup.json
20:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2122.codfw.wmnet with reason: Maintenance
20:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2122.codfw.wmnet with reason: Maintenance
20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T333332)', diff saved to https://phabricator.wikimedia.org/P46831 and previous config saved to /var/cache/conftool/dbconfig/20230414-203241-ladsgroup.json
20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1207 (T333332)', diff saved to https://phabricator.wikimedia.org/P46830 and previous config saved to /var/cache/conftool/dbconfig/20230414-203220-ladsgroup.json
20:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1207.eqiad.wmnet with reason: Maintenance
20:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1207.eqiad.wmnet with reason: Maintenance
20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T333332)', diff saved to https://phabricator.wikimedia.org/P46829 and previous config saved to /var/cache/conftool/dbconfig/20230414-203156-ladsgroup.json
20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P46828 and previous config saved to /var/cache/conftool/dbconfig/20230414-202257-ladsgroup.json
20:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P46827 and previous config saved to /var/cache/conftool/dbconfig/20230414-201734-ladsgroup.json
20:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P46826 and previous config saved to /var/cache/conftool/dbconfig/20230414-201650-ladsgroup.json
20:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T333332)', diff saved to https://phabricator.wikimedia.org/P46825 and previous config saved to /var/cache/conftool/dbconfig/20230414-200751-ladsgroup.json
20:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T333332)', diff saved to https://phabricator.wikimedia.org/P46824 and previous config saved to /var/cache/conftool/dbconfig/20230414-200543-ladsgroup.json
20:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1193.eqiad.wmnet with reason: Maintenance
20:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1193.eqiad.wmnet with reason: Maintenance
20:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T333332)', diff saved to https://phabricator.wikimedia.org/P46823 and previous config saved to /var/cache/conftool/dbconfig/20230414-200520-ladsgroup.json
20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P46822 and previous config saved to /var/cache/conftool/dbconfig/20230414-200226-ladsgroup.json
20:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P46821 and previous config saved to /var/cache/conftool/dbconfig/20230414-200144-ladsgroup.json
19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P46820 and previous config saved to /var/cache/conftool/dbconfig/20230414-195014-ladsgroup.json
19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T333332)', diff saved to https://phabricator.wikimedia.org/P46819 and previous config saved to /var/cache/conftool/dbconfig/20230414-194720-ladsgroup.json
19:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T333332)', diff saved to https://phabricator.wikimedia.org/P46818 and previous config saved to /var/cache/conftool/dbconfig/20230414-194637-ladsgroup.json
19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T333332)', diff saved to https://phabricator.wikimedia.org/P46817 and previous config saved to /var/cache/conftool/dbconfig/20230414-194504-ladsgroup.json
19:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
19:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
19:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T333332)', diff saved to https://phabricator.wikimedia.org/P46816 and previous config saved to /var/cache/conftool/dbconfig/20230414-194441-ladsgroup.json
19:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T333332)', diff saved to https://phabricator.wikimedia.org/P46815 and previous config saved to /var/cache/conftool/dbconfig/20230414-194424-ladsgroup.json
19:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
19:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
19:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T333332)', diff saved to https://phabricator.wikimedia.org/P46814 and previous config saved to /var/cache/conftool/dbconfig/20230414-194401-ladsgroup.json
19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P46813 and previous config saved to /var/cache/conftool/dbconfig/20230414-193507-ladsgroup.json
19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P46812 and previous config saved to /var/cache/conftool/dbconfig/20230414-192934-ladsgroup.json
19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P46811 and previous config saved to /var/cache/conftool/dbconfig/20230414-192855-ladsgroup.json
19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T333332)', diff saved to https://phabricator.wikimedia.org/P46810 and previous config saved to /var/cache/conftool/dbconfig/20230414-192001-ladsgroup.json
19:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T333332)', diff saved to https://phabricator.wikimedia.org/P46809 and previous config saved to /var/cache/conftool/dbconfig/20230414-191854-ladsgroup.json
19:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1192.eqiad.wmnet with reason: Maintenance
19:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1192.eqiad.wmnet with reason: Maintenance
19:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T333332)', diff saved to https://phabricator.wikimedia.org/P46808 and previous config saved to /var/cache/conftool/dbconfig/20230414-191831-ladsgroup.json
19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P46807 and previous config saved to /var/cache/conftool/dbconfig/20230414-191428-ladsgroup.json
19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P46806 and previous config saved to /var/cache/conftool/dbconfig/20230414-191348-ladsgroup.json
19:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P46805 and previous config saved to /var/cache/conftool/dbconfig/20230414-190324-ladsgroup.json
18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T333332)', diff saved to https://phabricator.wikimedia.org/P46804 and previous config saved to /var/cache/conftool/dbconfig/20230414-185921-ladsgroup.json
18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T333332)', diff saved to https://phabricator.wikimedia.org/P46803 and previous config saved to /var/cache/conftool/dbconfig/20230414-185842-ladsgroup.json
18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T333332)', diff saved to https://phabricator.wikimedia.org/P46802 and previous config saved to /var/cache/conftool/dbconfig/20230414-185705-ladsgroup.json
18:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2120.codfw.wmnet with reason: Maintenance
18:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2120.codfw.wmnet with reason: Maintenance
18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T333332)', diff saved to https://phabricator.wikimedia.org/P46801 and previous config saved to /var/cache/conftool/dbconfig/20230414-185642-ladsgroup.json
18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T333332)', diff saved to https://phabricator.wikimedia.org/P46800 and previous config saved to /var/cache/conftool/dbconfig/20230414-185630-ladsgroup.json
18:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
18:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
18:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
18:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
18:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T333332)', diff saved to https://phabricator.wikimedia.org/P46799 and previous config saved to /var/cache/conftool/dbconfig/20230414-185545-ladsgroup.json
18:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
18:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P46798 and previous config saved to /var/cache/conftool/dbconfig/20230414-184818-ladsgroup.json
18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P46797 and previous config saved to /var/cache/conftool/dbconfig/20230414-184135-ladsgroup.json
18:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P46796 and previous config saved to /var/cache/conftool/dbconfig/20230414-184038-ladsgroup.json
18:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
18:33 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
18:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T333332)', diff saved to https://phabricator.wikimedia.org/P46795 and previous config saved to /var/cache/conftool/dbconfig/20230414-183311-ladsgroup.json
18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P46794 and previous config saved to /var/cache/conftool/dbconfig/20230414-182629-ladsgroup.json
18:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P46793 and previous config saved to /var/cache/conftool/dbconfig/20230414-182532-ladsgroup.json
18:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
18:17 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T333332)', diff saved to https://phabricator.wikimedia.org/P46792 and previous config saved to /var/cache/conftool/dbconfig/20230414-181123-ladsgroup.json
18:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T333332)', diff saved to https://phabricator.wikimedia.org/P46791 and previous config saved to /var/cache/conftool/dbconfig/20230414-181025-ladsgroup.json
18:09 mutante: doc1002, doc2001 - manually remove php7.3-fpm restart timers to fix T334735 and alerting - T322357 - systemctl stop wmf_auto_restart_php7.3-fpm.timer; systemctl stop wmf_auto_restart_php7.3-fpm.service; rm /lib/systemd/system/wmf_auto_restart_php7.3-fpm.*
18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T333332)', diff saved to https://phabricator.wikimedia.org/P46790 and previous config saved to /var/cache/conftool/dbconfig/20230414-180812-ladsgroup.json
18:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
18:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
18:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T333332)', diff saved to https://phabricator.wikimedia.org/P46789 and previous config saved to /var/cache/conftool/dbconfig/20230414-180748-ladsgroup.json
18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T333332)', diff saved to https://phabricator.wikimedia.org/P46788 and previous config saved to /var/cache/conftool/dbconfig/20230414-180606-ladsgroup.json
18:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2108.codfw.wmnet with reason: Maintenance
18:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2108.codfw.wmnet with reason: Maintenance
18:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2100.codfw.wmnet with reason: Maintenance
18:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2100.codfw.wmnet with reason: Maintenance
18:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2098.codfw.wmnet with reason: Maintenance
18:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2098.codfw.wmnet with reason: Maintenance
18:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
18:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T333332)', diff saved to https://phabricator.wikimedia.org/P46787 and previous config saved to /var/cache/conftool/dbconfig/20230414-180430-ladsgroup.json
18:03 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
18:03 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
17:57 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1016.eqiad.wmnet with OS bullseye
17:53 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1014.eqiad.wmnet with OS bullseye
17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P46786 and previous config saved to /var/cache/conftool/dbconfig/20230414-175242-ladsgroup.json
17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P46785 and previous config saved to /var/cache/conftool/dbconfig/20230414-174924-ladsgroup.json
17:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
17:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet
17:45 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage
17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T333332)', diff saved to https://phabricator.wikimedia.org/P46784 and previous config saved to /var/cache/conftool/dbconfig/20230414-174356-ladsgroup.json
17:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1178.eqiad.wmnet with reason: Maintenance
17:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1178.eqiad.wmnet with reason: Maintenance
17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T333332)', diff saved to https://phabricator.wikimedia.org/P46783 and previous config saved to /var/cache/conftool/dbconfig/20230414-174333-ladsgroup.json
17:42 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage
17:39 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1014.eqiad.wmnet with reason: host reimage
17:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072']
17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P46782 and previous config saved to /var/cache/conftool/dbconfig/20230414-173734-ladsgroup.json
17:36 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1014.eqiad.wmnet with reason: host reimage
17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P46781 and previous config saved to /var/cache/conftool/dbconfig/20230414-173418-ladsgroup.json
17:29 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bullseye
17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P46780 and previous config saved to /var/cache/conftool/dbconfig/20230414-172826-ladsgroup.json
17:27 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirtlocal1001.eqiad.wmnet
17:25 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
17:24 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
17:23 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T333332)', diff saved to https://phabricator.wikimedia.org/P46779 and previous config saved to /var/cache/conftool/dbconfig/20230414-172229-ladsgroup.json
17:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
17:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T333332)', diff saved to https://phabricator.wikimedia.org/P46778 and previous config saved to /var/cache/conftool/dbconfig/20230414-172016-ladsgroup.json
17:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
17:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
17:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T333332)', diff saved to https://phabricator.wikimedia.org/P46777 and previous config saved to /var/cache/conftool/dbconfig/20230414-171953-ladsgroup.json
17:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T333332)', diff saved to https://phabricator.wikimedia.org/P46776 and previous config saved to /var/cache/conftool/dbconfig/20230414-171911-ladsgroup.json
17:17 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1014.eqiad.wmnet with OS bullseye
17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T333332)', diff saved to https://phabricator.wikimedia.org/P46775 and previous config saved to /var/cache/conftool/dbconfig/20230414-171702-ladsgroup.json
17:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1202.eqiad.wmnet with reason: Maintenance
17:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1202.eqiad.wmnet with reason: Maintenance
17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T333332)', diff saved to https://phabricator.wikimedia.org/P46774 and previous config saved to /var/cache/conftool/dbconfig/20230414-171638-ladsgroup.json
17:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1015.eqiad.wmnet with OS bullseye
17:15 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P46773 and previous config saved to /var/cache/conftool/dbconfig/20230414-171320-ladsgroup.json
17:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
17:10 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
17:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
17:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P46772 and previous config saved to /var/cache/conftool/dbconfig/20230414-170447-ladsgroup.json
17:04 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
17:04 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
17:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1015.eqiad.wmnet with reason: host reimage
17:02 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P46771 and previous config saved to /var/cache/conftool/dbconfig/20230414-170133-ladsgroup.json
17:00 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1015.eqiad.wmnet with reason: host reimage
16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T333332)', diff saved to https://phabricator.wikimedia.org/P46770 and previous config saved to /var/cache/conftool/dbconfig/20230414-165814-ladsgroup.json
16:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P46769 and previous config saved to /var/cache/conftool/dbconfig/20230414-164940-ladsgroup.json
16:47 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1015.eqiad.wmnet with OS bullseye
16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P46768 and previous config saved to /var/cache/conftool/dbconfig/20230414-164627-ladsgroup.json
16:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
16:38 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
16:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T333332)', diff saved to https://phabricator.wikimedia.org/P46767 and previous config saved to /var/cache/conftool/dbconfig/20230414-163434-ladsgroup.json
16:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T333332)', diff saved to https://phabricator.wikimedia.org/P46766 and previous config saved to /var/cache/conftool/dbconfig/20230414-163221-ladsgroup.json
16:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
16:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
16:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
16:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
16:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T333332)', diff saved to https://phabricator.wikimedia.org/P46765 and previous config saved to /var/cache/conftool/dbconfig/20230414-163120-ladsgroup.json
16:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T333332)', diff saved to https://phabricator.wikimedia.org/P46764 and previous config saved to /var/cache/conftool/dbconfig/20230414-163110-ladsgroup.json
16:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T333332)', diff saved to https://phabricator.wikimedia.org/P46763 and previous config saved to /var/cache/conftool/dbconfig/20230414-162911-ladsgroup.json
16:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1194.eqiad.wmnet with reason: Maintenance
16:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1194.eqiad.wmnet with reason: Maintenance
16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T333332)', diff saved to https://phabricator.wikimedia.org/P46762 and previous config saved to /var/cache/conftool/dbconfig/20230414-162848-ladsgroup.json
16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P46761 and previous config saved to /var/cache/conftool/dbconfig/20230414-161604-ladsgroup.json
16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P46760 and previous config saved to /var/cache/conftool/dbconfig/20230414-161341-ladsgroup.json
16:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bullseye
16:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P46759 and previous config saved to /var/cache/conftool/dbconfig/20230414-160058-ladsgroup.json
15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P46758 and previous config saved to /var/cache/conftool/dbconfig/20230414-155835-ladsgroup.json
15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T333332)', diff saved to https://phabricator.wikimedia.org/P46757 and previous config saved to /var/cache/conftool/dbconfig/20230414-155758-ladsgroup.json
15:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1177.eqiad.wmnet with reason: Maintenance
15:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1177.eqiad.wmnet with reason: Maintenance
15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T333332)', diff saved to https://phabricator.wikimedia.org/P46756 and previous config saved to /var/cache/conftool/dbconfig/20230414-155735-ladsgroup.json
15:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
15:52 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
15:52 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
15:50 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
15:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T333332)', diff saved to https://phabricator.wikimedia.org/P46755 and previous config saved to /var/cache/conftool/dbconfig/20230414-154551-ladsgroup.json
15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T333332)', diff saved to https://phabricator.wikimedia.org/P46754 and previous config saved to /var/cache/conftool/dbconfig/20230414-154339-ladsgroup.json
15:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T333332)', diff saved to https://phabricator.wikimedia.org/P46753 and previous config saved to /var/cache/conftool/dbconfig/20230414-154329-ladsgroup.json
15:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T333332)', diff saved to https://phabricator.wikimedia.org/P46752 and previous config saved to /var/cache/conftool/dbconfig/20230414-154316-ladsgroup.json
15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P46751 and previous config saved to /var/cache/conftool/dbconfig/20230414-154228-ladsgroup.json
15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T333332)', diff saved to https://phabricator.wikimedia.org/P46750 and previous config saved to /var/cache/conftool/dbconfig/20230414-154119-ladsgroup.json
15:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1191.eqiad.wmnet with reason: Maintenance
15:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1191.eqiad.wmnet with reason: Maintenance
15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T333332)', diff saved to https://phabricator.wikimedia.org/P46749 and previous config saved to /var/cache/conftool/dbconfig/20230414-154056-ladsgroup.json
15:36 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bullseye
15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P46748 and previous config saved to /var/cache/conftool/dbconfig/20230414-152809-ladsgroup.json
15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P46747 and previous config saved to /var/cache/conftool/dbconfig/20230414-152722-ladsgroup.json
15:26 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P46746 and previous config saved to /var/cache/conftool/dbconfig/20230414-152550-ladsgroup.json
15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P46745 and previous config saved to /var/cache/conftool/dbconfig/20230414-151303-ladsgroup.json
15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T333332)', diff saved to https://phabricator.wikimedia.org/P46744 and previous config saved to /var/cache/conftool/dbconfig/20230414-151216-ladsgroup.json
15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T333332)', diff saved to https://phabricator.wikimedia.org/P46743 and previous config saved to /var/cache/conftool/dbconfig/20230414-151108-ladsgroup.json
15:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1172.eqiad.wmnet with reason: Maintenance
15:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1172.eqiad.wmnet with reason: Maintenance
15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P46742 and previous config saved to /var/cache/conftool/dbconfig/20230414-151043-ladsgroup.json
15:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
15:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T333332)', diff saved to https://phabricator.wikimedia.org/P46741 and previous config saved to /var/cache/conftool/dbconfig/20230414-151037-ladsgroup.json
15:04 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
15:04 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T333332)', diff saved to https://phabricator.wikimedia.org/P46740 and previous config saved to /var/cache/conftool/dbconfig/20230414-145756-ladsgroup.json
14:55 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw1349.eqiad.wmnet
14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T333332)', diff saved to https://phabricator.wikimedia.org/P46739 and previous config saved to /var/cache/conftool/dbconfig/20230414-145544-ladsgroup.json
14:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T333332)', diff saved to https://phabricator.wikimedia.org/P46738 and previous config saved to /var/cache/conftool/dbconfig/20230414-145537-ladsgroup.json
14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P46737 and previous config saved to /var/cache/conftool/dbconfig/20230414-145531-ladsgroup.json
14:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T333332)', diff saved to https://phabricator.wikimedia.org/P46736 and previous config saved to /var/cache/conftool/dbconfig/20230414-145521-ladsgroup.json
14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T333332)', diff saved to https://phabricator.wikimedia.org/P46735 and previous config saved to /var/cache/conftool/dbconfig/20230414-145327-ladsgroup.json
14:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
14:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
14:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
14:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46734 and previous config saved to /var/cache/conftool/dbconfig/20230414-145245-ladsgroup.json
14:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pybal-test2002.codfw.wmnet
14:49 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:49 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pybal-test2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
14:48 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pybal-test2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
14:45 sukhe@cumin2002: START - Cookbook sre.dns.netbox
14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P46733 and previous config saved to /var/cache/conftool/dbconfig/20230414-144024-ladsgroup.json
14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P46732 and previous config saved to /var/cache/conftool/dbconfig/20230414-144014-ladsgroup.json
14:38 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:38 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mngmt dns fundrasing - jclark@cumin1001"
14:38 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts pybal-test2002.codfw.wmnet
14:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pybal-test2001.codfw.wmnet
14:38 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P46731 and previous config saved to /var/cache/conftool/dbconfig/20230414-143738-ladsgroup.json
14:37 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mngmt dns fundrasing - jclark@cumin1001"
14:36 sukhe@cumin2002: START - Cookbook sre.dns.netbox
14:35 jclark@cumin1001: START - Cookbook sre.dns.netbox
14:32 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts pybal-test2001.codfw.wmnet
14:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
14:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
14:29 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
14:27 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T333332)', diff saved to https://phabricator.wikimedia.org/P46730 and previous config saved to /var/cache/conftool/dbconfig/20230414-142518-ladsgroup.json
14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P46729 and previous config saved to /var/cache/conftool/dbconfig/20230414-142508-ladsgroup.json
14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P46728 and previous config saved to /var/cache/conftool/dbconfig/20230414-142232-ladsgroup.json
14:21 claime: rebooting list1001 for cpu bump
14:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
14:11 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T333332)', diff saved to https://phabricator.wikimedia.org/P46727 and previous config saved to /var/cache/conftool/dbconfig/20230414-141002-ladsgroup.json
14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T333332)', diff saved to https://phabricator.wikimedia.org/P46726 and previous config saved to /var/cache/conftool/dbconfig/20230414-140749-ladsgroup.json
14:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1132.eqiad.wmnet with reason: Maintenance
14:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1132.eqiad.wmnet with reason: Maintenance
14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46725 and previous config saved to /var/cache/conftool/dbconfig/20230414-140725-ladsgroup.json
14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46724 and previous config saved to /var/cache/conftool/dbconfig/20230414-140616-ladsgroup.json
14:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
14:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T333332)', diff saved to https://phabricator.wikimedia.org/P46723 and previous config saved to /var/cache/conftool/dbconfig/20230414-140553-ladsgroup.json
14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T333332)', diff saved to https://phabricator.wikimedia.org/P46722 and previous config saved to /var/cache/conftool/dbconfig/20230414-140401-ladsgroup.json
14:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
14:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
14:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1167.eqiad.wmnet with reason: Maintenance
14:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1167.eqiad.wmnet with reason: Maintenance
14:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1116.eqiad.wmnet with reason: Maintenance
14:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1116.eqiad.wmnet with reason: Maintenance
14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T333332)', diff saved to https://phabricator.wikimedia.org/P46721 and previous config saved to /var/cache/conftool/dbconfig/20230414-140258-ladsgroup.json
13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P46720 and previous config saved to /var/cache/conftool/dbconfig/20230414-135220-ladsgroup.json
13:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P46719 and previous config saved to /var/cache/conftool/dbconfig/20230414-135047-ladsgroup.json
13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P46718 and previous config saved to /var/cache/conftool/dbconfig/20230414-134751-ladsgroup.json
13:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
13:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
13:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
13:42 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
13:37 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS buster
13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P46717 and previous config saved to /var/cache/conftool/dbconfig/20230414-133714-ladsgroup.json
13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P46716 and previous config saved to /var/cache/conftool/dbconfig/20230414-133540-ladsgroup.json
13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P46715 and previous config saved to /var/cache/conftool/dbconfig/20230414-133245-ladsgroup.json
13:31 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:30 jclark@cumin1001: START - Cookbook sre.dns.netbox
13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T333332)', diff saved to https://phabricator.wikimedia.org/P46714 and previous config saved to /var/cache/conftool/dbconfig/20230414-132208-ladsgroup.json
13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T333332)', diff saved to https://phabricator.wikimedia.org/P46713 and previous config saved to /var/cache/conftool/dbconfig/20230414-132034-ladsgroup.json
13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T333332)', diff saved to https://phabricator.wikimedia.org/P46712 and previous config saved to /var/cache/conftool/dbconfig/20230414-131956-ladsgroup.json
13:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1128.eqiad.wmnet with reason: Maintenance
13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1128.eqiad.wmnet with reason: Maintenance
13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T333332)', diff saved to https://phabricator.wikimedia.org/P46711 and previous config saved to /var/cache/conftool/dbconfig/20230414-131932-ladsgroup.json
13:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T333332)', diff saved to https://phabricator.wikimedia.org/P46710 and previous config saved to /var/cache/conftool/dbconfig/20230414-131824-ladsgroup.json
13:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
13:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
13:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T333332)', diff saved to https://phabricator.wikimedia.org/P46709 and previous config saved to /var/cache/conftool/dbconfig/20230414-131739-ladsgroup.json
13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T333332)', diff saved to https://phabricator.wikimedia.org/P46708 and previous config saved to /var/cache/conftool/dbconfig/20230414-131631-ladsgroup.json
13:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1114.eqiad.wmnet with reason: Maintenance
13:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1114.eqiad.wmnet with reason: Maintenance
13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T333332)', diff saved to https://phabricator.wikimedia.org/P46707 and previous config saved to /var/cache/conftool/dbconfig/20230414-131607-ladsgroup.json
13:11 ottomata: granting IdempotentWrite on kafka jumbo-eqiad cluster to User:ANONYMOUS - this will allow for use of newer kafka producers that have enabled transactional writes by default. `kafka acls --add --allow-principal User:ANONYMOUS --cluster --operation IdempotentWrite`
13:07 ottomata: creating User:ANONYMOUS ACLs on kafka-test cluster https://wikitech.wikimedia.org/wiki/Kafka/Administration#Kafka_ACLs
13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P46706 and previous config saved to /var/cache/conftool/dbconfig/20230414-130426-ladsgroup.json
13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P46705 and previous config saved to /var/cache/conftool/dbconfig/20230414-130234-ladsgroup.json
13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P46704 and previous config saved to /var/cache/conftool/dbconfig/20230414-130101-ladsgroup.json
12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P46703 and previous config saved to /var/cache/conftool/dbconfig/20230414-124920-ladsgroup.json
12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P46702 and previous config saved to /var/cache/conftool/dbconfig/20230414-124727-ladsgroup.json
12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P46701 and previous config saved to /var/cache/conftool/dbconfig/20230414-124553-ladsgroup.json
12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T333332)', diff saved to https://phabricator.wikimedia.org/P46700 and previous config saved to /var/cache/conftool/dbconfig/20230414-123413-ladsgroup.json
12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T333332)', diff saved to https://phabricator.wikimedia.org/P46699 and previous config saved to /var/cache/conftool/dbconfig/20230414-123221-ladsgroup.json
12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T333332)', diff saved to https://phabricator.wikimedia.org/P46698 and previous config saved to /var/cache/conftool/dbconfig/20230414-123201-ladsgroup.json
12:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
12:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
12:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T333332)', diff saved to https://phabricator.wikimedia.org/P46697 and previous config saved to /var/cache/conftool/dbconfig/20230414-123138-ladsgroup.json
12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T333332)', diff saved to https://phabricator.wikimedia.org/P46696 and previous config saved to /var/cache/conftool/dbconfig/20230414-123047-ladsgroup.json
12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T333332)', diff saved to https://phabricator.wikimedia.org/P46695 and previous config saved to /var/cache/conftool/dbconfig/20230414-123011-ladsgroup.json
12:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1136.eqiad.wmnet with reason: Maintenance
12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1136.eqiad.wmnet with reason: Maintenance
12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T333332)', diff saved to https://phabricator.wikimedia.org/P46694 and previous config saved to /var/cache/conftool/dbconfig/20230414-122948-ladsgroup.json
12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T333332)', diff saved to https://phabricator.wikimedia.org/P46693 and previous config saved to /var/cache/conftool/dbconfig/20230414-122939-ladsgroup.json
12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1111.eqiad.wmnet with reason: Maintenance
12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1111.eqiad.wmnet with reason: Maintenance
12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T333332)', diff saved to https://phabricator.wikimedia.org/P46692 and previous config saved to /var/cache/conftool/dbconfig/20230414-122915-ladsgroup.json
12:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P46691 and previous config saved to /var/cache/conftool/dbconfig/20230414-121632-ladsgroup.json
12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P46690 and previous config saved to /var/cache/conftool/dbconfig/20230414-121442-ladsgroup.json
12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P46689 and previous config saved to /var/cache/conftool/dbconfig/20230414-121409-ladsgroup.json
12:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P46688 and previous config saved to /var/cache/conftool/dbconfig/20230414-120125-ladsgroup.json
11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P46687 and previous config saved to /var/cache/conftool/dbconfig/20230414-115935-ladsgroup.json
11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P46686 and previous config saved to /var/cache/conftool/dbconfig/20230414-115903-ladsgroup.json
11:50 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
11:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T333332)', diff saved to https://phabricator.wikimedia.org/P46685 and previous config saved to /var/cache/conftool/dbconfig/20230414-114619-ladsgroup.json
11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T333332)', diff saved to https://phabricator.wikimedia.org/P46684 and previous config saved to /var/cache/conftool/dbconfig/20230414-114429-ladsgroup.json
11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T333332)', diff saved to https://phabricator.wikimedia.org/P46683 and previous config saved to /var/cache/conftool/dbconfig/20230414-114407-ladsgroup.json
11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1118.eqiad.wmnet with reason: Maintenance
11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T333332)', diff saved to https://phabricator.wikimedia.org/P46682 and previous config saved to /var/cache/conftool/dbconfig/20230414-114356-ladsgroup.json
11:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1118.eqiad.wmnet with reason: Maintenance
11:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
11:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T333332)', diff saved to https://phabricator.wikimedia.org/P46681 and previous config saved to /var/cache/conftool/dbconfig/20230414-114219-ladsgroup.json
11:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1109 (T333332)', diff saved to https://phabricator.wikimedia.org/P46680 and previous config saved to /var/cache/conftool/dbconfig/20230414-114148-ladsgroup.json
11:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1109.eqiad.wmnet with reason: Maintenance
11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1109.eqiad.wmnet with reason: Maintenance
11:34 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
10:49 kamila@deploy2002: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
10:43 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1120.eqiad.wmnet
10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1120.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
10:39 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1120.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
10:37 marostegui@cumin1001: START - Cookbook sre.dns.netbox
10:32 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1120.eqiad.wmnet
10:26 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
10:08 kamila@deploy2002: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
09:53 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw2.*.codfw.wmnet,cluster=api_appserver
09:53 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw2.*.codfw.wmnet,cluster=appserver
09:45 kamila@deploy2002: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
09:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid
09:21 kamila@deploy2002: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
09:16 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2002.codfw.wmnet with reason: systemd package upgrade
09:16 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2002.codfw.wmnet with reason: systemd package upgrade
08:51 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
08:35 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
08:21 arturo: aborrero@apt2001:~ $ sudo -i reprepro --noskipold --component thirdparty/kubeadm-k8s-1-23 update buster-wikimedia (T298005)
07:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
07:39 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1100 T329352', diff saved to https://phabricator.wikimedia.org/P46679 and previous config saved to /var/cache/conftool/dbconfig/20230414-062553-marostegui.json
06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1107.eqiad.wmnet
06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1107.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
06:08 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1107.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
06:06 marostegui@cumin1001: START - Cookbook sre.dns.netbox
06:01 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1107.eqiad.wmnet
04:04 ejegg: SmashPig upgraded from 24d700f4 to db9fa965
01:37 fab@deploy2002: Finished deploy [airflow-dags/research@f8dad05]: (no justification provided) (duration: 00m 11s)
01:37 fab@deploy2002: Started deploy [airflow-dags/research@f8dad05]: (no justification provided)
01:07 fab@deploy2002: Finished deploy [airflow-dags/research@f8dad05]: (no justification provided) (duration: 00m 10s)
01:07 fab@deploy2002: Started deploy [airflow-dags/research@f8dad05]: (no justification provided)
01:01 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
00:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
00:04 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye

2023-04-13

23:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
23:41 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
23:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
23:16 ejegg: civicrm upgraded from 2d5ede8d to cd0f886d
22:00 ryankemper: T333656 `ryankemper@dns1001:~$ sudo -i authdns-update` after merge of https://gerrit.wikimedia.org/r/905754 => `OK - authdns-update successful on all nodes!`
21:38 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
21:37 SandraEbele: Successfully Deployed analytics refinery using scap, then deployed onto hdfs.
21:28 mutante: https://query-preview.wikidata.org has been deactivated at ATS layer - T333656
21:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
21:25 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
21:10 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.4 refs T330210
21:03 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
21:02 mutante: doc1002 (doc.wikimedia.org) - switching from PHP 7.3 to 7.4 - systemctl stop php7.3-fpm, restart php7.4-fpm, apt-get remove --purge php7.3*, systemctl restart apache2. - all tests still working (on deployment server: httpbb --hosts doc1002.eqiad.wmnet /srv/deployment/httpbb-tests/doc/test_doc.yaml) T322357 T319477
21:01 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
20:55 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on doc1002.eqiad.wmnet with reason: maintenance
20:55 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on doc1002.eqiad.wmnet with reason: maintenance
20:55 urbanecm@deploy2002: Finished scap: Backport for gerrit:907743 Only log 'visualEditorFeatureUse' events if 'editAttemptStep' events are being logged (T334157), gerrit:905751 Stop using redundant $wmg variable for MobileFrontend extension (T119117) (duration: 06m 26s)
20:50 urbanecm@deploy2002: urbanecm and matmarex: Backport for gerrit:907743 Only log 'visualEditorFeatureUse' events if 'editAttemptStep' events are being logged (T334157), gerrit:905751 Stop using redundant $wmg variable for MobileFrontend extension (T119117) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
20:48 urbanecm@deploy2002: Started scap: Backport for gerrit:907743 Only log 'visualEditorFeatureUse' events if 'editAttemptStep' events are being logged (T334157), gerrit:905751 Stop using redundant $wmg variable for MobileFrontend extension (T119117)
20:46 mutante: doc2001 - systemctl stop php7.3-fpm; systemctl restart php7.4-fpm - needed because after gerrit:901612 we had BOTH PHP versions, 7.3 and 7.4 running their own php-fpm process, also packages for both versions are installed, so also manual package removal needed - apt-get remove php7.3* T322357 T319477
20:38 urbanecm@deploy2002: Finished scap: Backport for gerrit:908614 enwiki: Remove userrights from `founder` (T334692) (duration: 05m 55s)
20:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
20:34 urbanecm@deploy2002: urbanecm: Backport for gerrit:908614 enwiki: Remove userrights from `founder` (T334692) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
20:32 urbanecm@deploy2002: Started scap: Backport for gerrit:908614 enwiki: Remove userrights from `founder` (T334692)
20:32 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:908618|[wikitech] Add a logo and a wordmark for Vector 2022 (T334666)]] (duration: 05m 41s)
20:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
20:27 urbanecm@deploy2002: superpes and urbanecm: Backport for [[gerrit:908618|[wikitech] Add a logo and a wordmark for Vector 2022 (T334666)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
20:27 mutante: doc2001 - switching PHP version from 7.3 to 7.4 for T322357
20:26 urbanecm@deploy2002: Started scap: Backport for [[gerrit:908618|[wikitech] Add a logo and a wordmark for Vector 2022 (T334666)]]
20:25 urbanecm@deploy2002: Finished scap: Backport for gerrit:908607 Enable mobile page tabs for everyone in ruwiki (T334395) (duration: 06m 49s)
20:20 urbanecm@deploy2002: urbanecm and matmarex: Backport for gerrit:908607 Enable mobile page tabs for everyone in ruwiki (T334395) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
20:19 urbanecm@deploy2002: Started scap: Backport for gerrit:908607 Enable mobile page tabs for everyone in ruwiki (T334395)
20:15 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
20:15 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.3 refs T330210
20:14 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
19:55 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.4 refs T330210
19:29 sukhe: restart pybal on lvs2009 to pick up bgp-med change and pool
19:25 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2009
19:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
19:25 brett@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2009
19:25 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2009
19:25 brett@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2009
19:18 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
19:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
19:03 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.3 refs T330210
18:59 bblack: lvs1020: restart pybal for experiment...
18:58 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
18:57 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
18:56 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
18:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2009.codfw.wmnet with OS bullseye
18:46 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
18:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
18:44 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
18:44 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
18:42 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
18:38 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2009.codfw.wmnet with reason: host reimage
18:35 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2009.codfw.wmnet with reason: host reimage
18:34 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.4 refs T330210
18:26 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
18:26 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
18:23 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
18:23 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
18:16 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2009.codfw.wmnet with OS bullseye
18:07 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
18:07 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
17:57 brett: Disable Puppet/PyBal on lvs2009 in preparation for reimaging - T321309
17:55 brett: restarting pybal on lvs2008 to pick up bgp-med change
17:49 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: T334057
17:48 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: T334057
17:46 brett@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2008
17:46 brett@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2008
17:37 brett@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2008
17:37 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2008
17:37 brett@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2008
17:36 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2008
17:31 ejegg: payments-wiki upgraded from 4dcba0a9 to c01a32c4
17:30 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2008.codfw.wmnet with OS bullseye
17:28 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
17:28 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
17:28 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
17:28 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
17:27 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:27 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
17:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2008.codfw.wmnet with reason: host reimage
17:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2008.codfw.wmnet with reason: host reimage
16:49 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2008.codfw.wmnet with OS bullseye
16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
16:31 sukhe: sudo cumin -b1 -s30 'A:cp-text' 'ats-backend-restart': T332650
16:28 jhancock@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be2067']
16:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2067']
16:27 sukhe: enable puppet on A:cp-text to merge CR 907937
16:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
16:21 sukhe: disable puppet on A:cp-text to merge CR 907937
16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage
16:10 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage
16:05 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
16:04 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
15:58 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS buster
15:51 ebysans@deploy2002: Finished deploy [analytics/refinery@4e8f1ac] (hadoop-test): Update druid pageview hourly and daily tables TEST [analytics/refinery@4e8f1ac] (duration: 01m 26s)
15:51 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
15:51 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1001"
15:50 ebysans@deploy2002: Started deploy [analytics/refinery@4e8f1ac] (hadoop-test): Update druid pageview hourly and daily tables TEST [analytics/refinery@4e8f1ac]
15:49 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1001"
15:49 ebysans@deploy2002: Finished deploy [analytics/refinery@4e8f1ac] (thin): Update druid pageview hourly and daily tables THIN [analytics/refinery@4e8f1ac] (duration: 00m 08s)
15:49 ebysans@deploy2002: Started deploy [analytics/refinery@4e8f1ac] (thin): Update druid pageview hourly and daily tables THIN [analytics/refinery@4e8f1ac]
15:49 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1132.eqiad.wmnet with OS buster
15:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
15:48 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1001"
15:47 ebysans@deploy2002: Finished deploy [analytics/refinery@4e8f1ac]: Update druid pageview hourly and daily tables [analytics/refinery@4e8f1ac] (duration: 06m 24s)
15:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1003.eqiad.wmnet with OS bullseye
15:47 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1001"
15:46 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1001"
15:46 brett: Disable Puppet/PyBal on lvs2008 in preparation for reimaging - T321309
15:44 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1001"
15:42 SandraEbele: paused Oozie pageview-druid-hourly job.
15:41 ebysans@deploy2002: Started deploy [analytics/refinery@4e8f1ac]: Update druid pageview hourly and daily tables [analytics/refinery@4e8f1ac]
15:36 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2007
15:36 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2007
15:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
15:31 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
15:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
15:30 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS buster
15:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
15:29 SandraEbele: deploying analytics refinery-update pageview druid table
15:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
15:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
15:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
15:25 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
15:24 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
15:24 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
15:23 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1132.eqiad.wmnet with OS buster
15:22 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
15:22 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
15:19 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
15:19 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
15:17 claime: cxserver migrated to mw-api-int on kubernetes, take three - T334204
15:14 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
15:13 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
15:13 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
15:13 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
15:13 moritzm: remove runc packages installed on mw1349-mw1436, these were once used for a load test with dragonfly and are no longer needed
15:12 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
15:11 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
15:10 claime: Migrating cxserver to mw-api-int on kubernetes, take three - T334204
15:10 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
15:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
15:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1003.eqiad.wmnet with OS bullseye
15:07 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
15:06 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
15:06 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
15:05 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
15:04 moritzm: installing unbound security updates on buster
15:03 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
15:03 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
15:00 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirtlocal1003.eqiad.wmnet with OS bullseye
14:49 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
14:41 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
14:39 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
14:36 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
14:36 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
14:26 sukhe: restart pybal on lvs2007 to pick up bgp-med change CR 908552
14:23 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:23 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:20 moritzm: installing mariadb-10.3 security updates (as shipped in Debian, not the wmf-mariadb packages)
14:19 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
14:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1002.eqiad.wmnet
14:09 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
14:06 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
14:06 kamila@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
14:05 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1002.eqiad.wmnet
14:04 vgutierrez: rolling restart of HAProxy on A:cp-text - T334448
14:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
13:54 sukhe: [puppetmaster] sudo /usr/local/sbin/puppet-facts-upload --proxy http://webproxy.eqiad.wmnet:8080; failing PCC for recently reimaged node
13:45 mbsantos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
13:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1003.eqiad.wmnet with OS bullseye
13:45 mbsantos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
13:44 andrew@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cloudvirtlocal1003']
13:44 andrew@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirtlocal1003']
13:43 jelto@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gitlab2003.wikimedia.org with OS bullseye
13:43 mbsantos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
13:42 mbsantos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46674 and previous config saved to /var/cache/conftool/dbconfig/20230413-134030-root.json
13:38 mbsantos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
13:37 mbsantos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
13:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
13:31 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2007
13:30 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2007
13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46673 and previous config saved to /var/cache/conftool/dbconfig/20230413-132525-root.json
13:23 vgutierrez: restarting haproxy in cp5022 - T334448
13:19 jgiannelos@deploy2002: Finished deploy [restbase/deploy@a08f56d]: (no justification provided) (duration: 17m 02s)
13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46672 and previous config saved to /var/cache/conftool/dbconfig/20230413-131021-root.json
13:04 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
13:02 jgiannelos@deploy2002: Started deploy [restbase/deploy@a08f56d]: (no justification provided)
12:57 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
12:57 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
12:56 claime: Migrating cxserver to mw-api-int on kubernetes, take two - T334204
12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46671 and previous config saved to /var/cache/conftool/dbconfig/20230413-125516-root.json
12:49 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46670 and previous config saved to /var/cache/conftool/dbconfig/20230413-124011-root.json
12:38 moritzm: installing systemd security updates on buster
12:33 moritzm: installing Django security updates
12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46669 and previous config saved to /var/cache/conftool/dbconfig/20230413-122506-root.json
12:21 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3001.esams.wmnet
12:21 moritzm: remove imagemagick 8:6.9.10.23+dfsg-2.1+deb10u1+wmf1 from apt.wikimedia.org (obsoleted by 8:6.9.10.23+dfsg-2.1+deb10u4 from the Debian archive) T328901
12:15 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus3001.esams.wmnet
12:11 moritzm: installing imagemagick security updates for buster T328901
12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P46668 and previous config saved to /var/cache/conftool/dbconfig/20230413-121001-root.json
11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P46667 and previous config saved to /var/cache/conftool/dbconfig/20230413-115456-root.json
11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P46666 and previous config saved to /var/cache/conftool/dbconfig/20230413-113951-root.json
11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1120 from dbctl T334580', diff saved to https://phabricator.wikimedia.org/P46665 and previous config saved to /var/cache/conftool/dbconfig/20230413-113435-marostegui.json
11:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46664 and previous config saved to /var/cache/conftool/dbconfig/20230413-112446-root.json
11:24 moritzm: installing imagemagick security updates
11:18 cgoubert@deploy2002: Finished scap: Updating mw-on-k8s certificates (duration: 01m 56s)
11:16 cgoubert@deploy2002: Started scap: Updating mw-on-k8s certificates
11:15 claime: Re-deploying mw-on-k8s to update certificates - T334561
10:39 claime: updating appservers and api certificates - T334561
10:23 Emperor: clear old 2/22/Free-object-universal-property.svg thumbs from wikipedia-commons-local-thumb.22 T334303
10:15 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS buster
10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 100%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46662 and previous config saved to /var/cache/conftool/dbconfig/20230413-101307-root.json
10:07 moritzm: installing tomcat security updates
09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 75%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46661 and previous config saved to /var/cache/conftool/dbconfig/20230413-095802-root.json
09:53 taavi: taavi@mwmaint2002 ~ $ mwscript emptyUserGroup.php --wiki frwikinews editor # T333750
09:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 50%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46660 and previous config saved to /var/cache/conftool/dbconfig/20230413-094257-root.json
09:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 25%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46659 and previous config saved to /var/cache/conftool/dbconfig/20230413-092752-root.json
09:25 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dse-k8s-worker1001.eqiad.wmnet
09:22 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1132.eqiad.wmnet with OS buster
09:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 10%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46658 and previous config saved to /var/cache/conftool/dbconfig/20230413-091247-root.json
09:12 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1001.eqiad.wmnet
09:04 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: centrallog1001.eqiad.wmnet
09:04 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: centrallog1001.eqiad.wmnet
09:01 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS buster
08:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 5%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46657 and previous config saved to /var/cache/conftool/dbconfig/20230413-085742-root.json
08:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 4%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46656 and previous config saved to /var/cache/conftool/dbconfig/20230413-084238-root.json
08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 100%: Pooling', diff saved to https://phabricator.wikimedia.org/P46655 and previous config saved to /var/cache/conftool/dbconfig/20230413-084036-root.json
08:36 moritzm: installing git security updates
08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 100%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46654 and previous config saved to /var/cache/conftool/dbconfig/20230413-083457-root.json
08:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 3%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46653 and previous config saved to /var/cache/conftool/dbconfig/20230413-082732-root.json
08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 75%: Pooling', diff saved to https://phabricator.wikimedia.org/P46652 and previous config saved to /var/cache/conftool/dbconfig/20230413-082532-root.json
08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 75%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46651 and previous config saved to /var/cache/conftool/dbconfig/20230413-081952-root.json
08:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 2%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46650 and previous config saved to /var/cache/conftool/dbconfig/20230413-081227-root.json
08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 50%: Pooling', diff saved to https://phabricator.wikimedia.org/P46649 and previous config saved to /var/cache/conftool/dbconfig/20230413-081027-root.json
08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 50%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46647 and previous config saved to /var/cache/conftool/dbconfig/20230413-080447-root.json
08:00 moritzm: imported perccli 007.1910.0000.000 to bookworm-wikimedia-private T330495
07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 1%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46646 and previous config saved to /var/cache/conftool/dbconfig/20230413-075722-root.json
07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 25%: Pooling', diff saved to https://phabricator.wikimedia.org/P46645 and previous config saved to /var/cache/conftool/dbconfig/20230413-075522-root.json
07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1223 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P46644 and previous config saved to /var/cache/conftool/dbconfig/20230413-075513-marostegui.json
07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 25%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46643 and previous config saved to /var/cache/conftool/dbconfig/20230413-074942-root.json
07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 10%: Pooling', diff saved to https://phabricator.wikimedia.org/P46642 and previous config saved to /var/cache/conftool/dbconfig/20230413-074010-root.json
07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 10%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46641 and previous config saved to /var/cache/conftool/dbconfig/20230413-073437-root.json
07:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 10 hosts with reason: Cloning db1117
07:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on 10 hosts with reason: Cloning db1117
07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 5%: Pooling', diff saved to https://phabricator.wikimedia.org/P46639 and previous config saved to /var/cache/conftool/dbconfig/20230413-072505-root.json
07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 5%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46638 and previous config saved to /var/cache/conftool/dbconfig/20230413-071932-root.json
07:14 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye
07:14 slyngs: Puppet: move htcacheclean to httpd class https://gerrit.wikimedia.org/r/c/operations/puppet/+/904102
07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 4%: Pooling', diff saved to https://phabricator.wikimedia.org/P46637 and previous config saved to /var/cache/conftool/dbconfig/20230413-071000-root.json
07:09 moritzm: update bookworm installer to rc1 T330495
07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 4%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46636 and previous config saved to /var/cache/conftool/dbconfig/20230413-070428-root.json
06:59 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
06:56 jelto@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 3%: Pooling', diff saved to https://phabricator.wikimedia.org/P46635 and previous config saved to /var/cache/conftool/dbconfig/20230413-065456-root.json
06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 3%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46634 and previous config saved to /var/cache/conftool/dbconfig/20230413-064922-root.json
06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 to clone db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46632 and previous config saved to /var/cache/conftool/dbconfig/20230413-064452-marostegui.json
06:43 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 2%: Pooling', diff saved to https://phabricator.wikimedia.org/P46631 and previous config saved to /var/cache/conftool/dbconfig/20230413-063951-root.json
06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 2%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46630 and previous config saved to /var/cache/conftool/dbconfig/20230413-063417-root.json
06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 1%: Pooling', diff saved to https://phabricator.wikimedia.org/P46629 and previous config saved to /var/cache/conftool/dbconfig/20230413-062446-root.json
06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1221 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P46628 and previous config saved to /var/cache/conftool/dbconfig/20230413-062231-marostegui.json
06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 1%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46627 and previous config saved to /var/cache/conftool/dbconfig/20230413-061913-root.json
06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1210 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P46626 and previous config saved to /var/cache/conftool/dbconfig/20230413-061716-marostegui.json
02:08 fab@deploy2002: Finished deploy [airflow-dags/research@f8dad05]: (no justification provided) (duration: 00m 10s)
02:07 fab@deploy2002: Started deploy [airflow-dags/research@f8dad05]: (no justification provided)
02:01 fab@deploy2002: Finished deploy [airflow-dags/research@f8dad05]: (no justification provided) (duration: 00m 11s)
02:01 fab@deploy2002: Started deploy [airflow-dags/research@f8dad05]: (no justification provided)
02:00 ejegg: civicrm upgraded from 0f37f981 to 2d5ede8d
01:41 fab@deploy2002: Finished deploy [airflow-dags/research@f8dad05]: (no justification provided) (duration: 00m 10s)
01:41 fab@deploy2002: Started deploy [airflow-dags/research@f8dad05]: (no justification provided)
01:23 fab@deploy2002: Finished deploy [airflow-dags/research@f8dad05]: (no justification provided) (duration: 00m 10s)
01:23 fab@deploy2002: Started deploy [airflow-dags/research@f8dad05]: (no justification provided)
00:22 krinkle@deploy2002: Finished deploy [integration/docroot@f68055d]: (no justification provided) (duration: 00m 28s)
00:21 krinkle@deploy2002: Started deploy [integration/docroot@f68055d]: (no justification provided)

2023-04-12

23:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T333332)', diff saved to https://phabricator.wikimedia.org/P46625 and previous config saved to /var/cache/conftool/dbconfig/20230412-230933-ladsgroup.json
22:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P46624 and previous config saved to /var/cache/conftool/dbconfig/20230412-225427-ladsgroup.json
22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P46623 and previous config saved to /var/cache/conftool/dbconfig/20230412-223921-ladsgroup.json
22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T333332)', diff saved to https://phabricator.wikimedia.org/P46622 and previous config saved to /var/cache/conftool/dbconfig/20230412-222414-ladsgroup.json
22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T333332)', diff saved to https://phabricator.wikimedia.org/P46621 and previous config saved to /var/cache/conftool/dbconfig/20230412-222141-ladsgroup.json
22:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2182.codfw.wmnet with reason: Maintenance
22:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2182.codfw.wmnet with reason: Maintenance
22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46620 and previous config saved to /var/cache/conftool/dbconfig/20230412-222117-ladsgroup.json
22:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P46619 and previous config saved to /var/cache/conftool/dbconfig/20230412-220611-ladsgroup.json
21:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2007.codfw.wmnet with OS bullseye
21:52 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for sessionstore1001.eqiad.wmnet
21:52 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for sessionstore1001.eqiad.wmnet
21:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P46618 and previous config saved to /var/cache/conftool/dbconfig/20230412-215104-ladsgroup.json
21:38 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2007.codfw.wmnet with reason: host reimage
21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46617 and previous config saved to /var/cache/conftool/dbconfig/20230412-213558-ladsgroup.json
21:35 urandom: restarting Cassandra —sessionstore1001— to reenable native transport — T327954
21:35 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2007.codfw.wmnet with reason: host reimage
21:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46616 and previous config saved to /var/cache/conftool/dbconfig/20230412-213325-ladsgroup.json
21:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
21:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
21:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46615 and previous config saved to /var/cache/conftool/dbconfig/20230412-213301-ladsgroup.json
21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P46614 and previous config saved to /var/cache/conftool/dbconfig/20230412-211755-ladsgroup.json
21:16 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2007.codfw.wmnet with OS bullseye
21:04 mutante: gerrit1001 - pushing data over to gerrit1003 via rsync, with bwlimit option: rsync -avp --bwlimit=1m /srv/gerrit/ rsync://gerrit1003.wikimedia.org/gerrit-data/ (T326368)
21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P46613 and previous config saved to /var/cache/conftool/dbconfig/20230412-210249-ladsgroup.json
21:01 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs2007.codfw.wmnet with OS bullseye
21:01 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2007.codfw.wmnet with OS bullseye
20:58 brett: Disable Puppet/PyBal on lvs2007 in preparation for reimaging - T321309
20:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46612 and previous config saved to /var/cache/conftool/dbconfig/20230412-204742-ladsgroup.json
20:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46611 and previous config saved to /var/cache/conftool/dbconfig/20230412-204508-ladsgroup.json
20:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
20:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T333332)', diff saved to https://phabricator.wikimedia.org/P46610 and previous config saved to /var/cache/conftool/dbconfig/20230412-204445-ladsgroup.json
20:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
20:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P46609 and previous config saved to /var/cache/conftool/dbconfig/20230412-202939-ladsgroup.json
20:15 zabe@deploy2002: Finished scap: Backport for gerrit:907511 Drop unused VectorPageTools feature flag (T332090), gerrit:907539 Set Vector 2022 as default skin on Welsh Wikipedia (T334279) (duration: 10m 19s)
20:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P46608 and previous config saved to /var/cache/conftool/dbconfig/20230412-201432-ladsgroup.json
20:06 zabe@deploy2002: zabe and jdlrobson: Backport for gerrit:907511 Drop unused VectorPageTools feature flag (T332090), gerrit:907539 Set Vector 2022 as default skin on Welsh Wikipedia (T334279) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
20:05 zabe@deploy2002: Started scap: Backport for gerrit:907511 Drop unused VectorPageTools feature flag (T332090), gerrit:907539 Set Vector 2022 as default skin on Welsh Wikipedia (T334279)
19:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T333332)', diff saved to https://phabricator.wikimedia.org/P46606 and previous config saved to /var/cache/conftool/dbconfig/20230412-195926-ladsgroup.json
19:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T333332)', diff saved to https://phabricator.wikimedia.org/P46605 and previous config saved to /var/cache/conftool/dbconfig/20230412-195453-ladsgroup.json
19:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
19:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
19:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2159.codfw.wmnet with reason: Maintenance
19:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2159.codfw.wmnet with reason: Maintenance
19:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T333332)', diff saved to https://phabricator.wikimedia.org/P46604 and previous config saved to /var/cache/conftool/dbconfig/20230412-195423-ladsgroup.json
19:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
19:43 zabe@deploy2002: Finished scap: Backport for gerrit:908292 Revert "Ensure ApiHelp correctly types values in TOCData objects", gerrit:908293 Revert "Ensure ApiHelp correctly types values in TOCData objects" (duration: 06m 40s)
19:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
19:41 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
19:40 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
19:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P46603 and previous config saved to /var/cache/conftool/dbconfig/20230412-193917-ladsgroup.json
19:38 zabe@deploy2002: zabe: Backport for gerrit:908292 Revert "Ensure ApiHelp correctly types values in TOCData objects", gerrit:908293 Revert "Ensure ApiHelp correctly types values in TOCData objects" synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
19:37 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
19:37 zabe@deploy2002: Started scap: Backport for gerrit:908292 Revert "Ensure ApiHelp correctly types values in TOCData objects", gerrit:908293 Revert "Ensure ApiHelp correctly types values in TOCData objects"
19:37 urandom: sessionstore1001: systemctl stop cassandra-a.service && systemctl start cassandra-a.service — T327954
19:36 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
19:35 zabe@deploy2002: Sync cancelled.
19:32 zabe@deploy2002: jforrester and zabe: Backport for gerrit:908291 composer.json: Explicitly pin psr/http-message to 1.0.1 (T333993), gerrit:908290 Ensure ApiHelp correctly types values in TOCData objects (T334551), gerrit:908289 Ensure ApiHelp correctly types values in TOCData objects (T334551) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.
19:30 zabe@deploy2002: Started scap: Backport for gerrit:908291 composer.json: Explicitly pin psr/http-message to 1.0.1 (T333993), gerrit:908290 Ensure ApiHelp correctly types values in TOCData objects (T334551), gerrit:908289 Ensure ApiHelp correctly types values in TOCData objects (T334551)
19:28 urandom: restart Cassandra —sessionstore1001— to disable native transport for testing — T327954
19:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P46602 and previous config saved to /var/cache/conftool/dbconfig/20230412-192411-ladsgroup.json
19:17 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on sessionstore1001.eqiad.wmnet with reason: Reproducing dissonant cluster state
19:16 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on sessionstore1001.eqiad.wmnet with reason: Reproducing dissonant cluster state
19:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T333332)', diff saved to https://phabricator.wikimedia.org/P46601 and previous config saved to /var/cache/conftool/dbconfig/20230412-190904-ladsgroup.json
18:42 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
18:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
18:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
18:39 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
18:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T333332)', diff saved to https://phabricator.wikimedia.org/P46600 and previous config saved to /var/cache/conftool/dbconfig/20230412-183822-ladsgroup.json
18:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2150.codfw.wmnet with reason: Maintenance
18:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2150.codfw.wmnet with reason: Maintenance
18:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T333332)', diff saved to https://phabricator.wikimedia.org/P46599 and previous config saved to /var/cache/conftool/dbconfig/20230412-183758-ladsgroup.json
18:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P46598 and previous config saved to /var/cache/conftool/dbconfig/20230412-182252-ladsgroup.json
18:16 dancy@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.4 refs T330210 (duration: 06m 02s)
18:10 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.4 refs T330210
18:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P46597 and previous config saved to /var/cache/conftool/dbconfig/20230412-180746-ladsgroup.json
17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T333332)', diff saved to https://phabricator.wikimedia.org/P46596 and previous config saved to /var/cache/conftool/dbconfig/20230412-175240-ladsgroup.json
17:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T333332)', diff saved to https://phabricator.wikimedia.org/P46595 and previous config saved to /var/cache/conftool/dbconfig/20230412-174806-ladsgroup.json
17:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2122.codfw.wmnet with reason: Maintenance
17:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2122.codfw.wmnet with reason: Maintenance
17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T333332)', diff saved to https://phabricator.wikimedia.org/P46594 and previous config saved to /var/cache/conftool/dbconfig/20230412-174743-ladsgroup.json
17:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
17:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
17:44 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P46593 and previous config saved to /var/cache/conftool/dbconfig/20230412-173237-ladsgroup.json
17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P46592 and previous config saved to /var/cache/conftool/dbconfig/20230412-171730-ladsgroup.json
17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T333332)', diff saved to https://phabricator.wikimedia.org/P46591 and previous config saved to /var/cache/conftool/dbconfig/20230412-171219-ladsgroup.json
17:06 ejegg: payments-wiki upgraded from efe7e408 to 4dcba0a9
17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T333332)', diff saved to https://phabricator.wikimedia.org/P46590 and previous config saved to /var/cache/conftool/dbconfig/20230412-170224-ladsgroup.json
16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T333332)', diff saved to https://phabricator.wikimedia.org/P46589 and previous config saved to /var/cache/conftool/dbconfig/20230412-165951-ladsgroup.json
16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T333332)', diff saved to https://phabricator.wikimedia.org/P46588 and previous config saved to /var/cache/conftool/dbconfig/20230412-165928-ladsgroup.json
16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P46587 and previous config saved to /var/cache/conftool/dbconfig/20230412-165712-ladsgroup.json
16:54 topranks: Updating routing-options on drmrs asw switches to add empty rib inet6 stanza T334281
16:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
16:51 topranks: Updating routing-options on Eqiad lsw1 switches to add empty rib inet6 stanza T334281
16:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P46586 and previous config saved to /var/cache/conftool/dbconfig/20230412-164422-ladsgroup.json
16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P46585 and previous config saved to /var/cache/conftool/dbconfig/20230412-164206-ladsgroup.json
16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P46584 and previous config saved to /var/cache/conftool/dbconfig/20230412-162915-ladsgroup.json
16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T333332)', diff saved to https://phabricator.wikimedia.org/P46583 and previous config saved to /var/cache/conftool/dbconfig/20230412-162700-ladsgroup.json
16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T333332)', diff saved to https://phabricator.wikimedia.org/P46582 and previous config saved to /var/cache/conftool/dbconfig/20230412-162448-ladsgroup.json
16:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2179.codfw.wmnet with reason: Maintenance
16:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2179.codfw.wmnet with reason: Maintenance
16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T333332)', diff saved to https://phabricator.wikimedia.org/P46581 and previous config saved to /var/cache/conftool/dbconfig/20230412-162422-ladsgroup.json
16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T333332)', diff saved to https://phabricator.wikimedia.org/P46580 and previous config saved to /var/cache/conftool/dbconfig/20230412-161409-ladsgroup.json
16:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T333332)', diff saved to https://phabricator.wikimedia.org/P46579 and previous config saved to /var/cache/conftool/dbconfig/20230412-161135-ladsgroup.json
16:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2120.codfw.wmnet with reason: Maintenance
16:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2120.codfw.wmnet with reason: Maintenance
16:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T333332)', diff saved to https://phabricator.wikimedia.org/P46578 and previous config saved to /var/cache/conftool/dbconfig/20230412-161112-ladsgroup.json
16:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P46577 and previous config saved to /var/cache/conftool/dbconfig/20230412-160916-ladsgroup.json
16:05 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
16:05 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
16:04 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
16:04 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
16:04 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
16:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2010.codfw.wmnet with OS bullseye
16:03 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
16:03 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
16:02 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
15:58 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
15:57 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
15:57 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P46576 and previous config saved to /var/cache/conftool/dbconfig/20230412-155606-ladsgroup.json
15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P46575 and previous config saved to /var/cache/conftool/dbconfig/20230412-155410-ladsgroup.json
15:52 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
15:52 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
15:49 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
15:49 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
15:47 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
15:47 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
15:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2010.codfw.wmnet with reason: host reimage
15:45 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
15:44 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
15:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2010.codfw.wmnet with reason: host reimage
15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P46573 and previous config saved to /var/cache/conftool/dbconfig/20230412-154100-ladsgroup.json
15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T333332)', diff saved to https://phabricator.wikimedia.org/P46572 and previous config saved to /var/cache/conftool/dbconfig/20230412-153903-ladsgroup.json
15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T333332)', diff saved to https://phabricator.wikimedia.org/P46571 and previous config saved to /var/cache/conftool/dbconfig/20230412-153651-ladsgroup.json
15:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2172.codfw.wmnet with reason: Maintenance
15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2172.codfw.wmnet with reason: Maintenance
15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T333332)', diff saved to https://phabricator.wikimedia.org/P46570 and previous config saved to /var/cache/conftool/dbconfig/20230412-153627-ladsgroup.json
15:30 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
15:30 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T333332)', diff saved to https://phabricator.wikimedia.org/P46569 and previous config saved to /var/cache/conftool/dbconfig/20230412-152553-ladsgroup.json
15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T333332)', diff saved to https://phabricator.wikimedia.org/P46568 and previous config saved to /var/cache/conftool/dbconfig/20230412-152320-ladsgroup.json
15:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2108.codfw.wmnet with reason: Maintenance
15:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2108.codfw.wmnet with reason: Maintenance
15:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2010.codfw.wmnet with OS bullseye
15:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2100.codfw.wmnet with reason: Maintenance
15:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2100.codfw.wmnet with reason: Maintenance
15:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2098.codfw.wmnet with reason: Maintenance
15:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2098.codfw.wmnet with reason: Maintenance
15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P46567 and previous config saved to /var/cache/conftool/dbconfig/20230412-152120-ladsgroup.json
15:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
15:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T333332)', diff saved to https://phabricator.wikimedia.org/P46566 and previous config saved to /var/cache/conftool/dbconfig/20230412-152104-ladsgroup.json
15:14 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
15:14 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P46565 and previous config saved to /var/cache/conftool/dbconfig/20230412-150614-ladsgroup.json
15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P46564 and previous config saved to /var/cache/conftool/dbconfig/20230412-150557-ladsgroup.json
15:05 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
15:05 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
15:04 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
15:04 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
15:02 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
15:00 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
15:00 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
14:59 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
14:59 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
14:59 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
14:59 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
14:58 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
14:58 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
14:57 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
14:57 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
14:57 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
14:56 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
14:56 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
14:55 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
14:55 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
14:54 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
14:53 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
14:53 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
14:53 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
14:52 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
14:52 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
14:52 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T333332)', diff saved to https://phabricator.wikimedia.org/P46563 and previous config saved to /var/cache/conftool/dbconfig/20230412-145108-ladsgroup.json
14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P46562 and previous config saved to /var/cache/conftool/dbconfig/20230412-145051-ladsgroup.json
14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T333332)', diff saved to https://phabricator.wikimedia.org/P46561 and previous config saved to /var/cache/conftool/dbconfig/20230412-144856-ladsgroup.json
14:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
14:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2155.codfw.wmnet with reason: Maintenance
14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2155.codfw.wmnet with reason: Maintenance
14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T333332)', diff saved to https://phabricator.wikimedia.org/P46560 and previous config saved to /var/cache/conftool/dbconfig/20230412-144815-ladsgroup.json
14:44 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
14:43 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
14:43 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
14:43 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
14:42 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
14:42 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
14:41 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
14:40 moritzm: installing apache security updates on phab1004 (phabricator.wikimedia.org)
14:38 moritzm: installing apache security updates on gerrit1001
14:36 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
14:36 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T333332)', diff saved to https://phabricator.wikimedia.org/P46559 and previous config saved to /var/cache/conftool/dbconfig/20230412-143545-ladsgroup.json
14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T333332)', diff saved to https://phabricator.wikimedia.org/P46558 and previous config saved to /var/cache/conftool/dbconfig/20230412-143331-ladsgroup.json
14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1202.eqiad.wmnet with reason: Maintenance
14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1202.eqiad.wmnet with reason: Maintenance
14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P46557 and previous config saved to /var/cache/conftool/dbconfig/20230412-143309-ladsgroup.json
14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T333332)', diff saved to https://phabricator.wikimedia.org/P46556 and previous config saved to /var/cache/conftool/dbconfig/20230412-143308-ladsgroup.json
14:32 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
14:23 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46554 and previous config saved to /var/cache/conftool/dbconfig/20230412-142045-root.json
14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P46553 and previous config saved to /var/cache/conftool/dbconfig/20230412-141802-ladsgroup.json
14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P46552 and previous config saved to /var/cache/conftool/dbconfig/20230412-141801-ladsgroup.json
14:13 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
14:10 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript namespaceDupes kswiki --fix # T334277, fixed the one remaining link
14:07 moritzm: re-enabled Puppet in codfw/edges after puppetdb maintenance
14:07 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
14:06 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:05 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46550 and previous config saved to /var/cache/conftool/dbconfig/20230412-140540-root.json
14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P46549 and previous config saved to /var/cache/conftool/dbconfig/20230412-140255-ladsgroup.json
14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T333332)', diff saved to https://phabricator.wikimedia.org/P46548 and previous config saved to /var/cache/conftool/dbconfig/20230412-140045-ladsgroup.json
14:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2147.codfw.wmnet with reason: Maintenance
14:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2147.codfw.wmnet with reason: Maintenance
14:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
14:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46547 and previous config saved to /var/cache/conftool/dbconfig/20230412-135959-ladsgroup.json
13:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46546 and previous config saved to /var/cache/conftool/dbconfig/20230412-135035-root.json
13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T333332)', diff saved to https://phabricator.wikimedia.org/P46545 and previous config saved to /var/cache/conftool/dbconfig/20230412-134749-ladsgroup.json
13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T333332)', diff saved to https://phabricator.wikimedia.org/P46544 and previous config saved to /var/cache/conftool/dbconfig/20230412-134535-ladsgroup.json
13:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1194.eqiad.wmnet with reason: Maintenance
13:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1194.eqiad.wmnet with reason: Maintenance
13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T333332)', diff saved to https://phabricator.wikimedia.org/P46543 and previous config saved to /var/cache/conftool/dbconfig/20230412-134512-ladsgroup.json
13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P46542 and previous config saved to /var/cache/conftool/dbconfig/20230412-134453-ladsgroup.json
13:43 moritzm: stop Puppet in codfw/edges for puppetdb maintenance
13:43 Lucas_WMDE: UTC afternoon backport+config window done
13:39 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:896104 Make VE on officewiki use Parsoid directly (T320529 T333402) (duration: 09m 48s)
13:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetdb2002.codfw.wmnet with reason: puppetdb maintenance
13:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetdb2002.codfw.wmnet with reason: puppetdb maintenance
13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46541 and previous config saved to /var/cache/conftool/dbconfig/20230412-133531-root.json
13:34 sukhe: [puppetmaster] sudo /usr/local/sbin/puppet-facts-upload --proxy http://webproxy.eqiad.wmnet:8080 to update PCC
13:30 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and daniel: Backport for gerrit:896104 Make VE on officewiki use Parsoid directly (T320529 T333402) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P46540 and previous config saved to /var/cache/conftool/dbconfig/20230412-133006-ladsgroup.json
13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P46539 and previous config saved to /var/cache/conftool/dbconfig/20230412-132946-ladsgroup.json
13:29 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:896104 Make VE on officewiki use Parsoid directly (T320529 T333402)
13:28 eoghan: Stopping puppet on gitlab hosts to slow-rollout puppet ssh key management - T333840
13:26 elukey: upload AMD ROCm 5.4 debian packages to wikimedia-bullseye:thirdparty/amd-rocm54 - T295661
13:22 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript namespaceDupes kswiki --fix | tee >(phaste -t T334277) # P46538; errors on stderr, cf. T328634
13:20 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:907899 GrowthExperiments: enable add link frontend in 7,8th round wikis (T304551 T308133) (duration: 13m 30s)
13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46537 and previous config saved to /var/cache/conftool/dbconfig/20230412-132026-root.json
13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P46535 and previous config saved to /var/cache/conftool/dbconfig/20230412-131459-ladsgroup.json
13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46533 and previous config saved to /var/cache/conftool/dbconfig/20230412-131440-ladsgroup.json
13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46532 and previous config saved to /var/cache/conftool/dbconfig/20230412-131227-ladsgroup.json
13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46531 and previous config saved to /var/cache/conftool/dbconfig/20230412-131204-ladsgroup.json
13:08 lucaswerkmeister-wmde@deploy2002: sgimeno and lucaswerkmeister-wmde: Backport for gerrit:907899 GrowthExperiments: enable add link frontend in 7,8th round wikis (T304551 T308133) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
13:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
13:07 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:907899 GrowthExperiments: enable add link frontend in 7,8th round wikis (T304551 T308133)
13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46530 and previous config saved to /var/cache/conftool/dbconfig/20230412-130521-root.json
13:03 moritzm: installing nodejs security updates on buster
13:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idm1001.wikimedia.org
12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T333332)', diff saved to https://phabricator.wikimedia.org/P46529 and previous config saved to /var/cache/conftool/dbconfig/20230412-125953-ladsgroup.json
12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T333332)', diff saved to https://phabricator.wikimedia.org/P46528 and previous config saved to /var/cache/conftool/dbconfig/20230412-125739-ladsgroup.json
12:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1191.eqiad.wmnet with reason: Maintenance
12:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idm1001.wikimedia.org
12:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1191.eqiad.wmnet with reason: Maintenance
12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T333332)', diff saved to https://phabricator.wikimedia.org/P46527 and previous config saved to /var/cache/conftool/dbconfig/20230412-125716-ladsgroup.json
12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P46526 and previous config saved to /var/cache/conftool/dbconfig/20230412-125658-ladsgroup.json
12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46525 and previous config saved to /var/cache/conftool/dbconfig/20230412-125016-root.json
12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P46524 and previous config saved to /var/cache/conftool/dbconfig/20230412-124210-ladsgroup.json
12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P46523 and previous config saved to /var/cache/conftool/dbconfig/20230412-124151-ladsgroup.json
12:35 moritzm: installing intel-microcode security updates
12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P46522 and previous config saved to /var/cache/conftool/dbconfig/20230412-122703-ladsgroup.json
12:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46521 and previous config saved to /var/cache/conftool/dbconfig/20230412-122645-ladsgroup.json
12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46520 and previous config saved to /var/cache/conftool/dbconfig/20230412-122433-ladsgroup.json
12:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2137.codfw.wmnet with reason: Maintenance
12:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2137.codfw.wmnet with reason: Maintenance
12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T333332)', diff saved to https://phabricator.wikimedia.org/P46519 and previous config saved to /var/cache/conftool/dbconfig/20230412-122409-ladsgroup.json
12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 T334580', diff saved to https://phabricator.wikimedia.org/P46518 and previous config saved to /var/cache/conftool/dbconfig/20230412-121420-marostegui.json
12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T333332)', diff saved to https://phabricator.wikimedia.org/P46517 and previous config saved to /var/cache/conftool/dbconfig/20230412-121157-ladsgroup.json
12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T333332)', diff saved to https://phabricator.wikimedia.org/P46516 and previous config saved to /var/cache/conftool/dbconfig/20230412-120943-ladsgroup.json
12:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
12:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
12:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P46515 and previous config saved to /var/cache/conftool/dbconfig/20230412-120903-ladsgroup.json
12:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
12:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46514 and previous config saved to /var/cache/conftool/dbconfig/20230412-120853-ladsgroup.json
11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P46513 and previous config saved to /var/cache/conftool/dbconfig/20230412-115357-ladsgroup.json
11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P46512 and previous config saved to /var/cache/conftool/dbconfig/20230412-115347-ladsgroup.json
11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T333332)', diff saved to https://phabricator.wikimedia.org/P46509 and previous config saved to /var/cache/conftool/dbconfig/20230412-113850-ladsgroup.json
11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P46508 and previous config saved to /var/cache/conftool/dbconfig/20230412-113840-ladsgroup.json
11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T333332)', diff saved to https://phabricator.wikimedia.org/P46507 and previous config saved to /var/cache/conftool/dbconfig/20230412-113638-ladsgroup.json
11:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2136.codfw.wmnet with reason: Maintenance
11:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2136.codfw.wmnet with reason: Maintenance
11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T333332)', diff saved to https://phabricator.wikimedia.org/P46506 and previous config saved to /var/cache/conftool/dbconfig/20230412-113615-ladsgroup.json
11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46505 and previous config saved to /var/cache/conftool/dbconfig/20230412-112334-ladsgroup.json
11:23 marostegui: dbmaint Upgrade db1106 to mariadb 11.1 (eqiad) T333289
11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46504 and previous config saved to /var/cache/conftool/dbconfig/20230412-112217-ladsgroup.json
11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
11:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T333332)', diff saved to https://phabricator.wikimedia.org/P46503 and previous config saved to /var/cache/conftool/dbconfig/20230412-112154-ladsgroup.json
11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P46502 and previous config saved to /var/cache/conftool/dbconfig/20230412-112108-ladsgroup.json
11:12 moritzm: installing gnutls28 security updates on buster
11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P46501 and previous config saved to /var/cache/conftool/dbconfig/20230412-110647-ladsgroup.json
11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P46500 and previous config saved to /var/cache/conftool/dbconfig/20230412-110602-ladsgroup.json
11:00 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw2448.*.codfw.wmnet
10:59 claime: repooling mw2448.codfw.wmnet - T334429
10:59 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2448.codfw.wmnet
10:59 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2448.codfw.wmnet
10:56 moritzm: installing apache2 security updates on Buster
10:56 moritzm: installing apache2 security updates on Bullseye
10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46499 and previous config saved to /var/cache/conftool/dbconfig/20230412-105356-root.json
10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P46498 and previous config saved to /var/cache/conftool/dbconfig/20230412-105141-ladsgroup.json
10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T333332)', diff saved to https://phabricator.wikimedia.org/P46497 and previous config saved to /var/cache/conftool/dbconfig/20230412-105056-ladsgroup.json
10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T333332)', diff saved to https://phabricator.wikimedia.org/P46496 and previous config saved to /var/cache/conftool/dbconfig/20230412-104843-ladsgroup.json
10:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2119.codfw.wmnet with reason: Maintenance
10:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2119.codfw.wmnet with reason: Maintenance
10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T333332)', diff saved to https://phabricator.wikimedia.org/P46495 and previous config saved to /var/cache/conftool/dbconfig/20230412-104820-ladsgroup.json
10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46494 and previous config saved to /var/cache/conftool/dbconfig/20230412-103851-root.json
10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T333332)', diff saved to https://phabricator.wikimedia.org/P46493 and previous config saved to /var/cache/conftool/dbconfig/20230412-103635-ladsgroup.json
10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T333332)', diff saved to https://phabricator.wikimedia.org/P46492 and previous config saved to /var/cache/conftool/dbconfig/20230412-103421-ladsgroup.json
10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
10:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T333332)', diff saved to https://phabricator.wikimedia.org/P46491 and previous config saved to /var/cache/conftool/dbconfig/20230412-103348-ladsgroup.json
10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P46490 and previous config saved to /var/cache/conftool/dbconfig/20230412-103314-ladsgroup.json
10:29 hashar@deploy2002: Finished deploy [integration/docroot@ab848e3]: Dummy deploy with dsh file managed by Puppet (duration: 00m 04s)
10:29 hashar@deploy2002: Started deploy [integration/docroot@ab848e3]: Dummy deploy with dsh file managed by Puppet
10:29 hashar@deploy2002: Finished deploy [integration/docroot@ab848e3]: Dummy deploy with dsh file managed by Puppet (duration: 00m 06s)
10:29 hashar@deploy2002: Started deploy [integration/docroot@ab848e3]: Dummy deploy with dsh file managed by Puppet
10:29 hashar@deploy2002: Finished deploy [integration/docroot@ab848e3]: Dummy deploy with dsh file managed by Puppet (duration: 00m 02s)
10:29 hashar@deploy2002: Started deploy [integration/docroot@ab848e3]: Dummy deploy with dsh file managed by Puppet
10:28 hashar@deploy2002: Finished deploy [zuul/deploy@4c6859c]: Dummy deploy with dsh file managed by Puppet (duration: 00m 02s)
10:28 hashar@deploy2002: Started deploy [zuul/deploy@4c6859c]: Dummy deploy with dsh file managed by Puppet
10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46489 and previous config saved to /var/cache/conftool/dbconfig/20230412-102346-root.json
10:18 Emperor: clearing out 24 ghost objects from Swift T327253
10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P46488 and previous config saved to /var/cache/conftool/dbconfig/20230412-101841-ladsgroup.json
10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P46487 and previous config saved to /var/cache/conftool/dbconfig/20230412-101808-ladsgroup.json
10:10 cgoubert@deploy2002: Synchronized README: (no justification provided) (duration: 05m 44s)
10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46486 and previous config saved to /var/cache/conftool/dbconfig/20230412-100841-root.json
10:06 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P46485 and previous config saved to /var/cache/conftool/dbconfig/20230412-100335-ladsgroup.json
10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T333332)', diff saved to https://phabricator.wikimedia.org/P46484 and previous config saved to /var/cache/conftool/dbconfig/20230412-100301-ladsgroup.json
10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123 to clone db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46482 and previous config saved to /var/cache/conftool/dbconfig/20230412-100111-marostegui.json
10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T333332)', diff saved to https://phabricator.wikimedia.org/P46481 and previous config saved to /var/cache/conftool/dbconfig/20230412-100049-ladsgroup.json
10:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2110.codfw.wmnet with reason: Maintenance
10:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2110.codfw.wmnet with reason: Maintenance
10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T333332)', diff saved to https://phabricator.wikimedia.org/P46480 and previous config saved to /var/cache/conftool/dbconfig/20230412-100026-ladsgroup.json
09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46479 and previous config saved to /var/cache/conftool/dbconfig/20230412-095336-root.json
09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T333332)', diff saved to https://phabricator.wikimedia.org/P46478 and previous config saved to /var/cache/conftool/dbconfig/20230412-094829-ladsgroup.json
09:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T333332)', diff saved to https://phabricator.wikimedia.org/P46477 and previous config saved to /var/cache/conftool/dbconfig/20230412-094615-ladsgroup.json
09:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1136.eqiad.wmnet with reason: Maintenance
09:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1136.eqiad.wmnet with reason: Maintenance
09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T333332)', diff saved to https://phabricator.wikimedia.org/P46476 and previous config saved to /var/cache/conftool/dbconfig/20230412-094551-ladsgroup.json
09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P46475 and previous config saved to /var/cache/conftool/dbconfig/20230412-094520-ladsgroup.json
09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46474 and previous config saved to /var/cache/conftool/dbconfig/20230412-093831-root.json
09:34 claime: Reverted migrating cxserver to mw-api-int on kubernetes - T334204
09:34 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
09:34 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P46473 and previous config saved to /var/cache/conftool/dbconfig/20230412-093045-ladsgroup.json
09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P46472 and previous config saved to /var/cache/conftool/dbconfig/20230412-093013-ladsgroup.json
09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46470 and previous config saved to /var/cache/conftool/dbconfig/20230412-092327-root.json
09:21 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye
09:21 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
09:20 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P46469 and previous config saved to /var/cache/conftool/dbconfig/20230412-091539-ladsgroup.json
09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T333332)', diff saved to https://phabricator.wikimedia.org/P46468 and previous config saved to /var/cache/conftool/dbconfig/20230412-091507-ladsgroup.json
09:13 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T333332)', diff saved to https://phabricator.wikimedia.org/P46467 and previous config saved to /var/cache/conftool/dbconfig/20230412-091255-ladsgroup.json
09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2106.codfw.wmnet with reason: Maintenance
09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2106.codfw.wmnet with reason: Maintenance
09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2099.codfw.wmnet with reason: Maintenance
09:12 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2099.codfw.wmnet with reason: Maintenance
09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
09:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T333332)', diff saved to https://phabricator.wikimedia.org/P46466 and previous config saved to /var/cache/conftool/dbconfig/20230412-091151-ladsgroup.json
09:11 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
09:11 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
09:07 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
09:06 claime: Migrating cxserver to mw-api-int on kubernetes - T334204
09:04 jelto@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T333332)', diff saved to https://phabricator.wikimedia.org/P46464 and previous config saved to /var/cache/conftool/dbconfig/20230412-090032-ladsgroup.json
08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T333332)', diff saved to https://phabricator.wikimedia.org/P46463 and previous config saved to /var/cache/conftool/dbconfig/20230412-085816-ladsgroup.json
08:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P46462 and previous config saved to /var/cache/conftool/dbconfig/20230412-085644-ladsgroup.json
08:51 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
08:51 aqu@deploy2002: Finished deploy [airflow-dags/analytics@18ae3be]: Deploy airflow-dags including webrequest load job - Analytics [airflow-dags@18ae3be] (duration: 00m 12s)
08:50 aqu@deploy2002: Started deploy [airflow-dags/analytics@18ae3be]: Deploy airflow-dags including webrequest load job - Analytics [airflow-dags@18ae3be]
08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P46460 and previous config saved to /var/cache/conftool/dbconfig/20230412-084138-ladsgroup.json
08:37 marostegui: dbmaint Deploy schema change on s1 codfw with replication T334536
08:35 aqu@deploy2002: Finished deploy [analytics/refinery@f3389dc] (thin): Deploy analytics_refinery in production thin [analytics/refinery@f3389dc] (duration: 00m 07s)
08:35 aqu@deploy2002: Started deploy [analytics/refinery@f3389dc] (thin): Deploy analytics_refinery in production thin [analytics/refinery@f3389dc]
08:35 moritzm: imported puppet 5.5.22-2+deb12u1 for bookworm-wikimedia component/puppet5 T330495
08:34 aqu@deploy2002: Finished deploy [analytics/refinery@f3389dc]: Deploy analytics_refinery in production [analytics/refinery@f3389dc] (duration: 00m 41s)
08:34 aqu@deploy2002: Started deploy [analytics/refinery@f3389dc]: Deploy analytics_refinery in production [analytics/refinery@f3389dc]
08:33 aqu: About to deploy analytics/refinery in production
08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T333332)', diff saved to https://phabricator.wikimedia.org/P46459 and previous config saved to /var/cache/conftool/dbconfig/20230412-082632-ladsgroup.json
08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1199 (T333332)', diff saved to https://phabricator.wikimedia.org/P46458 and previous config saved to /var/cache/conftool/dbconfig/20230412-082424-ladsgroup.json
08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1199.eqiad.wmnet with reason: Maintenance
08:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1199.eqiad.wmnet with reason: Maintenance
08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T333332)', diff saved to https://phabricator.wikimedia.org/P46457 and previous config saved to /var/cache/conftool/dbconfig/20230412-082400-ladsgroup.json
08:17 hashar@deploy2002: Synchronized wmf-config/CommonSettings-labs.php: [Beta Cluster] Replicate WebResponseSetCookie wgHooks migration here too - T333926 (duration: 05m 51s)
08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P46456 and previous config saved to /var/cache/conftool/dbconfig/20230412-080854-ladsgroup.json
08:03 marostegui: dbmaint Deploy schema change on s3 codfw with replication enabled (only for testwiki and test2wiki) T334536
08:01 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye
07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 100%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46455 and previous config saved to /var/cache/conftool/dbconfig/20230412-075703-root.json
07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46454 and previous config saved to /var/cache/conftool/dbconfig/20230412-075422-root.json
07:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P46453 and previous config saved to /var/cache/conftool/dbconfig/20230412-075348-ladsgroup.json
07:45 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 75%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46451 and previous config saved to /var/cache/conftool/dbconfig/20230412-074158-root.json
07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1107 from dbctl T334447', diff saved to https://phabricator.wikimedia.org/P46450 and previous config saved to /var/cache/conftool/dbconfig/20230412-073921-marostegui.json
07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46449 and previous config saved to /var/cache/conftool/dbconfig/20230412-073917-root.json
07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T333332)', diff saved to https://phabricator.wikimedia.org/P46448 and previous config saved to /var/cache/conftool/dbconfig/20230412-073841-ladsgroup.json
07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T333332)', diff saved to https://phabricator.wikimedia.org/P46447 and previous config saved to /var/cache/conftool/dbconfig/20230412-073633-ladsgroup.json
07:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1190.eqiad.wmnet with reason: Maintenance
07:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1190.eqiad.wmnet with reason: Maintenance
07:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
07:36 moritzm: installing python-cryptography security updates
07:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T333332)', diff saved to https://phabricator.wikimedia.org/P46446 and previous config saved to /var/cache/conftool/dbconfig/20230412-073550-ladsgroup.json
07:30 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1222 (re)pooling @ 75%: Pooling', diff saved to https://phabricator.wikimedia.org/P46445 and previous config saved to /var/cache/conftool/dbconfig/20230412-072812-root.json
07:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 50%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46444 and previous config saved to /var/cache/conftool/dbconfig/20230412-072654-root.json
07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46443 and previous config saved to /var/cache/conftool/dbconfig/20230412-072412-root.json
07:21 moritzm: installing xen security updates
07:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P46442 and previous config saved to /var/cache/conftool/dbconfig/20230412-072044-ladsgroup.json
07:16 marostegui: Drop flaggedrevs tables from ptwikisource T332594
07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1222 (re)pooling @ 50%: Pooling', diff saved to https://phabricator.wikimedia.org/P46441 and previous config saved to /var/cache/conftool/dbconfig/20230412-071307-root.json
07:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 25%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46440 and previous config saved to /var/cache/conftool/dbconfig/20230412-071149-root.json
07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46439 and previous config saved to /var/cache/conftool/dbconfig/20230412-070907-root.json
07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P46438 and previous config saved to /var/cache/conftool/dbconfig/20230412-070538-ladsgroup.json
06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1222 (re)pooling @ 25%: Pooling', diff saved to https://phabricator.wikimedia.org/P46437 and previous config saved to /var/cache/conftool/dbconfig/20230412-065802-root.json
06:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 10%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46436 and previous config saved to /var/cache/conftool/dbconfig/20230412-065644-root.json
06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46435 and previous config saved to /var/cache/conftool/dbconfig/20230412-065402-root.json
06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T333332)', diff saved to https://phabricator.wikimedia.org/P46434 and previous config saved to /var/cache/conftool/dbconfig/20230412-065032-ladsgroup.json
06:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T333332)', diff saved to https://phabricator.wikimedia.org/P46433 and previous config saved to /var/cache/conftool/dbconfig/20230412-064823-ladsgroup.json
06:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1149.eqiad.wmnet with reason: Maintenance
06:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1149.eqiad.wmnet with reason: Maintenance
06:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T333332)', diff saved to https://phabricator.wikimedia.org/P46432 and previous config saved to /var/cache/conftool/dbconfig/20230412-064800-ladsgroup.json
06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1222 (re)pooling @ 10%: Pooling', diff saved to https://phabricator.wikimedia.org/P46431 and previous config saved to /var/cache/conftool/dbconfig/20230412-064257-root.json
06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 5%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46430 and previous config saved to /var/cache/conftool/dbconfig/20230412-064139-root.json
06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46429 and previous config saved to /var/cache/conftool/dbconfig/20230412-063858-root.json
06:38 vgutierrez: restart haproxy on cp2035 - T334448
06:33 marostegui: Stop mariadb on db1121 to clone db1221; this will generate lag on clouddb replicas for s4 T326669
06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P46427 and previous config saved to /var/cache/conftool/dbconfig/20230412-063253-ladsgroup.json
06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 to clone db1221 T326669', diff saved to https://phabricator.wikimedia.org/P46426 and previous config saved to /var/cache/conftool/dbconfig/20230412-063224-marostegui.json
06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1222 (re)pooling @ 5%: Pooling', diff saved to https://phabricator.wikimedia.org/P46425 and previous config saved to /var/cache/conftool/dbconfig/20230412-062752-root.json
06:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 4%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46424 and previous config saved to /var/cache/conftool/dbconfig/20230412-062634-root.json
06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46423 and previous config saved to /var/cache/conftool/dbconfig/20230412-062353-root.json
06:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P46422 and previous config saved to /var/cache/conftool/dbconfig/20230412-061747-ladsgroup.json
06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1222 (re)pooling @ 4%: Pooling', diff saved to https://phabricator.wikimedia.org/P46421 and previous config saved to /var/cache/conftool/dbconfig/20230412-061248-root.json
06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 3%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46420 and previous config saved to /var/cache/conftool/dbconfig/20230412-061129-root.json
06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T333332)', diff saved to https://phabricator.wikimedia.org/P46419 and previous config saved to /var/cache/conftool/dbconfig/20230412-060241-ladsgroup.json
06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T333332)', diff saved to https://phabricator.wikimedia.org/P46418 and previous config saved to /var/cache/conftool/dbconfig/20230412-060133-ladsgroup.json
06:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
06:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T333332)', diff saved to https://phabricator.wikimedia.org/P46417 and previous config saved to /var/cache/conftool/dbconfig/20230412-060109-ladsgroup.json
05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1222 (re)pooling @ 3%: Pooling', diff saved to https://phabricator.wikimedia.org/P46416 and previous config saved to /var/cache/conftool/dbconfig/20230412-055743-root.json
05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 2%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46415 and previous config saved to /var/cache/conftool/dbconfig/20230412-055624-root.json
05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P46414 and previous config saved to /var/cache/conftool/dbconfig/20230412-054603-ladsgroup.json
05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 to clone db1210 T326669', diff saved to https://phabricator.wikimedia.org/P46412 and previous config saved to /var/cache/conftool/dbconfig/20230412-054258-marostegui.json
05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1222 (re)pooling @ 2%: Pooling', diff saved to https://phabricator.wikimedia.org/P46411 and previous config saved to /var/cache/conftool/dbconfig/20230412-054238-root.json
05:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 1%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46410 and previous config saved to /var/cache/conftool/dbconfig/20230412-054120-root.json
05:41 krinkle@deploy2002: Synchronized php-1.41.0-wmf.4/includes/libs/objectcache/: Ie3a2215d33: disable WANCache cool-off feature (duration: 06m 00s)
05:40 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1218 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P46409 and previous config saved to /var/cache/conftool/dbconfig/20230412-054024-marostegui.json
05:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P46408 and previous config saved to /var/cache/conftool/dbconfig/20230412-053057-ladsgroup.json
05:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1222 (re)pooling @ 1%: Pooling', diff saved to https://phabricator.wikimedia.org/P46407 and previous config saved to /var/cache/conftool/dbconfig/20230412-052733-root.json
05:25 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1222 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P46406 and previous config saved to /var/cache/conftool/dbconfig/20230412-052504-marostegui.json
05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T333332)', diff saved to https://phabricator.wikimedia.org/P46405 and previous config saved to /var/cache/conftool/dbconfig/20230412-051550-ladsgroup.json
05:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T333332)', diff saved to https://phabricator.wikimedia.org/P46404 and previous config saved to /var/cache/conftool/dbconfig/20230412-051342-ladsgroup.json
05:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
05:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
05:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46403 and previous config saved to /var/cache/conftool/dbconfig/20230412-051319-ladsgroup.json
04:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P46402 and previous config saved to /var/cache/conftool/dbconfig/20230412-045813-ladsgroup.json
04:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P46401 and previous config saved to /var/cache/conftool/dbconfig/20230412-044306-ladsgroup.json
04:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46400 and previous config saved to /var/cache/conftool/dbconfig/20230412-042800-ladsgroup.json
04:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46399 and previous config saved to /var/cache/conftool/dbconfig/20230412-042552-ladsgroup.json
04:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
04:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
04:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
04:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
04:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46398 and previous config saved to /var/cache/conftool/dbconfig/20230412-042510-ladsgroup.json
04:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P46397 and previous config saved to /var/cache/conftool/dbconfig/20230412-041003-ladsgroup.json
03:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P46396 and previous config saved to /var/cache/conftool/dbconfig/20230412-035457-ladsgroup.json
03:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46395 and previous config saved to /var/cache/conftool/dbconfig/20230412-033951-ladsgroup.json
03:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46394 and previous config saved to /var/cache/conftool/dbconfig/20230412-033742-ladsgroup.json
03:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
03:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
03:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T333332)', diff saved to https://phabricator.wikimedia.org/P46393 and previous config saved to /var/cache/conftool/dbconfig/20230412-033719-ladsgroup.json
03:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P46392 and previous config saved to /var/cache/conftool/dbconfig/20230412-032213-ladsgroup.json
03:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P46391 and previous config saved to /var/cache/conftool/dbconfig/20230412-030707-ladsgroup.json
02:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T333332)', diff saved to https://phabricator.wikimedia.org/P46390 and previous config saved to /var/cache/conftool/dbconfig/20230412-025200-ladsgroup.json
02:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T333332)', diff saved to https://phabricator.wikimedia.org/P46389 and previous config saved to /var/cache/conftool/dbconfig/20230412-024952-ladsgroup.json
02:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1143.eqiad.wmnet with reason: Maintenance
02:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1143.eqiad.wmnet with reason: Maintenance
02:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T333332)', diff saved to https://phabricator.wikimedia.org/P46388 and previous config saved to /var/cache/conftool/dbconfig/20230412-024929-ladsgroup.json
02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P46387 and previous config saved to /var/cache/conftool/dbconfig/20230412-023422-ladsgroup.json
02:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P46386 and previous config saved to /var/cache/conftool/dbconfig/20230412-021916-ladsgroup.json
02:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T333332)', diff saved to https://phabricator.wikimedia.org/P46385 and previous config saved to /var/cache/conftool/dbconfig/20230412-020410-ladsgroup.json
02:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T333332)', diff saved to https://phabricator.wikimedia.org/P46384 and previous config saved to /var/cache/conftool/dbconfig/20230412-020201-ladsgroup.json
02:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1142.eqiad.wmnet with reason: Maintenance
02:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1142.eqiad.wmnet with reason: Maintenance
02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T333332)', diff saved to https://phabricator.wikimedia.org/P46383 and previous config saved to /var/cache/conftool/dbconfig/20230412-020138-ladsgroup.json
01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P46382 and previous config saved to /var/cache/conftool/dbconfig/20230412-014632-ladsgroup.json
01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P46381 and previous config saved to /var/cache/conftool/dbconfig/20230412-013126-ladsgroup.json
01:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T333332)', diff saved to https://phabricator.wikimedia.org/P46380 and previous config saved to /var/cache/conftool/dbconfig/20230412-011619-ladsgroup.json
01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T333332)', diff saved to https://phabricator.wikimedia.org/P46379 and previous config saved to /var/cache/conftool/dbconfig/20230412-011411-ladsgroup.json
01:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1141.eqiad.wmnet with reason: Maintenance
01:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1141.eqiad.wmnet with reason: Maintenance
01:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T333332)', diff saved to https://phabricator.wikimedia.org/P46378 and previous config saved to /var/cache/conftool/dbconfig/20230412-011348-ladsgroup.json
01:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T333332)', diff saved to https://phabricator.wikimedia.org/P46377 and previous config saved to /var/cache/conftool/dbconfig/20230412-010832-ladsgroup.json
00:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P46376 and previous config saved to /var/cache/conftool/dbconfig/20230412-005841-ladsgroup.json
00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P46375 and previous config saved to /var/cache/conftool/dbconfig/20230412-005325-ladsgroup.json
00:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P46374 and previous config saved to /var/cache/conftool/dbconfig/20230412-004335-ladsgroup.json
00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P46373 and previous config saved to /var/cache/conftool/dbconfig/20230412-003819-ladsgroup.json
00:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T333332)', diff saved to https://phabricator.wikimedia.org/P46372 and previous config saved to /var/cache/conftool/dbconfig/20230412-002829-ladsgroup.json
00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T333332)', diff saved to https://phabricator.wikimedia.org/P46371 and previous config saved to /var/cache/conftool/dbconfig/20230412-002620-ladsgroup.json
00:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1138.eqiad.wmnet with reason: Maintenance
00:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1138.eqiad.wmnet with reason: Maintenance
00:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T333332)', diff saved to https://phabricator.wikimedia.org/P46370 and previous config saved to /var/cache/conftool/dbconfig/20230412-002557-ladsgroup.json
00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T333332)', diff saved to https://phabricator.wikimedia.org/P46369 and previous config saved to /var/cache/conftool/dbconfig/20230412-002312-ladsgroup.json
00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P46368 and previous config saved to /var/cache/conftool/dbconfig/20230412-001051-ladsgroup.json

2023-04-11

23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P46367 and previous config saved to /var/cache/conftool/dbconfig/20230411-235544-ladsgroup.json
23:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T333332)', diff saved to https://phabricator.wikimedia.org/P46366 and previous config saved to /var/cache/conftool/dbconfig/20230411-235225-ladsgroup.json
23:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2177.codfw.wmnet with reason: Maintenance
23:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2177.codfw.wmnet with reason: Maintenance
23:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T333332)', diff saved to https://phabricator.wikimedia.org/P46365 and previous config saved to /var/cache/conftool/dbconfig/20230411-235202-ladsgroup.json
23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T333332)', diff saved to https://phabricator.wikimedia.org/P46364 and previous config saved to /var/cache/conftool/dbconfig/20230411-234038-ladsgroup.json
23:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T333332)', diff saved to https://phabricator.wikimedia.org/P46363 and previous config saved to /var/cache/conftool/dbconfig/20230411-233930-ladsgroup.json
23:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
23:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
23:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1121.eqiad.wmnet with reason: Maintenance
23:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1121.eqiad.wmnet with reason: Maintenance
23:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P46362 and previous config saved to /var/cache/conftool/dbconfig/20230411-233655-ladsgroup.json
23:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P46361 and previous config saved to /var/cache/conftool/dbconfig/20230411-232149-ladsgroup.json
23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T333332)', diff saved to https://phabricator.wikimedia.org/P46360 and previous config saved to /var/cache/conftool/dbconfig/20230411-230643-ladsgroup.json
22:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T333332)', diff saved to https://phabricator.wikimedia.org/P46359 and previous config saved to /var/cache/conftool/dbconfig/20230411-223732-ladsgroup.json
22:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
22:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
22:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2156.codfw.wmnet with reason: Maintenance
22:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2156.codfw.wmnet with reason: Maintenance
22:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T333332)', diff saved to https://phabricator.wikimedia.org/P46358 and previous config saved to /var/cache/conftool/dbconfig/20230411-223651-ladsgroup.json
22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P46357 and previous config saved to /var/cache/conftool/dbconfig/20230411-222145-ladsgroup.json
22:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P46356 and previous config saved to /var/cache/conftool/dbconfig/20230411-220638-ladsgroup.json
21:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T333332)', diff saved to https://phabricator.wikimedia.org/P46355 and previous config saved to /var/cache/conftool/dbconfig/20230411-215132-ladsgroup.json
21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T333332)', diff saved to https://phabricator.wikimedia.org/P46354 and previous config saved to /var/cache/conftool/dbconfig/20230411-212053-ladsgroup.json
21:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2149.codfw.wmnet with reason: Maintenance
21:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2149.codfw.wmnet with reason: Maintenance
20:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
20:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
20:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T333332)', diff saved to https://phabricator.wikimedia.org/P46353 and previous config saved to /var/cache/conftool/dbconfig/20230411-205239-ladsgroup.json
20:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P46352 and previous config saved to /var/cache/conftool/dbconfig/20230411-203733-ladsgroup.json
20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P46351 and previous config saved to /var/cache/conftool/dbconfig/20230411-202227-ladsgroup.json
20:19 mforns@deploy2002: Finished deploy [airflow-dags/analytics@fcc4c9b]: (no justification provided) (duration: 00m 11s)
20:19 mforns@deploy2002: Started deploy [airflow-dags/analytics@fcc4c9b]: (no justification provided)
20:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T333332)', diff saved to https://phabricator.wikimedia.org/P46350 and previous config saved to /var/cache/conftool/dbconfig/20230411-200720-ladsgroup.json
20:05 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T333332)', diff saved to https://phabricator.wikimedia.org/P46349 and previous config saved to /var/cache/conftool/dbconfig/20230411-193640-ladsgroup.json
19:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2127.codfw.wmnet with reason: Maintenance
19:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2127.codfw.wmnet with reason: Maintenance
19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T333332)', diff saved to https://phabricator.wikimedia.org/P46348 and previous config saved to /var/cache/conftool/dbconfig/20230411-193628-ladsgroup.json
19:31 ejegg: payments-wiki upgraded from ad6e5801 to 153bdf64
19:29 ejegg: civicrm upgraded from e2fdb4a4 to 0f37f981
19:22 andrew@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirtlocal1003']
19:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P46347 and previous config saved to /var/cache/conftool/dbconfig/20230411-192122-ladsgroup.json
19:19 eileen: civicrm upgraded from b573aee4 to e2fdb4a4
19:16 andrew@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirtlocal1003']
19:16 andrew@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirtlocal1002']
19:10 andrew@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirtlocal1002']
19:08 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P46346 and previous config saved to /var/cache/conftool/dbconfig/20230411-190616-ladsgroup.json
19:05 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
19:05 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
18:59 ebysans@deploy2002: Finished deploy [airflow-dags/analytics@d2cd28d]: (no justification provided) (duration: 00m 11s)
18:59 ebysans@deploy2002: Started deploy [airflow-dags/analytics@d2cd28d]: (no justification provided)
18:58 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
18:57 andrew@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirtlocal1001']
18:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T333332)', diff saved to https://phabricator.wikimedia.org/P46345 and previous config saved to /var/cache/conftool/dbconfig/20230411-185110-ladsgroup.json
18:50 andrew@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirtlocal1001']
18:38 demon@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.4 refs T330210
18:32 zabe@deploy2002: Finished scap: close wowikiquote (T334482) (duration: 06m 46s)
18:25 zabe@deploy2002: Started scap: close wowikiquote (T334482)
18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T333332)', diff saved to https://phabricator.wikimedia.org/P46344 and previous config saved to /var/cache/conftool/dbconfig/20230411-182024-ladsgroup.json
18:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2109.codfw.wmnet with reason: Maintenance
18:20 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
18:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2109.codfw.wmnet with reason: Maintenance
18:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
18:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T333332)', diff saved to https://phabricator.wikimedia.org/P46343 and previous config saved to /var/cache/conftool/dbconfig/20230411-181123-ladsgroup.json
17:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs3006.esams.wmnet with OS bullseye
17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P46342 and previous config saved to /var/cache/conftool/dbconfig/20230411-175617-ladsgroup.json
17:42 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3006.esams.wmnet with reason: host reimage
17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P46341 and previous config saved to /var/cache/conftool/dbconfig/20230411-174110-ladsgroup.json
17:38 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3006.esams.wmnet with reason: host reimage
17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T333332)', diff saved to https://phabricator.wikimedia.org/P46340 and previous config saved to /var/cache/conftool/dbconfig/20230411-172604-ladsgroup.json
17:17 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3006.esams.wmnet with OS bullseye
17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T333332)', diff saved to https://phabricator.wikimedia.org/P46339 and previous config saved to /var/cache/conftool/dbconfig/20230411-171600-ladsgroup.json
17:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1198.eqiad.wmnet with reason: Maintenance
17:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1198.eqiad.wmnet with reason: Maintenance
17:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T333332)', diff saved to https://phabricator.wikimedia.org/P46338 and previous config saved to /var/cache/conftool/dbconfig/20230411-171537-ladsgroup.json
17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P46337 and previous config saved to /var/cache/conftool/dbconfig/20230411-170031-ladsgroup.json
16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P46336 and previous config saved to /var/cache/conftool/dbconfig/20230411-164524-ladsgroup.json
16:33 sbassett: Deployed security mitigation update for T333140
16:33 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T333332)', diff saved to https://phabricator.wikimedia.org/P46335 and previous config saved to /var/cache/conftool/dbconfig/20230411-163018-ladsgroup.json
16:30 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
16:29 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
16:27 mforns@deploy2002: Finished deploy [airflow-dags/analytics@ce3d4d6]: (no justification provided) (duration: 00m 11s)
16:27 mforns@deploy2002: Started deploy [airflow-dags/analytics@ce3d4d6]: (no justification provided)
16:23 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T333332)', diff saved to https://phabricator.wikimedia.org/P46334 and previous config saved to /var/cache/conftool/dbconfig/20230411-162020-ladsgroup.json
16:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1189.eqiad.wmnet with reason: Maintenance
16:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1189.eqiad.wmnet with reason: Maintenance
16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T333332)', diff saved to https://phabricator.wikimedia.org/P46333 and previous config saved to /var/cache/conftool/dbconfig/20230411-161956-ladsgroup.json
16:19 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
16:19 brett: Disable Puppet/PyBal on lvs3006 in preparation for reimaging - T321309
16:18 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
16:12 hnowlan@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
16:11 hnowlan@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
16:09 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
16:08 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
16:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker1132.eqiad.wmnet with reason: More tests are needed before the host can be added to prod
16:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-worker1132.eqiad.wmnet with reason: More tests are needed before the host can be added to prod
16:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1132.eqiad.wmnet with OS buster
16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P46332 and previous config saved to /var/cache/conftool/dbconfig/20230411-160450-ladsgroup.json
15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P46331 and previous config saved to /var/cache/conftool/dbconfig/20230411-154943-ladsgroup.json
15:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage
15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T333332)', diff saved to https://phabricator.wikimedia.org/P46330 and previous config saved to /var/cache/conftool/dbconfig/20230411-153437-ladsgroup.json
15:34 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage
15:33 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
15:32 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T333332)', diff saved to https://phabricator.wikimedia.org/P46329 and previous config saved to /var/cache/conftool/dbconfig/20230411-152438-ladsgroup.json
15:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
15:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T333332)', diff saved to https://phabricator.wikimedia.org/P46328 and previous config saved to /var/cache/conftool/dbconfig/20230411-152413-ladsgroup.json
15:21 moritzm: installing xen security updates
15:13 ebysans@deploy2002: Finished deploy [analytics/refinery@f3389dc] (hadoop-test): Update pageview hourly table with referer data field TEST [analytics/refinery@f3389dc] (duration: 01m 28s)
15:11 ebysans@deploy2002: Started deploy [analytics/refinery@f3389dc] (hadoop-test): Update pageview hourly table with referer data field TEST [analytics/refinery@f3389dc]
15:10 ebysans@deploy2002: Finished deploy [analytics/refinery@f3389dc] (thin): Update pageview hourly table with referer data field THIN [analytics/refinery@f3389dc] (duration: 00m 08s)
15:10 ebysans@deploy2002: Started deploy [analytics/refinery@f3389dc] (thin): Update pageview hourly table with referer data field THIN [analytics/refinery@f3389dc]
15:09 ebysans@deploy2002: Finished deploy [analytics/refinery@f3389dc]: Update pageview hourly table with referer data field [analytics/refinery@f3389dc] (duration: 05m 34s)
15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P46327 and previous config saved to /var/cache/conftool/dbconfig/20230411-150907-ladsgroup.json
15:03 ebysans@deploy2002: Started deploy [analytics/refinery@f3389dc]: Update pageview hourly table with referer data field [analytics/refinery@f3389dc]
15:01 SandraEbele: deploying analytics refinery to update hive pageview hourly table with referer_data field.
14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P46326 and previous config saved to /var/cache/conftool/dbconfig/20230411-145401-ladsgroup.json
14:53 SandraEbele: paused pageview hourly job.
14:51 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:51 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add kafka-logging1005 ipv6 - herron@cumin1001"
14:48 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS buster
14:47 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add kafka-logging1005 ipv6 - herron@cumin1001"
14:45 herron@cumin1001: START - Cookbook sre.dns.netbox
14:42 moritzm: installing Tomcat security updates
14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T333332)', diff saved to https://phabricator.wikimedia.org/P46325 and previous config saved to /var/cache/conftool/dbconfig/20230411-143854-ladsgroup.json
14:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
14:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
14:29 jnuche@deploy2002: Installing scap version "4.49.0" for 590 hosts
14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T333332)', diff saved to https://phabricator.wikimedia.org/P46324 and previous config saved to /var/cache/conftool/dbconfig/20230411-142857-ladsgroup.json
14:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
14:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
14:27 jnuche@deploy2002: Installing scap version "4.49.0" for 590 hosts
14:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
14:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T333332)', diff saved to https://phabricator.wikimedia.org/P46323 and previous config saved to /var/cache/conftool/dbconfig/20230411-141944-ladsgroup.json
14:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw2448.codfw.wmnet with reason: HW failure
14:16 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2448.codfw.wmnet with reason: HW failure
14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P46321 and previous config saved to /var/cache/conftool/dbconfig/20230411-140438-ladsgroup.json
14:00 claime: Revoking kafka_main-codfw_broker and kafka_main-eqiad_broker puppet CA certs - T319372
13:55 elukey: remove old puppet certificates for kafka main brokers from A:kafka-main - T319372
13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P46320 and previous config saved to /var/cache/conftool/dbconfig/20230411-134932-ladsgroup.json
13:46 elukey: powercycle analytics1069, down for some days now, host stuck from the mgmt/serial console
13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T333332)', diff saved to https://phabricator.wikimedia.org/P46319 and previous config saved to /var/cache/conftool/dbconfig/20230411-133425-ladsgroup.json
13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T333332)', diff saved to https://phabricator.wikimedia.org/P46318 and previous config saved to /var/cache/conftool/dbconfig/20230411-132348-ladsgroup.json
13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1123.eqiad.wmnet with reason: Maintenance
13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1123.eqiad.wmnet with reason: Maintenance
13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T333332)', diff saved to https://phabricator.wikimedia.org/P46317 and previous config saved to /var/cache/conftool/dbconfig/20230411-132324-ladsgroup.json
13:21 taavi@deploy2002: Finished scap: Backport for [[gerrit:907852|Deploy Nearby feature on most wikis [2/2] (T334079)]] (duration: 08m 25s)
13:14 taavi@deploy2002: wmde-fisch and taavi: Backport for [[gerrit:907852|Deploy Nearby feature on most wikis [2/2] (T334079)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
13:13 taavi@deploy2002: Started scap: Backport for [[gerrit:907852|Deploy Nearby feature on most wikis [2/2] (T334079)]]
13:11 taavi@deploy2002: Finished scap: Backport for [[gerrit:907851|Deploy Nearby feature on most wikis [1/2] (T334079)]] (duration: 07m 24s)
13:09 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye
13:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P46316 and previous config saved to /var/cache/conftool/dbconfig/20230411-130817-ladsgroup.json
13:05 taavi@deploy2002: taavi and wmde-fisch: Backport for [[gerrit:907851|Deploy Nearby feature on most wikis [1/2] (T334079)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
13:04 taavi@deploy2002: Started scap: Backport for [[gerrit:907851|Deploy Nearby feature on most wikis [1/2] (T334079)]]
12:54 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
12:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P46315 and previous config saved to /var/cache/conftool/dbconfig/20230411-125310-ladsgroup.json
12:50 jelto@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
12:38 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
12:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T333332)', diff saved to https://phabricator.wikimedia.org/P46314 and previous config saved to /var/cache/conftool/dbconfig/20230411-123803-ladsgroup.json
12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T333332)', diff saved to https://phabricator.wikimedia.org/P46313 and previous config saved to /var/cache/conftool/dbconfig/20230411-122735-ladsgroup.json
12:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
12:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
12:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1112.eqiad.wmnet with reason: Maintenance
12:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1112.eqiad.wmnet with reason: Maintenance
12:24 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2448.*.codfw.wmnet
12:24 claime: Setting mw2448.codfw.wmnet to pooled=invalid - T334429
12:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
12:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
12:16 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
12:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46312 and previous config saved to /var/cache/conftool/dbconfig/20230411-115137-root.json
11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46311 and previous config saved to /var/cache/conftool/dbconfig/20230411-113631-root.json
11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46310 and previous config saved to /var/cache/conftool/dbconfig/20230411-112126-root.json
11:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Pooling', diff saved to https://phabricator.wikimedia.org/P46309 and previous config saved to /var/cache/conftool/dbconfig/20230411-111854-root.json
11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46308 and previous config saved to /var/cache/conftool/dbconfig/20230411-110621-root.json
11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Pooling', diff saved to https://phabricator.wikimedia.org/P46307 and previous config saved to /var/cache/conftool/dbconfig/20230411-110349-root.json
10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46306 and previous config saved to /var/cache/conftool/dbconfig/20230411-105116-root.json
10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 T334447', diff saved to https://phabricator.wikimedia.org/P46305 and previous config saved to /var/cache/conftool/dbconfig/20230411-105100-marostegui.json
10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Pooling', diff saved to https://phabricator.wikimedia.org/P46304 and previous config saved to /var/cache/conftool/dbconfig/20230411-104844-root.json
10:36 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46303 and previous config saved to /var/cache/conftool/dbconfig/20230411-103611-root.json
10:36 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Pooling', diff saved to https://phabricator.wikimedia.org/P46302 and previous config saved to /var/cache/conftool/dbconfig/20230411-103339-root.json
10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46301 and previous config saved to /var/cache/conftool/dbconfig/20230411-102106-root.json
10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: Pooling', diff saved to https://phabricator.wikimedia.org/P46300 and previous config saved to /var/cache/conftool/dbconfig/20230411-101835-root.json
10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 5%: Pooling', diff saved to https://phabricator.wikimedia.org/P46298 and previous config saved to /var/cache/conftool/dbconfig/20230411-100330-root.json
09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 4%: Pooling', diff saved to https://phabricator.wikimedia.org/P46297 and previous config saved to /var/cache/conftool/dbconfig/20230411-094825-root.json
09:44 jelto@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gitlab2003.wikimedia.org with OS bullseye
09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 3%: Pooling', diff saved to https://phabricator.wikimedia.org/P46296 and previous config saved to /var/cache/conftool/dbconfig/20230411-093320-root.json
09:27 Amir1: start of watchlist clean up of a user in wikidatawiki (T328501)
09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 100%: Pooling', diff saved to https://phabricator.wikimedia.org/P46295 and previous config saved to /var/cache/conftool/dbconfig/20230411-092224-root.json
09:20 moritzm: installing nodejs security updates on buster
09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 2%: Pooling', diff saved to https://phabricator.wikimedia.org/P46294 and previous config saved to /var/cache/conftool/dbconfig/20230411-091815-root.json
09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 75%: Pooling', diff saved to https://phabricator.wikimedia.org/P46293 and previous config saved to /var/cache/conftool/dbconfig/20230411-090720-root.json
09:04 moritzm: installing pcre2 security updates
09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 1%: Pooling', diff saved to https://phabricator.wikimedia.org/P46292 and previous config saved to /var/cache/conftool/dbconfig/20230411-090310-root.json
08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122 to clone db1222 T326669', diff saved to https://phabricator.wikimedia.org/P46290 and previous config saved to /var/cache/conftool/dbconfig/20230411-085654-marostegui.json
08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 50%: Pooling', diff saved to https://phabricator.wikimedia.org/P46289 and previous config saved to /var/cache/conftool/dbconfig/20230411-085215-root.json
08:50 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 25%: Pooling', diff saved to https://phabricator.wikimedia.org/P46288 and previous config saved to /var/cache/conftool/dbconfig/20230411-083710-root.json
08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 100%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46287 and previous config saved to /var/cache/conftool/dbconfig/20230411-083339-root.json
08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46286 and previous config saved to /var/cache/conftool/dbconfig/20230411-083106-root.json
08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 100%: Pooling', diff saved to https://phabricator.wikimedia.org/P46285 and previous config saved to /var/cache/conftool/dbconfig/20230411-082521-root.json
08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 10%: Pooling', diff saved to https://phabricator.wikimedia.org/P46284 and previous config saved to /var/cache/conftool/dbconfig/20230411-082205-root.json
08:19 aqu@deploy2002: Finished deploy [analytics/refinery@bed78f6] (hadoop-test): Deploy analytics_refinery including last webrequest load scripts in TEST 3nd try [analytics/refinery@bed78f6] (duration: 01m 25s)
08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 75%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46283 and previous config saved to /var/cache/conftool/dbconfig/20230411-081834-root.json
08:18 aqu@deploy2002: Started deploy [analytics/refinery@bed78f6] (hadoop-test): Deploy analytics_refinery including last webrequest load scripts in TEST 3rd try [analytics/refinery@bed78f6]
08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46282 and previous config saved to /var/cache/conftool/dbconfig/20230411-081601-root.json
08:15 aqu: About to deploy analytics/refinery (To migrate webrequest load from Oozie to Airflow)
08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 75%: Pooling', diff saved to https://phabricator.wikimedia.org/P46281 and previous config saved to /var/cache/conftool/dbconfig/20230411-081016-root.json
08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 5%: Pooling', diff saved to https://phabricator.wikimedia.org/P46280 and previous config saved to /var/cache/conftool/dbconfig/20230411-080700-root.json
08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 50%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46279 and previous config saved to /var/cache/conftool/dbconfig/20230411-080329-root.json
08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46278 and previous config saved to /var/cache/conftool/dbconfig/20230411-080057-root.json
07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 50%: Pooling', diff saved to https://phabricator.wikimedia.org/P46277 and previous config saved to /var/cache/conftool/dbconfig/20230411-075511-root.json
07:54 vgutierrez: restart haproxy on cp2033 - T334448
07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 4%: Pooling', diff saved to https://phabricator.wikimedia.org/P46276 and previous config saved to /var/cache/conftool/dbconfig/20230411-075155-root.json
07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 25%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46275 and previous config saved to /var/cache/conftool/dbconfig/20230411-074824-root.json
07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46274 and previous config saved to /var/cache/conftool/dbconfig/20230411-074552-root.json
07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 25%: Pooling', diff saved to https://phabricator.wikimedia.org/P46273 and previous config saved to /var/cache/conftool/dbconfig/20230411-074006-root.json
07:39 jelto@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gitlab2003.wikimedia.org with OS bullseye
07:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1103.eqiad.wmnet
07:39 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:39 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1103.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
07:37 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1103.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 3%: Pooling', diff saved to https://phabricator.wikimedia.org/P46272 and previous config saved to /var/cache/conftool/dbconfig/20230411-073651-root.json
07:35 marostegui@cumin1001: START - Cookbook sre.dns.netbox
07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 10%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46271 and previous config saved to /var/cache/conftool/dbconfig/20230411-073319-root.json
07:30 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1103.eqiad.wmnet
07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46270 and previous config saved to /var/cache/conftool/dbconfig/20230411-073047-root.json
07:30 dcausse: restarting blazegraph on wdqs1007 (stuck for 48 hours)
07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 10%: Pooling', diff saved to https://phabricator.wikimedia.org/P46269 and previous config saved to /var/cache/conftool/dbconfig/20230411-072501-root.json
07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 2%: Pooling', diff saved to https://phabricator.wikimedia.org/P46268 and previous config saved to /var/cache/conftool/dbconfig/20230411-072146-root.json
07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 5%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46267 and previous config saved to /var/cache/conftool/dbconfig/20230411-071815-root.json
07:18 zabe@deploy2002: Finished scap: Backport for gerrit:906793 Add blkwiki to wgSitename (T334351) (duration: 08m 08s)
07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1103 from dbctl T332293', diff saved to https://phabricator.wikimedia.org/P46266 and previous config saved to /var/cache/conftool/dbconfig/20230411-071647-marostegui.json
07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46265 and previous config saved to /var/cache/conftool/dbconfig/20230411-071542-root.json
07:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 393731
07:11 zabe@deploy2002: zabe and jhsoby: Backport for gerrit:906793 Add blkwiki to wgSitename (T334351) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
07:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 393731
07:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 150279
07:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 150279
07:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35467
07:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35467
07:10 zabe@deploy2002: Started scap: Backport for gerrit:906793 Add blkwiki to wgSitename (T334351)
07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 5%: Pooling', diff saved to https://phabricator.wikimedia.org/P46264 and previous config saved to /var/cache/conftool/dbconfig/20230411-070956-root.json
07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 1%: Pooling', diff saved to https://phabricator.wikimedia.org/P46263 and previous config saved to /var/cache/conftool/dbconfig/20230411-070641-root.json
07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1211 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P46262 and previous config saved to /var/cache/conftool/dbconfig/20230411-070609-marostegui.json
07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 4%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46261 and previous config saved to /var/cache/conftool/dbconfig/20230411-070310-root.json
07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46260 and previous config saved to /var/cache/conftool/dbconfig/20230411-070037-root.json
06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 T334375', diff saved to https://phabricator.wikimedia.org/P46258 and previous config saved to /var/cache/conftool/dbconfig/20230411-065734-marostegui.json
06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1163 to s1 primary T334375', diff saved to https://phabricator.wikimedia.org/P46257 and previous config saved to /var/cache/conftool/dbconfig/20230411-065639-root.json
06:56 marostegui: Starting s1 eqiad failover from db1118 to db1163 - T334375
06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 4%: Pooling', diff saved to https://phabricator.wikimedia.org/P46256 and previous config saved to /var/cache/conftool/dbconfig/20230411-065452-root.json
06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 3%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46255 and previous config saved to /var/cache/conftool/dbconfig/20230411-064805-root.json
06:43 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 3%: Pooling', diff saved to https://phabricator.wikimedia.org/P46254 and previous config saved to /var/cache/conftool/dbconfig/20230411-063947-root.json
06:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 2%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46252 and previous config saved to /var/cache/conftool/dbconfig/20230411-063300-root.json
06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 2%: Pooling', diff saved to https://phabricator.wikimedia.org/P46251 and previous config saved to /var/cache/conftool/dbconfig/20230411-062442-root.json
06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1163 with weight 0 T334375', diff saved to https://phabricator.wikimedia.org/P46250 and previous config saved to /var/cache/conftool/dbconfig/20230411-062127-root.json
06:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 37 hosts with reason: Primary switchover s1 T334375
06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 37 hosts with reason: Primary switchover s1 T334375
06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 1%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46249 and previous config saved to /var/cache/conftool/dbconfig/20230411-061755-root.json
06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1209 to dbctl T326206', diff saved to https://phabricator.wikimedia.org/P46248 and previous config saved to /var/cache/conftool/dbconfig/20230411-061642-marostegui.json
06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 to clone db1210 T326669', diff saved to https://phabricator.wikimedia.org/P46246 and previous config saved to /var/cache/conftool/dbconfig/20230411-061044-marostegui.json
06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 1%: Pooling', diff saved to https://phabricator.wikimedia.org/P46245 and previous config saved to /var/cache/conftool/dbconfig/20230411-060937-root.json
06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1224 to dbctl T326206', diff saved to https://phabricator.wikimedia.org/P46244 and previous config saved to /var/cache/conftool/dbconfig/20230411-060922-marostegui.json
05:45 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Swakiyama out of all services on: 814 hosts
05:45 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Swakiyama out of all services on: 814 hosts
05:44 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Swakiyama out of all services on: 1241 hosts
05:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Swakiyama out of all services on: 1241 hosts
04:10 eileen: civicrm upgraded from bc2f5ccc to b573aee4
03:54 mwpresync@deploy2002: Pruned MediaWiki: 1.41.0-wmf.2 (duration: 02m 15s)
03:52 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.4 refs T330210 (duration: 49m 57s)
03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.4 refs T330210
00:37 eileen: civicrm upgraded from 001e156a to bc2f5ccc
00:13 eileen: civicrm upgraded from 223f655a to 001e156a

2023-04-10

23:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts miscweb1002.eqiad.wmnet
23:07 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:07 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: miscweb1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
23:06 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: miscweb1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
23:00 dzahn@cumin1001: START - Cookbook sre.dns.netbox
22:55 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts miscweb1002.eqiad.wmnet
22:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on miscweb1002.eqiad.wmnet with reason: decom
22:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on miscweb1002.eqiad.wmnet with reason: decom
21:53 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs3005.esams.wmnet with OS bullseye
21:46 urandom: restarting Cassandra, sessionstore1001-a, to restore native transport settings — T327954
21:36 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3005.esams.wmnet with reason: host reimage
21:33 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host sessionstore1001.eqiad.wmnet
21:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3005.esams.wmnet with reason: host reimage
21:31 urandom: restarting Cassandra, sessionstore1002-a — T327954
21:22 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
21:21 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host sessionstore1001.eqiad.wmnet
21:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3005.esams.wmnet with OS bullseye
21:14 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs3005.esams.wmnet with OS bullseye
21:13 sbassett: Deployed updated security mitigation for T333140
21:10 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
21:08 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host sessionstore1001.eqiad.wmnet
21:06 urandom: restarting Cassandra, sessionstore1003-a — T327954
21:04 urandom: restarting Cassandra, sessionstore1002-a — T327954
20:57 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
20:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3005.esams.wmnet with reason: host reimage
20:36 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3005.esams.wmnet with reason: host reimage
20:15 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3005.esams.wmnet with OS bullseye
20:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
20:07 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
20:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
20:05 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
19:53 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirtlocal1003']
19:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirtlocal1003']
19:52 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirtlocal1002']
19:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirtlocal1002']
19:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirtlocal1001']
19:51 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirtlocal1001']
19:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
19:35 brett: Disable Puppet/PyBal on lvs3005 in preparation for reimaging - T321309
19:25 mutante: mw2488 - scap pull - T334429
19:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs6002.drmrs.wmnet
19:22 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs6002.drmrs.wmnet
19:19 mforns@deploy2002: Finished deploy [airflow-dags/analytics@6d6f1ec]: (no justification provided) (duration: 00m 11s)
19:19 mforns@deploy2002: Started deploy [airflow-dags/analytics@6d6f1ec]: (no justification provided)
19:16 mutante: power-cycling mw2448 - down, no console output T334429
19:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6002.drmrs.wmnet with OS bullseye
18:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6002.drmrs.wmnet with reason: host reimage
18:43 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6002.drmrs.wmnet with reason: host reimage
18:34 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:34 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add kafka-logging1004 ipv6 - herron@cumin1001"
18:33 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add kafka-logging1004 ipv6 - herron@cumin1001"
18:31 herron@cumin1001: START - Cookbook sre.dns.netbox
18:22 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs6002.drmrs.wmnet with OS bullseye
18:16 krinkle@deploy2002: Synchronized wmf-config/: (no justification provided) (duration: 587m 34s)
17:29 brett: Disable Puppet/PyBal on lvs6002 in preparation for reimaging - T321309
16:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6001.drmrs.wmnet with OS bullseye
16:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6001.drmrs.wmnet with reason: host reimage
16:27 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6001.drmrs.wmnet with reason: host reimage
16:05 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs6001.drmrs.wmnet with OS bullseye
15:53 herron: centrallog1002:~# systemctl restart rsyslog
15:46 brett: Disable Puppet/PyBal on lvs6001 in preparation for reimaging - T321309
14:57 sukhe: enable puppet on A:lvs and A:ulsfo to merge 906580
14:52 sukhe: disable puppet on A:lvs and A:ulsfo to merge 906580
14:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46242 and previous config saved to /var/cache/conftool/dbconfig/20230410-141052-root.json
13:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46241 and previous config saved to /var/cache/conftool/dbconfig/20230410-135547-root.json
13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46240 and previous config saved to /var/cache/conftool/dbconfig/20230410-134042-root.json
13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46239 and previous config saved to /var/cache/conftool/dbconfig/20230410-132538-root.json
13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46238 and previous config saved to /var/cache/conftool/dbconfig/20230410-131033-root.json
12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46237 and previous config saved to /var/cache/conftool/dbconfig/20230410-125528-root.json
12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46236 and previous config saved to /var/cache/conftool/dbconfig/20230410-124023-root.json
12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 100%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46235 and previous config saved to /var/cache/conftool/dbconfig/20230410-122112-root.json
12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 75%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46234 and previous config saved to /var/cache/conftool/dbconfig/20230410-120607-root.json
11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 50%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46233 and previous config saved to /var/cache/conftool/dbconfig/20230410-115102-root.json
11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: Pooling', diff saved to https://phabricator.wikimedia.org/P46232 and previous config saved to /var/cache/conftool/dbconfig/20230410-114733-root.json
11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 25%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46231 and previous config saved to /var/cache/conftool/dbconfig/20230410-113557-root.json
11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 75%: Pooling', diff saved to https://phabricator.wikimedia.org/P46230 and previous config saved to /var/cache/conftool/dbconfig/20230410-113228-root.json
11:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1201 to clone db1224 T326669', diff saved to https://phabricator.wikimedia.org/P46228 and previous config saved to /var/cache/conftool/dbconfig/20230410-112524-marostegui.json
11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 10%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46227 and previous config saved to /var/cache/conftool/dbconfig/20230410-112052-root.json
11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 50%: Pooling', diff saved to https://phabricator.wikimedia.org/P46226 and previous config saved to /var/cache/conftool/dbconfig/20230410-111723-root.json
11:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 5%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46225 and previous config saved to /var/cache/conftool/dbconfig/20230410-110548-root.json
11:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 25%: Pooling', diff saved to https://phabricator.wikimedia.org/P46224 and previous config saved to /var/cache/conftool/dbconfig/20230410-110218-root.json
10:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 4%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46222 and previous config saved to /var/cache/conftool/dbconfig/20230410-105043-root.json
10:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 10%: Pooling', diff saved to https://phabricator.wikimedia.org/P46221 and previous config saved to /var/cache/conftool/dbconfig/20230410-104714-root.json
10:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 3%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46220 and previous config saved to /var/cache/conftool/dbconfig/20230410-103538-root.json
10:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 5%: Pooling', diff saved to https://phabricator.wikimedia.org/P46219 and previous config saved to /var/cache/conftool/dbconfig/20230410-103209-root.json
10:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 2%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46218 and previous config saved to /var/cache/conftool/dbconfig/20230410-102033-root.json
10:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 4%: Pooling', diff saved to https://phabricator.wikimedia.org/P46217 and previous config saved to /var/cache/conftool/dbconfig/20230410-101704-root.json
10:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 1%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46216 and previous config saved to /var/cache/conftool/dbconfig/20230410-100528-root.json
10:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 3%: Pooling', diff saved to https://phabricator.wikimedia.org/P46215 and previous config saved to /var/cache/conftool/dbconfig/20230410-100159-root.json
09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1183 to s5 depooled T334080', diff saved to https://phabricator.wikimedia.org/P46214 and previous config saved to /var/cache/conftool/dbconfig/20230410-095846-marostegui.json
09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46213 and previous config saved to /var/cache/conftool/dbconfig/20230410-095530-root.json
09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 2%: Pooling', diff saved to https://phabricator.wikimedia.org/P46212 and previous config saved to /var/cache/conftool/dbconfig/20230410-094654-root.json
09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46211 and previous config saved to /var/cache/conftool/dbconfig/20230410-094025-root.json
09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 1%: Pooling', diff saved to https://phabricator.wikimedia.org/P46210 and previous config saved to /var/cache/conftool/dbconfig/20230410-093149-root.json
09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46209 and previous config saved to /var/cache/conftool/dbconfig/20230410-092520-root.json
09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46207 and previous config saved to /var/cache/conftool/dbconfig/20230410-091015-root.json
09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46206 and previous config saved to /var/cache/conftool/dbconfig/20230410-090141-root.json
08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46205 and previous config saved to /var/cache/conftool/dbconfig/20230410-085511-root.json
08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 100%: Pooling', diff saved to https://phabricator.wikimedia.org/P46204 and previous config saved to /var/cache/conftool/dbconfig/20230410-085117-root.json
08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46203 and previous config saved to /var/cache/conftool/dbconfig/20230410-084636-root.json
08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46202 and previous config saved to /var/cache/conftool/dbconfig/20230410-084006-root.json
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 75%: Pooling', diff saved to https://phabricator.wikimedia.org/P46201 and previous config saved to /var/cache/conftool/dbconfig/20230410-083613-root.json
08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46200 and previous config saved to /var/cache/conftool/dbconfig/20230410-083131-root.json
08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P46199 and previous config saved to /var/cache/conftool/dbconfig/20230410-082501-root.json
08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 50%: Pooling', diff saved to https://phabricator.wikimedia.org/P46198 and previous config saved to /var/cache/conftool/dbconfig/20230410-082108-root.json
08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46197 and previous config saved to /var/cache/conftool/dbconfig/20230410-081626-root.json
08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P46196 and previous config saved to /var/cache/conftool/dbconfig/20230410-080956-root.json
08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 25%: Pooling', diff saved to https://phabricator.wikimedia.org/P46195 and previous config saved to /var/cache/conftool/dbconfig/20230410-080603-root.json
08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46194 and previous config saved to /var/cache/conftool/dbconfig/20230410-080121-root.json
08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 100%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46193 and previous config saved to /var/cache/conftool/dbconfig/20230410-080115-root.json
07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P46192 and previous config saved to /var/cache/conftool/dbconfig/20230410-075451-root.json
07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 10%: Pooling', diff saved to https://phabricator.wikimedia.org/P46191 and previous config saved to /var/cache/conftool/dbconfig/20230410-075058-root.json
07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46190 and previous config saved to /var/cache/conftool/dbconfig/20230410-074617-root.json
07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 75%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46189 and previous config saved to /var/cache/conftool/dbconfig/20230410-074610-root.json
07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46188 and previous config saved to /var/cache/conftool/dbconfig/20230410-073947-root.json
07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 5%: Pooling', diff saved to https://phabricator.wikimedia.org/P46187 and previous config saved to /var/cache/conftool/dbconfig/20230410-073553-root.json
07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46186 and previous config saved to /var/cache/conftool/dbconfig/20230410-073112-root.json
07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 50%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46185 and previous config saved to /var/cache/conftool/dbconfig/20230410-073105-root.json
07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1163', diff saved to https://phabricator.wikimedia.org/P46184 and previous config saved to /var/cache/conftool/dbconfig/20230410-072206-marostegui.json
07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 4%: Pooling', diff saved to https://phabricator.wikimedia.org/P46183 and previous config saved to /var/cache/conftool/dbconfig/20230410-072048-root.json
07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 T326669', diff saved to https://phabricator.wikimedia.org/P46181 and previous config saved to /var/cache/conftool/dbconfig/20230410-071747-marostegui.json
07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 25%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46180 and previous config saved to /var/cache/conftool/dbconfig/20230410-071600-root.json
07:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46179 and previous config saved to /var/cache/conftool/dbconfig/20230410-070948-root.json
07:09 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts db1101.eqiad.wmnet
07:09 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:09 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1101.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 3%: Pooling', diff saved to https://phabricator.wikimedia.org/P46178 and previous config saved to /var/cache/conftool/dbconfig/20230410-070544-root.json
07:05 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1101.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
07:03 marostegui@cumin1001: START - Cookbook sre.dns.netbox
07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 10%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46177 and previous config saved to /var/cache/conftool/dbconfig/20230410-070056-root.json
06:58 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1101.eqiad.wmnet
06:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46176 and previous config saved to /var/cache/conftool/dbconfig/20230410-065443-root.json
06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103 T334374', diff saved to https://phabricator.wikimedia.org/P46175 and previous config saved to /var/cache/conftool/dbconfig/20230410-065149-marostegui.json
06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1179 to x1 primary T334374', diff saved to https://phabricator.wikimedia.org/P46174 and previous config saved to /var/cache/conftool/dbconfig/20230410-065047-root.json
06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 2%: Pooling', diff saved to https://phabricator.wikimedia.org/P46173 and previous config saved to /var/cache/conftool/dbconfig/20230410-065039-root.json
06:50 marostegui: Starting x1 eqiad failover from db1103 to db1179 - T334374
06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 5%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46172 and previous config saved to /var/cache/conftool/dbconfig/20230410-064551-root.json
06:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46171 and previous config saved to /var/cache/conftool/dbconfig/20230410-063939-root.json
06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1179 with weight 0 T334374', diff saved to https://phabricator.wikimedia.org/P46170 and previous config saved to /var/cache/conftool/dbconfig/20230410-063916-root.json
06:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 12 hosts with reason: Primary switchover x1 T334374
06:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 12 hosts with reason: Primary switchover x1 T334374
06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 1%: Pooling', diff saved to https://phabricator.wikimedia.org/P46169 and previous config saved to /var/cache/conftool/dbconfig/20230410-063534-root.json
06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1220 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P46168 and previous config saved to /var/cache/conftool/dbconfig/20230410-063458-marostegui.json
06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 4%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46167 and previous config saved to /var/cache/conftool/dbconfig/20230410-063046-root.json
06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46166 and previous config saved to /var/cache/conftool/dbconfig/20230410-062434-root.json
06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 3%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46165 and previous config saved to /var/cache/conftool/dbconfig/20230410-061541-root.json
06:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46164 and previous config saved to /var/cache/conftool/dbconfig/20230410-060929-root.json
06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 2%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46163 and previous config saved to /var/cache/conftool/dbconfig/20230410-060037-root.json
05:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46162 and previous config saved to /var/cache/conftool/dbconfig/20230410-055424-root.json
05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 T334080', diff saved to https://phabricator.wikimedia.org/P46160 and previous config saved to /var/cache/conftool/dbconfig/20230410-055005-marostegui.json
05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 1%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46159 and previous config saved to /var/cache/conftool/dbconfig/20230410-054532-root.json
05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1207 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P46158 and previous config saved to /var/cache/conftool/dbconfig/20230410-054504-marostegui.json
05:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46157 and previous config saved to /var/cache/conftool/dbconfig/20230410-053919-root.json

2023-04-08

17:57 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1073']

2023-04-07

18:19 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@5c4ebda]: (no justification provided) (duration: 00m 35s)
18:18 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@5c4ebda]: (no justification provided)
17:02 urandom: restart Cassandra, sessionstore1001-a (re-enabling CQL) — T327954
11:05 aqu@deploy2002: Finished deploy [analytics/refinery@e70da10] (hadoop-test): Deploy analytics_refinery including last webrequest load scripts in TEST 2nd try [analytics/refinery@e70da10] (duration: 01m 33s)
11:03 aqu@deploy2002: Started deploy [analytics/refinery@e70da10] (hadoop-test): Deploy analytics_refinery including last webrequest load scripts in TEST 2nd try [analytics/refinery@e70da10]
10:40 aqu@deploy2002: Finished deploy [analytics/refinery@eb4c2b2] (hadoop-test): Deploy analytics_refinery including last webrequest load scripts in TEST [analytics/refinery@eb4c2b2] (duration: 00m 06s)
10:40 aqu@deploy2002: Started deploy [analytics/refinery@eb4c2b2] (hadoop-test): Deploy analytics_refinery including last webrequest load scripts in TEST [analytics/refinery@eb4c2b2]
10:34 aqu: About to deploy analytics/refinery in test cluster
09:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sonicmgmt - ayounsi@cumin1001"
09:22 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sonicmgmt - ayounsi@cumin1001"
09:20 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
01:17 urandom: rebooting sessionstore1001 — T327954
01:10 urandom: rebooting sessionstore1001 — T327954
01:02 urandom: rebooting sessionstore1001 — T327954
00:39 urandom: rebooting sessionstore1001 — T327954

2023-04-06

22:05 ejegg: SmashPig upgraded from 7c19151f to 24d700f4
22:04 ejegg: payments-wiki upgraded from 75b068a1 to 0f15a101
21:52 sbassett: Deployed updated mitigation for T333140
21:19 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
21:18 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T333332)', diff saved to https://phabricator.wikimedia.org/P46154 and previous config saved to /var/cache/conftool/dbconfig/20230406-211054-ladsgroup.json
21:05 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
21:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
21:02 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
21:02 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
21:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
21:00 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
20:59 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
20:57 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P46153 and previous config saved to /var/cache/conftool/dbconfig/20230406-205548-ladsgroup.json
20:53 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
20:50 eevans@cumin1001: conftool action : set/pooled=yes; selector: name=ms-fe1014.eqiad.wmnet
20:49 eevans@cumin1001: conftool action : set/pooled=yes; selector: name=ms-fe1013.eqiad.wmnet
20:49 eevans@cumin1001: conftool action : set/weight=40; selector: name=ms-fe1014.eqiad.wmnet
20:49 eevans@cumin1001: conftool action : set/weight=40; selector: name=ms-fe1013.eqiad.wmnet
20:45 eevans@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
20:45 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "remove info for new ssw as need to set back to planned to make homer happy - cmooney@cumin1001 - T322937"
20:43 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "remove info for new ssw as need to set back to planned to make homer happy - cmooney@cumin1001 - T322937"
20:41 eevans@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P46152 and previous config saved to /var/cache/conftool/dbconfig/20230406-204041-ladsgroup.json
20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T333332)', diff saved to https://phabricator.wikimedia.org/P46151 and previous config saved to /var/cache/conftool/dbconfig/20230406-202535-ladsgroup.json
20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T333332)', diff saved to https://phabricator.wikimedia.org/P46150 and previous config saved to /var/cache/conftool/dbconfig/20230406-202319-ladsgroup.json
20:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2180.codfw.wmnet with reason: Maintenance
20:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2180.codfw.wmnet with reason: Maintenance
20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T333332)', diff saved to https://phabricator.wikimedia.org/P46149 and previous config saved to /var/cache/conftool/dbconfig/20230406-202256-ladsgroup.json
20:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1014.eqiad.wmnet
20:15 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1013.eqiad.wmnet
20:09 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe1014.eqiad.wmnet
20:09 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe1013.eqiad.wmnet
20:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P46148 and previous config saved to /var/cache/conftool/dbconfig/20230406-200750-ladsgroup.json
19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P46147 and previous config saved to /var/cache/conftool/dbconfig/20230406-195243-ladsgroup.json
19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T333332)', diff saved to https://phabricator.wikimedia.org/P46146 and previous config saved to /var/cache/conftool/dbconfig/20230406-193737-ladsgroup.json
19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T333332)', diff saved to https://phabricator.wikimedia.org/P46145 and previous config saved to /var/cache/conftool/dbconfig/20230406-193510-ladsgroup.json
19:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
19:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T333332)', diff saved to https://phabricator.wikimedia.org/P46144 and previous config saved to /var/cache/conftool/dbconfig/20230406-193447-ladsgroup.json
19:26 mforns@deploy2002: Finished deploy [airflow-dags/analytics@b454afd]: (no justification provided) (duration: 00m 11s)
19:26 mforns@deploy2002: Started deploy [airflow-dags/analytics@b454afd]: (no justification provided)
19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P46143 and previous config saved to /var/cache/conftool/dbconfig/20230406-191941-ladsgroup.json
19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P46142 and previous config saved to /var/cache/conftool/dbconfig/20230406-190435-ladsgroup.json
18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T333332)', diff saved to https://phabricator.wikimedia.org/P46141 and previous config saved to /var/cache/conftool/dbconfig/20230406-184929-ladsgroup.json
18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T333332)', diff saved to https://phabricator.wikimedia.org/P46140 and previous config saved to /var/cache/conftool/dbconfig/20230406-184701-ladsgroup.json
18:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
18:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
18:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T333332)', diff saved to https://phabricator.wikimedia.org/P46139 and previous config saved to /var/cache/conftool/dbconfig/20230406-184638-ladsgroup.json
18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P46138 and previous config saved to /var/cache/conftool/dbconfig/20230406-183132-ladsgroup.json
18:18 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs3007.esams.wmnet with OS bullseye
18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P46137 and previous config saved to /var/cache/conftool/dbconfig/20230406-181625-ladsgroup.json
18:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3007.esams.wmnet with reason: host reimage
18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T333332)', diff saved to https://phabricator.wikimedia.org/P46136 and previous config saved to /var/cache/conftool/dbconfig/20230406-180119-ladsgroup.json
17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T333332)', diff saved to https://phabricator.wikimedia.org/P46135 and previous config saved to /var/cache/conftool/dbconfig/20230406-175854-ladsgroup.json
17:58 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3007.esams.wmnet with reason: host reimage
17:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
17:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2158.codfw.wmnet with reason: Maintenance
17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2158.codfw.wmnet with reason: Maintenance
17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T333332)', diff saved to https://phabricator.wikimedia.org/P46134 and previous config saved to /var/cache/conftool/dbconfig/20230406-175813-ladsgroup.json
17:49 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
17:49 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P46133 and previous config saved to /var/cache/conftool/dbconfig/20230406-174306-ladsgroup.json
17:36 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3007.esams.wmnet with OS bullseye
17:34 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
17:34 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
17:32 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-f1-eqiad.mgmt with reason: test on ssw1-e1-eqiad will take ospf on lsw1-f1-eqiad down.
17:32 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-f1-eqiad.mgmt with reason: test on ssw1-e1-eqiad will take ospf on lsw1-f1-eqiad down.
17:32 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e1-eqiad.mgmt with reason: test on ssw1-e1-eqiad will take ospf on lsw1-e1-eqiad down.
17:31 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e1-eqiad.mgmt with reason: test on ssw1-e1-eqiad will take ospf on lsw1-e1-eqiad down.
17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P46132 and previous config saved to /var/cache/conftool/dbconfig/20230406-172800-ladsgroup.json
17:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts lvs3007.esams.wmnet
17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T333332)', diff saved to https://phabricator.wikimedia.org/P46131 and previous config saved to /var/cache/conftool/dbconfig/20230406-171254-ladsgroup.json
17:12 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts lvs3007.esams.wmnet
17:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T333332)', diff saved to https://phabricator.wikimedia.org/P46130 and previous config saved to /var/cache/conftool/dbconfig/20230406-171028-ladsgroup.json
17:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2151.codfw.wmnet with reason: Maintenance
17:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2151.codfw.wmnet with reason: Maintenance
17:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
17:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T333332)', diff saved to https://phabricator.wikimedia.org/P46129 and previous config saved to /var/cache/conftool/dbconfig/20230406-170928-ladsgroup.json
17:05 aqu@deploy2002: Finished deploy [airflow-dags/analytics@318480e]: Fix for dump_month_of_daily_pageviews dag - Analytics [airflow-dags@318480e] (duration: 00m 14s)
17:05 aqu@deploy2002: Started deploy [airflow-dags/analytics@318480e]: Fix for dump_month_of_daily_pageviews dag - Analytics [airflow-dags@318480e]
16:58 jelto@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gitlab2003.wikimedia.org with OS bullseye
16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P46128 and previous config saved to /var/cache/conftool/dbconfig/20230406-165422-ladsgroup.json
16:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs6003.drmrs.wmnet
16:41 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs6003.drmrs.wmnet
16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P46127 and previous config saved to /var/cache/conftool/dbconfig/20230406-163916-ladsgroup.json
16:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6003.drmrs.wmnet with OS bullseye
16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T333332)', diff saved to https://phabricator.wikimedia.org/P46126 and previous config saved to /var/cache/conftool/dbconfig/20230406-162409-ladsgroup.json
16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T333332)', diff saved to https://phabricator.wikimedia.org/P46125 and previous config saved to /var/cache/conftool/dbconfig/20230406-162144-ladsgroup.json
16:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2124.codfw.wmnet with reason: Maintenance
16:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2124.codfw.wmnet with reason: Maintenance
16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T333332)', diff saved to https://phabricator.wikimedia.org/P46124 and previous config saved to /var/cache/conftool/dbconfig/20230406-162120-ladsgroup.json
16:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6003.drmrs.wmnet with reason: host reimage
16:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6003.drmrs.wmnet with reason: host reimage
16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P46123 and previous config saved to /var/cache/conftool/dbconfig/20230406-160614-ladsgroup.json
16:05 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
16:05 topranks: Enable BGP EVPN sessions between eqiad row e/f Leaf and Spine devices
15:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs6003.drmrs.wmnet with OS bullseye
15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P46122 and previous config saved to /var/cache/conftool/dbconfig/20230406-155108-ladsgroup.json
15:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6003.drmrs.wmnet with OS bullseye
15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T333332)', diff saved to https://phabricator.wikimedia.org/P46121 and previous config saved to /var/cache/conftool/dbconfig/20230406-153602-ladsgroup.json
15:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T333332)', diff saved to https://phabricator.wikimedia.org/P46120 and previous config saved to /var/cache/conftool/dbconfig/20230406-153335-ladsgroup.json
15:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2117.codfw.wmnet with reason: Maintenance
15:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2117.codfw.wmnet with reason: Maintenance
15:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T333332)', diff saved to https://phabricator.wikimedia.org/P46119 and previous config saved to /var/cache/conftool/dbconfig/20230406-153312-ladsgroup.json
15:28 ladsgroup@deploy2002: Finished scap: Backport for gerrit:906600 Disable writes on group2 for DT backend (duration: 08m 11s)
15:21 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:906600 Disable writes on group2 for DT backend synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
15:20 fab@deploy2002: Finished deploy [airflow-dags/research@2192f15]: (no justification provided) (duration: 00m 11s)
15:20 fab@deploy2002: Started deploy [airflow-dags/research@2192f15]: (no justification provided)
15:20 ladsgroup@deploy2002: Started scap: Backport for gerrit:906600 Disable writes on group2 for DT backend
15:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6003.drmrs.wmnet with reason: host reimage
15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P46118 and previous config saved to /var/cache/conftool/dbconfig/20230406-151806-ladsgroup.json
15:18 jgiannelos@deploy2002: Finished deploy [restbase/deploy@8fb20e9]: (no justification provided) (duration: 21m 01s)
15:16 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6003.drmrs.wmnet with reason: host reimage
15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P46117 and previous config saved to /var/cache/conftool/dbconfig/20230406-150300-ladsgroup.json
14:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs6003.drmrs.wmnet with OS bullseye
14:57 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs6003.drmrs.wmnet with OS bullseye
14:57 jgiannelos@deploy2002: Started deploy [restbase/deploy@8fb20e9]: (no justification provided)
14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T333332)', diff saved to https://phabricator.wikimedia.org/P46116 and previous config saved to /var/cache/conftool/dbconfig/20230406-144753-ladsgroup.json
14:46 ladsgroup@deploy2002: Finished scap: Backport for gerrit:906593 Disable DT backend on enwiki (duration: 07m 14s)
14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T333332)', diff saved to https://phabricator.wikimedia.org/P46115 and previous config saved to /var/cache/conftool/dbconfig/20230406-144437-ladsgroup.json
14:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
14:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
14:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
14:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T333332)', diff saved to https://phabricator.wikimedia.org/P46114 and previous config saved to /var/cache/conftool/dbconfig/20230406-144332-ladsgroup.json
14:42 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Sync data for new ssw1 spine switches in eqiad. - cmooney@cumin1001 - T322937"
14:40 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:906593 Disable DT backend on enwiki synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
14:40 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Sync data for new ssw1 spine switches in eqiad. - cmooney@cumin1001 - T322937"
14:39 ladsgroup@deploy2002: Started scap: Backport for gerrit:906593 Disable DT backend on enwiki
14:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T320967)
14:37 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T320967)
14:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6003.drmrs.wmnet with reason: host reimage
14:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6003.drmrs.wmnet with reason: host reimage
14:33 jelto@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gitlab2003.wikimedia.org with OS bullseye
14:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T320967)
14:30 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T320967)
14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P46113 and previous config saved to /var/cache/conftool/dbconfig/20230406-142826-ladsgroup.json
14:21 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:21 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:21 elukey: upgrade istioctl on deploy[12]002 and istio-cni on ml-serve[12]00[1-8] manually - T334068
14:14 elukey: upload new istio-cni and istioctl 1.15.7 debian package versions to bullseye-wikimedia - T334068
14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P46112 and previous config saved to /var/cache/conftool/dbconfig/20230406-141319-ladsgroup.json
14:12 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs6003.drmrs.wmnet with OS bullseye
14:10 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:905555 Add session schema config for mobile apps (T331481) (duration: 07m 54s)
14:08 fab@deploy2002: Finished deploy [airflow-dags/research@2192f15]: (no justification provided) (duration: 00m 11s)
14:08 fab@deploy2002: Started deploy [airflow-dags/research@2192f15]: (no justification provided)
14:03 lucaswerkmeister-wmde@deploy2002: mazevedo and lucaswerkmeister-wmde: Backport for gerrit:905555 Add session schema config for mobile apps (T331481) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
14:02 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:905555 Add session schema config for mobile apps (T331481)
14:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts lvs6003.drmrs.wmnet
13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T333332)', diff saved to https://phabricator.wikimedia.org/P46111 and previous config saved to /var/cache/conftool/dbconfig/20230406-135813-ladsgroup.json
13:56 urandom: rebooting sessionstore1001 — T327954
13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T333332)', diff saved to https://phabricator.wikimedia.org/P46110 and previous config saved to /var/cache/conftool/dbconfig/20230406-135604-ladsgroup.json
13:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1187.eqiad.wmnet with reason: Maintenance
13:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1187.eqiad.wmnet with reason: Maintenance
13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T333332)', diff saved to https://phabricator.wikimedia.org/P46109 and previous config saved to /var/cache/conftool/dbconfig/20230406-135541-ladsgroup.json
13:51 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts lvs6003.drmrs.wmnet
13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P46108 and previous config saved to /var/cache/conftool/dbconfig/20230406-134035-ladsgroup.json
13:40 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
13:34 jelto@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gitlab2003.wikimedia.org with OS bullseye
13:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P46106 and previous config saved to /var/cache/conftool/dbconfig/20230406-132528-ladsgroup.json
13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T333332)', diff saved to https://phabricator.wikimedia.org/P46104 and previous config saved to /var/cache/conftool/dbconfig/20230406-131022-ladsgroup.json
13:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T333332)', diff saved to https://phabricator.wikimedia.org/P46103 and previous config saved to /var/cache/conftool/dbconfig/20230406-130812-ladsgroup.json
13:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
13:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T333332)', diff saved to https://phabricator.wikimedia.org/P46102 and previous config saved to /var/cache/conftool/dbconfig/20230406-130749-ladsgroup.json
12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P46101 and previous config saved to /var/cache/conftool/dbconfig/20230406-125242-ladsgroup.json
12:50 godog: import grafana 9.4 T317887
12:41 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
12:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P46100 and previous config saved to /var/cache/conftool/dbconfig/20230406-123735-ladsgroup.json
12:26 dcausse: restarting blazegraph on wdqs1012 (BlazegraphFreeAllocatorsDecreasingRapidly)
12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T333332)', diff saved to https://phabricator.wikimedia.org/P46099 and previous config saved to /var/cache/conftool/dbconfig/20230406-122229-ladsgroup.json
12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T333332)', diff saved to https://phabricator.wikimedia.org/P46098 and previous config saved to /var/cache/conftool/dbconfig/20230406-122018-ladsgroup.json
12:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
12:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T333332)', diff saved to https://phabricator.wikimedia.org/P46097 and previous config saved to /var/cache/conftool/dbconfig/20230406-121955-ladsgroup.json
12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P46096 and previous config saved to /var/cache/conftool/dbconfig/20230406-120448-ladsgroup.json
11:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P46095 and previous config saved to /var/cache/conftool/dbconfig/20230406-114942-ladsgroup.json
11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T333332)', diff saved to https://phabricator.wikimedia.org/P46094 and previous config saved to /var/cache/conftool/dbconfig/20230406-113436-ladsgroup.json
11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T333332)', diff saved to https://phabricator.wikimedia.org/P46093 and previous config saved to /var/cache/conftool/dbconfig/20230406-113226-ladsgroup.json
11:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
11:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T333332)', diff saved to https://phabricator.wikimedia.org/P46092 and previous config saved to /var/cache/conftool/dbconfig/20230406-113203-ladsgroup.json
11:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P46091 and previous config saved to /var/cache/conftool/dbconfig/20230406-111657-ladsgroup.json
11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P46090 and previous config saved to /var/cache/conftool/dbconfig/20230406-110151-ladsgroup.json
10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T333332)', diff saved to https://phabricator.wikimedia.org/P46089 and previous config saved to /var/cache/conftool/dbconfig/20230406-104644-ladsgroup.json
10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T333332)', diff saved to https://phabricator.wikimedia.org/P46088 and previous config saved to /var/cache/conftool/dbconfig/20230406-104435-ladsgroup.json
10:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
10:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
10:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
10:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T333332)', diff saved to https://phabricator.wikimedia.org/P46087 and previous config saved to /var/cache/conftool/dbconfig/20230406-104319-ladsgroup.json
10:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
10:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
10:40 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirtlocal1003.mgmt.eqiad.wmnet with reboot policy FORCED
10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P46086 and previous config saved to /var/cache/conftool/dbconfig/20230406-102813-ladsgroup.json
10:28 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
10:27 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
10:27 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
10:26 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
10:13 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1003.mgmt.eqiad.wmnet with reboot policy FORCED
10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P46085 and previous config saved to /var/cache/conftool/dbconfig/20230406-101306-ladsgroup.json
09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T333332)', diff saved to https://phabricator.wikimedia.org/P46084 and previous config saved to /var/cache/conftool/dbconfig/20230406-095800-ladsgroup.json
09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T333332)', diff saved to https://phabricator.wikimedia.org/P46083 and previous config saved to /var/cache/conftool/dbconfig/20230406-095640-ladsgroup.json
09:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
09:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
09:43 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
09:42 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
09:39 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
09:38 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
09:38 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
09:30 elukey: kafka main codfw cluster migrated to PKI TLS certs for brokers - T319372
09:22 jelto@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gitlab2003.wikimedia.org with OS bullseye
09:19 cgoubert@deploy2002: Finished scap: Backport for gerrit:904463 jobrunners: Raise memory_limit to match parsoid (T333528) (duration: 07m 11s)
09:13 cgoubert@deploy2002: cgoubert: Backport for gerrit:904463 jobrunners: Raise memory_limit to match parsoid (T333528) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
09:12 cgoubert@deploy2002: Started scap: Backport for gerrit:904463 jobrunners: Raise memory_limit to match parsoid (T333528)
08:40 elukey: powercycle ml-serve2004 - host frozen, racadm getsel shows multi-bit errors in various DIMM slots
08:28 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
08:09 hashar@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.3 refs T330209
08:08 volans: restarting update-ubuntu-mirror.service on mirror1001 to check if it was a transient error
07:56 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
07:31 apergos: UTC morning backport and config training window done
07:28 moritzm: installing ghostscript security updates
07:19 kartik@deploy2002: Finished scap: Backport for gerrit:906137 Enable Section Translation on Kashmiri Wikipedia (T326541) (duration: 09m 31s)
07:16 zabe: zabe@mwmaint2002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Abuse filter maintainer" "Abuse filter maintainers" "Zabe" --reason "per request phab:T334147"
07:11 kartik@deploy2002: kartik: Backport for gerrit:906137 Enable Section Translation on Kashmiri Wikipedia (T326541) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
07:09 kartik@deploy2002: Started scap: Backport for gerrit:906137 Enable Section Translation on Kashmiri Wikipedia (T326541)
02:07 fab@deploy2002: Finished deploy [airflow-dags/research@2192f15]: (no justification provided) (duration: 00m 21s)
02:06 fab@deploy2002: Started deploy [airflow-dags/research@2192f15]: (no justification provided)
00:50 urandom: rebooting sessionstore1001 — T327954
00:19 urandom: rebooting Cassandra on sessionstore1001 — T327954

2023-04-05

23:58 legoktm@deploy2002: Finished scap: Backport for gerrit:804805 Remove misleading "disable" of Special:Mostlinkedcategories (T310456) (duration: 07m 55s)
23:55 urandom: rebooting Cassandra on sessionstore1001 — T327954
23:52 legoktm@deploy2002: legoktm: Backport for gerrit:804805 Remove misleading "disable" of Special:Mostlinkedcategories (T310456) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
23:50 legoktm@deploy2002: Started scap: Backport for gerrit:804805 Remove misleading "disable" of Special:Mostlinkedcategories (T310456)
23:44 legoktm@deploy2002: Finished scap: Backport for [[gerrit:896837|Add <link rel="me"> to verify Mastodon account on mediawiki.org]] (duration: 07m 47s)
23:38 legoktm@deploy2002: legoktm: Backport for [[gerrit:896837|Add <link rel="me"> to verify Mastodon account on mediawiki.org]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
23:36 legoktm@deploy2002: Started scap: Backport for [[gerrit:896837|Add <link rel="me"> to verify Mastodon account on mediawiki.org]]
22:36 topranks: enabling lsw1-e1-eqiad port et-0/0/51 to ssw1-e1-eqiad et-0/0/80 T322937
22:33 urandom: rebooting Cassandra on sessionstore1001 — T327954
22:21 urandom: restarting Cassandra on sessionstore1001 to apply (intentionally) unreachable native transport — T327954
22:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs5005.eqsin.wmnet with OS bullseye
21:45 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5005.eqsin.wmnet with reason: host reimage
21:41 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5005.eqsin.wmnet with reason: host reimage
21:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:31 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
21:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
21:28 cjming: end of UTC late backport window
21:23 cjming@deploy2002: Finished scap: Backport for [[gerrit:905960|[mgwiki] Replace the wordmark on Vector 2022 (T334022)]] (duration: 07m 58s)
21:21 cmooney@cumin1001: START - Cookbook sre.dns.netbox
21:16 cjming@deploy2002: superpes and cjming: Backport for [[gerrit:905960|[mgwiki] Replace the wordmark on Vector 2022 (T334022)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
21:16 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs5005.eqsin.wmnet with OS bullseye
21:15 cjming@deploy2002: Started scap: Backport for [[gerrit:905960|[mgwiki] Replace the wordmark on Vector 2022 (T334022)]]
21:10 cjming@deploy2002: Finished scap: Backport for gerrit:905769 Add static mobile United_States page to facilitate synthetic testing of T331681 (T331681) (duration: 10m 06s)
21:10 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:10 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
21:09 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
21:07 cmooney@cumin1001: START - Cookbook sre.dns.netbox
21:02 cjming@deploy2002: cjming and nray: Backport for gerrit:905769 Add static mobile United_States page to facilitate synthetic testing of T331681 (T331681) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
21:01 cjming: UTC late backport & config window continuing
21:00 cjming@deploy2002: Started scap: Backport for gerrit:905769 Add static mobile United_States page to facilitate synthetic testing of T331681 (T331681)
20:58 cjming@deploy2002: Finished scap: Backport for gerrit:896936 Undeploy SimilarEditors from Beta (T331718) (duration: 35m 41s)
20:57 brett: Disable Puppet/PyBal on lvs5005 in preparation for reimaging - T321309
20:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs5004.eqsin.wmnet with OS bullseye
20:44 cjming@deploy2002: tsepothoabala and cjming: Backport for gerrit:896936 Undeploy SimilarEditors from Beta (T331718) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
20:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
20:22 cjming@deploy2002: Started scap: Backport for gerrit:896936 Undeploy SimilarEditors from Beta (T331718)
20:21 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
20:17 mforns@deploy2002: Finished deploy [airflow-dags/analytics@2192f15]: (no justification provided) (duration: 00m 12s)
20:17 mforns@deploy2002: Started deploy [airflow-dags/analytics@2192f15]: (no justification provided)
20:03 mforns@deploy2002: Finished deploy [analytics/refinery@eb4c2b2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@eb4c2b2] (duration: 01m 34s)
20:01 mforns@deploy2002: Started deploy [analytics/refinery@eb4c2b2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@eb4c2b2]
20:01 mforns@deploy2002: Finished deploy [analytics/refinery@eb4c2b2] (thin): Regular analytics weekly train THIN [analytics/refinery@eb4c2b2] (duration: 00m 08s)
20:01 mforns@deploy2002: Started deploy [analytics/refinery@eb4c2b2] (thin): Regular analytics weekly train THIN [analytics/refinery@eb4c2b2]
20:01 mforns@deploy2002: Finished deploy [analytics/refinery@eb4c2b2]: Regular analytics weekly train [analytics/refinery@eb4c2b2] (duration: 06m 26s)
19:54 mforns@deploy2002: Started deploy [analytics/refinery@eb4c2b2]: Regular analytics weekly train [analytics/refinery@eb4c2b2]
19:52 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs5004.eqsin.wmnet with OS bullseye
19:30 brett: Disable Puppet/PyBal on lvs5004 in preparation for reimaging - T321309
19:27 mforns@deploy2002: Finished deploy [analytics/refinery@944a995] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@944a995] (duration: 01m 29s)
19:25 mforns@deploy2002: Started deploy [analytics/refinery@944a995] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@944a995]
19:25 mforns@deploy2002: Finished deploy [analytics/refinery@944a995] (thin): Regular analytics weekly train THIN [analytics/refinery@944a995] (duration: 00m 08s)
19:25 mforns@deploy2002: Started deploy [analytics/refinery@944a995] (thin): Regular analytics weekly train THIN [analytics/refinery@944a995]
19:25 mforns@deploy2002: Finished deploy [analytics/refinery@944a995]: Regular analytics weekly train [analytics/refinery@944a995] (duration: 06m 31s)
19:19 mforns@deploy2002: Started deploy [analytics/refinery@944a995]: Regular analytics weekly train [analytics/refinery@944a995]
19:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4009.ulsfo.wmnet with OS bullseye
18:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4009.ulsfo.wmnet with reason: host reimage
18:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4009.ulsfo.wmnet with reason: host reimage
18:37 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4009.ulsfo.wmnet with OS bullseye
18:37 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs4009.ulsfo.wmnet with OS bullseye
17:50 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4009.ulsfo.wmnet with OS bullseye
17:32 brett: Disable Puppet/PyBal on lvs4009 in preparation for reimaging - T321309
17:28 cjming: deploying labs-only change
17:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bullseye
17:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
17:03 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
16:56 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lists1003.wikimedia.org with reason: Moar CPUs!
16:56 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lists1003.wikimedia.org with reason: Moar CPUs!
16:54 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
16:54 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=thumbor100[1256].eqiad.wmnet
16:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in codfw: Depool from primary DC following network maintenance
16:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bullseye
16:47 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs4008.ulsfo.wmnet with OS bullseye
16:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) restbase-async.discovery.wmnet on all recursors
16:47 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache restbase-async.discovery.wmnet on all recursors
16:47 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool restbase-async in codfw: Depool from primary DC following network maintenance
16:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
16:37 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
16:36 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: service=thumbor,name=thumbor100[1256].eqiad.wmnet
16:30 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
16:30 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
16:20 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bullseye
16:18 hnowlan@puppetmaster1001: conftool action : set/weight=8; selector: service=thumbor,name=thumbor100[1256].eqiad.wmnet
16:04 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
16:04 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
16:02 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS bullseye
15:55 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
15:50 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
15:47 brett: Disable Puppet/PyBal on lvs4008 in preparation for reimaging - T321309
15:44 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage
15:42 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
15:42 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
15:41 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage
15:39 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
15:31 moritzm: restarting FPM on mediawiki canaries to pick up pcre security update
15:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=8; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
15:27 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host kafka-test1010.eqiad.wmnet with OS bullseye
15:25 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
15:21 moritzm: installing pcre2 security updates on buster
15:21 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=7; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
15:16 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
15:15 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:905979 Revert "VisualEditorFeatureUse sampling rate to 1 everywhere" (duration: 07m 42s)
15:14 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
15:11 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
15:10 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS bullseye
15:09 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and phuedx: Backport for gerrit:905979 Revert "VisualEditorFeatureUse sampling rate to 1 everywhere" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
15:09 moritzm: installing nodejs security updates on buster
15:09 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
15:08 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
15:07 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:905979 Revert "VisualEditorFeatureUse sampling rate to 1 everywhere"
15:05 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
15:04 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
15:03 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
15:03 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
14:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage
14:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage
14:48 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:48 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
14:48 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:36 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host kafka-test1009.eqiad.wmnet with OS bullseye
14:33 elukey: restart kafka on kafka-main1005 to pick up the new TLS certificate (PKI based) - T319372
14:31 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS bullseye
14:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-main1005.eqiad.wmnet with reason: restart kafka, switch to PKI
14:30 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-main1005.eqiad.wmnet with reason: restart kafka, switch to PKI
14:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage
14:14 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
14:14 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
14:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
14:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage
14:00 elukey: powercycle an-worker1132
13:58 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host kafka-test1008.eqiad.wmnet with OS bullseye
13:57 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1010.eqiad.wmnet
13:54 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
13:54 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
13:53 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
13:53 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
13:52 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1010.eqiad.wmnet
13:52 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1009.eqiad.wmnet
13:52 elukey: restart kafka on kafka-main1004 to pick up the new TLS certificate (PKI based) - T319372
13:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-main1004.eqiad.wmnet with reason: restart kafka, switch to PKI
13:48 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-main1004.eqiad.wmnet with reason: restart kafka, switch to PKI
13:48 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1009.eqiad.wmnet
13:46 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:905601 VisualEditorFeatureUse sampling rate to 1 everywhere (T333168) (duration: 14m 47s)
13:33 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and phuedx: Backport for gerrit:905601 VisualEditorFeatureUse sampling rate to 1 everywhere (T333168) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
13:31 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:905601 VisualEditorFeatureUse sampling rate to 1 everywhere (T333168)
13:29 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:905261 mediawiki.edit_attempt: Ignore events from PHP MPC (T309985) (duration: 10m 52s)
13:28 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
13:28 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:27 jclark@cumin1001: START - Cookbook sre.dns.netbox
13:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
13:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46079 and previous config saved to /var/cache/conftool/dbconfig/20230405-132318-root.json
13:21 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
13:19 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and phuedx: Backport for gerrit:905261 mediawiki.edit_attempt: Ignore events from PHP MPC (T309985) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
13:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
13:18 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:905261 mediawiki.edit_attempt: Ignore events from PHP MPC (T309985)
13:17 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:905950 GrowthExperiments: enable add link backend in wiki rounds (8,9th) (T308133 T308134) (duration: 08m 00s)
13:16 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
13:15 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:14 jclark@cumin1001: START - Cookbook sre.dns.netbox
13:10 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and sgimeno: Backport for gerrit:905950 GrowthExperiments: enable add link backend in wiki rounds (8,9th) (T308133 T308134) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
13:09 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:905950 GrowthExperiments: enable add link backend in wiki rounds (8,9th) (T308133 T308134)
13:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46078 and previous config saved to /var/cache/conftool/dbconfig/20230405-130813-root.json
13:03 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1008.eqiad.wmnet
13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46077 and previous config saved to /var/cache/conftool/dbconfig/20230405-130315-root.json
13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46076 and previous config saved to /var/cache/conftool/dbconfig/20230405-130121-root.json
12:58 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1008.eqiad.wmnet
12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46075 and previous config saved to /var/cache/conftool/dbconfig/20230405-125308-root.json
12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46074 and previous config saved to /var/cache/conftool/dbconfig/20230405-124810-root.json
12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46073 and previous config saved to /var/cache/conftool/dbconfig/20230405-124616-root.json
12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46072 and previous config saved to /var/cache/conftool/dbconfig/20230405-123804-root.json
12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46071 and previous config saved to /var/cache/conftool/dbconfig/20230405-123305-root.json
12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46070 and previous config saved to /var/cache/conftool/dbconfig/20230405-123111-root.json
12:27 moritzm: installing xapian-core security updates
12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46069 and previous config saved to /var/cache/conftool/dbconfig/20230405-122259-root.json
12:20 samtar@deploy2002: Finished scap: Backport for gerrit:901553 Remove WikiEditor's Realtime Preview config vars (T327515) (duration: 07m 41s)
12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46068 and previous config saved to /var/cache/conftool/dbconfig/20230405-121801-root.json
12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46067 and previous config saved to /var/cache/conftool/dbconfig/20230405-121606-root.json
12:13 samtar@deploy2002: samwilson and samtar: Backport for gerrit:901553 Remove WikiEditor's Realtime Preview config vars (T327515) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
12:12 samtar@deploy2002: Started scap: Backport for gerrit:901553 Remove WikiEditor's Realtime Preview config vars (T327515)
12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46066 and previous config saved to /var/cache/conftool/dbconfig/20230405-120754-root.json
12:04 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46065 and previous config saved to /var/cache/conftool/dbconfig/20230405-120256-root.json
12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46064 and previous config saved to /var/cache/conftool/dbconfig/20230405-120101-root.json
11:54 moritzm: installing apache2 security updates on buster
11:53 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P46063 and previous config saved to /var/cache/conftool/dbconfig/20230405-115249-root.json
11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46062 and previous config saved to /var/cache/conftool/dbconfig/20230405-114751-root.json
11:47 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2004.codfw.wmnet with OS bullseye
11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46061 and previous config saved to /var/cache/conftool/dbconfig/20230405-114557-root.json
11:45 TheresNoTime: `[samtar@mwmaint2002 ~]$ echo 'https://en.wikipedia.org/robots.txt' | mwscript purgeList.php` T334038
11:40 samtar@deploy2002: Finished scap: Backport for gerrit:905764 Remove possibly significant whitespace from robots.txt (T334038) (duration: 07m 14s)
11:38 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P46060 and previous config saved to /var/cache/conftool/dbconfig/20230405-113745-root.json
11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw1414.eqiad.wmnet
11:34 samtar@deploy2002: legoktm and samtar: Backport for gerrit:905764 Remove possibly significant whitespace from robots.txt (T334038) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
11:34 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
11:33 samtar@deploy2002: Started scap: Backport for gerrit:905764 Remove possibly significant whitespace from robots.txt (T334038)
11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P46059 and previous config saved to /var/cache/conftool/dbconfig/20230405-113246-root.json
11:31 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46058 and previous config saved to /var/cache/conftool/dbconfig/20230405-113052-root.json
11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46057 and previous config saved to /var/cache/conftool/dbconfig/20230405-113031-root.json
11:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw1414.eqiad.wmnet
11:28 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:28 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:24 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
11:23 ladsgroup@deploy2002: Finished scap: Backport for gerrit:905609 Revert "Revert "Revert "Revert "mwscript: Switch to use run.php"""" (T326800) (duration: 08m 45s)
11:23 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
11:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P46056 and previous config saved to /var/cache/conftool/dbconfig/20230405-112240-root.json
11:22 slyngshede@cumin1001: START - Cookbook sre.ganeti.reimage for host testvm2004.codfw.wmnet with OS bullseye
11:17 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P46055 and previous config saved to /var/cache/conftool/dbconfig/20230405-111742-root.json
11:17 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
11:17 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
11:16 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:905609 Revert "Revert "Revert "Revert "mwscript: Switch to use run.php"""" (T326800) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46054 and previous config saved to /var/cache/conftool/dbconfig/20230405-111527-root.json
11:15 ladsgroup@deploy2002: Started scap: Backport for gerrit:905609 Revert "Revert "Revert "Revert "mwscript: Switch to use run.php"""" (T326800)
11:14 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
11:12 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
11:12 moritzm: installing systemd security updates on buster
11:12 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye
11:10 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:07 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1100 with 1% weight', diff saved to https://phabricator.wikimedia.org/P46053 and previous config saved to /var/cache/conftool/dbconfig/20230405-110717-root.json
11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1130 to s5 primary T331302', diff saved to https://phabricator.wikimedia.org/P46052 and previous config saved to /var/cache/conftool/dbconfig/20230405-110530-root.json
11:05 marostegui: Starting s5 eqiad failover from db1100 to db1130 - T331302
11:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P46051 and previous config saved to /var/cache/conftool/dbconfig/20230405-110237-root.json
11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46050 and previous config saved to /var/cache/conftool/dbconfig/20230405-110022-root.json
11:00 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
11:00 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
10:59 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
10:59 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
10:56 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
10:56 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
10:50 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
10:50 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
10:50 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
10:49 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
10:48 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
10:48 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
10:47 slyngshede@cumin1001: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bullseye
10:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46049 and previous config saved to /var/cache/conftool/dbconfig/20230405-104732-root.json
10:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46048 and previous config saved to /var/cache/conftool/dbconfig/20230405-104517-root.json
10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1130 with weight 0 T331302', diff saved to https://phabricator.wikimedia.org/P46047 and previous config saved to /var/cache/conftool/dbconfig/20230405-104422-marostegui.json
10:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s5 T331302
10:43 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
10:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s5 T331302
10:43 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
10:41 hnowlan@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
10:40 hnowlan@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
10:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46046 and previous config saved to /var/cache/conftool/dbconfig/20230405-103012-root.json
10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 T326669', diff saved to https://phabricator.wikimedia.org/P46044 and previous config saved to /var/cache/conftool/dbconfig/20230405-102215-marostegui.json
10:20 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye
10:17 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
10:17 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
10:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46043 and previous config saved to /var/cache/conftool/dbconfig/20230405-101507-root.json
10:14 elukey: restart purged on cp5032, cp1082, cp6004, cp1090 - errors after restart of kafka main eqiad brokers
10:12 elukey: restart purged on cp6015 to verify whether the failed connections to brokers are only temporary or not
10:11 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS bullseye
10:09 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
10:06 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
10:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46041 and previous config saved to /var/cache/conftool/dbconfig/20230405-100003-root.json
09:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122', diff saved to https://phabricator.wikimedia.org/P46040 and previous config saved to /var/cache/conftool/dbconfig/20230405-095954-marostegui.json
09:57 slyngshede@cumin1001: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bullseye
09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1162 to s2 primary T334067', diff saved to https://phabricator.wikimedia.org/P46039 and previous config saved to /var/cache/conftool/dbconfig/20230405-095600-root.json
09:55 marostegui: Starting s2 eqiad failover from db1122 to db1162 - T334067
09:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage
09:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage
09:42 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye
09:36 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host kafka-test1007.eqiad.wmnet with OS bullseye
09:35 elukey: restart kafka on kafka-main1003 to pick up the new TLS certificate (PKI based) - T319372
09:34 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1007.eqiad.wmnet
09:34 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-main1003.eqiad.wmnet with reason: restart kafka, switch to PKI
09:34 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-main1003.eqiad.wmnet with reason: restart kafka, switch to PKI
09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1162 with weight 0 T334067', diff saved to https://phabricator.wikimedia.org/P46038 and previous config saved to /var/cache/conftool/dbconfig/20230405-093155-marostegui.json
09:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T334067
09:30 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1007.eqiad.wmnet
09:29 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
09:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s2 T334067
09:26 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
09:15 slyngshede@cumin1001: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bullseye
08:58 hashar@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.3 refs T330209 (duration: 05m 46s)
08:52 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.3 refs T330209
08:39 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1003.eqiad.wmnet,service=thanos-web
08:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2067.codfw.wmnet with OS bullseye
08:27 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kafka-test1006.eqiad.wmnet with OS bullseye
08:25 hashar@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Remove akwiki from CX config (take 2, it was not fully deployed due to a scap lock issue on the spare server) (duration: 06m 06s)
08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 T326669', diff saved to https://phabricator.wikimedia.org/P46036 and previous config saved to /var/cache/conftool/dbconfig/20230405-082240-root.json
08:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1006.eqiad.wmnet with reason: host reimage
08:07 elukey: restart kafka on kafka-main1002 to pick up the new TLS certificate (PKI based) - T319372
08:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1006.eqiad.wmnet with reason: host reimage
08:02 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-main1002.eqiad.wmnet with reason: restart kafka, switch to PKI
08:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-main1002.eqiad.wmnet with reason: restart kafka, switch to PKI
07:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1104.eqiad.wmnet
07:59 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:59 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1104.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
07:56 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host kafka-test1006.eqiad.wmnet with OS bullseye
07:54 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kafka-test1006.eqiad.wmnet with OS bullseye
07:54 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1104.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
07:52 marostegui@cumin1001: START - Cookbook sre.dns.netbox
07:49 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
07:47 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1104.eqiad.wmnet
07:46 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1104 from dbctl T329481', diff saved to https://phabricator.wikimedia.org/P46035 and previous config saved to /var/cache/conftool/dbconfig/20230405-073102-marostegui.json
07:30 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2067.codfw.wmnet with OS bullseye
07:24 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host kafka-test1006.eqiad.wmnet with OS bullseye
07:20 marostegui: Stop mariadb on db1101 T331381
07:11 kartik@deploy2002: Finished scap: Backport for gerrit:904952 Remove akwiki from CX config (duration: 07m 22s)
07:11 marostegui: Failover m5-master T333377
07:05 kartik@deploy2002: kartik: Backport for gerrit:904952 Remove akwiki from CX config synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
07:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33
07:04 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 33
07:04 kartik@deploy2002: Started scap: Backport for gerrit:904952 Remove akwiki from CX config
07:03 marostegui: Failover m3-master T333377
04:17 TimStarling: restarted swift-proxy on ms-fe* T328872

2023-04-04

23:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
23:34 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
23:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
23:25 tstarling@deploy2002: Synchronized src/Profiler.php: re-enable excimer T331882 (duration: 06m 25s)
23:21 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
23:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
23:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
22:58 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:58 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns cloudvirtlocal - jclark@cumin1001"
22:57 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns cloudvirtlocal - jclark@cumin1001"
22:55 jclark@cumin1001: START - Cookbook sre.dns.netbox
22:33 cstone: civicrm upgraded from 4231191f to 223f655a
22:26 mutante: deploying change to block scap execution on inactive deployment server via gerrit:904502 T330756
22:19 ejegg: payments-wiki upgraded from 49a2e104 to 75b068a1
21:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts miscweb2002.codfw.wmnet
21:39 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:39 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: miscweb2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
21:37 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: miscweb2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
21:26 dzahn@cumin1001: START - Cookbook sre.dns.netbox
21:22 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts miscweb2002.codfw.wmnet
20:56 sbassett: Deployed mitigation for T333140
20:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on miscweb2002.codfw.wmnet with reason: decom
20:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on miscweb2002.codfw.wmnet with reason: decom
20:44 TheresNoTime: closing UTC late backport window
20:38 samtar@deploy2002: Finished scap: Backport for gerrit:903781 Clean up history page visual diffs beta feature config (T333448) (duration: 06m 42s)
20:33 samtar@deploy2002: matmarex and samtar: Backport for gerrit:903781 Clean up history page visual diffs beta feature config (T333448) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
20:31 samtar@deploy2002: Started scap: Backport for gerrit:903781 Clean up history page visual diffs beta feature config (T333448)
20:27 samtar@deploy2002: Finished scap: Backport for gerrit:905685 EditCheck: catch errors from TransactionSquasher (T324733) (duration: 08m 23s)
20:23 inflatador: bking@cumin1001 unban elastic nodes post switch maintenance T331882
20:20 samtar@deploy2002: matmarex and samtar: Backport for gerrit:905685 EditCheck: catch errors from TransactionSquasher (T324733) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
20:18 samtar@deploy2002: Started scap: Backport for gerrit:905685 EditCheck: catch errors from TransactionSquasher (T324733)
20:11 samtar@deploy2002: Finished scap: Backport for gerrit:905727 Revert "Revert "Enable hidden tag for "Edit Check" project on Wikipedias"" (T324733) (duration: 07m 30s)
20:10 mutante: deploying ATS config change on cp2* for query.wikidata.org
20:06 ryankemper: T331896 Running puppet on wcqs fleet to pickup new miscweb gui_url: `ryankemper@cumin1001:~$ sudo -E cumin -b 2 'wcqs*' 'run-puppet-agent'`
20:05 samtar@deploy2002: matmarex and samtar: Backport for gerrit:905727 Revert "Revert "Enable hidden tag for "Edit Check" project on Wikipedias"" (T324733) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
20:03 samtar@deploy2002: Started scap: Backport for gerrit:905727 Revert "Revert "Enable hidden tag for "Edit Check" project on Wikipedias"" (T324733)
20:03 mutante: running puppet on cp5*, cp4*...
20:00 ryankemper: T331896 Running puppet on wdqs fleet to pickup new miscweb gui_url: `ryankemper@cumin1001:~$ sudo -E cumin -b 6 'wdqs*' 'run-puppet-agent'`
19:58 hashar@deploy2002: Finished deploy [gerrit/gerrit@dbaaa7a]: wm-zuul-status: change pending jobs SUCCESS > INFO | T214068 (duration: 00m 07s)
19:58 hashar@deploy2002: Started deploy [gerrit/gerrit@dbaaa7a]: wm-zuul-status: change pending jobs SUCCESS > INFO | T214068
19:55 mutante: https://query.wikidata.org and WCQS GUIs are switching to new backend VMs on bullseye in codfw T330090 T331896
19:46 hashar@deploy2002: Finished scap: Backport for gerrit:905726 Replace usages of Hooks::register() (T334005) (duration: 06m 55s)
19:40 hashar@deploy2002: hashar: Backport for gerrit:905726 Replace usages of Hooks::register() (T334005) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
19:39 hashar@deploy2002: Started scap: Backport for gerrit:905726 Replace usages of Hooks::register() (T334005)
19:10 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.3 refs T330209
18:05 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
18:05 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
17:22 ladsgroup@deploy2002: Finished scap: Backport for gerrit:905617 Revert "mergeMessageFileList.php: move code out of file scope." (T333966) (duration: 38m 18s)
17:04 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:905617 Revert "mergeMessageFileList.php: move code out of file scope." (T333966) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
16:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
16:55 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
16:55 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
16:55 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
16:44 ladsgroup@deploy2002: Started scap: Backport for gerrit:905617 Revert "mergeMessageFileList.php: move code out of file scope." (T333966)
16:37 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
16:37 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
16:17 ladsgroup@deploy2002: Finished scap: Backport for gerrit:905623 Revert "external store: Depool es4 (cluster26) from writes for maintenance" (T333961) (duration: 07m 31s)
16:11 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:905623 Revert "external store: Depool es4 (cluster26) from writes for maintenance" (T333961) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
16:10 ladsgroup@deploy2002: Started scap: Backport for gerrit:905623 Revert "external store: Depool es4 (cluster26) from writes for maintenance" (T333961)
16:07 jynus@cumin1001: dbctl commit (dc=all): 'Repool es1021 for reads', diff saved to https://phabricator.wikimedia.org/P46031 and previous config saved to /var/cache/conftool/dbconfig/20230404-160702-jynus.json
16:01 jynus@cumin1001: dbctl commit (dc=all): 'Repool es1021 for reads (only 10%)', diff saved to https://phabricator.wikimedia.org/P46030 and previous config saved to /var/cache/conftool/dbconfig/20230404-160146-jynus.json
15:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on es1022.eqiad.wmnet with reason: T333961
15:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on es1022.eqiad.wmnet with reason: T333961
15:58 jynus: restart es1021, several connections in a "stuck" state T333961
15:50 dancy@deploy2002: Installation of scap version "4.48.0" completed for 592 hosts
15:49 dancy@deploy2002: Installing scap version "4.48.0" for 592 hosts
15:31 jynus: restart es1021, several connections in a "stuck" state T333961
15:25 jynus@cumin1001: dbctl commit (dc=all): 'Depool es1021 reads', diff saved to https://phabricator.wikimedia.org/P46029 and previous config saved to /var/cache/conftool/dbconfig/20230404-152501-jynus.json
15:23 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
15:19 jiji@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in eqiad: eqiad row C switches upgrade - T331882
15:18 ladsgroup@deploy2002: Finished scap: Backport for gerrit:905648 external store: Depool es4 (cluster26) from writes for maintenance (T333961) (duration: 11m 30s)
15:16 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1150.eqiad.wmnet with reason: pending s3 reprovisioning
15:16 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1150.eqiad.wmnet with reason: pending s3 reprovisioning
15:12 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
15:08 ladsgroup@deploy2002: ladsgroup and jynus: Backport for gerrit:905648 external store: Depool es4 (cluster26) from writes for maintenance (T333961) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
15:06 ladsgroup@deploy2002: Started scap: Backport for gerrit:905648 external store: Depool es4 (cluster26) from writes for maintenance (T333961)
14:54 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/CentralAuth/maintenance/migrateAccount.php --wiki=metawiki -u 'Translation Notification Bot (T255246)' --auto # T255246
14:43 jiji@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: eqiad row C switches upgrade - T331882
14:39 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
14:39 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
14:38 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
14:38 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
14:38 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
14:37 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
14:36 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
14:36 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
14:28 vgutierrez: switch cp6008 (upload) and cp6016 (text) to use a single UDS socket between haproxy and varnish - T333965
14:21 jynus: stop es1022 for debugging T333961
14:15 Lucas_WMDE: UTC afternoon backport+config window done
14:15 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:905598 Use HookContainer to register hooks inside hooks (T333926) (duration: 10m 50s)
14:10 stevemunene@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1018.eqiad.wmnet
14:09 stevemunene@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1013.eqiad.wmnet
14:09 stevemunene@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1012.eqiad.wmnet
14:09 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33
14:09 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 33
14:09 stevemunene@puppetmaster1001: conftool action : set/pooled=yes; selector: name=datahubsearch1003.eqiad.wmnet
14:05 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for gerrit:905598 Use HookContainer to register hooks inside hooks (T333926) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
14:04 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:905598 Use HookContainer to register hooks inside hooks (T333926)
13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool es1022 T333961', diff saved to https://phabricator.wikimedia.org/P46027 and previous config saved to /var/cache/conftool/dbconfig/20230404-134415-ladsgroup.json
13:42 Emperor: repool thanos-fe1003 re T331882
13:41 Emperor: repool ms-fe1011 re T331882
13:38 steve_munene: leave hdfs safemode T331882
13:38 inflatador: reboot elastic2038 to clear soft lock
13:34 sukhe: run authdns-update for CR 905612, reverting depool of eqiad
13:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thumbor1006.eqiad.wmnet
13:25 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
13:25 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
13:11 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1009.eqiad.wmnet
13:11 XioNoX: asw2-c-eqiad> request system reboot all-members - T331882
13:10 urbanecm@deploy2002: Finished scap: Backport for gerrit:905544 ckbwiktionary: Add logo (T331831) (duration: 07m 00s)
13:05 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all active/active services in eqiad: eqiad row C switches upgrade - T331882
13:03 urbanecm@deploy2002: Started scap: Backport for gerrit:905544 ckbwiktionary: Add logo (T331831)
13:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 227 hosts with reason: eqiad row C upgrade
12:57 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 227 hosts with reason: eqiad row C upgrade
12:57 steve_munene: putting hdfs into safe mode as part of T331882
12:52 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on 228 hosts with reason: eqiad row C upgrade
12:52 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 228 hosts with reason: eqiad row C upgrade
12:44 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in eqiad: eqiad row C switches upgrade - T331882
12:43 Emperor: depool thanos-fe1003 re T331882
12:38 Emperor: depool ms-fe1011 re T331882
12:32 sukhe: [finished] run authdns-update for CR: 905603 depool eqiad
12:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 38 hosts with reason: Row c switch maint T331882
12:31 sukhe: run authdns-update for CR: 905603 depool eqiad
12:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on 38 hosts with reason: Row c switch maint T331882
12:28 stevemunene@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1018.eqiad.wmnet
12:28 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
12:28 stevemunene@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1013.eqiad.wmnet
12:28 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
12:28 stevemunene@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1012.eqiad.wmnet
12:28 volans@cumin1001: END (FAIL) - Cookbook sre.netbox.update-extras (exit_code=1) rolling update on A:netbox-canary
12:27 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
12:26 stevemunene@puppetmaster1001: conftool action : set/pooled=no; selector: name=datahubsearch1003.eqiad.wmnet
12:24 TimStarling: I noticed that mw2382 was still talking to mwlog1002. It still had old php-fpm7.4 processes despite the scap. So I manually restarted php-fpm on it.
12:17 tstarling@deploy2002: Synchronized src/Profiler.php: T331882 disable profiling for switch maintenance (duration: 05m 58s)
11:35 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
11:24 moritzm: installing joblib security updates
10:17 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
09:51 hashar@deploy2002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.41.0-wmf.3" | T330209
09:42 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.3 refs T330209
09:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T333332)', diff saved to https://phabricator.wikimedia.org/P46025 and previous config saved to /var/cache/conftool/dbconfig/20230404-091639-ladsgroup.json
09:19 hashar@deploy2002: Pruned MediaWiki: 1.41.0-wmf.1 (duration: 02m 16s)
09:13 hashar@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.3 refs T330209 (duration: 40m 20s)
09:09 moritzm: installing libmicrohttpd security updates
09:07 moritzm: installing libdatetime-timezone-perl updates
09:04 akosiaris@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
09:04 akosiaris@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
09:04 akosiaris@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
09:04 akosiaris@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
09:03 akosiaris@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
09:03 akosiaris@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
09:03 akosiaris@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
09:03 akosiaris@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
09:03 akosiaris@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
09:02 akosiaris@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
09:02 akosiaris@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
09:02 akosiaris@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
09:02 akosiaris@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
09:02 akosiaris@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
09:01 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
09:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P46024 and previous config saved to /var/cache/conftool/dbconfig/20230404-090133-ladsgroup.json
09:01 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/admin 'sync'.
08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P46023 and previous config saved to /var/cache/conftool/dbconfig/20230404-085553-ladsgroup.json
08:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
08:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
08:46 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
08:46 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P46022 and previous config saved to /var/cache/conftool/dbconfig/20230404-084627-ladsgroup.json
08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P46021 and previous config saved to /var/cache/conftool/dbconfig/20230404-084048-ladsgroup.json
08:35 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
08:35 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
08:32 hashar@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.3 refs T330209
08:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T333332)', diff saved to https://phabricator.wikimedia.org/P46020 and previous config saved to /var/cache/conftool/dbconfig/20230404-083120-ladsgroup.json
08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T333332)', diff saved to https://phabricator.wikimedia.org/P46019 and previous config saved to /var/cache/conftool/dbconfig/20230404-082911-ladsgroup.json
08:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1201.eqiad.wmnet with reason: Maintenance
08:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1201.eqiad.wmnet with reason: Maintenance
08:28 hashar: Deleting mediawiki/core branch `wmf/branch_cut_pretest` pointing at `430d25d1a1858edfa4a6199dfe1f0eb3743a219a` # T330209
08:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams
08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P46017 and previous config saved to /var/cache/conftool/dbconfig/20230404-082543-ladsgroup.json
08:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams
08:22 godog: upgrade grafana* to grafana 9.3.11 - T333915
08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P46016 and previous config saved to /var/cache/conftool/dbconfig/20230404-081039-ladsgroup.json
08:01 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams
08:01 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams
08:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs
08:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs
07:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
07:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1162 T333918', diff saved to https://phabricator.wikimedia.org/P46015 and previous config saved to /var/cache/conftool/dbconfig/20230404-074848-ladsgroup.json
07:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1122 to s2 primary T333918', diff saved to https://phabricator.wikimedia.org/P46014 and previous config saved to /var/cache/conftool/dbconfig/20230404-074656-ladsgroup.json
07:46 Amir1: Starting s2 eqiad failover from db1162 to db1122 - T333918
07:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2001.codfw.wmnet
07:36 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs
07:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pybal-test2001.codfw.wmnet
07:35 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs
07:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2003.codfw.wmnet
07:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pybal-test2003.codfw.wmnet
07:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2002.codfw.wmnet
07:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pybal-test2002.codfw.wmnet
07:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1122 with weight 0 T333918', diff saved to https://phabricator.wikimedia.org/P46013 and previous config saved to /var/cache/conftool/dbconfig/20230404-072817-ladsgroup.json
07:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T333918
07:27 hashar@deploy2002: Finished deploy [gerrit/gerrit@453b038]: Gerrit plugin update and switching from git-fat to git-lfs (duration: 00m 08s)
07:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s2 T333918
07:27 hashar@deploy2002: Started deploy [gerrit/gerrit@453b038]: Gerrit plugin update and switching from git-fat to git-lfs
07:23 hashar@deploy2002: Finished deploy [gerrit/gerrit@453b038]: Gerrit plugin update and switching from git-fat to git-lfs (duration: 00m 05s)
07:23 hashar@deploy2002: Started deploy [gerrit/gerrit@453b038]: Gerrit plugin update and switching from git-fat to git-lfs
06:09 XioNoX: stage new Junos on asw2-c-eqiad - T331882

2023-04-03

21:53 ryankemper: T331896 `sudo -E cumin -b 4 'wdqs*' 'sudo run-puppet-agent'`
21:42 maryum: undeployed mitigation for T333140
21:25 inflatador: bking@cumin ban cloudelastic1003 from all cloudelastic clusters T331882
21:22 maryum: deployed mitigation for T333140
21:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: T331882 eqiad row C maint
21:16 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: T331882 eqiad row C maint
21:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wcqs1003.eqiad.wmnet,wdqs[1010,1013-1014].eqiad.wmnet with reason: T331882 eqiad row C maint
21:12 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wcqs1003.eqiad.wmnet,wdqs[1010,1013-1014].eqiad.wmnet with reason: T331882 eqiad row C maint
20:37 kindrobot: close UTC late backport window
20:36 kindrobot@deploy2002: Finished scap: Backport for gerrit:905287 make "advanced mode" default on ptwikinews mobile (T290812) (duration: 10m 47s)
20:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs5006.eqsin.wmnet with OS bullseye
20:26 kindrobot@deploy2002: jdlrobson and kindrobot: Backport for gerrit:905287 make "advanced mode" default on ptwikinews mobile (T290812) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
20:25 kindrobot@deploy2002: Started scap: Backport for gerrit:905287 make "advanced mode" default on ptwikinews mobile (T290812)
20:19 kindrobot@deploy2002: Finished scap: Backport for gerrit:905264 [refactor] split out Minerva configuration from main config, gerrit:904284 Disable Vector js/css sharing on pl.wikipedia (T332809) (duration: 12m 05s)
20:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5006.eqsin.wmnet with reason: host reimage
20:08 kindrobot@deploy2002: kindrobot and jdlrobson: Backport for gerrit:905264 [refactor] split out Minerva configuration from main config, gerrit:904284 Disable Vector js/css sharing on pl.wikipedia (T332809) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
20:07 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5006.eqsin.wmnet with reason: host reimage
20:07 kindrobot@deploy2002: Started scap: Backport for gerrit:905264 [refactor] split out Minerva configuration from main config, gerrit:904284 Disable Vector js/css sharing on pl.wikipedia (T332809)
20:03 kindrobot: start UTC late backport window
19:41 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs5006.eqsin.wmnet with OS bullseye
19:38 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host lvs5006.eqsin.wmnet with OS bullseye
19:36 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
19:35 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
19:09 cwhite: manually upgrade vopsbot on alert2001 to version 0.3.3
18:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5006.eqsin.wmnet with reason: host reimage
18:55 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5006.eqsin.wmnet with reason: host reimage
18:30 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs5006.eqsin.wmnet with OS bullseye
18:14 brett: Disable Puppet/PyBal on lvs5006 in preparation for reimaging - T321309
16:02 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin
15:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin
15:52 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: gerrit:905247 Bumping portals to master (T128546) (duration: 05m 33s)
15:51 cstone: payments-wiki upgraded from 60d0aed5 to 49a2e104
15:46 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: gerrit:905247 Bumping portals to master (T128546) (duration: 06m 14s)
15:37 volans: restarted sirenbot (vopsbot) on alert2001 (msg="could not find the topic for this channel stored. Is the bot in the channel?")
15:36 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@04b4841]: (no justification provided) (duration: 00m 12s)
15:36 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@04b4841]: (no justification provided)
15:30 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin
15:30 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin
15:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw
15:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
15:12 sukhe: rolling restart of bird.service on doh* and not doh2002
15:07 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
15:07 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
15:05 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@fabc2cf]: Deploy refine webrequest job on analytics_test to fix matching Oozie job (duration: 00m 11s)
15:04 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@fabc2cf]: Deploy refine webrequest job on analytics_test to fix matching Oozie job
14:30 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-test-worker1001.eqiad.wmnet with reason: Investigate service failures from bullseye upgrade
14:30 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-test-worker1001.eqiad.wmnet with reason: Investigate service failures from bullseye upgrade
13:50 claime: Testing deploy server dsh group inclusion - T329857
13:47 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1075.eqiad.wmnet']
13:47 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1074.eqiad.wmnet']
13:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
13:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
13:44 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
13:44 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
13:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
13:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
13:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
13:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
13:35 taavi@deploy2002: Finished scap: Backport for gerrit:905193 GrowthExperiments: add link backend amends (T308133) (duration: 07m 15s)
13:34 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1073']
13:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072']
13:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11062
13:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 11062
13:29 taavi@deploy2002: sgimeno and taavi: Backport for gerrit:905193 GrowthExperiments: add link backend amends (T308133) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
13:28 taavi@deploy2002: Started scap: Backport for gerrit:905193 GrowthExperiments: add link backend amends (T308133)
13:25 taavi@deploy2002: Finished scap: Backport for gerrit:904631 Enable visual enhancements on pages using __NEWSECTIONLINK__ on huwiki (T333570) (duration: 16m 06s)
13:18 taavi@deploy2002: matmarex and taavi: Backport for gerrit:904631 Enable visual enhancements on pages using __NEWSECTIONLINK__ on huwiki (T333570) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
13:09 taavi@deploy2002: Started scap: Backport for gerrit:904631 Enable visual enhancements on pages using __NEWSECTIONLINK__ on huwiki (T333570)
12:55 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
12:54 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
12:11 jbond@cumin2002: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=netbox
12:02 jbond: testing netbox failover cookbook
12:02 jbond@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=netbox
11:31 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:31 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:31 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
11:31 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
11:29 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
11:29 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
11:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
11:04 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
11:01 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
10:58 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
10:35 vgutierrez: Extend the ESI test to text@eqsin, revert https://gerrit.wikimedia.org/r/c/operations/puppet/+/905173/ if this gives any issue - T308799
10:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
10:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
10:23 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
10:23 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
10:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
10:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
09:19 elukey: move kafka-jumbo1006's kafka broker cert to PKI - T296064
09:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-jumbo1006.eqiad.wmnet with reason: restart kafka, switch to PKI
09:19 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-jumbo1006.eqiad.wmnet with reason: restart kafka, switch to PKI
08:54 elukey: move kafka-jumbo1009's kafka broker cert to PKI - T296064
08:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-jumbo1009.eqiad.wmnet with reason: restart kafka, switch to PKI
08:53 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-jumbo1009.eqiad.wmnet with reason: restart kafka, switch to PKI
08:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
08:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo
08:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
08:31 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
08:31 vgutierrez: rolling upgrade to HAProxy 2.6.12 in A:cp-ulsfo
08:29 elukey: move kafka-main1001's kafka broker to PKI - T319372
08:26 vgutierrez: fetch HAProxy 2.6.12 on thirdparty/haproxy26 for bullseye (apt.wm.o)
08:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-main1001.eqiad.wmnet with reason: restart kafka, switch to PKI
08:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-main1001.eqiad.wmnet with reason: restart kafka, switch to PKI
08:03 elukey: move kafka-jumbo1008's kafka broker cert to PKI - T296064
08:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-jumbo1008.eqiad.wmnet with reason: restart kafka, switch to PKI
08:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-jumbo1008.eqiad.wmnet with reason: restart kafka, switch to PKI
07:43 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-jumbo1007.eqiad.wmnet with reason: restart kafka, switch to PKI
07:43 elukey: move kafka-jumbo1007's kafka broker cert to PKI - T296064
06:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-jumbo1005.eqiad.wmnet with reason: restart kafka, switch to PKI
06:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-jumbo1005.eqiad.wmnet with reason: restart kafka, switch to PKI
06:52 elukey: move kafka-jumbo1005's kafka broker cert to PKI - T296064

2023-04-01

00:13 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host prometheus5002.eqsin.wmnet with OS bullseye

Other archives

2000s

Archive 1: 2004 Jun - 2004 Sep
Archive 2: 2004 Oct - 2004 Nov
Archive 3: 2004 Dec - 2005 Mar
Archive 4: 2005 Apr - 2005 Jul
Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
Archive 6: 2005 Nov - 2006 Feb
Archive 7: 2006 Mar - 2006 Jun
Archive 8: 2006 Jul - 2006 Sep
Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
Archive 10: 2007 Feb - 2007 Jun
Archive 11: 2007 Jul - 2007 Dec
Archive 12: 2008 Jan - 2008 Jul
Archive 12a: 2008 Aug
Archive 12b: 2008 Sep
Archive 13: 2008 Oct - 2009 Jun
Archive 14: 2009 Jun - 2009 Dec

This article is issued from Wikimedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.