Server Admin Log/Archive 62

2023-01-31

23:51 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3055.esams.wmnet with OS bullseye
23:45 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3055.esams.wmnet with OS bullseye
23:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
23:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3055.esams.wmnet with OS bullseye
23:34 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
23:13 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3054.esams.wmnet with OS bullseye
22:54 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2040.codfw.wmnet
22:53 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2040.codfw.wmnet with OS bullseye
22:35 zabe@deploy1002: Finished scap: Backport for gerrit:885416 Stop writing to cuc_user and cuc_user_text in group0 wikis (T233004), gerrit:885431 Stop writing to cuc_comment in testwiki (T233004) (duration: 07m 34s)
22:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2040.codfw.wmnet with reason: host reimage
22:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2040.codfw.wmnet with reason: host reimage
22:30 zabe@deploy1002: zabe: Backport for gerrit:885416 Stop writing to cuc_user and cuc_user_text in group0 wikis (T233004), gerrit:885431 Stop writing to cuc_comment in testwiki (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
22:28 zabe@deploy1002: Started scap: Backport for gerrit:885416 Stop writing to cuc_user and cuc_user_text in group0 wikis (T233004), gerrit:885431 Stop writing to cuc_comment in testwiki (T233004)
22:26 zabe@deploy1002: Finished scap: Backport for gerrit:884142 Restrict flow-edit-title to autoconfirmed on mediawikiwiki (T328097) (duration: 08m 43s)
22:19 zabe@deploy1002: zabe and bawolff: Backport for gerrit:884142 Restrict flow-edit-title to autoconfirmed on mediawikiwiki (T328097) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
22:17 zabe@deploy1002: Started scap: Backport for gerrit:884142 Restrict flow-edit-title to autoconfirmed on mediawikiwiki (T328097)
22:13 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2040.codfw.wmnet with OS bullseye
22:13 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet
22:07 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2038.codfw.wmnet with OS bullseye
22:07 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5020.eqsin.wmnet,service=ats-be
22:07 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5020.eqsin.wmnet,service=cdn
22:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5020.eqsin.wmnet with OS bullseye
21:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
21:44 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cassandra-dev2002.codfw.wmnet
21:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
21:39 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host cassandra-dev2002.codfw.wmnet
21:36 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2002.codfw.wmnet: Trying to induce errors - eevans@cumin1001
21:35 kindrobot: close UTC late backport window. Did not deploy bawolff 884142 as I ran out of time. zabe may reopen the window in around 30 minutes to finish it out
21:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
21:33 kindrobot@deploy1002: Finished scap: Backport for gerrit:885395 Enable ClientPreferences for group0 (T327979) (duration: 10m 17s)
21:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
21:29 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2002.codfw.wmnet: Trying to induce errors - eevans@cumin1001
21:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2038.codfw.wmnet with OS bullseye
21:25 kindrobot@deploy1002: kindrobot and nray: Backport for gerrit:885395 Enable ClientPreferences for group0 (T327979) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:24 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2039.codfw.wmnet
21:24 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet
21:23 kindrobot@deploy1002: Started scap: Backport for gerrit:885395 Enable ClientPreferences for group0 (T327979)
21:17 kindrobot@deploy1002: Finished scap: Backport for gerrit:885046 Enable Linter write namespace, tag and template for group0 and group1 (T299612) (duration: 13m 20s)
21:06 kindrobot@deploy1002: sbailey and kindrobot: Backport for gerrit:885046 Enable Linter write namespace, tag and template for group0 and group1 (T299612) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
21:04 kindrobot@deploy1002: Started scap: Backport for gerrit:885046 Enable Linter write namespace, tag and template for group0 and group1 (T299612)
21:04 jgleeson: smashpig updated from d1434aeb to 683df497
21:03 kindrobot: start UTC late backport window
20:58 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
20:57 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5020.eqsin.wmnet with OS bullseye
20:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2036.codfw.wmnet with OS bullseye
20:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2039.codfw.wmnet with OS bullseye
20:45 zabe: start running "foreachwikiindblist s5.dblist migrateRevisionCommentTemp.php --sleep 2" in screen # T275246
20:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
20:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
20:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
20:25 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
20:11 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2036.codfw.wmnet with OS bullseye
20:09 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5029.eqsin.wmnet,service=ats-be
20:09 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5029.eqsin.wmnet,service=cdn
20:06 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2039.codfw.wmnet with OS bullseye
20:05 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2037.codfw.wmnet
20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5029.eqsin.wmnet with OS bullseye
20:03 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2037.codfw.wmnet with OS bullseye
20:00 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
19:59 sukhe: sudo rm /etc/dhcp/automation/ttyS1-115200/cp5020.conf
19:58 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5020.eqsin.wmnet with OS bullseye
19:58 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
19:43 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
19:40 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
19:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5029.eqsin.wmnet with reason: host reimage
19:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5029.eqsin.wmnet with reason: host reimage
19:21 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2037.codfw.wmnet with OS bullseye
19:16 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
19:16 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5020.eqsin.wmnet with OS bullseye
19:12 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.21 refs T325584
18:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=ats-be
18:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=cdn
18:44 mutante: gitlab-prod-1001.devtools (cloud) - rebooted VM; ip addr del 172.16.7.146/32 dev eth0 - T318521
18:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
18:42 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5020.eqsin.wmnet with OS bullseye
18:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2034.codfw.wmnet with OS bullseye
18:26 mutante: gitlab-prod-1001.devtools (cloud) - ip addr del 172.16.7.146/21 dev eth0 - T318521
18:25 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
18:25 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
18:24 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1075']
18:24 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1075']
18:22 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1075.eqiad.wmnet']
18:22 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1075.eqiad.wmnet']
18:21 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp1075.eqiad.wmnet
18:21 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1075.eqiad.wmnet
18:20 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=cdn
18:19 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5020.eqsin.wmnet with OS bullseye
18:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2034.codfw.wmnet with reason: host reimage
18:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2034.codfw.wmnet with reason: host reimage
18:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
17:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cp5029.eqsin.wmnet with OS bullseye
17:53 sukhe: depool cp1075.eqiad.wmnet for iDRAC firmware testing: T321309
17:52 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=ats-be
17:52 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=cdn
17:50 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2034.codfw.wmnet with OS bullseye
17:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp5019.eqsin.wmnet
17:47 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for cp5019.eqsin.wmnet
17:39 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp1090.eqiad.wmnet
17:38 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1090.eqiad.wmnet
17:38 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp1076.eqiad.wmnet
17:37 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1076.eqiad.wmnet
17:35 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5018.eqsin.wmnet,service=ats-be
17:35 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5018.eqsin.wmnet,service=cdn
17:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5018.eqsin.wmnet with OS bullseye
17:33 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet
17:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5028.eqsin.wmnet,service=ats-be
17:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5028.eqsin.wmnet,service=cdn
17:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5028.eqsin.wmnet with OS bullseye
17:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp5029.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
17:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp5029.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
17:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet,service=ats-be
17:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet,service=cdn
17:29 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5029.eqsin.wmnet,service=ats-be
17:29 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5029.eqsin.wmnet,service=cdn
17:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2032.codfw.wmnet with OS bullseye
17:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp5019.eqsin.wmnet
17:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2032.codfw.wmnet with reason: host reimage
17:05 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2032.codfw.wmnet with reason: host reimage
17:03 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp5019.eqsin.wmnet
16:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5028.eqsin.wmnet with reason: host reimage
16:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
16:54 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5028.eqsin.wmnet with reason: host reimage
16:54 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
16:52 cwhite@deploy1002: Finished deploy [releng/phatality@e0bb573]: (no justification provided) (duration: 00m 11s)
16:52 cwhite@deploy1002: Started deploy [releng/phatality@e0bb573]: (no justification provided)
16:52 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
16:52 cwhite@deploy1002: Finished deploy [releng/phatality@e0bb573]: (no justification provided) (duration: 00m 10s)
16:52 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
16:52 cwhite@deploy1002: Started deploy [releng/phatality@e0bb573]: (no justification provided)
16:49 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2032.codfw.wmnet with OS bullseye
16:49 mutante: mw2271 - re-enabling disabled puppet
16:49 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2032.codfw.wmnet with OS bullseye
16:46 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
16:46 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
16:45 cwhite@cumin2002: conftool action : set/weight=10; selector: name=logstash2032.codfw.wmnet
16:44 cwhite@cumin2002: conftool action : set/weight=10; selector: name=logstash1032.eqiad.wmnet
16:43 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
16:41 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
16:40 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
16:38 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
16:37 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
16:37 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
16:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on cp5019.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
16:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 5:00:00 on cp5019.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
16:29 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Grants:Programs/Wikimedia Community Fund" "Grants:Programs/Wikimedia Community Fund/General Support Fund" "Zabe" --reason "per request T328456" --skip-subpages # T328456
16:29 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5019.eqsin.wmnet,service=ats-be
16:29 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5019.eqsin.wmnet,service=cdn
16:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1004.eqiad.wmnet with OS bullseye
16:20 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
16:19 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
16:18 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5018.eqsin.wmnet with OS bullseye
16:18 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5028.eqsin.wmnet with OS bullseye
16:18 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5028.eqsin.wmnet with OS bullseye
16:18 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5018.eqsin.wmnet with OS bullseye
16:14 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1003.eqiad.wmnet with OS bullseye
16:09 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
16:06 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
16:01 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2032.codfw.wmnet with OS bullseye
16:01 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
15:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5028.eqsin.wmnet with OS bullseye
15:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5018.eqsin.wmnet with OS bullseye
15:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=ats-be
15:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=cdn
15:54 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage1004.eqiad.wmnet with OS bullseye
15:49 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagemaster1001.eqiad.wmnet with OS bullseye
15:40 ladsgroup@deploy1002: Finished scap: Backport for gerrit:885058 Set 'groupLoadsBySection' for s11 (T326980) (duration: 09m 49s)
15:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
15:35 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
15:34 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1001.eqiad.wmnet with reason: host reimage
15:32 ladsgroup@deploy1002: ladsgroup and zabe: Backport for gerrit:885058 Set 'groupLoadsBySection' for s11 (T326980) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
15:31 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1001.eqiad.wmnet with reason: host reimage
15:30 ladsgroup@deploy1002: Started scap: Backport for gerrit:885058 Set 'groupLoadsBySection' for s11 (T326980)
15:24 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2035.codfw.wmnet with OS bullseye
15:23 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage1003.eqiad.wmnet with OS bullseye
15:20 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagemaster1001.eqiad.wmnet with OS bullseye
15:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2035.codfw.wmnet with reason: host reimage
15:01 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagetcd1005.eqiad.wmnet with OS bullseye
15:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2035.codfw.wmnet with reason: host reimage
14:56 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagetcd1004.eqiad.wmnet with OS bullseye
14:56 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagetcd1006.eqiad.wmnet with OS bullseye
14:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd1005.eqiad.wmnet with reason: host reimage
14:46 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:46 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:46 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd1004.eqiad.wmnet with reason: host reimage
14:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:43 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd1006.eqiad.wmnet with reason: host reimage
14:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2035.codfw.wmnet with OS bullseye
14:41 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd1005.eqiad.wmnet with reason: host reimage
14:41 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd1004.eqiad.wmnet with reason: host reimage
14:41 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd1006.eqiad.wmnet with reason: host reimage
14:34 urbanecm@deploy1002: Finished scap: Backport for gerrit:885041 Disable write old for CheckUserLog reason field for testwiki (T233004), gerrit:885051 Remove redundant definition of wgCheckUserEnableSpecialInvestigate, gerrit:885337 Bump parsoid parser cache writes to 25%. (T320534) (duration: 07m 23s)
14:33 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd1006.eqiad.wmnet with OS bullseye
14:33 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd1005.eqiad.wmnet with OS bullseye
14:32 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd1004.eqiad.wmnet with OS bullseye
14:28 urbanecm@deploy1002: dreamyjazz and urbanecm and daniel: Backport for gerrit:885041 Disable write old for CheckUserLog reason field for testwiki (T233004), gerrit:885051 Remove redundant definition of wgCheckUserEnableSpecialInvestigate, gerrit:885337 Bump parsoid parser cache writes to 25%. (T320534) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwde
14:26 urbanecm@deploy1002: Started scap: Backport for gerrit:885041 Disable write old for CheckUserLog reason field for testwiki (T233004), gerrit:885051 Remove redundant definition of wgCheckUserEnableSpecialInvestigate, gerrit:885337 Bump parsoid parser cache writes to 25%. (T320534)
14:20 urbanecm@deploy1002: Finished scap: Backport for gerrit:885041 Disable write old for CheckUserLog reason field for testwiki (T233004), gerrit:885051 Remove redundant definition of wgCheckUserEnableSpecialInvestigate, gerrit:885337 Bump parsoid parser cache writes to 25%. (T320534) (duration: 16m 33s)
14:08 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
14:07 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
14:06 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
14:05 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
14:05 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
14:05 urbanecm@deploy1002: urbanecm and dreamyjazz and daniel: Backport for gerrit:885041 Disable write old for CheckUserLog reason field for testwiki (T233004), gerrit:885051 Remove redundant definition of wgCheckUserEnableSpecialInvestigate, gerrit:885337 Bump parsoid parser cache writes to 25%. (T320534) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwde
14:05 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
14:04 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-eqiad with k8s 1.23
14:04 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-eqiad with k8s 1.23
14:03 urbanecm@deploy1002: Started scap: Backport for gerrit:885041 Disable write old for CheckUserLog reason field for testwiki (T233004), gerrit:885051 Remove redundant definition of wgCheckUserEnableSpecialInvestigate, gerrit:885337 Bump parsoid parser cache writes to 25%. (T320534)
14:01 urbanecm@deploy1002: Backport cancelled.
12:36 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad (duration: 00m 35s)
12:36 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad
11:51 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@42a07d3] (eqiad): Disable traffic mirroring from codfw to eqiad (duration: 00m 35s)
11:50 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@42a07d3] (eqiad): Disable traffic mirroring from codfw to eqiad
11:21 moritzm: installing bind9 security updates (client-side tools/libs only)
10:57 jayme@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=k8s-ingress-staging
10:57 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=k8s-ingress-staging
10:18 jayme: switching active kubernetes staging cluster from eqiad to codfw - T327664
09:20 marostegui: dbmaint Install MariaDB 10.6 on db2093 (db_inventory) T328408
09:05 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
09:00 zabe@deploy1002: Finished scap: Backport for gerrit:885274 Stop writing to cuc_user and cuc_user_text in testwiki (T233004) (duration: 08m 11s)
09:00 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
08:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
08:54 elukey: roll restart kafka on kafka-logging1001 to pick up new pki certs
08:53 zabe@deploy1002: zabe: Backport for gerrit:885274 Stop writing to cuc_user and cuc_user_text in testwiki (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
08:51 zabe@deploy1002: Started scap: Backport for gerrit:885274 Stop writing to cuc_user and cuc_user_text in testwiki (T233004)
08:49 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
08:45 elukey: restore previously removed password for keystore to kafka-logging clusters
08:39 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
08:36 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
07:56 moritzm: installing bash bugfix updates from Bullseye point release
07:22 marostegui: dbmaint Schema change on s3 eqiad T328373
07:22 marostegui: dbmaint Schema change on s1 eqiad T328373
07:10 marostegui: Failover m2 from db1164 to db1195 - T328253
07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2133,2160].codfw.wmnet,db[1117,1164,1195].eqiad.wmnet with reason: Primary switchover m2 T328253
07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2133,2160].codfw.wmnet,db[1117,1164,1195].eqiad.wmnet with reason: Primary switchover m2 T328253
07:03 marostegui: dbmaint Schema change on s5 eqiad T328373
06:59 marostegui: dbmaint Schema change on s7 eqiad T328373
06:57 marostegui: dbmaint Schema change on s4 eqiad T328373
06:52 marostegui: dbmaint Schema change on s8 eqiad T328373
05:02 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.19 (duration: 02m 15s)
05:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
05:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
04:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.21 refs T325584 (duration: 52m 56s)
04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.21 refs T325584
02:44 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3053.esams.wmnet,service=ats-be
02:43 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3053.esams.wmnet,service=cdn
02:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3053.esams.wmnet with OS bullseye
02:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3053.esams.wmnet with reason: host reimage
01:59 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3053.esams.wmnet with reason: host reimage
01:37 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3053.esams.wmnet with OS bullseye
01:33 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp3053.esams.wmnet']
01:31 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3053.esams.wmnet']
00:50 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp5027.eqsin.wmnet
00:42 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5027.eqsin.wmnet with OS bullseye
00:14 mutante: etherpad - maintenance downtime for about 5 minutes to test monitoring
00:09 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage
00:06 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage

2023-01-30

23:30 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS bullseye
23:29 brett@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp3053.esams.wmnet with OS bullseye
23:07 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp3053.esams.wmnet with OS bullseye
22:58 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
22:50 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3053.esams.wmnet with OS bullseye
22:38 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp3053.esams.wmnet with OS bullseye
22:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=ats-be
22:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=cdn
22:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2030.codfw.wmnet with OS bullseye
21:56 urbanecm@deploy1002: Finished scap: Backport for gerrit:884138 Try to determine what's adding to Parsoid init times (T328201), gerrit:885008 Update interwiki cache (duration: 12m 24s)
21:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2030.codfw.wmnet with reason: host reimage
21:46 urbanecm@deploy1002: arlolra and urbanecm: Backport for gerrit:884138 Try to determine what's adding to Parsoid init times (T328201), gerrit:885008 Update interwiki cache synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
21:44 urbanecm@deploy1002: Started scap: Backport for gerrit:884138 Try to determine what's adding to Parsoid init times (T328201), gerrit:885008 Update interwiki cache
21:43 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2030.codfw.wmnet with reason: host reimage
21:42 urbanecm@deploy1002: Finished scap: Backport for gerrit:884153 GrowthExperiments: Update campaign configuration (T321370) (duration: 08m 47s)
21:35 urbanecm@deploy1002: tgr and urbanecm: Backport for gerrit:884153 GrowthExperiments: Update campaign configuration (T321370) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:34 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2020.codfw.wmnet: Replace Cassandra keys & certs - eevans@cumin1001
21:34 urbanecm@deploy1002: Started scap: Backport for gerrit:884153 GrowthExperiments: Update campaign configuration (T321370)
21:33 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
21:31 urbanecm@deploy1002: Finished scap: Backport for gerrit:883301 Enable WelcomeSurvey at viwiki (T325376), gerrit:884930 Fix grid blowout with limited width turned off (T327423), gerrit:884929 Support new style of table of contents (T327942) (duration: 09m 52s)
21:26 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS bullseye
21:25 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2030.codfw.wmnet with OS bullseye
21:24 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2020.codfw.wmnet: Replace Cassandra keys & certs - eevans@cumin1001
21:23 urbanecm@deploy1002: tgr and urbanecm and jdlrobson and legoktm: Backport for gerrit:883301 Enable WelcomeSurvey at viwiki (T325376), gerrit:884930 Fix grid blowout with limited width turned off (T327423), gerrit:884929 Support new style of table of contents (T327942) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
21:21 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2019.codfw.wmnet: Replace Cassandra keys & certs - eevans@cumin1001
21:21 urbanecm@deploy1002: Started scap: Backport for gerrit:883301 Enable WelcomeSurvey at viwiki (T325376), gerrit:884930 Fix grid blowout with limited width turned off (T327423), gerrit:884929 Support new style of table of contents (T327942)
21:21 urbanecm@deploy1002: Finished scap: Backport for gerrit:884474 InitialiseSettings: add zhwiki to wgPageAssessmentsSubprojects (T326387) (duration: 19m 51s)
21:11 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2019.codfw.wmnet: Replace Cassandra keys & certs - eevans@cumin1001
21:03 urbanecm@deploy1002: urbanecm and musikanimal: Backport for gerrit:884474 InitialiseSettings: add zhwiki to wgPageAssessmentsSubprojects (T326387) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=ats-be
21:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=cdn
21:01 urbanecm@deploy1002: Started scap: Backport for gerrit:884474 InitialiseSettings: add zhwiki to wgPageAssessmentsSubprojects (T326387)
20:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
20:56 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
20:51 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
20:35 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
20:35 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bullseye
20:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2033.codfw.wmnet with OS bullseye
20:23 zabe@deploy1002: Finished scap: Backport for gerrit:885002 slwiki: Raise AF emergency disable treshold+count (T328366) (duration: 07m 32s)
20:17 zabe@deploy1002: zabe: Backport for gerrit:885002 slwiki: Raise AF emergency disable treshold+count (T328366) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
20:16 zabe@deploy1002: Started scap: Backport for gerrit:885002 slwiki: Raise AF emergency disable treshold+count (T328366)
20:15 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
20:14 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet
20:12 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS bullseye
20:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2033.codfw.wmnet with reason: host reimage
20:03 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2033.codfw.wmnet with reason: host reimage
19:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3052.esams.wmnet,service=ats-be
19:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3052.esams.wmnet,service=cdn
19:50 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
19:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3052.esams.wmnet with OS bullseye
19:47 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
19:44 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2033.codfw.wmnet with OS bullseye
19:36 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
19:35 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
19:26 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye
19:26 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4044.ulsfo.wmnet with OS bullseye
19:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3052.esams.wmnet with reason: host reimage
19:22 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3052.esams.wmnet with reason: host reimage
19:21 cstone: payments-wiki upgraded from 653c7cc8 to f20a2208
19:16 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye
19:15 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4051.ulsfo.wmnet
19:01 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS bullseye
18:46 sukhe@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp3052.esams.wmnet']
18:46 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3052.esams.wmnet']
18:46 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp3052.esams.wmnet']
18:45 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3052.esams.wmnet']
18:45 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3052.esams.wmnet with OS bullseye
18:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS bullseye
18:37 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp3052.esams.wmnet with OS bullseye
18:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3052.esams.wmnet']
18:37 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3052.esams.wmnet']
18:34 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4051.ulsfo.wmnet with OS bullseye
18:29 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
18:29 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
18:19 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS bullseye
18:19 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3052.esams.wmnet with OS bullseye
18:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS bullseye
18:08 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp3052.esams.wmnet
18:07 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4051.ulsfo.wmnet with reason: host reimage
18:04 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4051.ulsfo.wmnet with reason: host reimage
18:01 urbanecm@deploy1002: Finished scap: Backport for gerrit:884427 [Growth] Remove wgGERecentChangesUnstarredMenteesFilterEnabled (duration: 07m 59s)
17:53 urbanecm@deploy1002: Started scap: Backport for gerrit:884427 [Growth] Remove wgGERecentChangesUnstarredMenteesFilterEnabled
17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43517 and previous config saved to /var/cache/conftool/dbconfig/20230130-174957-ladsgroup.json
17:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp3052.esams.wmnet
17:43 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4051.ulsfo.wmnet with OS bullseye
17:43 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4051.ulsfo.wmnet with OS bullseye
17:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5026.eqsin.wmnet,service=ats-be
17:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5026.eqsin.wmnet,service=cdn
17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43516 and previous config saved to /var/cache/conftool/dbconfig/20230130-173450-ladsgroup.json
17:34 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:34 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet,service=ats-be
17:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet,service=cdn
17:31 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4051.ulsfo.wmnet with OS bullseye
17:31 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4043.ulsfo.wmnet
17:27 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4043.ulsfo.wmnet with OS bullseye
17:24 inflatador: bking@build2001 rebuilding docker images for 884351 complete
17:22 inflatador: bking@build2001 rebuilding docker images for 884351
17:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5026.eqsin.wmnet with OS bullseye
17:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43515 and previous config saved to /var/cache/conftool/dbconfig/20230130-171944-ladsgroup.json
17:12 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3050.esams.wmnet with OS bullseye
17:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43514 and previous config saved to /var/cache/conftool/dbconfig/20230130-170437-ladsgroup.json
16:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
16:56 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43513 and previous config saved to /var/cache/conftool/dbconfig/20230130-165359-ladsgroup.json
16:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43512 and previous config saved to /var/cache/conftool/dbconfig/20230130-165348-ladsgroup.json
16:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
16:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3050.esams.wmnet with reason: host reimage
16:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
16:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3050.esams.wmnet with reason: host reimage
16:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43511 and previous config saved to /var/cache/conftool/dbconfig/20230130-163842-ladsgroup.json
16:35 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
16:35 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4043.ulsfo.wmnet with OS bullseye
16:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1084.eqiad.wmnet
16:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
16:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1084.eqiad.wmnet
16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43510 and previous config saved to /var/cache/conftool/dbconfig/20230130-162336-ladsgroup.json
16:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet,service=ats-be
16:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet,service=cdn
16:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=ats-be
16:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=cdn
16:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3050.esams.wmnet with OS bullseye
16:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp3050.esams.wmnet
16:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2029.codfw.wmnet with OS bullseye
16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43509 and previous config saved to /var/cache/conftool/dbconfig/20230130-161324-root.json
16:11 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp3050.esams.wmnet
16:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp3050.esams.wmnet
16:10 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp3050.esams.wmnet
16:10 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3050.esams.wmnet,service=ats-be
16:10 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3050.esams.wmnet,service=cdn
16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43508 and previous config saved to /var/cache/conftool/dbconfig/20230130-160829-ladsgroup.json
16:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5026.eqsin.wmnet with OS bullseye
16:05 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5026.eqsin.wmnet with OS bullseye
16:03 sukhe: racreset cp3050.esams.wmnet: firmware cookbook iDRAC upgrade test
16:03 moritzm: upgrading idp-test to latest Java security update
15:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp3050.esams.wmnet
15:59 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp3050.esams.wmnet
15:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43507 and previous config saved to /var/cache/conftool/dbconfig/20230130-155819-root.json
15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43506 and previous config saved to /var/cache/conftool/dbconfig/20230130-155802-ladsgroup.json
15:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
15:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
15:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
15:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43505 and previous config saved to /var/cache/conftool/dbconfig/20230130-155747-ladsgroup.json
15:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5026.eqsin.wmnet with OS bullseye
15:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2029.codfw.wmnet with reason: host reimage
15:48 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2029.codfw.wmnet with reason: host reimage
15:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43504 and previous config saved to /var/cache/conftool/dbconfig/20230130-154314-root.json
15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43503 and previous config saved to /var/cache/conftool/dbconfig/20230130-154241-ladsgroup.json
15:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3051.esams.wmnet with OS bullseye
15:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2029.codfw.wmnet with OS bullseye
15:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43502 and previous config saved to /var/cache/conftool/dbconfig/20230130-152809-root.json
15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43501 and previous config saved to /var/cache/conftool/dbconfig/20230130-152734-ladsgroup.json
15:14 marostegui: Retrospective: Starting s4 codfw failover from db2110 to db2140 - T328022
15:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43500 and previous config saved to /var/cache/conftool/dbconfig/20230130-151304-root.json
15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43499 and previous config saved to /var/cache/conftool/dbconfig/20230130-151228-ladsgroup.json
15:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3051.esams.wmnet with reason: host reimage
15:04 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3051.esams.wmnet with reason: host reimage
15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43498 and previous config saved to /var/cache/conftool/dbconfig/20230130-150132-ladsgroup.json
15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
14:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43497 and previous config saved to /var/cache/conftool/dbconfig/20230130-145759-root.json
14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110 T328022', diff saved to https://phabricator.wikimedia.org/P43496 and previous config saved to /var/cache/conftool/dbconfig/20230130-145508-root.json
14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2140 to s4 primary T328022', diff saved to https://phabricator.wikimedia.org/P43495 and previous config saved to /var/cache/conftool/dbconfig/20230130-145421-root.json
14:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
14:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43494 and previous config saved to /var/cache/conftool/dbconfig/20230130-145229-ladsgroup.json
14:47 moritzm: updating puppetdb 7 hosts to 7.12.1 T321783
14:46 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for gerrit:884090 Enable Linter write namespace, tag and template from core, group0 (T299612) (duration: 11m 11s)
14:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3051.esams.wmnet with OS bullseye
14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43493 and previous config saved to /var/cache/conftool/dbconfig/20230130-144213-ladsgroup.json
14:38 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43492 and previous config saved to /var/cache/conftool/dbconfig/20230130-143723-ladsgroup.json
14:36 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and sbailey: Backport for gerrit:884090 Enable Linter write namespace, tag and template from core, group0 (T299612) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
14:35 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for gerrit:884090 Enable Linter write namespace, tag and template from core, group0 (T299612)
14:33 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for gerrit:884500 Revert "Remove references to mediawiki.Uri" (T328143), gerrit:884501 Revert "Rewrite mw.libs.ve.getTargetDataFromHref with URL API" (T328143) (duration: 12m 07s)
14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43491 and previous config saved to /var/cache/conftool/dbconfig/20230130-142708-ladsgroup.json
14:22 lucaswerkmeister-wmde@deploy1002: matmarex and lucaswerkmeister-wmde: Backport for gerrit:884500 Revert "Remove references to mediawiki.Uri" (T328143), gerrit:884501 Revert "Rewrite mw.libs.ve.getTargetDataFromHref with URL API" (T328143) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43490 and previous config saved to /var/cache/conftool/dbconfig/20230130-142216-ladsgroup.json
14:21 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for gerrit:884500 Revert "Remove references to mediawiki.Uri" (T328143), gerrit:884501 Revert "Rewrite mw.libs.ve.getTargetDataFromHref with URL API" (T328143)
14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
14:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2140 with weight 0 T328022', diff saved to https://phabricator.wikimedia.org/P43489 and previous config saved to /var/cache/conftool/dbconfig/20230130-141822-root.json
14:18 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
14:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43488 and previous config saved to /var/cache/conftool/dbconfig/20230130-141203-ladsgroup.json
14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43487 and previous config saved to /var/cache/conftool/dbconfig/20230130-140710-ladsgroup.json
13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43486 and previous config saved to /var/cache/conftool/dbconfig/20230130-135659-ladsgroup.json
13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43485 and previous config saved to /var/cache/conftool/dbconfig/20230130-135632-ladsgroup.json
13:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
13:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
13:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43484 and previous config saved to /var/cache/conftool/dbconfig/20230130-134406-ladsgroup.json
13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:31 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad (duration: 01m 23s)
13:29 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad
13:29 godog: bounce logstash on logstash1025 -- GC unhappy causing kafka lag
13:29 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad (duration: 01m 13s)
13:28 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad
13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43483 and previous config saved to /var/cache/conftool/dbconfig/20230130-132701-ladsgroup.json
13:23 awight@deploy1002: Finished scap: Backport for gerrit:884496 Revert "Enable kartographer external data parse time fetch for all wikis" (T323113) (duration: 08m 34s)
13:21 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad (duration: 00m 11s)
13:21 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad
13:21 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad (duration: 00m 22s)
13:20 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad
13:16 awight@deploy1002: awight: Backport for gerrit:884496 Revert "Enable kartographer external data parse time fetch for all wikis" (T323113) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
13:14 awight@deploy1002: Started scap: Backport for gerrit:884496 Revert "Enable kartographer external data parse time fetch for all wikis" (T323113)
13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43482 and previous config saved to /var/cache/conftool/dbconfig/20230130-131155-ladsgroup.json
13:00 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
12:59 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
12:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast3004.wikimedia.org
12:58 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:58 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:57 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43481 and previous config saved to /var/cache/conftool/dbconfig/20230130-125648-ladsgroup.json
12:56 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
12:55 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
12:55 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
12:55 awight@deploy1002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki=aawiki --force-version "1.40.0-wmf.20" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.2oaGSEpQR1"' returned non-zero exit status 255. (duration: 00m 00s)
12:55 awight@deploy1002: Started scap: Backport for gerrit:884496 Revert "Enable kartographer external data parse time fetch for all wikis" (T323113)
12:46 awight@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f]: Roll back kartotherian (duration: 01m 27s)
12:45 awight@deploy1002: Started deploy [kartotherian/deploy@5c58f8f]: Roll back kartotherian
12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43479 and previous config saved to /var/cache/conftool/dbconfig/20230130-124142-ladsgroup.json
12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43478 and previous config saved to /var/cache/conftool/dbconfig/20230130-123004-ladsgroup.json
12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43477 and previous config saved to /var/cache/conftool/dbconfig/20230130-122943-ladsgroup.json
12:25 awight@deploy1002: Finished deploy [kartotherian/deploy@42a07d3]: Disable traffic mirroring from codfw to eqiad (duration: 02m 44s)
12:25 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:23 awight@deploy1002: Started deploy [kartotherian/deploy@42a07d3]: Disable traffic mirroring from codfw to eqiad
12:22 jmm@cumin2002: START - Cookbook sre.dns.netbox
12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43476 and previous config saved to /var/cache/conftool/dbconfig/20230130-121437-ladsgroup.json
12:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast3004.wikimedia.org
11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43475 and previous config saved to /var/cache/conftool/dbconfig/20230130-115930-ladsgroup.json
11:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast6001.wikimedia.org
11:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast6001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast6001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:54 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast6001.wikimedia.org
11:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 42473
11:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 42473
11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43474 and previous config saved to /var/cache/conftool/dbconfig/20230130-114424-ladsgroup.json
11:42 moritzm: installing install4002 T327867
11:42 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1005.eqiad.wmnet
11:41 Amir1: dropping old wikiadmin user (T326802)
11:35 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1005.eqiad.wmnet
11:35 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1004.eqiad.wmnet
11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43473 and previous config saved to /var/cache/conftool/dbconfig/20230130-113319-ladsgroup.json
11:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
11:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
11:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
11:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43472 and previous config saved to /var/cache/conftool/dbconfig/20230130-113254-ladsgroup.json
11:28 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1004.eqiad.wmnet
11:24 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1003.eqiad.wmnet
11:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install4002.wikimedia.org
11:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43471 and previous config saved to /var/cache/conftool/dbconfig/20230130-111748-ladsgroup.json
11:17 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1003.eqiad.wmnet
11:11 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host htmldumper1001.eqiad.wmnet
11:09 phedenskog@deploy1002: Finished deploy [performance/navtiming@4e5ff3f]: (no justification provided) (duration: 00m 05s)
11:09 phedenskog@deploy1002: Started deploy [performance/navtiming@4e5ff3f]: (no justification provided)
11:05 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host htmldumper1001.eqiad.wmnet
11:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install4002.wikimedia.org on all recursors
11:04 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install4002.wikimedia.org on all recursors
11:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install4002.wikimedia.org - jmm@cumin2002"
11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install4002.wikimedia.org - jmm@cumin2002"
11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43470 and previous config saved to /var/cache/conftool/dbconfig/20230130-110241-ladsgroup.json
11:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
10:49 ladsgroup@deploy1002: Finished scap: Backport for gerrit:884837 Enable write both for externallinks except s4, s7, s8 (T321662) (duration: 13m 10s)
10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43468 and previous config saved to /var/cache/conftool/dbconfig/20230130-104735-ladsgroup.json
10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast4003.wikimedia.org
10:46 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast4003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
10:40 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast4003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
10:37 ladsgroup@deploy1002: ladsgroup: Backport for gerrit:884837 Enable write both for externallinks except s4, s7, s8 (T321662) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
10:36 ladsgroup@deploy1002: Started scap: Backport for gerrit:884837 Enable write both for externallinks except s4, s7, s8 (T321662)
10:36 jmm@cumin2002: START - Cookbook sre.dns.netbox
10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43467 and previous config saved to /var/cache/conftool/dbconfig/20230130-103540-ladsgroup.json
10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
10:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
10:30 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast4003.wikimedia.org
10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43466 and previous config saved to /var/cache/conftool/dbconfig/20230130-102500-ladsgroup.json
10:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593
10:17 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts bast4003.wikimedia.org
10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast4003.wikimedia.org
10:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 14593
10:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 49544
10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43465 and previous config saved to /var/cache/conftool/dbconfig/20230130-100954-ladsgroup.json
10:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 49544
10:00 awight@deploy1002: Finished scap: Backport for gerrit:879559 Enable kartographer external data parse time fetch for all wikis (T326317) (duration: 07m 53s)
09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43464 and previous config saved to /var/cache/conftool/dbconfig/20230130-095447-ladsgroup.json
09:54 awight@deploy1002: lilients and awight: Backport for gerrit:879559 Enable kartographer external data parse time fetch for all wikis (T326317) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
09:52 awight@deploy1002: Started scap: Backport for gerrit:879559 Enable kartographer external data parse time fetch for all wikis (T326317)
09:52 XioNoX: push pfw policies - T328085
09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43463 and previous config saved to /var/cache/conftool/dbconfig/20230130-093941-ladsgroup.json
09:29 jynus: disabling puppet on dbprov2004 to reorganize partitions T327155
09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43462 and previous config saved to /var/cache/conftool/dbconfig/20230130-092804-ladsgroup.json
09:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
09:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
09:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43461 and previous config saved to /var/cache/conftool/dbconfig/20230130-092732-ladsgroup.json
09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P43460 and previous config saved to /var/cache/conftool/dbconfig/20230130-091225-ladsgroup.json
08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P43459 and previous config saved to /var/cache/conftool/dbconfig/20230130-085719-ladsgroup.json
08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T318605)', diff saved to https://phabricator.wikimedia.org/P43458 and previous config saved to /var/cache/conftool/dbconfig/20230130-085530-ladsgroup.json
08:48 moritzm: installing install1004 T327867
08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43457 and previous config saved to /var/cache/conftool/dbconfig/20230130-084213-ladsgroup.json
08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P43456 and previous config saved to /var/cache/conftool/dbconfig/20230130-084024-ladsgroup.json
08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43455 and previous config saved to /var/cache/conftool/dbconfig/20230130-083034-ladsgroup.json
08:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
08:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P43454 and previous config saved to /var/cache/conftool/dbconfig/20230130-082517-ladsgroup.json
08:19 zabe: Deployed security patch for T278365
08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T318605)', diff saved to https://phabricator.wikimedia.org/P43452 and previous config saved to /var/cache/conftool/dbconfig/20230130-081011-ladsgroup.json
07:54 phedenskog@deploy1002: Finished deploy [performance/navtiming@bfbd6d7]: (no justification provided) (duration: 00m 05s)
07:54 phedenskog@deploy1002: Started deploy [performance/navtiming@bfbd6d7]: (no justification provided)
07:50 moritzm: installing install2004 T327867
07:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T318605)', diff saved to https://phabricator.wikimedia.org/P43451 and previous config saved to /var/cache/conftool/dbconfig/20230130-074502-ladsgroup.json
07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T318605)', diff saved to https://phabricator.wikimedia.org/P43450 and previous config saved to /var/cache/conftool/dbconfig/20230130-073827-ladsgroup.json
07:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
07:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T318605)', diff saved to https://phabricator.wikimedia.org/P43449 and previous config saved to /var/cache/conftool/dbconfig/20230130-073806-ladsgroup.json
07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43448 and previous config saved to /var/cache/conftool/dbconfig/20230130-072956-ladsgroup.json
07:26 marostegui: dbmaint Schema change on s7 eqiad T328236
07:25 marostegui: dbmaint Schema change on s2 eqiad T328236
07:25 marostegui: dbmaint Schema change on s1 eqiad T328236
07:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P43447 and previous config saved to /var/cache/conftool/dbconfig/20230130-072300-ladsgroup.json
07:21 marostegui: dbmaint Schema change on s1 eqiad T328236
07:17 marostegui: dbmaint Schema change on s4 eqiad T328236
07:16 marostegui: dbmaint Schema change on s6 eqiad T328236
07:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43446 and previous config saved to /var/cache/conftool/dbconfig/20230130-071450-ladsgroup.json
07:11 marostegui: dbmaint Schema change on s5 eqiad T328236
07:10 marostegui: dbmaint Schema change on s8 eqiad T328236
07:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P43445 and previous config saved to /var/cache/conftool/dbconfig/20230130-070753-ladsgroup.json
07:05 marostegui: dbmaint Schema change on s3 eqiad T328086
07:02 marostegui: dbmaint Schema change on s1 eqiad T328086
07:01 marostegui: dbmaint Schema change on s4 eqiad T328086
06:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T318605)', diff saved to https://phabricator.wikimedia.org/P43444 and previous config saved to /var/cache/conftool/dbconfig/20230130-065943-ladsgroup.json
06:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T318605)', diff saved to https://phabricator.wikimedia.org/P43443 and previous config saved to /var/cache/conftool/dbconfig/20230130-065247-ladsgroup.json
06:51 marostegui: dbmaint Schema change on s5 eqiad T328086
06:45 marostegui: dbmaint Schema change on s2 eqiad T328086
06:43 marostegui: dbmaint Schema change on s7 eqiad T328086
06:41 marostegui: dbmaint Schema change on s8 eqiad T328086
06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
06:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
06:34 marostegui: dbmaint Schema change on s6 eqiad T328086
06:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2140 (T318605)', diff saved to https://phabricator.wikimedia.org/P43441 and previous config saved to /var/cache/conftool/dbconfig/20230130-061534-ladsgroup.json
06:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
06:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
06:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T318605)', diff saved to https://phabricator.wikimedia.org/P43440 and previous config saved to /var/cache/conftool/dbconfig/20230130-061401-ladsgroup.json
06:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
06:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
05:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T318605)', diff saved to https://phabricator.wikimedia.org/P43439 and previous config saved to /var/cache/conftool/dbconfig/20230130-053033-ladsgroup.json
05:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
05:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
05:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
05:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance

2023-01-29

14:46 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1002.eqiad.wmnet
14:40 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1002.eqiad.wmnet
14:39 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1008.eqiad.wmnet
14:33 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1008.eqiad.wmnet

2023-01-28

00:36 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet
00:35 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet
00:17 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS bullseye

2023-01-27

23:55 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
23:52 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
23:31 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS bullseye
23:31 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4050.ulsfo.wmnet with OS bullseye
23:22 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS bullseye
23:21 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4042.ulsfo.wmnet
22:46 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS bullseye
22:24 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
22:20 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
22:11 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include bullseye-wikimedia /home/rzl/httpbb/bullseye/httpbb_0.0.2-1+deb11u1_amd64.changes # T328162
22:11 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/httpbb/buster/httpbb_0.0.2-1_amd64.changes # T328162
22:00 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
21:59 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4042.ulsfo.wmnet with OS bullseye
21:51 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
21:49 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4049.ulsfo.wmnet
20:56 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4049.ulsfo.wmnet with OS bullseye
20:29 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage
20:26 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage
20:05 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS bullseye
20:02 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4049.ulsfo.wmnet with OS bullseye
19:38 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS bullseye
19:38 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4049.ulsfo.wmnet with OS bullseye
19:32 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS bullseye
19:31 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4041.ulsfo.wmnet
19:31 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp404.ulsfo.wmnet
19:28 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4041.ulsfo.wmnet with OS bullseye
19:02 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
18:57 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
18:37 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS bullseye
18:37 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4041.ulsfo.wmnet with OS bullseye
18:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS bullseye
18:24 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4048.ulsfo.wmnet
18:14 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS bullseye
17:52 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage
17:49 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage
17:38 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@907fe2a]: (no justification provided) (duration: 00m 14s)
17:38 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@907fe2a]: (no justification provided)
17:28 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS bullseye
17:28 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4048.ulsfo.wmnet with OS bullseye
17:15 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS bullseye
15:50 dancy@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 04s)
15:50 dancy@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
15:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4040.ulsfo.wmnet,service=ats-be
15:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4040.ulsfo.wmnet,service=cdn
15:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS bullseye
15:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2027.codfw.wmnet,service=ats-be
15:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2027.codfw.wmnet,service=cdn
15:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2027.codfw.wmnet with OS bullseye
15:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
15:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
15:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
14:58 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
14:55 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
14:55 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
14:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS bullseye
14:46 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
14:45 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
14:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
14:41 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
14:40 moritzm: installing install3002 T327867
14:39 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
14:34 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
14:34 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
14:27 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts clouddb2001-dev.codfw.wmnet
14:27 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:26 andrew@cumin1001: START - Cookbook sre.dns.netbox
14:22 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts clouddb2001-dev.codfw.wmnet
14:20 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts clouddb2001-dev.codfw.wmnet
14:20 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:20 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
14:17 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
14:13 andrew@cumin1001: START - Cookbook sre.dns.netbox
14:10 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts clouddb2001-dev.codfw.wmnet
13:46 moritzm: installing install5002 T327867
13:08 moritzm: installing install6002 T327867
12:47 hashar: gerrit1001 running Puppet to deploy https://gerrit.wikimedia.org/r/883965 and restarting Apache 2 to change the `Listen` statements # T326125
12:42 hashar: Rebooting gerrit2002
12:38 hashar: Stopped Puppet on gerrit1001 to prevent auto deployment of https://gerrit.wikimedia.org/r/883965
12:25 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2001-dev.codfw.wmnet with OS bullseye
12:25 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
12:23 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
12:03 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@9690bf9]: (no justification provided) (duration: 00m 15s)
12:03 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@9690bf9]: (no justification provided)
12:01 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
12:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 138915
12:00 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
11:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 138915
11:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9318
11:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9318
11:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 55821
11:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 55821
11:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 398143
11:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 398143
11:57 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2001-dev.codfw.wmnet with reason: host reimage
11:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26077
11:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 26077
11:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 50266
11:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 50266
11:54 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2001-dev.codfw.wmnet with reason: host reimage
11:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14593
11:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 14593
11:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 56898
11:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 56898
11:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8368
11:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8368
11:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8560
11:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8560
11:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34309
11:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 34309
11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12033
11:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12033
11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62537
11:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62537
11:41 XioNoX: restart keyholder on deploy1002
11:41 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
11:40 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
11:38 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
11:36 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
11:27 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:26 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 56s)
11:25 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2001-dev.codfw.wmnet with OS bullseye
11:25 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
11:24 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2001-dev.codfw.wmnet with OS bullseye
11:24 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
11:15 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2001-dev.codfw.wmnet with OS bullseye
11:15 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudlb2001-dev.mgmt.codfw.wmnet on all recursors
11:15 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudlb2001-dev.mgmt.codfw.wmnet on all recursors
11:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-worker1087.eqiad.wmnet with reason: Shutting down an-worker1087 to allow for RAID BBU replacement
11:14 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-worker1087.eqiad.wmnet with reason: Shutting down an-worker1087 to allow for RAID BBU replacement
11:13 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2001-dev.codfw.wmnet with OS bullseye
11:12 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2001-dev.codfw.wmnet with OS bullseye
11:12 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:12 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
11:11 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
11:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-corp1001.wikimedia.org
11:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:09 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:08 aborrero@cumin2002: START - Cookbook sre.dns.netbox
11:08 aborrero@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
11:05 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-corp1001.wikimedia.org
11:04 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: apply on main
11:04 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
11:03 aborrero@cumin2002: START - Cookbook sre.dns.netbox
11:01 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: apply on main
11:01 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
10:53 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ldap-corp1001.wikimedia.org
10:52 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-corp1001.wikimedia.org
10:45 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:45 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
10:38 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
10:37 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
10:37 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:26 aborrero@cumin2002: START - Cookbook sre.dns.netbox
10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-corp2001.wikimedia.org
10:23 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
10:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
10:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-corp2001.wikimedia.org
09:40 moritzm: disabling old bastions bast3005/bast4003/bast5002/bast6001, use bast3006/bast4004/bast5003/bast6002 instead
08:23 marostegui: Apply schema change on labtestwiki (clouddb2002-dev) T328086
08:22 marostegui: Apply schema change on db1106 (s1 enwiki) T328086
08:06 elukey: restart kube-apiserver on ml-staging-ctrl2* nodes as an attempt to mitigate some LIST API high latency
07:41 elukey: restart kube-apiserver on ml-serve-ctrl2* nodes as an attempt to mitigate some 504 API response errors
01:15 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2*: Applying configuration change to cassandra-dev cluster - eevans@cumin1001
01:11 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4047.ulsfo.wmnet
01:10 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4047.ulsfo.wmnet with OS bullseye
00:56 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2*: Applying configuration change to cassandra-dev cluster - eevans@cumin1001
00:49 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4047.ulsfo.wmnet with reason: host reimage
00:45 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4047.ulsfo.wmnet with reason: host reimage
00:33 zabe@deploy1002: Finished scap: Backport for gerrit:884137 Stop setting cul_actor migration var (T233004) (duration: 07m 36s)
00:27 zabe@deploy1002: zabe: Backport for gerrit:884137 Stop setting cul_actor migration var (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
00:26 zabe@deploy1002: Started scap: Backport for gerrit:884137 Stop setting cul_actor migration var (T233004)
00:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
00:24 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS bullseye
00:16 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
00:15 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS bullseye
00:11 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
00:10 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS bullseye

2023-01-26

23:59 zabe@deploy1002: Finished scap: Backport for gerrit:883724 Add a project logo on gorwiktionary (T327987) (duration: 34m 42s)
23:54 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
23:52 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4039.ulsfo.wmnet
23:51 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4039.ulsfo.wmnet with OS bullseye
23:28 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
23:26 zabe@deploy1002: zabe and superpes: Backport for gerrit:883724 Add a project logo on gorwiktionary (T327987) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
23:25 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
23:24 zabe@deploy1002: Started scap: Backport for gerrit:883724 Add a project logo on gorwiktionary (T327987)
23:13 sbassett@deploy1002: Synchronized private/PrivateSettings.php: T326691 - remove mitigation and monitor (duration: 06m 52s)
23:04 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS bullseye
23:04 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4039.ulsfo.wmnet with OS bullseye
23:03 zabe@deploy1002: Finished scap: Backport for gerrit:881390 Pin CheckUserEventTablesMigrationStage to read and write old (T324907) (duration: 08m 36s)
22:56 zabe@deploy1002: dreamyjazz and zabe: Backport for gerrit:881390 Pin CheckUserEventTablesMigrationStage to read and write old (T324907) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
22:54 zabe@deploy1002: Started scap: Backport for gerrit:881390 Pin CheckUserEventTablesMigrationStage to read and write old (T324907)
22:45 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS bullseye
22:44 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4046.ulsfo.wmnet
22:44 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS bullseye
22:23 zabe: running migrateRevisionCommentTemp.php in cebwiki in screen with --sleep 2 # T275246
22:22 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
22:18 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
21:58 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
21:47 thcipriani@deploy1002: Finished scap: Backport for gerrit:884055 Increase threshold for table of contents collapsing (T328045), gerrit:879664 Remove redundant block for search descriptions (T324859) (duration: 08m 49s)
21:40 thcipriani@deploy1002: thcipriani and jdlrobson: Backport for gerrit:884055 Increase threshold for table of contents collapsing (T328045), gerrit:879664 Remove redundant block for search descriptions (T324859) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
21:39 thcipriani@deploy1002: Started scap: Backport for gerrit:884055 Increase threshold for table of contents collapsing (T328045), gerrit:879664 Remove redundant block for search descriptions (T324859)
21:36 thcipriani@deploy1002: Finished scap: Backport for gerrit:884013 ApiDiscussionToolsEdit: Unwrap Parsoid sections before parsing (T327704) (duration: 08m 43s)
21:35 brett@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4046.ulsfo.wmnet with OS bullseye
21:34 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
21:33 brett@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4046.ulsfo.wmnet with OS bullseye
21:33 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
21:33 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4046.ulsfo.wmnet with OS bullseye
21:29 thcipriani@deploy1002: matmarex and thcipriani: Backport for gerrit:884013 ApiDiscussionToolsEdit: Unwrap Parsoid sections before parsing (T327704) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:27 thcipriani@deploy1002: Started scap: Backport for gerrit:884013 ApiDiscussionToolsEdit: Unwrap Parsoid sections before parsing (T327704)
21:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
21:25 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet
21:24 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS bullseye
21:20 thcipriani@deploy1002: Finished scap: Backport for gerrit:883952 Enable write new for CheckUserLog comment fields everywhere (T233004) (duration: 11m 18s)
21:11 thcipriani@deploy1002: thcipriani and dreamyjazz: Backport for gerrit:883952 Enable write new for CheckUserLog comment fields everywhere (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
21:09 thcipriani@deploy1002: Started scap: Backport for gerrit:883952 Enable write new for CheckUserLog comment fields everywhere (T233004)
21:01 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
20:56 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
20:36 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
20:13 ryankemper: `ryankemper@thanos-fe1001:~$ sudo run-puppet-agent` following merge of wdqs recording rule patch: https://gerrit.wikimedia.org/r/c/operations/puppet/+/883610
20:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on cp2027.codfw.wmnet with reason: reimaging
20:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on cp2027.codfw.wmnet with reason: reimaging
20:05 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
19:56 brett@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4038.ulsfo.wmnet with OS bullseye
19:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp2027.codfw.wmnet with reason: reimaging
19:10 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp2027.codfw.wmnet with reason: reimaging
19:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.20 refs T325583
19:00 brennen: 1.40.0-wmf.20 train (T325583): no current blockers, rolling to all wikis.
18:59 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
18:57 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6008.drmrs.wmnet
18:46 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6008.drmrs.wmnet with OS bullseye
18:20 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
18:17 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
18:17 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
18:16 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
18:16 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
18:15 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
18:15 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
18:15 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
18:15 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
18:15 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
18:15 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
18:15 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
18:15 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
18:15 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
18:14 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
18:14 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
18:14 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
18:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
18:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
18:12 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
18:12 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
18:11 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
18:11 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
18:10 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
18:10 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
18:09 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
17:59 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6008.drmrs.wmnet with OS bullseye
17:55 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6016.drmrs.wmnet
17:49 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS bullseye
17:30 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1015.eqiad.wmnet
17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43427 and previous config saved to /var/cache/conftool/dbconfig/20230126-172806-root.json
17:27 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
17:24 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1015.eqiad.wmnet
17:24 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
17:22 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1014.eqiad.wmnet
17:19 dancy@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 11s)
17:19 dancy@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
17:16 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1014.eqiad.wmnet
17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43426 and previous config saved to /var/cache/conftool/dbconfig/20230126-171302-root.json
17:12 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1013.eqiad.wmnet
17:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
17:07 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
17:06 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1013.eqiad.wmnet
17:06 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS bullseye
17:05 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6016.drmrs.wmnet
17:05 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
17:05 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
17:04 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6007.drmrs.wmnet
17:03 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6007.drmrs.wmnet with OS bullseye
17:02 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1012.eqiad.wmnet
16:59 cgoubert@deploy1002: Synchronized tox.ini: Rebuilding mediawiki-webserver (duration: 07m 19s)
16:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43425 and previous config saved to /var/cache/conftool/dbconfig/20230126-165757-root.json
16:56 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1012.eqiad.wmnet
16:53 claime: Running scap sync-file -D php_fpm_restart_script:/bin/true tox.ini "Rebuilding mediawiki-webserver image" - T326794
16:51 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
16:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['cp2027']
16:48 sukhe: correcting earlier log: pooling lvs2007 after T326564
16:48 sukhe: pooling lvs2009 after T326564
16:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43424 and previous config saved to /var/cache/conftool/dbconfig/20230126-164252-root.json
16:41 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage
16:41 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2027']
16:38 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
16:38 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage
16:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1084.eqiad.wmnet
16:31 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1011.eqiad.wmnet
16:28 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
16:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1084.eqiad.wmnet
16:27 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
16:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43423 and previous config saved to /var/cache/conftool/dbconfig/20230126-162747-root.json
16:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
16:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1080.eqiad.wmnet
16:24 aborrero@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb1001-dev
16:23 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb1001-dev
16:23 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1011.eqiad.wmnet
16:21 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1010.eqiad.wmnet
16:21 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
16:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
16:20 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
16:19 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1080.eqiad.wmnet
16:19 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
16:19 aborrero@cumin2002: START - Cookbook sre.dns.netbox
16:18 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6007.drmrs.wmnet with OS bullseye
16:14 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1010.eqiad.wmnet
16:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp3051.esams.wmnet with reason: extending downtime: T323717
16:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp3051.esams.wmnet with reason: extending downtime: T323717
16:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43422 and previous config saved to /var/cache/conftool/dbconfig/20230126-161242-root.json
16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2161 T328024', diff saved to https://phabricator.wikimedia.org/P43421 and previous config saved to /var/cache/conftool/dbconfig/20230126-161137-root.json
16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2165 to s8 primary T328024', diff saved to https://phabricator.wikimedia.org/P43420 and previous config saved to /var/cache/conftool/dbconfig/20230126-161058-marostegui.json
16:10 marostegui: Starting s8 codfw failover from db2161 to db2165 - T328024
16:09 moritzm: installing distro-info-data updates from Bullseye point release
16:08 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudgw2001-dev.codfw.wmnet
16:08 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:08 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
16:06 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
16:05 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1009.eqiad.wmnet
15:55 jbond: enable-puppet post-deploy requestctl ferm change gerrit:883935
15:55 aborrero@cumin2002: START - Cookbook sre.dns.netbox
15:51 hashar: Restarting CI Jenkins for upgrade
15:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s8 T328024
15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2165 with weight 0 T328024', diff saved to https://phabricator.wikimedia.org/P43419 and previous config saved to /var/cache/conftool/dbconfig/20230126-155000-root.json
15:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s8 T328024
15:49 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudgw2001-dev.codfw.wmnet
15:46 hashar: Restart Jenkins for upgrade
15:39 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1009.eqiad.wmnet
15:30 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
15:30 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
15:30 sukhe: install2003: rm /etc/dhcp/automation/ttyS1-115200/cp2027.conf
15:29 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
15:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
15:27 sukhe: poweroff lvs2007: T326564
15:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43418 and previous config saved to /var/cache/conftool/dbconfig/20230126-152329-root.json
15:12 jbond: disable-puppet, deploy requestctl ferm change gerrit:883935
15:09 sukhe: stop pybal on lvs2007: T326564
15:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on lvs2007.codfw.wmnet with reason: powering off for T326564
15:09 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on lvs2007.codfw.wmnet with reason: powering off for T326564
15:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43417 and previous config saved to /var/cache/conftool/dbconfig/20230126-150824-root.json
15:04 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
15:04 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
15:02 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
15:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
14:55 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
14:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43415 and previous config saved to /var/cache/conftool/dbconfig/20230126-145319-root.json
14:40 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:40 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
14:40 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
14:39 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
14:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43414 and previous config saved to /var/cache/conftool/dbconfig/20230126-143814-root.json
14:37 cmooney@cumin1001: START - Cookbook sre.dns.netbox
14:37 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:37 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
14:37 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
14:36 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
14:34 cmooney@cumin1001: START - Cookbook sre.dns.netbox
14:32 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:31 cmooney@cumin1001: START - Cookbook sre.dns.netbox
14:31 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: Rotating wikiadmin password (T326802) (duration: 07m 04s)
14:27 moritzm: installing containerd security updates
14:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
14:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43413 and previous config saved to /var/cache/conftool/dbconfig/20230126-142309-root.json
14:16 Lucas_WMDE: UTC afternoon backport+config window done
14:15 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for gerrit:883122 Enable write new for CheckUserLog comment fields on group 0 and 1 (T233004) (duration: 09m 16s)
14:11 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
14:11 jbond: disable puppet fleet-wide to roll out etcd ferm change gerrit:883888
14:11 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
14:09 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43412 and previous config saved to /var/cache/conftool/dbconfig/20230126-140804-root.json
14:07 lucaswerkmeister-wmde@deploy1002: dreamyjazz and lucaswerkmeister-wmde: Backport for gerrit:883122 Enable write new for CheckUserLog comment fields on group 0 and 1 (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2123 T328023', diff saved to https://phabricator.wikimedia.org/P43411 and previous config saved to /var/cache/conftool/dbconfig/20230126-140716-root.json
14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2113 to s5 primary T328023', diff saved to https://phabricator.wikimedia.org/P43410 and previous config saved to /var/cache/conftool/dbconfig/20230126-140630-root.json
14:06 marostegui: Starting s5 codfw failover from db2123 to db2113 - T328023
14:06 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for gerrit:883122 Enable write new for CheckUserLog comment fields on group 0 and 1 (T233004)
14:00 moritzm: restarting etherpad-lite to pick up nodejs security update
13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Remove vslow from db2113, future s5 codfw master T328023', diff saved to https://phabricator.wikimedia.org/P43409 and previous config saved to /var/cache/conftool/dbconfig/20230126-135509-marostegui.json
13:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T328023
13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2113 with weight 0 T328023', diff saved to https://phabricator.wikimedia.org/P43408 and previous config saved to /var/cache/conftool/dbconfig/20230126-135215-root.json
13:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T328023
13:45 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
13:45 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
13:44 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
13:38 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:38 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove DNS records for removed esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
13:37 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove DNS records for removed esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
13:32 ladsgroup@deploy1002: Finished scap: Backport for gerrit:883723 Change time zone setting on gorwiktionary (T327986) (duration: 12m 02s)
13:32 cmooney@cumin1001: START - Cookbook sre.dns.netbox
13:25 moritzm: restarting turnilo for nodejs security update
13:22 ladsgroup@deploy1002: superpes and ladsgroup: Backport for gerrit:883723 Change time zone setting on gorwiktionary (T327986) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
13:20 ladsgroup@deploy1002: Started scap: Backport for gerrit:883723 Change time zone setting on gorwiktionary (T327986)
13:10 moritzm: installing nodejs security updates on bullseye
13:09 hashar: Rebooting gerrit2002.wikimedia.org host to validate that the Apache 2 service starts AFTER the network is online | T326125
13:06 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
13:04 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop analytics cluster
12:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
12:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp3051.esams.wmnet with reason: T323717
12:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp3051.esams.wmnet with reason: T323717
12:42 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3051.esams.wmnet,service=ats-be
12:42 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3051.esams.wmnet,service=cdn
12:41 sukhe: depool cp3051.esams.wmnet for firmware update testing: T323717
12:41 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
12:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet
12:29 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-proxies (exit_code=0) rolling restart_daemons on A:eqiad and not A:thanos-fe and A:swift-fe or A:thanos-fe
12:15 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet
12:10 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
12:10 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-proxies rolling restart_daemons on A:eqiad and not A:thanos-fe and A:swift-fe or A:thanos-fe
12:03 jbond: enable profile::base::firewall::defs_from_etcd: true globally
11:56 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd-client-ssl._tcp.wikimedia.org on all recursors
11:56 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd-client-ssl._tcp.wikimedia.org on all recursors
11:49 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
11:49 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flowspec1001
11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flowspec1001 decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"
11:46 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flowspec1001 decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"
11:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
11:40 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts flowspec1001
11:36 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux
11:29 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
11:29 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
11:28 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43405 and previous config saved to /var/cache/conftool/dbconfig/20230126-110822-root.json
11:03 hashar: Restarted Apache 2 on gerrit.wikimedia.org
10:55 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
10:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename aux-k8s-ingress service to k8s-ingress-aux - cgoubert@cumin1001"
10:54 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
10:54 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43404 and previous config saved to /var/cache/conftool/dbconfig/20230126-105317-root.json
10:53 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
10:46 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename aux-k8s-ingress service to k8s-ingress-aux - cgoubert@cumin1001"
10:45 moritzm: installing postgresql-13 security updates
10:43 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
10:42 joal@deploy1002: Finished deploy [airflow-dags/analytics@e52205b]: (no justification provided) (duration: 00m 11s)
10:42 joal@deploy1002: Started deploy [airflow-dags/analytics@e52205b]: (no justification provided)
10:41 claime: cgoubert@authdns1001:~$ sudo -i authdns-update
10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43403 and previous config saved to /var/cache/conftool/dbconfig/20230126-103812-root.json
10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43402 and previous config saved to /var/cache/conftool/dbconfig/20230126-103448-root.json
10:32 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - third after failure [analytics/refinery@8ed8435] (duration: 01m 16s)
10:31 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - third after failure [analytics/refinery@8ed8435]
10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43401 and previous config saved to /var/cache/conftool/dbconfig/20230126-102307-root.json
10:21 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - Second after failure [analytics/refinery@8ed8435] (duration: 00m 04s)
10:21 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - Second after failure [analytics/refinery@8ed8435]
10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43400 and previous config saved to /var/cache/conftool/dbconfig/20230126-101943-root.json
10:08 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts sretest1002.eqiad.wmnet
10:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43399 and previous config saved to /var/cache/conftool/dbconfig/20230126-100802-root.json
10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43398 and previous config saved to /var/cache/conftool/dbconfig/20230126-100438-root.json
09:59 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
09:58 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@8ed8435] (duration: 01m 08s)
09:57 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@8ed8435]
09:57 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (thin): Regular analytics weekly train THIN [analytics/refinery@8ed8435] (duration: 00m 05s)
09:57 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (thin): Regular analytics weekly train THIN [analytics/refinery@8ed8435]
09:56 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435]: Regular analytics weekly train [analytics/refinery@8ed8435] (duration: 07m 00s)
09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43397 and previous config saved to /var/cache/conftool/dbconfig/20230126-095257-root.json
09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43396 and previous config saved to /var/cache/conftool/dbconfig/20230126-095205-root.json
09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43395 and previous config saved to /var/cache/conftool/dbconfig/20230126-094933-root.json
09:49 joal@deploy1002: Started deploy [analytics/refinery@8ed8435]: Regular analytics weekly train [analytics/refinery@8ed8435]
09:48 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
09:48 jbond@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
09:47 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
09:47 jbond@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
09:47 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
09:47 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
09:46 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43394 and previous config saved to /var/cache/conftool/dbconfig/20230126-093700-root.json
09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43393 and previous config saved to /var/cache/conftool/dbconfig/20230126-093620-root.json
09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43392 and previous config saved to /var/cache/conftool/dbconfig/20230126-093428-root.json
09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43391 and previous config saved to /var/cache/conftool/dbconfig/20230126-093303-root.json
09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2144 to x2 primary T313811', diff saved to https://phabricator.wikimedia.org/P43390 and previous config saved to /var/cache/conftool/dbconfig/20230126-092512-root.json
09:24 marostegui: Starting x2 codfw failover from db2142 to db2144 - T328001
09:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover x2 T328001
09:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover x2 T328001
09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43389 and previous config saved to /var/cache/conftool/dbconfig/20230126-092155-root.json
09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43388 and previous config saved to /var/cache/conftool/dbconfig/20230126-092115-root.json
09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43387 and previous config saved to /var/cache/conftool/dbconfig/20230126-091923-root.json
09:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover x2 T328001
09:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover x2 T328001
09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43386 and previous config saved to /var/cache/conftool/dbconfig/20230126-091758-root.json
09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43385 and previous config saved to /var/cache/conftool/dbconfig/20230126-090650-root.json
09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43384 and previous config saved to /var/cache/conftool/dbconfig/20230126-090610-root.json
09:05 phedenskog@deploy1002: Finished deploy [performance/navtiming@e5fdd6e]: (no justification provided) (duration: 00m 06s)
09:05 phedenskog@deploy1002: Started deploy [performance/navtiming@e5fdd6e]: (no justification provided)
09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 1%: After switchover', diff saved to https://phabricator.wikimedia.org/P43383 and previous config saved to /var/cache/conftool/dbconfig/20230126-090418-root.json
09:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 T328000', diff saved to https://phabricator.wikimedia.org/P43382 and previous config saved to /var/cache/conftool/dbconfig/20230126-090302-root.json
09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43381 and previous config saved to /var/cache/conftool/dbconfig/20230126-090253-root.json
09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2118 to s7 primary T328000', diff saved to https://phabricator.wikimedia.org/P43380 and previous config saved to /var/cache/conftool/dbconfig/20230126-090212-root.json
09:02 marostegui: Starting s7 codfw failover from db2121 to db2118 - T328000
08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43379 and previous config saved to /var/cache/conftool/dbconfig/20230126-085145-root.json
08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43378 and previous config saved to /var/cache/conftool/dbconfig/20230126-085105-root.json
08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43377 and previous config saved to /var/cache/conftool/dbconfig/20230126-084748-root.json
08:44 moritzm: added Eoghan to pwstore
08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 T328000
08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2118 with weight 0 T328000', diff saved to https://phabricator.wikimedia.org/P43376 and previous config saved to /var/cache/conftool/dbconfig/20230126-084112-root.json
08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 T328000
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43375 and previous config saved to /var/cache/conftool/dbconfig/20230126-083640-root.json
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43374 and previous config saved to /var/cache/conftool/dbconfig/20230126-083600-root.json
08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2105 T327999', diff saved to https://phabricator.wikimedia.org/P43373 and previous config saved to /var/cache/conftool/dbconfig/20230126-083543-root.json
08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2127 to s3 primary T327999', diff saved to https://phabricator.wikimedia.org/P43372 and previous config saved to /var/cache/conftool/dbconfig/20230126-083459-root.json
08:34 marostegui: Starting s3 codfw failover from db2105 to db2127 - T327999
08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43371 and previous config saved to /var/cache/conftool/dbconfig/20230126-083243-root.json
08:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s3 T327999
08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2127 with weight 0 T327999', diff saved to https://phabricator.wikimedia.org/P43370 and previous config saved to /var/cache/conftool/dbconfig/20230126-082432-root.json
08:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Primary switchover s3 T327999
08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43369 and previous config saved to /var/cache/conftool/dbconfig/20230126-082055-root.json
08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43368 and previous config saved to /var/cache/conftool/dbconfig/20230126-082038-root.json
08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 T327998', diff saved to https://phabricator.wikimedia.org/P43367 and previous config saved to /var/cache/conftool/dbconfig/20230126-081916-root.json
08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2107 to s2 primary T327998', diff saved to https://phabricator.wikimedia.org/P43366 and previous config saved to /var/cache/conftool/dbconfig/20230126-081818-root.json
08:17 marostegui: Starting s2 codfw failover from db2104 to db2107 - T327998
08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43365 and previous config saved to /var/cache/conftool/dbconfig/20230126-081738-root.json
08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43364 and previous config saved to /var/cache/conftool/dbconfig/20230126-080533-root.json
08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T327998
08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T327998
08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2107 with weight 0 T327998', diff saved to https://phabricator.wikimedia.org/P43363 and previous config saved to /var/cache/conftool/dbconfig/20230126-080427-root.json
08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 1%: After switchover', diff saved to https://phabricator.wikimedia.org/P43362 and previous config saved to /var/cache/conftool/dbconfig/20230126-080233-root.json
08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2103 T327997', diff saved to https://phabricator.wikimedia.org/P43361 and previous config saved to /var/cache/conftool/dbconfig/20230126-080159-root.json
08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2112 to s1 primary T327997', diff saved to https://phabricator.wikimedia.org/P43360 and previous config saved to /var/cache/conftool/dbconfig/20230126-080033-root.json
08:00 marostegui: Starting s1 codfw failover from db2103 to db2112 - T327997
07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43359 and previous config saved to /var/cache/conftool/dbconfig/20230126-075028-root.json
07:49 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2012.*
07:49 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2011.*
07:49 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2010.*
07:48 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2009.*
07:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2112 with weight 0 T327997', diff saved to https://phabricator.wikimedia.org/P43358 and previous config saved to /var/cache/conftool/dbconfig/20230126-073616-root.json
07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 38 hosts with reason: Primary switchover s1 T327997
07:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 38 hosts with reason: Primary switchover s1 T327997
07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43357 and previous config saved to /var/cache/conftool/dbconfig/20230126-073523-root.json
07:25 marostegui@deploy1002: Finished scap: Backport for gerrit:883699 ProductionServices.php: Depool pc2011 (T327925) (duration: 11m 19s)
07:25 dcausse: T322869: depooling wdqs2009, wdqs2010, wdqs2011 and wdqs2012; these hosts should not serve user traffic yet because they don't have the database loaded
07:23 marostegui: Failover m1 from db1195 to db1176 - T327800
07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43356 and previous config saved to /var/cache/conftool/dbconfig/20230126-072017-root.json
07:18 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1001.eqiad.wmnet with reason: m1 switchover
07:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1001.eqiad.wmnet with reason: m1 switchover
07:17 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backupmon1001.eqiad.wmnet with reason: m1 switchover
07:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backupmon1001.eqiad.wmnet with reason: m1 switchover
07:16 marostegui@deploy1002: marostegui: Backport for gerrit:883699 ProductionServices.php: Depool pc2011 (T327925) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
07:14 marostegui@deploy1002: Started scap: Backport for gerrit:883699 ProductionServices.php: Depool pc2011 (T327925)
07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2132,2160].codfw.wmnet,db[1117,1176,1195].eqiad.wmnet with reason: Primary switchover m1 T327800
07:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2132,2160].codfw.wmnet,db[1117,1176,1195].eqiad.wmnet with reason: Primary switchover m1 T327800
07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43354 and previous config saved to /var/cache/conftool/dbconfig/20230126-070512-root.json
07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Add some weight to db1103', diff saved to https://phabricator.wikimedia.org/P43353 and previous config saved to /var/cache/conftool/dbconfig/20230126-070220-marostegui.json
07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 T327861', diff saved to https://phabricator.wikimedia.org/P43352 and previous config saved to /var/cache/conftool/dbconfig/20230126-070158-root.json
07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1103 to x1 primary and set section read-write T327861', diff saved to https://phabricator.wikimedia.org/P43351 and previous config saved to /var/cache/conftool/dbconfig/20230126-070035-marostegui.json
07:00 marostegui: Starting x1 eqiad failover from db1120 to db1103 - T327861
06:48 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6015.drmrs.wmnet
06:48 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6015.drmrs.wmnet with OS bullseye
06:32 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: Rotating wikiuser password (T326802) (duration: 07m 23s)
06:20 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
06:18 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1103 with weight 0 T327861', diff saved to https://phabricator.wikimedia.org/P43350 and previous config saved to /var/cache/conftool/dbconfig/20230126-061751-root.json
06:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327861
06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327861
05:57 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6015.drmrs.wmnet with OS bullseye
05:53 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6006.drmrs.wmnet
05:53 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6006.drmrs.wmnet with OS bullseye
05:32 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6006.drmrs.wmnet with reason: host reimage
05:28 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6006.drmrs.wmnet with reason: host reimage
05:10 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6006.drmrs.wmnet with OS bullseye
05:09 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6014.drmrs.wmnet
05:07 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6014.drmrs.wmnet with OS bullseye
04:45 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
04:42 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
04:24 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6014.drmrs.wmnet with OS bullseye
04:22 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6005.drmrs.wmnet
04:17 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6005.drmrs.wmnet with OS bullseye
03:52 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6005.drmrs.wmnet with reason: host reimage
03:49 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6005.drmrs.wmnet with reason: host reimage
03:29 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6005.drmrs.wmnet with OS bullseye
03:27 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6013.drmrs.wmnet
03:27 ejegg: payments-wiki upgraded from 08b8c3bc to 82d89841
03:26 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6013.drmrs.wmnet with OS bullseye
03:04 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
03:01 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
02:41 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6013.drmrs.wmnet with OS bullseye
02:30 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
02:17 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
02:17 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
01:58 ejegg: restarted fundraising scheduled jobs after queue server reboot
01:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
01:49 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2028.codfw.wmnet,service=ats-be
01:49 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2028.codfw.wmnet,service=cdn
01:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2027.codfw.wmnet with reason: firmware test
01:48 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2027.codfw.wmnet with reason: firmware test
01:46 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2027.codfw.wmnet,service=ats-be
01:46 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2027.codfw.wmnet,service=cdn
01:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2028.codfw.wmnet with OS bullseye
01:24 ejegg: payments-wiki upgraded from 15395d05 to 08b8c3bc (upgraded from MW 1.35 to MW 1.39)
01:23 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2028.codfw.wmnet with reason: host reimage
01:20 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2028.codfw.wmnet with reason: host reimage
01:19 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2*: Enable internode encryption - eevans@cumin1001
01:14 ejegg: disabled fundraising scheduled jobs for queue server reboot
01:05 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2028.codfw.wmnet with OS bullseye
01:03 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
01:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2028.codfw.wmnet
01:00 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2*: Enable internode encryption - eevans@cumin1001
01:00 ejegg: turned pending transaction resolvers back on after civi deploy
00:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2028.codfw.wmnet
00:50 ejegg: civicrm upgraded from 3e6b21b6 to b5d6a790
00:50 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
00:49 sukhe: depool cp2028 for testing firmware update cookbook: T321309
00:49 ejegg: disabled pending transaction resolvers for civi deploy
00:48 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2028.codfw.wmnet,service=ats-be
00:48 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2028.codfw.wmnet,service=cdn

2023-01-25

23:57 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6004.drmrs.wmnet
23:57 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6004.drmrs.wmnet with OS bullseye
23:36 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
23:33 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
23:29 zabe@deploy1002: Finished scap: (no justification provided) (duration: 07m 34s)
23:21 zabe@deploy1002: Started scap: (no justification provided)
23:20 zabe@deploy1002: Backport cancelled.
23:14 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6004.drmrs.wmnet with OS bullseye
23:13 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6012.drmrs.wmnet
23:07 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6012.drmrs.wmnet with OS bullseye
22:43 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
22:40 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
22:21 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6012.drmrs.wmnet with OS bullseye
22:14 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6003.drmrs.wmnet
21:49 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
21:49 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
21:34 samtar@deploy1002: Finished scap: Backport for gerrit:883617 Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714), gerrit:883616 Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714) (duration: 09m 27s)
21:26 samtar@deploy1002: jdrewniak and samtar: Backport for gerrit:883617 Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714), gerrit:883616 Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714) synced to the testservers: mwdebug2002.cod
21:25 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
21:24 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
21:24 samtar@deploy1002: Started scap: Backport for gerrit:883617 Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714), gerrit:883616 Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)
21:06 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
20:59 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
20:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6003.drmrs.wmnet with OS bullseye
20:59 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts cp2028.codfw.wmnet
20:58 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
20:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
20:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
20:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
20:49 ejegg: updated employers.csv on paymentswiki
20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
20:33 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
20:32 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka jumbo-eqiad cluster: Reboot kafka nodes
20:30 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
20:10 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6003.drmrs.wmnet with OS bullseye
20:00 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6011.drmrs.wmnet
19:58 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6011.drmrs.wmnet with OS bullseye
19:52 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host centrallog1002.eqiad.wmnet with OS bullseye
19:38 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
19:36 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
19:33 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
19:33 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
19:21 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
19:17 brennen@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.20 refs T325583 (duration: 07m 04s)
19:12 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6011.drmrs.wmnet with OS bullseye
19:10 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.20 refs T325583
19:06 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6002.drmrs.wmnet
19:01 brennen: 1.40.0-wmf.20 train (T325583): no blockers, rolling to group1.
19:00 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host centrallog1002.eqiad.wmnet with OS bullseye
19:00 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
18:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6002.drmrs.wmnet with OS bullseye
18:37 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
18:35 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
18:34 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
18:33 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
18:33 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
18:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
18:14 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6002.drmrs.wmnet with OS bullseye
18:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
18:11 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
18:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
18:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
18:05 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6010.drmrs.wmnet
17:58 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6010.drmrs.wmnet with OS bullseye
17:32 mutante: removing racktables.wikimedia.org from DNS - that's it for this ancient service T327405
16:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be
16:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=cdn
16:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2031.codfw.wmnet with OS bullseye
16:50 btullis@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka jumbo-eqiad cluster: Reboot kafka nodes
16:46 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
16:43 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
16:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet,service=ats-be
16:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet,service=cdn
16:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS bullseye
16:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage
16:28 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage
16:24 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6010.drmrs.wmnet with OS bullseye
16:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
16:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
16:09 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
16:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
16:08 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
16:04 btullis@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
16:03 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
15:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031']
15:56 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
15:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['cp2031']
15:53 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
15:50 robh: db1139 ilom wins/netbios disabled and ilom reset T327877
15:48 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
15:47 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye
15:46 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
15:45 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031']
15:45 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
15:44 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031.codfw.wmnet']
15:44 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031.codfw.wmnet']
15:43 robh: netbios wins disabled on db1140 ilom and ilom reset T327877
15:43 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2031.codfw.wmnet with OS bullseye
15:38 papaul: ongoing maintenance on fasw-c-eqiad
15:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
15:33 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
15:33 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2031.codfw.wmnet with OS bullseye
15:29 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
15:23 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
15:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
15:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
15:17 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=ats-be
15:17 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=cdn
15:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4045.ulsfo.wmnet with OS bullseye
15:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
15:13 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
15:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-cluster
15:12 urbanecm@deploy1002: Finished scap: triggering i18n refresh for T327824 (duration: 07m 57s)
15:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
15:04 urbanecm@deploy1002: Started scap: triggering i18n refresh for T327824
15:04 urbanecm@deploy1002: Finished scap: Backport for gerrit:882615 Enable the Wikibase REST API on Wikidata (T324999) (duration: 08m 43s)
15:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=ats-be
15:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=cdn
15:01 urbanecm: Overrunning B&C window
14:57 urbanecm@deploy1002: urbanecm and migr: Backport for gerrit:882615 Enable the Wikibase REST API on Wikidata (T324999) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye
14:55 urbanecm@deploy1002: Started scap: Backport for gerrit:882615 Enable the Wikibase REST API on Wikidata (T324999)
14:53 btullis@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
14:53 urbanecm@deploy1002: Finished scap: Backport for gerrit:883224 REST: Use error log level for unexpected errors (T327490), gerrit:883547 User impact: amend incorrect parameter for the single day streak text (T327824) (duration: 32m 21s)
14:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
14:50 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
14:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install6002.wikimedia.org
14:40 urbanecm@deploy1002: jakob and sgimeno and urbanecm: Backport for gerrit:883224 REST: Use error log level for unexpected errors (T327490), gerrit:883547 User impact: amend incorrect parameter for the single day streak text (T327824) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
14:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install6002.wikimedia.org on all recursors
14:30 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install6002.wikimedia.org on all recursors
14:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install6002.wikimedia.org - jmm@cumin2002"
14:30 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
14:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install6002.wikimedia.org - jmm@cumin2002"
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
14:28 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
14:25 jmm@cumin2002: START - Cookbook sre.dns.netbox
14:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install6002.wikimedia.org
14:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5002.wikimedia.org
14:21 urbanecm@deploy1002: Started scap: Backport for gerrit:883224 REST: Use error log level for unexpected errors (T327490), gerrit:883547 User impact: amend incorrect parameter for the single day streak text (T327824)
14:16 urbanecm@deploy1002: Finished scap: Backport for gerrit:883222 Enable Draft namespace on Serbo-Croatian Wikipedia (T327864) (duration: 12m 59s)
14:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5002.wikimedia.org on all recursors
14:09 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5002.wikimedia.org on all recursors
14:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5002.wikimedia.org - jmm@cumin2002"
14:08 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5002.wikimedia.org - jmm@cumin2002"
14:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
14:05 urbanecm@deploy1002: aleksandar and urbanecm: Backport for gerrit:883222 Enable Draft namespace on Serbo-Croatian Wikipedia (T327864) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
14:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
14:04 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5002.wikimedia.org
14:03 urbanecm@deploy1002: Started scap: Backport for gerrit:883222 Enable Draft namespace on Serbo-Croatian Wikipedia (T327864)
13:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install4002.wikimedia.org
13:51 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
13:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install4002.wikimedia.org
13:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
13:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install3002.wikimedia.org
13:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install3002.wikimedia.org on all recursors
13:31 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install3002.wikimedia.org on all recursors
13:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3002.wikimedia.org - jmm@cumin2002"
13:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3002.wikimedia.org - jmm@cumin2002"
13:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
13:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install3002.wikimedia.org
13:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install2004.wikimedia.org
13:11 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4037.ulsfo.wmnet with OS bullseye
13:04 jbond: puppet now using vendored version of augeas-core https://gerrit.wikimedia.org/r/c/operations/puppet/+/883233
13:04 jbond: enable puppet fleet wide to post deploy gerrit:883233
13:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install2004.wikimedia.org on all recursors
13:00 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install2004.wikimedia.org on all recursors
13:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install2004.wikimedia.org - jmm@cumin2002"
12:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install2004.wikimedia.org - jmm@cumin2002"
12:54 jbond: disable puppet fleet wide to deploy gerrit:883233
12:54 jnuche@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 21s)
12:54 jnuche@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
12:45 moritzm: restarting Exim on MXes to pick up new libtasn
12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe2003.codfw.wmnet
12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe2002.codfw.wmnet
12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe1003.eqiad.wmnet
12:42 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe1002.eqiad.wmnet
12:41 moritzm: restarting slapd on r/w servers to pick up new libtasn
12:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
12:37 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install2004.wikimedia.org
12:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install1004.wikimedia.org
12:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install1004.wikimedia.org on all recursors
12:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install1004.wikimedia.org on all recursors
12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install1004.wikimedia.org - jmm@cumin2002"
12:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install1004.wikimedia.org - jmm@cumin2002"
12:12 moritzm: installing libtasn security updates on buster
11:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install1004.wikimedia.org
11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testreduce1001.eqiad.wmnet
11:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testreduce1001.eqiad.wmnet
11:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host scandium.eqiad.wmnet
11:34 Lucas_WMDE: Updated the Wikidata property suggester with data from 20230102's JSON dump (T325942)
11:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host scandium.eqiad.wmnet
11:27 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
11:16 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
11:12 hnowlan: restarting lvs on lvs1019 for thumbor healthcheck change
11:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43344 and previous config saved to /var/cache/conftool/dbconfig/20230125-111059-root.json
11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: After recloning', diff saved to https://phabricator.wikimedia.org/P43343 and previous config saved to /var/cache/conftool/dbconfig/20230125-110924-root.json
11:08 hnowlan: restarting lvs on lvs2009 for thumbor healthcheck change
11:00 hnowlan: restarting lvs on lvs1020 for thumbor healthcheck change
11:00 hnowlan: restarting lvs on lvs1010 for thumbor healthcheck change
10:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43342 and previous config saved to /var/cache/conftool/dbconfig/20230125-105554-root.json
10:54 hnowlan: restarting lvs on lvs2010 for thumbor healthcheck change
10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After recloning', diff saved to https://phabricator.wikimedia.org/P43341 and previous config saved to /var/cache/conftool/dbconfig/20230125-105443-root.json
10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: After recloning', diff saved to https://phabricator.wikimedia.org/P43340 and previous config saved to /var/cache/conftool/dbconfig/20230125-105419-root.json
10:49 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
10:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
10:48 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
10:43 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
10:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43338 and previous config saved to /var/cache/conftool/dbconfig/20230125-104049-root.json
10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After recloning', diff saved to https://phabricator.wikimedia.org/P43337 and previous config saved to /var/cache/conftool/dbconfig/20230125-103938-root.json
10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: After recloning', diff saved to https://phabricator.wikimedia.org/P43336 and previous config saved to /var/cache/conftool/dbconfig/20230125-103914-root.json
10:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43335 and previous config saved to /var/cache/conftool/dbconfig/20230125-102544-root.json
10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After recloning', diff saved to https://phabricator.wikimedia.org/P43334 and previous config saved to /var/cache/conftool/dbconfig/20230125-102433-root.json
10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: After recloning', diff saved to https://phabricator.wikimedia.org/P43333 and previous config saved to /var/cache/conftool/dbconfig/20230125-102409-root.json
10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43332 and previous config saved to /var/cache/conftool/dbconfig/20230125-101039-root.json
10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After recloning', diff saved to https://phabricator.wikimedia.org/P43331 and previous config saved to /var/cache/conftool/dbconfig/20230125-100928-root.json
10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: After recloning', diff saved to https://phabricator.wikimedia.org/P43330 and previous config saved to /var/cache/conftool/dbconfig/20230125-100904-root.json
09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43329 and previous config saved to /var/cache/conftool/dbconfig/20230125-095534-root.json
09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After recloning', diff saved to https://phabricator.wikimedia.org/P43328 and previous config saved to /var/cache/conftool/dbconfig/20230125-095423-root.json
09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: After recloning', diff saved to https://phabricator.wikimedia.org/P43327 and previous config saved to /var/cache/conftool/dbconfig/20230125-095400-root.json
09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43326 and previous config saved to /var/cache/conftool/dbconfig/20230125-094029-root.json
09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After recloning', diff saved to https://phabricator.wikimedia.org/P43325 and previous config saved to /var/cache/conftool/dbconfig/20230125-093918-root.json
09:30 Emperor: rolling depool & update of thanos front-ends T327871
08:40 XioNoX: bump SGIX max prefix limit
08:13 ladsgroup@deploy1002: Finished scap: Backport for gerrit:883221 Add sandbox link to Serbo-Croatian Wikipedia (T327833) (duration: 10m 13s)
08:05 ladsgroup@deploy1002: ladsgroup and aleksandar: Backport for gerrit:883221 Add sandbox link to Serbo-Croatian Wikipedia (T327833) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
08:03 ladsgroup@deploy1002: Started scap: Backport for gerrit:883221 Add sandbox link to Serbo-Croatian Wikipedia (T327833)
07:49 marostegui: Cloning db1196 from db1206 (lag will appear on s1 wiki replicas) T327859
07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206 to clone db1196 T327859', diff saved to https://phabricator.wikimedia.org/P43322 and previous config saved to /var/cache/conftool/dbconfig/20230125-074601-marostegui.json
07:34 phedenskog@deploy1002: Finished deploy [performance/navtiming@bfff15d]: (no justification provided) (duration: 00m 05s)
07:34 phedenskog@deploy1002: Started deploy [performance/navtiming@bfff15d]: (no justification provided)
07:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33
07:31 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 33
07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 to clone db1198', diff saved to https://phabricator.wikimedia.org/P43320 and previous config saved to /var/cache/conftool/dbconfig/20230125-072033-marostegui.json
07:08 AndyRussG: updated payments (config only) revision 15395d05, config 418160e9
04:10 eileen: config revision changed from dc0a0d3a to 089d0acb
04:01 eileen: civicrm upgraded from 9197ca29 to 3e6b21b6
03:27 eileen: civicrm upgraded from f6093fb2 to 9197ca29
03:05 eileen: config revision changed from 3f641fce to dc0a0d3a
01:17 legoktm: adjusting Gerrit group "Campaigns Team" so it is not recursively a member of itself
00:10 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host centrallog1002.eqiad.wmnet with OS bullseye
00:10 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye

2023-01-24

23:10 zabe@deploy1002: Finished scap: Backport for gerrit:883281 Start reading from rev_comment_id on testcommonswiki (T299954) (duration: 08m 02s)
23:04 zabe@deploy1002: zabe: Backport for gerrit:883281 Start reading from rev_comment_id on testcommonswiki (T299954) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
23:02 zabe@deploy1002: Started scap: Backport for gerrit:883281 Start reading from rev_comment_id on testcommonswiki (T299954)
22:47 TheresNoTime: closing UTC late backport window
22:47 samtar@deploy1002: Finished scap: Backport for gerrit:883212 Add temporary extra grid-area for content translation extension (T327715), gerrit:883217 Add temporary extra grid-area for content translation extension (T327715) (duration: 09m 04s)
22:39 samtar@deploy1002: jdrewniak and samtar: Backport for gerrit:883212 Add temporary extra grid-area for content translation extension (T327715), gerrit:883217 Add temporary extra grid-area for content translation extension (T327715) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
22:37 samtar@deploy1002: Started scap: Backport for gerrit:883212 Add temporary extra grid-area for content translation extension (T327715), gerrit:883217 Add temporary extra grid-area for content translation extension (T327715)
22:30 samtar@deploy1002: Finished scap: Backport for gerrit:883282 [BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724), gerrit:883285 newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114) (duration: 07m 59s)
22:23 samtar@deploy1002: jforrester and samtar and stang: Backport for gerrit:883282 [BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724), gerrit:883285 newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
22:22 samtar@deploy1002: Started scap: Backport for gerrit:883282 [BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724), gerrit:883285 newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114)
22:20 samtar@deploy1002: Finished scap: Backport for gerrit:882681 newiki: Add new permissions to group reviewer (T327114) (duration: 09m 02s)
22:19 mutante: DNS - adding new project language "gur" (Gurenɛ) - Gurenɛ is a major language of northern Ghana and the predominant language of the Upper East Region of Ghana. It is also widely spoken in Burkina Faso. T327813
22:13 samtar@deploy1002: samtar and stang: Backport for gerrit:882681 newiki: Add new permissions to group reviewer (T327114) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
22:11 samtar@deploy1002: Started scap: Backport for gerrit:882681 newiki: Add new permissions to group reviewer (T327114)
22:08 samtar@deploy1002: Finished scap: Backport for gerrit:883213 Fix Wikitext editor preview layout in Vector 2022 (T327778), gerrit:883216 Fix Wikitext editor preview layout in Vector 2022 (T327778) (duration: 09m 36s)
22:06 TheresNoTime: extending UTC late backport window due to late start
22:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6001.drmrs.wmnet,service=ats-be
22:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6001.drmrs.wmnet,service=cdn
22:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6001.drmrs.wmnet with OS bullseye
22:00 samtar@deploy1002: samtar and jdrewniak: Backport for gerrit:883213 Fix Wikitext editor preview layout in Vector 2022 (T327778), gerrit:883216 Fix Wikitext editor preview layout in Vector 2022 (T327778) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:59 samtar@deploy1002: Started scap: Backport for gerrit:883213 Fix Wikitext editor preview layout in Vector 2022 (T327778), gerrit:883216 Fix Wikitext editor preview layout in Vector 2022 (T327778)
21:56 samtar@deploy1002: Finished scap: Backport for gerrit:882727 Work around sticky-positioned layers disabling subpixel rendering (T327460) (duration: 13m 31s)
21:45 samtar@deploy1002: nray and samtar: Backport for gerrit:882727 Work around sticky-positioned layers disabling subpixel rendering (T327460) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
21:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1009.eqiad.wmnet with OS bullseye
21:44 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
21:43 samtar@deploy1002: Started scap: Backport for gerrit:882727 Work around sticky-positioned layers disabling subpixel rendering (T327460)
21:43 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
21:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
21:38 zabe: running migrateRevisionCommentTemp.php on testcommonswiki (s4) with --sleep 10 # T275246
21:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
21:32 samtar@deploy1002: backport aborted: (duration: 06m 28s)
21:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
21:25 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
21:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS bullseye
21:05 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
21:03 TheresNoTime: holding UTC late backport window for outage, T327815
21:01 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host sessionstore1001.eqiad.wmnet
20:50 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
20:50 urandom: rebooting sessionstore1001.eqiad.wmnet -- T325132
20:49 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host sessionstore1001.eqiad.wmnet
20:49 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
20:39 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2027.codfw.wmnet
20:32 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2027.codfw.wmnet
20:31 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=ats-be
20:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2026.codfw.wmnet
20:31 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=cdn
20:29 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet
20:29 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5025.eqsin.wmnet with OS bullseye
20:28 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet
20:24 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2026.codfw.wmnet
20:20 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2023.codfw.wmnet
20:20 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6009.drmrs.wmnet,service=ats-be
20:19 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6009.drmrs.wmnet,service=cdn
20:18 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet,service=cdn
20:18 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet,service=ats-be
20:16 bblack: pool cp5032
20:16 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=5017.eqsin.wmnet,service=ats-be
20:16 mutante: contint2001 - restarted zuul
20:16 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=5017.eqsin.wmnet,service=cdn
20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=ats-be
20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=cdn
20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2041.codfw.wmnet,service=ats-be
20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2041.codfw.wmnet,service=cdn
20:12 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2023.codfw.wmnet
20:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6009.drmrs.wmnet with OS bullseye
20:09 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=ats-be
20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=cdn
20:08 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2018.codfw.wmnet
20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2041.codfw.wmnet,service=ats-be
20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2041.codfw.wmnet,service=cdn
20:05 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5017.eqsin.wmnet with OS bullseye
20:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2018.codfw.wmnet
19:58 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2017.codfw.wmnet
19:56 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5025.eqsin.wmnet with reason: host reimage
19:54 sukhe: reprepro -C main include bullseye-wikimedia libvmod-netmapper_1.9-3_amd64.changes: T326634
19:53 sukhe: reprepro -C main include bullseye-wikimedia libvmod-re2_1.5.3-4_amd64.changes: T326634
19:53 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5025.eqsin.wmnet with reason: host reimage
19:51 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2017.codfw.wmnet
19:47 sukhe: reprepro -C main include bullseye-wikimedia libvmod-querysort_0.4_amd64.changes: T326634
19:46 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2012.codfw.wmnet
19:40 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
19:39 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2012.codfw.wmnet
19:39 urandom: rebooting restbase cassandra nodes, row d -- T325132
19:33 bblack: cp5032: restart varnish-frontend
19:30 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2025.codfw.wmnet
19:28 sukhe: reprepro -C main include bullseye-wikimedia varnish-modules_0.15.0-3_amd64.changes: T326634
19:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
19:24 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
19:22 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2025.codfw.wmnet
19:19 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5025.eqsin.wmnet with OS bullseye
19:19 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5025.eqsin.wmnet with OS bullseye
19:10 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.20 refs T325583
19:06 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host druid1011.eqiad.wmnet with OS bullseye
19:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1010.eqiad.wmnet with OS bullseye
19:05 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
19:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
19:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
18:55 jynus: deploy new dump grants for analytics dbs at db1108 T327155
18:43 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5025.eqsin.wmnet with OS bullseye
18:40 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6009.drmrs.wmnet with OS bullseye
18:17 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
18:14 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
18:12 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2022.codfw.wmnet
18:05 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2022.codfw.wmnet
17:44 bblack: cp5032: upgrading packages (varnish, trafficserver)
17:40 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host restbase2020.codfw.wmnet
17:37 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
17:36 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5017.eqsin.wmnet with OS bullseye
17:28 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2020.codfw.wmnet
17:21 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2016.codfw.wmnet
17:19 thcipriani: restarting ci jenkins for updates
17:13 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2016.codfw.wmnet
17:13 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2015.codfw.wmnet
17:10 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
17:04 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2015.codfw.wmnet
17:04 urandom: rebooting restbase cassandra nodes, row c -- T325132
16:29 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
16:29 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
16:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2042.codfw.wmnet with OS bullseye
16:23 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
16:23 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
16:23 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
16:23 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
16:22 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
16:22 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
16:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2042.codfw.wmnet with reason: host reimage
16:10 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
16:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
16:09 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2042.codfw.wmnet with reason: host reimage
15:54 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
15:53 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2042.codfw.wmnet with OS bullseye
15:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
15:31 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
15:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
15:26 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
15:17 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad (duration: 01m 40s)
15:15 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad
15:12 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@15e6aa7] (codfw): Revert "codfw: Disable traffic mirroring" (duration: 00m 33s)
15:11 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@15e6aa7] (codfw): Revert "codfw: Disable traffic mirroring"
14:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
14:57 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
14:55 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
14:52 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
14:52 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
14:51 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
14:41 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:41 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
14:39 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
14:38 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
14:36 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:36 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after switch upgrade - volans@cumin1001"
14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
14:35 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after switch upgrade - volans@cumin1001"
14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
14:34 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
14:33 volans@cumin1001: START - Cookbook sre.dns.netbox
14:29 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
14:29 effie: switch maps (kartotherian) from eqiad to codfw (attempt #2)
14:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
14:28 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:25 TheresNoTime: close UTC afternoon backport window
14:24 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
14:20 XioNoX: repool ulsfo (maintenance over)
14:20 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host druid1010.eqiad.wmnet with OS bullseye
14:17 samtar@deploy1002: Finished scap: Backport for gerrit:868127 Increase PC writes from parsoid API to 10% (T320534) (duration: 07m 41s)
14:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:11 samtar@deploy1002: daniel and samtar: Backport for gerrit:868127 Increase PC writes from parsoid API to 10% (T320534) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
14:09 samtar@deploy1002: Started scap: Backport for gerrit:868127 Increase PC writes from parsoid API to 10% (T320534)
13:50 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
13:44 XioNoX: reboot ulsfo switches for software upgrade
13:40 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
13:38 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
13:36 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:34 cmooney@cumin1001: START - Cookbook sre.dns.netbox
13:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping1002.eqiad.wmnet
13:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
13:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
13:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
13:18 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping1002.eqiad.wmnet
13:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping2002.codfw.wmnet
13:18 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
13:14 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
13:11 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
13:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
13:10 topranks: enabling tunnel services on cr2-eqdfw fpc 0 pic 1
13:08 jmm@cumin2002: START - Cookbook sre.dns.netbox
13:04 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping2002.codfw.wmnet
12:56 zabe@deploy1002: Finished scap: Backport for gerrit:881468 Remove PoolCounter from extension-list (T327336) (duration: 44m 09s)
12:51 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
12:51 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
12:50 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-proxies (exit_code=0) rolling restart_daemons on A:eqiad and A:swift-fe or A:thanos-fe
12:48 XioNoX: restart ulsfo switches for network maintenance
12:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 36 hosts with reason: network maintenance
12:43 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 36 hosts with reason: network maintenance
12:40 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-proxies rolling restart_daemons on A:eqiad and A:swift-fe or A:thanos-fe
12:38 zabe@deploy1002: zabe: Backport for gerrit:881468 Remove PoolCounter from extension-list (T327336) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
12:21 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thumbor2004.codfw.wmnet
12:12 zabe@deploy1002: Started scap: Backport for gerrit:881468 Remove PoolCounter from extension-list (T327336)
11:54 volans: uploaded python3-gjson_1.0.0 to apt.wikimedia.org bullseye-wikimedia,unstable-wikimedia
11:49 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43311 and previous config saved to /var/cache/conftool/dbconfig/20230124-114255-root.json
11:39 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
11:36 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping3002.esams.wmnet
11:35 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:28 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43310 and previous config saved to /var/cache/conftool/dbconfig/20230124-112750-root.json
11:26 zabe@deploy1002: Finished scap: Backport for gerrit:881467 Stop loading PoolCounter extension (T327336) (duration: 09m 19s)
11:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1176.eqiad.wmnet with OS bullseye
11:23 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
11:22 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping3002.esams.wmnet
11:19 zabe@deploy1002: zabe: Backport for gerrit:881467 Stop loading PoolCounter extension (T327336) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
11:17 zabe@deploy1002: Started scap: Backport for gerrit:881467 Stop loading PoolCounter extension (T327336)
11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43308 and previous config saved to /var/cache/conftool/dbconfig/20230124-111245-root.json
11:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
11:11 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
11:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
11:03 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
11:03 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
11:03 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
11:02 effie: depooling maps (kartotherian) from codfw, leaving eqiad as pooled
11:00 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:59 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
10:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:58 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
10:58 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43306 and previous config saved to /var/cache/conftool/dbconfig/20230124-105740-root.json
10:55 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1176.eqiad.wmnet with OS bullseye
10:52 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
10:49 XioNoX: depool ulsfo for network maintenance - T316532
10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1106 to dbctl in s1 T326116', diff saved to https://phabricator.wikimedia.org/P43305 and previous config saved to /var/cache/conftool/dbconfig/20230124-104336-marostegui.json
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43304 and previous config saved to /var/cache/conftool/dbconfig/20230124-104235-root.json
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1176 from s1 T326116', diff saved to https://phabricator.wikimedia.org/P43303 and previous config saved to /var/cache/conftool/dbconfig/20230124-104219-root.json
10:33 vgutierrez: repool cp4046
10:32 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
10:31 vgutierrez: restarting varnish on cp4046
10:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:29 vgutierrez: depool cp4046
10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43302 and previous config saved to /var/cache/conftool/dbconfig/20230124-102730-root.json
10:25 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
10:22 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
10:19 moritzm: rolling Apache/FPM restarts on mw canaries to pick up libtasn security update
10:19 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2165 T327754', diff saved to https://phabricator.wikimedia.org/P43301 and previous config saved to /var/cache/conftool/dbconfig/20230124-101825-root.json
10:17 effie: depooling maps from eqiad && pooling maps on codfw
10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2161 to s8 primary T327754', diff saved to https://phabricator.wikimedia.org/P43300 and previous config saved to /var/cache/conftool/dbconfig/20230124-101727-root.json
10:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:14 marostegui: Starting s8 codfw failover from db2165 to db2161 - T327754
10:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2041.codfw.wmnet with OS bullseye
10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43299 and previous config saved to /var/cache/conftool/dbconfig/20230124-101025-root.json
09:59 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
09:59 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
09:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2041.codfw.wmnet with reason: host reimage
09:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2041.codfw.wmnet with reason: host reimage
09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43298 and previous config saved to /var/cache/conftool/dbconfig/20230124-095520-root.json
09:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 35 hosts with reason: Primary switchover s8 T327754
09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2161 with weight 0 T327754', diff saved to https://phabricator.wikimedia.org/P43297 and previous config saved to /var/cache/conftool/dbconfig/20230124-095235-marostegui.json
09:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 35 hosts with reason: Primary switchover s8 T327754
09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43296 and previous config saved to /var/cache/conftool/dbconfig/20230124-094725-root.json
09:41 moritzm: installing libtasn1-6 security updates on buster
09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43295 and previous config saved to /var/cache/conftool/dbconfig/20230124-094016-root.json
09:39 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
09:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2041.codfw.wmnet with OS bullseye
09:39 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43294 and previous config saved to /var/cache/conftool/dbconfig/20230124-093220-root.json
09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43293 and previous config saved to /var/cache/conftool/dbconfig/20230124-092511-root.json
09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43292 and previous config saved to /var/cache/conftool/dbconfig/20230124-091715-root.json
09:14 kart_: Done: UTC morning backport window
09:13 kartik@deploy1002: Finished scap: Backport for gerrit:878853 Remove Kartographer versioned mapdata flags (T326288) (duration: 09m 44s)
09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43291 and previous config saved to /var/cache/conftool/dbconfig/20230124-091006-root.json
09:05 kartik@deploy1002: awight and kartik: Backport for gerrit:878853 Remove Kartographer versioned mapdata flags (T326288) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
09:03 kartik@deploy1002: Started scap: Backport for gerrit:878853 Remove Kartographer versioned mapdata flags (T326288)
09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43290 and previous config saved to /var/cache/conftool/dbconfig/20230124-090210-root.json
09:01 kartik@deploy1002: Finished scap: Backport for gerrit:875463 Deprecate the EnableMapFrame feature flag (T326288) (duration: 10m 42s)
08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43289 and previous config saved to /var/cache/conftool/dbconfig/20230124-085501-root.json
08:52 kartik@deploy1002: awight and kartik: Backport for gerrit:875463 Deprecate the EnableMapFrame feature flag (T326288) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
08:50 kartik@deploy1002: Started scap: Backport for gerrit:875463 Deprecate the EnableMapFrame feature flag (T326288)
08:48 kartik@deploy1002: Finished scap: Backport for gerrit:882240 Enable write new for CheckUserLog comment fields on testwikis (T233004) (duration: 15m 20s)
08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43288 and previous config saved to /var/cache/conftool/dbconfig/20230124-084705-root.json
08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Add some weight to db2115 in x1 codfw', diff saved to https://phabricator.wikimedia.org/P43287 and previous config saved to /var/cache/conftool/dbconfig/20230124-084552-marostegui.json
08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2096 T327745', diff saved to https://phabricator.wikimedia.org/P43286 and previous config saved to /var/cache/conftool/dbconfig/20230124-084508-marostegui.json
08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2115 to x1 codfw T327745', diff saved to https://phabricator.wikimedia.org/P43285 and previous config saved to /var/cache/conftool/dbconfig/20230124-084206-marostegui.json
08:39 marostegui: Starting x1 codfw failover from db2096 to db2115 - T327745
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2115 with weight 0 T327745', diff saved to https://phabricator.wikimedia.org/P43284 and previous config saved to /var/cache/conftool/dbconfig/20230124-083643-marostegui.json
08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327745
08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327745
08:35 kartik@deploy1002: dreamyjazz and kartik: Backport for gerrit:882240 Enable write new for CheckUserLog comment fields on testwikis (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
08:34 phedenskog@deploy1002: Finished deploy [performance/navtiming@8c87ca6]: (no justification provided) (duration: 00m 06s)
08:34 phedenskog@deploy1002: Started deploy [performance/navtiming@8c87ca6]: (no justification provided)
08:33 kartik@deploy1002: Started scap: Backport for gerrit:882240 Enable write new for CheckUserLog comment fields on testwikis (T233004)
08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43283 and previous config saved to /var/cache/conftool/dbconfig/20230124-083200-root.json
08:28 kartik@deploy1002: Finished scap: Backport for gerrit:883098 Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727) (duration: 09m 09s)
08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2110 from API T327739', diff saved to https://phabricator.wikimedia.org/P43282 and previous config saved to /var/cache/conftool/dbconfig/20230124-082440-marostegui.json
08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 T327739', diff saved to https://phabricator.wikimedia.org/P43281 and previous config saved to /var/cache/conftool/dbconfig/20230124-082138-marostegui.json
08:21 kartik@deploy1002: kartik and matmarex: Backport for gerrit:883098 Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2110 to s4 primary T327739', diff saved to https://phabricator.wikimedia.org/P43280 and previous config saved to /var/cache/conftool/dbconfig/20230124-082025-root.json
08:19 kartik@deploy1002: Started scap: Backport for gerrit:883098 Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727)
08:18 marostegui: Starting s4 codfw failover from db2140 to db2110 - T327739
08:16 kartik@deploy1002: Finished scap: Backport for gerrit:882266 Content Translation: Add campaign for Wiki Loves Living Heritage (T327587) (duration: 10m 25s)
08:07 kartik@deploy1002: kartik: Backport for gerrit:882266 Content Translation: Add campaign for Wiki Loves Living Heritage (T327587) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
08:05 kartik@deploy1002: Started scap: Backport for gerrit:882266 Content Translation: Add campaign for Wiki Loves Living Heritage (T327587)
07:59 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T327739
07:58 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T327739
07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2110 with weight 0 T327739', diff saved to https://phabricator.wikimedia.org/P43279 and previous config saved to /var/cache/conftool/dbconfig/20230124-075824-root.json
07:50 moritzm: installing Linux 5.10.162 on Bullseye hosts
07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1106 from dbctl T327616', diff saved to https://phabricator.wikimedia.org/P43278 and previous config saved to /var/cache/conftool/dbconfig/20230124-074323-marostegui.json
06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 (T322618)', diff saved to https://phabricator.wikimedia.org/P43277 and previous config saved to /var/cache/conftool/dbconfig/20230124-064905-ladsgroup.json
06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T322618)', diff saved to https://phabricator.wikimedia.org/P43276 and previous config saved to /var/cache/conftool/dbconfig/20230124-064554-ladsgroup.json
06:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P43275 and previous config saved to /var/cache/conftool/dbconfig/20230124-063358-ladsgroup.json
06:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43274 and previous config saved to /var/cache/conftool/dbconfig/20230124-063048-ladsgroup.json
06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P43273 and previous config saved to /var/cache/conftool/dbconfig/20230124-061852-ladsgroup.json
06:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43272 and previous config saved to /var/cache/conftool/dbconfig/20230124-061541-ladsgroup.json
06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 (T322618)', diff saved to https://phabricator.wikimedia.org/P43271 and previous config saved to /var/cache/conftool/dbconfig/20230124-060345-ladsgroup.json
06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2118 (T322618)', diff saved to https://phabricator.wikimedia.org/P43270 and previous config saved to /var/cache/conftool/dbconfig/20230124-060129-ladsgroup.json
06:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T322618)', diff saved to https://phabricator.wikimedia.org/P43269 and previous config saved to /var/cache/conftool/dbconfig/20230124-060035-ladsgroup.json
05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2107 (T322618)', diff saved to https://phabricator.wikimedia.org/P43268 and previous config saved to /var/cache/conftool/dbconfig/20230124-055816-ladsgroup.json
05:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
04:57 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.18 (duration: 02m 07s)
04:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.20 refs T325583 (duration: 53m 01s)
04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.20 refs T325583
03:30 AndyRussG: payments-wiki upgraded from 3d882ac7 to 15395d05
02:35 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2024.codfw.wmnet
02:27 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2024.codfw.wmnet
02:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2021.codfw.wmnet
02:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2021.codfw.wmnet
02:16 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host restbase2019.codfw.wmnet
02:04 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2019.codfw.wmnet
02:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2014.codfw.wmnet
01:55 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2014.codfw.wmnet
01:51 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2013.codfw.wmnet
01:44 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2013.codfw.wmnet
01:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1033.eqiad.wmnet
01:26 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1033.eqiad.wmnet
01:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1032.eqiad.wmnet
01:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1032.eqiad.wmnet
01:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1031.eqiad.wmnet
01:06 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1031.eqiad.wmnet
01:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1030.eqiad.wmnet
00:55 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1030.eqiad.wmnet
00:55 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1027.eqiad.wmnet
00:47 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1027.eqiad.wmnet
00:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1026.eqiad.wmnet
00:38 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1026.eqiad.wmnet
00:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1025.eqiad.wmnet
00:28 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1025.eqiad.wmnet
00:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1018.eqiad.wmnet
00:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1018.eqiad.wmnet
00:14 zabe@deploy1002: Finished scap: Backport for gerrit:881466 Use core's PoolCounterClient (T327336) (duration: 12m 47s)
00:03 zabe@deploy1002: zabe: Backport for gerrit:881466 Use core's PoolCounterClient (T327336) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
00:01 zabe@deploy1002: Started scap: Backport for gerrit:881466 Use core's PoolCounterClient (T327336)

2023-01-23

23:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1029.eqiad.wmnet
23:24 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1029.eqiad.wmnet
23:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1024.eqiad.wmnet
23:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1024.eqiad.wmnet
23:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1023.eqiad.wmnet
23:07 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1023.eqiad.wmnet
22:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1022.eqiad.wmnet
22:57 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
22:57 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
22:57 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
22:56 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@544f5f3]: 0.3.119 (duration: 07m 30s)
22:52 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1022.eqiad.wmnet
22:49 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.119` on canary `wdqs1003`; proceeding to rest of fleet
22:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@544f5f3]: 0.3.119
22:46 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.119`. Pre-deploy tests passing on canary `wdqs1003`
22:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1017.eqiad.wmnet
22:37 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1017.eqiad.wmnet
22:31 maryum: Deployed patch for T285159
21:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1028.eqiad.wmnet
21:40 zabe@deploy1002: Finished scap: Backport for gerrit:882746 throttle: Remove expired rule (duration: 07m 45s)
21:35 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1028.eqiad.wmnet
21:34 zabe@deploy1002: zabe: Backport for gerrit:882746 throttle: Remove expired rule synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:32 zabe@deploy1002: Started scap: Backport for gerrit:882746 throttle: Remove expired rule
21:29 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1021.eqiad.wmnet
21:22 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1021.eqiad.wmnet
21:12 kindrobot: close UTC late backport window
21:12 kindrobot@deploy1002: Finished scap: Backport for gerrit:882715 Enable Page Tools for logged-in users on enwiki (T327686) (duration: 09m 00s)
21:04 kindrobot@deploy1002: jdrewniak and kindrobot: Backport for gerrit:882715 Enable Page Tools for logged-in users on enwiki (T327686) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
21:03 kindrobot@deploy1002: Started scap: Backport for gerrit:882715 Enable Page Tools for logged-in users on enwiki (T327686)
21:01 kindrobot: start UTC late backport window
20:56 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
20:56 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
20:45 taavi: restart T315510 on group1 after mwmaint restart, currently running on wikidatawiki
19:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1020.eqiad.wmnet
19:41 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1020.eqiad.wmnet
19:37 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1019.eqiad.wmnet
19:30 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1019.eqiad.wmnet
19:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1016.eqiad.wmnet
19:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
19:17 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1016.eqiad.wmnet
19:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
19:16 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1016.eqiad.wmnet
19:16 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
18:48 mutante: miscweb1002 - unload CAS apache module and config; apt-get remove libapache2-mod-auth-cas
18:19 mutante: miscweb2002 - unlink /etc/apache2/mods-enabled/auth_cas.conf - unlink /etc/apache2/mods-enabled/auth_cas.load - apt-get remove libapache2-mod-auth-cas - T327405
18:08 mutante: miscweb2002 - unlink /etc/apache2/mods-enabled/auth_cas.conf - unlink /etc/apache2/mods-enabled/auth_cas.load
18:05 mutante: miscweb1002 - disabling puppet because latest merge would break apache if it runs, debugging in progress on inactive miscweb2002
18:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
18:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43265 and previous config saved to /var/cache/conftool/dbconfig/20230123-175241-ladsgroup.json
17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43264 and previous config saved to /var/cache/conftool/dbconfig/20230123-173736-ladsgroup.json
17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43263 and previous config saved to /var/cache/conftool/dbconfig/20230123-172231-ladsgroup.json
17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43262 and previous config saved to /var/cache/conftool/dbconfig/20230123-170726-ladsgroup.json
17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:48 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: gerrit:882682 Bumping portals to master (T128546) (duration: 06m 48s)
16:42 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: gerrit:882682 Bumping portals to master (T128546) (duration: 06m 48s)
16:41 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
16:41 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
16:40 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:40 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:35 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:35 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43261 and previous config saved to /var/cache/conftool/dbconfig/20230123-163207-root.json
16:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43260 and previous config saved to /var/cache/conftool/dbconfig/20230123-163138-root.json
16:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43259 and previous config saved to /var/cache/conftool/dbconfig/20230123-161702-root.json
16:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43258 and previous config saved to /var/cache/conftool/dbconfig/20230123-161633-root.json
16:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43257 and previous config saved to /var/cache/conftool/dbconfig/20230123-160157-root.json
16:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43256 and previous config saved to /var/cache/conftool/dbconfig/20230123-160126-root.json
15:53 sukhe: reprepro -C main include bullseye-wikimedia varnish_6.0.11-1wm1_amd64.changes: T326634
15:50 urbanecm: Deploy security patch for T327613
15:48 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
15:48 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
15:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43255 and previous config saved to /var/cache/conftool/dbconfig/20230123-154652-root.json
15:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43254 and previous config saved to /var/cache/conftool/dbconfig/20230123-154621-root.json
15:44 papaul: ongoing maintenance on fasw-codfw
15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43253 and previous config saved to /var/cache/conftool/dbconfig/20230123-153147-root.json
15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43252 and previous config saved to /var/cache/conftool/dbconfig/20230123-153116-root.json
15:17 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.1.4-1wm1_amd64.changes: T325563
15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43251 and previous config saved to /var/cache/conftool/dbconfig/20230123-151642-root.json
15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43250 and previous config saved to /var/cache/conftool/dbconfig/20230123-151611-root.json
15:09 taavi@deploy1002: Finished scap: Backport for gerrit:882661 Revert "Enable Linter write namespace tag and template using core config" (duration: 07m 28s)
15:03 taavi@deploy1002: taavi and trainbranchbot: Backport for gerrit:882661 Revert "Enable Linter write namespace tag and template using core config" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
15:02 taavi@deploy1002: Started scap: Backport for gerrit:882661 Revert "Enable Linter write namespace tag and template using core config"
15:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3317', diff saved to https://phabricator.wikimedia.org/P43248 and previous config saved to /var/cache/conftool/dbconfig/20230123-150110-marostegui.json
15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P43247 and previous config saved to /var/cache/conftool/dbconfig/20230123-150018-marostegui.json
15:00 taavi@deploy1002: Finished scap: Backport for gerrit:880989 Enable Linter write namespace tag and template using core config (T299612) (duration: 07m 56s)
14:59 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
14:59 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
14:53 taavi@deploy1002: taavi and sbailey: Backport for gerrit:880989 Enable Linter write namespace tag and template using core config (T299612) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
14:52 taavi@deploy1002: Started scap: Backport for gerrit:880989 Enable Linter write namespace tag and template using core config (T299612)
14:46 taavi@deploy1002: Finished scap: Backport for gerrit:882179 SpecialUserrights: Allow updating the expiry of user groups (T327605) (duration: 08m 48s)
14:42 sukhe: rolling out pybal 1.15.10: T321191
14:39 taavi@deploy1002: taavi and func: Backport for gerrit:882179 SpecialUserrights: Allow updating the expiry of user groups (T327605) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
14:37 taavi@deploy1002: Started scap: Backport for gerrit:882179 SpecialUserrights: Allow updating the expiry of user groups (T327605)
14:37 taavi@deploy1002: Finished scap: Backport for gerrit:876196 zhwiki: Install PageAssessments (T326387) (duration: 11m 24s)
14:27 taavi@deploy1002: stang and taavi: Backport for gerrit:876196 zhwiki: Install PageAssessments (T326387) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
14:26 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
14:26 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
14:25 taavi@deploy1002: Started scap: Backport for gerrit:876196 zhwiki: Install PageAssessments (T326387)
14:25 taavi@deploy1002: Finished scap: Backport for gerrit:882422 bnwikiquote: Update logo (T323131), gerrit:882425 shnwikibooks: Add project logo (T327380) (duration: 09m 22s)
14:25 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
14:25 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
14:20 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:20 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:18 taavi: mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=zhwiki pageassessments # T326387
14:17 taavi@deploy1002: taavi and stang: Backport for gerrit:882422 bnwikiquote: Update logo (T323131), gerrit:882425 shnwikibooks: Add project logo (T327380) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
14:16 taavi@deploy1002: Started scap: Backport for gerrit:882422 bnwikiquote: Update logo (T323131), gerrit:882425 shnwikibooks: Add project logo (T327380)
12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43246 and previous config saved to /var/cache/conftool/dbconfig/20230123-124532-ladsgroup.json
12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43245 and previous config saved to /var/cache/conftool/dbconfig/20230123-123025-ladsgroup.json
12:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43242 and previous config saved to /var/cache/conftool/dbconfig/20230123-121519-ladsgroup.json
12:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
12:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
12:06 marostegui: dbmaint Reboot db2135 (m5 codfw master)
12:06 marostegui: dbmaint Reboot db2134 (m3 codfw master)
12:05 Emperor: removing /usr/local/bin/prometheus-puppet-agent-stats from prometheus crontab on snapshot1014
12:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43241 and previous config saved to /var/cache/conftool/dbconfig/20230123-120012-ladsgroup.json
11:58 marostegui: dbmaint Reboot db2133 (m2 codfw master)
11:57 marostegui: dbmaint Reboot db2132 (m1 codfw master)
11:57 marostegui: Reboot db2132 (m1 codfw master)
11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43239 and previous config saved to /var/cache/conftool/dbconfig/20230123-113506-ladsgroup.json
11:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
11:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2114 T327644', diff saved to https://phabricator.wikimedia.org/P43236 and previous config saved to /var/cache/conftool/dbconfig/20230123-112134-ladsgroup.json
11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43235 and previous config saved to /var/cache/conftool/dbconfig/20230123-112001-ladsgroup.json
11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2129 to s6 primary T327644', diff saved to https://phabricator.wikimedia.org/P43234 and previous config saved to /var/cache/conftool/dbconfig/20230123-111813-ladsgroup.json
11:17 Amir1: Starting s6 codfw failover from db2114 to db2129 - T327644
11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43233 and previous config saved to /var/cache/conftool/dbconfig/20230123-111147-ladsgroup.json
11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2107.codfw.wmnet with reason: Maintenance
11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2107.codfw.wmnet with reason: Maintenance
11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43232 and previous config saved to /var/cache/conftool/dbconfig/20230123-110456-ladsgroup.json
10:55 XioNoX: update management routers ACLs to add new bast hosts
10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2129 with weight 0 T327644', diff saved to https://phabricator.wikimedia.org/P43231 and previous config saved to /var/cache/conftool/dbconfig/20230123-105520-ladsgroup.json
10:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T327644
10:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T327644
10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43230 and previous config saved to /var/cache/conftool/dbconfig/20230123-104951-ladsgroup.json
10:48 vgutierrez: rolling upgrade to HAProxy 2.4.20 on ulsfo
10:40 btullis@deploy1002: Finished deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided) (duration: 00m 06s)
10:40 btullis@deploy1002: Started deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided)
10:40 btullis@deploy1002: Finished deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided) (duration: 00m 20s)
10:40 btullis@deploy1002: Started deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided)
10:39 btullis@deploy1002: Installation of scap version "4.33.1" completed for 1 hosts
10:39 btullis@deploy1002: Installing scap version "4.33.1" for 1 hosts
10:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-tool1010.eqiad.wmnet with OS bullseye
10:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-tool1010.eqiad.wmnet with reason: host reimage
10:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-tool1010.eqiad.wmnet with reason: host reimage
10:07 ladsgroup@deploy1002: Finished scap: Backport for gerrit:877244 Remove Flow as default in techconductwiki (duration: 07m 51s)
10:03 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-tool1010.eqiad.wmnet with OS bullseye
10:01 ladsgroup@deploy1002: ladsgroup: Backport for gerrit:877244 Remove Flow as default in techconductwiki synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
09:59 ladsgroup@deploy1002: Started scap: Backport for gerrit:877244 Remove Flow as default in techconductwiki
09:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
09:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
08:49 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:49 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
08:48 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
08:46 volans@cumin1001: START - Cookbook sre.dns.netbox
08:45 zabe@deploy1002: Finished scap: Backport for gerrit:882217 Remove oversight group from privileged groups (T112147), gerrit:882577 Start reading from cuc_comment_id on wikidatawiki (T233004) (duration: 07m 48s)
08:43 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 to vslow and dump group T326669', diff saved to https://phabricator.wikimedia.org/P43229 and previous config saved to /var/cache/conftool/dbconfig/20230123-084326-marostegui.json
08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 to vslow and dump group T326669', diff saved to https://phabricator.wikimedia.org/P43228 and previous config saved to /var/cache/conftool/dbconfig/20230123-084239-marostegui.json
08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43227 and previous config saved to /var/cache/conftool/dbconfig/20230123-084055-root.json
08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43226 and previous config saved to /var/cache/conftool/dbconfig/20230123-084045-root.json
08:39 zabe@deploy1002: zabe: Backport for gerrit:882217 Remove oversight group from privileged groups (T112147), gerrit:882577 Start reading from cuc_comment_id on wikidatawiki (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
08:37 zabe@deploy1002: Started scap: Backport for gerrit:882217 Remove oversight group from privileged groups (T112147), gerrit:882577 Start reading from cuc_comment_id on wikidatawiki (T233004)
08:37 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 01m 08s)
08:36 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
08:30 ladsgroup@deploy1002: Finished scap: Backport for gerrit:882174 Tweaks for new heading HTML structure (T327328 T327469) (duration: 17m 12s)
08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43225 and previous config saved to /var/cache/conftool/dbconfig/20230123-082550-root.json
08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43224 and previous config saved to /var/cache/conftool/dbconfig/20230123-082540-root.json
08:22 ladsgroup@deploy1002: ladsgroup and matmarex: Backport for gerrit:882174 Tweaks for new heading HTML structure (T327328 T327469) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
08:12 ladsgroup@deploy1002: Started scap: Backport for gerrit:882174 Tweaks for new heading HTML structure (T327328 T327469)
08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43223 and previous config saved to /var/cache/conftool/dbconfig/20230123-081045-root.json
08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43222 and previous config saved to /var/cache/conftool/dbconfig/20230123-081035-root.json
08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43221 and previous config saved to /var/cache/conftool/dbconfig/20230123-080824-ladsgroup.json
07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43220 and previous config saved to /var/cache/conftool/dbconfig/20230123-075540-root.json
07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43219 and previous config saved to /var/cache/conftool/dbconfig/20230123-075530-root.json
07:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43218 and previous config saved to /var/cache/conftool/dbconfig/20230123-075319-ladsgroup.json
07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43217 and previous config saved to /var/cache/conftool/dbconfig/20230123-074035-root.json
07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 10%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43216 and previous config saved to /var/cache/conftool/dbconfig/20230123-074025-root.json
07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43215 and previous config saved to /var/cache/conftool/dbconfig/20230123-073814-ladsgroup.json
07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43214 and previous config saved to /var/cache/conftool/dbconfig/20230123-072530-root.json
07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 5%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43213 and previous config saved to /var/cache/conftool/dbconfig/20230123-072520-root.json
07:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43212 and previous config saved to /var/cache/conftool/dbconfig/20230123-072309-ladsgroup.json
07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 db1206 T326669', diff saved to https://phabricator.wikimedia.org/P43211 and previous config saved to /var/cache/conftool/dbconfig/20230123-071323-marostegui.json
07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
07:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
07:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
06:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
06:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
06:23 kart_: Updated cxserver to 2023-01-20-051603-production (T323840, T326236)
06:19 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
06:18 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
06:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
06:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
06:17 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
06:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
06:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
06:16 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
06:12 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
06:12 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
05:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
05:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
05:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
05:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
04:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2113 T327611', diff saved to https://phabricator.wikimedia.org/P43210 and previous config saved to /var/cache/conftool/dbconfig/20230123-045939-ladsgroup.json
04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2123 to s5 primary T327611', diff saved to https://phabricator.wikimedia.org/P43209 and previous config saved to /var/cache/conftool/dbconfig/20230123-045740-ladsgroup.json
04:57 Amir1: Starting s5 codfw failover from db2113 to db2123 - T327611
04:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
04:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2123 with weight 0 T327611', diff saved to https://phabricator.wikimedia.org/P43208 and previous config saved to /var/cache/conftool/dbconfig/20230123-043324-ladsgroup.json
04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T327611
04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T327611
04:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
04:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
03:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2107 T327609', diff saved to https://phabricator.wikimedia.org/P43207 and previous config saved to /var/cache/conftool/dbconfig/20230123-035458-ladsgroup.json
03:52 Amir1: Starting s2 codfw failover from db2107 to db2104 - T327609

2023-01-20

18:22 jynus: deploying new grants for backups on m1 T327155
16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
14:28 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
14:27 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
14:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
14:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
13:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
13:08 moritzm: installing node-minimatch security updates
13:01 moritzm: installing libxstream-java security updates
13:00 sukhe: reprepro --ignore=wrongdistribution -C main include bullseye-wikimedia cadvisor_0.44.0+ds1-1~wmf1_amd64.changes: T325557
12:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
12:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2040.codfw.wmnet with OS bullseye
12:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2040.codfw.wmnet with reason: host reimage
12:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2040.codfw.wmnet with reason: host reimage
12:17 moritzm: installing ping1003 T273509
12:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2040.codfw.wmnet with OS bullseye
12:03 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
12:02 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
10:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
10:49 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
10:32 elukey: restart kubelet on ml-staging200* nodes (some fs-inotify-related issues with the istio-proxy of newly created containers)
10:27 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
10:13 moritzm: installing emacs security updates on bullseye
10:13 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
10:12 moritzm: imported jenkins 2.375-2 to thirdparty/ci T326531
10:00 jnuche@deploy1002: Installation of scap version "4.33.1" completed for 1 hosts
10:00 jnuche@deploy1002: Installing scap version "4.33.1" for 1 hosts
08:59 moritzm: installing ping2003 T273509
08:10 elukey: restart kubelet on kubernetes2007 - node reported issues with it, marked as "notready" by the control plane
07:58 elukey: `apt-get clean` on doh4001 to free space (root partition almost filled)
01:55 ejegg: payments-wiki upgraded from 3cf03933 to 3d882ac7
01:12 ejegg: payments-wiki upgraded from fcb9ab60 to 3cf03933

2023-01-19

21:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2039.codfw.wmnet with OS bullseye
21:42 jdrewniak@deploy1002: Finished scap: Backport for gerrit:881677 Enable Page tools on viwiki and itwiki (T327348) (duration: 10m 38s)
21:33 jdrewniak@deploy1002: jdlrobson and jdrewniak: Backport for gerrit:881677 Enable Page tools on viwiki and itwiki (T327348) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
21:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2039.codfw.wmnet with reason: host reimage
21:31 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:881677|Enable Page tools on viwiki and itwiki (T327348)]]
21:27 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:881612|Fix grid blowout with limited width turned off (T327423)]] (duration: 08m 26s)
21:27 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2039.codfw.wmnet with reason: host reimage
21:20 cwhite@deploy1002: Finished deploy [releng/phatality@e0bb573]: (no justification provided) (duration: 00m 13s)
21:20 cwhite@deploy1002: Started deploy [releng/phatality@e0bb573]: (no justification provided)
21:20 jdrewniak@deploy1002: jdlrobson and jdrewniak: Backport for [[gerrit:881612|Fix grid blowout with limited width turned off (T327423)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
21:18 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:881612|Fix grid blowout with limited width turned off (T327423)]]
21:11 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2039.codfw.wmnet with OS bullseye
20:13 zabe@deploy1002: Finished scap: fix k8s drift (duration: 08m 02s)
20:05 zabe@deploy1002: Started scap: fix k8s drift
20:02 zabe@deploy1002: Finished scap: Backport for [[gerrit:881706|Start reading from cuc_comment_id everywhere except wikidatawiki (T233004)]] (duration: 14m 01s)
19:49 zabe@deploy1002: zabe: Backport for [[gerrit:881706|Start reading from cuc_comment_id everywhere except wikidatawiki (T233004)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
19:48 zabe@deploy1002: Started scap: Backport for [[gerrit:881706|Start reading from cuc_comment_id everywhere except wikidatawiki (T233004)]]
18:36 zabe: re-start populateCucComment on wikidatawiki post-mwmaint-reboot in screen with --sleep 2, will take ~30 hours # T233004
18:17 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
18:17 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
18:16 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
18:16 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
18:13 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
18:12 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
18:08 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
18:08 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
18:06 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
18:05 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
18:02 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
18:01 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
17:36 Amir1: bash Krinkle> Vatican Interim Papacy Runbook, § 5.1: Notify Wikipedia about incoming traffic.
17:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2038.codfw.wmnet with OS bullseye
17:13 zabe@deploy1002: Finished scap: T233004 (duration: 18m 50s)
17:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2038.codfw.wmnet with reason: host reimage
16:58 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2038.codfw.wmnet with reason: host reimage
16:54 zabe@deploy1002: Started scap: T233004
16:54 zabe@deploy1002: backport aborted: (duration: 15m 22s)
16:48 godog: roll-restart opensearch-dashboards in logstash collectors eqiad - T327161
16:44 zabe@deploy1002: Started scap: Backport for [[gerrit:881609|Add ability to start from cuc_id to populateCucComment (T233004)]]
16:42 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2038.codfw.wmnet with OS bullseye
16:27 moritzm: installing cryptsetup updates for bullseye
16:18 jmm@cumin2002: END (FAIL) - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors (exit_code=1) rolling restart_daemons on A:logstash-collector
16:13 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1009']
16:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
16:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
16:08 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors rolling restart_daemons on A:logstash-collector
16:06 jclark@cumin1001: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
15:55 sukhe: update pybal to 1.15.10 on lvs4010: T321191
15:45 effie: enable puppet on C:memcached hosts
15:42 godog: bounce opensearch on logstash102[34] - T327161
15:30 sukhe: reprepro -C main include buster-wikimedia pybal_1.15.10_amd64.changes: T321191
15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43194 and previous config saved to /var/cache/conftool/dbconfig/20230119-151917-ladsgroup.json
15:17 effie: disable puppet on all C:memcached servers to deploy 812173
15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43193 and previous config saved to /var/cache/conftool/dbconfig/20230119-150412-ladsgroup.json
14:57 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43192 and previous config saved to /var/cache/conftool/dbconfig/20230119-144907-ladsgroup.json
14:47 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
14:40 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43191 and previous config saved to /var/cache/conftool/dbconfig/20230119-143402-ladsgroup.json
14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
14:32 zabe: run populateCulComment on group2 wikis # T327290
14:30 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
14:09 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
13:58 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
12:27 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host maps2009.codfw.wmnet
12:19 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host maps2009.codfw.wmnet
12:06 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
12:06 moritzm: stopping/masking slapd on ldap-corp1001/ldap-corp2001 T323820
11:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1054.eqiad.wmnet with OS bullseye
11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-cluster
11:29 hnowlan: rebooting maps-codfw for updates
11:29 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host maps1009.eqiad.wmnet
11:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts webperf2004.codfw.wmnet
11:24 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:24 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf2004.codfw.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
11:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host maps1009.eqiad.wmnet
11:20 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
11:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1054.eqiad.wmnet with reason: host reimage
11:18 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf2004.codfw.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
11:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1054.eqiad.wmnet with reason: host reimage
11:13 filippo@cumin1001: START - Cookbook sre.dns.netbox
11:09 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts webperf2004.codfw.wmnet
11:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts webperf1004.eqiad.wmnet
11:08 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:08 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
11:06 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
11:06 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1054.eqiad.wmnet with OS bullseye
11:02 filippo@cumin1001: START - Cookbook sre.dns.netbox
10:58 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts webperf1004.eqiad.wmnet
10:44 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:44 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
10:44 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:44 hnowlan: rebooting maps-eqiad for updates
10:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
10:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
10:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
10:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
10:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
10:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
10:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
10:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
10:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on webperf2004.codfw.wmnet with reason: decom
10:24 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on webperf2004.codfw.wmnet with reason: decom
10:19 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
10:17 claime: Restarted maintenance scripts on mwmaint1002.eqiad.wmnet
10:17 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
10:17 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
10:15 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
10:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint1002.eqiad.wmnet
10:07 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwmaint1002.eqiad.wmnet
10:06 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
10:06 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
10:05 claime: Stopping maintenance scripts on mwmaint1002.eqiad.wmnet for reboot
09:55 moritzm: installing ping3003 T273509
09:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ldap-corp[1001,2001].wikimedia.org with reason: Decommissioning
09:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ldap-corp[1001,2001].wikimedia.org with reason: Decommissioning
09:24 jnuche@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.19 refs T325582
09:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance
09:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance
09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
09:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
08:26 moritzm: installing sudo security updates
07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
06:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
06:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2118 T327372', diff saved to https://phabricator.wikimedia.org/P43190 and previous config saved to /var/cache/conftool/dbconfig/20230119-060449-ladsgroup.json
06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2121 to s7 primary T327372', diff saved to https://phabricator.wikimedia.org/P43189 and previous config saved to /var/cache/conftool/dbconfig/20230119-060316-ladsgroup.json
06:02 Amir1: Starting s7 codfw failover from db2118 to db2121 - T327372
05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2121 with weight 0 T327372', diff saved to https://phabricator.wikimedia.org/P43188 and previous config saved to /var/cache/conftool/dbconfig/20230119-054243-ladsgroup.json
05:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 T327372
05:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 T327372

2023-01-18

23:47 zabe: run populateCulComment.php on all group0 and group1 wikis # T327290
23:42 cstone: civicrm upgraded from 164270b0 to f6093fb2
22:35 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - T323646
22:03 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - T323646
21:50 kindrobot: close UTC late backport window
21:50 kindrobot@deploy1002: Finished scap: Backport for [[gerrit:881462|[config]: Undeploy GDI Safety Survey Wave 4 (T327296)]] (duration: 10m 45s)
21:41 kindrobot@deploy1002: essexigyan and kindrobot: Backport for [[gerrit:881462|[config]: Undeploy GDI Safety Survey Wave 4 (T327296)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:39 kindrobot@deploy1002: Started scap: Backport for [[gerrit:881462|[config]: Undeploy GDI Safety Survey Wave 4 (T327296)]]
21:36 kindrobot@deploy1002: Finished scap: Backport for [[gerrit:881451|Bump English Wikipedia event logging from 0.5 to 1% (T326892)]], [[gerrit:881431|Legacy Vector is not a responsive skin (T327256)]] (duration: 13m 01s)
21:25 kindrobot@deploy1002: kindrobot and jdlrobson: Backport for [[gerrit:881451|Bump English Wikipedia event logging from 0.5 to 1% (T326892)]], [[gerrit:881431|Legacy Vector is not a responsive skin (T327256)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:23 kindrobot@deploy1002: Started scap: Backport for [[gerrit:881451|Bump English Wikipedia event logging from 0.5 to 1% (T326892)]], [[gerrit:881431|Legacy Vector is not a responsive skin (T327256)]]
21:08 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1037.eqiad.wmnet with OS bullseye
21:05 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1036.eqiad.wmnet with OS bullseye
21:03 kindrobot: start UTC late backport window
20:54 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
20:51 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
20:49 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
20:48 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
20:36 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1037.eqiad.wmnet with OS bullseye
20:35 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1036.eqiad.wmnet with OS bullseye
20:34 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
20:34 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1037.eqiad.wmnet with OS buster
19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:52 bblack: db1129 and lvs1017: removed misconfigured IP address in wrong vlan from eno1 and /e/n/i
19:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1036.eqiad.wmnet with OS buster
19:47 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:40 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
19:32 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
19:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
19:23 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
19:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1037.eqiad.wmnet with OS buster
18:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1036.eqiad.wmnet with OS buster
18:21 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:878927|Enable the REST API on test-wikidata (T324999)]] (duration: 09m 38s)
18:14 lucaswerkmeister-wmde@deploy1002: migr and lucaswerkmeister-wmde: Backport for [[gerrit:878927|Enable the REST API on test-wikidata (T324999)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
18:12 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:878927|Enable the REST API on test-wikidata (T324999)]]
17:55 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
17:55 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
17:44 jnuche@deploy1002: Installation of scap version "4.33.0" completed for 560 hosts
17:44 jnuche@deploy1002: Installing scap version "4.33.0" for 560 hosts
17:42 jnuche@deploy1002: install-world aborted: (duration: 07m 17s)
17:42 btullis@deploy1002: Installation of scap version "4.33.0" completed for 1 hosts
17:41 btullis@deploy1002: Installing scap version "4.33.0" for 1 hosts
17:35 jnuche@deploy1002: Installing scap version "4.33.0" for 561 hosts
17:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['logstash1037']
17:10 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1037']
17:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1037']
17:09 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1037']
17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['logstash1036']
16:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1036']
16:45 jnuche@deploy1002: Installation of scap version "4.33.0" completed for 1 hosts
16:45 jnuche@deploy1002: Installing scap version "4.33.0" for 1 hosts
16:39 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:881023|[100%] English Wikipedia uses Vector 2022 skin]] (duration: 09m 27s)
16:31 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [[gerrit:881023|[100%] English Wikipedia uses Vector 2022 skin]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
16:29 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:881023|[100%] English Wikipedia uses Vector 2022 skin]]
16:20 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:881022|[75%] English Wikipedia uses Vector 2022 skin (T326892)]] (duration: 09m 24s)
16:13 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [[gerrit:881022|[75%] English Wikipedia uses Vector 2022 skin (T326892)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
16:11 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:881022|[75%] English Wikipedia uses Vector 2022 skin (T326892)]]
16:06 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
16:06 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
15:58 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:881021|[50%] English Wikipedia uses Vector 2022 skin, adds instrumentation (T326892)]] (duration: 08m 52s)
15:51 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [[gerrit:881021|[50%] English Wikipedia uses Vector 2022 skin, adds instrumentation (T326892)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
15:49 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:881021|[50%] English Wikipedia uses Vector 2022 skin, adds instrumentation (T326892)]]
15:44 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:881020|[25%] English Wikipedia uses Vector 2022 skin (T326892)]] (duration: 09m 06s)
15:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1052.eqiad.wmnet with OS bullseye
15:37 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
15:37 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
15:36 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [[gerrit:881020|[25%] English Wikipedia uses Vector 2022 skin (T326892)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
15:35 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:881020|[25%] English Wikipedia uses Vector 2022 skin (T326892)]]
15:31 urandom: re-enabling Cassandra hinted-handoff for codfw -- T327001
15:29 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:879659|[10%] English Wikipedia uses Vector 2022 skin (T326892)]] (duration: 11m 30s)
15:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1052.eqiad.wmnet with reason: host reimage
15:19 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [[gerrit:879659|[10%] English Wikipedia uses Vector 2022 skin (T326892)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
15:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1052.eqiad.wmnet with reason: host reimage
15:17 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:879659|[10%] English Wikipedia uses Vector 2022 skin (T326892)]]
15:14 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:880921|Revert gallery changes in 1.40.0-wmf.18 & .19 (T326990)]] (duration: 09m 11s)
15:13 bblack: cp2031: rebooting to gather more information (still downtimed + depooled)
15:07 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1052.eqiad.wmnet with OS bullseye
15:06 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and matmarex: Backport for [[gerrit:880921|Revert gallery changes in 1.40.0-wmf.18 & .19 (T326990)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
15:05 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:880921|Revert gallery changes in 1.40.0-wmf.18 & .19 (T326990)]]
15:04 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:880920|Revert gallery changes in 1.40.0-wmf.18 (T326990)]] (duration: 13m 04s)
15:01 bblack: cp2031: rebooting to gather more information (still downtimed + depooled)
14:57 moritzm: uploaded python-jose 3.3.0+dfsg-4~wmf11u1 to apt.wikimedia.org (needed by python-social-auth/Bitu)
14:53 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and matmarex: Backport for [[gerrit:880920|Revert gallery changes in 1.40.0-wmf.18 (T326990)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
14:51 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:880920|Revert gallery changes in 1.40.0-wmf.18 (T326990)]]
14:46 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:881045|Revert "Breaking upgrade: mapdata" (T327151)]] (duration: 10m 33s)
14:37 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and wmde-fisch: Backport for [[gerrit:881045|Revert "Breaking upgrade: mapdata" (T327151)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
14:35 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:881045|Revert "Breaking upgrade: mapdata" (T327151)]]
14:34 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:879946|Write to cul_reason[_plaintext]_id everywhere (T233004)]] (duration: 19m 54s)
14:23 moritzm: installing mod-wsgi security updates
14:16 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and dreamyjazz: Backport for [[gerrit:879946|Write to cul_reason[_plaintext]_id everywhere (T233004)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:14 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:879946|Write to cul_reason[_plaintext]_id everywhere (T233004)]]
13:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on webperf1004.eqiad.wmnet with reason: decom
13:16 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on webperf1004.eqiad.wmnet with reason: decom
12:20 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
11:54 volans: upgraded cumin on cumin1001 to 4.2.0-1+deb11u1
11:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on 10 hosts with reason: Still not ready to add these new presto servers to the cluster - btullis
11:47 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on 10 hosts with reason: Still not ready to add these new presto servers to the cluster - btullis
11:42 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
11:27 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
11:16 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
11:16 volans@cumin1001: START - Cookbook sre.network.cf
11:15 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
11:15 volans@cumin1001: START - Cookbook sre.network.cf
11:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1050.eqiad.wmnet with OS bullseye
11:11 volans@cumin2002: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
11:11 volans@cumin2002: START - Cookbook sre.network.cf
11:10 volans@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
11:10 volans@cumin1001: START - Cookbook sre.network.cf
11:10 volans@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
11:10 volans@cumin1001: START - Cookbook sre.network.cf
11:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1176 T326116', diff saved to https://phabricator.wikimedia.org/P43185 and previous config saved to /var/cache/conftool/dbconfig/20230118-110716-marostegui.json
10:59 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
10:59 volans@cumin1001: START - Cookbook sre.network.cf
10:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1050.eqiad.wmnet with reason: host reimage
10:54 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1050.eqiad.wmnet with reason: host reimage
10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 to LB with just 1% weight T326116', diff saved to https://phabricator.wikimedia.org/P43184 and previous config saved to /var/cache/conftool/dbconfig/20230118-105106-marostegui.json
10:49 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
10:48 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
10:43 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1050.eqiad.wmnet with OS bullseye
10:21 zabe@deploy1002: Finished scap: Backport for gerrit:881361 Start reading from cuc_comment_id from a few wikis (T233004) (duration: 09m 17s)
10:14 zabe@deploy1002: zabe and zabe: Backport for gerrit:881361 Start reading from cuc_comment_id from a few wikis (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
10:12 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
10:12 zabe@deploy1002: Started scap: Backport for gerrit:881361 Start reading from cuc_comment_id from a few wikis (T233004)
09:51 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
09:51 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
09:49 godog: start migration from webperf1004 to arclamp1001 - T319434
09:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp2001.codfw.wmnet
09:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp1001.eqiad.wmnet
09:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp2001.codfw.wmnet
09:33 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
09:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp1001.eqiad.wmnet
09:24 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.19 refs T325582 (duration: 08m 20s)
09:15 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.19 refs T325582
08:54 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
08:34 mvernon@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2002.codfw.wmnet
08:34 mvernon@cumin1001: conftool action : set/pooled=yes; selector: name=ms-fe2010.codfw.wmnet
08:32 mvernon@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-query,name=codfw
08:32 mvernon@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-swift,name=codfw
08:32 mvernon@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw
08:30 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
07:56 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
02:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
02:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
02:36 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=ats-be
02:36 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=cdn
01:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be
01:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=cdn
01:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2031.codfw.wmnet
01:06 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2031.codfw.wmnet
01:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
01:02 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
01:02 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=ats-be
01:02 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=cdn
00:28 zabe: enwiki: rename the "discretionary sanctions alert" tag to "contentious topics alert" # T327118
00:26 zabe@deploy1002: Finished scap: Backport for gerrit:881030 Add script to rename a change tag in wmf prod (T327118) (duration: 08m 29s)
00:20 zabe@deploy1002: zabe and zabe: Backport for gerrit:881030 Add script to rename a change tag in wmf prod (T327118) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
00:18 zabe@deploy1002: Started scap: Backport for gerrit:881030 Add script to rename a change tag in wmf prod (T327118)
00:08 zabe: mwscript extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --wiki=testwiki --key=180p.vp9.webm # T312153
00:07 zabe: mwscript extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --wiki=testwiki --key=120p.vp9.webm # T312153

2023-01-17

23:51 zabe: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "User:Amire80/frg" "Movement Multilingual Termbase" "Zabe" "per request phab:T327149" # T327149
23:33 zabe@deploy1002: Finished scap: Backport for gerrit:880905 Start reading from cuc_comment_id on testwiki (T233004), gerrit:880904 Start reading from cuc_actor everywhere (T233004) (duration: 09m 58s)
23:25 zabe@deploy1002: zabe and zabe: Backport for gerrit:880905 Start reading from cuc_comment_id on testwiki (T233004), gerrit:880904 Start reading from cuc_actor everywhere (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
23:24 zabe@deploy1002: Started scap: Backport for gerrit:880905 Start reading from cuc_comment_id on testwiki (T233004), gerrit:880904 Start reading from cuc_actor everywhere (T233004)
23:19 zabe@deploy1002: Finished scap: Backport for gerrit:881026 Revert "Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"" (T233004), gerrit:880925 Revert "Add read new support for cu_log comment ID columns" (T327219) (duration: 11m 46s)
23:09 zabe@deploy1002: zabe and dreamyjazz and zabe: Backport for gerrit:881026 Revert "Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"" (T233004), gerrit:880925 Revert "Add read new support for cu_log comment ID columns" (T327219) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
23:07 zabe@deploy1002: Started scap: Backport for gerrit:881026 Revert "Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"" (T233004), gerrit:880925 Revert "Add read new support for cu_log comment ID columns" (T327219)
23:06 zabe@deploy1002: Finished scap: Backport for gerrit:880903 Stop writing to cul_user and cul_user_text everywhere (T233004), gerrit:880902 Start writing to rev_comment_id everywhere (T299954) (duration: 10m 29s)
22:57 zabe@deploy1002: zabe and zabe: Backport for gerrit:880903 Stop writing to cul_user and cul_user_text everywhere (T233004), gerrit:880902 Start writing to rev_comment_id everywhere (T299954) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
22:55 zabe@deploy1002: Started scap: Backport for gerrit:880903 Stop writing to cul_user and cul_user_text everywhere (T233004), gerrit:880902 Start writing to rev_comment_id everywhere (T299954)
22:51 bblack: repooling codfw
22:48 ebernhardson@deploy1002: Finished scap: Backport for gerrit:881016 Make sticky header edit button default for all wikis (T324799) (duration: 10m 34s)
22:39 ebernhardson@deploy1002: ebernhardson and jdrewniak: Backport for gerrit:881016 Make sticky header edit button default for all wikis (T324799) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
22:38 ebernhardson@deploy1002: Started scap: Backport for gerrit:881016 Make sticky header edit button default for all wikis (T324799)
22:30 volans@cumin1001: conftool action : set/pooled=inactive; selector: name=non-existent1001
22:27 ebernhardson@deploy1002: Finished scap: Backport for gerrit:880915 Resolve deprecations and type changes in elastica 7.3.0, gerrit:880917 UpdateSuggesterIndex: Properly cleanup bad indices (duration: 09m 42s)
22:25 bblack: cp2031: restart ats-be
22:20 ebernhardson@deploy1002: ebernhardson and ebernhardson: Backport for gerrit:880915 Resolve deprecations and type changes in elastica 7.3.0, gerrit:880917 UpdateSuggesterIndex: Properly cleanup bad indices synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
22:18 ebernhardson@deploy1002: Started scap: Backport for gerrit:880915 Resolve deprecations and type changes in elastica 7.3.0, gerrit:880917 UpdateSuggesterIndex: Properly cleanup bad indices
22:14 ebernhardson@deploy1002: Finished scap: Backport for gerrit:880533 Show edit button in sticky header for desktop-improvement wikis (T324799) (duration: 10m 43s)
22:05 ebernhardson@deploy1002: ebernhardson and jdrewniak: Backport for gerrit:880533 Show edit button in sticky header for desktop-improvement wikis (T324799) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
22:04 ebernhardson@deploy1002: Started scap: Backport for gerrit:880533 Show edit button in sticky header for desktop-improvement wikis (T324799)
21:54 ebernhardson: Finished scap: Backport for gerrit:880913 Table of contents Collapse/Expand not working (T327064)
21:54 ebernhardson@deploy1002: Finished scap: Backport for gerrit:881008 Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis" (duration: 09m 20s)
21:52 zabe: zabe@mwmaint1002:~$ mwscript extensions/CheckUser/maintenance/populateCulComment.php --wiki testwiki
21:46 ebernhardson@deploy1002: ebernhardson and trainbranchbot: Backport for gerrit:881008 Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis" synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:44 ebernhardson@deploy1002: Started scap: Backport for gerrit:881008 Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"
21:42 ebernhardson@deploy1002: Sync cancelled.
21:35 ebernhardson@deploy1002: ebernhardson and dreamyjazz: Backport for gerrit:879653 Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
21:34 ebernhardson: scap also backporting gerrit:880913 Table of contents Collapse/Expand not working (T327064)
21:34 ebernhardson@deploy1002: Started scap: Backport for gerrit:879653 Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis (T233004)
21:29 ebernhardson@deploy1002: Finished scap: Backport for gerrit:880568 Enable Phonos on afwiktionary and arwiki (T324561) (duration: 12m 21s)
21:18 ebernhardson@deploy1002: ebernhardson and hmonroy: Backport for gerrit:880568 Enable Phonos on afwiktionary and arwiki (T324561) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
21:17 ebernhardson@deploy1002: Started scap: Backport for gerrit:880568 Enable Phonos on afwiktionary and arwiki (T324561)
21:00 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo pool` (had been left depooled from previous powercycle)
20:47 ryankemper: [WDQS] Depooled `wdqs1016`
20:25 herron: ran preferred-replica-election on kafka-logging codfw to clear replica imbalance
20:18 ryankemper: [WDQS] Restart blazegraph on `wdqs1016` to clear alert: `ryankemper@wdqs1016:~$ sudo systemctl restart wdqs-blazegraph`
20:06 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.19 refs T325582
20:04 eileen: config revision changed from 2e5cee3c to 7425df0b
19:50 ryankemper: T327175 Reprocessing last several hours of updates (`2023-01-17T12:00:00Z` -> `2023-01-17T17:30:00Z`) on codfw elasticsearch, running on `ryankemper@mwmaint2002` tmux session `reindex`
19:43 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
19:43 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
19:41 zabe@deploy1002: Finished scap: Backport for gerrit:880916 Revert "Revert "Enable visual enhancements on all talk namespaces"" (duration: 10m 25s)
19:32 zabe@deploy1002: zabe and zabe: Backport for gerrit:880916 Revert "Revert "Enable visual enhancements on all talk namespaces"" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
19:30 zabe@deploy1002: Started scap: Backport for gerrit:880916 Revert "Revert "Enable visual enhancements on all talk namespaces""
18:48 zabe@deploy1002: Finished scap: Backport for gerrit:880914 Revert "Enable visual enhancements on all talk namespaces" (duration: 09m 08s)
18:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
18:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
18:41 zabe@deploy1002: zabe and zabe: Backport for gerrit:880914 Revert "Enable visual enhancements on all talk namespaces" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
18:41 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
18:41 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
18:39 zabe@deploy1002: Started scap: Backport for gerrit:880914 Revert "Enable visual enhancements on all talk namespaces"
18:39 zabe@deploy1002: backport aborted: (duration: 00m 26s)
18:35 zabe@deploy1002: backport aborted: (duration: 19m 41s)
18:29 otto@deploy1002: Finished deploy [analytics/refinery@55f90ac]: Regular analytics weekly train [analytics/refinery@55f90ac] (duration: 04m 28s)
18:29 otto@deploy1002: Finished deploy [airflow-dags/analytics@8d0e919]: Regular analytics weekly train @8d0e919] (duration: 00m 15s)
18:29 otto@deploy1002: Started deploy [airflow-dags/analytics@8d0e919]: Regular analytics weekly train @8d0e919]
18:25 otto@deploy1002: Started deploy [analytics/refinery@55f90ac]: Regular analytics weekly train [analytics/refinery@55f90ac]
18:25 zabe@deploy1002: zabe and matmarex and zabe: Backport for gerrit:880908 objectcache: Fix DI for MultiWriteBagOStuff sub caches (T327158), gerrit:878169 Use new DiscussionTools heading markup on enwiki (T314714), gerrit:879158 Add "Clear Affordances" to DiscussionTools beta feature on remaining wikis (T321955), gerrit:879159 Add "Page Frame" to DiscussionTools beta feature on partner wikis (T317907), …
18:23 zabe@deploy1002: Started scap: Backport for gerrit:880908 objectcache: Fix DI for MultiWriteBagOStuff sub caches (T327158), gerrit:878169 Use new DiscussionTools heading markup on enwiki (T314714), gerrit:879158 Add "Clear Affordances" to DiscussionTools beta feature on remaining wikis (T321955), gerrit:879159 Add "Page Frame" to DiscussionTools beta feature on partner wikis (T317907), gerrit:879103 …
18:13 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
18:10 mutante: gerrit1002/gerrit2002: sudo rmdir /srv/gerrit/jvmlogs
18:07 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
18:07 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
18:05 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
18:01 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=k8s-ingress-wikikube-rw,name=codfw
17:58 jynus: restarted es5 codfw backup
17:54 bblack: authdns1001: restart confd
17:27 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=aqs,name=codfw
17:19 effie: pooling back codfw services
17:17 bblack: removing errant 2620:0:860:118: IPs from primary interfaces of hosts in B2
17:01 effie: restarting confd on deploy1002
16:59 effie: pooling back depooled mw servers in codfw
16:44 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-worker1086.eqiad.wmnet with reason: Shutting down for RAID controller BBU replacement
16:44 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-worker1086.eqiad.wmnet with reason: Shutting down for RAID controller BBU replacement
16:32 sukhe: reprepro --ignore=wrongdistribution -C main include bullseye-wikimedia cadvisor_0.44.0+ds1-1_amd64.changes: T325557
16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43179 and previous config saved to /var/cache/conftool/dbconfig/20230117-162100-ladsgroup.json
16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43178 and previous config saved to /var/cache/conftool/dbconfig/20230117-160555-ladsgroup.json
15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43177 and previous config saved to /var/cache/conftool/dbconfig/20230117-155050-ladsgroup.json
15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43175 and previous config saved to /var/cache/conftool/dbconfig/20230117-153545-ladsgroup.json
15:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
14:56 urandom: truncating hints for Cassandra nodes in codfw row b -- T327001
14:52 urandom: disabling Cassandra hinted-handoff for codfw -- T327001
14:27 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
14:26 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
14:12 _joe_: try to restart cassandra-a on aqs2005
13:37 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=recommendation-api,name=codfw
13:35 mvernon@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-query,name=codfw
13:35 mvernon@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-swift,name=codfw
13:27 jynus: restarting manually replication on es2020, may require data check afterwards
13:26 _joe_: depooling all services in codfw
13:19 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool mobileapps in codfw: maintenance
13:15 mvernon@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw
13:14 oblivian@cumin1001: START - Cookbook sre.discovery.service-route depool mobileapps in codfw: maintenance
13:13 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check citoid: maintenance
13:13 oblivian@cumin1001: START - Cookbook sre.discovery.service-route check citoid: maintenance
13:08 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
13:01 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw
13:01 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=.*
12:35 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
12:35 moritzm: installing ipython security updates
11:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1048.eqiad.wmnet with OS bullseye
11:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1048.eqiad.wmnet with reason: host reimage
11:16 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1048.eqiad.wmnet with reason: host reimage
11:08 volans: upgraded cumin on cumin2002 to 4.2.0-1+deb11u1
11:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1048.eqiad.wmnet with OS bullseye
10:16 godog: restart opensearch_2@production-elk7-eqiad.service on logstash102[34]
10:12 jnuche@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org for details)
10:11 jnuche@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.19 refs T325582 (duration: 42m 26s)
09:42 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@9568478]: (no justification provided) (duration: 00m 12s)
09:42 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@9568478]: (no justification provided)
09:28 jnuche@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.19 refs T325582
09:26 jnuche@deploy1002: scap failed: PermissionError [Errno 13] Permission denied: '/home/jnuche/scap-image-build-and-push-log' (duration: 00m 50s)
09:26 jnuche@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.19 refs T325582
08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
08:47 ladsgroup@deploy1002: Finished scap: Backport for gerrit:879652 Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki (T233004) (duration: 13m 50s)
08:35 ladsgroup@deploy1002: ladsgroup and dreamyjazz: Backport for gerrit:879652 Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
08:33 ladsgroup@deploy1002: Started scap: Backport for gerrit:879652 Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki (T233004)
08:29 kartik@deploy1002: Finished scap: Backport for gerrit:879998 testwiki: Use Parsoid in Mediawiki Core for Content Translation (T323667) (duration: 20m 56s)
08:26 zabe: zabe@mwmaint1002:~$ mwscript extensions/Flow/maintenance/FlowFixInconsistentBoards.php --wiki=zhwiki --namespaceName='USER_TALK' # T327146
08:13 kartik@deploy1002: kartik and kartik: Backport for gerrit:879998 testwiki: Use Parsoid in Mediawiki Core for Content Translation (T323667) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
08:08 kartik@deploy1002: Started scap: Backport for gerrit:879998 testwiki: Use Parsoid in Mediawiki Core for Content Translation (T323667)
07:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43168 and previous config saved to /var/cache/conftool/dbconfig/20230117-075222-ladsgroup.json
07:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43167 and previous config saved to /var/cache/conftool/dbconfig/20230117-073717-ladsgroup.json
07:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43166 and previous config saved to /var/cache/conftool/dbconfig/20230117-072212-ladsgroup.json
07:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
07:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
07:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
07:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
07:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43165 and previous config saved to /var/cache/conftool/dbconfig/20230117-070707-ladsgroup.json
07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1173 T326134', diff saved to https://phabricator.wikimedia.org/P43164 and previous config saved to /var/cache/conftool/dbconfig/20230117-070532-ladsgroup.json
07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1131 to s6 primary and set section read-write T326134', diff saved to https://phabricator.wikimedia.org/P43163 and previous config saved to /var/cache/conftool/dbconfig/20230117-070102-ladsgroup.json
07:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T326134', diff saved to https://phabricator.wikimedia.org/P43162 and previous config saved to /var/cache/conftool/dbconfig/20230117-070035-ladsgroup.json
07:00 Amir1: Starting s6 eqiad failover from db1173 to db1131 - T326134
06:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 T326134', diff saved to https://phabricator.wikimedia.org/P43160 and previous config saved to /var/cache/conftool/dbconfig/20230117-060710-ladsgroup.json
06:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T326134
06:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T326134

2023-01-16

17:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
17:07 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
17:06 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
17:04 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
17:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
16:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
16:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1044.eqiad.wmnet with OS bullseye
16:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1044.eqiad.wmnet with reason: host reimage
16:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1044.eqiad.wmnet with reason: host reimage
16:23 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1044.eqiad.wmnet with OS bullseye
16:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1042.eqiad.wmnet with OS bullseye
16:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1042.eqiad.wmnet with reason: host reimage
15:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1042.eqiad.wmnet with reason: host reimage
15:47 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1042.eqiad.wmnet with OS bullseye
13:35 XioNoX: disable one of 3 cr1-cr2 eqiad links - T304712
13:34 XioNoX: repool eqiad-eqord link - T304712
12:56 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
12:55 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
12:50 XioNoX: drain eqiad-eqord link - T304712
12:47 hnowlan@puppetmaster1001: conftool action : set/weight=10:pooled=yes; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
12:43 Amir1: power cycled db1198
12:36 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
12:35 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes101[5-9].eqiad.wmnet
12:35 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes102[012].eqiad.wmnet
12:34 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes102.eqiad.wmnet
12:05 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes101[123].eqiad.wmnet
12:02 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[123].eqiad.wmnet
11:51 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:49 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:48 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:38 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:32 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
11:25 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:15 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
10:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
10:58 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
10:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
10:57 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
10:56 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
10:55 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
10:54 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
10:48 moritzm: installing libtasn1-6 security updates on Bullseye
10:36 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
08:55 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
08:46 elukey: powercycle an-worker1125 - soft lockup traces registered in the tty, host frozen
08:14 oblivian@deploy1002: Synchronized README: test null deployment for T327041 (duration: 07m 12s)
08:09 Emperor: stopped swift_rclone_sync on ms-be1069
07:48 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=parse20(0[6-9]|10).codfw.wmnet
07:44 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw23([12][0-9]|3[0-4]).codfw.wmnet
07:41 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw22(59|6[0-9]|70).codfw.wmnet
07:26 _joe_: restarting pybal on lvs2009
07:10 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=(mw.*|appservers|api)-ro,name=codfw
07:10 _joe_: depooling mediawiki in codfw
06:47 XioNoX: add 2001:67c:930::/48 to network:external in data.yaml
06:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maint
06:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maint
06:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1198 maint', diff saved to https://phabricator.wikimedia.org/P43157 and previous config saved to /var/cache/conftool/dbconfig/20230116-062211-ladsgroup.json
02:25 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,service=parsoid-php
02:05 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=appserver,service=nginx
02:01 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=api_appserver,service=nginx
01:51 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw2283.codfw.wmnet
01:35 Amir1: rolling restart of php-fpm across the fleet
01:30 thcipriani: 01:29:56 php-fpm-restart: 100% (in-flight: 0; ok: 184; fail: 112; left: 0)
01:29 thcipriani@deploy1002: Finished scap: Backport for gerrit:879798 LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788) (duration: 24m 47s)
01:15 thcipriani@deploy1002: thcipriani and func: Backport for gerrit:879798 LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
01:05 thcipriani@deploy1002: Started scap: Backport for gerrit:879798 LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788)

2023-01-14

09:46 godog: issue 'request system reboot member 2' - T327001
09:20 mvernon@cumin2002: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet
09:19 Emperor: depool thanos-fe2002 T327001
09:19 mvernon@cumin2002: conftool action : set/pooled=no; selector: name=ms-fe2010.codfw.wmnet
09:19 Emperor: depool ms-fe2010 T327001

2023-01-13

23:39 mutante: people2002 - systemctl reset-failed after removing auto_restart_rsync timers
22:26 mutante: mirror1001 - systemctl start update-ubuntu-mirror (sometimes sync fails)
20:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1011']
20:58 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
20:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1011']
20:49 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
20:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['druid1011']
20:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
20:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1010']
20:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1010']
20:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1010']
20:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1009']
20:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
20:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1009']
20:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1010']
20:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
20:04 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aphlict2001.codfw.wmnet
19:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1002.eqiad.wmnet with OS bullseye
19:58 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aphlict2001.codfw.wmnet on all recursors
19:54 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache aphlict2001.codfw.wmnet on all recursors
19:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:54 dzahn@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict2001.codfw.wmnet - dzahn@cumin2002"
19:52 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict2001.codfw.wmnet - dzahn@cumin2002"
19:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:49 dzahn@cumin2002: START - Cookbook sre.dns.netbox
19:49 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host aphlict2001.codfw.wmnet
19:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1001.eqiad.wmnet with OS bullseye
19:40 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:38 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
19:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
19:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
19:22 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
19:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-mariadb1002.eqiad.wmnet with OS bullseye
18:25 zabe: mwscript extensions/GlobalBlocking/maintenance/FixBlockerUsername.php --wiki metawiki "Green Giant" "Cromium" # T298707
17:34 thcipriani@deploy1002: Finished scap: Backport for gerrit:879793 TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125) (duration: 13m 25s)
17:22 thcipriani@deploy1002: thcipriani and abi: Backport for gerrit:879793 TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
17:20 thcipriani@deploy1002: Started scap: Backport for gerrit:879793 TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125)
15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-coord1004.eqiad.wmnet with OS bullseye
15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
15:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
15:24 jynus: restarted again update-ubuntu-mirror on mirror1001 due to remote server concurrency issues
15:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new bastion - jmm@cumin2002"
15:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-mariadb1001.eqiad.wmnet with OS bullseye
15:19 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new bastion - jmm@cumin2002"
15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-coord1003.eqiad.wmnet with OS bullseye
15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
15:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1004.eqiad.wmnet with reason: host reimage
15:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1004.eqiad.wmnet with reason: host reimage
15:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
15:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-coord1004.eqiad.wmnet with OS bullseye
14:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1003.eqiad.wmnet with reason: host reimage
14:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1003.eqiad.wmnet with reason: host reimage
14:49 volans: uploaded cumin_4.2.0 to apt.wikimedia.org bullseye-wikimedia
14:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-coord1003.eqiad.wmnet with OS bullseye
12:48 moritzm: installing bast6002 T324974
12:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab2002.wikimedia.org with reason: troubleshoot backup restore on gitlab replica
12:38 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab2002.wikimedia.org with reason: troubleshoot backup restore on gitlab replica
11:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new bastions - jmm@cumin2002"
11:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new bastions - jmm@cumin2002"
10:53 moritzm: installing bast5003 T324974
10:49 jynus: restarting update-ubuntu-mirror on mirror1001 due to remote server concurrency issues
09:41 moritzm: installing bast4004 T324974
09:06 moritzm: installing bast3006 T324974
02:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1011.mgmt.eqiad.wmnet with reboot policy FORCED
02:09 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1011.mgmt.eqiad.wmnet with reboot policy FORCED
02:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1010.mgmt.eqiad.wmnet with reboot policy FORCED
02:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
01:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
01:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
01:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
01:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
01:36 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1010.mgmt.eqiad.wmnet with reboot policy FORCED
01:36 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
01:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-mariadb1002']
01:26 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1002']
01:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-mariadb1001']
01:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1001']
01:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-mariadb1002']
01:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-mariadb1001']
01:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1002']
01:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1001']
01:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-coord1004']
01:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1004']
01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-coord1003']
01:02 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1003']
00:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-coord1004']
00:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-coord1003']
00:41 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1004']
00:40 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1003']
00:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-mariadb1002.mgmt.eqiad.wmnet with reboot policy FORCED
00:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-mariadb1001.mgmt.eqiad.wmnet with reboot policy FORCED
00:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-mariadb1002.mgmt.eqiad.wmnet with reboot policy FORCED
00:15 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-mariadb1001.mgmt.eqiad.wmnet with reboot policy FORCED
00:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-coord1004.mgmt.eqiad.wmnet with reboot policy FORCED
00:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-coord1003.mgmt.eqiad.wmnet with reboot policy FORCED

2023-01-12

23:53 zabe: start running cuc_comment_id population script on rest of sections in screens with --sleep 2 # T233004
23:50 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-coord1004.mgmt.eqiad.wmnet with reboot policy FORCED
23:44 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-coord1003.mgmt.eqiad.wmnet with reboot policy FORCED
23:13 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@99a3e6f]: import_cirrus_index: use spark3 (duration: 02m 31s)
23:10 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@99a3e6f]: import_cirrus_index: use spark3
23:08 sbassett: Deployed (temporary) security mitigations for T326691
22:45 mutante: people2002 - apt-get remove --purge rsync
22:08 zabe: start of "foreachwikiindblist s3.dblist extensions/CheckUser/maintenance/populateCucComment.php" in a screen in mwmaint1002 # T233004
22:07 thcipriani: end UTC late backport
22:06 thcipriani@deploy1002: Finished scap: Backport for gerrit:879161 cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757), gerrit:862343 cirrus: Disable incoming link counting (T317023) (duration: 09m 23s)
21:59 krinkle@deploy1002: Finished deploy [performance/navtiming@172cc22]: (no justification provided) (duration: 00m 08s)
21:59 krinkle@deploy1002: Started deploy [performance/navtiming@172cc22]: (no justification provided)
21:59 Krinkle: krinkle@deploy1002$ `scap install-world -v --limit-hosts` for webperf1003.eqiad and webperf2003.codfw, ref T326668
21:58 krinkle@deploy1002: Installation of scap version "4.32.0" completed for 1 hosts
21:58 krinkle@deploy1002: Installing scap version "4.32.0" for 1 hosts
21:58 thcipriani@deploy1002: thcipriani and ebernhardson: Backport for gerrit:879161 cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757), gerrit:862343 cirrus: Disable incoming link counting (T317023) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:58 krinkle@deploy1002: Installation of scap version "4.32.0" completed for 1 hosts
21:58 krinkle@deploy1002: Installing scap version "4.32.0" for 1 hosts
21:57 thcipriani@deploy1002: Started scap: Backport for gerrit:879161 cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757), gerrit:862343 cirrus: Disable incoming link counting (T317023)
21:56 zabe: run populateCucComment.php on testwiki # T233004
21:48 thcipriani@deploy1002: Finished scap: Backport for gerrit:879600 nlwiki: Add block right to checkuser group (T326355) (duration: 09m 04s)
21:41 thcipriani@deploy1002: thcipriani and stang: Backport for gerrit:879600 nlwiki: Add block right to checkuser group (T326355) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
21:39 thcipriani@deploy1002: Started scap: Backport for gerrit:879600 nlwiki: Add block right to checkuser group (T326355)
21:37 thcipriani@deploy1002: Finished scap: Backport for gerrit:879571 looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757) (duration: 09m 10s)
21:30 thcipriani@deploy1002: thcipriani and ebernhardson: Backport for gerrit:879571 looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:28 thcipriani@deploy1002: Started scap: Backport for gerrit:879571 looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757)
21:27 thcipriani@deploy1002: Finished scap: Backport for gerrit:879561 etwikiquote: Switch logo variant back (T313698) (duration: 09m 25s)
21:21 ejegg: restarted fundraising scheduled jobs
21:19 ejegg: civicrm upgraded from 9afd2789 to 7ecb5038
21:19 thcipriani@deploy1002: thcipriani and stang: Backport for gerrit:879561 etwikiquote: Switch logo variant back (T313698) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
21:17 thcipriani@deploy1002: Started scap: Backport for gerrit:879561 etwikiquote: Switch logo variant back (T313698)
21:16 thcipriani@deploy1002: Finished scap: Backport for gerrit:868816 Remove Beta Feature for Realtime Preview and enable on plwiki (T323033) (duration: 10m 43s)
21:07 thcipriani@deploy1002: thcipriani and samwilson: Backport for gerrit:868816 Remove Beta Feature for Realtime Preview and enable on plwiki (T323033) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
21:05 thcipriani@deploy1002: Started scap: Backport for gerrit:868816 Remove Beta Feature for Realtime Preview and enable on plwiki (T323033)
20:43 ejegg: rolled back CiviCRM to 9afd2789
20:31 ejegg: civicrm upgraded from 9afd2789 to 7ecb5038
20:29 ejegg: disabled fundraising scheduled jobs for civi deploy
20:08 brett: Setting thread_pool_max for varnish-frontend to 12000
19:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1176 T326116', diff saved to https://phabricator.wikimedia.org/P43148 and previous config saved to /var/cache/conftool/dbconfig/20230112-195922-marostegui.json
19:56 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 to LB with just 1% weight T326116', diff saved to https://phabricator.wikimedia.org/P43147 and previous config saved to /var/cache/conftool/dbconfig/20230112-195651-marostegui.json
19:55 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 (mariadb 11) to dbctl, depooled T326116', diff saved to https://phabricator.wikimedia.org/P43146 and previous config saved to /var/cache/conftool/dbconfig/20230112-195514-marostegui.json
19:11 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.18 refs T325581
18:36 mutante: stat1008 - systemctl reset-failed - clears Icinga alerts from failed things of the past
18:35 mutante: stat1007 - systemctl reset-failed - clears Icinga alerts
18:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mc2040.codfw.wmnet with reason: hardware troubleshooting
18:18 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mc2040.codfw.wmnet with reason: hardware troubleshooting
17:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
17:45 mutante: powercycling mc2040 via mgmt console
17:34 ejegg: civicrm rolled back from 7ecb5038 to 9afd2789
17:08 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
17:08 btullis@cumin1001: Added views for new wiki: aswikiquote T321294
17:05 ejegg: civicrm upgraded from 9afd2789 to 7ecb5038
16:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
16:48 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
16:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
16:47 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
16:43 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
16:34 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
16:31 zabe@deploy1002: Finished scap: Backport for gerrit:879590 Stop writing to cul_user and cul_user_text on a few wikis (T233004), gerrit:879591 Start writing to rev_comment_id on group1 wikis (T299954) (duration: 09m 49s)
16:23 zabe@deploy1002: zabe and zabe: Backport for gerrit:879590 Stop writing to cul_user and cul_user_text on a few wikis (T233004), gerrit:879591 Start writing to rev_comment_id on group1 wikis (T299954) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
16:21 zabe@deploy1002: Started scap: Backport for gerrit:879590 Stop writing to cul_user and cul_user_text on a few wikis (T233004), gerrit:879591 Start writing to rev_comment_id on group1 wikis (T299954)
16:14 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
16:08 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
16:08 btullis@cumin1001: Added views for new wiki: bjnwiktionary T312214
15:47 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
15:46 hnowlan@puppetmaster1001: conftool action : set/weight=8; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
15:44 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
15:36 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
15:36 btullis@cumin1001: Added views for new wiki: shnwikibooks T321256
15:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1118.eqiad.wmnet with reason: Maintenance
15:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1118.eqiad.wmnet with reason: Maintenance
15:28 effie: Planet import in codfw (on maps2009) started at 15:26 UTC - T314472
15:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1041.eqiad.wmnet
15:11 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
15:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
15:05 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1041.eqiad.wmnet
14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe2002.codfw.wmnet
14:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
14:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T321391)', diff saved to https://phabricator.wikimedia.org/P43138 and previous config saved to /var/cache/conftool/dbconfig/20230112-145441-marostegui.json
14:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe2002.codfw.wmnet
14:50 moritzm: installing postgresql-11 security updates on puppetdb1002
14:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe1002.eqiad.wmnet
14:42 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
14:42 btullis@cumin1001: Added views for new wiki: guwwikiquote T321288
14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P43137 and previous config saved to /var/cache/conftool/dbconfig/20230112-143934-marostegui.json
14:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe1002.eqiad.wmnet
14:37 moritzm: installing sqlite3 security updates on buster
14:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1040.eqiad.wmnet with OS bullseye
14:34 taavi: UTC afternoon backports done
14:28 taavi@deploy1002: Finished scap: Backport for gerrit:879101 Track callers of parseRevisionParsoidHtml. (duration: 09m 34s)
14:26 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P43136 and previous config saved to /var/cache/conftool/dbconfig/20230112-142428-marostegui.json
14:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1001.wikimedia.org
14:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1040.eqiad.wmnet with reason: host reimage
14:20 taavi@deploy1002: taavi and matmarex: Backport for gerrit:879101 Track callers of parseRevisionParsoidHtml. synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
14:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1001.wikimedia.org
14:18 taavi@deploy1002: Started scap: Backport for gerrit:879101 Track callers of parseRevisionParsoidHtml.
14:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1040.eqiad.wmnet with reason: host reimage
14:17 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
14:16 taavi@deploy1002: Finished scap: Backport for gerrit:871272 Allow administrators to revoke autopatroller rights on sh.WP (T325938) (duration: 13m 30s)
14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T321391)', diff saved to https://phabricator.wikimedia.org/P43135 and previous config saved to /var/cache/conftool/dbconfig/20230112-140921-marostegui.json
14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T321391)', diff saved to https://phabricator.wikimedia.org/P43134 and previous config saved to /var/cache/conftool/dbconfig/20230112-140659-marostegui.json
14:06 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1040.eqiad.wmnet with OS bullseye
14:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
14:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T321391)', diff saved to https://phabricator.wikimedia.org/P43133 and previous config saved to /var/cache/conftool/dbconfig/20230112-140649-marostegui.json
14:05 taavi@deploy1002: taavi and aleksandar: Backport for gerrit:871272 Allow administrators to revoke autopatroller rights on sh.WP (T325938) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
14:03 taavi@deploy1002: Started scap: Backport for gerrit:871272 Allow administrators to revoke autopatroller rights on sh.WP (T325938)
13:53 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P43132 and previous config saved to /var/cache/conftool/dbconfig/20230112-135143-marostegui.json
13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P43131 and previous config saved to /var/cache/conftool/dbconfig/20230112-133636-marostegui.json
13:30 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
13:29 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
13:28 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
13:28 ladsgroup@deploy1002: Finished scap: Backport for gerrit:879277 Remove obsolete MWMinimalScriptInit and MEDIAWIKI_MAINT_INIT_ONLY. (duration: 21m 44s)
13:26 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
13:26 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T321391)', diff saved to https://phabricator.wikimedia.org/P43130 and previous config saved to /var/cache/conftool/dbconfig/20230112-132130-marostegui.json
13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T321391)', diff saved to https://phabricator.wikimedia.org/P43129 and previous config saved to /var/cache/conftool/dbconfig/20230112-131908-marostegui.json
13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
13:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T321391)', diff saved to https://phabricator.wikimedia.org/P43128 and previous config saved to /var/cache/conftool/dbconfig/20230112-131847-marostegui.json
13:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
13:13 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
13:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
13:08 ladsgroup@deploy1002: ladsgroup and daniel: Backport for gerrit:879277 Remove obsolete MWMinimalScriptInit and MEDIAWIKI_MAINT_INIT_ONLY. synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
13:06 ladsgroup@deploy1002: Started scap: Backport for gerrit:879277 Remove obsolete MWMinimalScriptInit and MEDIAWIKI_MAINT_INIT_ONLY.
13:05 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
13:05 btullis@cumin1001: Added views for new wiki: gorwiktionary T326138
13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P43127 and previous config saved to /var/cache/conftool/dbconfig/20230112-130341-marostegui.json
12:58 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
12:56 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
12:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P43125 and previous config saved to /var/cache/conftool/dbconfig/20230112-124834-marostegui.json
12:41 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
12:41 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T321391)', diff saved to https://phabricator.wikimedia.org/P43123 and previous config saved to /var/cache/conftool/dbconfig/20230112-123328-marostegui.json
12:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T321391)', diff saved to https://phabricator.wikimedia.org/P43122 and previous config saved to /var/cache/conftool/dbconfig/20230112-123106-marostegui.json
12:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
12:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T321391)', diff saved to https://phabricator.wikimedia.org/P43121 and previous config saved to /var/cache/conftool/dbconfig/20230112-123045-marostegui.json
12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P43120 and previous config saved to /var/cache/conftool/dbconfig/20230112-121538-marostegui.json
12:13 XioNoX: repool esams
12:10 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:09 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:09 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
12:09 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
12:08 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
12:08 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
12:08 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
12:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P43119 and previous config saved to /var/cache/conftool/dbconfig/20230112-120032-marostegui.json
11:54 XioNoX: re-seating cr2-esams fpc0 linecard - T318783
11:52 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T321391)', diff saved to https://phabricator.wikimedia.org/P43116 and previous config saved to /var/cache/conftool/dbconfig/20230112-114524-marostegui.json
11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T321391)', diff saved to https://phabricator.wikimedia.org/P43115 and previous config saved to /var/cache/conftool/dbconfig/20230112-114302-marostegui.json
11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1176.eqiad.wmnet with reason: Maintenance
11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1176.eqiad.wmnet with reason: Maintenance
11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T321391)', diff saved to https://phabricator.wikimedia.org/P43114 and previous config saved to /var/cache/conftool/dbconfig/20230112-114212-marostegui.json
11:41 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
11:39 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
11:37 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
11:29 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
11:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P43113 and previous config saved to /var/cache/conftool/dbconfig/20230112-112705-marostegui.json
11:24 urbanecm@deploy1002: Finished scap: Backport for gerrit:879412 throttle: Add new rule for cswiki course (T326792) (duration: 07m 47s)
11:17 urbanecm@deploy1002: Started scap: Backport for gerrit:879412 throttle: Add new rule for cswiki course (T326792)
11:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 25885
11:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 25885
11:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3303
11:13 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3303
11:12 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 3302
11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P43112 and previous config saved to /var/cache/conftool/dbconfig/20230112-111159-marostegui.json
11:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3302
11:11 zabe: mwscript extensions/GlobalBlocking/maintenance/FixBlockerUsername.php --wiki metawiki "Defender" "Elton" # T298707
10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T321391)', diff saved to https://phabricator.wikimedia.org/P43111 and previous config saved to /var/cache/conftool/dbconfig/20230112-105652-marostegui.json
10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T321391)', diff saved to https://phabricator.wikimedia.org/P43110 and previous config saved to /var/cache/conftool/dbconfig/20230112-105430-marostegui.json
10:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
10:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T321391)', diff saved to https://phabricator.wikimedia.org/P43109 and previous config saved to /var/cache/conftool/dbconfig/20230112-105358-marostegui.json
10:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 36 hosts
10:49 ayounsi@cumin1001: START - Cookbook sre.hosts.remove-downtime for 36 hosts
10:41 hashar@deploy1002: Finished deploy [integration/docroot@577d68a]: zuul: Link to report_url if available (duration: 00m 14s)
10:41 hashar@deploy1002: Started deploy [integration/docroot@577d68a]: zuul: Link to report_url if available
10:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8674
10:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8674
10:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8932
10:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8932
10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P43108 and previous config saved to /var/cache/conftool/dbconfig/20230112-103852-marostegui.json
10:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
10:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
10:24 XioNoX: rollback redirect ns2 to authdns1001 - T316532
10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P43107 and previous config saved to /var/cache/conftool/dbconfig/20230112-102345-marostegui.json
10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T321391)', diff saved to https://phabricator.wikimedia.org/P43106 and previous config saved to /var/cache/conftool/dbconfig/20230112-100839-marostegui.json
10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T321391)', diff saved to https://phabricator.wikimedia.org/P43105 and previous config saved to /var/cache/conftool/dbconfig/20230112-100616-marostegui.json
10:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T321391)', diff saved to https://phabricator.wikimedia.org/P43104 and previous config saved to /var/cache/conftool/dbconfig/20230112-100456-marostegui.json
10:01 XioNoX: reboot asw2-esams for upgrade - T316532
09:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping3003.esams.wmnet
09:58 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
09:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint2002.codfw.wmnet
09:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping3003.esams.wmnet on all recursors
09:54 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping3003.esams.wmnet on all recursors
09:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping3003.esams.wmnet - jmm@cumin2002"
09:53 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping3003.esams.wmnet - jmm@cumin2002"
09:50 jmm@cumin2002: START - Cookbook sre.dns.netbox
09:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping3003.esams.wmnet
09:50 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwmaint2002.codfw.wmnet
09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P43103 and previous config saved to /var/cache/conftool/dbconfig/20230112-094950-marostegui.json
09:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping2003.codfw.wmnet
09:47 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
09:47 btullis@cumin1001: Added views for new wiki: pcmwiki T310879
09:46 XioNoX: redirect ns2 to authdns1001 - T316532
09:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping2003.codfw.wmnet on all recursors
09:43 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping2003.codfw.wmnet on all recursors
09:42 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2003.codfw.wmnet - jmm@cumin2002"
09:41 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2003.codfw.wmnet - jmm@cumin2002"
09:39 jmm@cumin2002: START - Cookbook sre.dns.netbox
09:39 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping2003.codfw.wmnet
09:37 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P43102 and previous config saved to /var/cache/conftool/dbconfig/20230112-093443-marostegui.json
09:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 36 hosts with reason: nework maintenance
09:31 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 36 hosts with reason: nework maintenance
09:25 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc1039.eqiad.wmnet
09:24 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
09:24 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host mc1039.eqiad.wmnet
09:22 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T321391)', diff saved to https://phabricator.wikimedia.org/P43101 and previous config saved to /var/cache/conftool/dbconfig/20230112-091937-marostegui.json
09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T321391)', diff saved to https://phabricator.wikimedia.org/P43100 and previous config saved to /var/cache/conftool/dbconfig/20230112-091716-marostegui.json
09:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
09:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T321391)', diff saved to https://phabricator.wikimedia.org/P43099 and previous config saved to /var/cache/conftool/dbconfig/20230112-091654-marostegui.json
09:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P43098 and previous config saved to /var/cache/conftool/dbconfig/20230112-090148-marostegui.json
09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping1003.eqiad.wmnet
08:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping1003.eqiad.wmnet on all recursors
08:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping1003.eqiad.wmnet on all recursors
08:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1003.eqiad.wmnet - jmm@cumin2002"
08:55 phedenskog@deploy1002: Finished deploy [performance/navtiming@172cc22]: (no justification provided) (duration: 00m 22s)
08:54 phedenskog@deploy1002: Started deploy [performance/navtiming@172cc22]: (no justification provided)
08:54 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1003.eqiad.wmnet - jmm@cumin2002"
08:54 phedenskog@deploy1002: Finished deploy [performance/navtiming@172cc22]: (no justification provided) (duration: 00m 17s)
08:53 phedenskog@deploy1002: Started deploy [performance/navtiming@172cc22]: (no justification provided)
08:50 XioNoX: depool esams for network maintenance - T316532
08:50 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping1003.eqiad.wmnet
08:49 zabe: deployed updated patch for T311337
08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P43097 and previous config saved to /var/cache/conftool/dbconfig/20230112-084641-marostegui.json
08:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast5003.wikimedia.org
08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T321391)', diff saved to https://phabricator.wikimedia.org/P43096 and previous config saved to /var/cache/conftool/dbconfig/20230112-083135-marostegui.json
08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T321391)', diff saved to https://phabricator.wikimedia.org/P43095 and previous config saved to /var/cache/conftool/dbconfig/20230112-082813-marostegui.json
08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T321391)', diff saved to https://phabricator.wikimedia.org/P43094 and previous config saved to /var/cache/conftool/dbconfig/20230112-082752-marostegui.json
08:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast5003.wikimedia.org on all recursors
08:17 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast5003.wikimedia.org on all recursors
08:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5003.wikimedia.org - jmm@cumin2002"
08:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5003.wikimedia.org - jmm@cumin2002"
08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P43093 and previous config saved to /var/cache/conftool/dbconfig/20230112-081245-marostegui.json
07:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
07:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast5003.wikimedia.org
07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P43092 and previous config saved to /var/cache/conftool/dbconfig/20230112-075739-marostegui.json
07:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 9584
07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T321391)', diff saved to https://phabricator.wikimedia.org/P43091 and previous config saved to /var/cache/conftool/dbconfig/20230112-074232-marostegui.json
07:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 9584
07:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 37002
07:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 37002
07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T321391)', diff saved to https://phabricator.wikimedia.org/P43090 and previous config saved to /var/cache/conftool/dbconfig/20230112-074010-marostegui.json
07:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1132.eqiad.wmnet with reason: Maintenance
07:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1132.eqiad.wmnet with reason: Maintenance
07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T321391)', diff saved to https://phabricator.wikimedia.org/P43089 and previous config saved to /var/cache/conftool/dbconfig/20230112-073949-marostegui.json
07:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112
07:38 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 112
07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P43088 and previous config saved to /var/cache/conftool/dbconfig/20230112-072443-marostegui.json
07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P43087 and previous config saved to /var/cache/conftool/dbconfig/20230112-070936-marostegui.json
06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T321391)', diff saved to https://phabricator.wikimedia.org/P43086 and previous config saved to /var/cache/conftool/dbconfig/20230112-065430-marostegui.json
06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T321391)', diff saved to https://phabricator.wikimedia.org/P43085 and previous config saved to /var/cache/conftool/dbconfig/20230112-065208-marostegui.json
06:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1128.eqiad.wmnet with reason: Maintenance
06:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1128.eqiad.wmnet with reason: Maintenance
06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T321391)', diff saved to https://phabricator.wikimedia.org/P43084 and previous config saved to /var/cache/conftool/dbconfig/20230112-065147-marostegui.json
06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P43083 and previous config saved to /var/cache/conftool/dbconfig/20230112-063640-marostegui.json
06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P43082 and previous config saved to /var/cache/conftool/dbconfig/20230112-062134-marostegui.json
06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T321391)', diff saved to https://phabricator.wikimedia.org/P43081 and previous config saved to /var/cache/conftool/dbconfig/20230112-060627-marostegui.json
06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T321391)', diff saved to https://phabricator.wikimedia.org/P43080 and previous config saved to /var/cache/conftool/dbconfig/20230112-060404-marostegui.json
06:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
06:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T321391)', diff saved to https://phabricator.wikimedia.org/P43079 and previous config saved to /var/cache/conftool/dbconfig/20230112-060343-marostegui.json
05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P43078 and previous config saved to /var/cache/conftool/dbconfig/20230112-054837-marostegui.json
05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P43077 and previous config saved to /var/cache/conftool/dbconfig/20230112-053330-marostegui.json
05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T321391)', diff saved to https://phabricator.wikimedia.org/P43076 and previous config saved to /var/cache/conftool/dbconfig/20230112-051823-marostegui.json
05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1107 (T321391)', diff saved to https://phabricator.wikimedia.org/P43075 and previous config saved to /var/cache/conftool/dbconfig/20230112-051601-marostegui.json
05:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1107.eqiad.wmnet with reason: Maintenance
05:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1107.eqiad.wmnet with reason: Maintenance
05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T321391)', diff saved to https://phabricator.wikimedia.org/P43074 and previous config saved to /var/cache/conftool/dbconfig/20230112-051539-marostegui.json
05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P43073 and previous config saved to /var/cache/conftool/dbconfig/20230112-050033-marostegui.json
04:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P43072 and previous config saved to /var/cache/conftool/dbconfig/20230112-044526-marostegui.json
04:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T321391)', diff saved to https://phabricator.wikimedia.org/P43071 and previous config saved to /var/cache/conftool/dbconfig/20230112-043020-marostegui.json
04:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T321391)', diff saved to https://phabricator.wikimedia.org/P43070 and previous config saved to /var/cache/conftool/dbconfig/20230112-042757-marostegui.json
04:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
04:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
04:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
04:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
04:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43069 and previous config saved to /var/cache/conftool/dbconfig/20230112-042741-marostegui.json
04:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P43068 and previous config saved to /var/cache/conftool/dbconfig/20230112-041234-marostegui.json
03:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P43067 and previous config saved to /var/cache/conftool/dbconfig/20230112-035727-marostegui.json
03:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43066 and previous config saved to /var/cache/conftool/dbconfig/20230112-034221-marostegui.json
03:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43065 and previous config saved to /var/cache/conftool/dbconfig/20230112-033958-marostegui.json
03:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
03:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
03:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43064 and previous config saved to /var/cache/conftool/dbconfig/20230112-033937-marostegui.json
03:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P43063 and previous config saved to /var/cache/conftool/dbconfig/20230112-032430-marostegui.json
03:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P43062 and previous config saved to /var/cache/conftool/dbconfig/20230112-030924-marostegui.json
02:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43061 and previous config saved to /var/cache/conftool/dbconfig/20230112-025417-marostegui.json
02:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43060 and previous config saved to /var/cache/conftool/dbconfig/20230112-025153-marostegui.json
02:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
02:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
02:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
02:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
02:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T321391)', diff saved to https://phabricator.wikimedia.org/P43059 and previous config saved to /var/cache/conftool/dbconfig/20230112-020046-marostegui.json
01:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P43058 and previous config saved to /var/cache/conftool/dbconfig/20230112-014539-marostegui.json
01:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P43057 and previous config saved to /var/cache/conftool/dbconfig/20230112-013033-marostegui.json
01:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T321391)', diff saved to https://phabricator.wikimedia.org/P43056 and previous config saved to /var/cache/conftool/dbconfig/20230112-011526-marostegui.json
01:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T321391)', diff saved to https://phabricator.wikimedia.org/P43055 and previous config saved to /var/cache/conftool/dbconfig/20230112-011302-marostegui.json
01:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
01:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
01:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T321391)', diff saved to https://phabricator.wikimedia.org/P43054 and previous config saved to /var/cache/conftool/dbconfig/20230112-011241-marostegui.json
00:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P43053 and previous config saved to /var/cache/conftool/dbconfig/20230112-005734-marostegui.json
00:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P43052 and previous config saved to /var/cache/conftool/dbconfig/20230112-004228-marostegui.json
00:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T321391)', diff saved to https://phabricator.wikimedia.org/P43051 and previous config saved to /var/cache/conftool/dbconfig/20230112-002721-marostegui.json
00:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T321391)', diff saved to https://phabricator.wikimedia.org/P43050 and previous config saved to /var/cache/conftool/dbconfig/20230112-002457-marostegui.json
00:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
00:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
00:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T321391)', diff saved to https://phabricator.wikimedia.org/P43049 and previous config saved to /var/cache/conftool/dbconfig/20230112-002436-marostegui.json
00:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43048 and previous config saved to /var/cache/conftool/dbconfig/20230112-000929-marostegui.json

2023-01-11

23:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43047 and previous config saved to /var/cache/conftool/dbconfig/20230111-235423-marostegui.json
23:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T321391)', diff saved to https://phabricator.wikimedia.org/P43045 and previous config saved to /var/cache/conftool/dbconfig/20230111-233916-marostegui.json
23:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T321391)', diff saved to https://phabricator.wikimedia.org/P43044 and previous config saved to /var/cache/conftool/dbconfig/20230111-233652-marostegui.json
23:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
23:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
23:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
23:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
23:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43043 and previous config saved to /var/cache/conftool/dbconfig/20230111-233616-marostegui.json
23:22 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.18 refs T325581 (duration: 06m 57s)
23:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P43042 and previous config saved to /var/cache/conftool/dbconfig/20230111-232109-marostegui.json
23:15 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.18 refs T325581
23:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P43041 and previous config saved to /var/cache/conftool/dbconfig/20230111-230603-marostegui.json
22:51 zabe@deploy1002: Finished scap: Backport for gerrit:879055 Start reading from cuc_actor on group0 and group1 wikis (T233004), gerrit:879148 Start writing to rev_comment_id on group0 wikis (T299954), gerrit:879057 Stop writing to cul_user and cul_user_text on testwiki (T233004) (duration: 09m 28s)
22:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43040 and previous config saved to /var/cache/conftool/dbconfig/20230111-225056-marostegui.json
22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43039 and previous config saved to /var/cache/conftool/dbconfig/20230111-224832-marostegui.json
22:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
22:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43038 and previous config saved to /var/cache/conftool/dbconfig/20230111-224810-marostegui.json
22:44 zabe@deploy1002: zabe and zabe: Backport for gerrit:879055 Start reading from cuc_actor on group0 and group1 wikis (T233004), gerrit:879148 Start writing to rev_comment_id on group0 wikis (T299954), gerrit:879057 Stop writing to cul_user and cul_user_text on testwiki (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
22:42 zabe@deploy1002: Started scap: Backport for gerrit:879055 Start reading from cuc_actor on group0 and group1 wikis (T233004), gerrit:879148 Start writing to rev_comment_id on group0 wikis (T299954), gerrit:879057 Stop writing to cul_user and cul_user_text on testwiki (T233004)
22:40 effie: upload memkeys_20181031-2~bullseye0_ on bullseye-wikimedia
22:39 kindrobot: close UTC late backport window
22:38 kindrobot@deploy1002: Finished scap: Backport for gerrit:878154 Fix exception in `<gallery mode="slideshow">` with missing images, gerrit:879100 Fix phan error when Excimer is enabled, gerrit:879098 Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063 T326399), gerrit:879099 Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T30106
22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P43037 and previous config saved to /var/cache/conftool/dbconfig/20230111-223304-marostegui.json
22:21 kindrobot@deploy1002: kindrobot and matmarex: Backport for gerrit:878154 Fix exception in `<gallery mode="slideshow">` with missing images, gerrit:879100 Fix phan error when Excimer is enabled, gerrit:879098 Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063 T326399), gerrit:879099 Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view
22:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P43036 and previous config saved to /var/cache/conftool/dbconfig/20230111-221757-marostegui.json
22:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43035 and previous config saved to /var/cache/conftool/dbconfig/20230111-220251-marostegui.json
22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43034 and previous config saved to /var/cache/conftool/dbconfig/20230111-220026-marostegui.json
22:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
22:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T321391)', diff saved to https://phabricator.wikimedia.org/P43033 and previous config saved to /var/cache/conftool/dbconfig/20230111-220005-marostegui.json
21:58 kindrobot@deploy1002: Started scap: Backport for gerrit:878154 Fix exception in `<gallery mode="slideshow">` with missing images, gerrit:879100 Fix phan error when Excimer is enabled, gerrit:879098 Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063 T326399), gerrit:879099 Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063
21:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P43031 and previous config saved to /var/cache/conftool/dbconfig/20230111-214458-marostegui.json
21:34 kindrobot@deploy1002: Finished scap: Backport for gerrit:879094 Fix mustache template rendering when TOC is rerendered after an edit (T326682), gerrit:879121 Enable page tools on beta cluster (duration: 10m 17s)
21:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P43030 and previous config saved to /var/cache/conftool/dbconfig/20230111-212952-marostegui.json
21:25 kindrobot@deploy1002: kindrobot and jdrewniak and jdlrobson: Backport for gerrit:879094 Fix mustache template rendering when TOC is rerendered after an edit (T326682), gerrit:879121 Enable page tools on beta cluster synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
21:23 kindrobot@deploy1002: Started scap: Backport for gerrit:879094 Fix mustache template rendering when TOC is rerendered after an edit (T326682), gerrit:879121 Enable page tools on beta cluster
21:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T321391)', diff saved to https://phabricator.wikimedia.org/P43029 and previous config saved to /var/cache/conftool/dbconfig/20230111-211445-marostegui.json
21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T321391)', diff saved to https://phabricator.wikimedia.org/P43028 and previous config saved to /var/cache/conftool/dbconfig/20230111-211222-marostegui.json
21:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
21:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T321391)', diff saved to https://phabricator.wikimedia.org/P43027 and previous config saved to /var/cache/conftool/dbconfig/20230111-211200-marostegui.json
21:06 kindrobot: start UTC late backport window
20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P43025 and previous config saved to /var/cache/conftool/dbconfig/20230111-205654-marostegui.json
20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P43024 and previous config saved to /var/cache/conftool/dbconfig/20230111-204147-marostegui.json
20:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43023 and previous config saved to /var/cache/conftool/dbconfig/20230111-203141-root.json
20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T321391)', diff saved to https://phabricator.wikimedia.org/P43022 and previous config saved to /var/cache/conftool/dbconfig/20230111-202641-marostegui.json
20:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T321391)', diff saved to https://phabricator.wikimedia.org/P43021 and previous config saved to /var/cache/conftool/dbconfig/20230111-202417-marostegui.json
20:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
20:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
20:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T321391)', diff saved to https://phabricator.wikimedia.org/P43020 and previous config saved to /var/cache/conftool/dbconfig/20230111-202345-marostegui.json
20:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43019 and previous config saved to /var/cache/conftool/dbconfig/20230111-201636-root.json
20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P43018 and previous config saved to /var/cache/conftool/dbconfig/20230111-200838-marostegui.json
20:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43017 and previous config saved to /var/cache/conftool/dbconfig/20230111-200131-root.json
19:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P43016 and previous config saved to /var/cache/conftool/dbconfig/20230111-195332-marostegui.json
19:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43015 and previous config saved to /var/cache/conftool/dbconfig/20230111-194626-root.json
19:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T321391)', diff saved to https://phabricator.wikimedia.org/P43014 and previous config saved to /var/cache/conftool/dbconfig/20230111-193825-marostegui.json
19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T321391)', diff saved to https://phabricator.wikimedia.org/P43013 and previous config saved to /var/cache/conftool/dbconfig/20230111-193601-marostegui.json
19:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
19:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
19:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
19:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
19:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T321391)', diff saved to https://phabricator.wikimedia.org/P43012 and previous config saved to /var/cache/conftool/dbconfig/20230111-193506-marostegui.json
19:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43011 and previous config saved to /var/cache/conftool/dbconfig/20230111-193121-root.json
19:20 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
19:20 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
19:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P43010 and previous config saved to /var/cache/conftool/dbconfig/20230111-192000-marostegui.json
19:19 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
19:19 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
19:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43009 and previous config saved to /var/cache/conftool/dbconfig/20230111-191616-root.json
19:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P43008 and previous config saved to /var/cache/conftool/dbconfig/20230111-190453-marostegui.json
19:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43007 and previous config saved to /var/cache/conftool/dbconfig/20230111-190111-root.json
18:57 marostegui: dbmaint deploy schema change with replication on s3 eqiad T321391
18:52 brett: Removing legacy vips from dns servers - T239993
18:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T321391)', diff saved to https://phabricator.wikimedia.org/P43006 and previous config saved to /var/cache/conftool/dbconfig/20230111-184946-marostegui.json
18:47 marostegui: dbmaint deploy schema change with replication on s2 eqiad T321391
18:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T321391)', diff saved to https://phabricator.wikimedia.org/P43005 and previous config saved to /var/cache/conftool/dbconfig/20230111-184723-marostegui.json
18:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
18:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
18:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T321391)', diff saved to https://phabricator.wikimedia.org/P43004 and previous config saved to /var/cache/conftool/dbconfig/20230111-184701-marostegui.json
18:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P43003 and previous config saved to /var/cache/conftool/dbconfig/20230111-184051-root.json
18:36 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@5a19b9d]: drop-snapshots: Accept snapshot= partition from any level (duration: 02m 33s)
18:33 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@5a19b9d]: drop-snapshots: Accept snapshot= partition from any level
18:33 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
18:32 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
18:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P43002 and previous config saved to /var/cache/conftool/dbconfig/20230111-183155-marostegui.json
18:30 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
18:30 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
18:28 bblack: repool eqsin edge DC
18:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P43001 and previous config saved to /var/cache/conftool/dbconfig/20230111-182546-root.json
18:22 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
18:22 btullis@cumin1001: Added views for new wiki: blkwiki T310872
18:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P43000 and previous config saved to /var/cache/conftool/dbconfig/20230111-181648-marostegui.json
18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42999 and previous config saved to /var/cache/conftool/dbconfig/20230111-181041-root.json
18:09 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
18:09 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
18:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
18:08 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
18:07 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
18:02 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
18:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T321391)', diff saved to https://phabricator.wikimedia.org/P42998 and previous config saved to /var/cache/conftool/dbconfig/20230111-180142-marostegui.json
18:01 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
17:59 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
17:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T321391)', diff saved to https://phabricator.wikimedia.org/P42997 and previous config saved to /var/cache/conftool/dbconfig/20230111-175919-marostegui.json
17:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
17:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
17:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T321391)', diff saved to https://phabricator.wikimedia.org/P42996 and previous config saved to /var/cache/conftool/dbconfig/20230111-175857-marostegui.json
17:58 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
17:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
17:55 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42995 and previous config saved to /var/cache/conftool/dbconfig/20230111-175536-root.json
17:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
17:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
17:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P42994 and previous config saved to /var/cache/conftool/dbconfig/20230111-174351-marostegui.json
17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 10%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42993 and previous config saved to /var/cache/conftool/dbconfig/20230111-174031-root.json
17:40 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
17:39 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
17:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
17:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P42992 and previous config saved to /var/cache/conftool/dbconfig/20230111-172844-marostegui.json
17:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
17:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
17:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 5%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42991 and previous config saved to /var/cache/conftool/dbconfig/20230111-172526-root.json
17:21 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
17:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
17:21 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
17:20 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
17:18 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
17:18 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
17:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T321391)', diff saved to https://phabricator.wikimedia.org/P42989 and previous config saved to /var/cache/conftool/dbconfig/20230111-171338-marostegui.json
17:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2112 (T321391)', diff saved to https://phabricator.wikimedia.org/P42988 and previous config saved to /var/cache/conftool/dbconfig/20230111-171114-marostegui.json
17:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
17:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
17:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2102.codfw.wmnet with reason: Maintenance
17:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2102.codfw.wmnet with reason: Maintenance
17:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 1%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42987 and previous config saved to /var/cache/conftool/dbconfig/20230111-171021-root.json
17:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
17:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
17:04 marostegui: dbmaint deploy schema change with replication on s7 eqiad T321391
17:03 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
17:03 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
16:38 marostegui: dbmaint deploy schema change with replication on s5 eqiad T321391
16:31 marostegui: dbmaint deploy schema change with replication on s4 eqiad T321391
16:25 marostegui: dbmaint deploy schema change with replication on s8 eqiad T321391
16:22 marostegui: dbmaint deploy schema change with replication on s6 eqiad T321391
16:06 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:06 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after eqsin outage is over - volans@cumin1001"
16:05 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after eqsin outage is over - volans@cumin1001"
16:03 volans@cumin1001: START - Cookbook sre.dns.netbox
16:01 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host mc1038.eqiad.wmnet with OS bullseye
16:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:53 zabe@deploy1002: Finished scap: T233004 (duration: 07m 54s)
15:45 zabe@deploy1002: Started scap: T233004
15:38 zabe@deploy1002: backport aborted: (duration: 04m 25s)
15:38 zabe@deploy1002: sync-world aborted: Backport for gerrit:878870 Start reading from cul_actor everywhere (T233004) (duration: 04m 00s)
15:36 zabe@deploy1002: zabe and zabe: Backport for gerrit:878870 Start reading from cul_actor everywhere (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
15:34 zabe@deploy1002: Started scap: Backport for gerrit:878870 Start reading from cul_actor everywhere (T233004)
15:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
15:21 marostegui: Stop mariadb on db1106 to reclone db1206 (there will be lag on s1 on wikireplicas) T326669
15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P42982 and previous config saved to /var/cache/conftool/dbconfig/20230111-151712-marostegui.json
14:56 pt1979@cumin2002: START - Cookbook sre.dns.netbox
14:47 Lucas_WMDE: UTC afternoon backport+config window done
14:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1005.eqiad.wmnet with OS bullseye
14:46 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
14:46 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.18/extensions/Wikibase/repo/tests/jest/wikibase.vector.searchClient.spec.js: Backport: gerrit:877972 Add missing parentheses to vector search match text (T326633) (2/2) (duration: 06m 46s)
14:42 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
14:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.18/extensions/Wikibase/repo/resources/wikibase.vector.searchClient.js: Backport: gerrit:877972 Add missing parentheses to vector search match text (T326633) (1/2) (duration: 07m 09s)
14:28 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for gerrit:877983 Fix test constructing HTMLFormField without parent (T326621) (duration: 08m 38s)
14:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
14:22 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
14:21 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and lucaswerkmeister-wmde: Backport for gerrit:877983 Fix test constructing HTMLFormField without parent (T326621) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
14:19 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for gerrit:877983 Fix test constructing HTMLFormField without parent (T326621)
14:14 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
14:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
14:10 moritzm: installing postgresql 11 security updates on maps/eqiad
14:06 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bullseye
14:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1004.eqiad.wmnet with OS bullseye
14:02 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
14:01 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
13:55 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
13:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37002
13:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 37002
13:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3302
13:45 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
13:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3302
13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9584
13:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9584
13:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35753
13:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35753
13:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
13:35 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
13:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast6002.wikimedia.org
13:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
13:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
13:12 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) bast6002.wikimedia.org on all recursors
13:11 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast6002.wikimedia.org on all recursors
13:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast6002.wikimedia.org - jmm@cumin2002"
13:11 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast6002.wikimedia.org - jmm@cumin2002"
13:07 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1038.eqiad.wmnet with OS bullseye
13:03 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc1038.eqiad.wmnet with OS bullseye
13:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
13:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast6002.wikimedia.org
12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast4004.wikimedia.org
12:42 moritzm: installing postgresql 11 security updates on maps/codfw
12:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8849
12:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8849
12:35 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) bast4004.wikimedia.org on all recursors
12:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast4004.wikimedia.org on all recursors
12:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4004.wikimedia.org - jmm@cumin2002"
12:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4004.wikimedia.org - jmm@cumin2002"
12:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 56630
12:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 56630
12:24 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
12:24 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
12:18 jmm@cumin2002: START - Cookbook sre.dns.netbox
12:18 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast4004.wikimedia.org
12:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
12:13 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
12:10 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1004.eqiad.wmnet with OS bullseye
12:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1003.eqiad.wmnet with OS bullseye
12:10 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
12:08 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
11:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1003.eqiad.wmnet with reason: host reimage
11:51 claime: repooled mw1486 in api_appserver eqiad after hardware investigation - T326425
11:50 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1003.eqiad.wmnet with reason: host reimage
11:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1486.eqiad.wmnet
11:50 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw1486.eqiad.wmnet
11:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast3006.wikimedia.org
11:47 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1486.eqiad.wmnet
11:38 cgoubert@cumin1001: conftool action : set/pooled=yes:weight=10; selector: cluster=aux-k8s,service=kubesvc
11:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
11:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
11:30 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) bast3006.wikimedia.org on all recursors
11:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast3006.wikimedia.org on all recursors
11:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:29 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3006.wikimedia.org - jmm@cumin2002"
11:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3006.wikimedia.org - jmm@cumin2002"
11:22 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
11:22 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
11:21 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1038.eqiad.wmnet with OS bullseye
11:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:19 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast3006.wikimedia.org
11:16 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
11:15 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
11:15 btullis@cumin1001: END (FAIL) - Cookbook sre.druid.reboot-workers (exit_code=99) for Druid test cluster: Reboot Druid nodes
11:12 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1003.eqiad.wmnet with OS bullseye
10:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1001.eqiad.wmnet with OS bullseye
10:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
10:34 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
10:31 zabe@deploy1002: Finished scap: Backport for gerrit:878160 Simplify expensive check (T326690), gerrit:877249 Start reading from cuc_actor on test wikis (T233004) (duration: 09m 34s)
10:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw1486.eqiad.wmnet with reason: hardware troubleshooting
10:24 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw1486.eqiad.wmnet with reason: hardware troubleshooting
10:23 btullis@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid test cluster: Reboot Druid nodes
10:23 zabe@deploy1002: zabe and zabe: Backport for gerrit:878160 Simplify expensive check (T326690), gerrit:877249 Start reading from cuc_actor on test wikis (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
10:21 zabe@deploy1002: Started scap: Backport for gerrit:878160 Simplify expensive check (T326690), gerrit:877249 Start reading from cuc_actor on test wikis (T233004)
10:18 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bullseye
10:16 moritzm: installing postgresql-11 security updates
10:02 XioNoX: asw1-eqsin> request system reboot all-members - T316532
09:49 moritzm: installing python3.7 security updates
08:31 kartik@deploy1002: Finished scap: Backport for gerrit:877223 CX: Fix transformation of TranslationUnitDTO to custom array (T326278) (duration: 11m 45s)
08:21 kartik@deploy1002: kartik and kartik: Backport for gerrit:877223 CX: Fix transformation of TranslationUnitDTO to custom array (T326278) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
08:20 kartik@deploy1002: Started scap: Backport for gerrit:877223 CX: Fix transformation of TranslationUnitDTO to custom array (T326278)
05:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1003.eqiad.wmnet
05:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
05:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
05:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1003.eqiad.wmnet

2023-01-10

23:58 krinkle@deploy1002: Finished deploy [integration/docroot@b7c82a3]: (no justification provided) (duration: 00m 15s)
23:58 krinkle@deploy1002: Started deploy [integration/docroot@b7c82a3]: (no justification provided)
23:46 mutante: cumin2002 - sudo systemctl status httpbb_hourly_appserver
23:30 zabe@deploy1002: Finished scap: Backport for gerrit:878207 Start writing to rev_comment_id on test wikis (T299954) (duration: 09m 39s)
23:22 zabe@deploy1002: zabe and zabe: Backport for gerrit:878207 Start writing to rev_comment_id on test wikis (T299954) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
23:21 zabe@deploy1002: Started scap: Backport for gerrit:878207 Start writing to rev_comment_id on test wikis (T299954)
22:42 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.18 refs T325581
22:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
22:28 jhuneidi@deploy1002: Pruned MediaWiki: 1.40.0-wmf.14, 1.40.0-wmf.13 (duration: 02m 35s)
22:21 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.18 refs T325581 (duration: 45m 04s)
22:10 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
22:09 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
22:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206 T325046', diff saved to https://phabricator.wikimedia.org/P42980 and previous config saved to /var/cache/conftool/dbconfig/20230110-220942-marostegui.json
22:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
22:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1002.eqiad.wmnet
21:54 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
21:54 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1002.eqiad.wmnet
21:54 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
21:52 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
21:36 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.18 refs T325581
21:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
21:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
21:20 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
21:19 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
21:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42979 and previous config saved to /var/cache/conftool/dbconfig/20230110-211826-root.json
21:18 zabe@deploy1002: Finished scap: Backport for gerrit:878168 Use new DiscussionTools heading markup on group2 wikis except enwiki (T314714), gerrit:878187 Start reading from cul_actor on group1 wikis (T233004) (duration: 10m 08s)
21:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1040.eqiad.wmnet
21:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2055.codfw.wmnet
21:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2055.codfw.wmnet
21:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1040.eqiad.wmnet
21:09 zabe@deploy1002: zabe and zabe and matmarex: Backport for gerrit:878168 Use new DiscussionTools heading markup on group2 wikis except enwiki (T314714), gerrit:878187 Start reading from cul_actor on group1 wikis (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
21:08 zabe@deploy1002: Started scap: Backport for gerrit:878168 Use new DiscussionTools heading markup on group2 wikis except enwiki (T314714), gerrit:878187 Start reading from cul_actor on group1 wikis (T233004)
21:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42978 and previous config saved to /var/cache/conftool/dbconfig/20230110-210321-root.json
20:55 mutante: repooling eqsin
20:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42977 and previous config saved to /var/cache/conftool/dbconfig/20230110-204816-root.json
20:37 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:37 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:33 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42976 and previous config saved to /var/cache/conftool/dbconfig/20230110-203311-root.json
20:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:31 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:29 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:26 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:26 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:18 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P42975 and previous config saved to /var/cache/conftool/dbconfig/20230110-201807-ladsgroup.json
20:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42974 and previous config saved to /var/cache/conftool/dbconfig/20230110-201806-root.json
20:08 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
20:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
20:08 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1038.eqiad.wmnet
20:07 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:06 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2054.codfw.wmnet
20:04 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:04 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P42972 and previous config saved to /var/cache/conftool/dbconfig/20230110-200302-ladsgroup.json
20:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42971 and previous config saved to /var/cache/conftool/dbconfig/20230110-200301-root.json
20:02 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 01m 42s)
20:01 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:01 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:00 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
20:00 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1038.eqiad.wmnet
19:58 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2054.codfw.wmnet
19:52 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 01m 06s)
19:51 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
19:49 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
19:49 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P42970 and previous config saved to /var/cache/conftool/dbconfig/20230110-194757-ladsgroup.json
19:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42969 and previous config saved to /var/cache/conftool/dbconfig/20230110-194756-root.json
19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P42968 and previous config saved to /var/cache/conftool/dbconfig/20230110-194750-ladsgroup.json
19:43 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
19:42 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
19:39 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
19:38 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
19:38 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
19:38 dancy@deploy1002: Installation of scap version "4.32.0" completed for 1 hosts
19:37 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
19:37 dancy@deploy1002: Installing scap version "4.32.0" for 1 hosts
19:35 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P42965 and previous config saved to /var/cache/conftool/dbconfig/20230110-193253-ladsgroup.json
19:32 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P42964 and previous config saved to /var/cache/conftool/dbconfig/20230110-193245-ladsgroup.json
19:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
19:31 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
19:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
19:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
19:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
19:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
19:31 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
19:30 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1158 maint', diff saved to https://phabricator.wikimedia.org/P42963 and previous config saved to /var/cache/conftool/dbconfig/20230110-192929-ladsgroup.json
19:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P42962 and previous config saved to /var/cache/conftool/dbconfig/20230110-191740-ladsgroup.json
19:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2053.codfw.wmnet
19:08 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2053.codfw.wmnet
19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P42958 and previous config saved to /var/cache/conftool/dbconfig/20230110-190235-ladsgroup.json
19:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
19:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
18:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2052.codfw.wmnet
18:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2052.codfw.wmnet
18:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2002.codfw.wmnet with OS bullseye
18:35 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2001.codfw.wmnet with OS bullseye
18:29 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes-staging,service=kubesvc
18:23 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagemaster2001.codfw.wmnet with OS bullseye
18:23 jayme@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes-staging,service=kubesvc
18:21 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
18:20 jayme@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=kubernetes-staging,service=kubesvc
18:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
18:16 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
18:16 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
18:09 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2001.codfw.wmnet with reason: host reimage
18:06 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2001.codfw.wmnet with reason: host reimage
18:01 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage2002.codfw.wmnet with OS bullseye
18:01 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage2001.codfw.wmnet with OS bullseye
17:55 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagemaster2001.codfw.wmnet with OS bullseye
17:51 zabe: run populateCulActor on all wikis # T325484
17:48 claime: Finished rolling reboots of eqiad appservers
17:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
17:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
17:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1130 maint', diff saved to https://phabricator.wikimedia.org/P42956 and previous config saved to /var/cache/conftool/dbconfig/20230110-173807-ladsgroup.json
17:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 T325652', diff saved to https://phabricator.wikimedia.org/P42955 and previous config saved to /var/cache/conftool/dbconfig/20230110-173027-marostegui.json
17:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P42954 and previous config saved to /var/cache/conftool/dbconfig/20230110-173002-ladsgroup.json
17:29 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 11s)
17:28 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
17:28 ayounsi@deploy1002: deploy aborted: help (duration: 00m 01s)
17:28 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: help
17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P42953 and previous config saved to /var/cache/conftool/dbconfig/20230110-171457-ladsgroup.json
17:14 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:10 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:03 ayounsi@deploy1002: deploy aborted: netbox-next to 3.2.9 (duration: 00m 07s)
17:03 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P42952 and previous config saved to /var/cache/conftool/dbconfig/20230110-165952-ladsgroup.json
16:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After the incident', diff saved to https://phabricator.wikimedia.org/P42951 and previous config saved to /var/cache/conftool/dbconfig/20230110-165406-root.json
16:48 bblack: depooling eqsin from DNS
16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P42950 and previous config saved to /var/cache/conftool/dbconfig/20230110-164447-ladsgroup.json
16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After the incident', diff saved to https://phabricator.wikimedia.org/P42949 and previous config saved to /var/cache/conftool/dbconfig/20230110-163901-root.json
16:36 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2003.codfw.wmnet with OS bullseye
16:24 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2001.codfw.wmnet with OS bullseye
16:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After the incident', diff saved to https://phabricator.wikimedia.org/P42948 and previous config saved to /var/cache/conftool/dbconfig/20230110-162356-root.json
16:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd2003.codfw.wmnet with reason: host reimage
16:21 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd2003.codfw.wmnet with reason: host reimage
16:14 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2002.codfw.wmnet with OS bullseye
16:10 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After the incident', diff saved to https://phabricator.wikimedia.org/P42947 and previous config saved to /var/cache/conftool/dbconfig/20230110-160851-root.json
16:08 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:08 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2003.codfw.wmnet with OS bullseye
16:04 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd2002.codfw.wmnet with reason: host reimage
16:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
16:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
16:01 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd2002.codfw.wmnet with reason: host reimage
15:59 SandraEbele: reran failed pageview-druid-hourly-coord oozie job for 2023-1-10-10.
15:59 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:58 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw[1373,1384-1385,1387].eqiad.wmnet
15:55 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw[1373,1384-1385,1387].eqiad.wmnet
15:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After the incident', diff saved to https://phabricator.wikimedia.org/P42946 and previous config saved to /var/cache/conftool/dbconfig/20230110-155346-root.json
15:52 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2002.codfw.wmnet with OS bullseye
15:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: After the incident', diff saved to https://phabricator.wikimedia.org/P42945 and previous config saved to /var/cache/conftool/dbconfig/20230110-153841-root.json
15:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2051.codfw.wmnet
15:30 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
15:29 claime: Restarting rolling reboots of eqiad appservers
15:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2051.codfw.wmnet
15:25 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:25 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 1%: After the incident', diff saved to https://phabricator.wikimedia.org/P42944 and previous config saved to /var/cache/conftool/dbconfig/20230110-152336-root.json
15:21 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader2001.codfw.wmnet
15:17 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host search-loader2001.codfw.wmnet
15:14 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd2001.codfw.wmnet with reason: host reimage
15:11 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd2001.codfw.wmnet with reason: host reimage
15:09 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2050.codfw.wmnet
15:02 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2001.codfw.wmnet with OS bullseye
15:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2037.codfw.wmnet
15:01 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:01 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2037.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
14:56 XioNoX: start VC link maintenance in eqiad - T325803
14:55 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2001.codfw.wmnet with OS bullseye
14:55 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2001.codfw.wmnet with OS bullseye
14:53 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader1001.eqiad.wmnet
14:49 zabe: UTC afternoon deploys done
14:49 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host search-loader1001.eqiad.wmnet
14:48 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2037.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
14:47 zabe@deploy1002: Finished scap: Backport for gerrit:878021 Start reading from cul_actor on remaining test wikis and group0 wikis (T233004) (duration: 08m 59s)
14:46 jiji@cumin1001: START - Cookbook sre.dns.netbox
14:40 zabe@deploy1002: zabe and zabe: Backport for gerrit:878021 Start reading from cul_actor on remaining test wikis and group0 wikis (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:38 zabe@deploy1002: Started scap: Backport for gerrit:878021 Start reading from cul_actor on remaining test wikis and group0 wikis (T233004)
14:36 zabe: run populateCulActor on group0 wikis # T325484
14:35 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2050.codfw.wmnet
14:35 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2037.codfw.wmnet
14:34 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host apifeatureusage2001.codfw.wmnet
14:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2036.codfw.wmnet
14:33 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:33 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2036.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
14:28 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2036.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
14:28 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2001.codfw.wmnet with OS bullseye
14:28 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2001.codfw.wmnet with OS bullseye
14:26 jiji@cumin1001: START - Cookbook sre.dns.netbox
14:25 zabe@deploy1002: Finished scap: Backport for gerrit:877268 [config]: GDI Safety Survey Wave 4 (T325136) (duration: 17m 42s)
14:21 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host apifeatureusage2001.codfw.wmnet
14:19 claime: Pausing reboots of eqiad appservers for deployments
14:18 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw[1369-1372].eqiad.wmnet
14:18 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw[1369-1372].eqiad.wmnet
14:14 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apifeatureusage1001.eqiad.wmnet
14:11 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2036.codfw.wmnet
14:10 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
14:09 zabe@deploy1002: zabe and essexigyan: Backport for gerrit:877268 [config]: GDI Safety Survey Wave 4 (T325136) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
14:07 zabe@deploy1002: Started scap: Backport for gerrit:877268 [config]: GDI Safety Survey Wave 4 (T325136)
14:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-codfw with k8s 1.23
14:06 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host apifeatureusage1001.eqiad.wmnet
14:06 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-codfw with k8s 1.23
14:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2035.codfw.wmnet
14:03 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:03 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2035.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
13:49 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2035.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
13:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1002.eqiad.wmnet with OS bullseye
13:46 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
13:46 jiji@cumin1001: START - Cookbook sre.dns.netbox
13:44 godog: delete grafana dashboards from "sre dashboards for deletion" folder - T178690
13:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2049.codfw.wmnet
13:37 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2035.codfw.wmnet
13:36 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2049.codfw.wmnet
13:34 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
13:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2001.wikimedia.org
13:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2001.wikimedia.org
13:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1002.eqiad.wmnet with reason: host reimage
13:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1002.eqiad.wmnet with reason: host reimage
13:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetdb-test2001.codfw.wmnet
13:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetdb-test2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:59 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
12:59 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1002.eqiad.wmnet with OS bullseye
12:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetdb-test2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:53 jmm@cumin2002: START - Cookbook sre.dns.netbox
12:50 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
12:50 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
12:50 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
12:50 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts puppetdb-test2001.codfw.wmnet
12:49 claime: Starting rolling reboot of eqiad appservers
12:47 btullis@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes
12:36 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
12:34 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1002.eqiad.wmnet with OS bullseye
12:31 oblivian@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
12:31 oblivian@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
12:31 oblivian@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
12:31 oblivian@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
12:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2048.codfw.wmnet
12:19 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2048.codfw.wmnet
12:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2034.codfw.wmnet
12:18 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:18 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2034.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
12:12 claime: Finished rolling reboot of eqiad jobrunners
12:07 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
12:06 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
12:06 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
12:05 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
12:02 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2034.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
11:59 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
11:58 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:57 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:57 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:56 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:53 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:52 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:48 jiji@cumin1001: START - Cookbook sre.dns.netbox
11:35 btullis@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes
11:33 btullis@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes
11:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2047.codfw.wmnet
11:00 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2047.codfw.wmnet
11:00 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2034.codfw.wmnet
10:31 godog: upgrade thanos to 0.30.1 on thanos-fe2* - T303154
10:24 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
10:23 btullis@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes
10:21 claime: Starting rolling reboot of eqiad jobrunners
10:21 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:18 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1002.eqiad.wmnet with OS bullseye
10:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2046.codfw.wmnet
10:14 claime: repooled parse1002.eqiad.wmnet - T326119
10:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1002.eqiad.wmnet
10:13 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1002.eqiad.wmnet
10:07 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2046.codfw.wmnet
10:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2033.codfw.wmnet
10:06 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:06 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2033.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
10:02 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2033.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
09:59 cgoubert@cumin1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1002.eqiad.wmnet
09:55 godog: upgrade thanos to 0.30.1 on prometheus hosts - T303154
09:53 moritzm: installing systemd bugfix updates from Bullseye point release
09:45 aqu@deploy1002: Finished deploy [airflow-dags/analytics@9568478]: Fix bug fix in HDFS usage pipeline [airflow-dags@9568478] (duration: 00m 13s)
09:45 aqu@deploy1002: Started deploy [airflow-dags/analytics@9568478]: Fix bug fix in HDFS usage pipeline [airflow-dags@9568478]
09:43 godog: upgrade thanos to 0.30.1 on thanos-fe100[2-3] - T303154
09:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@9568478]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@9568478] (duration: 00m 11s)
09:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
09:34 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@9568478]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@9568478]
09:25 XioNoX: repool ulsfo (maintenance cancelled) - T316532
09:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2045.codfw.wmnet
09:22 taavi: added zabe to wmf-deployment gerrit group T326327
09:19 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2033.codfw.wmnet
09:18 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2045.codfw.wmnet
09:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2032.codfw.wmnet
09:17 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:17 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2032.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
09:15 kart_: Done: UTC morning backport window
09:14 kartik@deploy1002: Finished scap: Backport for gerrit:877219 CX: Fix transformation of TranslationUnitDTO to custom array (T326278) (duration: 09m 20s)
09:07 kartik@deploy1002: kartik and kartik: Backport for gerrit:877219 CX: Fix transformation of TranslationUnitDTO to custom array (T326278) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
09:05 kartik@deploy1002: Started scap: Backport for gerrit:877219 CX: Fix transformation of TranslationUnitDTO to custom array (T326278)
08:58 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2032.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
08:56 godog: upgrade thanos to 0.30.1 on thanos-fe1001 - T303154
08:54 godog: upgrade thanos to 0.30.1 on prometheus2006 - T303154
08:49 kartik@deploy1002: Finished scap: Backport for gerrit:877138 CX: Fix usage of categories translation unit as array (T326278) (duration: 12m 08s)
08:38 kartik@deploy1002: kartik and kartik: Backport for gerrit:877138 CX: Fix usage of categories translation unit as array (T326278) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
08:37 kartik@deploy1002: Started scap: Backport for gerrit:877138 CX: Fix usage of categories translation unit as array (T326278)
08:20 kartik@deploy1002: Finished scap: Backport for gerrit:875192 ContentTranslation: Increase MT threshold for publishing in cswiki by 20% (T324721) (duration: 17m 21s)
08:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
08:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
08:08 kartik@deploy1002: kartik and kartik: Backport for gerrit:875192 ContentTranslation: Increase MT threshold for publishing in cswiki by 20% (T324721) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
08:03 kartik@deploy1002: Started scap: Backport for gerrit:875192 ContentTranslation: Increase MT threshold for publishing in cswiki by 20% (T324721)
08:02 jiji@cumin1001: START - Cookbook sre.dns.netbox
07:45 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2032.codfw.wmnet
07:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2031.codfw.wmnet
07:37 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:37 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2031.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
07:36 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2031.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
07:33 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host mc2044.codfw.wmnet
07:28 XioNoX: depool ulsfo for network maintenance - T316532
07:27 jiji@cumin1001: START - Cookbook sre.dns.netbox
07:22 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2031.codfw.wmnet
07:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2044.codfw.wmnet
07:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: check if dns update is needed after change of rec-dns-lb IPs status - ayounsi@cumin1001"
07:14 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: check if dns update is needed after change of rec-dns-lb IPs status - ayounsi@cumin1001"
07:11 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
07:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
07:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
07:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1130 T326133', diff saved to https://phabricator.wikimedia.org/P42941 and previous config saved to /var/cache/conftool/dbconfig/20230110-070628-ladsgroup.json
07:03 XioNoX: remove static routes for legacy dns-rec-lb IPs - T239993
07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1100 to s5 primary and set section read-write T326133', diff saved to https://phabricator.wikimedia.org/P42940 and previous config saved to /var/cache/conftool/dbconfig/20230110-070223-ladsgroup.json
07:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T326133', diff saved to https://phabricator.wikimedia.org/P42939 and previous config saved to /var/cache/conftool/dbconfig/20230110-070152-ladsgroup.json
07:01 Amir1: Starting s5 eqiad failover from db1130 to db1100 - T326133
06:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1100 with weight 0 T326133', diff saved to https://phabricator.wikimedia.org/P42938 and previous config saved to /var/cache/conftool/dbconfig/20230110-062309-ladsgroup.json
06:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T326133
06:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T326133
05:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Sync idm-test1001 - slyngshede@cumin1001"
05:38 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Sync idm-test1001 - slyngshede@cumin1001"
03:14 eileen: civicrm upgraded from 391e8482 to 9afd2789
03:12 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
02:46 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
02:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
02:08 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
01:50 krinkle@deploy1002: Finished deploy [integration/docroot@f59119c]: (no justification provided) (duration: 00m 14s)
01:50 krinkle@deploy1002: Started deploy [integration/docroot@f59119c]: (no justification provided)
01:28 eileen: civicrm upgraded from e3405a4e to 391e8482
00:48 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: plugin upgrade - bking@cumin1001 - T324247

2023-01-09

22:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
22:33 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: plugin upgrade - bking@cumin1001 - T324247
22:32 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: plugin upgrade - bking@cumin1001 - T324247
22:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
22:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2030.codfw.wmnet
22:25 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:25 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2030.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
22:15 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2030.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
22:11 jiji@cumin1001: START - Cookbook sre.dns.netbox
22:05 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2030.codfw.wmnet
22:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2029.codfw.wmnet
22:03 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:03 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2029.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
22:00 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2029.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
21:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2042.codfw.wmnet
21:52 kindrobot: close UTC late backport window
21:50 jiji@cumin1001: START - Cookbook sre.dns.netbox
21:47 kindrobot@deploy1002: Sync cancelled.
21:47 kindrobot@deploy1002: kindrobot and trainbranchbot: Backport for gerrit:877260 Revert "[config]: Deploy GDI Safety Survey Wave 4" synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2042.codfw.wmnet
21:46 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: plugin upgrade - bking@cumin1001 - T324247
21:45 kindrobot@deploy1002: Started scap: Backport for gerrit:877260 Revert "[config]: Deploy GDI Safety Survey Wave 4"
21:39 kindrobot@deploy1002: Sync cancelled.
21:38 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2029.codfw.wmnet
21:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2027.codfw.wmnet
21:37 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:37 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2027.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
21:34 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade - bking@cumin1001 - T324247
21:29 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2027.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
21:27 jiji@cumin1001: START - Cookbook sre.dns.netbox
21:26 kindrobot@deploy1002: kindrobot and essexigyan: Backport for gerrit:877197 [config]: Deploy GDI Safety Survey Wave 4 (T325136) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:24 kindrobot@deploy1002: Started scap: Backport for gerrit:877197 [config]: Deploy GDI Safety Survey Wave 4 (T325136)
21:21 kindrobot: starting UTC late backport window
21:21 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2027.codfw.wmnet
21:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2026.codfw.wmnet
21:18 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:18 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2026.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
21:09 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2026.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
21:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P42936 and previous config saved to /var/cache/conftool/dbconfig/20230109-210940-marostegui.json
21:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
21:03 jiji@cumin1001: START - Cookbook sre.dns.netbox
20:57 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
20:57 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2026.codfw.wmnet
20:52 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade - bking@cumin1001 - T324247
20:52 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
20:44 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
20:44 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
20:44 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
20:36 Amir1: deleting global usage coming from commons in commons (T322588)
20:36 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
20:35 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
20:34 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
20:33 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
20:25 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
20:24 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
20:21 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
20:20 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
20:20 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
20:20 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
19:37 bblack: cp5032: set param transit_buffer=1M via varnishadm
19:33 bblack: cp5032: set param transit_buffer=4M via varnishadm
19:26 bblack: cp5032: set param transit_buffer=1M via varnishadm
19:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2025.codfw.wmnet
19:22 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:22 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2025.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
19:15 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2025.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
19:11 jiji@cumin1001: START - Cookbook sre.dns.netbox
19:05 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2025.codfw.wmnet
19:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2024.codfw.wmnet
19:04 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:04 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2024.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
19:00 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2024.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
18:57 jiji@cumin1001: START - Cookbook sre.dns.netbox
18:48 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2024.codfw.wmnet
18:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2023.codfw.wmnet
18:43 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:43 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2023.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
18:41 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2023.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
18:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2040.codfw.wmnet
18:36 jiji@cumin1001: START - Cookbook sre.dns.netbox
18:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2040.codfw.wmnet
18:30 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2023.codfw.wmnet
18:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2022.codfw.wmnet
18:07 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:07 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2022.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
18:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
18:02 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2022.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
18:00 jiji@cumin1001: START - Cookbook sre.dns.netbox
17:56 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2022.codfw.wmnet
17:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2039.codfw.wmnet
17:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2039.codfw.wmnet
17:46 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc2021.codfw.wmnet
17:46 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:46 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2021.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
17:42 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
17:41 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
17:41 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
17:41 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
17:36 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2021.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
17:35 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
17:35 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
17:34 claime: Finished codfw jobrunner rolling reboot
17:32 jiji@cumin1001: START - Cookbook sre.dns.netbox
17:31 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
16:59 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2021.codfw.wmnet
16:49 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
16:48 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
16:46 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc2020.codfw.wmnet
16:46 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:46 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2020.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
16:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2038.codfw.wmnet
16:40 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2038.codfw.wmnet
16:40 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2020.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
16:32 jiji@cumin1001: START - Cookbook sre.dns.netbox
16:11 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2020.codfw.wmnet
16:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2019.codfw.wmnet
16:11 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:11 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2019.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
16:08 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2019.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
16:04 XioNoX: start VC link maintenance in eqiad - T325803
16:03 jiji@cumin1001: START - Cookbook sre.dns.netbox
15:58 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2019.codfw.wmnet
15:37 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
15:37 claime: Starting codfw jobrunner rolling reboot
15:35 Lucas_WMDE: UTC afternoon backport+config window done
15:34 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for gerrit:877139 CX: Allow composer/installers plugin (duration: 10m 03s)
15:29 claime: Not starting codfw jobrunner rolling reboot, deploy in progress
15:28 claime: Starting codfw jobrunner rolling reboot
15:26 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and kartik: Backport for gerrit:877139 CX: Allow composer/installers plugin synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
15:24 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for gerrit:877139 CX: Allow composer/installers plugin
15:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps2009.codfw.wmnet,maps1009.eqiad.wmnet with reason: Removing redis service
15:17 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps2009.codfw.wmnet,maps1009.eqiad.wmnet with reason: Removing redis service
15:11 effie: disable puppet on all 'P:mediawiki::mcrouter_wancache' hosts to merge 875894
15:09 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for gerrit:876311 extwiki: Install SandboxLink extension (T326450) (duration: 08m 37s)
15:09 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2004.codfw.wmnet
15:04 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry2004.codfw.wmnet
15:02 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for gerrit:876311 extwiki: Install SandboxLink extension (T326450) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
15:00 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for gerrit:876311 extwiki: Install SandboxLink extension (T326450)
15:00 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2003.codfw.wmnet
14:59 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ echo 'https://en.wikipedia.org/static/images/project-logos/jawikisource.png' | mwscript purgeList.php # T326488
14:56 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for gerrit:876364 jawikisource: Update project logo and wordmark (T326488) (duration: 09m 24s)
14:55 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry2003.codfw.wmnet
14:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1004.eqiad.wmnet
14:48 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for gerrit:876364 jawikisource: Update project logo and wordmark (T326488) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
14:47 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry1004.eqiad.wmnet
14:47 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for gerrit:876364 jawikisource: Update project logo and wordmark (T326488)
14:45 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for gerrit:876310 arwiki: Create extendedmover group (T326434) (duration: 08m 56s)
14:38 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for gerrit:876310 arwiki: Create extendedmover group (T326434) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
14:36 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for gerrit:876310 arwiki: Create extendedmover group (T326434)
14:31 godog: upgrade thanos to 0.30.1 on prometheus2005 - T303154
14:27 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for gerrit:871286 mediawikiwiki: Disable Flow on new pages by default (T325907) (duration: 18m 19s)
14:19 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for gerrit:871286 mediawikiwiki: Disable Flow on new pages by default (T325907) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
14:09 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for gerrit:871286 mediawikiwiki: Disable Flow on new pages by default (T325907)
13:55 moritzm: installing systemd bugfix updates from Bullseye point release
13:41 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1003.eqiad.wmnet
13:36 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry1003.eqiad.wmnet
13:35 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
13:35 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad
12:53 hnowlan@deploy1002: Finished deploy [restbase/deploy@bcb0a69]: New wikis T321284 T321290 T321296 T326140 (duration: 18m 56s)
12:34 hnowlan@deploy1002: Started deploy [restbase/deploy@bcb0a69]: New wikis T321284 T321290 T321296 T326140
12:18 vgutierrez: repool cp5025
11:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15954
11:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15954
11:29 vgutierrez: restart purged on cp5025
11:28 vgutierrez: depool cp5025 due to purging issues
11:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum1001.eqiad.wmnet
11:19 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host chartmuseum1001.eqiad.wmnet
11:06 XioNoX: repool ulsfo - T316532
11:01 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
10:55 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:55 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
10:54 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=eqiad
10:54 claime: Starting codfw appserver rolling reboot
10:54 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=codfw
10:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
10:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum2001.codfw.wmnet
10:51 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode1001.eqiad.wmnet
10:49 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host chartmuseum2001.codfw.wmnet
10:49 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode1001.eqiad.wmnet
10:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode2001.codfw.wmnet
10:46 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
10:46 effie: switching maps to eqiad
10:45 moritzm: installing avahi security updates
10:44 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode2001.codfw.wmnet
10:41 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=codfw
09:35 dcausse: restarting blazegraph on wdqs1006 (BlazegraphFreeAllocatorsDecreasingRapidly)
09:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
09:04 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
08:58 moritzm: installing glibc security updates
08:56 XioNoX: depool ulsfo for network maintenance - T316532
08:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 327700
08:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 327700
08:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 48237
08:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 48237
08:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32035
08:21 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idm-test1001.wikimedia.org
08:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32035
08:12 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idm-test1001.wikimedia.org on all recursors
08:12 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache idm-test1001.wikimedia.org on all recursors
08:12 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:12 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm-test1001.wikimedia.org - slyngshede@cumin1001"
08:08 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm-test1001.wikimedia.org - slyngshede@cumin1001"
08:06 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
08:06 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host idm-test1001.wikimedia.org

2023-01-06

18:57 mutante: systemctl start docker-gc on all gitlab-runners via cumin T310593
18:56 mutante: gitlab-runner1002 - systemctl start docker-gc; run puppet on all gitlab-runners T310593
18:49 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: debugging
18:49 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on 6 hosts with reason: debugging
18:36 sukhe: pool cp5032 [bullseye upgrade completed]: T325797
18:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5032.eqsin.wmnet,service=ats-be
18:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5032.eqsin.wmnet,service=cdn
18:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on mw1486.eqiad.wmnet with reason: downtimed, hw failure: T326425
18:20 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on mw1486.eqiad.wmnet with reason: downtimed, hw failure: T326425
18:13 Krinkle: krinkle@cloudweb1003$ Run `UPDATE actor SET actor_user=31136 WHERE actor_id=14640;` to partially fix T326431
17:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5032.eqsin.wmnet with OS bullseye
17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5032.eqsin.wmnet with reason: host reimage
17:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5032.eqsin.wmnet with reason: host reimage
16:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye
16:53 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5032.eqsin.wmnet with OS bullseye
16:26 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye
16:18 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5032.eqsin.wmnet with OS bullseye
16:05 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error
16:05 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error
15:54 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1486.eqiad.wmnet
15:53 claime: depooling mw1486.eqiad.wmnet for hardware troubleshooting
15:31 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye
15:30 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5032.eqsin.wmnet with OS bullseye
15:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye
15:08 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts cp5032.eqsin.wmnet
15:08 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp5032.eqsin.wmnet
15:07 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5032.eqsin.wmnet,service=ats-be
15:07 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5032.eqsin.wmnet,service=cdn
15:07 sukhe: depool cp5032 for bullseye upgrade (starting with NIC firmware upgrade): T325797
14:42 jbond: remove bgpalerter from apt
14:06 reedy@deploy1002: Synchronized php-1.40.0-wmf.17/extensions/SecurePoll/cli/wm-scripts/ucoc2023/populateEditCount.php: T326408 (duration: 07m 09s)
12:42 stevemunene@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
12:36 tzatziki: running extensions/SecurePoll/cli/wm-scripts/ucoc2023/ucoc2023_tables.sql on each wiki
12:29 stevemunene@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
11:38 jbond: upload bgpalerter to bullseye-wikimedia
11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2113.codfw.wmnet with reason: Maintenance
11:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2113.codfw.wmnet with reason: Maintenance
11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1130.eqiad.wmnet with reason: Maintenance
11:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1130.eqiad.wmnet with reason: Maintenance
10:10 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 21245
10:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 21245
09:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36994
09:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36994
09:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 266925
09:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 266925
09:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9038
09:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9038
09:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5713
09:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5713
09:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37473
09:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 37473
09:03 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 4788
09:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4788
09:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32035
09:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32035
09:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15954
09:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15954
09:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 60427
09:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 60427
09:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58717
09:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58717
09:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45489
08:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45489
08:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 24482
08:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 24482
08:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9119
08:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9119
08:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049
08:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 64049
08:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 263237
08:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 263237
08:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 51185
08:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 51185
08:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 201746
08:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 201746
08:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62597
08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62597
08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 327700
08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 327700
08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 56630
08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 56630
08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21245
08:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21245
08:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37282
08:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 37282
08:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37558
08:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 37558
08:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13113
08:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13113
08:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 41095
08:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 41095
08:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61573
08:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 61573
08:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21320
08:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21320
08:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 39405
08:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 39405
08:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 48237
08:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 48237
08:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 47794
08:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 47794
08:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22822
08:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22822
08:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58715
08:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58715
08:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 51254
08:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 51254
08:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35432
08:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35432
08:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 132602
08:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 132602
08:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 42473
08:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 42473
08:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16347
08:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16347
08:05 XioNoX: drmrs offload Vodafone from Tata - T324955
01:08 urbanecm@deploy1002: Finished scap: Backport for gerrit:876051 Revert "GlobalRename: Convert DB selects to use SelectQueryBuilder" (T326377 T312394) (duration: 08m 48s)
01:01 urbanecm@deploy1002: urbanecm and urbanecm: Backport for gerrit:876051 Revert "GlobalRename: Convert DB selects to use SelectQueryBuilder" (T326377 T312394) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
00:59 urbanecm@deploy1002: Started scap: Backport for gerrit:876051 Revert "GlobalRename: Convert DB selects to use SelectQueryBuilder" (T326377 T312394)
00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T326156)', diff saved to https://phabricator.wikimedia.org/P42928 and previous config saved to /var/cache/conftool/dbconfig/20230106-004102-ladsgroup.json
00:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42927 and previous config saved to /var/cache/conftool/dbconfig/20230106-002556-ladsgroup.json
00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42926 and previous config saved to /var/cache/conftool/dbconfig/20230106-001049-ladsgroup.json

2023-01-05

23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T326156)', diff saved to https://phabricator.wikimedia.org/P42925 and previous config saved to /var/cache/conftool/dbconfig/20230105-235543-ladsgroup.json
23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T326156)', diff saved to https://phabricator.wikimedia.org/P42924 and previous config saved to /var/cache/conftool/dbconfig/20230105-235325-ladsgroup.json
23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42923 and previous config saved to /var/cache/conftool/dbconfig/20230105-235304-ladsgroup.json
23:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42922 and previous config saved to /var/cache/conftool/dbconfig/20230105-233758-ladsgroup.json
23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42921 and previous config saved to /var/cache/conftool/dbconfig/20230105-232251-ladsgroup.json
23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42920 and previous config saved to /var/cache/conftool/dbconfig/20230105-230745-ladsgroup.json
23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42919 and previous config saved to /var/cache/conftool/dbconfig/20230105-230629-ladsgroup.json
23:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
23:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T326156)', diff saved to https://phabricator.wikimedia.org/P42918 and previous config saved to /var/cache/conftool/dbconfig/20230105-230607-ladsgroup.json
22:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42917 and previous config saved to /var/cache/conftool/dbconfig/20230105-225101-ladsgroup.json
22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42916 and previous config saved to /var/cache/conftool/dbconfig/20230105-223554-ladsgroup.json
22:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T326156)', diff saved to https://phabricator.wikimedia.org/P42915 and previous config saved to /var/cache/conftool/dbconfig/20230105-222048-ladsgroup.json
22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T326156)', diff saved to https://phabricator.wikimedia.org/P42914 and previous config saved to /var/cache/conftool/dbconfig/20230105-221932-ladsgroup.json
22:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
22:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42913 and previous config saved to /var/cache/conftool/dbconfig/20230105-221911-ladsgroup.json
22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42912 and previous config saved to /var/cache/conftool/dbconfig/20230105-220404-ladsgroup.json
21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42911 and previous config saved to /var/cache/conftool/dbconfig/20230105-214858-ladsgroup.json
21:43 TheresNoTime: closing UTC late backport window
21:42 samtar@deploy1002: Finished scap: Backport for gerrit:865097 Turn off wgNavigationTimingOversampleFactor campaigns (T286703) (duration: 08m 45s)
21:35 samtar@deploy1002: samtar and krinkle: Backport for gerrit:865097 Turn off wgNavigationTimingOversampleFactor campaigns (T286703) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:33 samtar@deploy1002: Started scap: Backport for gerrit:865097 Turn off wgNavigationTimingOversampleFactor campaigns (T286703)
21:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42910 and previous config saved to /var/cache/conftool/dbconfig/20230105-213351-ladsgroup.json
21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42909 and previous config saved to /var/cache/conftool/dbconfig/20230105-213235-ladsgroup.json
21:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2137.codfw.wmnet with reason: Maintenance
21:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2137.codfw.wmnet with reason: Maintenance
21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T326156)', diff saved to https://phabricator.wikimedia.org/P42908 and previous config saved to /var/cache/conftool/dbconfig/20230105-213214-ladsgroup.json
21:31 samtar@deploy1002: Finished scap: Backport for gerrit:875915 actions: Actually store CommentFormatter in McrUndoAction (T326336) (duration: 10m 31s)
21:23 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
21:23 samtar@deploy1002: samtar and zabe: Backport for gerrit:875915 actions: Actually store CommentFormatter in McrUndoAction (T326336) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:21 samtar@deploy1002: Started scap: Backport for gerrit:875915 actions: Actually store CommentFormatter in McrUndoAction (T326336)
21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42907 and previous config saved to /var/cache/conftool/dbconfig/20230105-211707-ladsgroup.json
21:16 samtar@deploy1002: Finished scap: Backport for gerrit:875438 Start writing to cuc_comment_id everywhere (T233004) (duration: 10m 07s)
21:08 samtar@deploy1002: samtar and zabe: Backport for gerrit:875438 Start writing to cuc_comment_id everywhere (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:06 samtar@deploy1002: Started scap: Backport for gerrit:875438 Start writing to cuc_comment_id everywhere (T233004)
21:04 samtar@deploy1002: backport aborted: (duration: 01m 22s)
21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42906 and previous config saved to /var/cache/conftool/dbconfig/20230105-210201-ladsgroup.json
20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T326156)', diff saved to https://phabricator.wikimedia.org/P42905 and previous config saved to /var/cache/conftool/dbconfig/20230105-204654-ladsgroup.json
20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T326156)', diff saved to https://phabricator.wikimedia.org/P42904 and previous config saved to /var/cache/conftool/dbconfig/20230105-204438-ladsgroup.json
20:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
20:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
20:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2128.codfw.wmnet with reason: Maintenance
20:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2128.codfw.wmnet with reason: Maintenance
20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T326156)', diff saved to https://phabricator.wikimedia.org/P42903 and previous config saved to /var/cache/conftool/dbconfig/20230105-204403-ladsgroup.json
20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42902 and previous config saved to /var/cache/conftool/dbconfig/20230105-202856-ladsgroup.json
20:17 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@9568478]: Bumping platform_eng airflow instance to latest (duration: 00m 09s)
20:17 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@9568478]: Bumping platform_eng airflow instance to latest
20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42901 and previous config saved to /var/cache/conftool/dbconfig/20230105-201350-ladsgroup.json
19:59 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.17 refs T325580
19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T326156)', diff saved to https://phabricator.wikimedia.org/P42900 and previous config saved to /var/cache/conftool/dbconfig/20230105-195843-ladsgroup.json
19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T326156)', diff saved to https://phabricator.wikimedia.org/P42899 and previous config saved to /var/cache/conftool/dbconfig/20230105-195627-ladsgroup.json
19:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2123.codfw.wmnet with reason: Maintenance
19:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2123.codfw.wmnet with reason: Maintenance
19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T326156)', diff saved to https://phabricator.wikimedia.org/P42898 and previous config saved to /var/cache/conftool/dbconfig/20230105-195606-ladsgroup.json
19:48 taavi@deploy1002: Finished scap: Backport for gerrit:875379 actions: Pass CommentFormatter to McrRestoreAction (T326275) (duration: 10m 11s)
19:41 taavi@deploy1002: taavi and zabe: Backport for gerrit:875379 actions: Pass CommentFormatter to McrRestoreAction (T326275) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42897 and previous config saved to /var/cache/conftool/dbconfig/20230105-194059-ladsgroup.json
19:38 sukhe: reprepro -C main include bullseye-wikimedia varnish_6.0.10-1wm3_amd64.changes: T325797
19:37 taavi@deploy1002: Started scap: Backport for gerrit:875379 actions: Pass CommentFormatter to McrRestoreAction (T326275)
19:31 Amir1: creating new cu tables
19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42896 and previous config saved to /var/cache/conftool/dbconfig/20230105-192553-ladsgroup.json
19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T326156)', diff saved to https://phabricator.wikimedia.org/P42895 and previous config saved to /var/cache/conftool/dbconfig/20230105-191046-ladsgroup.json
19:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T326156)', diff saved to https://phabricator.wikimedia.org/P42894 and previous config saved to /var/cache/conftool/dbconfig/20230105-190830-ladsgroup.json
19:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2111.codfw.wmnet with reason: Maintenance
19:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2111.codfw.wmnet with reason: Maintenance
19:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2101.codfw.wmnet with reason: Maintenance
19:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2101.codfw.wmnet with reason: Maintenance
19:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
19:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T326156)', diff saved to https://phabricator.wikimedia.org/P42893 and previous config saved to /var/cache/conftool/dbconfig/20230105-190724-ladsgroup.json
18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42892 and previous config saved to /var/cache/conftool/dbconfig/20230105-185217-ladsgroup.json
18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42891 and previous config saved to /var/cache/conftool/dbconfig/20230105-183711-ladsgroup.json
18:22 taavi: delete some nostalgiawiki pages using maintenance/deleteBatch.php for T326334
18:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T326156)', diff saved to https://phabricator.wikimedia.org/P42890 and previous config saved to /var/cache/conftool/dbconfig/20230105-182204-ladsgroup.json
18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T326156)', diff saved to https://phabricator.wikimedia.org/P42889 and previous config saved to /var/cache/conftool/dbconfig/20230105-181949-ladsgroup.json
18:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
18:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T326156)', diff saved to https://phabricator.wikimedia.org/P42888 and previous config saved to /var/cache/conftool/dbconfig/20230105-181928-ladsgroup.json
18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42887 and previous config saved to /var/cache/conftool/dbconfig/20230105-180421-ladsgroup.json
17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42886 and previous config saved to /var/cache/conftool/dbconfig/20230105-174915-ladsgroup.json
17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T326156)', diff saved to https://phabricator.wikimedia.org/P42885 and previous config saved to /var/cache/conftool/dbconfig/20230105-173408-ladsgroup.json
17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T326156)', diff saved to https://phabricator.wikimedia.org/P42884 and previous config saved to /var/cache/conftool/dbconfig/20230105-173154-ladsgroup.json
17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T326156)', diff saved to https://phabricator.wikimedia.org/P42883 and previous config saved to /var/cache/conftool/dbconfig/20230105-173133-ladsgroup.json
17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42882 and previous config saved to /var/cache/conftool/dbconfig/20230105-171626-ladsgroup.json
17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42880 and previous config saved to /var/cache/conftool/dbconfig/20230105-170119-ladsgroup.json
16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T326156)', diff saved to https://phabricator.wikimedia.org/P42878 and previous config saved to /var/cache/conftool/dbconfig/20230105-164612-ladsgroup.json
16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T326156)', diff saved to https://phabricator.wikimedia.org/P42877 and previous config saved to /var/cache/conftool/dbconfig/20230105-164358-ladsgroup.json
16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42876 and previous config saved to /var/cache/conftool/dbconfig/20230105-164258-ladsgroup.json
16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42875 and previous config saved to /var/cache/conftool/dbconfig/20230105-162751-ladsgroup.json
16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42874 and previous config saved to /var/cache/conftool/dbconfig/20230105-161245-ladsgroup.json
16:05 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
16:04 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
16:04 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
16:03 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42873 and previous config saved to /var/cache/conftool/dbconfig/20230105-155738-ladsgroup.json
15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42872 and previous config saved to /var/cache/conftool/dbconfig/20230105-155524-ladsgroup.json
15:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance
15:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance
15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42871 and previous config saved to /var/cache/conftool/dbconfig/20230105-155503-ladsgroup.json
15:52 matthiasmullie: UTC afternoon backports done
15:51 mlitn@deploy1002: Finished scap: Backport for gerrit:875907 Fix URL construction (duration: 12m 21s)
15:41 mlitn@deploy1002: mlitn and mlitn: Backport for gerrit:875907 Fix URL construction synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42870 and previous config saved to /var/cache/conftool/dbconfig/20230105-153956-ladsgroup.json
15:39 mlitn@deploy1002: Started scap: Backport for gerrit:875907 Fix URL construction
15:37 mlitn@deploy1002: Finished scap: Backport for gerrit:875907 Fix URL construction (duration: 08m 04s)
15:31 mlitn@deploy1002: mlitn and mlitn: Backport for gerrit:875907 Fix URL construction synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
15:29 mlitn@deploy1002: Started scap: Backport for gerrit:875907 Fix URL construction
15:26 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
15:26 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42869 and previous config saved to /var/cache/conftool/dbconfig/20230105-152447-ladsgroup.json
15:22 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
15:14 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
15:14 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
15:10 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
15:10 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
15:10 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42868 and previous config saved to /var/cache/conftool/dbconfig/20230105-150939-ladsgroup.json
15:09 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42867 and previous config saved to /var/cache/conftool/dbconfig/20230105-150825-ladsgroup.json
15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1113.eqiad.wmnet with reason: Maintenance
15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1113.eqiad.wmnet with reason: Maintenance
15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T326156)', diff saved to https://phabricator.wikimedia.org/P42866 and previous config saved to /var/cache/conftool/dbconfig/20230105-150804-ladsgroup.json
14:58 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
14:58 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
14:56 claime: hard resetting mw1486
14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42865 and previous config saved to /var/cache/conftool/dbconfig/20230105-145257-ladsgroup.json
14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42864 and previous config saved to /var/cache/conftool/dbconfig/20230105-143751-ladsgroup.json
14:30 mlitn@deploy1002: Finished scap: Backport for gerrit:875908 Also get central description (T325831) (duration: 08m 32s)
14:23 mlitn@deploy1002: mlitn and mlitn: Backport for gerrit:875908 Also get central description (T325831) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T326156)', diff saved to https://phabricator.wikimedia.org/P42862 and previous config saved to /var/cache/conftool/dbconfig/20230105-142244-ladsgroup.json
14:21 mlitn@deploy1002: Started scap: Backport for gerrit:875908 Also get central description (T325831)
14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T326156)', diff saved to https://phabricator.wikimedia.org/P42861 and previous config saved to /var/cache/conftool/dbconfig/20230105-142029-ladsgroup.json
14:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1110.eqiad.wmnet with reason: Maintenance
14:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1110.eqiad.wmnet with reason: Maintenance
14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T326156)', diff saved to https://phabricator.wikimedia.org/P42860 and previous config saved to /var/cache/conftool/dbconfig/20230105-142008-ladsgroup.json
14:17 mlitn@deploy1002: Finished scap: Backport for gerrit:875906 Also get central description (T325831) (duration: 07m 57s)
14:11 mlitn@deploy1002: mlitn and mlitn: Backport for gerrit:875906 Also get central description (T325831) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:09 mlitn@deploy1002: Started scap: Backport for gerrit:875906 Also get central description (T325831)
14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42859 and previous config saved to /var/cache/conftool/dbconfig/20230105-140501-ladsgroup.json
13:58 Amir1: start of externallinks migration in elwiki (and rest of large wikis in s3) (T326314)
13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42858 and previous config saved to /var/cache/conftool/dbconfig/20230105-134955-ladsgroup.json
13:46 ladsgroup@deploy1002: Finished scap: Backport for gerrit:875892 Enable write both for externallinks in ten largest s3 wikis (T321662) (duration: 08m 54s)
13:42 urbanecm: aswikiquote: Run importDump.php to import an XML dump (per new wiki importers request, running into issues with a largish page)
13:39 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for gerrit:875892 Enable write both for externallinks in ten largest s3 wikis (T321662) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
13:38 XioNoX: start [eqiad] faulty VC optics maintenance - T325803
13:37 ladsgroup@deploy1002: Started scap: Backport for gerrit:875892 Enable write both for externallinks in ten largest s3 wikis (T321662)
13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T326156)', diff saved to https://phabricator.wikimedia.org/P42857 and previous config saved to /var/cache/conftool/dbconfig/20230105-133448-ladsgroup.json
13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T326156)', diff saved to https://phabricator.wikimedia.org/P42856 and previous config saved to /var/cache/conftool/dbconfig/20230105-133234-ladsgroup.json
13:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1100.eqiad.wmnet with reason: Maintenance
13:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1100.eqiad.wmnet with reason: Maintenance
13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42855 and previous config saved to /var/cache/conftool/dbconfig/20230105-133211-ladsgroup.json
13:30 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
13:29 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
13:21 effie: enable puppet on all mw servers
13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42854 and previous config saved to /var/cache/conftool/dbconfig/20230105-131705-ladsgroup.json
13:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
13:03 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
13:03 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
13:03 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
13:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
13:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
13:02 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
13:02 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
13:02 oblivian@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
13:02 oblivian@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42853 and previous config saved to /var/cache/conftool/dbconfig/20230105-130158-ladsgroup.json
13:02 oblivian@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
13:01 oblivian@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
13:01 oblivian@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
13:01 oblivian@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
13:01 oblivian@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
13:01 oblivian@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
13:01 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
13:01 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
13:01 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
13:00 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
13:00 hashar: Restarted Gerrit for a plugin update
12:58 hashar@deploy1002: Finished deploy [gerrit/gerrit@b1ae5b4]: wm-checks-api: fix PCC handling of empty messages (duration: 00m 08s)
12:58 hashar@deploy1002: Started deploy [gerrit/gerrit@b1ae5b4]: wm-checks-api: fix PCC handling of empty messages
12:52 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
12:49 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
12:49 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
12:48 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42852 and previous config saved to /var/cache/conftool/dbconfig/20230105-124651-ladsgroup.json
12:45 hashar@deploy1002: Finished deploy [gerrit/gerrit@b1ae5b4]: wm-checks-api: fix PCC handling of empty messages (duration: 00m 10s)
12:45 hashar@deploy1002: Started deploy [gerrit/gerrit@b1ae5b4]: wm-checks-api: fix PCC handling of empty messages
12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42851 and previous config saved to /var/cache/conftool/dbconfig/20230105-124437-ladsgroup.json
12:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1096.eqiad.wmnet with reason: Maintenance
12:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1096.eqiad.wmnet with reason: Maintenance
12:44 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
12:42 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
12:42 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
12:31 ladsgroup: Deployed security patch for T233004 T326293
12:02 hashar: gerrit: running `copy-approvals` script to prepare for Gerrit 3.6 upgrade (T309870): `ssh -p 29418 gerrit.wikimedia.org gerrit copy-approvals --verbose`
11:59 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
11:58 hashar: Restarting Gerrit
11:57 hashar@deploy1002: Finished deploy [gerrit/gerrit@32f984a]: wm-checks-api: add support for Puppet Catalogue Compiler (duration: 00m 09s)
11:57 hashar@deploy1002: Started deploy [gerrit/gerrit@32f984a]: wm-checks-api: add support for Puppet Catalogue Compiler
11:57 hashar: Stopping Gerrit for plugin deployment
11:45 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
11:40 effie: disabling puppet on all hosts running mcrouter to merge 860102
11:24 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mwdebug,name=eqiad
11:23 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:23 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=mwdebug,name=eqiad
11:23 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:22 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mwdebug,name=codfw
11:20 hashar@deploy1002: Finished deploy [gerrit/gerrit@32f984a]: wm-checks-api: add support for Puppet Catalogue Compiler (duration: 00m 10s)
11:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:20 hashar@deploy1002: Started deploy [gerrit/gerrit@32f984a]: wm-checks-api: add support for Puppet Catalogue Compiler
11:19 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:19 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=mwdebug,name=codfw
11:14 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
11:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:13 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
11:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
11:13 cgoubert@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
11:13 cgoubert@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
11:12 cgoubert@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:12 cgoubert@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42850 and previous config saved to /var/cache/conftool/dbconfig/20230105-105808-root.json
10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42849 and previous config saved to /var/cache/conftool/dbconfig/20230105-104303-root.json
10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42848 and previous config saved to /var/cache/conftool/dbconfig/20230105-102758-root.json
10:26 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:26 claime: Rolling reboot of api_appserver hosts in eqiad
10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 100%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42847 and previous config saved to /var/cache/conftool/dbconfig/20230105-102357-root.json
10:22 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42846 and previous config saved to /var/cache/conftool/dbconfig/20230105-101253-root.json
10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 75%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42845 and previous config saved to /var/cache/conftool/dbconfig/20230105-100852-root.json
10:07 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:06 claime: Restarting rolling reboot of api_appserver hosts in codfw
09:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42844 and previous config saved to /var/cache/conftool/dbconfig/20230105-095748-root.json
09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 60%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42843 and previous config saved to /var/cache/conftool/dbconfig/20230105-095347-root.json
09:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42841 and previous config saved to /var/cache/conftool/dbconfig/20230105-094243-root.json
09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 50%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42840 and previous config saved to /var/cache/conftool/dbconfig/20230105-093842-root.json
09:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 1%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42839 and previous config saved to /var/cache/conftool/dbconfig/20230105-092738-root.json
09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 40%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42838 and previous config saved to /var/cache/conftool/dbconfig/20230105-092336-root.json
09:14 XioNoX: turn up BGP to NTT in drmrs - T314929
09:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 25%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42837 and previous config saved to /var/cache/conftool/dbconfig/20230105-090831-root.json
08:56 hashar@deploy1002: Finished scap: Backport for gerrit:830877 [SearchVue] Enable extension on ptwiki, ruwiki & idwiki (T310367) (duration: 11m 38s)
08:46 hashar@deploy1002: hashar and mlitn: Backport for gerrit:830877 [SearchVue] Enable extension on ptwiki, ruwiki & idwiki (T310367) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
08:44 hashar@deploy1002: Started scap: Backport for gerrit:830877 [SearchVue] Enable extension on ptwiki, ruwiki & idwiki (T310367)
07:58 moritzm: installing glibc security updates on bullseye
07:50 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db2151 in s6 T326206', diff saved to https://phabricator.wikimedia.org/P42836 and previous config saved to /var/cache/conftool/dbconfig/20230105-075046-marostegui.json
07:28 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
07:27 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
07:26 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
07:25 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 to clone db1176 T326211', diff saved to https://phabricator.wikimedia.org/P42833 and previous config saved to /var/cache/conftool/dbconfig/20230105-064153-marostegui.json
06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2151 for the first time in s6 T326206', diff saved to https://phabricator.wikimedia.org/P42832 and previous config saved to /var/cache/conftool/dbconfig/20230105-063937-marostegui.json
06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance
06:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance

2023-01-04

23:01 mutante: deploy2002 - re-arming keyholder T324014
23:00 mutante: deploy1002 - re-arming keyholder T324014
22:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T326011)', diff saved to https://phabricator.wikimedia.org/P42831 and previous config saved to /var/cache/conftool/dbconfig/20230104-223545-marostegui.json
22:27 kindrobot: finished UTC late backport window
22:27 kindrobot@deploy1002: Finished scap: Backport for gerrit:875371 Fix underlinkedness rescore logic (T301096), gerrit:875372 Fix underlinkedness rescore logic (T301096) (duration: 15m 20s)
22:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P42828 and previous config saved to /var/cache/conftool/dbconfig/20230104-222038-marostegui.json
22:13 kindrobot@deploy1002: kindrobot and tgr: Backport for gerrit:875371 Fix underlinkedness rescore logic (T301096), gerrit:875372 Fix underlinkedness rescore logic (T301096) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
22:11 kindrobot@deploy1002: Started scap: Backport for gerrit:875371 Fix underlinkedness rescore logic (T301096), gerrit:875372 Fix underlinkedness rescore logic (T301096)
22:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P42827 and previous config saved to /var/cache/conftool/dbconfig/20230104-220532-marostegui.json
21:51 kindrobot@deploy1002: backport aborted: (duration: 02m 12s)
21:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T326011)', diff saved to https://phabricator.wikimedia.org/P42826 and previous config saved to /var/cache/conftool/dbconfig/20230104-215025-marostegui.json
21:48 taavi: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki mediawikiwiki "African Wikimedia Technical Community/Project Scope" "Africa Wikimedia Technical Community/Project Scope" "Taavi" --reason "per request phab:T318292" # T318292
21:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T326011)', diff saved to https://phabricator.wikimedia.org/P42825 and previous config saved to /var/cache/conftool/dbconfig/20230104-214616-marostegui.json
21:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
21:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T326011)', diff saved to https://phabricator.wikimedia.org/P42824 and previous config saved to /var/cache/conftool/dbconfig/20230104-214555-marostegui.json
21:44 kindrobot@deploy1002: Finished scap: Backport for gerrit:875386 Add namespace to gorwiktionary (T326253) (duration: 11m 26s)
21:35 kindrobot@deploy1002: kindrobot and jhsoby: Backport for gerrit:875386 Add namespace to gorwiktionary (T326253) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:33 kindrobot@deploy1002: Started scap: Backport for gerrit:875386 Add namespace to gorwiktionary (T326253)
21:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P42823 and previous config saved to /var/cache/conftool/dbconfig/20230104-213049-marostegui.json
21:28 kindrobot@deploy1002: Finished scap: Backport for gerrit:874957 Start writing to cuc_comment_id on group0 and group1 wikis (T233004) (duration: 17m 28s)
21:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P42820 and previous config saved to /var/cache/conftool/dbconfig/20230104-211542-marostegui.json
21:12 kindrobot@deploy1002: kindrobot and zabe: Backport for gerrit:874957 Start writing to cuc_comment_id on group0 and group1 wikis (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
21:10 kindrobot@deploy1002: Started scap: Backport for gerrit:874957 Start writing to cuc_comment_id on group0 and group1 wikis (T233004)
21:05 kindrobot: starting UTC late backport window
21:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T326011)', diff saved to https://phabricator.wikimedia.org/P42819 and previous config saved to /var/cache/conftool/dbconfig/20230104-210036-marostegui.json
20:58 Amir1: running refreshGlobalimagelinks.php on all wikis (T322588)
20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T326011)', diff saved to https://phabricator.wikimedia.org/P42818 and previous config saved to /var/cache/conftool/dbconfig/20230104-205628-marostegui.json
20:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
20:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T326011)', diff saved to https://phabricator.wikimedia.org/P42817 and previous config saved to /var/cache/conftool/dbconfig/20230104-205607-marostegui.json
20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P42816 and previous config saved to /var/cache/conftool/dbconfig/20230104-204100-marostegui.json
20:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P42815 and previous config saved to /var/cache/conftool/dbconfig/20230104-202554-marostegui.json
20:14 cstone: payments-wiki upgraded from ede93d62 to f075991f
20:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T326011)', diff saved to https://phabricator.wikimedia.org/P42814 and previous config saved to /var/cache/conftool/dbconfig/20230104-201047-marostegui.json
20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T326011)', diff saved to https://phabricator.wikimedia.org/P42813 and previous config saved to /var/cache/conftool/dbconfig/20230104-200638-marostegui.json
20:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1179.eqiad.wmnet with reason: Maintenance
20:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1179.eqiad.wmnet with reason: Maintenance
20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T326011)', diff saved to https://phabricator.wikimedia.org/P42812 and previous config saved to /var/cache/conftool/dbconfig/20230104-200617-marostegui.json
19:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P42811 and previous config saved to /var/cache/conftool/dbconfig/20230104-195110-marostegui.json
19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P42810 and previous config saved to /var/cache/conftool/dbconfig/20230104-193604-marostegui.json
19:32 dduvall@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.17 refs T325580 (duration: 06m 58s)
19:25 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.17 refs T325580
19:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T326011)', diff saved to https://phabricator.wikimedia.org/P42809 and previous config saved to /var/cache/conftool/dbconfig/20230104-192057-marostegui.json
19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T326011)', diff saved to https://phabricator.wikimedia.org/P42808 and previous config saved to /var/cache/conftool/dbconfig/20230104-191648-marostegui.json
19:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
19:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T326011)', diff saved to https://phabricator.wikimedia.org/P42807 and previous config saved to /var/cache/conftool/dbconfig/20230104-191627-marostegui.json
19:07 dancy@deploy1002: Installing scap version "4.32.0" for 560 hosts
19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P42806 and previous config saved to /var/cache/conftool/dbconfig/20230104-190121-marostegui.json
18:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P42805 and previous config saved to /var/cache/conftool/dbconfig/20230104-184614-marostegui.json
18:40 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@84f5f50]: (no justification provided) (duration: 00m 05s)
18:40 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@84f5f50]: (no justification provided)
18:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T326011)', diff saved to https://phabricator.wikimedia.org/P42804 and previous config saved to /var/cache/conftool/dbconfig/20230104-183108-marostegui.json
18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T326011)', diff saved to https://phabricator.wikimedia.org/P42803 and previous config saved to /var/cache/conftool/dbconfig/20230104-182700-marostegui.json
18:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
18:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
18:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance
18:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance
18:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T326011)', diff saved to https://phabricator.wikimedia.org/P42802 and previous config saved to /var/cache/conftool/dbconfig/20230104-182425-marostegui.json
18:15 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling (after remembering to update the submodules) (duration: 00m 54s)
18:14 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling (after remembering to update the submodules)
18:13 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling (duration: 03m 54s)
18:09 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling
18:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P42801 and previous config saved to /var/cache/conftool/dbconfig/20230104-180918-marostegui.json
18:00 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
17:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P42800 and previous config saved to /var/cache/conftool/dbconfig/20230104-175412-marostegui.json
17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T326011)', diff saved to https://phabricator.wikimedia.org/P42799 and previous config saved to /var/cache/conftool/dbconfig/20230104-173905-marostegui.json
17:37 dancy@deploy1002: Installing scap version "4.31.1" for 560 hosts
17:36 dancy@deploy1002: Finished scap: testing (duration: 07m 50s)
17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T326011)', diff saved to https://phabricator.wikimedia.org/P42798 and previous config saved to /var/cache/conftool/dbconfig/20230104-173455-marostegui.json
17:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1123.eqiad.wmnet with reason: Maintenance
17:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1123.eqiad.wmnet with reason: Maintenance
17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T326011)', diff saved to https://phabricator.wikimedia.org/P42797 and previous config saved to /var/cache/conftool/dbconfig/20230104-173434-marostegui.json
17:28 dancy@deploy1002: Started scap: testing
17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P42796 and previous config saved to /var/cache/conftool/dbconfig/20230104-171928-marostegui.json
17:10 mutante: new Wikipedia (and other projects) language added: guc - https://en.wikipedia.org/wiki/Wayuu_language - https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Wayuu T321880
17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P42795 and previous config saved to /var/cache/conftool/dbconfig/20230104-170421-marostegui.json
17:02 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
17:00 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
16:55 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@84f5f50]: Bumping platform_eng airflow instance to latest (duration: 00m 17s)
16:54 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@84f5f50]: Bumping platform_eng airflow instance to latest
16:49 dancy@deploy1002: Installing scap version "4.30.3-1" for 560 hosts
16:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T326011)', diff saved to https://phabricator.wikimedia.org/P42794 and previous config saved to /var/cache/conftool/dbconfig/20230104-164915-marostegui.json
16:48 dancy@deploy1002: Finished scap: testing (duration: 13m 16s)
16:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T326011)', diff saved to https://phabricator.wikimedia.org/P42793 and previous config saved to /var/cache/conftool/dbconfig/20230104-164504-marostegui.json
16:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
16:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
16:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1112.eqiad.wmnet with reason: Maintenance
16:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1112.eqiad.wmnet with reason: Maintenance
16:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1102.eqiad.wmnet with reason: Maintenance
16:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1102.eqiad.wmnet with reason: Maintenance
16:41 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
16:41 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
16:41 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
16:37 dancy@deploy1002: Started scap: testing
16:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
16:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
16:33 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
16:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
16:30 dancy@deploy1002: Installing scap version "4.31.0" for 560 hosts
16:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T326011)', diff saved to https://phabricator.wikimedia.org/P42792 and previous config saved to /var/cache/conftool/dbconfig/20230104-162828-marostegui.json
16:29 dancy@deploy1002: sync-world aborted: (no justification provided) (duration: 00m 13s)
16:27 dancy@deploy1002: Started scap: (no justification provided)
16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P42791 and previous config saved to /var/cache/conftool/dbconfig/20230104-161321-marostegui.json
15:59 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,name=mw2402.*
15:59 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,name=mw2401.*
15:59 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,name=mw2400.*
15:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P42790 and previous config saved to /var/cache/conftool/dbconfig/20230104-155815-marostegui.json
15:51 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T326011)', diff saved to https://phabricator.wikimedia.org/P42789 and previous config saved to /var/cache/conftool/dbconfig/20230104-154308-marostegui.json
15:34 moritzm: installing glibc security updates on bullseye
15:34 moritzm: installing glibc security updates
15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T326011)', diff saved to https://phabricator.wikimedia.org/P42788 and previous config saved to /var/cache/conftool/dbconfig/20230104-153435-marostegui.json
15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
15:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T326011)', diff saved to https://phabricator.wikimedia.org/P42787 and previous config saved to /var/cache/conftool/dbconfig/20230104-153413-marostegui.json
15:33 ladsgroup@deploy1002: Finished scap: Backport for gerrit:874899 Disable LoadMonitor in CLI (T322156) (duration: 09m 48s)
15:32 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
15:32 claime: Restarting rolling reboot of api_appserver hosts in codfw
15:25 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for gerrit:874899 Disable LoadMonitor in CLI (T322156) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
15:23 ladsgroup@deploy1002: Started scap: Backport for gerrit:874899 Disable LoadMonitor in CLI (T322156)
15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P42786 and previous config saved to /var/cache/conftool/dbconfig/20230104-151907-marostegui.json
15:06 marostegui: dbmaint deploy schema change on s5 eqiad T326224
15:05 marostegui: dbmaint deploy schema change on s3 eqiad T326224
15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P42785 and previous config saved to /var/cache/conftool/dbconfig/20230104-150400-marostegui.json
15:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1001.eqiad.wmnet with OS bullseye
15:00 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T326011)', diff saved to https://phabricator.wikimedia.org/P42784 and previous config saved to /var/cache/conftool/dbconfig/20230104-144853-marostegui.json
14:46 marostegui: dbmaint deploy schema change on s3 eqiad T326222
14:44 marostegui: dbmaint deploy schema change on s5 eqiad T326222
14:42 XioNoX: fix inconsistent mtu between cr1-eqiad<->lsw1-f1 - T315838
14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T326011)', diff saved to https://phabricator.wikimedia.org/P42783 and previous config saved to /var/cache/conftool/dbconfig/20230104-144025-marostegui.json
14:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
14:40 urbanecm: UTC afternoon B&C window done
14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
14:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T326011)', diff saved to https://phabricator.wikimedia.org/P42782 and previous config saved to /var/cache/conftool/dbconfig/20230104-143949-marostegui.json
14:38 marostegui: dbmaint deploy schema change on s3 eqiad T326223
14:38 urbanecm@deploy1002: Finished scap: Backport for gerrit:875305 Start reading from cul_actor on testwiki (T233004), gerrit:875312 aswikiquote: Set timezone to Asia/Kolkata (T321246) (duration: 09m 50s)
14:37 marostegui: dbmaint deploy schema change on s5 eqiad T326223
14:32 XioNoX: fix inconsistent mtu on mr1-eqiad - T315838
14:30 urbanecm@deploy1002: urbanecm and urbanecm and zabe: Backport for gerrit:875305 Start reading from cul_actor on testwiki (T233004), gerrit:875312 aswikiquote: Set timezone to Asia/Kolkata (T321246) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
14:28 urbanecm@deploy1002: Started scap: Backport for gerrit:875305 Start reading from cul_actor on testwiki (T233004), gerrit:875312 aswikiquote: Set timezone to Asia/Kolkata (T321246)
14:27 urbanecm@deploy1002: Finished scap: Backport for gerrit:870978 plwiki: Add editcontentmodel to interface-admin (T325819), gerrit:874884 Mark active sections even when their headings are in wrapper elements (T318044 T324869) (duration: 09m 32s)
14:27 XioNoX: fix inconsistent mtu on mr1-codfw - T315838
14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P42781 and previous config saved to /var/cache/conftool/dbconfig/20230104-142442-marostegui.json
14:24 marostegui: dbmaint deploy schema change on s7 eqiad T326227
14:22 XioNoX: fix inconsistent mtu on mr1-eqsin - T315838
14:19 urbanecm@deploy1002: urbanecm and stang and matmarex: Backport for gerrit:870978 plwiki: Add editcontentmodel to interface-admin (T325819), gerrit:874884 Mark active sections even when their headings are in wrapper elements (T318044 T324869) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
14:18 urbanecm@deploy1002: Started scap: Backport for gerrit:870978 plwiki: Add editcontentmodel to interface-admin (T325819), gerrit:874884 Mark active sections even when their headings are in wrapper elements (T318044 T324869)
14:16 urbanecm@deploy1002: backport aborted: (duration: 00m 07s)
14:16 urbanecm@deploy1002: Finished scap: Backport for gerrit:870920 Revert "trwiki: Add 20 years celebration logos" (T325823), gerrit:870988 kuwiki: Install SandboxLink (T325469) (duration: 09m 37s)
14:16 marostegui: Sanitize new wikis T326138 T321294 T321288 T321256
14:15 XioNoX: fix inconsistent mtu on mr1-esams - T315838
14:14 marostegui: dbmaint deploy schema change on s7 eqiad T326228
14:13 marostegui: dbmaint deploy schema change on s7 eqiad T326226
14:11 marostegui: dbmaint deploy schema change on s8 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s7 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s6 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s5 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s4 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s3 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s2 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s1 eqiad T326221
14:10 marostegui: dbmaint deploy schema change on s7 eqiad T326225
14:10 marostegui: dbmaint deploy schema change on s7 T326225
14:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetdb2002.codfw.wmnet with reason: maintenance
14:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetdb2002.codfw.wmnet with reason: maintenance
14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P42780 and previous config saved to /var/cache/conftool/dbconfig/20230104-140936-marostegui.json
14:08 urbanecm@deploy1002: urbanecm and stang: Backport for gerrit:870920 Revert "trwiki: Add 20 years celebration logos" (T325823), gerrit:870988 kuwiki: Install SandboxLink (T325469) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:06 urbanecm@deploy1002: Started scap: Backport for gerrit:870920 Revert "trwiki: Add 20 years celebration logos" (T325823), gerrit:870988 kuwiki: Install SandboxLink (T325469)
14:04 XioNoX: fix inconsistent mtu on mr1-ulsfo - T315838
14:02 marostegui: dbmaint deploy schema change on s3 T326221
14:02 moritzm: updating buster nodes running 5.10 to 5.10.158-2~deb10u1 (only rollout of the new kernel, no reboots)
14:02 urbanecm@deploy1002: Finished scap: Backport for gerrit:874844 Update interwiki cache (duration: 08m 00s)
13:58 marostegui: dbmaint deploy schema change on s7 T326221
13:57 marostegui: dbmaint deploy schema change on s8 T326221
13:57 marostegui: dbmaint deploy schema change on s6 T326221
13:56 marostegui: dbmaint deploy schema change on s5 T326221
13:55 marostegui: dbmaint deploy schema change on s4 T326221
13:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T326011)', diff saved to https://phabricator.wikimedia.org/P42779 and previous config saved to /var/cache/conftool/dbconfig/20230104-135429-marostegui.json
13:54 urbanecm@deploy1002: Started scap: Backport for gerrit:874844 Update interwiki cache
13:54 marostegui: dbmaint deploy schema change on s2 T326221
13:53 marostegui: dbmaint deploy schema change on s1 T326221
13:52 urbanecm@deploy1002: Finished scap: Creating gorwiktionary (T326137), fixing aswikiquote logo (T321246) (duration: 07m 52s)
13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T326011)', diff saved to https://phabricator.wikimedia.org/P42778 and previous config saved to /var/cache/conftool/dbconfig/20230104-134544-marostegui.json
13:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
13:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
13:45 XioNoX: repool esams-eqiad link for mtu change - T315838
13:44 urbanecm@deploy1002: Started scap: Creating gorwiktionary (T326137), fixing aswikiquote logo (T321246)
13:41 XioNoX: drain esams-eqiad link for mtu change - T315838
13:39 urbanecm@deploy1002: Finished scap: Backport for gerrit:874883 Add messages for Gorontalo Wiktionary (gorwiktionary) (T326137), gerrit:874882 Add messages for Gorontalo Wiktionary (gorwiktionary) (T326137) (duration: 38m 23s)
13:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
13:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T326011)', diff saved to https://phabricator.wikimedia.org/P42777 and previous config saved to /var/cache/conftool/dbconfig/20230104-133830-marostegui.json
13:33 XioNoX: fix mismatched MTU on pfw3-codfw - T315838
13:31 urbanecm: New wiki creation will run over by a couple of minutes
13:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P42776 and previous config saved to /var/cache/conftool/dbconfig/20230104-132323-marostegui.json
13:15 XioNoX: fix mismatched MTU on cloudsw switches - T315838
13:11 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P42775 and previous config saved to /var/cache/conftool/dbconfig/20230104-130816-marostegui.json
13:00 urbanecm@deploy1002: Started scap: Backport for gerrit:874883 Add messages for Gorontalo Wiktionary (gorwiktionary) (T326137), gerrit:874882 Add messages for Gorontalo Wiktionary (gorwiktionary) (T326137)
12:58 urbanecm@deploy1002: Finished scap: Creating shnwikibooks (T321248) (duration: 07m 38s)
12:56 moritzm: installing emacs security updates
12:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42774 and previous config saved to /var/cache/conftool/dbconfig/20230104-125330-root.json
12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T326011)', diff saved to https://phabricator.wikimedia.org/P42773 and previous config saved to /var/cache/conftool/dbconfig/20230104-125310-marostegui.json
12:51 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
12:50 urbanecm@deploy1002: Started scap: Creating shnwikibooks (T321248)
12:48 urbanecm@deploy1002: Finished scap: Creating guwwikiquote (T321247) (duration: 07m 44s)
12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T326011)', diff saved to https://phabricator.wikimedia.org/P42772 and previous config saved to /var/cache/conftool/dbconfig/20230104-124424-marostegui.json
12:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2127.codfw.wmnet with reason: Maintenance
12:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2127.codfw.wmnet with reason: Maintenance
12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T326011)', diff saved to https://phabricator.wikimedia.org/P42771 and previous config saved to /var/cache/conftool/dbconfig/20230104-124403-marostegui.json
12:41 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
12:41 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
12:41 urbanecm@deploy1002: Started scap: Creating guwwikiquote (T321247)
12:40 claime: Rolling reboot of api_appserver hosts in codfw paused for https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230104T1200
12:38 urbanecm@deploy1002: Finished scap: Creating aswikiquote (T321246) (duration: 07m 49s)
12:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 75%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42770 and previous config saved to /var/cache/conftool/dbconfig/20230104-123825-root.json
12:35 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bullseye
12:31 urbanecm@deploy1002: Started scap: Creating aswikiquote (T321246)
12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P42769 and previous config saved to /var/cache/conftool/dbconfig/20230104-122857-marostegui.json
12:27 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
12:26 urbanecm@deploy1002: Finished scap: Backport for gerrit:874880 Add namespace translations in Wayuu (T321881), gerrit:874879 Add namespace translations in Wayuu (T321881) (duration: 10m 36s)
12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 50%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42768 and previous config saved to /var/cache/conftool/dbconfig/20230104-122320-root.json
12:18 urbanecm@deploy1002: urbanecm and urbanecm: Backport for gerrit:874880 Add namespace translations in Wayuu (T321881), gerrit:874879 Add namespace translations in Wayuu (T321881) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
12:16 urbanecm@deploy1002: Started scap: Backport for gerrit:874880 Add namespace translations in Wayuu (T321881), gerrit:874879 Add namespace translations in Wayuu (T321881)
12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P42767 and previous config saved to /var/cache/conftool/dbconfig/20230104-121350-marostegui.json
12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 25%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42766 and previous config saved to /var/cache/conftool/dbconfig/20230104-120815-root.json
11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T326011)', diff saved to https://phabricator.wikimedia.org/P42765 and previous config saved to /var/cache/conftool/dbconfig/20230104-115844-marostegui.json
11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 10%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42764 and previous config saved to /var/cache/conftool/dbconfig/20230104-115310-root.json
11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T326011)', diff saved to https://phabricator.wikimedia.org/P42763 and previous config saved to /var/cache/conftool/dbconfig/20230104-115011-marostegui.json
11:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2109.codfw.wmnet with reason: Maintenance
11:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2109.codfw.wmnet with reason: Maintenance
11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 5%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42761 and previous config saved to /var/cache/conftool/dbconfig/20230104-113805-root.json
11:33 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host puppetdb2003.codfw.wmnet
11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2151 to dbctl depooled T326206', diff saved to https://phabricator.wikimedia.org/P42759 and previous config saved to /var/cache/conftool/dbconfig/20230104-112801-marostegui.json
11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 1%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42758 and previous config saved to /var/cache/conftool/dbconfig/20230104-112300-root.json
11:02 vgutierrez: testing HAProxy 2.4.20 in cp4037 and cp4045
10:56 vgutierrez: (apt1001) import HAProxy 2.4.20 from third-party repo for buster and bullseye
10:49 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 1098 hosts
10:48 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 1098 hosts
10:48 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 894 hosts
10:47 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 894 hosts
10:37 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
10:37 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2124 T326206', diff saved to https://phabricator.wikimedia.org/P42756 and previous config saved to /var/cache/conftool/dbconfig/20230104-103109-marostegui.json
10:29 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:29 claime: Rolling reboot of api_appserver hosts in codfw
10:24 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
10:14 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:14 claime: Rolling reboot of mwdebug hosts in eqiad
10:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
10:04 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:04 marostegui: dbmaint eqiad deploy schema change on s5 T326011
10:04 claime: Rolling reboot of mwdebug hosts in codfw
10:04 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
10:04 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
10:04 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
10:03 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
10:03 filippo@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
10:03 filippo@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
10:03 filippo@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
10:03 filippo@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
10:03 filippo@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
10:03 filippo@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
10:03 filippo@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
10:03 filippo@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
10:03 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
10:03 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
10:03 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:03 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:03 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
10:03 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
10:03 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
10:02 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
10:02 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
10:01 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
10:01 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
10:00 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
09:53 effie: Upload imposm3_0.11.1-1 to buster-wikimedia - T325293
09:48 XioNoX: drmrs: offload traffic from Tata - T324955
09:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 56286
09:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 56286
09:37 marostegui: dbmaint codfw deploy schema change on s5 T326011
09:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb2003.codfw.wmnet
09:29 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
09:08 matthiasmullie: UTC morning backports done
09:07 mlitn@deploy1002: Finished scap: Backport for gerrit:874887 Squashed diff to catch up to wmf/1.40.0-wmf.17 (duration: 08m 13s)
09:01 mlitn@deploy1002: mlitn and mlitn: Backport for gerrit:874887 Squashed diff to catch up to wmf/1.40.0-wmf.17 synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
09:00 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host puppetdb1003.eqiad.wmnet
08:59 mlitn@deploy1002: Started scap: Backport for gerrit:874887 Squashed diff to catch up to wmf/1.40.0-wmf.17
08:57 mlitn@deploy1002: Finished scap: Backport for gerrit:874889 Change IW breakpoint to be enabled on smaller screen (T321377) (duration: 08m 56s)
08:50 mlitn@deploy1002: mlitn and mlitn: Backport for gerrit:874889 Change IW breakpoint to be enabled on smaller screen (T321377) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
08:48 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
08:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb1003.eqiad.wmnet
08:48 mlitn@deploy1002: Started scap: Backport for gerrit:874889 Change IW breakpoint to be enabled on smaller screen (T321377)
08:32 mlitn@deploy1002: Finished scap: Backport for gerrit:874890 Always show search results at full width (T321377) (duration: 08m 22s)
08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 100%: After testing', diff saved to https://phabricator.wikimedia.org/P42755 and previous config saved to /var/cache/conftool/dbconfig/20230104-082942-root.json
08:26 marostegui: dbmaint codfw deploy schema change on s8 T326011
08:26 marostegui: dbmaint eqiad deploy schema change on s8 T326011
08:26 marostegui: dbmaint eqiad deploy schema change on s4 T326011
08:26 marostegui: dbmaint codfw deploy schema change on s4 T326011
08:26 marostegui: dbmaint codfw deploy schema change on s4 T255174
08:26 marostegui: dbmaint eqiad deploy schema change on s4 T255174
08:25 mlitn@deploy1002: mlitn and mlitn: Backport for gerrit:874890 Always show search results at full width (T321377) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
08:23 mlitn@deploy1002: Started scap: Backport for gerrit:874890 Always show search results at full width (T321377)
08:22 marostegui: dbmaint eqiad deploy schema change on s8 T255174
08:20 marostegui: dbmaint codfw deploy schema change on s8 T255174
08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 75%: After testing', diff saved to https://phabricator.wikimedia.org/P42754 and previous config saved to /var/cache/conftool/dbconfig/20230104-081437-root.json
07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 50%: After testing', diff saved to https://phabricator.wikimedia.org/P42753 and previous config saved to /var/cache/conftool/dbconfig/20230104-075932-root.json
07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 25%: After testing', diff saved to https://phabricator.wikimedia.org/P42752 and previous config saved to /var/cache/conftool/dbconfig/20230104-074427-root.json
07:38 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
07:38 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
07:38 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
07:38 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
07:38 marostegui: Switch x1 back to RBR T255174
07:35 marostegui: dbmaint codfw deploy schema change on x1 T255174
07:35 marostegui: dbmaint eqiad deploy schema change on x1 T255174
07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 10%: After testing', diff saved to https://phabricator.wikimedia.org/P42751 and previous config saved to /var/cache/conftool/dbconfig/20230104-072922-root.json
07:20 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
07:20 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
07:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
07:19 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 5%: After testing', diff saved to https://phabricator.wikimedia.org/P42750 and previous config saved to /var/cache/conftool/dbconfig/20230104-071417-root.json
06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 1%: After testing', diff saved to https://phabricator.wikimedia.org/P42749 and previous config saved to /var/cache/conftool/dbconfig/20230104-065912-root.json

2023-01-03

22:47 eileen: config 34754c69 -> 03c4d7a6
22:33 eileen: config revision changed from 5c73975a to 34754c69
21:55 mutante: gitlab-runner* - correction: allowing connections TO kubestagemaster.svc.eqiad.wmnet port 6443 FROM trusted runners, of course - T325385
21:53 mutante: gitlab-runner* - allowing kubestagemaster.svc.eqiad.wmnet to connect to port 6443, run puppet via cumin, deploy gerrit:868737 - T325385
21:47 taavi: UTC late backports done
21:46 taavi@deploy1002: Finished scap: Backport for gerrit:869226 Specify Citoid RESTBase URL separately (T325425), gerrit:874855 Use new DiscussionTools heading markup on group1 wikis (T314714) (duration: 12m 12s)
21:35 taavi@deploy1002: taavi and matmarex: Backport for gerrit:869226 Specify Citoid RESTBase URL separately (T325425), gerrit:874855 Use new DiscussionTools heading markup on group1 wikis (T314714) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:34 taavi@deploy1002: Started scap: Backport for gerrit:869226 Specify Citoid RESTBase URL separately (T325425), gerrit:874855 Use new DiscussionTools heading markup on group1 wikis (T314714)
21:30 taavi@deploy1002: Finished scap: Backport for gerrit:874443 Start writing to cuc_comment_id on test wikis (T233004) (duration: 12m 54s)
21:19 taavi@deploy1002: taavi and zabe: Backport for gerrit:874443 Start writing to cuc_comment_id on test wikis (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
21:17 taavi@deploy1002: Started scap: Backport for gerrit:874443 Start writing to cuc_comment_id on test wikis (T233004)
21:15 taavi@deploy1002: Finished scap: Backport for gerrit:873880 Stop setting $wgActorTableSchemaMigrationStage (T215466), gerrit:873887 Pin $wgCommentTempTableSchemaMigrationStage to default value (T299954), gerrit:874418 Pin cu_changes comment migration to old schema (T233004) (duration: 08m 49s)
21:08 taavi@deploy1002: taavi and zabe: Backport for gerrit:873880 Stop setting $wgActorTableSchemaMigrationStage (T215466), gerrit:873887 Pin $wgCommentTempTableSchemaMigrationStage to default value (T299954), gerrit:874418 Pin cu_changes comment migration to old schema (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:06 taavi@deploy1002: Started scap: Backport for gerrit:873880 Stop setting $wgActorTableSchemaMigrationStage (T215466), gerrit:873887 Pin $wgCommentTempTableSchemaMigrationStage to default value (T299954), gerrit:874418 Pin cu_changes comment migration to old schema (T233004)
19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.17 refs T325580
19:18 dduvall@deploy1002: deploy-promote aborted: (duration: 08m 55s)
19:13 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1001.eqiad.wmnet with OS bullseye
17:37 claime: Finished parse reboots in eqiad
17:36 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
17:30 sukhe: sudo cumin -b 1 -s 5 'A:codfw and P{O:swift::proxy}' 'depool && sleep 3 && systemctl restart swift-proxy && sleep 3 && pool'
16:40 ejegg: fundraising EOY receipt calculation finished, restarted scheduled jobs
16:21 ejegg: fundraising scheduled jobs disabled for EOY receipt calculation
15:37 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bullseye
15:30 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1001.eqiad.wmnet with OS bullseye
15:14 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
15:13 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
15:13 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
15:13 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
15:13 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
15:11 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
15:10 andrewbogott: upgrading and rebooting wikitech-static
15:07 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
15:06 claime: Starting rolling reboot of parse* hosts in eqiad
15:05 taavi: UTC afternoon backports done
15:04 taavi@deploy1002: Finished scap: Backport for gerrit:874871 SecurePoll: Add files for UCoC 2023 vote (T324793), gerrit:874872 ucoc2023: Update populateEditCount to count Flow edits (T324793), gerrit:874873 ucoc2023: Update populateEditCount to count Flow edits (T324793) (duration: 08m 10s)
15:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts graphite1004.eqiad.wmnet
14:59 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:59 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: graphite1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
14:58 taavi@deploy1002: taavi and taavi: Backport for gerrit:874871 SecurePoll: Add files for UCoC 2023 vote (T324793), gerrit:874872 ucoc2023: Update populateEditCount to count Flow edits (T324793), gerrit:874873 ucoc2023: Update populateEditCount to count Flow edits (T324793) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:56 taavi@deploy1002: Started scap: Backport for gerrit:874871 SecurePoll: Add files for UCoC 2023 vote (T324793), gerrit:874872 ucoc2023: Update populateEditCount to count Flow edits (T324793), gerrit:874873 ucoc2023: Update populateEditCount to count Flow edits (T324793)
14:53 taavi@deploy1002: Finished scap: Backport for gerrit:874870 Revert "Revert "Start mobile DiscussionTools A/B test"" (T321961) (duration: 09m 13s)
14:48 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: graphite1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
14:45 taavi@deploy1002: taavi and matmarex: Backport for gerrit:874870 Revert "Revert "Start mobile DiscussionTools A/B test"" (T321961) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
14:44 filippo@cumin1001: START - Cookbook sre.dns.netbox
14:44 taavi@deploy1002: Started scap: Backport for gerrit:874870 Revert "Revert "Start mobile DiscussionTools A/B test"" (T321961)
14:41 taavi@deploy1002: Finished scap: Backport for gerrit:874866 Log token for the DiscussionTools mobile a/b test (T321961), gerrit:874867 Log bucket/token for the DiscussionTools mobile a/b test (T321961), gerrit:874868 a/b test anonymous ID was being reset because of cookie prefixes (T321961), gerrit:874869 Log bucket/token for the DiscussionTools mobile a/b test (T321961) (duration: 08m 31s)
14:39 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts graphite1004.eqiad.wmnet
14:34 taavi@deploy1002: taavi and matmarex: Backport for gerrit:874866 Log token for the DiscussionTools mobile a/b test (T321961), gerrit:874867 Log bucket/token for the DiscussionTools mobile a/b test (T321961), gerrit:874868 a/b test anonymous ID was being reset because of cookie prefixes (T321961), gerrit:874869 Log bucket/token for the DiscussionTools mobile a/b test (T321961) synced to the testservers:
14:33 taavi@deploy1002: Started scap: Backport for gerrit:874866 Log token for the DiscussionTools mobile a/b test (T321961), gerrit:874867 Log bucket/token for the DiscussionTools mobile a/b test (T321961), gerrit:874868 a/b test anonymous ID was being reset because of cookie prefixes (T321961), gerrit:874869 Log bucket/token for the DiscussionTools mobile a/b test (T321961)
14:13 oblivian@deploy1002: Finished scap: Backport for gerrit:841139 etcd: use the v3-style SRV record (T320397) (duration: 07m 58s)
14:07 oblivian@deploy1002: oblivian and oblivian: Backport for gerrit:841139 etcd: use the v3-style SRV record (T320397) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
14:05 oblivian@deploy1002: Started scap: Backport for gerrit:841139 etcd: use the v3-style SRV record (T320397)
13:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
13:46 moritzm: installing libksba security updates
13:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host phab1004.eqiad.wmnet
13:19 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host phab1004.eqiad.wmnet
12:33 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling (duration: 02m 49s)
12:30 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling
12:28 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): pushing wmf-puppet-dashboard updates for enc git handling (duration: 01m 12s)
12:27 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): pushing wmf-puppet-dashboard updates for enc git handling
11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2131', diff saved to https://phabricator.wikimedia.org/P42744 and previous config saved to /var/cache/conftool/dbconfig/20230103-114030-marostegui.json
11:35 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
11:34 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
11:34 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
11:33 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
11:30 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2001.wikimedia.org
11:26 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
11:25 claime: Starting rolling reboot of parse* hosts in codfw
11:06 hashar: contint2001: starting Jenkins manually
11:04 marostegui: Change x1 binlog format to STATEMENT T255174
11:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
10:59 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
10:59 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host contint2001.wikimedia.org
10:58 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2002.wikimedia.org
10:53 marostegui: Restart eqiad sanitarium T326105
10:53 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host contint2002.wikimedia.org
10:50 marostegui: Restart codfw sanitarium masters T326105
10:49 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint1002.wikimedia.org
10:43 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host contint1002.wikimedia.org
10:37 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error
10:36 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error
10:36 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit1001.wikimedia.org
10:31 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gerrit1001.wikimedia.org
10:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit2002.wikimedia.org
10:18 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gerrit2002.wikimedia.org
09:27 vgutierrez: restarting varnish on cp5032 to clear VarnishChildRestarted alert - T325797
08:19 kartik@deploy1002: Finished scap: Backport for gerrit:869347 Content Translation: Move ttwiki out of Beta (T319177) (duration: 16m 09s)
08:16 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=parse1002.eqiad.wmnet
08:12 moritzm: installing Linux 4.19.269 on Buster hosts
08:12 phedenskog@deploy1002: Finished deploy [performance/navtiming@4f8c010]: (no justification provided) (duration: 00m 08s)
08:12 phedenskog@deploy1002: Started deploy [performance/navtiming@4f8c010]: (no justification provided)
08:05 kartik@deploy1002: kartik and kartik: Backport for gerrit:869347 Content Translation: Move ttwiki out of Beta (T319177) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
08:03 kartik@deploy1002: Started scap: Backport for gerrit:869347 Content Translation: Move ttwiki out of Beta (T319177)
04:58 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.17 refs T325580 (duration: 55m 31s)
04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.17 refs T325580

2023-01-02

10:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host otrs1001.eqiad.wmnet
10:00 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host otrs1001.eqiad.wmnet

Other archives

2000s

Archive 1: 2004 Jun - 2004 Sep
Archive 2: 2004 Oct - 2004 Nov
Archive 3: 2004 Dec - 2005 Mar
Archive 4: 2005 Apr - 2005 Jul
Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
Archive 6: 2005 Nov - 2006 Feb
Archive 7: 2006 Mar - 2006 Jun
Archive 8: 2006 Jul - 2006 Sep
Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
Archive 10: 2007 Feb - 2007 Jun
Archive 11: 2007 Jul - 2007 Dec
Archive 12: 2008 Jan - 2008 Jul
Archive 12a: 2008 Aug
Archive 12b: 2008 Sep
Archive 13: 2008 Oct - 2009 Jun
Archive 14: 2009 Jun - 2009 Dec

This article is issued from Wikimedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.