Server Admin Log/Archive 74

2023-12-30

16:55 otto@deploy2002: Synchronized wmf-config/ext-EventStreamConfig.php: Config: gerrit:984627 Add eventlogging_MediaWikiPingback stream (T323828) (duration: 15m 10s)

2023-12-29

22:59 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:59 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:57 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:01 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
08:00 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:00 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
07:58 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:58 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
07:58 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:58 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
07:57 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:57 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
00:12 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:11 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
00:10 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:10 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
00:09 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:08 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
00:08 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:08 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
00:07 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:07 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
00:06 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:06 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
00:03 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:03 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
00:03 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:02 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
00:01 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:01 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply

2023-12-28

23:59 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:59 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:58 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:57 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:57 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:52 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:51 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:50 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:48 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:47 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:47 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:46 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:46 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:46 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:45 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:35 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:35 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:20 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:20 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply

2023-12-27

22:53 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:53 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:46 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:46 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:41 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:40 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply

2023-12-23

20:22 _joe_: downgraded vopsbot on alert1001, hopefully should not keep panicking in this unexpected situation
15:40 taavi: fix date-time on mw2448 (which thought it was the year 2098) by manually setting it once and then restarting systemd-timesyncd.service after BIOS was reset in T353679
01:19 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
01:19 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.

2023-12-22

17:28 krinkle@deploy2002: Synchronized php-1.42.0-wmf.10/includes/skins/Skin.php: Ice6d6c (duration: 06m 25s)
15:16 jgiannelos@deploy2002: Finished deploy [restbase/deploy@5f2756a]: (no justification provided) (duration: 17m 36s)
14:58 jgiannelos@deploy2002: Started deploy [restbase/deploy@5f2756a]: (no justification provided)
14:57 jgiannelos@deploy2002: Finished deploy [restbase/deploy@f0c9f9f]: (no justification provided) (duration: 09m 32s)
14:48 jgiannelos@deploy2002: Started deploy [restbase/deploy@f0c9f9f]: (no justification provided)
14:01 jgiannelos@deploy2002: Finished deploy [restbase/deploy@4f56fff]: (no justification provided) (duration: 16m 57s)
13:45 reedy@deploy2002: Finished scap: T353920 (duration: 08m 02s)
13:44 jgiannelos@deploy2002: Started deploy [restbase/deploy@4f56fff]: (no justification provided)
13:37 reedy@deploy2002: Started scap: T353920
11:31 vgutierrez: upload golang-github-intel-go-cpuid_0.0~git20210602.5747e5c-2+deb12u1 to apt.wm.o (bookworm)
10:42 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
10:42 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
10:39 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
09:57 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .

2023-12-21

21:42 wfan: payment-wiki revision 1c96980a -> 3b281d10
19:31 reedy@deploy2002: Synchronized wmf-config/CommonSettings.php: T346919 (duration: 06m 26s)
19:14 dancy@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.10 refs T350086
18:39 mutante: releases1003 - sudo chmod -R g+w /srv/org/wikimedia/releases/mediawiki/1.*
17:26 mutante: mirror1001 - when syncing tails mirror - @ERROR: max connections (23) reached -- try again later
17:23 mutante: [mirror1001:~] $ sudo systemctl start update-tails-mirror
17:04 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
17:03 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
17:03 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
17:03 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
17:02 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
17:02 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
16:27 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
16:27 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
16:26 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
16:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
16:18 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
16:17 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
16:10 volans@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wdqs1008.eqiad.wmnet
16:10 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:08 volans@cumin1002: START - Cookbook sre.dns.netbox
16:03 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1008.eqiad.wmnet
15:59 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
15:58 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
15:54 volans@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wdqs1007.eqiad.wmnet
15:54 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:53 volans@cumin1002: START - Cookbook sre.dns.netbox
15:47 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1007.eqiad.wmnet
15:44 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:44 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:38 kharlan@deploy2002: Finished scap: Backport for gerrit:984502 Use username for lookup for non-existing user as the vague target (duration: 10m 37s)
15:36 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:35 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:32 kharlan@deploy2002: kharlan and dreamyjazz: Continuing with sync
15:30 kharlan@deploy2002: kharlan and dreamyjazz: Backport for gerrit:984502 Use username for lookup for non-existing user as the vague target synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:28 kharlan@deploy2002: Started scap: Backport for gerrit:984502 Use username for lookup for non-existing user as the vague target
15:24 kharlan@deploy2002: Finished scap: Backport for gerrit:984503 Use username for lookup for non-existing user as the vague target (duration: 11m 38s)
15:20 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:19 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:18 kharlan@deploy2002: kharlan and dreamyjazz: Continuing with sync
15:15 kharlan@deploy2002: kharlan and dreamyjazz: Backport for gerrit:984503 Use username for lookup for non-existing user as the vague target synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:13 kharlan@deploy2002: Started scap: Backport for gerrit:984503 Use username for lookup for non-existing user as the vague target
15:11 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:10 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:52 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:984501 Fix showing units and limits in NewPP limit report (T353793) (duration: 09m 27s)
14:46 lucaswerkmeister-wmde@deploy2002: matmarex and lucaswerkmeister-wmde: Continuing with sync
14:44 lucaswerkmeister-wmde@deploy2002: matmarex and lucaswerkmeister-wmde: Backport for gerrit:984501 Fix showing units and limits in NewPP limit report (T353793) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:43 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:984501 Fix showing units and limits in NewPP limit report (T353793)
14:37 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:36 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:31 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:29 jclark@cumin1002: START - Cookbook sre.dns.netbox
14:27 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:984500 Ignore "exact match" title when the title is not given (T353860) (duration: 08m 33s)
14:21 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Continuing with sync
14:19 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Backport for gerrit:984500 Ignore "exact match" title when the title is not given (T353860) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:18 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:984500 Ignore "exact match" title when the title is not given (T353860)
14:17 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript namespaceDupes bdwikimedia --fix # T351903 – 62 pages to fix, 62 were resolvable. 56 links to fix, 54 were resolvable, 2 were deleted.
14:15 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:984498 uzwikipedia: add a temporary logo for the 20th anniversary (T353723) (duration: 09m 28s)
14:13 moritzm: re-added Eoghan to pwstore
14:09 lucaswerkmeister-wmde@deploy2002: anzx and lucaswerkmeister-wmde: Continuing with sync
14:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18 days, 0:00:00 on 10 hosts with reason: T352878
14:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 18 days, 0:00:00 on 10 hosts with reason: T352878
14:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18 days, 0:00:00 on 13 hosts with reason: T352878
14:08 lucaswerkmeister-wmde@deploy2002: anzx and lucaswerkmeister-wmde: Backport for gerrit:984498 uzwikipedia: add a temporary logo for the 20th anniversary (T353723) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:07 bking@cumin2002: START - Cookbook sre.hosts.downtime for 18 days, 0:00:00 on 13 hosts with reason: T352878
14:06 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:984498 uzwikipedia: add a temporary logo for the 20th anniversary (T353723)
13:50 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
13:23 moritzm: installing libde265 security updates
12:29 volans@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wdqs1006.eqiad.wmnet
12:29 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:27 volans@cumin1002: START - Cookbook sre.dns.netbox
12:20 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1006.eqiad.wmnet
12:18 volans@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1002.eqiad.wmnet with reason: Release v0.6.5 - volans@cumin1002
12:14 volans@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin1002.eqiad.wmnet with reason: Release v0.6.5 - volans@cumin1002
11:37 claime: Manually restarted cassandra-a service on restbase2028 following OOM - T353456
11:23 volans@cumin1002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts wdqs1006.eqiad.wmnet
11:22 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1006.eqiad.wmnet
11:16 volans@cumin1002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts wdqs1006.eqiad.wmnet
11:13 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1006.eqiad.wmnet
10:42 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
10:42 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
10:29 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
09:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1006
09:40 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1006
08:59 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
08:54 apergos: UTC morning backport and config window done
08:50 ariel@deploy2002: Finished scap: Backport for gerrit:984496 CommentFormatter: Do not add wrapper if the heading has attributes (T353489) (duration: 12m 39s)
08:44 ariel@deploy2002: ariel and matmarex: Continuing with sync
08:39 ariel@deploy2002: ariel and matmarex: Backport for gerrit:984496 CommentFormatter: Do not add wrapper if the heading has attributes (T353489) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:37 ariel@deploy2002: Started scap: Backport for gerrit:984496 CommentFormatter: Do not add wrapper if the heading has attributes (T353489)
08:25 ariel@deploy2002: Finished scap: Backport for gerrit:984495 CommentFormatter: Do not add wrapper if the heading has attributes (T353489) (duration: 11m 07s)
08:19 ariel@deploy2002: matmarex and ariel: Continuing with sync
08:16 ariel@deploy2002: matmarex and ariel: Backport for gerrit:984495 CommentFormatter: Do not add wrapper if the heading has attributes (T353489) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:14 ariel@deploy2002: Started scap: Backport for gerrit:984495 CommentFormatter: Do not add wrapper if the heading has attributes (T353489)
05:56 kart_: Updated MinT to 2023-12-20-071058-production
05:50 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
05:42 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
05:40 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
05:35 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
05:29 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
05:26 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2075.codfw.wmnet with OS bullseye
01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
00:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2075.codfw.wmnet with reason: host reimage
00:24 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2075.codfw.wmnet with reason: host reimage

2023-12-20

23:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
23:44 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2075.codfw.wmnet with OS bullseye
23:24 ryankemper@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host netbox1002
23:24 ryankemper@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host netbox1002
23:19 ryankemper@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wdqs1006
23:19 ryankemper@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1006
23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
22:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs[1020-1021].eqiad.wmnet
22:59 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for wdqs[1020-1021].eqiad.wmnet
22:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18 days, 0:00:00 on wdqs[1020-1024].eqiad.wmnet with reason: T352878
22:58 bking@cumin2002: START - Cookbook sre.hosts.downtime for 18 days, 0:00:00 on wdqs[1020-1024].eqiad.wmnet with reason: T352878
22:25 ryankemper@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wdqs[1006-1008].eqiad.wmnet
22:25 ryankemper@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:25 ryankemper@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[1006-1008].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1002"
22:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2075.mgmt.codfw.wmnet with reboot policy FORCED
22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2080.codfw.wmnet with OS bullseye
22:24 ryankemper@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[1006-1008].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1002"
22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2079.codfw.wmnet with OS bullseye
22:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2075.mgmt.codfw.wmnet with reboot policy FORCED
22:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2075']
22:21 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2075']
22:20 ryankemper@cumin1002: START - Cookbook sre.dns.netbox
22:18 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
22:18 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
22:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2077.codfw.wmnet with OS bullseye
22:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:17 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
22:17 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
22:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2078.codfw.wmnet with OS bullseye
22:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2076.codfw.wmnet with OS bullseye
22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:16 ryankemper@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs[1006-1008].eqiad.wmnet
22:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:13 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
22:12 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
22:10 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
22:09 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
22:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:08 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
22:08 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
22:06 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
22:05 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
22:03 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2033.codfw.wmnet with OS bullseye
22:03 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
22:02 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:59 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:59 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2080.codfw.wmnet with reason: host reimage
21:59 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:59 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2079.codfw.wmnet with reason: host reimage
21:56 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:56 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2077.codfw.wmnet with reason: host reimage
21:54 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:54 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:53 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:53 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2076.codfw.wmnet with reason: host reimage
21:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2080.codfw.wmnet with reason: host reimage
21:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2079.codfw.wmnet with reason: host reimage
21:48 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
21:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
21:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2077.codfw.wmnet with reason: host reimage
21:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
21:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2076.codfw.wmnet with reason: host reimage
21:48 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:47 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:47 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:46 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:45 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
21:45 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lsw1-a8-codfw,lsw1-a8-codfw IPv6 with reason: testing commit confirm check in cookbook
21:45 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lsw1-a8-codfw,lsw1-a8-codfw IPv6 with reason: testing commit confirm check in cookbook
21:41 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:40 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:39 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:39 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:38 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:37 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2080.codfw.wmnet with OS bullseye
21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2079.codfw.wmnet with OS bullseye
21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2078.codfw.wmnet with OS bullseye
21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2077.codfw.wmnet with OS bullseye
21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2076.codfw.wmnet with OS bullseye
21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
21:30 dancy@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.10 refs T350086 (duration: 05m 57s)
21:28 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host restbase2033.codfw.wmnet with OS bullseye
21:26 eevans@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2033.codfw.wmnet with OS bullseye
21:24 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.10 refs T350086
21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2074.codfw.wmnet with OS bullseye
21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:21 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:15 ladsgroup@deploy2002: Finished scap: Backport for gerrit:984493 Protect against ParserOutput re-namespacing (T353835) (duration: 08m 13s)
21:09 ladsgroup@deploy2002: ladsgroup: Continuing with sync
21:08 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:984493 Protect against ParserOutput re-namespacing (T353835) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:08 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
21:07 ladsgroup@deploy2002: Started scap: Backport for gerrit:984493 Protect against ParserOutput re-namespacing (T353835)
21:04 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
21:02 aqu@deploy2002: Finished deploy [airflow-dags/research@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 28s)
21:01 aqu@deploy2002: Started deploy [airflow-dags/research@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2074.codfw.wmnet with reason: host reimage
20:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2074.codfw.wmnet with reason: host reimage
20:49 ladsgroup@deploy2002: Finished scap: Backport for gerrit:984492 Protect against ParserOutput re-namespacing (T353835) (duration: 08m 19s)
20:47 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host restbase2033.codfw.wmnet with OS bullseye
20:47 eevans@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2033.codfw.wmnet with OS bullseye
20:43 ladsgroup@deploy2002: ladsgroup: Continuing with sync
20:42 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:984492 Protect against ParserOutput re-namespacing (T353835) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:40 ladsgroup@deploy2002: Started scap: Backport for gerrit:984492 Protect against ParserOutput re-namespacing (T353835)
20:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2074.codfw.wmnet with OS bullseye
20:31 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host restbase2033.codfw.wmnet with OS bullseye
20:30 eevans@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2033.codfw.wmnet with OS bullseye
19:51 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
19:48 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
19:30 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host restbase2033.codfw.wmnet with OS bullseye
19:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host wdqs1022.eqiad.wmnet
19:27 dancy@deploy2002: Finished php-fpm-restarts
19:24 dancy@deploy2002: Starting php-fpm-restarts
19:18 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.10 refs T350086
18:59 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
18:59 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
18:59 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
18:58 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
18:58 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
18:57 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
18:38 krinkle@deploy2002: Finished deploy [integration/docroot@355ddbb]: (no justification provided) (duration: 00m 07s)
18:38 krinkle@deploy2002: Started deploy [integration/docroot@355ddbb]: (no justification provided)
18:06 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
18:06 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
18:05 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
18:05 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
18:05 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
18:05 cmooney@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
17:26 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1022.eqiad.wmnet
17:25 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1022.eqiad.wmnet
17:25 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1022.eqiad.wmnet
17:05 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
16:03 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
16:03 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
16:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2080.codfw.wmnet with OS bullseye
15:22 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2074.codfw.wmnet with OS bullseye
15:18 Lucas_WMDE: UTC afternoon backport+config window done
15:17 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:984601 Replace $wgCommandLineMode checks with MW_ENTRY_POINT (T353751) (duration: 08m 22s)
15:15 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts wdqs1022.eqiad.wmnet
15:11 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Continuing with sync
15:10 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Backport for gerrit:984601 Replace $wgCommandLineMode checks with MW_ENTRY_POINT (T353751) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:09 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts wdqs1024.eqiad.wmnet
15:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1024.eqiad.wmnet
15:08 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:984601 Replace $wgCommandLineMode checks with MW_ENTRY_POINT (T353751)
15:06 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1022.eqiad.wmnet
15:05 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts wdqs1022.eqiad.wmnet
15:05 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1022.eqiad.wmnet
15:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1023.eqiad.wmnet
15:05 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1023.eqiad.wmnet
15:05 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
15:04 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
15:02 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1024.eqiad.wmnet
15:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1024.eqiad.wmnet
15:01 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1024.eqiad.wmnet
15:01 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1024.eqiad.wmnet
14:58 inflatador: bking@cumin2002 disable/mask wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-categories on wdqs102[24] T352878
14:57 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:982416 RunSingleJob.php: Fix use of MWExceptionHandler before it's defined (T352265) (duration: 10m 30s)
14:51 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Continuing with sync
14:48 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Backport for gerrit:982416 RunSingleJob.php: Fix use of MWExceptionHandler before it's defined (T352265) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:46 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:982416 RunSingleJob.php: Fix use of MWExceptionHandler before it's defined (T352265)
14:43 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:976650 Remove BetaFeature code related to ReferencePreviews (T351708), gerrit:978035 Remove wgPopupsReferencePreviews now that it defaults to true (T351708) (duration: 10m 16s)
14:37 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and awight and wmde-fisch: Continuing with sync
14:35 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and awight and wmde-fisch: Backport for gerrit:976650 Remove BetaFeature code related to ReferencePreviews (T351708), gerrit:978035 Remove wgPopupsReferencePreviews now that it defaults to true (T351708) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:33 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:976650 Remove BetaFeature code related to ReferencePreviews (T351708), gerrit:978035 Remove wgPopupsReferencePreviews now that it defaults to true (T351708)
14:30 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:984324 Check for false from ThumbnailImage::getStoragePath (T353758) (duration: 09m 38s)
14:26 lucaswerkmeister-wmde@deploy2002: kharlan and lucaswerkmeister-wmde: Continuing with sync
14:22 lucaswerkmeister-wmde@deploy2002: kharlan and lucaswerkmeister-wmde: Backport for gerrit:984324 Check for false from ThumbnailImage::getStoragePath (T353758) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:21 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:984324 Check for false from ThumbnailImage::getStoragePath (T353758)
14:19 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:981636 Make wiktionary and mw.org provide og:site_name (T348203) (duration: 15m 54s)
14:16 moritzm: installing distro-info-data updates from Bookworm point release
14:14 lucaswerkmeister-wmde@deploy2002: pols12 and lucaswerkmeister-wmde: Continuing with sync
14:12 moritzm: installing debootstrap bugfix updates from Bookworm point release
14:06 lucaswerkmeister-wmde@deploy2002: pols12 and lucaswerkmeister-wmde: Backport for gerrit:981636 Make wiktionary and mw.org provide og:site_name (T348203) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:04 moritzm: installing cups updates from bookworm point release
14:04 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:981636 Make wiktionary and mw.org provide og:site_name (T348203)
13:38 aqu@deploy2002: Finished deploy [airflow-dags/wmde@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac513] (duration: 00m 05s)
13:38 aqu@deploy2002: Started deploy [airflow-dags/wmde@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac513]
13:38 aqu@deploy2002: Finished deploy [airflow-dags/search@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 30s)
13:37 aqu@deploy2002: Started deploy [airflow-dags/search@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
13:37 aqu@deploy2002: Finished deploy [airflow-dags/research@90f280e]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@e2ed6162] (duration: 00m 06s)
13:37 aqu@deploy2002: Started deploy [airflow-dags/research@90f280e]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@e2ed6162]
13:36 aqu@deploy2002: Finished deploy [airflow-dags/platform_eng@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 25s)
13:36 aqu@deploy2002: Started deploy [airflow-dags/platform_eng@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
13:35 aqu@deploy2002: Finished deploy [airflow-dags/analytics_product@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 09s)
13:35 aqu@deploy2002: Started deploy [airflow-dags/analytics_product@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
13:34 aqu@deploy2002: Finished deploy [airflow-dags/analytics@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 05s)
13:34 aqu@deploy2002: Started deploy [airflow-dags/analytics@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
13:34 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 11s)
13:34 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
13:32 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 01s)
13:32 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
13:31 aqu@deploy2002: Finished deploy [airflow-dags/analytics@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 01s)
13:31 aqu@deploy2002: Started deploy [airflow-dags/analytics@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
12:12 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
11:30 kostajh: T353703 Manual run: /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/mediamoderation.dblist extensions/MediaModeration/maintenance/updateMetrics.php --verbose
10:22 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on mw2448.codfw.wmnet with reason: hw failure
10:22 cgoubert@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on mw2448.codfw.wmnet with reason: hw failure
09:43 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough and A:wikidough
09:39 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh5002.wikimedia.org
09:39 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for doh5002.wikimedia.org
09:10 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh2001.wikimedia.org
09:10 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for doh2001.wikimedia.org
08:47 fabfur@cumin1001: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough
06:31 ryankemper: T351671 Pooled `wdqs10[17-21]*`; data xfers completed and test queries are passing on `wdqs1018`. Will decom related hosts tomorrow (2023-12-20)
02:47 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
02:45 rzl@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
02:44 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
02:43 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
02:43 rzl@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
02:41 rzl@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
02:39 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
02:37 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
02:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
02:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
00:34 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
00:34 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
00:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
00:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
00:25 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
00:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 22:00:00 on wdqs[1017-1021].eqiad.wmnet with reason: bringing new wdqs hosts online T351671
00:02 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 22:00:00 on wdqs[1017-1021].eqiad.wmnet with reason: bringing new wdqs hosts online T351671

2023-12-19

22:55 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
22:54 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
22:53 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
22:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on wdqs[1017-1021].eqiad.wmnet with reason: bringing new wdqs hosts online T351671
22:26 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on wdqs[1017-1021].eqiad.wmnet with reason: bringing new wdqs hosts online T351671
21:43 mforns@deploy2002: Finished deploy [airflow-dags/wmde@d5ac513]: (no justification provided) (duration: 00m 11s)
21:43 mforns@deploy2002: Started deploy [airflow-dags/wmde@d5ac513]: (no justification provided)
21:43 mforns@deploy2002: Finished deploy [airflow-dags/analytics@d5ac513]: (no justification provided) (duration: 00m 27s)
21:43 mforns@deploy2002: Started deploy [airflow-dags/analytics@d5ac513]: (no justification provided)
21:39 ladsgroup@deploy2002: Finished scap: Backport for gerrit:984277 Disable listings extension in more wikis (T253216) (duration: 07m 42s)
21:33 ladsgroup@deploy2002: ladsgroup: Continuing with sync
21:32 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:984277 Disable listings extension in more wikis (T253216) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:31 ladsgroup@deploy2002: Started scap: Backport for gerrit:984277 Disable listings extension in more wikis (T253216)
21:26 kostajh: UTC late deploys done
21:26 kharlan@deploy2002: Finished scap: Backport for gerrit:983962 Undeploy Annual Plan Core Metrics survey (T351353) (duration: 10m 00s)
21:20 kharlan@deploy2002: kharlan and dani: Continuing with sync
21:17 kharlan@deploy2002: kharlan and dani: Backport for gerrit:983962 Undeploy Annual Plan Core Metrics survey (T351353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:16 kharlan@deploy2002: Started scap: Backport for gerrit:983962 Undeploy Annual Plan Core Metrics survey (T351353)
21:14 kharlan@deploy2002: Finished scap: Backport for gerrit:984269 MediaModeration: Add dblist (T353703) (duration: 07m 44s)
21:08 kharlan@deploy2002: kharlan: Continuing with sync
21:08 kharlan@deploy2002: kharlan: Backport for gerrit:984269 MediaModeration: Add dblist (T353703) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:06 kharlan@deploy2002: Started scap: Backport for gerrit:984269 MediaModeration: Add dblist (T353703)
19:10 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.10 refs T350086
18:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testhost2001.codfw.wmnet with OS bullseye
18:56 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
18:49 mforns@deploy2002: Finished deploy [analytics/refinery@28dccef] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28dccefe] (duration: 00m 05s)
18:48 mforns@deploy2002: Started deploy [analytics/refinery@28dccef] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28dccefe]
18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
18:43 mforns@deploy2002: Finished deploy [analytics/refinery@28dccef] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28dccefe] (duration: 03m 16s)
18:39 mforns@deploy2002: Started deploy [analytics/refinery@28dccef] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28dccefe]
18:39 mforns@deploy2002: Finished deploy [analytics/refinery@28dccef] (thin): Regular analytics weekly train THIN [analytics/refinery@28dccefe] (duration: 00m 06s)
18:39 mforns@deploy2002: Started deploy [analytics/refinery@28dccef] (thin): Regular analytics weekly train THIN [analytics/refinery@28dccefe]
18:39 mforns@deploy2002: Finished deploy [analytics/refinery@28dccef]: Regular analytics weekly train [analytics/refinery@28dccefe] (duration: 09m 18s)
18:29 mforns@deploy2002: Started deploy [analytics/refinery@28dccef]: Regular analytics weekly train [analytics/refinery@28dccefe]
18:29 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@d275e4f]: Deploy latest DAG changes to Analytics Airflow instance (duration: 00m 31s)
18:28 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@d275e4f]: Deploy latest DAG changes to Analytics Airflow instance
18:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testhost2001.codfw.wmnet with reason: host reimage
18:22 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testhost2001.codfw.wmnet with reason: host reimage
18:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
18:06 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host testhost2001.codfw.wmnet with OS bullseye
17:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
16:23 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
16:15 aikochou@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
16:12 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on moss-be[2001-2003].codfw.wmnet with reason: not in service, being used to test a destructive cookbook
16:12 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on moss-be[2001-2003].codfw.wmnet with reason: not in service, being used to test a destructive cookbook
16:04 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 327700
16:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 327700
16:02 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 139901
16:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 139901
16:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15133
15:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15133
15:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5398
15:55 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw2448.codfw.wmnet with reason: hw failure
15:55 cgoubert@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2448.codfw.wmnet with reason: hw failure
15:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5398
15:42 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:983758 Change virtual domain of botpassword to plural (T351559) (duration: 07m 01s)
15:38 moritzm: installing gnutls28 security updates on bookworm
15:37 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and ladsgroup: Continuing with sync
15:37 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and ladsgroup: Backport for gerrit:983758 Change virtual domain of botpassword to plural (T351559) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:35 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:983758 Change virtual domain of botpassword to plural (T351559)
15:33 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:984174 Use main replica DB in importExistingFilesToScanTable.php (duration: 07m 47s)
15:27 lucaswerkmeister-wmde@deploy2002: kharlan and lucaswerkmeister-wmde: Continuing with sync
15:27 lucaswerkmeister-wmde@deploy2002: kharlan and lucaswerkmeister-wmde: Backport for gerrit:984174 Use main replica DB in importExistingFilesToScanTable.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:25 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:984174 Use main replica DB in importExistingFilesToScanTable.php
15:23 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: host is down, downtiming in icinga too
15:23 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: host is down, downtiming in icinga too
15:22 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:984172 Make SearchEntitiesIntegrationTest an ApiTestCase (T353334), gerrit:984173 Use link batch in search APIs (T353334) (duration: 08m 49s)
15:16 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
15:15 moritzm: installing exim4 bugfix updates from Bookworm point release
15:15 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for gerrit:984172 Make SearchEntitiesIntegrationTest an ApiTestCase (T353334), gerrit:984173 Use link batch in search APIs (T353334) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:13 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:984172 Make SearchEntitiesIntegrationTest an ApiTestCase (T353334), gerrit:984173 Use link batch in search APIs (T353334)
15:10 moritzm: installing nagios-plugins-contrib bugfix updates from Bookworm point release
14:44 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
14:43 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
14:43 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
14:42 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
14:33 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
14:32 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
14:31 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
14:30 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
14:29 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
14:29 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
14:25 Lucas_WMDE: UTC afternoon backport+config window done
14:25 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:984166 Send PhotoDNA the mime type of the thumbnail and not original file (T351401), gerrit:984169 Add maintenance script to scan files in the mediamoderation_scan table (T351399) (duration: 07m 53s)
14:24 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
14:24 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
14:24 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:24 kamila@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:22 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
14:21 kamila@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
14:21 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:19 kamila@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:19 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kharlan: Continuing with sync
14:18 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kharlan: Backport for gerrit:984166 Send PhotoDNA the mime type of the thumbnail and not original file (T351401), gerrit:984169 Add maintenance script to scan files in the mediamoderation_scan table (T351399) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:17 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:984166 Send PhotoDNA the mime type of the thumbnail and not original file (T351401), gerrit:984169 Add maintenance script to scan files in the mediamoderation_scan table (T351399)
14:15 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:983747 testwiki: enable revertrisk model in ores extension (T348298) (duration: 10m 22s)
14:10 lucaswerkmeister-wmde@deploy2002: isaranto and lucaswerkmeister-wmde: Continuing with sync
14:08 lucaswerkmeister-wmde@deploy2002: isaranto and lucaswerkmeister-wmde: Backport for gerrit:983747 testwiki: enable revertrisk model in ores extension (T348298) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:05 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:983747 testwiki: enable revertrisk model in ores extension (T348298)
13:45 jgiannelos@deploy2002: Finished deploy [restbase/deploy@40c15b1]: (no justification provided) (duration: 27m 26s)
13:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Release v0.6.5 - ayounsi@cumin1001
13:35 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Release v0.6.5 - ayounsi@cumin1001
13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1001.eqiad.wmnet with reason: Release v0.6.5 - ayounsi@cumin1001
13:32 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin1001.eqiad.wmnet with reason: Release v0.6.5 - ayounsi@cumin1001
13:17 jgiannelos@deploy2002: Started deploy [restbase/deploy@40c15b1]: (no justification provided)
13:12 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
13:08 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:08 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:05 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:05 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:02 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
12:24 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
12:24 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
12:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: WIP
12:21 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: WIP
11:31 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
10:46 moritzm: installing perl security updates on bookworm
10:19 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
10:14 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
10:14 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
09:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
09:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
09:23 elukey: reload thanos-rule on titan2001
08:27 jmm@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lists1003.wikimedia.org
08:27 jmm@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:27 jmm@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin1002"
08:26 jmm@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin1002"
08:22 jmm@cumin1002: START - Cookbook sre.dns.netbox
08:17 jmm@cumin1002: START - Cookbook sre.hosts.decommission for hosts lists1003.wikimedia.org
06:13 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:07 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
05:10 kart_: Updated MinT to 2023-12-12-065316-production
04:56 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
04:54 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.10 refs T350086 (duration: 51m 03s)
04:49 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
04:49 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
04:43 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
04:40 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
04:36 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
04:09 cstone: civicrm upgraded from e2d49d10 to c3cc80c7
04:03 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.10 refs T350086

2023-12-18

23:40 taavi: conftool codfw/appserver/nginx/mw2448.codfw.wmnet: pooled changed yes => inactive # T353679, not sure why it was not logged automatically
22:35 maryum: Deployed patch for T347704
22:08 dancy: UTC late backport window completed.
22:07 dancy@deploy2002: Finished scap: Backport for gerrit:983745 Revert "Fix English Gboard backspace over aliens" (T353578 T325129), gerrit:983906 Revert "Put zero-width space after inline focusable nodes" (T353578 T330284), gerrit:983911 Update VE core submodule to wmf.9 (6bada65) (T353578 T330284 T325129) (duration: 13m 34s)
21:57 dancy@deploy2002: dancy and kemayo: Continuing with sync
21:56 dancy@deploy2002: dancy and kemayo: Backport for gerrit:983745 Revert "Fix English Gboard backspace over aliens" (T353578 T325129), gerrit:983906 Revert "Put zero-width space after inline focusable nodes" (T353578 T330284), gerrit:983911 Update VE core submodule to wmf.9 (6bada65) (T353578 T330284 T325129) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:54 dancy@deploy2002: Started scap: Backport for gerrit:983745 Revert "Fix English Gboard backspace over aliens" (T353578 T325129), gerrit:983906 Revert "Put zero-width space after inline focusable nodes" (T353578 T330284), gerrit:983911 Update VE core submodule to wmf.9 (6bada65) (T353578 T330284 T325129)
21:17 dancy@deploy2002: Finished scap: Backport for gerrit:983928 Undeploy Reader Demographics 2 survey (T344393) (duration: 08m 30s)
21:11 dancy@deploy2002: dani and dancy: Continuing with sync
21:10 dancy@deploy2002: dani and dancy: Backport for gerrit:983928 Undeploy Reader Demographics 2 survey (T344393) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:09 dancy@deploy2002: Started scap: Backport for gerrit:983928 Undeploy Reader Demographics 2 survey (T344393)
21:05 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
21:05 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
21:04 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
21:04 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
21:03 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
21:03 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
21:01 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
21:01 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
20:53 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:53 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
20:52 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:52 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
20:48 otto@deploy2002: Synchronized wmf-config/ext-EventStreamConfig.php: Config: gerrit:983939 Add message_key_fields to page_content_change stream (T338231) (duration: 06m 32s)
20:31 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:31 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
20:19 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:19 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
17:14 inflatador: bking@kafka-jumbo1007 kafka topics --alter --topic codfw.mediawiki.cirrussearch.page_rerender.v1 --partitions 5 T351503
17:12 inflatador: bking@kafka-jumbo1007 kafka topics --alter --topic eqiad.mediawiki.cirrussearch.page_rerender.v1 --partitions 5 T351503
17:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2074.codfw.wmnet with OS bullseye
16:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to mc-gp[12]00[123] - akosiaris@cumin1001"
16:55 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to mc-gp[12]00[123] - akosiaris@cumin1001"
16:54 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: gerrit:983756 Bumping portals to master (T128546) (duration: 06m 28s)
16:52 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
16:52 akosiaris@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
16:52 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
16:48 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: gerrit:983756 Bumping portals to master (T128546) (duration: 06m 08s)
16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2076']
16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2075']
16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2074']
16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2079']
16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2077']
16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2080']
16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2078']
16:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2080']
16:35 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2080
16:34 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:34 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to mc2042-mc2055 - akosiaris@cumin1001"
16:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2080
16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be1077
16:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be1077
16:33 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to mc2042-mc2055 - akosiaris@cumin1001"
16:31 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
16:28 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
16:28 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
16:25 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
16:25 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2080']
16:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2080']
16:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2079']
16:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2078']
16:22 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2077']
16:22 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2076']
16:21 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2075']
16:21 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2074']
16:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2079']
16:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2080']
16:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2079']
16:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be2079']
16:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2080']
16:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2078']
16:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2077']
16:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2076']
16:18 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2076']
16:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2075']
16:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2074']
16:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2079']
16:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2079']
16:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2078']
16:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2077']
16:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2075']
16:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2074']
16:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2080.mgmt.codfw.wmnet with reboot policy FORCED
16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2079.mgmt.codfw.wmnet with reboot policy FORCED
16:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2076.mgmt.codfw.wmnet with reboot policy FORCED
16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2078.mgmt.codfw.wmnet with reboot policy FORCED
16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2077.mgmt.codfw.wmnet with reboot policy FORCED
16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2075.mgmt.codfw.wmnet with reboot policy FORCED
16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2074.mgmt.codfw.wmnet with reboot policy FORCED
15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2080.mgmt.codfw.wmnet with reboot policy FORCED
15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2079.mgmt.codfw.wmnet with reboot policy FORCED
15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with reboot policy FORCED
15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2077.mgmt.codfw.wmnet with reboot policy FORCED
15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2076.mgmt.codfw.wmnet with reboot policy FORCED
15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2075.mgmt.codfw.wmnet with reboot policy FORCED
15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2074.mgmt.codfw.wmnet with reboot policy FORCED
15:42 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2074-2080 to codfw - jhancock@cumin2002"
15:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2074-2080 to codfw - jhancock@cumin2002"
15:37 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
15:36 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
15:16 fabfur: repooling cp4037 (T352876)
15:16 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp4037.ulsfo.wmnet
15:16 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp4037.ulsfo.wmnet
15:04 urbanecm@deploy2002: Finished scap: Backport for gerrit:983229 Configure and enable StatsLib for production (T343024), gerrit:983529 Revert "util.main: Don't use mw.Map(), use a native Map() instead" (T353571 T353076) (duration: 10m 20s)
14:58 urbanecm@deploy2002: cwhite and urbanecm and chlod: Continuing with sync
14:55 urbanecm@deploy2002: cwhite and urbanecm and chlod: Backport for gerrit:983229 Configure and enable StatsLib for production (T343024), gerrit:983529 Revert "util.main: Don't use mw.Map(), use a native Map() instead" (T353571 T353076) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:53 urbanecm@deploy2002: Started scap: Backport for gerrit:983229 Configure and enable StatsLib for production (T343024), gerrit:983529 Revert "util.main: Don't use mw.Map(), use a native Map() instead" (T353571 T353076)
14:52 urbanecm@deploy2002: Finished scap: Backport for gerrit:981714 Enable action blocks for zhwiki (T353120) (duration: 08m 58s)
14:47 urbanecm@deploy2002: milkydefer and urbanecm: Continuing with sync
14:45 moritzm: installing nagios-plugins-contrib bugfix updates from Bookworm point release
14:45 moritzm: installing nagios-plugins-contrib bugfix updates
14:44 urbanecm@deploy2002: milkydefer and urbanecm: Backport for gerrit:981714 Enable action blocks for zhwiki (T353120) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:44 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@d275e4f]: (no justification provided) (duration: 00m 32s)
14:44 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@d275e4f]: (no justification provided)
14:43 urbanecm@deploy2002: Started scap: Backport for gerrit:981714 Enable action blocks for zhwiki (T353120)
14:43 urbanecm@deploy2002: Finished scap: Backport for gerrit:982873 Add a testing stream for page-prediction-change events (T349919), gerrit:983178 CheckUser: Enable read new for event tables migration everywhere (T341829) (duration: 19m 00s)
14:37 urbanecm@deploy2002: dreamyjazz and aikochou and urbanecm: Continuing with sync
14:36 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
14:35 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
14:34 urbanecm@deploy2002: dreamyjazz and aikochou and urbanecm: Backport for gerrit:982873 Add a testing stream for page-prediction-change events (T349919), gerrit:983178 CheckUser: Enable read new for event tables migration everywhere (T341829) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:24 urbanecm@deploy2002: Started scap: Backport for gerrit:982873 Add a testing stream for page-prediction-change events (T349919), gerrit:983178 CheckUser: Enable read new for event tables migration everywhere (T341829)
14:13 moritzm: installing node-undici security updates
13:15 moritzm: installing intel-microcode security updates on buster hosts
13:08 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye
12:56 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
12:55 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
12:52 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
12:51 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
12:50 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
12:50 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
12:45 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
12:41 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
12:27 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-canary
12:26 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-canary
12:26 kamila@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
12:25 kamila@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
12:24 kamila@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
12:23 kamila@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
12:20 kamila@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
12:20 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
12:20 kamila@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
12:19 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
12:19 kamila@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:18 kamila@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:14 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
12:13 kamila@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
12:12 Emperor: restart swift-proxy and envoyproxy on ms-fe1012
12:10 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:09 kamila@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:04 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
12:03 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
12:01 moritzm: installing ncurses security updates
11:59 kamila@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:58 kamila@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:51 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
11:51 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
11:41 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
11:41 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
11:39 moritzm: installing qemu security updates on bookworm
11:38 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
11:37 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
11:36 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
11:36 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
10:56 moritzm: restarting apache/FPM on mw canaries to pick up gnutls update
10:52 moritzm: installing gnutls28 security updates
10:47 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
10:44 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
10:39 moritzm: installing jetty9 security updates
10:29 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
10:29 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
10:17 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
10:13 XioNoX: remove VRRP pinning on cr1-eqiad/cr2-eqiad/cr2-codfw
10:09 moritzm: installing Linux 6.1.67 updates on Bookworm hosts
09:45 XioNoX: make eqiad-codfw 100G link primary
09:10 vgutierrez: vgutierrez@acmechief1002:~$ sudo -i keyholder arm - T352242

2023-12-17

12:59 elukey: restart kubelet on ml-serve1001 (errors while syncing old containers)

2023-12-16

01:21 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@fb10de1]: (no justification provided) (duration: 00m 10s)
01:21 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@fb10de1]: (no justification provided)
00:44 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@63804c4]: (no justification provided) (duration: 00m 25s)
00:44 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@63804c4]: (no justification provided)
00:05 jhathaway: unbreaking my puppet change with, https://gerrit.wikimedia.org/r/c/operations/puppet/+/983504

2023-12-15

23:46 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@9600237]: (no justification provided) (duration: 00m 27s)
23:46 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@9600237]: (no justification provided)
23:06 milimetric@deploy2002: Finished deploy [airflow-dags/platform_eng@160d0f0]: (no justification provided) (duration: 00m 25s)
23:06 milimetric@deploy2002: Started deploy [airflow-dags/platform_eng@160d0f0]: (no justification provided)
22:42 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:42 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:03 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@5090fdc]: (no justification provided) (duration: 00m 25s)
22:03 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@5090fdc]: (no justification provided)
21:48 milimetric@deploy2002: Finished deploy [analytics/refinery@eeb98ac] (thin): Syncing changes to HDFS (duration: 00m 06s)
21:48 milimetric@deploy2002: Started deploy [analytics/refinery@eeb98ac] (thin): Syncing changes to HDFS
21:48 milimetric@deploy2002: Finished deploy [analytics/refinery@eeb98ac]: Syncing changes to HDFS (duration: 81m 46s)
21:26 mutante: running puppet on all prometheus*
20:26 milimetric@deploy2002: Started deploy [analytics/refinery@eeb98ac]: Syncing changes to HDFS
15:44 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
15:25 klausman@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
15:01 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
15:00 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
14:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
14:46 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 100%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54482 and previous config saved to /var/cache/conftool/dbconfig/20231215-144624-arnaudb.json
14:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
14:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
14:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
14:40 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:39 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: candidate master proper repooling', diff saved to https://phabricator.wikimedia.org/P54481 and previous config saved to /var/cache/conftool/dbconfig/20231215-143812-arnaudb.json
14:31 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 80%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54480 and previous config saved to /var/cache/conftool/dbconfig/20231215-143118-arnaudb.json
14:27 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
14:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on db2194.codfw.wmnet with reason: production freeze will occur before cookbook is finished
14:27 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
14:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on db2194.codfw.wmnet with reason: production freeze will occur before cookbook is finished
14:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: candidate master proper repooling', diff saved to https://phabricator.wikimedia.org/P54479 and previous config saved to /var/cache/conftool/dbconfig/20231215-142307-arnaudb.json
14:16 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 40%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54478 and previous config saved to /var/cache/conftool/dbconfig/20231215-141613-arnaudb.json
14:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: candidate master proper repooling', diff saved to https://phabricator.wikimedia.org/P54477 and previous config saved to /var/cache/conftool/dbconfig/20231215-140802-arnaudb.json
14:07 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
14:07 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
14:01 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 20%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54476 and previous config saved to /var/cache/conftool/dbconfig/20231215-140108-arnaudb.json
13:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
13:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
13:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: candidate master proper repooling', diff saved to https://phabricator.wikimedia.org/P54475 and previous config saved to /var/cache/conftool/dbconfig/20231215-135257-arnaudb.json
13:52 arnaudb@cumin1001: dbctl commit (dc=all): 'depool db2179 to repool w/ api', diff saved to https://phabricator.wikimedia.org/P54474 and previous config saved to /var/cache/conftool/dbconfig/20231215-135228-arnaudb.json
13:46 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 10%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54473 and previous config saved to /var/cache/conftool/dbconfig/20231215-134603-arnaudb.json
13:39 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: Test upgrade GitLab Replica with insufficient API key
13:39 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Test upgrade GitLab Replica with insufficient API key
12:55 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
12:55 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
12:25 hashar@deploy2002: Finished deploy [integration/docroot@7f6c112]: doc: add integration/tox-jenkins-override - T353515 (duration: 00m 06s)
12:25 hashar@deploy2002: Started deploy [integration/docroot@7f6c112]: doc: add integration/tox-jenkins-override - T353515
11:28 hashar@deploy2002: Finished deploy [gerrit/gerrit@304c63a]: wm-pcc: only act on Puppet repositories - T353181 (duration: 00m 08s)
11:28 hashar@deploy2002: Started deploy [gerrit/gerrit@304c63a]: wm-pcc: only act on Puppet repositories - T353181
10:56 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
10:54 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
10:52 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
09:05 moritzm: installing Linux 6.1.67 packages on Bookworm hosts
08:56 XioNoX: shutdown already down IPv6 BGP session from ulsfo to the office

2023-12-14

23:17 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host acmechief1002.eqiad.wmnet with OS bookworm
23:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief1002.eqiad.wmnet with reason: host reimage
22:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief1002.eqiad.wmnet with reason: host reimage
22:48 brett@cumin2002: START - Cookbook sre.hosts.reimage for host acmechief1002.eqiad.wmnet with OS bookworm
21:24 ssastry@deploy2002: Finished scap: Backport for gerrit:982845 Revert "Temporarily disable isPreview in Parsoid's rendering" (duration: 10m 38s)
21:18 ssastry@deploy2002: ssastry: Continuing with sync
21:14 ssastry@deploy2002: ssastry: Backport for gerrit:982845 Revert "Temporarily disable isPreview in Parsoid's rendering" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:13 ssastry@deploy2002: Started scap: Backport for gerrit:982845 Revert "Temporarily disable isPreview in Parsoid's rendering"
20:52 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
20:51 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
20:51 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
20:51 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
20:51 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
20:50 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
20:50 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
20:50 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
20:50 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
20:49 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
20:48 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
20:48 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
20:48 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
20:47 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
20:47 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
20:46 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
20:46 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
20:46 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
20:45 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
20:45 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
20:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wdqs[1009-1010].eqiad.wmnet
20:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[1009-1010].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
20:40 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
20:39 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
20:39 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
20:39 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
20:39 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
20:38 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
20:38 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
20:38 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
20:38 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
20:37 ryankemper@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[1009-1010].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
20:37 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
20:31 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
20:23 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs[1009-1010].eqiad.wmnet
20:06 jmm@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM moscovium.eqiad.wmnet
20:02 jmm@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM moscovium.eqiad.wmnet
19:12 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.9 refs T350085
19:03 brennen: 1.42.0-wmf.9 (T350085) status: no current blockers, although we should keep an eye on T353400. rolling to all wikis.
18:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54462 and previous config saved to /var/cache/conftool/dbconfig/20231214-183508-arnaudb.json
18:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54461 and previous config saved to /var/cache/conftool/dbconfig/20231214-183459-arnaudb.json
18:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54460 and previous config saved to /var/cache/conftool/dbconfig/20231214-182003-arnaudb.json
18:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54459 and previous config saved to /var/cache/conftool/dbconfig/20231214-181954-arnaudb.json
18:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54458 and previous config saved to /var/cache/conftool/dbconfig/20231214-180458-arnaudb.json
18:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54457 and previous config saved to /var/cache/conftool/dbconfig/20231214-180449-arnaudb.json
17:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54456 and previous config saved to /var/cache/conftool/dbconfig/20231214-174953-arnaudb.json
17:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54455 and previous config saved to /var/cache/conftool/dbconfig/20231214-174944-arnaudb.json
17:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54453 and previous config saved to /var/cache/conftool/dbconfig/20231214-173448-arnaudb.json
17:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54452 and previous config saved to /var/cache/conftool/dbconfig/20231214-173439-arnaudb.json
17:24 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:23 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
17:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54451 and previous config saved to /var/cache/conftool/dbconfig/20231214-171943-arnaudb.json
17:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54450 and previous config saved to /var/cache/conftool/dbconfig/20231214-171934-arnaudb.json
17:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 8%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54449 and previous config saved to /var/cache/conftool/dbconfig/20231214-170438-arnaudb.json
17:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 8%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54448 and previous config saved to /var/cache/conftool/dbconfig/20231214-170428-arnaudb.json
16:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 4%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54446 and previous config saved to /var/cache/conftool/dbconfig/20231214-164925-arnaudb.json
16:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 4%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54445 and previous config saved to /var/cache/conftool/dbconfig/20231214-164921-arnaudb.json
16:43 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
16:43 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
16:43 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
16:42 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
16:42 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
16:42 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
16:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 2%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54444 and previous config saved to /var/cache/conftool/dbconfig/20231214-163420-arnaudb.json
16:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 2%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54443 and previous config saved to /var/cache/conftool/dbconfig/20231214-163416-arnaudb.json
16:24 akosiaris: updates of all wikikube services done T352906
16:20 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
16:18 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
16:18 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
16:18 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
16:17 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
16:17 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
16:17 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/similar-users: apply
16:17 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/similar-users: apply
16:17 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/similar-users: apply
16:16 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
16:16 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/similar-users: apply
16:16 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/similar-users: apply
16:16 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/similar-users: apply
16:15 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
16:15 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
16:14 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
16:14 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
16:14 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
16:14 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
16:14 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
16:13 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
16:13 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
16:13 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
16:12 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
16:12 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
16:12 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
16:11 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
16:11 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
16:11 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
16:10 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
16:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
16:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
16:09 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
16:09 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
16:09 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
16:09 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
16:08 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
16:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
16:08 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
16:08 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
16:07 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
16:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
16:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
16:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
16:06 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
16:06 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
16:06 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
16:06 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
16:05 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
16:05 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
16:05 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
16:05 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
16:04 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
16:04 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
16:04 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: apply
16:04 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: apply
16:03 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
16:03 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
16:03 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: apply
16:03 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
16:03 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
16:02 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
16:02 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
16:02 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
16:02 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
16:02 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
16:02 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
16:01 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
16:01 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
16:01 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
16:00 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
16:00 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
16:00 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
15:59 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
15:59 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
15:59 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
15:59 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
15:58 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
15:58 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
15:57 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
15:57 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
15:57 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
15:57 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
15:57 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
15:57 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
15:57 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
15:56 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
15:56 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
15:55 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
15:55 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
15:54 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
15:54 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts planet2002.codfw.wmnet
15:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:54 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: planet2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
15:54 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
15:54 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
15:54 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
15:54 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
15:54 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
15:53 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
15:53 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
15:53 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/mathoid: apply
15:53 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
15:53 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/media-analytics: apply
15:53 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
15:53 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: planet2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
15:53 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
15:52 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
15:52 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
15:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
15:51 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
15:51 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
15:51 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
15:51 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
15:51 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
15:51 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
15:51 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
15:50 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
15:50 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
15:50 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
15:50 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
15:50 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
15:50 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
15:50 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
15:50 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
15:49 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
15:49 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
15:49 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
15:49 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
15:49 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
15:49 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
15:48 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
15:48 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
15:48 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
15:48 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
15:48 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
15:48 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
15:46 dzahn@cumin2002: START - Cookbook sre.dns.netbox
15:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1149.eqiad.wmnet onto db1249.eqiad.wmnet
15:42 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
15:42 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
15:42 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
15:42 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
15:42 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
15:42 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
15:42 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts planet2002.codfw.wmnet
15:40 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
15:40 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
15:40 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
15:40 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
15:40 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
15:39 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
15:35 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
15:35 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
15:31 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@4946bb7]: (no justification provided) (duration: 00m 48s)
15:30 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@4946bb7]: (no justification provided)
15:29 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:28 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
15:28 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:28 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:28 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
15:28 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
15:28 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
15:28 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
15:28 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
15:28 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
15:28 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
15:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
15:27 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
15:27 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
15:27 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
15:27 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
15:17 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
15:17 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
15:16 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
15:16 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
15:15 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
14:46 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
14:45 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
14:45 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
14:45 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
14:44 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
14:44 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
14:43 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:43 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:22 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:22 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:07 moritzm: installing ruby-rails-html-sanitizer security updates
14:01 moritzm: installing ruby-loofah security updates
13:56 moritzm: installing reportbug bugfix updates on buster
13:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1137.eqiad.wmnet
13:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:54 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
13:53 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
13:52 moritzm: installing netty security updates
13:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1148.eqiad.wmnet
13:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:51 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
13:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1132.eqiad.wmnet
13:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1132.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
13:50 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1132.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
13:48 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
13:44 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1148.eqiad.wmnet
13:43 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1137.eqiad.wmnet
13:42 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1132.eqiad.wmnet
13:42 arnaudb@cumin1001: dbctl commit (dc=all): 'decommissionning hosts', diff saved to https://phabricator.wikimedia.org/P54437 and previous config saved to /var/cache/conftool/dbconfig/20231214-134203-arnaudb.json
13:21 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1134.eqiad.wmnet onto db1234.eqiad.wmnet
13:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1134 in db1234 for T344036', diff saved to https://phabricator.wikimedia.org/P54436 and previous config saved to /var/cache/conftool/dbconfig/20231214-131913-arnaudb.json
13:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: provisionning db1234.eqiad.wmnet - T344036
13:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: provisionning db1234.eqiad.wmnet - T344036
13:17 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: provisionning db1234.eqiad.wmnet - T344036
13:17 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: provisionning db1234.eqiad.wmnet - T344036
13:12 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1149.eqiad.wmnet onto db1249.eqiad.wmnet
13:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1149 in db1249 for T344036', diff saved to https://phabricator.wikimedia.org/P54435 and previous config saved to /var/cache/conftool/dbconfig/20231214-131017-arnaudb.json
13:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: provisionning db1249.eqiad.wmnet - T344036
13:09 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: provisionning db1249.eqiad.wmnet - T344036
13:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: provisionning db1249.eqiad.wmnet - T344036
13:08 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: provisionning db1249.eqiad.wmnet - T344036
12:45 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
12:45 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
12:42 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
12:10 cgoubert@deploy2002: Finished scap: Deploying php-fpm-exporter 0.0.3 - 982431, mw-api-int: replicas x125% - 982841 (duration: 04m 16s)
12:05 cgoubert@deploy2002: Started scap: Deploying php-fpm-exporter 0.0.3 - 982431, mw-api-int: replicas x125% - 982841
12:03 cgoubert@deploy2002: sync-world aborted: Deploying php-fpm-exporter 0.0.3 - 982431, mw-api-int: replicas x125% - 982841 (duration: 00m 02s)
12:03 cgoubert@deploy2002: Started scap: Deploying php-fpm-exporter 0.0.3 - 982431, mw-api-int: replicas x125% - 982841
12:01 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
12:01 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
11:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54434 and previous config saved to /var/cache/conftool/dbconfig/20231214-115332-arnaudb.json
11:51 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
11:49 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1002.eqiad.wmnet
11:42 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1002.eqiad.wmnet
11:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54433 and previous config saved to /var/cache/conftool/dbconfig/20231214-113826-arnaudb.json
11:31 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1001.eqiad.wmnet
11:30 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2002.codfw.wmnet
11:25 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1001.eqiad.wmnet
11:24 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2002.codfw.wmnet
11:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54432 and previous config saved to /var/cache/conftool/dbconfig/20231214-112321-arnaudb.json
11:12 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica to new version
11:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54431 and previous config saved to /var/cache/conftool/dbconfig/20231214-110816-arnaudb.json
11:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54430 and previous config saved to /var/cache/conftool/dbconfig/20231214-110754-arnaudb.json
11:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54429 and previous config saved to /var/cache/conftool/dbconfig/20231214-110733-arnaudb.json
11:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54428 and previous config saved to /var/cache/conftool/dbconfig/20231214-110714-arnaudb.json
11:06 _joe_: restarted apache2 on lists1001
10:58 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 100%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54427 and previous config saved to /var/cache/conftool/dbconfig/20231214-105814-arnaudb.json
10:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54426 and previous config saved to /var/cache/conftool/dbconfig/20231214-105311-arnaudb.json
10:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54425 and previous config saved to /var/cache/conftool/dbconfig/20231214-105248-arnaudb.json
10:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54424 and previous config saved to /var/cache/conftool/dbconfig/20231214-105228-arnaudb.json
10:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54423 and previous config saved to /var/cache/conftool/dbconfig/20231214-105209-arnaudb.json
10:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update codfw-eqiad transport ptr - ayounsi@cumin1001"
10:45 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update codfw-eqiad transport ptr - ayounsi@cumin1001"
10:43 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 90%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54422 and previous config saved to /var/cache/conftool/dbconfig/20231214-104308-arnaudb.json
10:42 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
10:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 15%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54421 and previous config saved to /var/cache/conftool/dbconfig/20231214-103806-arnaudb.json
10:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54420 and previous config saved to /var/cache/conftool/dbconfig/20231214-103743-arnaudb.json
10:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54419 and previous config saved to /var/cache/conftool/dbconfig/20231214-103723-arnaudb.json
10:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54418 and previous config saved to /var/cache/conftool/dbconfig/20231214-103704-arnaudb.json
10:28 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 80%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54417 and previous config saved to /var/cache/conftool/dbconfig/20231214-102803-arnaudb.json
10:26 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
10:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54416 and previous config saved to /var/cache/conftool/dbconfig/20231214-102301-arnaudb.json
10:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54415 and previous config saved to /var/cache/conftool/dbconfig/20231214-102238-arnaudb.json
10:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54414 and previous config saved to /var/cache/conftool/dbconfig/20231214-102218-arnaudb.json
10:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54413 and previous config saved to /var/cache/conftool/dbconfig/20231214-102159-arnaudb.json
10:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new cumin1002 host - jmm@cumin2002"
10:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new cumin1002 host - jmm@cumin2002"
10:14 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
10:14 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
10:14 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
10:14 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
10:14 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
10:13 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
10:12 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 70%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54412 and previous config saved to /var/cache/conftool/dbconfig/20231214-101258-arnaudb.json
10:12 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
10:12 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
10:11 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
10:11 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
10:11 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
10:11 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
10:08 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 5%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54411 and previous config saved to /var/cache/conftool/dbconfig/20231214-100756-arnaudb.json
10:07 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
10:07 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54410 and previous config saved to /var/cache/conftool/dbconfig/20231214-100733-arnaudb.json
10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54409 and previous config saved to /var/cache/conftool/dbconfig/20231214-100713-arnaudb.json
10:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
10:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
10:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54408 and previous config saved to /var/cache/conftool/dbconfig/20231214-100654-arnaudb.json
10:06 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
10:05 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
10:05 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
10:05 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
10:04 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
10:04 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
10:04 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
10:00 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
09:59 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
09:59 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
09:58 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
09:58 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
09:58 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
09:57 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 60%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54407 and previous config saved to /var/cache/conftool/dbconfig/20231214-095753-arnaudb.json
09:56 godog: remove >= 3 months old thanos blocks for prometheus/ops in eqiad/codfw and only for a single replica - T351927
09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 5%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54406 and previous config saved to /var/cache/conftool/dbconfig/20231214-095228-arnaudb.json
09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 5%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54405 and previous config saved to /var/cache/conftool/dbconfig/20231214-095208-arnaudb.json
09:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 5%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54404 and previous config saved to /var/cache/conftool/dbconfig/20231214-095149-arnaudb.json
09:51 hashar: Restarting CI Jenkins
09:49 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
09:49 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
09:49 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
09:49 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
09:49 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
09:48 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
09:42 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 50%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54402 and previous config saved to /var/cache/conftool/dbconfig/20231214-094248-arnaudb.json
09:40 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
09:39 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
09:39 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
09:38 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
09:38 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
09:38 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cumin1002.eqiad.wmnet with OS bullseye
09:30 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica to new version
09:27 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 40%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54401 and previous config saved to /var/cache/conftool/dbconfig/20231214-092743-arnaudb.json
09:27 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
09:27 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cumin1002.eqiad.wmnet with reason: host reimage
09:25 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
09:24 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
09:24 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
09:24 akosiaris: update all the other services. T352906
09:24 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
09:24 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
09:24 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
09:22 godog: delete raw replica blocks for prometheus/ops (only one replica) in codfw - T351927
09:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin1002.eqiad.wmnet with reason: host reimage
09:21 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
09:20 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
09:20 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
09:20 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
09:20 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
09:19 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
09:12 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 30%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54400 and previous config saved to /var/cache/conftool/dbconfig/20231214-091238-arnaudb.json
09:12 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cumin1002.eqiad.wmnet with OS bullseye
09:10 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host cumin1002.eqiad.wmnet
09:10 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cumin1002.eqiad.wmnet with OS bullseye
09:10 apergos: UTC morning backport and config window done
09:09 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
09:08 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
09:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
09:07 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1182.eqiad.wmnet onto db1233.eqiad.wmnet
09:07 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2001.codfw.wmnet
09:06 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
09:06 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
09:06 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
09:03 ariel@deploy2002: Finished scap: Backport for gerrit:982415 RunSingleJob.php: Stop writing to $wgCommandLineMode (T353262) (duration: 09m 05s)
09:00 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2001.codfw.wmnet
08:57 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 20%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54399 and previous config saved to /var/cache/conftool/dbconfig/20231214-085733-arnaudb.json
08:56 ariel@deploy2002: ariel and matmarex: Continuing with sync
08:56 ariel@deploy2002: ariel and matmarex: Backport for gerrit:982415 RunSingleJob.php: Stop writing to $wgCommandLineMode (T353262) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:54 ariel@deploy2002: Started scap: Backport for gerrit:982415 RunSingleJob.php: Stop writing to $wgCommandLineMode (T353262)
08:47 ariel@deploy2002: Finished scap: Backport for gerrit:982414 RunSingleJob.php: Remove overly complicated error handling (T353262) (duration: 08m 39s)
08:42 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 10%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54398 and previous config saved to /var/cache/conftool/dbconfig/20231214-084228-arnaudb.json
08:40 ariel@deploy2002: matmarex and ariel: Continuing with sync
08:39 ariel@deploy2002: matmarex and ariel: Backport for gerrit:982414 RunSingleJob.php: Remove overly complicated error handling (T353262) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:38 ariel@deploy2002: Started scap: Backport for gerrit:982414 RunSingleJob.php: Remove overly complicated error handling (T353262)
08:35 ariel@deploy2002: Finished scap: Backport for gerrit:982441 Remove references to refreshMessageBlobs.php (T314947) (duration: 10m 20s)
08:34 XioNoX: drain eqiad-codfw Arelion link for 100G migration
08:27 ariel@deploy2002: ariel and matmarex: Continuing with sync
08:26 ariel@deploy2002: ariel and matmarex: Backport for gerrit:982441 Remove references to refreshMessageBlobs.php (T314947) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:24 ariel@deploy2002: Started scap: Backport for gerrit:982441 Remove references to refreshMessageBlobs.php (T314947)
08:20 ariel@deploy2002: Finished scap: Backport for gerrit:971967 use virtual db domain for CentralAuth and GlobalBlocking (T348486) (duration: 10m 33s)
08:13 ariel@deploy2002: ariel: Continuing with sync
08:11 ariel@deploy2002: ariel: Backport for gerrit:971967 use virtual db domain for CentralAuth and GlobalBlocking (T348486) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:10 ariel@deploy2002: Started scap: Backport for gerrit:971967 use virtual db domain for CentralAuth and GlobalBlocking (T348486)
08:08 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:02 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cumin1002.eqiad.wmnet with OS bullseye
08:01 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:00 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
07:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM cumin1002.eqiad.wmnet - jmm@cumin2002"
07:54 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM cumin1002.eqiad.wmnet - jmm@cumin2002"
07:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cumin1002.eqiad.wmnet on all recursors
07:54 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache cumin1002.eqiad.wmnet on all recursors
07:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cumin1002.eqiad.wmnet - jmm@cumin2002"
07:53 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cumin1002.eqiad.wmnet - jmm@cumin2002"
07:50 jmm@cumin2002: START - Cookbook sre.dns.netbox
07:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host cumin1002.eqiad.wmnet
07:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
07:49 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
07:48 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1182.eqiad.wmnet onto db1233.eqiad.wmnet
07:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
03:24 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testhost2001.codfw.wmnet with OS bullseye
03:06 bvibber: cleanupOrphanedTranscodes complete. requeueTranscodes continues... forever and ever and ever
02:54 bvibber: brion running cleanupOrphanedTranscodes on commonswiki on mwmaint2002
01:26 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gitlab1003.wikimedia.org with reason: upgrade gitlab1003 to new version https://phabricator.wikmedia.org/T353375
01:25 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gitlab1003.wikimedia.org with reason: upgrade gitlab1003 to new version https://phabricator.wikmedia.org/T353375
01:04 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab1003.wikimedia.org with reason: upgrade gitlab1003 to new version https://phabricator.wikmedia.org/T353375
01:04 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab1003.wikimedia.org with reason: upgrade gitlab1003 to new version https://phabricator.wikmedia.org/T353375
00:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
00:40 dzahn@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release
00:40 dzahn@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
00:38 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release
00:38 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
00:34 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=93) on GitLab host gitlab1003.wikimedia.org with reason: security release
00:34 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
00:18 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts planet1002.eqiad.wmnet
00:18 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:18 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: planet1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
00:17 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: planet1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
00:15 dzahn@cumin2002: START - Cookbook sre.dns.netbox
00:11 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts planet1002.eqiad.wmnet

2023-12-13

23:48 brett@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host acmechief1002.eqiad.wmnet
23:48 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host acmechief1002.eqiad.wmnet with OS bookworm
23:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
23:21 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1107.eqiad.wmnet with OS bookworm
23:17 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply new extra plugins - bking@cumin2002 - T353270
23:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1107.eqiad.wmnet with reason: host reimage
23:02 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1107.eqiad.wmnet with reason: host reimage
22:58 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1006.eqiad.wmnet with OS bullseye
22:58 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
22:57 jhuneidi@deploy2002: Finished scap: Backport for gerrit:982867 Update wgStatsTarget to port 9125 (T240685), gerrit:982925 [BC] Enable desktop diff and history pages on mobile (T350181 T353388) (duration: 09m 42s)
22:57 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
22:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1005.eqiad.wmnet with OS bullseye
22:54 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
22:53 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
22:50 jhuneidi@deploy2002: jhuneidi and jdlrobson and cwhite: Continuing with sync
22:49 jhuneidi@deploy2002: jhuneidi and jdlrobson and cwhite: Backport for gerrit:982867 Update wgStatsTarget to port 9125 (T240685), gerrit:982925 [BC] Enable desktop diff and history pages on mobile (T350181 T353388) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:48 jhuneidi@deploy2002: Started scap: Backport for gerrit:982867 Update wgStatsTarget to port 9125 (T240685), gerrit:982925 [BC] Enable desktop diff and history pages on mobile (T350181 T353388)
22:47 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
22:45 jhuneidi@deploy2002: Finished scap: Backport for gerrit:982835 tests: Use MediaWikiIntegrationTestCase::setGroupPermissions (T353210), gerrit:982834 Temporarily disable isPreview in Parsoid's rendering (duration: 10m 08s)
22:45 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1107.eqiad.wmnet with OS bookworm
22:45 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
22:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1006.eqiad.wmnet with reason: host reimage
22:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1004.eqiad.wmnet with OS bullseye
22:40 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
22:39 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1005.eqiad.wmnet with reason: host reimage
22:39 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
22:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1107.eqiad.wmnet with OS bookworm
22:38 jhuneidi@deploy2002: ssastry and jhuneidi: Continuing with sync
22:38 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
22:37 jhuneidi@deploy2002: ssastry and jhuneidi: Backport for gerrit:982835 tests: Use MediaWikiIntegrationTestCase::setGroupPermissions (T353210), gerrit:982834 Temporarily disable isPreview in Parsoid's rendering synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:37 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1006.eqiad.wmnet with reason: host reimage
22:36 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1005.eqiad.wmnet with reason: host reimage
22:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host acmechief1002.eqiad.wmnet with OS bookworm
22:35 jhuneidi@deploy2002: Started scap: Backport for gerrit:982835 tests: Use MediaWikiIntegrationTestCase::setGroupPermissions (T353210), gerrit:982834 Temporarily disable isPreview in Parsoid's rendering
22:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1004.eqiad.wmnet with reason: host reimage
22:24 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1006.eqiad.wmnet with OS bullseye
22:24 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1005.eqiad.wmnet with OS bullseye
22:22 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1004.eqiad.wmnet with reason: host reimage
22:18 jhuneidi@deploy2002: Finished scap: Backport for gerrit:982857 Partially undeploy Reader Demographics 2 survey (T344393), gerrit:955015 Enable $wgStatsTarget for requests to mwdebug (T240685) (duration: 12m 33s)
22:18 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM acmechief1002.eqiad.wmnet - brett@cumin2002"
22:17 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM acmechief1002.eqiad.wmnet - brett@cumin2002"
22:16 brett@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) acmechief1002.eqiad.wmnet on all recursors
22:16 brett@cumin2002: START - Cookbook sre.dns.wipe-cache acmechief1002.eqiad.wmnet on all recursors
22:16 brett@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:16 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM acmechief1002.eqiad.wmnet - brett@cumin2002"
22:15 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM acmechief1002.eqiad.wmnet - brett@cumin2002"
22:12 brett@cumin2002: START - Cookbook sre.dns.netbox
22:11 brett@cumin2002: START - Cookbook sre.ganeti.makevm for new host acmechief1002.eqiad.wmnet
22:11 jhuneidi@deploy2002: dani and jhuneidi and cwhite: Continuing with sync
22:09 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1004.eqiad.wmnet with OS bullseye
22:07 jhuneidi@deploy2002: dani and jhuneidi and cwhite: Backport for gerrit:982857 Partially undeploy Reader Demographics 2 survey (T344393), gerrit:955015 Enable $wgStatsTarget for requests to mwdebug (T240685) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:05 jhuneidi@deploy2002: Started scap: Backport for gerrit:982857 Partially undeploy Reader Demographics 2 survey (T344393), gerrit:955015 Enable $wgStatsTarget for requests to mwdebug (T240685)
22:01 jhuneidi@deploy2002: Finished scap: Backport for gerrit:982244 Restore fixed width and height, direction of arrow on change list pages (T352456 T353099) (duration: 10m 28s)
21:59 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply new extra plugins - bking@cumin2002 - T353270
21:54 jhuneidi@deploy2002: jhuneidi and jdlrobson: Continuing with sync
21:52 jhuneidi@deploy2002: jhuneidi and jdlrobson: Backport for gerrit:982244 Restore fixed width and height, direction of arrow on change list pages (T352456 T353099) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:50 jhuneidi@deploy2002: Started scap: Backport for gerrit:982244 Restore fixed width and height, direction of arrow on change list pages (T352456 T353099)
21:04 cstone: civicrm upgraded from 834606ef to e2d49d10
20:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts planet1002.eqiad.wmnet
20:33 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:32 dzahn@cumin1001: START - Cookbook sre.dns.netbox
20:28 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts planet1002.eqiad.wmnet
19:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2031.codfw.wmnet
19:31 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2031.codfw.wmnet
19:19 brennen@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.9 refs T350085 (duration: 07m 29s)
19:12 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.9 refs T350085
19:03 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply new extra plugins - bking@cumin2002 - T353270
19:01 brennen: 1.42.0-wmf.9 (T350085) status: no blockers, rolling to group1
18:07 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
18:07 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
18:06 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
18:05 rzl@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
17:58 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
17:57 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
17:44 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply new extra plugins - bking@cumin2002 - T353270
17:27 rzl@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
17:25 rzl@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
16:56 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore1006']
16:56 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore1005']
16:56 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore1004']
16:55 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore1006']
16:55 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore1005']
16:55 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore1004']
16:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1148.eqiad.wmnet onto db1248.eqiad.wmnet
16:39 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
16:39 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new extra plugins - bking@cumin2002 - T353270
16:38 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
16:38 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore1006.mgmt.eqiad.wmnet with reboot policy FORCED
16:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 100%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54395 and previous config saved to /var/cache/conftool/dbconfig/20231213-163657-arnaudb.json
16:36 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore1005.mgmt.eqiad.wmnet with reboot policy FORCED
16:36 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:36 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:35 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:35 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:34 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:34 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:31 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore1004.mgmt.eqiad.wmnet with reboot policy FORCED
16:30 vriley@cumin1001: START - Cookbook sre.hosts.provision for host sessionstore1006.mgmt.eqiad.wmnet with reboot policy FORCED
16:29 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sessionstore1006
16:28 vriley@cumin1001: START - Cookbook sre.hosts.provision for host sessionstore1005.mgmt.eqiad.wmnet with reboot policy FORCED
16:27 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sessionstore1006
16:27 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
16:26 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sessionstore1005
16:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
16:25 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sessionstore1005
16:23 vriley@cumin1001: START - Cookbook sre.hosts.provision for host sessionstore1004.mgmt.eqiad.wmnet with reboot policy FORCED
16:22 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sessionstore1004
16:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 90%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54394 and previous config saved to /var/cache/conftool/dbconfig/20231213-162152-arnaudb.json
16:20 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sessionstore1004
16:19 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
16:19 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
16:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
16:16 ladsgroup@deploy2002: Finished scap: Backport for gerrit:982824 Fix my email in the key list (duration: 08m 45s)
16:15 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
16:15 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
16:14 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new extra plugins - bking@cumin2002 - T353270
16:13 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
16:12 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
16:12 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
16:11 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
16:10 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
16:09 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
16:09 ladsgroup@deploy2002: ladsgroup: Continuing with sync
16:09 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:982824 Fix my email in the key list synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:08 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
16:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
16:07 ladsgroup@deploy2002: Started scap: Backport for gerrit:982824 Fix my email in the key list
16:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 80%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54393 and previous config saved to /var/cache/conftool/dbconfig/20231213-160647-arnaudb.json
16:05 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
16:05 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
16:04 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
16:04 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
16:04 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
16:04 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
16:04 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
16:03 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/blubberoid: apply
16:03 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
16:03 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
16:01 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
16:01 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
16:01 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
16:01 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
16:00 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
16:00 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
16:00 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
16:00 akosiaris: upgrade apertium, blubberoid everywhere to use the latest service_proxy image, 1.23.10-2-s4-20231203 T352906
15:59 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/apertium: apply
15:59 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
15:59 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
15:59 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/apertium: apply
15:59 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/apertium: apply
15:58 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/apertium: apply
15:58 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
15:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1132.eqiad.wmnet onto db1232.eqiad.wmnet
15:58 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
15:57 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
15:56 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
15:56 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
15:52 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
15:51 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
15:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 70%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54392 and previous config saved to /var/cache/conftool/dbconfig/20231213-155142-arnaudb.json
15:51 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
15:51 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
15:50 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
15:49 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
15:46 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
15:45 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
15:44 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
15:43 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
15:40 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
15:39 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
15:39 claime: Deploying shellbox: update php-fpm-exporter version - 982432
15:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 60%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54389 and previous config saved to /var/cache/conftool/dbconfig/20231213-153636-arnaudb.json
15:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1147.eqiad.wmnet
15:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1147.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:35 Amir1: tagging 1.41.0-rc.0 in core
15:35 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1147.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:34 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:34 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:33 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
15:28 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1147.eqiad.wmnet
15:25 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1129.eqiad.wmnet
15:25 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:25 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1129.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:24 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1129.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:21 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
15:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 50%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54387 and previous config saved to /var/cache/conftool/dbconfig/20231213-152131-arnaudb.json
15:17 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
15:16 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1129.eqiad.wmnet
15:15 ladsgroup@deploy2002: Finished scap: Backport for gerrit:982499 docroot: Add my pgp key (duration: 09m 50s)
15:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1128.eqiad.wmnet
15:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1128.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:12 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1128.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:10 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
15:09 ladsgroup@deploy2002: ladsgroup: Continuing with sync
15:07 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:982499 docroot: Add my pgp key synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 40%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54386 and previous config saved to /var/cache/conftool/dbconfig/20231213-150626-arnaudb.json
15:06 ladsgroup@deploy2002: Started scap: Backport for gerrit:982499 docroot: Add my pgp key
15:05 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1128.eqiad.wmnet
15:04 arnaudb@cumin1001: dbctl commit (dc=all): 'decommission db1128 29 and 47', diff saved to https://phabricator.wikimedia.org/P54385 and previous config saved to /var/cache/conftool/dbconfig/20231213-150425-arnaudb.json
15:00 Lucas_WMDE: UTC afternoon backport+config window done
15:00 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:982105 CheckUser: Enable read new for event tables migration on group1 (T341829) (duration: 08m 29s)
14:53 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and dreamyjazz: Continuing with sync
14:53 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and dreamyjazz: Backport for gerrit:982105 CheckUser: Enable read new for event tables migration on group1 (T341829) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:51 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:982105 CheckUser: Enable read new for event tables migration on group1 (T341829)
14:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 30%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54384 and previous config saved to /var/cache/conftool/dbconfig/20231213-145121-arnaudb.json
14:49 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:982653 Utilities/Yaml: Use string as value with ini_set (T348496) (duration: 19m 09s)
14:43 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
14:43 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
14:42 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and abi: Continuing with sync
14:42 hashar: Restarted Gerrit on gerrit1003 and gerrit2002
14:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 20%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54383 and previous config saved to /var/cache/conftool/dbconfig/20231213-143616-arnaudb.json
14:33 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
14:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and abi: Backport for gerrit:982653 Utilities/Yaml: Use string as value with ini_set (T348496) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:30 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:982653 Utilities/Yaml: Use string as value with ini_set (T348496)
14:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
14:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 10%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54381 and previous config saved to /var/cache/conftool/dbconfig/20231213-142111-arnaudb.json
14:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
14:02 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1148.eqiad.wmnet onto db1248.eqiad.wmnet
14:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1148 in db1248 for T344036', diff saved to https://phabricator.wikimedia.org/P54380 and previous config saved to /var/cache/conftool/dbconfig/20231213-140017-arnaudb.json
13:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: provisionning db1248.eqiad.wmnet - T344036
13:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: provisionning db1248.eqiad.wmnet - T344036
13:57 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: provisionning db1248.eqiad.wmnet - T344036
13:53 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
13:53 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
13:51 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
13:51 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
13:50 moritzm: installing postgresql-11 security updates
13:49 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
13:48 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
13:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1129 in db1233 for T344036', diff saved to https://phabricator.wikimedia.org/P54379 and previous config saved to /var/cache/conftool/dbconfig/20231213-134632-arnaudb.json
13:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: provisionning db1233.eqiad.wmnet - T344036
13:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: provisionning db1233.eqiad.wmnet - T344036
13:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: provisionning db1233.eqiad.wmnet - T344036
13:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: provisionning db1233.eqiad.wmnet - T344036
13:27 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1132.eqiad.wmnet onto db1232.eqiad.wmnet
13:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1132 in db1232 for T344036', diff saved to https://phabricator.wikimedia.org/P54376 and previous config saved to /var/cache/conftool/dbconfig/20231213-132511-arnaudb.json
13:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: provisionning db1232.eqiad.wmnet - T344036
13:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: provisionning db1232.eqiad.wmnet - T344036
13:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: provisionning db1232.eqiad.wmnet - T344036
13:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: provisionning db1232.eqiad.wmnet - T344036
13:05 godog: delete raw replica blocks for prometheus/ops (only one replica) in eqiad - T351927
12:55 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1211.eqiad.wmnet onto db1226.eqiad.wmnet
12:42 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:42 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:40 moritzm: installing OpenSSH security updates on bullseye
12:25 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:25 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:16 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:16 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:11 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:11 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:10 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:09 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:08 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:08 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1233.eqiad.wmnet with OS bookworm
12:02 vgutierrez: setting cp4037 as inactive - T352876
11:49 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1233.eqiad.wmnet with reason: host reimage
11:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1233.eqiad.wmnet with reason: host reimage
11:37 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:36 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
11:33 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db1233.eqiad.wmnet with OS bookworm
11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
11:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
11:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
11:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
11:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5004.eqsin.wmnet
11:00 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
10:50 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
10:49 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1211.eqiad.wmnet onto db1226.eqiad.wmnet
10:48 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5004.eqsin.wmnet
10:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
10:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
10:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
10:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
10:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1226.eqiad.wmnet with OS bookworm
10:31 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
10:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
10:24 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
10:24 claime: Updating mw-debug prometheus-php-fpm-exporter to 0.0.3
10:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1226.eqiad.wmnet with reason: host reimage
10:11 hashar@deploy2002: Finished deploy [releng/jenkins-deploy@77b3681] (releasing): Rename jenkins-slave to jenkins-agent - T254646 (duration: 00m 42s)
10:11 hashar@deploy2002: Started deploy [releng/jenkins-deploy@77b3681] (releasing): Rename jenkins-slave to jenkins-agent - T254646
10:10 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1226.eqiad.wmnet with reason: host reimage
10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 100%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54374 and previous config saved to /var/cache/conftool/dbconfig/20231213-100708-arnaudb.json
10:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54373 and previous config saved to /var/cache/conftool/dbconfig/20231213-100651-arnaudb.json
10:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54372 and previous config saved to /var/cache/conftool/dbconfig/20231213-100555-arnaudb.json
10:00 moritzm: failover ganeti master in eqsin to ganeti5007
09:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
09:57 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db1226.eqiad.wmnet with OS bookworm
09:56 hashar: Disabled puppet agent on contint1002, contint2002, releases1003 and releases2003 to progressively deploy https://gerrit.wikimedia.org/r/922555
09:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 90%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54371 and previous config saved to /var/cache/conftool/dbconfig/20231213-095203-arnaudb.json
09:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 90%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54370 and previous config saved to /var/cache/conftool/dbconfig/20231213-095146-arnaudb.json
09:50 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 90%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54369 and previous config saved to /var/cache/conftool/dbconfig/20231213-095049-arnaudb.json
09:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5005.eqsin.wmnet
09:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5005.eqsin.wmnet
09:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
09:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 80%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54368 and previous config saved to /var/cache/conftool/dbconfig/20231213-093658-arnaudb.json
09:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 80%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54367 and previous config saved to /var/cache/conftool/dbconfig/20231213-093641-arnaudb.json
09:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 80%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54366 and previous config saved to /var/cache/conftool/dbconfig/20231213-093544-arnaudb.json
09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
09:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
09:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
09:25 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
09:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:25 brouberol: increasing pod max requested memory to a higher value than the container max requested memory for dse-k8s-eqiad - T351722
09:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 70%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54365 and previous config saved to /var/cache/conftool/dbconfig/20231213-092153-arnaudb.json
09:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 70%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54364 and previous config saved to /var/cache/conftool/dbconfig/20231213-092136-arnaudb.json
09:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 70%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54363 and previous config saved to /var/cache/conftool/dbconfig/20231213-092039-arnaudb.json
09:20 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
09:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
09:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 60%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54362 and previous config saved to /var/cache/conftool/dbconfig/20231213-090648-arnaudb.json
09:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 60%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54361 and previous config saved to /var/cache/conftool/dbconfig/20231213-090631-arnaudb.json
09:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 60%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54360 and previous config saved to /var/cache/conftool/dbconfig/20231213-090534-arnaudb.json
08:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 202120
08:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 202120
08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 50%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54359 and previous config saved to /var/cache/conftool/dbconfig/20231213-085143-arnaudb.json
08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54358 and previous config saved to /var/cache/conftool/dbconfig/20231213-085125-arnaudb.json
08:50 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54357 and previous config saved to /var/cache/conftool/dbconfig/20231213-085027-arnaudb.json
08:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
08:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3856
08:48 XioNoX: delete bgp group Confed_drmrs from cr1-esams - T347892
08:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3856
08:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
08:44 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 46997
08:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46997
08:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 40%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54356 and previous config saved to /var/cache/conftool/dbconfig/20231213-083638-arnaudb.json
08:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 40%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54355 and previous config saved to /var/cache/conftool/dbconfig/20231213-083620-arnaudb.json
08:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 40%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54354 and previous config saved to /var/cache/conftool/dbconfig/20231213-083522-arnaudb.json
08:30 XioNoX: delete bgp group Confed_esams from cr2-drmrs - T347892
08:25 mlitn@deploy2002: Finished scap: Backport for gerrit:979113 No custom UW licensing config (duration: 09m 43s)
08:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 30%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54353 and previous config saved to /var/cache/conftool/dbconfig/20231213-082133-arnaudb.json
08:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 30%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54352 and previous config saved to /var/cache/conftool/dbconfig/20231213-082115-arnaudb.json
08:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 30%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54351 and previous config saved to /var/cache/conftool/dbconfig/20231213-082017-arnaudb.json
08:18 mlitn@deploy2002: mlitn: Continuing with sync
08:17 mlitn@deploy2002: mlitn: Backport for gerrit:979113 No custom UW licensing config synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:16 mlitn@deploy2002: Started scap: Backport for gerrit:979113 No custom UW licensing config
08:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1020.eqiad.wmnet with OS bookworm
08:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 20%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54350 and previous config saved to /var/cache/conftool/dbconfig/20231213-080628-arnaudb.json
08:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54349 and previous config saved to /var/cache/conftool/dbconfig/20231213-080610-arnaudb.json
08:06 moritzm: installing openssh security updates
08:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54348 and previous config saved to /var/cache/conftool/dbconfig/20231213-080512-arnaudb.json
07:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1020.eqiad.wmnet with reason: host reimage
07:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1020.eqiad.wmnet with reason: host reimage
07:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 10%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54347 and previous config saved to /var/cache/conftool/dbconfig/20231213-075123-arnaudb.json
07:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54346 and previous config saved to /var/cache/conftool/dbconfig/20231213-075105-arnaudb.json
07:50 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54345 and previous config saved to /var/cache/conftool/dbconfig/20231213-075006-arnaudb.json
07:43 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1211.eqiad.wmnet onto db1226.eqiad.wmnet
07:40 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1020.eqiad.wmnet with OS bookworm
06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1021.eqiad.wmnet with OS bookworm
06:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1021.eqiad.wmnet with reason: host reimage
06:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1021.eqiad.wmnet with reason: host reimage
05:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1021.eqiad.wmnet with OS bookworm
03:41 hashar@deploy2002: Finished deploy [gerrit/gerrit@9bf8914]: Add a banner for the 2023 developer survey - T351109 (duration: 00m 08s)
03:41 hashar@deploy2002: Started deploy [gerrit/gerrit@9bf8914]: Add a banner for the 2023 developer survey - T351109

2023-12-12

23:56 ejegg: donorwiki upgraded from f7407053 to bc49e5a6
23:26 tzatziki: removing 2 files for legal compliance
23:05 tzatziki: removing 2 files for legal compliance
22:57 mutante: planet - switched to eqiad and bookworm backend (T348392 T345617) - https://meta.wikimedia.org/wiki/Planet_Wikimedia
22:43 mutante: planet2003 - manually upgrade rawdog package to 3.0.2 T348392
21:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on planet1003.eqiad.wmnet with reason: debugging
21:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on planet1003.eqiad.wmnet with reason: debugging
21:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on planet1003.eqiad.wmnet with reason: debugging
21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on planet2003.codfw.wmnet with reason: debugging
21:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on planet2003.codfw.wmnet with reason: debugging
21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on planet2003.codfw.wmnet with reason: reimage
21:18 samtar@deploy2002: Finished scap: Backport for gerrit:980963 Add stream config for Android article instruments (T351292) (duration: 11m 59s)
21:10 samtar@deploy2002: cjming and samtar: Continuing with sync
21:07 samtar@deploy2002: cjming and samtar: Backport for gerrit:980963 Add stream config for Android article instruments (T351292) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:06 samtar@deploy2002: Started scap: Backport for gerrit:980963 Add stream config for Android article instruments (T351292)
20:42 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
20:40 rzl@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
20:38 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
20:37 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
20:33 rzl@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
20:30 rzl@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
20:28 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:17 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:05 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
20:04 rzl@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
19:59 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: enable new wmf-elasticsearch-search-plugins - bking@cumin2002 - T353270
19:57 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
19:56 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
19:46 ryankemper@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:46 ryankemper@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:43 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.9 refs T350085
19:33 brennen@deploy2002: Finished scap: Backport for gerrit:982237 ParserOutput::getText(): do not clone ParserOutput when invoking pipeline (T353257) (duration: 09m 41s)
19:26 brennen@deploy2002: brennen and ssastry: Continuing with sync
19:25 brennen@deploy2002: brennen and ssastry: Backport for gerrit:982237 ParserOutput::getText(): do not clone ParserOutput when invoking pipeline (T353257) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:23 brennen@deploy2002: Started scap: Backport for gerrit:982237 ParserOutput::getText(): do not clone ParserOutput when invoking pipeline (T353257)
19:18 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: enable new wmf-elasticsearch-search-plugins - bking@cumin2002 - T353270
19:08 brennen: 1.42.0-wmf.9 (T350085) status: deploying a fix for T353257 and then will proceed to group0.
19:03 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: enable new wmf-elasticsearch-search-plugins - bking@cumin2002 - T353270
19:03 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: enable new wmf-elasticsearch-search-plugins - bking@cumin2002 - T353270
18:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host phab2002.codfw.wmnet with OS bullseye
18:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab2002.codfw.wmnet with reason: host reimage
18:32 rzl@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
18:31 rzl@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
18:29 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
18:28 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
18:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet with reason: host reimage
18:12 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host phab2002.codfw.wmnet with OS bullseye
18:10 mutante: reimaging phab2002 (stand-by phorge server) with bullseye - T327068
17:42 ejegg: fundraising civicrm upgraded from 8c107215 to 834606ef
17:33 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:33 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt sessionstore - jclark@cumin1001"
17:32 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt sessionstore - jclark@cumin1001"
17:32 ejegg: payments-wiki upgraded from 1d24dc90 to c1181b95
17:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testhost2001.codfw.wmnet with OS bullseye
17:30 jclark@cumin1001: START - Cookbook sre.dns.netbox
17:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on phab2002.codfw.wmnet with reason: reimage
17:16 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on phab2002.codfw.wmnet with reason: reimage
17:16 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
17:16 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:13 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
17:13 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:13 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:34 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ml-staging2001.codfw.wmnet with reason: Waiting for hardware install
16:33 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ml-staging2001.codfw.wmnet with reason: Waiting for hardware install
16:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2001.codfw.wmnet
16:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes1060']
16:19 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1060']
16:15 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2001.codfw.wmnet
16:05 brennen@deploy2002: Finished deploy [phabricator/deployment@c243cc2]: deploy to phab1004 for T353274 (duration: 00m 48s)
16:04 brennen@deploy2002: Started deploy [phabricator/deployment@c243cc2]: deploy to phab1004 for T353274
16:04 brennen@deploy2002: Finished deploy [phabricator/deployment@c243cc2]: test deploy to phab2002 for T353274 (duration: 00m 32s)
16:03 brennen@deploy2002: Started deploy [phabricator/deployment@c243cc2]: test deploy to phab2002 for T353274
16:03 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1004.eqiad.wmnet with reason: Phabricator deploys
16:03 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1004.eqiad.wmnet with reason: Phabricator deploys
16:00 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
15:59 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
15:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
15:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1137.eqiad.wmnet onto db1237.eqiad.wmnet
15:30 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
15:30 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
15:30 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
15:29 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
15:28 cgoubert@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:28 cgoubert@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:27 cgoubert@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
15:27 cgoubert@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
15:27 cgoubert@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
15:26 cgoubert@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
15:25 cgoubert@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
15:25 cgoubert@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
15:25 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:24 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
15:23 cgoubert@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
15:22 cgoubert@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
15:22 cgoubert@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
15:21 cgoubert@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
15:21 claime: Deploying new calico BGPPeers for codfw rows a/b - T352893
14:54 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1137.eqiad.wmnet onto db1237.eqiad.wmnet
14:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1137 in db1237 for T344036', diff saved to https://phabricator.wikimedia.org/P54339 and previous config saved to /var/cache/conftool/dbconfig/20231212-145205-arnaudb.json
14:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1237.eqiad.wmnet with reason: provisionning db1237.eqiad.wmnet - T344036
14:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1237.eqiad.wmnet with reason: provisionning db1237.eqiad.wmnet - T344036
14:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: provisionning db1237.eqiad.wmnet - T344036
14:50 dcausse: restarting blazegraph on wdqs1012 (BlazegraphFreeAllocatorsDecreasingRapidly)
14:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: provisionning db1237.eqiad.wmnet - T344036
14:48 phuedx: UTC afternoon backport window done
14:47 phuedx@deploy2002: Finished scap: Backport for gerrit:982178 Partially undeploy Reader Demographics 2 survey (T344393) (duration: 24m 33s)
14:39 phuedx@deploy2002: phuedx and dani: Continuing with sync
14:35 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1211.eqiad.wmnet onto db1226.eqiad.wmnet
14:35 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on netbox-dev2002.codfw.wmnet with reason: Restoring DB from backup on netbox-dev2002
14:34 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on netbox-dev2002.codfw.wmnet with reason: Restoring DB from backup on netbox-dev2002
14:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1211 in db1226 for T344036', diff saved to https://phabricator.wikimedia.org/P54336 and previous config saved to /var/cache/conftool/dbconfig/20231212-143233-arnaudb.json
14:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
14:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
14:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
14:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
14:24 phuedx@deploy2002: phuedx and dani: Backport for gerrit:982178 Partially undeploy Reader Demographics 2 survey (T344393) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:22 phuedx@deploy2002: Started scap: Backport for gerrit:982178 Partially undeploy Reader Demographics 2 survey (T344393)
13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
13:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:45 brouberol: increasing max container memory requests in dse-k8s from 3GB to 8GB - T351722
13:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1128.eqiad.wmnet onto db1228.eqiad.wmnet
13:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1129.eqiad.wmnet onto db1229.eqiad.wmnet
13:16 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2002.codfw.wmnet
13:16 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1002.eqiad.wmnet
13:09 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1002.eqiad.wmnet
13:09 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2002.codfw.wmnet
13:06 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1147.eqiad.wmnet onto db1247.eqiad.wmnet
13:00 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1001.eqiad.wmnet
12:57 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2001.codfw.wmnet
12:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup1011.eqiad.wmnet
12:55 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1001.eqiad.wmnet
12:53 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
12:52 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
12:51 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2001.codfw.wmnet
12:46 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host backup1011.eqiad.wmnet
12:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup1010.eqiad.wmnet
12:45 jayme: increasing memory of ganeti instance kubemaster2001.codfw.wmnet from 4G to 12G (requires reboot) - T353233
12:38 claime: Uncordoning kubernetes10[59-62].eqiad.wmnet - T353135
12:37 claime: Pooling kubernetes10[59-62].eqiad.wmnet - T353135
12:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup2011.codfw.wmnet
12:24 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host backup2011.codfw.wmnet
12:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup2010.codfw.wmnet
12:03 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host backup2010.codfw.wmnet
11:43 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
11:43 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
11:28 moritzm: installing postgresql-11 security updates
10:50 samtar@deploy2002: Finished scap: Backport for gerrit:981423 testwiki: Enable the Edit Recovery feature (T353041) (duration: 09m 51s)
10:47 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1129.eqiad.wmnet onto db1229.eqiad.wmnet
10:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1129 in db1229 for T344036', diff saved to https://phabricator.wikimedia.org/P54335 and previous config saved to /var/cache/conftool/dbconfig/20231212-104404-arnaudb.json
10:43 samtar@deploy2002: samtar and samwilson: Continuing with sync
10:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: provisionning db1229.eqiad.wmnet - T344036
10:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: provisionning db1229.eqiad.wmnet - T344036
10:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: provisionning db1229.eqiad.wmnet - T344036
10:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: provisionning db1229.eqiad.wmnet - T344036
10:41 samtar@deploy2002: samtar and samwilson: Backport for gerrit:981423 testwiki: Enable the Edit Recovery feature (T353041) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:40 samtar@deploy2002: Started scap: Backport for gerrit:981423 testwiki: Enable the Edit Recovery feature (T353041)
10:30 moritzm: installing nghttp2 security updates
10:16 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
10:15 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
10:13 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
10:13 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
10:09 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
10:09 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
10:05 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
10:04 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
10:04 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
10:04 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
09:57 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1128.eqiad.wmnet onto db1228.eqiad.wmnet
09:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 clone from db1128 ', diff saved to https://phabricator.wikimedia.org/P54334 and previous config saved to /var/cache/conftool/dbconfig/20231212-095352-arnaudb.json
09:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: provisionning db1228.eqiad.wmnet - T344036
09:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: provisionning db1228.eqiad.wmnet - T344036
09:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: provisionning db1228.eqiad.wmnet - T344036
09:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: provisionning db1228.eqiad.wmnet - T344036
09:43 moritzm: installing ca-certificates-java updates from Bookworm point release
09:08 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1147.eqiad.wmnet onto db1247.eqiad.wmnet
09:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1147 in db1247 for T344036', diff saved to https://phabricator.wikimedia.org/P54333 and previous config saved to /var/cache/conftool/dbconfig/20231212-090652-arnaudb.json
09:05 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
09:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: provisioning db1247.eqiad.wmnet - T344036
09:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: provisioning db1247.eqiad.wmnet - T344036
09:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: provisioning db1247.eqiad.wmnet - T344036
09:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: provisioning db1247.eqiad.wmnet - T344036
08:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: server BGP in netbox plugin - ayounsi@cumin1001
08:48 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: server BGP in netbox plugin - ayounsi@cumin1001
08:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2135,2160].codfw.wmnet,db[1176,1217].eqiad.wmnet with reason: m5 ipoid maintenance
08:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2135,2160].codfw.wmnet,db[1176,1217].eqiad.wmnet with reason: m5 ipoid maintenance
07:52 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
07:52 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
07:50 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
07:49 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
07:17 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 4800
07:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4800
06:46 marostegui@deploy2002: Finished scap: Backport for gerrit:982226 Revert "ProductionServices.php: Promote pc2014 as master of pc1" (duration: 09m 00s)
06:38 marostegui@deploy2002: marostegui: Continuing with sync
06:38 marostegui@deploy2002: marostegui: Backport for gerrit:982226 Revert "ProductionServices.php: Promote pc2014 as master of pc1" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:37 marostegui@deploy2002: Started scap: Backport for gerrit:982226 Revert "ProductionServices.php: Promote pc2014 as master of pc1"
06:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2011.codfw.wmnet with OS bookworm
06:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2011.codfw.wmnet with reason: host reimage
06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2011.codfw.wmnet with reason: host reimage
06:00 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2011.codfw.wmnet with OS bookworm
05:59 marostegui@deploy2002: Finished scap: Backport for gerrit:982206 ProductionServices.php: Promote pc2014 as master of pc1 (T351787) (duration: 08m 35s)
05:52 marostegui@deploy2002: marostegui: Continuing with sync
05:52 marostegui@deploy2002: marostegui: Backport for gerrit:982206 ProductionServices.php: Promote pc2014 as master of pc1 (T351787) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
05:51 marostegui@deploy2002: Started scap: Backport for gerrit:982206 ProductionServices.php: Promote pc2014 as master of pc1 (T351787)
05:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: pc1 master switch T351787
05:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: pc1 master switch T351787
04:58 mwpresync@deploy2002: Pruned MediaWiki: 1.42.0-wmf.5 (duration: 02m 17s)
04:55 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.9 refs T350085 (duration: 53m 03s)
04:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.9 refs T350085

2023-12-11

22:39 jdrewniak@deploy2002: Finished scap: Backport for [[gerrit:982162|[Vector] Deploy the Zebra CSS refactor under feature flag (T353008)]] (duration: 12m 14s)
22:32 jdrewniak@deploy2002: jdrewniak: Continuing with sync
22:28 jdrewniak@deploy2002: jdrewniak: Backport for [[gerrit:982162|[Vector] Deploy the Zebra CSS refactor under feature flag (T353008)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:26 jdrewniak@deploy2002: Started scap: Backport for [[gerrit:982162|[Vector] Deploy the Zebra CSS refactor under feature flag (T353008)]]
22:23 ladsgroup@deploy2002: Finished scap: Backport for gerrit:981737 api: Add support for pagelinks migration in ApiQueryBacklinks::runSecondQuery (T351237) (duration: 10m 42s)
22:15 ladsgroup@deploy2002: jforrester and ladsgroup: Continuing with sync
22:14 ladsgroup@deploy2002: jforrester and ladsgroup: Backport for gerrit:981737 api: Add support for pagelinks migration in ApiQueryBacklinks::runSecondQuery (T351237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:12 ladsgroup@deploy2002: Started scap: Backport for gerrit:981737 api: Add support for pagelinks migration in ApiQueryBacklinks::runSecondQuery (T351237)
22:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on wdqs1015.eqiad.wmnet with reason: T347355
22:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 18:00:00 on wdqs1015.eqiad.wmnet with reason: T347355
18:34 claime: Raised replicas for mw-web
18:32 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
18:32 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
18:32 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
18:32 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
18:32 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
18:31 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
17:48 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:47 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
17:47 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:45 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:45 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
17:43 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
17:43 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
17:42 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
17:04 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: gerrit:982110 Bumping portals to master (T128546) (duration: 08m 15s)
17:01 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2004.codfw.wmnet with OS bullseye
17:00 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:57 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:56 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: gerrit:982110 Bumping portals to master (T128546) (duration: 10m 12s)
16:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1060.eqiad.wmnet with OS bullseye
16:49 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1061.eqiad.wmnet with OS bullseye
16:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1062.eqiad.wmnet with OS bullseye
16:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1059.eqiad.wmnet with OS bullseye
16:42 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
16:39 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
16:27 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1060.eqiad.wmnet with reason: host reimage
16:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2002.codfw.wmnet with OS bullseye
16:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1061.eqiad.wmnet with reason: host reimage
16:23 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1062.eqiad.wmnet with reason: host reimage
16:22 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
16:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2005.codfw.wmnet with OS bullseye
16:21 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:20 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1059.eqiad.wmnet with reason: host reimage
16:19 otto@deploy2002: Synchronized wmf-config/ext-EventStreamConfig.php: Config: gerrit:968344 Enable canary events for all MediaWiki event streams (T266798) (duration: 08m 25s)
16:18 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1062.eqiad.wmnet with reason: host reimage
16:18 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1061.eqiad.wmnet with reason: host reimage
16:17 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1060.eqiad.wmnet with reason: host reimage
16:16 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1059.eqiad.wmnet with reason: host reimage
16:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:13 vgutierrez: rolling restart of pybal on lvs1020 and lvs1017 effectively enabling IPIP encapsulation on ncredir@eqiad - T351069
16:10 ottomata: enabling canary events for all mediawiki state change event streams - T266798
16:03 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1062.eqiad.wmnet with OS bullseye
16:03 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1061.eqiad.wmnet with OS bullseye
16:02 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1060.eqiad.wmnet with OS bullseye
16:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
16:01 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1059.eqiad.wmnet with OS bullseye
16:01 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
16:00 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
15:59 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
15:58 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
15:57 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
15:57 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
15:56 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
15:55 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
15:55 claime: homer lsw1-*eqiad* commit "Put kubernetes10[59-62] in production - T353135"
15:55 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
15:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2002.codfw.wmnet with reason: host reimage
15:55 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
15:55 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
15:54 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
15:53 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
15:53 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
15:53 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
15:51 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2002.codfw.wmnet with reason: host reimage
15:49 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:48 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:41 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
15:39 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2006.codfw.wmnet with OS bullseye
15:39 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1143.eqiad.wmnet
15:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1143.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
15:32 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1143.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
15:30 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
15:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
15:25 brouberol: provisioning TLS certificates for the spark-history and spark-history-test namespaces in dse-k8s-eqiad - T352639
15:25 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1143.eqiad.wmnet
15:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:23 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1142.eqiad.wmnet
15:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1142.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:20 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1142.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:18 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
15:12 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1142.eqiad.wmnet
15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
15:04 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
15:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1141.eqiad.wmnet
15:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1141.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:01 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1141.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
14:57 milimetric@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
14:57 milimetric@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
14:57 milimetric@deploy2002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
14:56 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
14:56 milimetric@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
14:53 milimetric@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
14:53 milimetric@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
14:53 arnaudb@cumin1001: dbctl commit (dc=all): 'decommission db1141 42 and 43', diff saved to https://phabricator.wikimedia.org/P54330 and previous config saved to /var/cache/conftool/dbconfig/20231211-145300-arnaudb.json
14:52 milimetric@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
14:52 milimetric@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
14:51 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1141.eqiad.wmnet
14:51 milimetric@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
14:51 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
14:50 milimetric@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
14:50 otto@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
14:49 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
14:49 milimetric@deploy2002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
14:48 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
14:48 milimetric@deploy2002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
14:47 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
14:46 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
14:45 otto@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
14:45 ottomata: deploying changeprop to pick up https://phabricator.wikimedia.org/T351247
14:37 TheresNoTime: close UTC afternoon backport window
14:25 samtar@deploy2002: Finished scap: Backport for gerrit:981726 hewikivoyage: update vector 2022 wordmark and tagline (T351981) (duration: 10m 35s)
14:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1138.eqiad.wmnet - arnaudb@cumin1001"
14:17 samtar@deploy2002: samtar and anzx: Continuing with sync
14:16 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1138.eqiad.wmnet - arnaudb@cumin1001"
14:15 samtar@deploy2002: samtar and anzx: Backport for gerrit:981726 hewikivoyage: update vector 2022 wordmark and tagline (T351981) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:14 samtar@deploy2002: Started scap: Backport for gerrit:981726 hewikivoyage: update vector 2022 wordmark and tagline (T351981)
14:11 samtar@deploy2002: Finished scap: Backport for gerrit:979986 Enable read new on group0 wikis (T341829) (duration: 07m 57s)
14:05 samtar@deploy2002: samtar and dreamyjazz: Continuing with sync
14:05 samtar@deploy2002: samtar and dreamyjazz: Backport for gerrit:979986 Enable read new on group0 wikis (T341829) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:03 samtar@deploy2002: Started scap: Backport for gerrit:979986 Enable read new on group0 wikis (T341829)
13:59 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
13:58 arnaudb@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
13:56 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
13:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
13:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
13:27 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts db1138.eqiad.wmnet
13:26 arnaudb@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
13:25 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
13:22 arnaudb@cumin1001: dbctl commit (dc=all): 'decommission db1138', diff saved to https://phabricator.wikimedia.org/P54328 and previous config saved to /var/cache/conftool/dbconfig/20231211-132250-arnaudb.json
13:20 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1138.eqiad.wmnet
13:17 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: decommission pre downtime
13:17 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: decommission pre downtime
13:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
13:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
13:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
13:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
12:57 claime: Rebuilding production-images for python3-build-bookworm - T352733
12:12 urbanecm@deploy2002: Finished scap: Backport for gerrit:981734 Revert "Growth: Enable Welcome survey user research for ar/en/es" (T351266) (duration: 08m 20s)
12:11 brouberol: Adding spark-history(-test).svc.eqiad.wmnet CNAMEs pointing to k8s-ingress-dse.svc.eqiad.wmnet. - T352639
12:05 urbanecm@deploy2002: urbanecm: Continuing with sync
12:05 urbanecm@deploy2002: urbanecm: Backport for gerrit:981734 Revert "Growth: Enable Welcome survey user research for ar/en/es" (T351266) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:03 urbanecm@deploy2002: Started scap: Backport for gerrit:981734 Revert "Growth: Enable Welcome survey user research for ar/en/es" (T351266)
11:20 vgutierrez: rolling restart of pybal on lvs3010 and lvs3008 effectively enabling IPIP encapsulation on ncredir@esams - T351069
11:18 claime: sudo confctl --object-type discovery select 'name=eqiad,dnsdisc=k8s-ingress-dse' set/pooled=true - T352639
11:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
11:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
11:12 brouberol: Add discovery records for the k8s-ingress-dse LVS service - T352639
10:55 dcausse: (properly) restarting blazegraph on wdqs1006 (BlazegraphFreeAllocatorsDecreasingRapidly)
10:54 cgoubert@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs[1019-1020].eqiad.wmnet} and A:lvs (T352639)
10:50 cgoubert@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[1019-1020].eqiad.wmnet} and A:lvs (T352639)
10:46 claime: Running puppet on O:lvs::balancer - T352639
10:45 claime: Disabling puppet on O:lvs::balancer - T352639
10:42 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
10:42 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
10:42 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
10:38 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
10:38 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
10:38 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
10:37 claime: Repooling dse-k8s-worker nodes - sudo confctl select "service=kubesvc,cluster=dse-k8s" set/pooled=yes - T352639
10:03 jayme: removed cergen certs of all k8s services from private puppet in commit d36a97a - T300033
09:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38753
09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38753
09:55 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
09:55 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1547
09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1547
09:50 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
09:50 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
09:44 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
09:44 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
08:43 kostajh: UTC morning deploys done
08:43 kharlan@deploy2002: Finished scap: Backport for gerrit:976252 ClusterConfig: Rename `isTest()` to `isDebug()` for consistency (T347366), gerrit:981424 IPInfo: Add comment clarifying $wgIPInfoGeoIP2EnterprisePath (T304604) (duration: 22m 02s)
08:40 dcausse: restarted blazegraph on wdqs1006 (BlazegraphFreeAllocatorsDecreasingRapidly)
08:36 kharlan@deploy2002: kharlan and d3r1ck01: Continuing with sync
08:22 kharlan@deploy2002: kharlan and d3r1ck01: Backport for gerrit:976252 ClusterConfig: Rename `isTest()` to `isDebug()` for consistency (T347366), gerrit:981424 IPInfo: Add comment clarifying $wgIPInfoGeoIP2EnterprisePath (T304604) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:21 kharlan@deploy2002: Started scap: Backport for gerrit:976252 ClusterConfig: Rename `isTest()` to `isDebug()` for consistency (T347366), gerrit:981424 IPInfo: Add comment clarifying $wgIPInfoGeoIP2EnterprisePath (T304604)
08:16 kharlan@deploy2002: Finished scap: Backport for gerrit:979969 MediaModeration: Set MediaModerationDeveloperMode to false (duration: 09m 55s)
08:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: reboot for upgrade
08:15 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: reboot for upgrade
08:09 kharlan@deploy2002: kharlan: Continuing with sync
08:07 kharlan@deploy2002: kharlan: Backport for gerrit:979969 MediaModeration: Set MediaModerationDeveloperMode to false synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:06 kharlan@deploy2002: Started scap: Backport for gerrit:979969 MediaModeration: Set MediaModerationDeveloperMode to false
07:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: reboot for upgrade
07:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: reboot for upgrade
07:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade
07:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade
07:24 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade
07:24 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade
07:12 marostegui: Failover m3-master from dbproxy1020 to dbproxy1026 T351864
07:12 marostegui: Failover m3-master from dbproxy1020 to dbproxy1026 org
06:44 marostegui@deploy2002: Finished scap: Backport for gerrit:981729 Revert "ProductionServices.php: Promote pc1014 to pc1" (duration: 08m 22s)
06:37 marostegui@deploy2002: marostegui: Continuing with sync
06:37 marostegui@deploy2002: marostegui: Backport for gerrit:981729 Revert "ProductionServices.php: Promote pc1014 to pc1" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:35 marostegui@deploy2002: Started scap: Backport for gerrit:981729 Revert "ProductionServices.php: Promote pc1014 to pc1"
06:35 _joe_: update sirenbot to 0.3.7
06:34 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1011.eqiad.wmnet with OS bookworm
06:29 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
06:26 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
06:19 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
06:16 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
06:13 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
06:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: host reimage
06:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: host reimage
06:07 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
06:07 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
05:55 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1011.eqiad.wmnet with OS bookworm
05:54 marostegui@deploy2002: Finished scap: Backport for gerrit:981710 ProductionServices.php: Promote pc1014 to pc1 (T351787) (duration: 16m 54s)
05:47 marostegui@deploy2002: marostegui: Continuing with sync
05:46 marostegui@deploy2002: marostegui: Backport for gerrit:981710 ProductionServices.php: Promote pc1014 to pc1 (T351787) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
05:37 marostegui@deploy2002: Started scap: Backport for gerrit:981710 ProductionServices.php: Promote pc1014 to pc1 (T351787)
05:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: pc1 master switch T351787
05:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: pc1 master switch T351787

2023-12-09

15:53 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
15:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2006.codfw.wmnet with OS bullseye
15:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2005.codfw.wmnet with OS bullseye
01:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
00:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
00:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
00:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
00:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
00:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
00:47 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
00:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
00:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
00:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2005.codfw.wmnet with OS bullseye
00:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
00:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
00:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
00:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2006.codfw.wmnet with OS bullseye
00:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
00:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
00:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
00:01 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:01 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply

2023-12-08

23:49 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
23:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
23:48 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
23:48 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
23:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
23:47 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2003.codfw.wmnet with OS bullseye
23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2003.codfw.wmnet with reason: host reimage
23:04 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:03 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:02 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2003.codfw.wmnet with reason: host reimage
22:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2003.codfw.wmnet with OS bullseye
22:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2006.codfw.wmnet with OS bullseye
22:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
22:26 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:26 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
21:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2001.codfw.wmnet with OS bullseye
21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2001.codfw.wmnet with reason: host reimage
21:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2001.codfw.wmnet with reason: host reimage
21:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bullseye
20:02 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:02 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
18:27 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:27 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
18:26 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:26 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
17:09 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:08 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:49 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:49 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1015.eqiad.wmnet with reason: T347355
16:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1015.eqiad.wmnet with reason: T347355
16:08 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:50 milimetric@deploy2002: Finished deploy [airflow-dags/platform_eng@049cf03]: (no justification provided) (duration: 00m 52s)
15:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
15:49 milimetric@deploy2002: Started deploy [airflow-dags/platform_eng@049cf03]: (no justification provided)
15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
15:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
15:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
15:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
15:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
15:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
15:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
15:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
15:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
15:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
15:09 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
14:44 XioNoX: drain eqiad-codfw lumen transport for maintenance - T342502
14:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/echoserver: apply
14:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/echoserver: apply
14:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
12:55 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
12:55 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
12:42 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
12:42 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
11:40 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
11:40 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
10:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54322 and previous config saved to /var/cache/conftool/dbconfig/20231208-101337-arnaudb.json
09:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P54321 and previous config saved to /var/cache/conftool/dbconfig/20231208-095830-arnaudb.json
09:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P54320 and previous config saved to /var/cache/conftool/dbconfig/20231208-094324-arnaudb.json
09:41 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:41 brouberol: Creating the echoserver namespace in dse-k8s-eqiad - T353004
09:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54319 and previous config saved to /var/cache/conftool/dbconfig/20231208-092817-arnaudb.json
09:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54318 and previous config saved to /var/cache/conftool/dbconfig/20231208-091628-arnaudb.json
09:16 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
09:16 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
07:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 237
07:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 237
06:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
06:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T343198)', diff saved to https://phabricator.wikimedia.org/P54317 and previous config saved to /var/cache/conftool/dbconfig/20231208-062636-ladsgroup.json
06:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P54316 and previous config saved to /var/cache/conftool/dbconfig/20231208-061130-ladsgroup.json
05:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P54315 and previous config saved to /var/cache/conftool/dbconfig/20231208-055623-ladsgroup.json
05:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T343198)', diff saved to https://phabricator.wikimedia.org/P54314 and previous config saved to /var/cache/conftool/dbconfig/20231208-054116-ladsgroup.json
05:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1231 (T343198)', diff saved to https://phabricator.wikimedia.org/P54313 and previous config saved to /var/cache/conftool/dbconfig/20231208-050624-ladsgroup.json
05:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
05:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
04:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
04:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P54312 and previous config saved to /var/cache/conftool/dbconfig/20231208-041826-ladsgroup.json
04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P54311 and previous config saved to /var/cache/conftool/dbconfig/20231208-040319-ladsgroup.json
03:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P54310 and previous config saved to /var/cache/conftool/dbconfig/20231208-034813-ladsgroup.json
03:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P54309 and previous config saved to /var/cache/conftool/dbconfig/20231208-033306-ladsgroup.json
03:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P54308 and previous config saved to /var/cache/conftool/dbconfig/20231208-030005-ladsgroup.json
03:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
02:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P54307 and previous config saved to /var/cache/conftool/dbconfig/20231208-025942-ladsgroup.json
02:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P54306 and previous config saved to /var/cache/conftool/dbconfig/20231208-024435-ladsgroup.json
02:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P54305 and previous config saved to /var/cache/conftool/dbconfig/20231208-022929-ladsgroup.json
02:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
02:19 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
02:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
02:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
02:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sessionstore2004']
02:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2004']
02:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
02:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P54304 and previous config saved to /var/cache/conftool/dbconfig/20231208-021422-ladsgroup.json
02:12 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
01:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P54303 and previous config saved to /var/cache/conftool/dbconfig/20231208-012115-ladsgroup.json
01:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
01:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
01:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P54302 and previous config saved to /var/cache/conftool/dbconfig/20231208-012051-ladsgroup.json
01:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P54301 and previous config saved to /var/cache/conftool/dbconfig/20231208-010545-ladsgroup.json
00:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P54300 and previous config saved to /var/cache/conftool/dbconfig/20231208-005038-ladsgroup.json
00:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1036.eqiad.wmnet with OS bullseye
00:44 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:43 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1037.eqiad.wmnet with OS bullseye
00:43 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:42 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1035.eqiad.wmnet with OS bullseye
00:38 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:37 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:36 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1038.eqiad.wmnet with OS bullseye
00:36 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P54299 and previous config saved to /var/cache/conftool/dbconfig/20231208-003532-ladsgroup.json
00:35 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti1036.eqiad.wmnet with reason: host reimage
00:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1037.eqiad.wmnet with reason: host reimage
00:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1035.eqiad.wmnet with reason: host reimage
00:19 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1038.eqiad.wmnet with reason: host reimage
00:16 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1037.eqiad.wmnet with reason: host reimage
00:16 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1036.eqiad.wmnet with reason: host reimage
00:15 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1035.eqiad.wmnet with reason: host reimage
00:15 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1038.eqiad.wmnet with reason: host reimage
00:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1038.eqiad.wmnet with OS bullseye
00:00 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1037.eqiad.wmnet with OS bullseye
00:00 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1036.eqiad.wmnet with OS bullseye
00:00 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1035.eqiad.wmnet with OS bullseye

2023-12-07

23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P54298 and previous config saved to /var/cache/conftool/dbconfig/20231207-235333-ladsgroup.json
23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P54297 and previous config saved to /var/cache/conftool/dbconfig/20231207-235310-ladsgroup.json
23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1061.eqiad.wmnet with OS bullseye
23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1062.eqiad.wmnet with OS bullseye
23:52 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1059.eqiad.wmnet with OS bullseye
23:52 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1060.eqiad.wmnet with OS bullseye
23:52 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
23:52 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
23:47 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:47 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P54296 and previous config saved to /var/cache/conftool/dbconfig/20231207-233802-ladsgroup.json
23:23 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:23 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:23 ryankemper@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P54295 and previous config saved to /var/cache/conftool/dbconfig/20231207-232256-ladsgroup.json
23:21 ryankemper@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
23:21 ryankemper@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
23:21 ryankemper@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
23:17 ryankemper@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:15 ryankemper@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P54294 and previous config saved to /var/cache/conftool/dbconfig/20231207-230749-ladsgroup.json
23:05 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
22:58 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
22:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp4037.ulsfo.wmnet
22:53 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
22:48 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
22:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1061.eqiad.wmnet with reason: host reimage
22:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1060.eqiad.wmnet with reason: host reimage
22:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1062.eqiad.wmnet with reason: host reimage
22:33 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1059.eqiad.wmnet with reason: host reimage
22:31 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1061.eqiad.wmnet with reason: host reimage
22:30 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1062.eqiad.wmnet with reason: host reimage
22:30 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1060.eqiad.wmnet with reason: host reimage
22:29 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1059.eqiad.wmnet with reason: host reimage
22:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P54293 and previous config saved to /var/cache/conftool/dbconfig/20231207-222656-ladsgroup.json
22:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
22:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
22:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P54292 and previous config saved to /var/cache/conftool/dbconfig/20231207-222633-ladsgroup.json
22:22 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:22 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:20 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:20 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:19 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:19 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1062.eqiad.wmnet with OS bullseye
22:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1061.eqiad.wmnet with OS bullseye
22:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1060.eqiad.wmnet with OS bullseye
22:15 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1059.eqiad.wmnet with OS bullseye
22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1061.mgmt.eqiad.wmnet with reboot policy FORCED
22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1060.mgmt.eqiad.wmnet with reboot policy FORCED
22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1062.mgmt.eqiad.wmnet with reboot policy FORCED
22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1059.mgmt.eqiad.wmnet with reboot policy FORCED
22:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P54291 and previous config saved to /var/cache/conftool/dbconfig/20231207-221127-ladsgroup.json
22:10 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:10 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
21:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1062.mgmt.eqiad.wmnet with reboot policy FORCED
21:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1061.mgmt.eqiad.wmnet with reboot policy FORCED
21:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1060.mgmt.eqiad.wmnet with reboot policy FORCED
21:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1059.mgmt.eqiad.wmnet with reboot policy FORCED
21:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P54290 and previous config saved to /var/cache/conftool/dbconfig/20231207-215620-ladsgroup.json
21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P54289 and previous config saved to /var/cache/conftool/dbconfig/20231207-214114-ladsgroup.json
21:38 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@049cf03]: (no justification provided) (duration: 00m 28s)
21:37 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@049cf03]: (no justification provided)
21:31 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1082.eqiad.wmnet with OS bullseye
21:31 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
21:23 jdrewniak@deploy2002: Finished scap: Backport for gerrit:980951 Enable Vector beta feature for all wikis (T351339), gerrit:981337 [beta] ores-extension: enable revertrisk model for enwiki (T348298), gerrit:976911 Enable action blocks in Serbian Wikipedia (T351873) (duration: 09m 54s)
21:17 jdrewniak@deploy2002: zoranzoki21 and isaranto and jdlrobson and jdrewniak: Continuing with sync
21:15 jdrewniak@deploy2002: zoranzoki21 and isaranto and jdlrobson and jdrewniak: Backport for gerrit:980951 Enable Vector beta feature for all wikis (T351339), gerrit:981337 [beta] ores-extension: enable revertrisk model for enwiki (T348298), gerrit:976911 Enable action blocks in Serbian Wikipedia (T351873) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:13 jdrewniak@deploy2002: Started scap: Backport for gerrit:980951 Enable Vector beta feature for all wikis (T351339), gerrit:981337 [beta] ores-extension: enable revertrisk model for enwiki (T348298), gerrit:976911 Enable action blocks in Serbian Wikipedia (T351873)
21:06 otto@deploy2002: Synchronized wmf-config/ext-EventStreamConfig.php: Config: gerrit:977075 Remove eventlogging_FeaturePolicyViolation and _SpecialMuteSubmit EventStreamConfig (T329718) (duration: 09m 16s)
21:02 dcausse: restarting blazegraph on wdqs2017 (BlazegraphFreeAllocatorsDecreasingRapidly)
20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P54288 and previous config saved to /var/cache/conftool/dbconfig/20231207-205817-ladsgroup.json
20:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
20:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P54287 and previous config saved to /var/cache/conftool/dbconfig/20231207-205753-ladsgroup.json
20:56 otto@deploy2002: Synchronized wmf-config/ext-EventLogging.php: Config: gerrit:977075 Remove eventlogging_FeaturePolicyViolation and _SpecialMuteSubmit EventLogging config (T329718) (duration: 07m 07s)
20:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P54286 and previous config saved to /var/cache/conftool/dbconfig/20231207-204247-ladsgroup.json
20:30 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp4037.ulsfo.wmnet
20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P54285 and previous config saved to /var/cache/conftool/dbconfig/20231207-202740-ladsgroup.json
20:27 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
20:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P54283 and previous config saved to /var/cache/conftool/dbconfig/20231207-201234-ladsgroup.json
20:06 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1082.eqiad.wmnet with reason: host reimage
20:05 urandom: bootstrap Cassandra/restbase2030-a — T352468
20:02 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1082.eqiad.wmnet with reason: host reimage
20:01 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:01 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:59 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:59 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:49 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1082.eqiad.wmnet with OS bullseye
19:38 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:38 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:35 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: graph split experiments T350106
19:35 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: graph split experiments T350106
19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P54282 and previous config saved to /var/cache/conftool/dbconfig/20231207-192949-ladsgroup.json
19:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
19:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P54281 and previous config saved to /var/cache/conftool/dbconfig/20231207-192926-ladsgroup.json
19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P54280 and previous config saved to /var/cache/conftool/dbconfig/20231207-191420-ladsgroup.json
18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P54279 and previous config saved to /var/cache/conftool/dbconfig/20231207-185913-ladsgroup.json
18:45 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:45 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P54278 and previous config saved to /var/cache/conftool/dbconfig/20231207-184406-ladsgroup.json
18:42 mutante: puppetmaster1001 - revoke cert for miscweb.discovery.wmnet
18:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
18:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
18:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
18:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P54277 and previous config saved to /var/cache/conftool/dbconfig/20231207-180427-ladsgroup.json
18:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
18:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
18:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
18:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
17:58 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs1024.eqiad.wmnet
17:57 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs1024.eqiad.wmnet
17:40 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
17:40 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
17:39 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
17:38 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
17:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
17:09 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:09 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cleanup logstash/kibana records T299700 - herron@cumin1001"
17:08 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cleanup logstash/kibana records T299700 - herron@cumin1001"
17:05 herron@cumin1001: START - Cookbook sre.dns.netbox
16:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
16:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
16:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
16:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
16:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
16:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
16:38 brouberol@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
16:27 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
16:27 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
16:26 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
16:26 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
16:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
16:24 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
16:24 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
16:23 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
16:09 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
16:09 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
16:02 sukhe: run dummy authdns-update on dns6001
16:00 milimetric@deploy2002: Finished deploy [analytics/refinery@8b8f178] (thin): hotfix: sqoop (duration: 00m 07s)
16:00 milimetric@deploy2002: Started deploy [analytics/refinery@8b8f178] (thin): hotfix: sqoop
15:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54274 and previous config saved to /var/cache/conftool/dbconfig/20231207-155712-arnaudb.json
15:55 milimetric@deploy2002: Finished deploy [analytics/refinery@8b8f178]: hotfix: sqoop (duration: 10m 08s)
15:53 sukhe: running authdns-update with broken resolv.conf on dns6001
15:48 sukhe: clear out dns6001 resolv.conf to check for SSH config-based authdns-update
15:45 milimetric@deploy2002: Started deploy [analytics/refinery@8b8f178]: hotfix: sqoop
15:45 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
15:44 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
15:44 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
15:44 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
15:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P54273 and previous config saved to /var/cache/conftool/dbconfig/20231207-154205-arnaudb.json
15:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2006.codfw.wmnet with OS bullseye
15:36 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2005.codfw.wmnet with OS bullseye
15:29 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
15:28 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
15:28 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
15:27 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
15:27 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
15:27 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
15:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P54272 and previous config saved to /var/cache/conftool/dbconfig/20231207-152659-arnaudb.json
15:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
15:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cp4037.ulsfo.wmnet
15:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54271 and previous config saved to /var/cache/conftool/dbconfig/20231207-151152-arnaudb.json
15:08 klausman@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
15:08 klausman@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
15:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54270 and previous config saved to /var/cache/conftool/dbconfig/20231207-150750-arnaudb.json
15:07 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
15:07 klausman@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
15:07 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
15:07 klausman@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
15:06 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
15:06 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
15:04 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
15:03 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
15:02 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
15:01 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
15:01 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
15:00 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
14:53 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
14:53 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
14:53 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
14:53 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
14:53 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
14:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
14:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
14:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
14:48 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
14:48 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
14:41 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp4037.ulsfo.wmnet
14:32 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
14:31 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
14:30 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
14:29 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
14:26 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
14:26 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
14:26 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
13:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
13:52 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
13:49 ladsgroup@deploy2002: Finished scap: Backport for gerrit:980483 api: Only force backlink namespace index when there is one ns only (T351237) (duration: 10m 55s)
13:42 ladsgroup@deploy2002: jforrester and ladsgroup: Continuing with sync
13:40 ladsgroup@deploy2002: jforrester and ladsgroup: Backport for gerrit:980483 api: Only force backlink namespace index when there is one ns only (T351237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:38 ladsgroup@deploy2002: Started scap: Backport for gerrit:980483 api: Only force backlink namespace index when there is one ns only (T351237)
13:34 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:34 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:34 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:33 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:32 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:32 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:31 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:31 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:27 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
13:27 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
13:25 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
13:25 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:25 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: sync
13:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
13:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
13:19 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
13:18 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
13:10 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
13:09 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
13:09 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
13:09 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
13:09 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
13:08 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
13:07 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
13:07 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
12:52 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
12:52 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
12:48 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:48 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:47 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:47 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1001.eqiad.wmnet
12:18 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
12:18 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
12:17 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
12:17 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
12:17 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
12:16 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
12:13 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1001.eqiad.wmnet
11:51 btullis@deploy2002: Finished deploy [analytics/refinery@b6499b1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b6499b17] (duration: 03m 17s)
11:48 btullis@deploy2002: Started deploy [analytics/refinery@b6499b1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b6499b17]
11:33 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
11:33 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
11:30 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
11:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
11:30 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
11:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
11:17 klausman@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
11:17 klausman@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
11:14 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
11:14 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
11:13 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
11:13 klausman@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
11:12 klausman@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
11:10 aikochou@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
11:10 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
11:01 brouberol@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
10:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
10:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
10:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: cluster::management
10:53 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
10:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
10:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
10:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
10:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
10:45 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: cluster::management
10:38 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
10:38 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
10:35 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
10:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
10:34 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
10:34 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
10:33 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
10:33 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
10:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
10:32 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
10:27 brouberol@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
10:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
10:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
10:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
10:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
09:42 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
09:42 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
09:41 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
09:40 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
09:40 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
09:39 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
08:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 31 days, 0:00:00 on sretest1001.eqiad.wmnet with reason: WIP nftables
08:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 31 days, 0:00:00 on sretest1001.eqiad.wmnet with reason: WIP nftables
08:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4005.wikimedia.org
08:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4005.wikimedia.org
08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1119.eqiad.wmnet
06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1119.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
06:52 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1119.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
06:50 marostegui@cumin1001: START - Cookbook sre.dns.netbox
06:44 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1119.eqiad.wmnet
06:35 marostegui: Failover m5-master from dbproxy1021 to dbproxy1027 T351864
00:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1082.eqiad.wmnet with OS bullseye
00:53 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1081.eqiad.wmnet with OS bullseye
00:53 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1080.eqiad.wmnet with OS bullseye
00:53 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:53 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"

2023-12-06

23:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1082.eqiad.wmnet with OS bullseye
23:47 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
23:42 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
23:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1081.eqiad.wmnet with reason: host reimage
23:23 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1080.eqiad.wmnet with reason: host reimage
23:20 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1081.eqiad.wmnet with reason: host reimage
23:19 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1080.eqiad.wmnet with reason: host reimage
23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 34 days, 0:00:00 on wdqs1024.eqiad.wmnet with reason: T352878
23:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 34 days, 0:00:00 on wdqs1024.eqiad.wmnet with reason: T352878
22:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1081.eqiad.wmnet with OS bullseye
22:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1082.eqiad.wmnet with OS bullseye
22:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1080.eqiad.wmnet with OS bullseye
22:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1080']
22:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1082']
22:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1081']
22:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1080']
22:43 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1081']
22:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1081']
22:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1080']
22:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1082']
22:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1081']
22:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1080']
22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1082.mgmt.eqiad.wmnet with reboot policy FORCED
22:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1080.mgmt.eqiad.wmnet with reboot policy FORCED
22:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1081.mgmt.eqiad.wmnet with reboot policy FORCED
21:56 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1082.mgmt.eqiad.wmnet with reboot policy FORCED
21:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1080.mgmt.eqiad.wmnet with reboot policy FORCED
21:51 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1081.mgmt.eqiad.wmnet with reboot policy FORCED
21:51 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:50 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
21:50 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
21:47 jclark@cumin1001: START - Cookbook sre.dns.netbox
21:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1082.mgmt.eqiad.wmnet with reboot policy FORCED
21:43 urbanecm@deploy2002: Finished scap: Backport for gerrit:980477 Correct links to beta feature (T352826), gerrit:980517 Beta Features: Allow Vector 2022 typography feature (T351339) (duration: 10m 51s)
21:36 urbanecm@deploy2002: urbanecm and jdlrobson: Continuing with sync
21:35 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1082.mgmt.eqiad.wmnet with reboot policy FORCED
21:35 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:35 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
21:34 urbanecm@deploy2002: urbanecm and jdlrobson: Backport for gerrit:980477 Correct links to beta feature (T352826), gerrit:980517 Beta Features: Allow Vector 2022 typography feature (T351339) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:34 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
21:33 urbanecm@deploy2002: Started scap: Backport for gerrit:980477 Correct links to beta feature (T352826), gerrit:980517 Beta Features: Allow Vector 2022 typography feature (T351339)
21:32 jclark@cumin1001: START - Cookbook sre.dns.netbox
21:31 urbanecm@deploy2002: Finished scap: Backport for gerrit:980920 DiscussionTools: Rename config (duration: 10m 01s)
21:25 urbanecm@deploy2002: esanders and urbanecm: Continuing with sync
21:22 urbanecm@deploy2002: esanders and urbanecm: Backport for gerrit:980920 DiscussionTools: Rename config synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:21 urbanecm@deploy2002: Started scap: Backport for gerrit:980920 DiscussionTools: Rename config
21:20 urbanecm@deploy2002: Finished scap: Backport for gerrit:978531 Enable DT visual enhancements on pages with __NEWSECTIONLINK__ (T352232) (duration: 10m 43s)
21:13 urbanecm@deploy2002: urbanecm and esanders: Continuing with sync
21:11 urbanecm@deploy2002: urbanecm and esanders: Backport for gerrit:978531 Enable DT visual enhancements on pages with __NEWSECTIONLINK__ (T352232) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:09 urbanecm@deploy2002: Started scap: Backport for gerrit:978531 Enable DT visual enhancements on pages with __NEWSECTIONLINK__ (T352232)
20:55 ejegg: fundraising civicrm upgraded from 6ca683b2 to 8c107215
19:07 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host wdqs1024.eqiad.wmnet
18:55 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1024.eqiad.wmnet
18:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1024.eqiad.wmnet with reason: T352878
18:53 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1024.eqiad.wmnet with reason: T352878
18:18 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2001.codfw.wmnet with OS bullseye
18:02 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
17:47 ejegg: standalone SmashPig upgraded from 83d509ed to fc74ccca
17:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
17:34 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
17:17 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp4037.ulsfo.wmnet
17:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4037.ulsfo.wmnet with OS bullseye
17:06 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp4037.ulsfo.wmnet
17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cp4037.ulsfo.wmnet
17:05 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp4037.ulsfo.wmnet
16:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bullseye
16:51 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
16:41 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
16:40 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
16:40 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
16:40 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
16:29 urandom: bootstrapping Cassandra/restbase2020-a — T352468
16:07 milimetric@deploy2002: Finished deploy [airflow-dags/platform_eng@db1cb48]: in order to run the querypage job (duration: 01m 28s)
16:05 milimetric@deploy2002: Started deploy [airflow-dags/platform_eng@db1cb48]: in order to run the querypage job
15:56 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: sync
15:56 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: sync
15:52 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:51 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
15:48 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
15:48 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
15:47 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
15:46 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
15:46 urandom: restarting Cassandra on aqs2001-{a,b,c} (testing puppet 7 migration)
15:41 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: sessionstore
15:39 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
15:39 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
15:38 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
15:38 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
15:38 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
15:37 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
15:35 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:34 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: sessionstore
15:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2001.codfw.wmnet with OS bullseye
15:32 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bullseye
15:30 jforrester@deploy2002: Finished scap: Backport for gerrit:980512 Beta Features: Move ULS Compact Links to only the wikis it's enabled on, gerrit:980883 Beta Features: Drop Popups, deployed everywhere for ages (duration: 11m 33s)
15:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd2003.mgmt.codfw.wmnet with reboot policy FORCED
15:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd2001.mgmt.codfw.wmnet with reboot policy FORCED
15:28 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
15:28 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2003.mgmt.codfw.wmnet with reboot policy FORCED
15:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
15:28 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
15:28 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2001.mgmt.codfw.wmnet with reboot policy FORCED
15:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cephosd2003.mgmt.codfw.wmnet with reboot policy FORCED
15:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
15:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cephosd2001.mgmt.codfw.wmnet with reboot policy FORCED
15:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: restbase::production
15:23 sukhe: depool cp4037 for reimage testing: T350179
15:23 jforrester@deploy2002: jforrester: Continuing with sync
15:21 jforrester@deploy2002: jforrester: Backport for gerrit:980512 Beta Features: Move ULS Compact Links to only the wikis it's enabled on, gerrit:980883 Beta Features: Drop Popups, deployed everywhere for ages synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2003.mgmt.codfw.wmnet with reboot policy FORCED
15:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
15:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2001.mgmt.codfw.wmnet with reboot policy FORCED
15:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['testhost2001']
15:19 jforrester@deploy2002: Started scap: Backport for gerrit:980512 Beta Features: Move ULS Compact Links to only the wikis it's enabled on, gerrit:980883 Beta Features: Drop Popups, deployed everywhere for ages
15:14 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['testhost2001']
15:04 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: restbase::production
15:02 moritzm: installing mariadb bugfix updates from Bookworm point release (as packaged in Debian, unrelated to wmf-mariadb packages)
14:43 moritzm: installing debian-archive-keyring updates from Bookworm point release
14:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dnsbox
14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:32 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:23 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dnsbox
14:21 fabfur: repooling cp4052 after reimage (bookworm -> bullseye) due to possibly impacting T352744
13:48 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
13:48 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
13:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4052.ulsfo.wmnet
13:45 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
13:45 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
13:37 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1001.eqiad.wmnet with OS bookworm
13:37 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1001"
13:20 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4052.ulsfo.wmnet
13:12 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@bfd944e]: Add metrics configuration TEST [airflow-dags@bfd944e4] (duration: 00m 11s)
13:12 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@bfd944e]: Add metrics configuration TEST [airflow-dags@bfd944e4]
13:08 moritzm: installing systemd bugfix updates from Bookworm point release
12:52 mvernon@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1001"
12:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4041.ulsfo.wmnet
12:41 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4041.ulsfo.wmnet
12:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
12:33 moritzm: installing pam bugfix updates from Bookworm point release
12:30 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
12:15 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1001.eqiad.wmnet with OS bookworm
11:48 hnowlan: rollback changeprop-jobqueue
11:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: druid::analytics::worker
11:43 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
11:42 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
11:41 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
11:40 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
11:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: druid::analytics::worker
11:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4044.ulsfo.wmnet
11:16 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4044.ulsfo.wmnet
10:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4050.ulsfo.wmnet
10:38 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4050.ulsfo.wmnet
10:26 moritzm: installing gtk+3.0 bug fix updates from Bookworm point release
08:49 godog: test rsyslog version from bullseye-backports on centrallog - T351710
08:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54264 and previous config saved to /var/cache/conftool/dbconfig/20231206-084928-arnaudb.json
08:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P54263 and previous config saved to /var/cache/conftool/dbconfig/20231206-083422-arnaudb.json
08:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P54262 and previous config saved to /var/cache/conftool/dbconfig/20231206-081915-arnaudb.json
08:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4047.ulsfo.wmnet
08:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54261 and previous config saved to /var/cache/conftool/dbconfig/20231206-080409-arnaudb.json
07:54 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4047.ulsfo.wmnet
07:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54260 and previous config saved to /var/cache/conftool/dbconfig/20231206-075333-arnaudb.json
07:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
07:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
07:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T348183)', diff saved to https://phabricator.wikimedia.org/P54259 and previous config saved to /var/cache/conftool/dbconfig/20231206-075309-arnaudb.json
07:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P54258 and previous config saved to /var/cache/conftool/dbconfig/20231206-073803-arnaudb.json
07:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P54257 and previous config saved to /var/cache/conftool/dbconfig/20231206-072256-arnaudb.json
07:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T348183)', diff saved to https://phabricator.wikimedia.org/P54256 and previous config saved to /var/cache/conftool/dbconfig/20231206-070749-arnaudb.json
06:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T348183)', diff saved to https://phabricator.wikimedia.org/P54255 and previous config saved to /var/cache/conftool/dbconfig/20231206-062922-arnaudb.json
06:29 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
06:29 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
06:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T348183)', diff saved to https://phabricator.wikimedia.org/P54254 and previous config saved to /var/cache/conftool/dbconfig/20231206-062859-arnaudb.json
06:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P54252 and previous config saved to /var/cache/conftool/dbconfig/20231206-061352-arnaudb.json
05:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P54251 and previous config saved to /var/cache/conftool/dbconfig/20231206-055846-arnaudb.json
05:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T348183)', diff saved to https://phabricator.wikimedia.org/P54250 and previous config saved to /var/cache/conftool/dbconfig/20231206-054339-arnaudb.json
05:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T348183)', diff saved to https://phabricator.wikimedia.org/P54249 and previous config saved to /var/cache/conftool/dbconfig/20231206-053321-arnaudb.json
05:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
05:33 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
05:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T348183)', diff saved to https://phabricator.wikimedia.org/P54248 and previous config saved to /var/cache/conftool/dbconfig/20231206-053256-arnaudb.json
05:19 denisse@deploy2002: Finished deploy [librenms/librenms@f049593]: Upgrade T351616 (duration: 00m 09s)
05:19 denisse@deploy2002: Started deploy [librenms/librenms@f049593]: Upgrade T351616
05:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P54247 and previous config saved to /var/cache/conftool/dbconfig/20231206-051750-arnaudb.json
05:09 ejegg: fundraising civicrm upgraded from 6bb8a67f to 6ca683b2
05:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P54246 and previous config saved to /var/cache/conftool/dbconfig/20231206-050243-arnaudb.json
04:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T348183)', diff saved to https://phabricator.wikimedia.org/P54245 and previous config saved to /var/cache/conftool/dbconfig/20231206-044737-arnaudb.json
04:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T348183)', diff saved to https://phabricator.wikimedia.org/P54244 and previous config saved to /var/cache/conftool/dbconfig/20231206-043718-arnaudb.json
04:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
04:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
04:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
04:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
04:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54243 and previous config saved to /var/cache/conftool/dbconfig/20231206-043638-arnaudb.json
04:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P54242 and previous config saved to /var/cache/conftool/dbconfig/20231206-042132-arnaudb.json
04:14 ejegg: standalone (payments listener) SmashPig upgraded from f24afba3 to 83d509ed
04:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P54241 and previous config saved to /var/cache/conftool/dbconfig/20231206-040625-arnaudb.json
03:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54240 and previous config saved to /var/cache/conftool/dbconfig/20231206-035119-arnaudb.json
03:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54239 and previous config saved to /var/cache/conftool/dbconfig/20231206-034045-arnaudb.json
03:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
03:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
03:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54238 and previous config saved to /var/cache/conftool/dbconfig/20231206-034022-arnaudb.json
03:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P54237 and previous config saved to /var/cache/conftool/dbconfig/20231206-032516-arnaudb.json
03:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P54236 and previous config saved to /var/cache/conftool/dbconfig/20231206-031009-arnaudb.json
02:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54235 and previous config saved to /var/cache/conftool/dbconfig/20231206-025503-arnaudb.json
02:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54234 and previous config saved to /var/cache/conftool/dbconfig/20231206-024108-arnaudb.json
02:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
02:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
02:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T348183)', diff saved to https://phabricator.wikimedia.org/P54233 and previous config saved to /var/cache/conftool/dbconfig/20231206-024045-arnaudb.json
02:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P54232 and previous config saved to /var/cache/conftool/dbconfig/20231206-022538-arnaudb.json
02:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P54231 and previous config saved to /var/cache/conftool/dbconfig/20231206-021031-arnaudb.json
02:08 eileen: civicrm upgraded from 7fb98ee8 to 6bb8a67f
02:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
02:00 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:59 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:58 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T348183)', diff saved to https://phabricator.wikimedia.org/P54230 and previous config saved to /var/cache/conftool/dbconfig/20231206-015519-arnaudb.json
01:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:51 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T348183)', diff saved to https://phabricator.wikimedia.org/P54229 and previous config saved to /var/cache/conftool/dbconfig/20231206-014506-arnaudb.json
01:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
01:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
01:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T348183)', diff saved to https://phabricator.wikimedia.org/P54228 and previous config saved to /var/cache/conftool/dbconfig/20231206-014443-arnaudb.json
01:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
01:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
01:40 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
01:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
01:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
01:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
01:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
01:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
01:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:29 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P54227 and previous config saved to /var/cache/conftool/dbconfig/20231206-012936-arnaudb.json
01:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2003.codfw.wmnet with OS bullseye
01:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
01:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bullseye
01:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
01:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
01:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
01:21 eileen: civicrm upgraded from d8238788 to 7fb98ee8
01:20 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:20 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
01:17 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P54226 and previous config saved to /var/cache/conftool/dbconfig/20231206-011430-arnaudb.json
01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
01:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
01:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
01:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
01:03 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox
00:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T348183)', diff saved to https://phabricator.wikimedia.org/P54225 and previous config saved to /var/cache/conftool/dbconfig/20231206-005923-arnaudb.json
00:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T348183)', diff saved to https://phabricator.wikimedia.org/P54224 and previous config saved to /var/cache/conftool/dbconfig/20231206-004820-arnaudb.json
00:48 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
00:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
00:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T348183)', diff saved to https://phabricator.wikimedia.org/P54223 and previous config saved to /var/cache/conftool/dbconfig/20231206-004756-arnaudb.json
00:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P54222 and previous config saved to /var/cache/conftool/dbconfig/20231206-003249-arnaudb.json
00:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P54221 and previous config saved to /var/cache/conftool/dbconfig/20231206-001742-arnaudb.json
00:17 ejegg: civicrm upgraded from 297a091d to d8238788
00:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T348183)', diff saved to https://phabricator.wikimedia.org/P54220 and previous config saved to /var/cache/conftool/dbconfig/20231206-000236-arnaudb.json

2023-12-05

23:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T348183)', diff saved to https://phabricator.wikimedia.org/P54219 and previous config saved to /var/cache/conftool/dbconfig/20231205-235213-arnaudb.json
23:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
23:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
23:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
23:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
23:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T348183)', diff saved to https://phabricator.wikimedia.org/P54218 and previous config saved to /var/cache/conftool/dbconfig/20231205-234425-arnaudb.json
23:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P54217 and previous config saved to /var/cache/conftool/dbconfig/20231205-232918-arnaudb.json
23:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P54216 and previous config saved to /var/cache/conftool/dbconfig/20231205-231412-arnaudb.json
22:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T348183)', diff saved to https://phabricator.wikimedia.org/P54215 and previous config saved to /var/cache/conftool/dbconfig/20231205-225905-arnaudb.json
22:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T348183)', diff saved to https://phabricator.wikimedia.org/P54214 and previous config saved to /var/cache/conftool/dbconfig/20231205-224838-arnaudb.json
22:48 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
22:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
22:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T348183)', diff saved to https://phabricator.wikimedia.org/P54213 and previous config saved to /var/cache/conftool/dbconfig/20231205-224816-arnaudb.json
22:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P54212 and previous config saved to /var/cache/conftool/dbconfig/20231205-223309-arnaudb.json
22:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P54211 and previous config saved to /var/cache/conftool/dbconfig/20231205-221803-arnaudb.json
22:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T348183)', diff saved to https://phabricator.wikimedia.org/P54210 and previous config saved to /var/cache/conftool/dbconfig/20231205-220256-arnaudb.json
22:01 jforrester@deploy2002: Finished scap: Backport for gerrit:977785 Define the corresponding stream for scroll (T350883), gerrit:978947 Add stream config for *webuiactions via Metrics Platform (T351298) (duration: 19m 01s)
21:53 jforrester@deploy2002: ksarabia and jforrester and cjming: Continuing with sync
21:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T348183)', diff saved to https://phabricator.wikimedia.org/P54209 and previous config saved to /var/cache/conftool/dbconfig/20231205-215135-arnaudb.json
21:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
21:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
21:43 jforrester@deploy2002: ksarabia and jforrester and cjming: Backport for gerrit:977785 Define the corresponding stream for scroll (T350883), gerrit:978947 Add stream config for *webuiactions via Metrics Platform (T351298) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
21:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
21:42 jforrester@deploy2002: Started scap: Backport for gerrit:977785 Define the corresponding stream for scroll (T350883), gerrit:978947 Add stream config for *webuiactions via Metrics Platform (T351298)
21:40 jforrester@deploy2002: Finished scap: Backport for gerrit:979704 [Zebra] Make .vector-column-start cache compatible (T347712 T351830), gerrit:980467 Fix nonzebra sticky container scrolling behavior and scrollable indicator (T352464) (duration: 12m 50s)
21:35 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
21:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
21:34 jforrester@deploy2002: jdlrobson and jforrester and jdrewniak: Continuing with sync
21:30 jforrester@deploy2002: jdlrobson and jforrester and jdrewniak: Backport for gerrit:979704 [Zebra] Make .vector-column-start cache compatible (T347712 T351830), gerrit:980467 Fix nonzebra sticky container scrolling behavior and scrollable indicator (T352464) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:27 jforrester@deploy2002: Started scap: Backport for gerrit:979704 [Zebra] Make .vector-column-start cache compatible (T347712 T351830), gerrit:980467 Fix nonzebra sticky container scrolling behavior and scrollable indicator (T352464)
21:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
21:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
21:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T348183)', diff saved to https://phabricator.wikimedia.org/P54208 and previous config saved to /var/cache/conftool/dbconfig/20231205-212707-arnaudb.json
21:27 jforrester@deploy2002: Finished scap: Backport for gerrit:980028 Deploy VectorClientPreferences to beta on pl,fr,ca,fa,tr wikis (T351339) (duration: 13m 44s)
21:19 jforrester@deploy2002: bwang and jforrester: Continuing with sync
21:13 jforrester@deploy2002: Started scap: Backport for gerrit:980028 Deploy VectorClientPreferences to beta on pl,fr,ca,fa,tr wikis (T351339)
21:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P54207 and previous config saved to /var/cache/conftool/dbconfig/20231205-211200-arnaudb.json
21:11 jforrester@deploy2002: Finished scap: Backport for gerrit:972263 Revert "Do not try to use Thumbor on beta" (T344605), gerrit:980009 nlwikivoyage: Drop Listings extension (T352696), gerrit:980047 Drop Listings extension from Wikivoyages where unused (T352719) (duration: 08m 45s)
21:04 jforrester@deploy2002: tgr and jforrester: Continuing with sync
21:04 jforrester@deploy2002: tgr and jforrester: Backport for gerrit:972263 Revert "Do not try to use Thumbor on beta" (T344605), gerrit:980009 nlwikivoyage: Drop Listings extension (T352696), gerrit:980047 Drop Listings extension from Wikivoyages where unused (T352719) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:02 jforrester@deploy2002: Started scap: Backport for gerrit:972263 Revert "Do not try to use Thumbor on beta" (T344605), gerrit:980009 nlwikivoyage: Drop Listings extension (T352696), gerrit:980047 Drop Listings extension from Wikivoyages where unused (T352719)
20:58 inflatador: bking@prometheus1006 disable puppet for troubleshooting T347355
20:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P54206 and previous config saved to /var/cache/conftool/dbconfig/20231205-205654-arnaudb.json
20:53 inflatador: bking@prometheus1006 reload prometheus-blackbox service T347355
20:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T348183)', diff saved to https://phabricator.wikimedia.org/P54205 and previous config saved to /var/cache/conftool/dbconfig/20231205-204147-arnaudb.json
20:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1219 (T348183)', diff saved to https://phabricator.wikimedia.org/P54204 and previous config saved to /var/cache/conftool/dbconfig/20231205-203158-arnaudb.json
20:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
20:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
20:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T348183)', diff saved to https://phabricator.wikimedia.org/P54203 and previous config saved to /var/cache/conftool/dbconfig/20231205-203136-arnaudb.json
20:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P54202 and previous config saved to /var/cache/conftool/dbconfig/20231205-201629-arnaudb.json
20:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P54201 and previous config saved to /var/cache/conftool/dbconfig/20231205-200123-arnaudb.json
19:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T348183)', diff saved to https://phabricator.wikimedia.org/P54200 and previous config saved to /var/cache/conftool/dbconfig/20231205-194616-arnaudb.json
19:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1218 (T348183)', diff saved to https://phabricator.wikimedia.org/P54199 and previous config saved to /var/cache/conftool/dbconfig/20231205-193627-arnaudb.json
19:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
19:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
19:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T348183)', diff saved to https://phabricator.wikimedia.org/P54198 and previous config saved to /var/cache/conftool/dbconfig/20231205-193604-arnaudb.json
19:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P54197 and previous config saved to /var/cache/conftool/dbconfig/20231205-192057-arnaudb.json
19:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P54196 and previous config saved to /var/cache/conftool/dbconfig/20231205-190551-arnaudb.json
18:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T348183)', diff saved to https://phabricator.wikimedia.org/P54195 and previous config saved to /var/cache/conftool/dbconfig/20231205-185044-arnaudb.json
18:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1207 (T348183)', diff saved to https://phabricator.wikimedia.org/P54194 and previous config saved to /var/cache/conftool/dbconfig/20231205-184108-arnaudb.json
18:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
18:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
18:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54193 and previous config saved to /var/cache/conftool/dbconfig/20231205-184045-arnaudb.json
18:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P54192 and previous config saved to /var/cache/conftool/dbconfig/20231205-182539-arnaudb.json
18:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS bullseye
18:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P54191 and previous config saved to /var/cache/conftool/dbconfig/20231205-181032-arnaudb.json
17:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54190 and previous config saved to /var/cache/conftool/dbconfig/20231205-175526-arnaudb.json
17:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
17:49 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
17:46 vgutierrez: rolling restart of text|secondary LVS on drmrs effectively enabling IPIP encapsulation for ncredir@drmrs - T351069
17:29 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
17:29 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
17:29 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
17:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
17:28 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
17:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['testhost2001']
17:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['testhost2001']
17:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['testhost2001']
17:11 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bullseye
17:00 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1002.eqiad.wmnet with OS bookworm
16:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54189 and previous config saved to /var/cache/conftool/dbconfig/20231205-165503-arnaudb.json
16:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
16:54 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
16:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T348183)', diff saved to https://phabricator.wikimedia.org/P54188 and previous config saved to /var/cache/conftool/dbconfig/20231205-165439-arnaudb.json
16:52 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
16:52 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bullseye
16:47 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
16:42 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['testhost2001']
16:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host testhost2001.mgmt.codfw.wmnet with reboot policy FORCED
16:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P54187 and previous config saved to /var/cache/conftool/dbconfig/20231205-163933-arnaudb.json
16:37 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1002.eqiad.wmnet with reason: host reimage
16:34 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1002.eqiad.wmnet with reason: host reimage
16:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P54186 and previous config saved to /var/cache/conftool/dbconfig/20231205-162426-arnaudb.json
16:24 claime: Rolling back k8s-ingress-dse - restarting pybal on lvs1019 - T352639
16:18 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
16:18 claime: Rolling back k8s-ingress-dse - restarting pybal on lvs1020 - T352639
16:18 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
16:18 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
16:17 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
16:14 samtar@deploy2002: Finished scap: Backport for gerrit:959327 .well-known: Add F-Droid signature to assetlinks.json (T346951) (duration: 07m 53s)
16:11 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
16:09 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
16:09 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
16:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T348183)', diff saved to https://phabricator.wikimedia.org/P54185 and previous config saved to /var/cache/conftool/dbconfig/20231205-160920-arnaudb.json
16:09 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
16:08 samtar@deploy2002: samtar: Continuing with sync
16:08 samtar@deploy2002: samtar: Backport for gerrit:959327 .well-known: Add F-Droid signature to assetlinks.json (T346951) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:07 samtar@deploy2002: Started scap: Backport for gerrit:959327 .well-known: Add F-Droid signature to assetlinks.json (T346951)
16:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host testhost2001.mgmt.codfw.wmnet with reboot policy FORCED
16:00 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding testhost2001 to codfw - jhancock@cumin2002"
15:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding testhost2001 to codfw - jhancock@cumin2002"
15:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T348183)', diff saved to https://phabricator.wikimedia.org/P54184 and previous config saved to /var/cache/conftool/dbconfig/20231205-155858-arnaudb.json
15:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
15:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
15:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
15:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
15:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T348183)', diff saved to https://phabricator.wikimedia.org/P54183 and previous config saved to /var/cache/conftool/dbconfig/20231205-155814-arnaudb.json
15:57 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:56 hnowlan@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
15:56 hnowlan@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
15:56 hnowlan@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
15:56 hnowlan@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
15:53 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4040.ulsfo.wmnet
15:49 claime: sudo confctl select "service=kubesvc,cluster=dse-k8s" set/pooled=inactive - T352639
15:45 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4040.ulsfo.wmnet
15:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P54182 and previous config saved to /var/cache/conftool/dbconfig/20231205-154308-arnaudb.json
15:42 moritzm: installing monitoring-plugins bugfix updates from Bookworm point release
15:42 claime: Manually restarting pybal on lvs1020 - T352639
15:39 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1002.eqiad.wmnet with OS bookworm
15:31 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1471.eqiad.wmnet with OS bullseye
15:29 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sessionstore2005']
15:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2005']
15:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2005.mgmt.codfw.wmnet with reboot policy FORCED
15:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sessionstore2006']
15:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2006']
15:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2005.mgmt.codfw.wmnet with reboot policy FORCED
15:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P54181 and previous config saved to /var/cache/conftool/dbconfig/20231205-152801-arnaudb.json
15:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host aqs2001.codfw.wmnet
15:22 claime: Manually restarting pybal on lvs1019 - T352639
15:21 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
15:20 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
15:18 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
15:17 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
15:16 claime: Manually restarting pybal on lvs1020 - T352639
15:15 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
15:15 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host aqs2001.codfw.wmnet
15:15 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
15:13 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1471.eqiad.wmnet with reason: host reimage
15:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T348183)', diff saved to https://phabricator.wikimedia.org/P54180 and previous config saved to /var/cache/conftool/dbconfig/20231205-151255-arnaudb.json
15:12 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
15:11 cgoubert@cumin1001: END (FAIL) - Cookbook sre.loadbalancer.restart-pybal (exit_code=1) rolling-restart of pybal on P{lvs[1018,1020].eqiad.wmnet} and A:lvs (T352639)
15:11 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
15:10 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1471.eqiad.wmnet with reason: host reimage
15:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2005.mgmt.codfw.wmnet with reboot policy FORCED
15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
15:06 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
15:06 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
15:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2006.mgmt.codfw.wmnet with reboot policy FORCED
15:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4043.ulsfo.wmnet
15:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T348183)', diff saved to https://phabricator.wikimedia.org/P54179 and previous config saved to /var/cache/conftool/dbconfig/20231205-150243-arnaudb.json
15:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
15:02 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
15:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T348183)', diff saved to https://phabricator.wikimedia.org/P54178 and previous config saved to /var/cache/conftool/dbconfig/20231205-150220-arnaudb.json
15:01 cgoubert@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[1018,1020].eqiad.wmnet} and A:lvs (T352639)
14:58 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
14:58 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1471.eqiad.wmnet with OS bullseye
14:57 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
14:57 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
14:57 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
14:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2006.mgmt.codfw.wmnet with reboot policy FORCED
14:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2005.mgmt.codfw.wmnet with reboot policy FORCED
14:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
14:54 brouberol: adding k8s-ingress-dse backend to LVS - T352639
14:52 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4043.ulsfo.wmnet
14:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P54177 and previous config saved to /var/cache/conftool/dbconfig/20231205-144714-arnaudb.json
14:45 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
14:45 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
14:44 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sessionstore2004-6 to codfw - jhancock@cumin2002"
14:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sessionstore2004-6 to codfw - jhancock@cumin2002"
14:41 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:41 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:41 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:40 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:40 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: redis::misc::master
14:38 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2002']
14:35 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:32 urbanecm@deploy2002: Finished scap: Backport for gerrit:979698 User impact: update quantizeViews to process small series of view data (T352349), gerrit:979700 Add maintenance script to import existing files to scan table (T350863), gerrit:979701 Only allow drawing and bitmap media types to be scanned (T352234) (duration: 08m 55s)
14:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P54176 and previous config saved to /var/cache/conftool/dbconfig/20231205-143207-arnaudb.json
14:30 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: redis::misc::master
14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2002']
14:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
14:26 urbanecm@deploy2002: kharlan and urbanecm: Continuing with sync
14:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
14:25 urbanecm@deploy2002: kharlan and urbanecm: Backport for gerrit:979698 User impact: update quantizeViews to process small series of view data (T352349), gerrit:979700 Add maintenance script to import existing files to scan table (T350863), gerrit:979701 Only allow drawing and bitmap media types to be scanned (T352234) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
14:23 urbanecm@deploy2002: Started scap: Backport for gerrit:979698 User impact: update quantizeViews to process small series of view data (T352349), gerrit:979700 Add maintenance script to import existing files to scan table (T350863), gerrit:979701 Only allow drawing and bitmap media types to be scanned (T352234)
14:20 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T348183)', diff saved to https://phabricator.wikimedia.org/P54175 and previous config saved to /var/cache/conftool/dbconfig/20231205-141701-arnaudb.json
14:13 urbanecm@deploy2002: Finished scap: Backport for gerrit:980357 Growth: Enable Welcome survey user research for ar/en/es (T351266) (duration: 09m 33s)
14:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T348183)', diff saved to https://phabricator.wikimedia.org/P54174 and previous config saved to /var/cache/conftool/dbconfig/20231205-140742-arnaudb.json
14:07 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
14:07 urbanecm@deploy2002: urbanecm: Continuing with sync
14:07 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
14:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T348183)', diff saved to https://phabricator.wikimedia.org/P54173 and previous config saved to /var/cache/conftool/dbconfig/20231205-140720-arnaudb.json
14:06 urbanecm@deploy2002: urbanecm: Backport for gerrit:980357 Growth: Enable Welcome survey user research for ar/en/es (T351266) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:06 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
14:05 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
14:04 urbanecm@deploy2002: Started scap: Backport for gerrit:980357 Growth: Enable Welcome survey user research for ar/en/es (T351266)
14:03 moritzm: installing cups security updates
13:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P54172 and previous config saved to /var/cache/conftool/dbconfig/20231205-135213-arnaudb.json
13:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4048.ulsfo.wmnet
13:50 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1078.eqiad.wmnet with OS bullseye
13:50 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
13:48 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
13:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1079.eqiad.wmnet with OS bullseye
13:48 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
13:48 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1470.eqiad.wmnet with OS bullseye
13:44 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
13:43 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1465.eqiad.wmnet with OS bullseye
13:41 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4048.ulsfo.wmnet
13:38 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1464.eqiad.wmnet with OS bullseye
13:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P54171 and previous config saved to /var/cache/conftool/dbconfig/20231205-133706-arnaudb.json
13:30 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1470.eqiad.wmnet with reason: host reimage
13:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1078.eqiad.wmnet with reason: host reimage
13:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1076.eqiad.wmnet with OS bullseye
13:27 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
13:26 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1470.eqiad.wmnet with reason: host reimage
13:26 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
13:24 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1465.eqiad.wmnet with reason: host reimage
13:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1079.eqiad.wmnet with reason: host reimage
13:24 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1078.eqiad.wmnet with reason: host reimage
13:23 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1079.eqiad.wmnet with reason: host reimage
13:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T348183)', diff saved to https://phabricator.wikimedia.org/P54169 and previous config saved to /var/cache/conftool/dbconfig/20231205-132200-arnaudb.json
13:21 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1465.eqiad.wmnet with reason: host reimage
13:21 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1464.eqiad.wmnet with reason: host reimage
13:18 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1464.eqiad.wmnet with reason: host reimage
13:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: redis::misc::slave
13:14 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1470.eqiad.wmnet with OS bullseye
13:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T348183)', diff saved to https://phabricator.wikimedia.org/P54168 and previous config saved to /var/cache/conftool/dbconfig/20231205-131240-arnaudb.json
13:12 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
13:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
13:10 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
13:09 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1079.eqiad.wmnet with OS bullseye
13:08 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1465.eqiad.wmnet with OS bullseye
13:07 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1076.eqiad.wmnet with reason: host reimage
13:06 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2435.codfw.wmnet with OS bullseye
13:06 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1464.eqiad.wmnet with OS bullseye
13:04 cmooney@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:04 cmooney@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update entry for sretest2003. - cmooney@cumin2002"
13:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
13:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
13:04 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1076.eqiad.wmnet with reason: host reimage
13:04 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
13:04 cmooney@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update entry for sretest2003. - cmooney@cumin2002"
13:03 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
13:02 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1463.eqiad.wmnet with OS bullseye
12:59 cmooney@cumin2002: START - Cookbook sre.dns.netbox
12:58 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2434.codfw.wmnet with OS bullseye
12:57 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: redis::misc::slave
12:56 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
12:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
12:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T348183)', diff saved to https://phabricator.wikimedia.org/P54167 and previous config saved to /var/cache/conftool/dbconfig/20231205-125641-arnaudb.json
12:53 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4042.ulsfo.wmnet
12:50 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2424.codfw.wmnet with OS bullseye
12:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
12:47 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2435.codfw.wmnet with reason: host reimage
12:47 ladsgroup@deploy2002: Finished scap: Backport for gerrit:980370 Set migration of pagelinks on large wikis of s5 to read new (T351237) (duration: 12m 30s)
12:45 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
12:45 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2423.codfw.wmnet with OS bullseye
12:45 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
12:44 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1463.eqiad.wmnet with reason: host reimage
12:42 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2435.codfw.wmnet with reason: host reimage
12:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P54165 and previous config saved to /var/cache/conftool/dbconfig/20231205-124134-arnaudb.json
12:41 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1463.eqiad.wmnet with reason: host reimage
12:40 ladsgroup@deploy2002: ladsgroup: Continuing with sync
12:39 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2434.codfw.wmnet with reason: host reimage
12:37 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:980370 Set migration of pagelinks on large wikis of s5 to read new (T351237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:36 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2434.codfw.wmnet with reason: host reimage
12:34 ladsgroup@deploy2002: Started scap: Backport for gerrit:980370 Set migration of pagelinks on large wikis of s5 to read new (T351237)
12:32 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4042.ulsfo.wmnet
12:31 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2424.codfw.wmnet with reason: host reimage
12:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4051.ulsfo.wmnet
12:28 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1463.eqiad.wmnet with OS bullseye
12:28 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2424.codfw.wmnet with reason: host reimage
12:27 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
12:26 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2423.codfw.wmnet with reason: host reimage
12:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P54164 and previous config saved to /var/cache/conftool/dbconfig/20231205-122628-arnaudb.json
12:26 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
12:25 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2435.codfw.wmnet with OS bullseye
12:24 moritzm: installing unbound bugfix updates from Bookworm point release
12:23 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2423.codfw.wmnet with reason: host reimage
12:22 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4051.ulsfo.wmnet
12:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4039.ulsfo.wmnet
12:18 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2434.codfw.wmnet with OS bullseye
12:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T348183)', diff saved to https://phabricator.wikimedia.org/P54163 and previous config saved to /var/cache/conftool/dbconfig/20231205-121121-arnaudb.json
12:10 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2424.codfw.wmnet with OS bullseye
12:07 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:07 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:06 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2423.codfw.wmnet with OS bullseye
12:04 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4039.ulsfo.wmnet
12:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T348183)', diff saved to https://phabricator.wikimedia.org/P54162 and previous config saved to /var/cache/conftool/dbconfig/20231205-120206-arnaudb.json
12:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
12:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
12:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T348183)', diff saved to https://phabricator.wikimedia.org/P54161 and previous config saved to /var/cache/conftool/dbconfig/20231205-120145-arnaudb.json
12:01 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4049.ulsfo.wmnet
11:53 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
11:52 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
11:51 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
11:51 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
11:50 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4049.ulsfo.wmnet
11:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P54160 and previous config saved to /var/cache/conftool/dbconfig/20231205-114638-arnaudb.json
11:40 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
11:40 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
11:40 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
11:40 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
11:38 ladsgroup@deploy2002: Finished scap: Backport for gerrit:979920 Bump ParserCache TTL back to 30 days (T280604) (duration: 07m 47s)
11:33 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:32 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
11:32 ladsgroup@deploy2002: ladsgroup: Continuing with sync
11:32 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:979920 Bump ParserCache TTL back to 30 days (T280604) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P54159 and previous config saved to /var/cache/conftool/dbconfig/20231205-113132-arnaudb.json
11:30 ladsgroup@deploy2002: Started scap: Backport for gerrit:979920 Bump ParserCache TTL back to 30 days (T280604)
11:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1023.eqiad.wmnet with OS bookworm
11:17 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
11:16 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
11:16 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
11:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T348183)', diff saved to https://phabricator.wikimedia.org/P54158 and previous config saved to /var/cache/conftool/dbconfig/20231205-111625-arnaudb.json
11:16 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
11:15 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
11:15 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
11:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1023.eqiad.wmnet with reason: host reimage
11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1023.eqiad.wmnet with reason: host reimage
11:08 hnowlan@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
11:08 hnowlan@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
11:07 hnowlan@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
11:07 hnowlan@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
11:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T348183)', diff saved to https://phabricator.wikimedia.org/P54157 and previous config saved to /var/cache/conftool/dbconfig/20231205-110448-arnaudb.json
11:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
11:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
11:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T348183)', diff saved to https://phabricator.wikimedia.org/P54156 and previous config saved to /var/cache/conftool/dbconfig/20231205-110426-arnaudb.json
11:02 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be1002.eqiad.wmnet with OS bookworm
10:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bookworm
10:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P54155 and previous config saved to /var/cache/conftool/dbconfig/20231205-104919-arnaudb.json
10:45 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
10:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P54154 and previous config saved to /var/cache/conftool/dbconfig/20231205-103413-arnaudb.json
10:21 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1002.eqiad.wmnet with OS bookworm
10:20 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1003.eqiad.wmnet with OS bookworm
10:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T348183)', diff saved to https://phabricator.wikimedia.org/P54153 and previous config saved to /var/cache/conftool/dbconfig/20231205-101906-arnaudb.json
10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T348183)', diff saved to https://phabricator.wikimedia.org/P54152 and previous config saved to /var/cache/conftool/dbconfig/20231205-100744-arnaudb.json
10:07 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
10:07 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T348183)', diff saved to https://phabricator.wikimedia.org/P54151 and previous config saved to /var/cache/conftool/dbconfig/20231205-100722-arnaudb.json
10:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15305
10:02 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
10:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15305
09:57 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 63927
09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P54150 and previous config saved to /var/cache/conftool/dbconfig/20231205-095215-arnaudb.json
09:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 63927
09:42 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bookworm
09:37 brouberol: running authdns-update on dns1004.wikimedia.org - T352639
09:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P54149 and previous config saved to /var/cache/conftool/dbconfig/20231205-093709-arnaudb.json
09:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T348183)', diff saved to https://phabricator.wikimedia.org/P54148 and previous config saved to /var/cache/conftool/dbconfig/20231205-092202-arnaudb.json
09:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T348183)', diff saved to https://phabricator.wikimedia.org/P54147 and previous config saved to /var/cache/conftool/dbconfig/20231205-091232-arnaudb.json
09:12 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
09:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
09:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 58952
09:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 58952
09:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
09:03 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
08:59 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
08:26 marostegui: Failover m2-master dbproxy1023.eqiad.wmnet -> dbproxy1025.eqiad.wmnet T351864
06:55 vgutierrez: rolling restart of text|secondary LVS on eqsin effectively enabling IPIP encapsulation for ncredir@eqsin - T351069
06:23 marostegui: Failover m5 from db1119 to db1176 - T352631
06:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2135,2160].codfw.wmnet,db[1119,1176,1217].eqiad.wmnet with reason: m5 master switch T352631
06:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2135,2160].codfw.wmnet,db[1119,1176,1217].eqiad.wmnet with reason: m5 master switch T352631
01:18 mutante: LDAP - added user xqt to group nda (T348520)
01:12 ejegg: payments-wiki upgraded from 5284fc99 to 1d24dc90
00:06 eevans@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host restbase2028.codfw.wmnet

2023-12-04

23:53 eevans@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host restbase2028.codfw.wmnet
23:52 eevans@cumin1001: START - Cookbook sre.puppet.migrate-host for host restbase2028.codfw.wmnet
22:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T348183)', diff saved to https://phabricator.wikimedia.org/P54146 and previous config saved to /var/cache/conftool/dbconfig/20231204-225336-arnaudb.json
22:53 eileen: civicrm upgraded from 83816165 to 297a091d
22:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P54145 and previous config saved to /var/cache/conftool/dbconfig/20231204-223830-arnaudb.json
22:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P54144 and previous config saved to /var/cache/conftool/dbconfig/20231204-222323-arnaudb.json
22:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T348183)', diff saved to https://phabricator.wikimedia.org/P54142 and previous config saved to /var/cache/conftool/dbconfig/20231204-220817-arnaudb.json
22:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2189 (T348183)', diff saved to https://phabricator.wikimedia.org/P54141 and previous config saved to /var/cache/conftool/dbconfig/20231204-220345-arnaudb.json
22:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
22:03 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
22:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T348183)', diff saved to https://phabricator.wikimedia.org/P54140 and previous config saved to /var/cache/conftool/dbconfig/20231204-220322-arnaudb.json
21:58 ebernhardson@deploy2002: Finished scap: Backport for gerrit:979693 Always load transcode state from db when opting in to primary db (duration: 08m 37s)
21:52 ebernhardson@deploy2002: ebernhardson and brion: Continuing with sync
21:51 ebernhardson@deploy2002: ebernhardson and brion: Backport for gerrit:979693 Always load transcode state from db when opting in to primary db synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:50 ebernhardson@deploy2002: Started scap: Backport for gerrit:979693 Always load transcode state from db when opting in to primary db
21:49 ebernhardson@deploy2002: Finished scap: Backport for gerrit:979155 cirrus: Enable event bus bridge on more wikis (T352335) (duration: 09m 23s)
21:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P54138 and previous config saved to /var/cache/conftool/dbconfig/20231204-214816-arnaudb.json
21:47 ryankemper: T351503 Setting partition count to 5: `ryankemper@kafka-main2001:~$ kafka topics --alter --topic codfw.mediawiki.cirrussearch.page_rerender.v1 --partitions 5`
21:47 ryankemper: T351503 Setting partition count to 5: `ryankemper@kafka-main2001:~$ kafka topics --alter --topic eqiad.mediawiki.cirrussearch.page_rerender.v1 --partitions 5`
21:42 ebernhardson@deploy2002: ebernhardson: Continuing with sync
21:41 ebernhardson@deploy2002: ebernhardson: Backport for gerrit:979155 cirrus: Enable event bus bridge on more wikis (T352335) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:39 ebernhardson@deploy2002: Started scap: Backport for gerrit:979155 cirrus: Enable event bus bridge on more wikis (T352335)
21:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P54137 and previous config saved to /var/cache/conftool/dbconfig/20231204-213309-arnaudb.json
21:27 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:27 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
21:19 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1077.eqiad.wmnet with OS bullseye
21:19 pt1979@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
21:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T348183)', diff saved to https://phabricator.wikimedia.org/P54136 and previous config saved to /var/cache/conftool/dbconfig/20231204-211803-arnaudb.json
21:14 pt1979@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
21:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T348183)', diff saved to https://phabricator.wikimedia.org/P54135 and previous config saved to /var/cache/conftool/dbconfig/20231204-211305-arnaudb.json
21:12 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
21:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
21:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54134 and previous config saved to /var/cache/conftool/dbconfig/20231204-211241-arnaudb.json
21:09 ryankemper: T351503 Setting partition count to 5: `ryankemper@kafka-main1001:~$ kafka topics --alter --topic codfw.mediawiki.cirrussearch.page_rerender.v1 --partitions 5`
21:06 ryankemper: T351503 Setting partition count to 5: `ryankemper@kafka-main1001:~$ kafka topics --alter --topic eqiad.mediawiki.cirrussearch.page_rerender.v1 --partitions 5`
20:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P54133 and previous config saved to /var/cache/conftool/dbconfig/20231204-205735-arnaudb.json
20:53 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1077.eqiad.wmnet with reason: host reimage
20:50 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1077.eqiad.wmnet with reason: host reimage
20:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P54132 and previous config saved to /var/cache/conftool/dbconfig/20231204-204228-arnaudb.json
20:36 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
20:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54131 and previous config saved to /var/cache/conftool/dbconfig/20231204-202722-arnaudb.json
19:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1079.eqiad.wmnet with OS bullseye
19:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1076.eqiad.wmnet with OS bullseye
19:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1078.eqiad.wmnet with OS bullseye
19:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1077.eqiad.wmnet with OS bullseye
19:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54130 and previous config saved to /var/cache/conftool/dbconfig/20231204-194103-arnaudb.json
19:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
19:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
19:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T348183)', diff saved to https://phabricator.wikimedia.org/P54129 and previous config saved to /var/cache/conftool/dbconfig/20231204-194039-arnaudb.json
19:37 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:37 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P54128 and previous config saved to /var/cache/conftool/dbconfig/20231204-192532-arnaudb.json
19:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
19:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
19:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1079.eqiad.wmnet with OS bullseye
19:20 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
19:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P54126 and previous config saved to /var/cache/conftool/dbconfig/20231204-191026-arnaudb.json
19:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1079.eqiad.wmnet with OS bullseye
19:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1078.eqiad.wmnet with OS bullseye
19:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1077.eqiad.wmnet with OS bullseye
18:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T348183)', diff saved to https://phabricator.wikimedia.org/P54125 and previous config saved to /var/cache/conftool/dbconfig/20231204-185519-arnaudb.json
18:52 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1076.eqiad.wmnet with OS bullseye
18:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
18:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
18:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1079.eqiad.wmnet with OS bullseye
18:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T348183)', diff saved to https://phabricator.wikimedia.org/P54124 and previous config saved to /var/cache/conftool/dbconfig/20231204-184630-arnaudb.json
18:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
18:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
18:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54123 and previous config saved to /var/cache/conftool/dbconfig/20231204-184607-arnaudb.json
18:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P54122 and previous config saved to /var/cache/conftool/dbconfig/20231204-183100-arnaudb.json
18:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P54121 and previous config saved to /var/cache/conftool/dbconfig/20231204-181554-arnaudb.json
18:02 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1077.eqiad.wmnet with OS bullseye
18:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54120 and previous config saved to /var/cache/conftool/dbconfig/20231204-180047-arnaudb.json
17:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
17:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1078.eqiad.wmnet with OS bullseye
17:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54119 and previous config saved to /var/cache/conftool/dbconfig/20231204-175448-arnaudb.json
17:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
17:54 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
17:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T348183)', diff saved to https://phabricator.wikimedia.org/P54118 and previous config saved to /var/cache/conftool/dbconfig/20231204-175426-arnaudb.json
17:41 ladsgroup@deploy2002: Finished scap: Backport for gerrit:979692 Category: Stop locking thousands of rows (T352628) (duration: 08m 07s)
17:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P54117 and previous config saved to /var/cache/conftool/dbconfig/20231204-173919-arnaudb.json
17:35 ladsgroup@deploy2002: ladsgroup: Continuing with sync
17:34 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:979692 Category: Stop locking thousands of rows (T352628) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:33 ladsgroup@deploy2002: Started scap: Backport for gerrit:979692 Category: Stop locking thousands of rows (T352628)
17:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P54116 and previous config saved to /var/cache/conftool/dbconfig/20231204-172413-arnaudb.json
17:19 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1076']
17:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
17:18 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1079']
17:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
17:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1079']
17:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
17:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1079']
17:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
17:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1079']
17:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
17:14 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1079']
17:12 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1076']
17:12 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
17:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
17:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
17:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
17:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T348183)', diff saved to https://phabricator.wikimedia.org/P54115 and previous config saved to /var/cache/conftool/dbconfig/20231204-170906-arnaudb.json
17:09 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
17:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
17:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T348183)', diff saved to https://phabricator.wikimedia.org/P54114 and previous config saved to /var/cache/conftool/dbconfig/20231204-170604-arnaudb.json
17:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
17:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
17:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
17:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
17:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T348183)', diff saved to https://phabricator.wikimedia.org/P54113 and previous config saved to /var/cache/conftool/dbconfig/20231204-170525-arnaudb.json
16:52 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: gerrit:979990 Bumping portals to master (T128546) (duration: 07m 45s)
16:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P54112 and previous config saved to /var/cache/conftool/dbconfig/20231204-165018-arnaudb.json
16:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 33604
16:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 33604
16:44 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: gerrit:979990 Bumping portals to master (T128546) (duration: 06m 40s)
16:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P54111 and previous config saved to /var/cache/conftool/dbconfig/20231204-163511-arnaudb.json
16:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T348183)', diff saved to https://phabricator.wikimedia.org/P54110 and previous config saved to /var/cache/conftool/dbconfig/20231204-162005-arnaudb.json
16:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T348183)', diff saved to https://phabricator.wikimedia.org/P54109 and previous config saved to /var/cache/conftool/dbconfig/20231204-161408-arnaudb.json
16:14 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
16:13 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
16:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54108 and previous config saved to /var/cache/conftool/dbconfig/20231204-161346-arnaudb.json
15:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P54107 and previous config saved to /var/cache/conftool/dbconfig/20231204-155840-arnaudb.json
15:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1076.eqiad.wmnet with OS bullseye
15:48 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
15:48 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
15:47 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
15:47 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
15:46 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
15:45 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
15:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P54105 and previous config saved to /var/cache/conftool/dbconfig/20231204-154333-arnaudb.json
15:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54104 and previous config saved to /var/cache/conftool/dbconfig/20231204-152826-arnaudb.json
15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1077']
15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1078']
15:03 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
15:02 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1077']
15:02 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1077']
15:02 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1078']
15:02 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1077']
15:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
14:53 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4046.ulsfo.wmnet
14:51 vgutierrez: upload tcp-mss-clamper 0.4 to apt.wm.o (bookworm)
14:50 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be1077
14:50 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ms-be1077
14:47 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
14:46 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4046.ulsfo.wmnet
14:46 Lucas_WMDE: UTC afternoon backport+config window done
14:46 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:977196 Create new namespaces and namespace aliases for bd.wikimedia.org (T351903) (duration: 11m 48s)
14:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4038.ulsfo.wmnet
14:43 sukhe: running authdns-update for CR 979976 [revert of T349665]
14:40 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and mdsshakil: Continuing with sync
14:37 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4038.ulsfo.wmnet
14:36 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and mdsshakil: Backport for gerrit:977196 Create new namespaces and namespace aliases for bd.wikimedia.org (T351903) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:34 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:977196 Create new namespaces and namespace aliases for bd.wikimedia.org (T351903)
14:33 sukhe: running authdns-update for T352579
14:32 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:979914 Enable read new for event tables migration on testwiki (T341829) (duration: 10m 42s)
14:32 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
14:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54103 and previous config saved to /var/cache/conftool/dbconfig/20231204-142754-arnaudb.json
14:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
14:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
14:25 lucaswerkmeister-wmde@deploy2002: dreamyjazz and lucaswerkmeister-wmde: Continuing with sync
14:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
14:24 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
14:22 lucaswerkmeister-wmde@deploy2002: dreamyjazz and lucaswerkmeister-wmde: Backport for gerrit:979914 Enable read new for event tables migration on testwiki (T341829) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
14:21 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:979914 Enable read new for event tables migration on testwiki (T341829)
14:21 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
14:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
14:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
14:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T348183)', diff saved to https://phabricator.wikimedia.org/P54102 and previous config saved to /var/cache/conftool/dbconfig/20231204-141848-arnaudb.json
14:15 jforrester@deploy2002: Finished scap: Backport for gerrit:979362 wikifunctionswiki: Disable thumbnail in Vector search (T352532), gerrit:979180 wikifunctionswiki: Add ability for sysops to manage Functioneer (T352495) (duration: 07m 41s)
14:10 jforrester@deploy2002: jforrester and terasail: Continuing with sync
14:09 jforrester@deploy2002: jforrester and terasail: Backport for gerrit:979362 wikifunctionswiki: Disable thumbnail in Vector search (T352532), gerrit:979180 wikifunctionswiki: Add ability for sysops to manage Functioneer (T352495) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:08 jforrester@deploy2002: Started scap: Backport for gerrit:979362 wikifunctionswiki: Disable thumbnail in Vector search (T352532), gerrit:979180 wikifunctionswiki: Add ability for sysops to manage Functioneer (T352495)
14:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P54101 and previous config saved to /var/cache/conftool/dbconfig/20231204-140341-arnaudb.json
13:59 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
13:59 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
13:58 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
13:57 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
13:56 moritzm: installing postgresql-13 security updates
13:52 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
13:52 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
13:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P54100 and previous config saved to /var/cache/conftool/dbconfig/20231204-134835-arnaudb.json
13:43 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
13:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T348183)', diff saved to https://phabricator.wikimedia.org/P54099 and previous config saved to /var/cache/conftool/dbconfig/20231204-133328-arnaudb.json
13:30 moritzm: installing dbus security updates on buster
13:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1222 (T348183)', diff saved to https://phabricator.wikimedia.org/P54098 and previous config saved to /var/cache/conftool/dbconfig/20231204-132859-arnaudb.json
13:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
13:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
13:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T348183)', diff saved to https://phabricator.wikimedia.org/P54097 and previous config saved to /var/cache/conftool/dbconfig/20231204-132836-arnaudb.json
13:22 moritzm: installing libde265 security updates
13:22 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
13:22 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
13:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P54096 and previous config saved to /var/cache/conftool/dbconfig/20231204-131329-arnaudb.json
13:06 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
13:05 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
13:05 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
13:04 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
12:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P54095 and previous config saved to /var/cache/conftool/dbconfig/20231204-125823-arnaudb.json
12:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T348183)', diff saved to https://phabricator.wikimedia.org/P54094 and previous config saved to /var/cache/conftool/dbconfig/20231204-124316-arnaudb.json
12:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T348183)', diff saved to https://phabricator.wikimedia.org/P54093 and previous config saved to /var/cache/conftool/dbconfig/20231204-124037-arnaudb.json
12:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
12:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
12:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54092 and previous config saved to /var/cache/conftool/dbconfig/20231204-124015-arnaudb.json
12:35 urbanecm@deploy2002: Finished scap: Backport for gerrit:979690 User impact: sort datestring keys to ascending alphanumeric order (T352349 T351898) (duration: 09m 43s)
12:29 urbanecm@deploy2002: urbanecm: Continuing with sync
12:28 urbanecm@deploy2002: urbanecm: Backport for gerrit:979690 User impact: sort datestring keys to ascending alphanumeric order (T352349 T351898) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host an-druid1005.eqiad.wmnet
12:25 urbanecm@deploy2002: Started scap: Backport for gerrit:979690 User impact: sort datestring keys to ascending alphanumeric order (T352349 T351898)
12:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P54091 and previous config saved to /var/cache/conftool/dbconfig/20231204-122508-arnaudb.json
12:19 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:19 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host an-druid1005.eqiad.wmnet
12:18 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1027.eqiad.wmnet with OS bookworm
12:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P54090 and previous config saved to /var/cache/conftool/dbconfig/20231204-121002-arnaudb.json
12:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host druid1011.eqiad.wmnet
12:00 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host druid1011.eqiad.wmnet
11:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1027.eqiad.wmnet with reason: host reimage
11:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54089 and previous config saved to /var/cache/conftool/dbconfig/20231204-115455-arnaudb.json
11:54 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2422.codfw.wmnet with OS bullseye
11:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1027.eqiad.wmnet with reason: host reimage
11:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54088 and previous config saved to /var/cache/conftool/dbconfig/20231204-115217-arnaudb.json
11:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
11:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
11:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T348183)', diff saved to https://phabricator.wikimedia.org/P54087 and previous config saved to /var/cache/conftool/dbconfig/20231204-115154-arnaudb.json
11:51 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1462.eqiad.wmnet with OS bullseye
11:43 elukey@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
11:43 elukey@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
11:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 44592
11:42 elukey@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
11:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 44592
11:42 elukey@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
11:40 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
11:39 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
11:39 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bookworm
11:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P54086 and previous config saved to /var/cache/conftool/dbconfig/20231204-113648-arnaudb.json
11:36 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2422.codfw.wmnet with reason: host reimage
11:33 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1462.eqiad.wmnet with reason: host reimage
11:32 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2422.codfw.wmnet with reason: host reimage
11:30 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1462.eqiad.wmnet with reason: host reimage
11:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P54085 and previous config saved to /var/cache/conftool/dbconfig/20231204-112141-arnaudb.json
11:17 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1462.eqiad.wmnet with OS bullseye
11:15 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2422.codfw.wmnet with OS bullseye
11:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: eventschemas::service
11:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T348183)', diff saved to https://phabricator.wikimedia.org/P54084 and previous config saved to /var/cache/conftool/dbconfig/20231204-110635-arnaudb.json
11:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T348183)', diff saved to https://phabricator.wikimedia.org/P54083 and previous config saved to /var/cache/conftool/dbconfig/20231204-110156-arnaudb.json
11:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
11:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
11:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54082 and previous config saved to /var/cache/conftool/dbconfig/20231204-110134-arnaudb.json
10:54 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: eventschemas::service
10:51 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:51 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add service records for the k8s-ingress-dse endpoints - btullis@cumin1001"
10:50 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add service records for the k8s-ingress-dse endpoints - btullis@cumin1001"
10:48 btullis@cumin1001: START - Cookbook sre.dns.netbox
10:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P54081 and previous config saved to /var/cache/conftool/dbconfig/20231204-104628-arnaudb.json
10:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 23856
10:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 23856
10:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 63927
10:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 63927
10:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 31898
10:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 31898
10:37 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58952
10:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58952
10:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 44592
10:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 44592
10:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4800
10:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4800
10:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 33604
10:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 33604
10:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 142505
10:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 142505
10:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 398446
10:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 398446
10:32 jayme: upgrade istio (buster -> bullseye) on wikikube codfw - T351933
10:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15305
10:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15305
10:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19165
10:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P54080 and previous config saved to /var/cache/conftool/dbconfig/20231204-103121-arnaudb.json
10:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19165
10:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 237
10:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 237
10:28 jayme: upgrade istio (buster -> bullseye) on wikikube eqiad - T351933
10:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 35 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
10:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 35 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
10:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1022.eqiad.wmnet with OS bookworm
10:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138997
10:17 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 138997
10:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54079 and previous config saved to /var/cache/conftool/dbconfig/20231204-101615-arnaudb.json
10:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54078 and previous config saved to /var/cache/conftool/dbconfig/20231204-101143-arnaudb.json
10:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
10:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
10:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T348183)', diff saved to https://phabricator.wikimedia.org/P54077 and previous config saved to /var/cache/conftool/dbconfig/20231204-101120-arnaudb.json
10:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1022.eqiad.wmnet with reason: host reimage
09:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1022.eqiad.wmnet with reason: host reimage
09:58 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy GRACEFUL
09:57 godog: roll-restart prometheus/k8s to apply size-based retention - T351179
09:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P54076 and previous config saved to /var/cache/conftool/dbconfig/20231204-095614-arnaudb.json
09:49 volans@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy GRACEFUL
09:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P54075 and previous config saved to /var/cache/conftool/dbconfig/20231204-094107-arnaudb.json
09:36 elukey: upgrade istio (buster -> bullseye) on ml-serve-codfw - T351933
09:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T348183)', diff saved to https://phabricator.wikimedia.org/P54074 and previous config saved to /var/cache/conftool/dbconfig/20231204-092600-arnaudb.json
09:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T348183)', diff saved to https://phabricator.wikimedia.org/P54073 and previous config saved to /var/cache/conftool/dbconfig/20231204-092136-arnaudb.json
09:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
09:21 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
09:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
09:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
09:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54072 and previous config saved to /var/cache/conftool/dbconfig/20231204-092054-arnaudb.json
09:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P54070 and previous config saved to /var/cache/conftool/dbconfig/20231204-090547-arnaudb.json
08:58 elukey: upgrade istio (buster -> bullseye) on ml-serve-eqiad - T351933
08:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P54069 and previous config saved to /var/cache/conftool/dbconfig/20231204-085041-arnaudb.json
08:50 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bookworm
08:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM moscovium.eqiad.wmnet
08:48 elukey: upgrade istio (buster -> bullseye) on aux-k8s-eqiad - T351933
08:45 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bookworm
08:43 elukey: upgrade istio (buster -> bullseye) on dse-k8s-eqiad - T351933
08:39 urbanecm@deploy2002: Finished scap: Backport for gerrit:979686 hewikivoyage: add tagline (T351981), gerrit:979223 azwiki: Enable $wgMinervaEnableSiteNotice (T352621), gerrit:978522 trwikivoyage: update wordmark (T352329) (duration: 09m 49s)
08:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54068 and previous config saved to /var/cache/conftool/dbconfig/20231204-083534-arnaudb.json
08:33 urbanecm@deploy2002: urbanecm and anzx: Continuing with sync
08:31 urbanecm@deploy2002: urbanecm and anzx: Backport for gerrit:979686 hewikivoyage: add tagline (T351981), gerrit:979223 azwiki: Enable $wgMinervaEnableSiteNotice (T352621), gerrit:978522 trwikivoyage: update wordmark (T352329) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54067 and previous config saved to /var/cache/conftool/dbconfig/20231204-083102-arnaudb.json
08:30 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
08:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
08:29 urbanecm@deploy2002: Started scap: Backport for gerrit:979686 hewikivoyage: add tagline (T351981), gerrit:979223 azwiki: Enable $wgMinervaEnableSiteNotice (T352621), gerrit:978522 trwikivoyage: update wordmark (T352329)
08:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
08:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
08:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T348183)', diff saved to https://phabricator.wikimedia.org/P54066 and previous config saved to /var/cache/conftool/dbconfig/20231204-082758-arnaudb.json
08:25 oblivian@deploy2002: Finished scap: Backport for gerrit:979488 Add throttle rule for editathon (T352569) (duration: 18m 04s)
08:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM moscovium.eqiad.wmnet
08:23 _joe_: clearing throttle cache for T352569
08:18 oblivian@deploy2002: oblivian: Continuing with sync
08:17 oblivian@deploy2002: oblivian: Backport for gerrit:979488 Add throttle rule for editathon (T352569) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P54065 and previous config saved to /var/cache/conftool/dbconfig/20231204-081251-arnaudb.json
08:11 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bookworm
08:10 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bookworm
08:07 oblivian@deploy2002: Started scap: Backport for gerrit:979488 Add throttle rule for editathon (T352569)
07:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P54064 and previous config saved to /var/cache/conftool/dbconfig/20231204-075745-arnaudb.json
07:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bookworm
07:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T348183)', diff saved to https://phabricator.wikimedia.org/P54063 and previous config saved to /var/cache/conftool/dbconfig/20231204-074238-arnaudb.json
07:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T348183)', diff saved to https://phabricator.wikimedia.org/P54062 and previous config saved to /var/cache/conftool/dbconfig/20231204-073957-arnaudb.json
07:39 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
07:39 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1176.eqiad.wmnet with OS bookworm
07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
07:07 kart_: Updated MinT to 2023-11-21-115852-production
07:03 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1176.eqiad.wmnet with OS bookworm
06:57 marostegui: Failover m5 from db1176 to db1119 - T332155
06:49 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
06:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2135,2160].codfw.wmnet,db[1119,1176,1217].eqiad.wmnet with reason: m5 master switch T352505
06:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2135,2160].codfw.wmnet,db[1119,1176,1217].eqiad.wmnet with reason: m5 master switch T352505
06:44 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
06:33 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
06:28 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
06:14 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
06:11 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
06:08 kart_: Updated cxserver to 2023-12-04-055024-production (T270060, T350773, T352620)
06:06 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
06:05 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
06:03 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
06:02 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
05:59 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
05:58 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
04:43 ryankemper: [WDQS] Clearing `BlazegraphFreeAllocatorsDecreasingRapidly` -> `ryankemper@wdqs1007:~$ sudo systemctl restart wdqs-blazegraph`
00:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1006.eqiad.wmnet
00:09 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.eqiad.wmnet

2023-12-02

01:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1078.eqiad.wmnet with OS bullseye
01:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1079.eqiad.wmnet with OS bullseye
01:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1077.eqiad.wmnet with OS bullseye
01:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1076.eqiad.wmnet with OS bullseye
00:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
00:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1079.eqiad.wmnet with OS bullseye
00:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
00:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
00:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
00:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
00:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
00:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
00:13 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
00:13 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
00:13 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
00:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED

2023-12-01

22:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
22:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
22:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
22:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
22:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
22:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
22:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
22:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
22:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
22:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
22:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
22:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
22:14 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:14 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
22:13 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
22:11 jclark@cumin1001: START - Cookbook sre.dns.netbox
22:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
22:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
22:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
22:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
21:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
21:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
21:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
21:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
21:31 cstone: payments-wiki upgraded from b37ab50e to 5284fc99
19:35 inflatador: bking@wdqs1006 rebooting unresponsive host
18:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ceph2001.codfw.wmnet with OS bullseye
17:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ceph2001.codfw.wmnet with OS bullseye
16:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ceph2001.codfw.wmnet with OS bullseye
16:39 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1005.eqiad.wmnet with OS bookworm
16:26 dancy@deploy2002: Installation of scap version "4.65.0" completed for 537 hosts
16:26 dancy@deploy2002: Installing scap version "4.65.0" for 537 hosts
16:25 dancy@deploy2002: install-world aborted: (duration: 00m 50s)
16:24 dancy@deploy2002: Installing scap version "4.65.0" for 569 hosts
16:24 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt1046.eqiad.wmnet
16:10 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1046.eqiad.wmnet
16:07 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
16:04 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
16:01 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:01 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to scandium - akosiaris@cumin1001"
16:00 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to scandium - akosiaris@cumin1001"
15:58 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
15:58 akosiaris: give AAAA and PTR records to scandium T271142
15:57 akosiaris: give AAAA and PTR records to all rdb hosts (only 50% had it previously)
15:56 dancy@deploy2002: Installing scap version "4.65.0" for 570 hosts
15:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA records to the rest of the 50% of rdb hosts - akosiaris@cumin1001"
15:54 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA records to the rest of the 50% of rdb hosts - akosiaris@cumin1001"
15:52 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
15:51 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[1009-1010].eqiad.wmnet
15:51 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:51 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rdb[1009-1010].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - akosiaris@cumin1001"
15:50 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rdb[1009-1010].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - akosiaris@cumin1001"
15:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1005.eqiad.wmnet with OS bookworm
15:45 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
15:42 urbanecm: mwmaint2002: mwscript extensions/Flow/maintenance/FlowFixInconsistentBoards.php --wiki=frwiki # T352550
15:38 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
15:38 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
15:36 akosiaris@deploy2002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 07m 24s)
15:31 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
15:31 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
15:28 moritzm: added Kamila to pwstore
15:21 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[1009-1010].eqiad.wmnet
15:19 topranks: moving esams CR interconnect to 4x10G breakout cable T347403
14:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
14:27 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
14:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
14:27 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
14:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
14:27 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
14:26 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
14:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
14:26 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
14:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
14:26 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
14:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
14:26 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
14:26 akosiaris: cleanup rdb1009 from all deployment charts
14:26 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
14:26 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
14:26 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
14:25 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
14:25 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
14:20 hashar@deploy2002: Finished deploy [integration/docroot@88f69cc]: doc: link to the Gearman Java library (duration: 00m 05s)
14:20 hashar@deploy2002: Started deploy [integration/docroot@88f69cc]: doc: link to the Gearman Java library
14:18 hashar@deploy2002: Finished deploy [integration/docroot@1c2de6b]: doc: link to Disovery parent pom (duration: 00m 06s)
14:18 hashar@deploy2002: Started deploy [integration/docroot@1c2de6b]: doc: link to Disovery parent pom
14:09 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
14:08 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
14:05 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
14:05 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
14:03 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
14:03 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
13:48 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
13:48 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
13:32 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
13:31 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
13:30 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
13:30 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
13:28 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
13:28 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
13:27 taavi: run prometheus provision-fs on prometheus2* to create file system for cloud instance T350010
13:13 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
13:13 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
12:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
12:39 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
12:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flerovium.eqiad.wmnet
12:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flerovium.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:36 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flerovium.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:34 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1046.eqiad.wmnet with OS bookworm
12:33 jmm@cumin2002: START - Cookbook sre.dns.netbox
12:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts flerovium.eqiad.wmnet
12:17 XioNoX: add BGP custom field to Netbox - T306649
12:07 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
12:03 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
12:03 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2211 hosts
12:02 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2211 hosts
11:49 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
11:30 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on cr[1-2]-codfw,cr[1-2]-codfw IPv6 with reason: resetting line card
11:30 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on cr[1-2]-codfw,cr[1-2]-codfw IPv6 with reason: resetting line card
11:29 topranks: Reset card 1/0 in cr1-codfw T350159
11:22 topranks: Disabling BGP peering to AS1299 prior to reset of card 1/0 in cr1-codfw T350159
11:09 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2 hosts
11:09 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2 hosts
11:04 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2 hosts
11:04 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2 hosts
11:00 topranks: Draining cr1-codfw transport to cr3-eqsin to reset card 1/0 T350159
10:59 topranks: Resetting circuit preference for transports landing on card 1/1 cr1-codfw T350159
10:55 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
10:49 moritzm: installing wireshark security updates on bookworm
10:37 topranks: Moving VRRP active gateway for codfw row A/B vlans from cr1-codfw to cr2-codfw to reconfigure card 1/1 T350159
10:35 topranks: draining codfw<->eqiad transport link to reconfigure card 1/1 in cr1-codfw T350159
10:34 topranks: draining codfw<->eqdfw transport link to reconfigure card 1/1 in cr1-codfw T350159
10:30 akosiaris@deploy2002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 07m 12s)
10:08 godog: add 60GB to prometheus/k8s in codfw
09:51 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2 hosts
09:51 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2 hosts
09:45 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2211 hosts
09:44 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2211 hosts
09:20 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
09:05 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:59 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:57 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:50 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1026.eqiad.wmnet with OS bookworm
07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1026.eqiad.wmnet with reason: host reimage
07:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1026.eqiad.wmnet with reason: host reimage
07:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1026.eqiad.wmnet with OS bookworm
06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2135.codfw.wmnet with OS bookworm
06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2135.codfw.wmnet with reason: host reimage
06:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2135.codfw.wmnet with reason: host reimage
05:56 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2135.codfw.wmnet with OS bookworm
05:37 marostegui: Failover m3 from db1119 to db1159 - T352360
05:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2134,2160].codfw.wmnet,db[1119,1159,1217].eqiad.wmnet with reason: m3 master switchover T352149
05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2134,2160].codfw.wmnet,db[1119,1159,1217].eqiad.wmnet with reason: m3 master switchover T352149
02:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2109.codfw.wmnet with OS bookworm
02:31 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:28 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2107.codfw.wmnet with OS bookworm
02:27 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2108.codfw.wmnet with OS bookworm
02:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:24 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2106.codfw.wmnet with OS bookworm
02:17 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:16 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2105.codfw.wmnet with OS bookworm
02:16 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2109.codfw.wmnet with reason: host reimage
02:07 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2109.codfw.wmnet with reason: host reimage
02:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2108.codfw.wmnet with reason: host reimage
02:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2107.codfw.wmnet with reason: host reimage
02:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2108.codfw.wmnet with reason: host reimage
01:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2107.codfw.wmnet with reason: host reimage
01:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2106.codfw.wmnet with reason: host reimage
01:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2003']
01:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2001']
01:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2106.codfw.wmnet with reason: host reimage
01:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2105.codfw.wmnet with reason: host reimage
01:51 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2105.codfw.wmnet with reason: host reimage
01:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2109.codfw.wmnet with OS bookworm
01:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2108.codfw.wmnet with OS bookworm
01:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2002']
01:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2104.codfw.wmnet with OS bookworm
01:40 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:40 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2002']
01:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2107.codfw.wmnet with OS bookworm
01:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2002']
01:40 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2003']
01:40 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2002']
01:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2001']
01:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ceph2003']
01:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ceph2002']
01:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ceph2001']
01:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2002']
01:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2001']
01:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2003']
01:38 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
01:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2106.codfw.wmnet with OS bookworm
01:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
01:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2103.codfw.wmnet with OS bookworm
01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:32 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2105.codfw.wmnet with OS bookworm
01:32 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2102.codfw.wmnet with OS bookworm
01:31 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2100.codfw.wmnet with OS bookworm
01:29 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2101.codfw.wmnet with OS bookworm
01:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:28 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
01:22 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2104.codfw.wmnet with reason: host reimage
01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ceph2001-3 to codfw - jhancock@cumin2002"
01:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ceph2001-3 to codfw - jhancock@cumin2002"
01:18 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2104.codfw.wmnet with reason: host reimage
01:17 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:14 foks: removing 120 files for legal compliance
01:11 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2103.codfw.wmnet with reason: host reimage
01:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2100.codfw.wmnet with reason: host reimage
01:07 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2102.codfw.wmnet with reason: host reimage
01:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2100.codfw.wmnet with reason: host reimage
01:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2101.codfw.wmnet with reason: host reimage
01:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2101.codfw.wmnet with reason: host reimage
00:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2104.codfw.wmnet with OS bookworm
00:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2103.codfw.wmnet with OS bookworm
00:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2102.codfw.wmnet with OS bookworm
00:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2101.codfw.wmnet with OS bookworm
00:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2100.codfw.wmnet with OS bookworm
00:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2098.codfw.wmnet with OS bookworm
00:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2099.codfw.wmnet with OS bookworm
00:38 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2097.codfw.wmnet with OS bookworm
00:38 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2094.codfw.wmnet with OS bookworm
00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:35 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:25 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1107.eqiad.wmnet with OS bookworm
00:22 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
00:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2099.codfw.wmnet with reason: host reimage
00:14 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2099.codfw.wmnet with reason: host reimage
00:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1105.eqiad.wmnet with OS bookworm
00:09 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:08 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2098.codfw.wmnet with reason: host reimage
00:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1107.eqiad.wmnet with OS bookworm
00:05 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:03 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2098.codfw.wmnet with reason: host reimage
00:01 krinkle@deploy2002: Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 06m 37s)
00:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2094.codfw.wmnet with reason: host reimage

Other archives

2000s

Archive 1: 2004 Jun - 2004 Sep
Archive 2: 2004 Oct - 2004 Nov
Archive 3: 2004 Dec - 2005 Mar
Archive 4: 2005 Apr - 2005 Jul
Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
Archive 6: 2005 Nov - 2006 Feb
Archive 7: 2006 Mar - 2006 Jun
Archive 8: 2006 Jul - 2006 Sep
Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
Archive 10: 2007 Feb - 2007 Jun
Archive 11: 2007 Jul - 2007 Dec
Archive 12: 2008 Jan - 2008 Jul
Archive 12a: 2008 Aug
Archive 12b: 2008 Sep
Archive 13: 2008 Oct - 2009 Jun
Archive 14: 2009 Jun - 2009 Dec

This article is issued from Wikimedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.