Server Admin Log/Archive 74
2023-12-30
- 16:55 otto@deploy2002: Synchronized wmf-config/ext-EventStreamConfig.php: Config: gerrit:984627 Add eventlogging_MediaWikiPingback stream (T323828) (duration: 15m 10s)
2023-12-29
- 22:59 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 22:59 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 22:57 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 08:01 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 08:00 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 08:00 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 07:58 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 07:58 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 07:58 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 07:58 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 07:57 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 07:57 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 00:12 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 00:11 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 00:10 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 00:10 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 00:09 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 00:08 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 00:08 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 00:08 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 00:07 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 00:07 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 00:06 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 00:06 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 00:03 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 00:03 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 00:03 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 00:02 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 00:01 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 00:01 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
2023-12-28
- 23:59 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 23:59 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 23:58 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 23:57 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 23:57 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 23:52 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 23:51 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 23:50 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 23:48 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 23:47 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 23:47 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 23:46 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 23:46 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 23:46 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 23:45 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 23:35 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 23:35 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 23:20 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 23:20 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
2023-12-27
- 22:53 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 22:53 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 22:46 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 22:46 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 22:41 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 22:40 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
2023-12-23
- 20:22 _joe_: downgraded vopsbot on alert1001, hopefully should not keep panicking in this unexpected situation
- 15:40 taavi: fix date-time on mw2448 (which thought it was the year 2098) by manually setting it once and then restarting systemd-timesyncd.service after bios was reset in T353679
- 01:19 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 01:19 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
2023-12-22
- 17:28 krinkle@deploy2002: Synchronized php-1.42.0-wmf.10/includes/skins/Skin.php: Ice6d6c (duration: 06m 25s)
- 15:16 jgiannelos@deploy2002: Finished deploy [restbase/deploy@5f2756a]: (no justification provided) (duration: 17m 36s)
- 14:58 jgiannelos@deploy2002: Started deploy [restbase/deploy@5f2756a]: (no justification provided)
- 14:57 jgiannelos@deploy2002: Finished deploy [restbase/deploy@f0c9f9f]: (no justification provided) (duration: 09m 32s)
- 14:48 jgiannelos@deploy2002: Started deploy [restbase/deploy@f0c9f9f]: (no justification provided)
- 14:01 jgiannelos@deploy2002: Finished deploy [restbase/deploy@4f56fff]: (no justification provided) (duration: 16m 57s)
- 13:45 reedy@deploy2002: Finished scap: T353920 (duration: 08m 02s)
- 13:44 jgiannelos@deploy2002: Started deploy [restbase/deploy@4f56fff]: (no justification provided)
- 13:37 reedy@deploy2002: Started scap: T353920
- 11:31 vgutierrez: upload golang-github-intel-go-cpuid_0.0~git20210602.5747e5c-2+deb12u1 to apt.wm.o (bookworm)
- 10:42 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 10:42 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 10:39 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 09:57 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
2023-12-21
- 21:42 wfan: payment-wiki revision 1c96980a -> 3b281d10
- 19:31 reedy@deploy2002: Synchronized wmf-config/CommonSettings.php: T346919 (duration: 06m 26s)
- 19:14 dancy@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.10 refs T350086
- 18:39 mutante: releases1003 - sudo chmod -R g+w /srv/org/wikimedia/releases/mediawiki/1.*
- 17:26 mutante: mirror1001 - when syncing tails mirror - @ERROR: max connections (23) reached -- try again later
- 17:23 mutante: [mirror1001:~] $ sudo systemctl start update-tails-mirror
- 17:04 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 17:03 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 17:03 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 17:03 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 17:02 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 17:02 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 16:27 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 16:27 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 16:26 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 16:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 16:18 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 16:17 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 16:10 volans@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wdqs1008.eqiad.wmnet
- 16:10 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:08 volans@cumin1002: START - Cookbook sre.dns.netbox
- 16:03 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1008.eqiad.wmnet
- 15:59 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 15:58 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 15:54 volans@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wdqs1007.eqiad.wmnet
- 15:54 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:53 volans@cumin1002: START - Cookbook sre.dns.netbox
- 15:47 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1007.eqiad.wmnet
- 15:44 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:44 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:38 kharlan@deploy2002: Finished scap: Backport for gerrit:984502 Use username for lookup for non-existing user as the vague target (duration: 10m 37s)
- 15:36 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:35 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:32 kharlan@deploy2002: kharlan and dreamyjazz: Continuing with sync
- 15:30 kharlan@deploy2002: kharlan and dreamyjazz: Backport for gerrit:984502 Use username for lookup for non-existing user as the vague target synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:28 kharlan@deploy2002: Started scap: Backport for gerrit:984502 Use username for lookup for non-existing user as the vague target
- 15:24 kharlan@deploy2002: Finished scap: Backport for gerrit:984503 Use username for lookup for non-existing user as the vague target (duration: 11m 38s)
- 15:20 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:19 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:18 kharlan@deploy2002: kharlan and dreamyjazz: Continuing with sync
- 15:15 kharlan@deploy2002: kharlan and dreamyjazz: Backport for gerrit:984503 Use username for lookup for non-existing user as the vague target synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:13 kharlan@deploy2002: Started scap: Backport for gerrit:984503 Use username for lookup for non-existing user as the vague target
- 15:11 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:10 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 14:52 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:984501 Fix showing units and limits in NewPP limit report (T353793) (duration: 09m 27s)
- 14:46 lucaswerkmeister-wmde@deploy2002: matmarex and lucaswerkmeister-wmde: Continuing with sync
- 14:44 lucaswerkmeister-wmde@deploy2002: matmarex and lucaswerkmeister-wmde: Backport for gerrit:984501 Fix showing units and limits in NewPP limit report (T353793) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:43 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:984501 Fix showing units and limits in NewPP limit report (T353793)
- 14:37 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 14:36 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 14:31 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:29 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 14:27 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:984500 Ignore "exact match" title when the title is not given (T353860) (duration: 08m 33s)
- 14:21 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Continuing with sync
- 14:19 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Backport for gerrit:984500 Ignore "exact match" title when the title is not given (T353860) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:18 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:984500 Ignore "exact match" title when the title is not given (T353860)
- 14:17 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript namespaceDupes bdwikimedia --fix # T351903 – 62 pages to fix, 62 were resolvable. 56 links to fix, 54 were resolvable, 2 were deleted.
- 14:15 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:984498 uzwikipedia: add a temporary logo for the 20th anniversary (T353723) (duration: 09m 28s)
- 14:13 moritzm: re-added Eoghan to pwstore
- 14:09 lucaswerkmeister-wmde@deploy2002: anzx and lucaswerkmeister-wmde: Continuing with sync
- 14:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18 days, 0:00:00 on 10 hosts with reason: T352878
- 14:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 18 days, 0:00:00 on 10 hosts with reason: T352878
- 14:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18 days, 0:00:00 on 13 hosts with reason: T352878
- 14:08 lucaswerkmeister-wmde@deploy2002: anzx and lucaswerkmeister-wmde: Backport for gerrit:984498 uzwikipedia: add a temporary logo for the 20th anniversary (T353723) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:07 bking@cumin2002: START - Cookbook sre.hosts.downtime for 18 days, 0:00:00 on 13 hosts with reason: T352878
- 14:06 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:984498 uzwikipedia: add a temporary logo for the 20th anniversary (T353723)
- 13:50 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 13:23 moritzm: installing libde265 security updates
- 12:29 volans@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wdqs1006.eqiad.wmnet
- 12:29 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:27 volans@cumin1002: START - Cookbook sre.dns.netbox
- 12:20 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1006.eqiad.wmnet
- 12:18 volans@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1002.eqiad.wmnet with reason: Release v0.6.5 - volans@cumin1002
- 12:14 volans@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin1002.eqiad.wmnet with reason: Release v0.6.5 - volans@cumin1002
- 11:37 claime: Manually restarted cassandra-a service on restbase2028 following OOM - T353456
- 11:23 volans@cumin1002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts wdqs1006.eqiad.wmnet
- 11:22 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1006.eqiad.wmnet
- 11:16 volans@cumin1002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts wdqs1006.eqiad.wmnet
- 11:13 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1006.eqiad.wmnet
- 10:42 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 10:42 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 10:29 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
- 09:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1006
- 09:40 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1006
- 08:59 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 08:54 apergos: UTC morning backport and config window done
- 08:50 ariel@deploy2002: Finished scap: Backport for gerrit:984496 CommentFormatter: Do not add wrapper if the heading has attributes (T353489) (duration: 12m 39s)
- 08:44 ariel@deploy2002: ariel and matmarex: Continuing with sync
- 08:39 ariel@deploy2002: ariel and matmarex: Backport for gerrit:984496 CommentFormatter: Do not add wrapper if the heading has attributes (T353489) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:37 ariel@deploy2002: Started scap: Backport for gerrit:984496 CommentFormatter: Do not add wrapper if the heading has attributes (T353489)
- 08:25 ariel@deploy2002: Finished scap: Backport for gerrit:984495 CommentFormatter: Do not add wrapper if the heading has attributes (T353489) (duration: 11m 07s)
- 08:19 ariel@deploy2002: matmarex and ariel: Continuing with sync
- 08:16 ariel@deploy2002: matmarex and ariel: Backport for gerrit:984495 CommentFormatter: Do not add wrapper if the heading has attributes (T353489) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:14 ariel@deploy2002: Started scap: Backport for gerrit:984495 CommentFormatter: Do not add wrapper if the heading has attributes (T353489)
- 05:56 kart_: Updated MinT to 2023-12-20-071058-production
- 05:50 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
- 05:42 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
- 05:40 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
- 05:35 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
- 05:29 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
- 05:26 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
- 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2075.codfw.wmnet with OS bullseye
- 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 01:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 00:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2075.codfw.wmnet with reason: host reimage
- 00:24 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2075.codfw.wmnet with reason: host reimage
2023-12-20
- 23:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
- 23:44 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2075.codfw.wmnet with OS bullseye
- 23:24 ryankemper@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host netbox1002
- 23:24 ryankemper@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host netbox1002
- 23:19 ryankemper@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wdqs1006
- 23:19 ryankemper@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1006
- 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
- 22:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs[1020-1021].eqiad.wmnet
- 22:59 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for wdqs[1020-1021].eqiad.wmnet
- 22:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18 days, 0:00:00 on wdqs[1020-1024].eqiad.wmnet with reason: T352878
- 22:58 bking@cumin2002: START - Cookbook sre.hosts.downtime for 18 days, 0:00:00 on wdqs[1020-1024].eqiad.wmnet with reason: T352878
- 22:25 ryankemper@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wdqs[1006-1008].eqiad.wmnet
- 22:25 ryankemper@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:25 ryankemper@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[1006-1008].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1002"
- 22:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2075.mgmt.codfw.wmnet with reboot policy FORCED
- 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2080.codfw.wmnet with OS bullseye
- 22:24 ryankemper@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[1006-1008].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1002"
- 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2079.codfw.wmnet with OS bullseye
- 22:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2075.mgmt.codfw.wmnet with reboot policy FORCED
- 22:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2075']
- 22:21 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2075']
- 22:20 ryankemper@cumin1002: START - Cookbook sre.dns.netbox
- 22:18 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
- 22:18 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 22:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2077.codfw.wmnet with OS bullseye
- 22:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:17 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
- 22:17 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 22:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2078.codfw.wmnet with OS bullseye
- 22:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2076.codfw.wmnet with OS bullseye
- 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:16 ryankemper@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs[1006-1008].eqiad.wmnet
- 22:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:13 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
- 22:12 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 22:10 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
- 22:09 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 22:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:08 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
- 22:08 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 22:06 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
- 22:05 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 22:03 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2033.codfw.wmnet with OS bullseye
- 22:03 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
- 22:02 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 21:59 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
- 21:59 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 21:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2080.codfw.wmnet with reason: host reimage
- 21:59 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
- 21:59 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2079.codfw.wmnet with reason: host reimage
- 21:56 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
- 21:56 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 21:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2077.codfw.wmnet with reason: host reimage
- 21:54 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
- 21:54 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 21:53 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
- 21:53 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2076.codfw.wmnet with reason: host reimage
- 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2080.codfw.wmnet with reason: host reimage
- 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2079.codfw.wmnet with reason: host reimage
- 21:48 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
- 21:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
- 21:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2077.codfw.wmnet with reason: host reimage
- 21:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
- 21:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2076.codfw.wmnet with reason: host reimage
- 21:48 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
- 21:47 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 21:47 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
- 21:46 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 21:45 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
- 21:45 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lsw1-a8-codfw,lsw1-a8-codfw IPv6 with reason: testing commit confirm check in cookbook
- 21:45 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lsw1-a8-codfw,lsw1-a8-codfw IPv6 with reason: testing commit confirm check in cookbook
- 21:41 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
- 21:40 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 21:39 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
- 21:39 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 21:38 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
- 21:37 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 21:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2080.codfw.wmnet with OS bullseye
- 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2079.codfw.wmnet with OS bullseye
- 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2078.codfw.wmnet with OS bullseye
- 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2077.codfw.wmnet with OS bullseye
- 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2076.codfw.wmnet with OS bullseye
- 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
- 21:30 dancy@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.10 refs T350086 (duration: 05m 57s)
- 21:28 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host restbase2033.codfw.wmnet with OS bullseye
- 21:26 eevans@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2033.codfw.wmnet with OS bullseye
- 21:24 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.10 refs T350086
- 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2074.codfw.wmnet with OS bullseye
- 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:21 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:15 ladsgroup@deploy2002: Finished scap: Backport for gerrit:984493 Protect against ParserOutput re-namespacing (T353835) (duration: 08m 13s)
- 21:09 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 21:08 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:984493 Protect against ParserOutput re-namespacing (T353835) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:08 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
- 21:07 ladsgroup@deploy2002: Started scap: Backport for gerrit:984493 Protect against ParserOutput re-namespacing (T353835)
- 21:04 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
- 21:02 aqu@deploy2002: Finished deploy [airflow-dags/research@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 28s)
- 21:01 aqu@deploy2002: Started deploy [airflow-dags/research@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
- 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2074.codfw.wmnet with reason: host reimage
- 20:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2074.codfw.wmnet with reason: host reimage
- 20:49 ladsgroup@deploy2002: Finished scap: Backport for gerrit:984492 Protect against ParserOutput re-namespacing (T353835) (duration: 08m 19s)
- 20:47 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host restbase2033.codfw.wmnet with OS bullseye
- 20:47 eevans@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2033.codfw.wmnet with OS bullseye
- 20:43 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 20:42 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:984492 Protect against ParserOutput re-namespacing (T353835) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:40 ladsgroup@deploy2002: Started scap: Backport for gerrit:984492 Protect against ParserOutput re-namespacing (T353835)
- 20:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2074.codfw.wmnet with OS bullseye
- 20:31 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host restbase2033.codfw.wmnet with OS bullseye
- 20:30 eevans@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2033.codfw.wmnet with OS bullseye
- 19:51 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
- 19:48 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
- 19:30 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host restbase2033.codfw.wmnet with OS bullseye
- 19:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host wdqs1022.eqiad.wmnet
- 19:27 dancy@deploy2002: Finished php-fpm-restarts
- 19:24 dancy@deploy2002: Starting php-fpm-restarts
- 19:18 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.10 refs T350086
- 18:59 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 18:59 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 18:59 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 18:58 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 18:58 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 18:57 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 18:38 krinkle@deploy2002: Finished deploy [integration/docroot@355ddbb]: (no justification provided) (duration: 00m 07s)
- 18:38 krinkle@deploy2002: Started deploy [integration/docroot@355ddbb]: (no justification provided)
- 18:06 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
- 18:06 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 18:05 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
- 18:05 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 18:05 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
- 18:05 cmooney@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 17:26 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1022.eqiad.wmnet
- 17:25 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1022.eqiad.wmnet
- 17:25 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1022.eqiad.wmnet
- 17:05 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 16:03 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 16:03 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 16:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2080.codfw.wmnet with OS bullseye
- 15:22 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2074.codfw.wmnet with OS bullseye
- 15:18 Lucas_WMDE: UTC afternoon backport+config window done
- 15:17 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:984601 Replace $wgCommandLineMode checks with MW_ENTRY_POINT (T353751) (duration: 08m 22s)
- 15:15 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts wdqs1022.eqiad.wmnet
- 15:11 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Continuing with sync
- 15:10 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Backport for gerrit:984601 Replace $wgCommandLineMode checks with MW_ENTRY_POINT (T353751) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:09 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts wdqs1024.eqiad.wmnet
- 15:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1024.eqiad.wmnet
- 15:08 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:984601 Replace $wgCommandLineMode checks with MW_ENTRY_POINT (T353751)
- 15:06 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1022.eqiad.wmnet
- 15:05 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts wdqs1022.eqiad.wmnet
- 15:05 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1022.eqiad.wmnet
- 15:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1023.eqiad.wmnet
- 15:05 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1023.eqiad.wmnet
- 15:05 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 15:04 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 15:02 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1024.eqiad.wmnet
- 15:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1024.eqiad.wmnet
- 15:01 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1024.eqiad.wmnet
- 15:01 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1024.eqiad.wmnet
- 14:58 inflatador: bking@cumin2002 disable/mask wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-categories on wdqs102[24] T352878
- 14:57 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:982416 RunSingleJob.php: Fix use of MWExceptionHandler before it's defined (T352265) (duration: 10m 30s)
- 14:51 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Continuing with sync
- 14:48 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Backport for gerrit:982416 RunSingleJob.php: Fix use of MWExceptionHandler before it's defined (T352265) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:46 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:982416 RunSingleJob.php: Fix use of MWExceptionHandler before it's defined (T352265)
- 14:43 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:976650 Remove BetaFeature code related to ReferencePreviews (T351708), gerrit:978035 Remove wgPopupsReferencePreviews now that it defaults to true (T351708) (duration: 10m 16s)
- 14:37 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and awight and wmde-fisch: Continuing with sync
- 14:35 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and awight and wmde-fisch: Backport for gerrit:976650 Remove BetaFeature code related to ReferencePreviews (T351708), gerrit:978035 Remove wgPopupsReferencePreviews now that it defaults to true (T351708) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:33 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:976650 Remove BetaFeature code related to ReferencePreviews (T351708), gerrit:978035 Remove wgPopupsReferencePreviews now that it defaults to true (T351708)
- 14:30 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:984324 Check for false from ThumbnailImage::getStoragePath (T353758) (duration: 09m 38s)
- 14:26 lucaswerkmeister-wmde@deploy2002: kharlan and lucaswerkmeister-wmde: Continuing with sync
- 14:22 lucaswerkmeister-wmde@deploy2002: kharlan and lucaswerkmeister-wmde: Backport for gerrit:984324 Check for false from ThumbnailImage::getStoragePath (T353758) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:21 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:984324 Check for false from ThumbnailImage::getStoragePath (T353758)
- 14:19 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:981636 Make wiktionary and mw.org provide og:site_name (T348203) (duration: 15m 54s)
- 14:16 moritzm: installing distro-info-data updates from Bookworm point release
- 14:14 lucaswerkmeister-wmde@deploy2002: pols12 and lucaswerkmeister-wmde: Continuing with sync
- 14:12 moritzm: installing debootstrap bugfix updates from Bookworm point release
- 14:06 lucaswerkmeister-wmde@deploy2002: pols12 and lucaswerkmeister-wmde: Backport for gerrit:981636 Make wiktionary and mw.org provide og:site_name (T348203) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:04 moritzm: installing cups updates from bookworm point release
- 14:04 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:981636 Make wiktionary and mw.org provide og:site_name (T348203)
- 13:38 aqu@deploy2002: Finished deploy [airflow-dags/wmde@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac513] (duration: 00m 05s)
- 13:38 aqu@deploy2002: Started deploy [airflow-dags/wmde@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac513]
- 13:38 aqu@deploy2002: Finished deploy [airflow-dags/search@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 30s)
- 13:37 aqu@deploy2002: Started deploy [airflow-dags/search@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
- 13:37 aqu@deploy2002: Finished deploy [airflow-dags/research@90f280e]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@e2ed6162] (duration: 00m 06s)
- 13:37 aqu@deploy2002: Started deploy [airflow-dags/research@90f280e]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@e2ed6162]
- 13:36 aqu@deploy2002: Finished deploy [airflow-dags/platform_eng@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 25s)
- 13:36 aqu@deploy2002: Started deploy [airflow-dags/platform_eng@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
- 13:35 aqu@deploy2002: Finished deploy [airflow-dags/analytics_product@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 09s)
- 13:35 aqu@deploy2002: Started deploy [airflow-dags/analytics_product@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
- 13:34 aqu@deploy2002: Finished deploy [airflow-dags/analytics@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 05s)
- 13:34 aqu@deploy2002: Started deploy [airflow-dags/analytics@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
- 13:34 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 11s)
- 13:34 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
- 13:32 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 01s)
- 13:32 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
- 13:31 aqu@deploy2002: Finished deploy [airflow-dags/analytics@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 01s)
- 13:31 aqu@deploy2002: Started deploy [airflow-dags/analytics@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
- 12:12 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 11:30 kostajh: T353703 Manual run: /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/mediamoderation.dblist extensions/MediaModeration/maintenance/updateMetrics.php --verbose
- 10:22 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on mw2448.codfw.wmnet with reason: hw failure
- 10:22 cgoubert@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on mw2448.codfw.wmnet with reason: hw failure
- 09:43 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough and A:wikidough
- 09:39 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh5002.wikimedia.org
- 09:39 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for doh5002.wikimedia.org
- 09:10 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh2001.wikimedia.org
- 09:10 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for doh2001.wikimedia.org
- 08:47 fabfur@cumin1001: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough
- 06:31 ryankemper: T351671 Pooled `wdqs10[17-21]*`; data xfers completed and test queries are passing on `wdqs1018`. Will decom related hosts tomorrow (2023-12-20)
- 02:47 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 02:45 rzl@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 02:44 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 02:43 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 02:43 rzl@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 02:41 rzl@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 02:39 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 02:37 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 02:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
- 02:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
- 00:34 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
- 00:34 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
- 00:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
- 00:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
- 00:25 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
- 00:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 22:00:00 on wdqs[1017-1021].eqiad.wmnet with reason: bringing new wdqs hosts online T351671
- 00:02 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 22:00:00 on wdqs[1017-1021].eqiad.wmnet with reason: bringing new wdqs hosts online T351671
2023-12-19
- 22:55 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
- 22:54 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
- 22:53 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
- 22:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on wdqs[1017-1021].eqiad.wmnet with reason: bringing new wdqs hosts online T351671
- 22:26 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on wdqs[1017-1021].eqiad.wmnet with reason: bringing new wdqs hosts online T351671
- 21:43 mforns@deploy2002: Finished deploy [airflow-dags/wmde@d5ac513]: (no justification provided) (duration: 00m 11s)
- 21:43 mforns@deploy2002: Started deploy [airflow-dags/wmde@d5ac513]: (no justification provided)
- 21:43 mforns@deploy2002: Finished deploy [airflow-dags/analytics@d5ac513]: (no justification provided) (duration: 00m 27s)
- 21:43 mforns@deploy2002: Started deploy [airflow-dags/analytics@d5ac513]: (no justification provided)
- 21:39 ladsgroup@deploy2002: Finished scap: Backport for gerrit:984277 Disable listings extension in more wikis (T253216) (duration: 07m 42s)
- 21:33 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 21:32 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:984277 Disable listings extension in more wikis (T253216) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:31 ladsgroup@deploy2002: Started scap: Backport for gerrit:984277 Disable listings extension in more wikis (T253216)
- 21:26 kostajh: UTC late deploys done
- 21:26 kharlan@deploy2002: Finished scap: Backport for gerrit:983962 Undeploy Annual Plan Core Metrics survey (T351353) (duration: 10m 00s)
- 21:20 kharlan@deploy2002: kharlan and dani: Continuing with sync
- 21:17 kharlan@deploy2002: kharlan and dani: Backport for gerrit:983962 Undeploy Annual Plan Core Metrics survey (T351353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:16 kharlan@deploy2002: Started scap: Backport for gerrit:983962 Undeploy Annual Plan Core Metrics survey (T351353)
- 21:14 kharlan@deploy2002: Finished scap: Backport for gerrit:984269 MediaModeration: Add dblist (T353703) (duration: 07m 44s)
- 21:08 kharlan@deploy2002: kharlan: Continuing with sync
- 21:08 kharlan@deploy2002: kharlan: Backport for gerrit:984269 MediaModeration: Add dblist (T353703) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:06 kharlan@deploy2002: Started scap: Backport for gerrit:984269 MediaModeration: Add dblist (T353703)
- 19:10 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.10 refs T350086
- 18:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testhost2001.codfw.wmnet with OS bullseye
- 18:56 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 18:49 mforns@deploy2002: Finished deploy [analytics/refinery@28dccef] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28dccefe] (duration: 00m 05s)
- 18:48 mforns@deploy2002: Started deploy [analytics/refinery@28dccef] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28dccefe]
- 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 18:43 mforns@deploy2002: Finished deploy [analytics/refinery@28dccef] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28dccefe] (duration: 03m 16s)
- 18:39 mforns@deploy2002: Started deploy [analytics/refinery@28dccef] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28dccefe]
- 18:39 mforns@deploy2002: Finished deploy [analytics/refinery@28dccef] (thin): Regular analytics weekly train THIN [analytics/refinery@28dccefe] (duration: 00m 06s)
- 18:39 mforns@deploy2002: Started deploy [analytics/refinery@28dccef] (thin): Regular analytics weekly train THIN [analytics/refinery@28dccefe]
- 18:39 mforns@deploy2002: Finished deploy [analytics/refinery@28dccef]: Regular analytics weekly train [analytics/refinery@28dccefe] (duration: 09m 18s)
- 18:29 mforns@deploy2002: Started deploy [analytics/refinery@28dccef]: Regular analytics weekly train [analytics/refinery@28dccefe]
- 18:29 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@d275e4f]: Deploy latest DAG changes to Analytics Airflow instance (duration: 00m 31s)
- 18:28 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@d275e4f]: Deploy latest DAG changes to Analytics Airflow instance
- 18:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testhost2001.codfw.wmnet with reason: host reimage
- 18:22 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testhost2001.codfw.wmnet with reason: host reimage
- 18:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
- 18:06 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host testhost2001.codfw.wmnet with OS bullseye
- 17:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
- 16:23 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 16:15 aikochou@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 16:12 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on moss-be[2001-2003].codfw.wmnet with reason: not in service, being used to test a destructive cookbook
- 16:12 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on moss-be[2001-2003].codfw.wmnet with reason: not in service, being used to test a destructive cookbook
- 16:04 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 327700
- 16:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 327700
- 16:02 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 139901
- 16:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 139901
- 16:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15133
- 15:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15133
- 15:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5398
- 15:55 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw2448.codfw.wmnet with reason: hw failure
- 15:55 cgoubert@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2448.codfw.wmnet with reason: hw failure
- 15:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5398
- 15:42 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:983758 Change virtual domain of botpassword to plural (T351559) (duration: 07m 01s)
- 15:38 moritzm: installing gnutls28 security updates on bookworm
- 15:37 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and ladsgroup: Continuing with sync
- 15:37 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and ladsgroup: Backport for gerrit:983758 Change virtual domain of botpassword to plural (T351559) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:35 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:983758 Change virtual domain of botpassword to plural (T351559)
- 15:33 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:984174 Use main replica DB in importExistingFilesToScanTable.php (duration: 07m 47s)
- 15:27 lucaswerkmeister-wmde@deploy2002: kharlan and lucaswerkmeister-wmde: Continuing with sync
- 15:27 lucaswerkmeister-wmde@deploy2002: kharlan and lucaswerkmeister-wmde: Backport for gerrit:984174 Use main replica DB in importExistingFilesToScanTable.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:25 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:984174 Use main replica DB in importExistingFilesToScanTable.php
- 15:23 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: host is down, downtiming in icinga too
- 15:23 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: host is down, downtiming in icinga too
- 15:22 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:984172 Make SearchEntitiesIntegrationTest an ApiTestCase (T353334), gerrit:984173 Use link batch in search APIs (T353334) (duration: 08m 49s)
- 15:16 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
- 15:15 moritzm: installing exim4 bugfix updates from Bookworm point release
- 15:15 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for gerrit:984172 Make SearchEntitiesIntegrationTest an ApiTestCase (T353334), gerrit:984173 Use link batch in search APIs (T353334) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:13 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:984172 Make SearchEntitiesIntegrationTest an ApiTestCase (T353334), gerrit:984173 Use link batch in search APIs (T353334)
- 15:10 moritzm: installing nagios-plugins-contrib bugfix updates from Bookworm point release
- 14:44 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 14:43 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 14:43 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 14:42 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 14:33 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
- 14:32 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
- 14:31 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
- 14:30 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
- 14:29 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
- 14:29 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
- 14:25 Lucas_WMDE: UTC afternoon backport+config window done
- 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:984166 Send PhotoDNA the mime type of the thumbnail and not original file (T351401), gerrit:984169 Add maintenance script to scan files in the mediamoderation_scan table (T351399) (duration: 07m 53s)
- 14:24 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 14:24 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 14:24 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 14:24 kamila@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 14:22 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
- 14:21 kamila@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
- 14:21 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 14:19 kamila@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 14:19 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kharlan: Continuing with sync
- 14:18 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kharlan: Backport for gerrit:984166 Send PhotoDNA the mime type of the thumbnail and not original file (T351401), gerrit:984169 Add maintenance script to scan files in the mediamoderation_scan table (T351399) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:17 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:984166 Send PhotoDNA the mime type of the thumbnail and not original file (T351401), gerrit:984169 Add maintenance script to scan files in the mediamoderation_scan table (T351399)
- 14:15 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:983747 testwiki: enable revertrisk model in ores extension (T348298) (duration: 10m 22s)
- 14:10 lucaswerkmeister-wmde@deploy2002: isaranto and lucaswerkmeister-wmde: Continuing with sync
- 14:08 lucaswerkmeister-wmde@deploy2002: isaranto and lucaswerkmeister-wmde: Backport for gerrit:983747 testwiki: enable revertrisk model in ores extension (T348298) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:05 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:983747 testwiki: enable revertrisk model in ores extension (T348298)
- 13:45 jgiannelos@deploy2002: Finished deploy [restbase/deploy@40c15b1]: (no justification provided) (duration: 27m 26s)
- 13:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Release v0.6.5 - ayounsi@cumin1001
- 13:35 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Release v0.6.5 - ayounsi@cumin1001
- 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1001.eqiad.wmnet with reason: Release v0.6.5 - ayounsi@cumin1001
- 13:32 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin1001.eqiad.wmnet with reason: Release v0.6.5 - ayounsi@cumin1001
- 13:17 jgiannelos@deploy2002: Started deploy [restbase/deploy@40c15b1]: (no justification provided)
- 13:12 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
- 13:08 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 13:08 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 13:05 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 13:05 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 13:02 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
- 12:24 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 12:24 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 12:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: WIP
- 12:21 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: WIP
- 11:31 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 10:46 moritzm: installing perl security updates on bookworm
- 10:19 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 10:14 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
- 10:14 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
- 09:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
- 09:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
- 09:23 elukey: reload thanos-rule on titan2001
- 08:27 jmm@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lists1003.wikimedia.org
- 08:27 jmm@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:27 jmm@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin1002"
- 08:26 jmm@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin1002"
- 08:22 jmm@cumin1002: START - Cookbook sre.dns.netbox
- 08:17 jmm@cumin1002: START - Cookbook sre.hosts.decommission for hosts lists1003.wikimedia.org
- 06:13 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 06:07 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 05:10 kart_: Updated MinT to 2023-12-12-065316-production
- 04:56 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
- 04:54 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.10 refs T350086 (duration: 51m 03s)
- 04:49 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
- 04:49 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
- 04:43 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
- 04:40 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
- 04:36 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
- 04:09 cstone: civicrm upgraded from e2d49d10 to c3cc80c7
- 04:03 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.10 refs T350086
2023-12-18
- 23:40 taavi: conftool codfw/appserver/nginx/mw2448.codfw.wmnet: pooled changed yes => inactive # T353679, not sure why it was not logged automatically
- 22:35 maryum: Deployed patch for T347704
- 22:08 dancy: UTC late backport window completed.
- 22:07 dancy@deploy2002: Finished scap: Backport for gerrit:983745 Revert "Fix English Gboard backspace over aliens" (T353578 T325129), gerrit:983906 Revert "Put zero-width space after inline focusable nodes" (T353578 T330284), gerrit:983911 Update VE core submodule to wmf.9 (6bada65) (T353578 T330284 T325129) (duration: 13m 34s)
- 21:57 dancy@deploy2002: dancy and kemayo: Continuing with sync
- 21:56 dancy@deploy2002: dancy and kemayo: Backport for gerrit:983745 Revert "Fix English Gboard backspace over aliens" (T353578 T325129), gerrit:983906 Revert "Put zero-width space after inline focusable nodes" (T353578 T330284), gerrit:983911 Update VE core submodule to wmf.9 (6bada65) (T353578 T330284 T325129) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:54 dancy@deploy2002: Started scap: Backport for gerrit:983745 Revert "Fix English Gboard backspace over aliens" (T353578 T325129), gerrit:983906 Revert "Put zero-width space after inline focusable nodes" (T353578 T330284), gerrit:983911 Update VE core submodule to wmf.9 (6bada65) (T353578 T330284 T325129)
- 21:17 dancy@deploy2002: Finished scap: Backport for gerrit:983928 Undeploy Reader Demographics 2 survey (T344393) (duration: 08m 30s)
- 21:11 dancy@deploy2002: dani and dancy: Continuing with sync
- 21:10 dancy@deploy2002: dani and dancy: Backport for gerrit:983928 Undeploy Reader Demographics 2 survey (T344393) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:09 dancy@deploy2002: Started scap: Backport for gerrit:983928 Undeploy Reader Demographics 2 survey (T344393)
- 21:05 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 21:05 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
- 21:04 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 21:04 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
- 21:03 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 21:03 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 21:01 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 21:01 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 20:53 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 20:53 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 20:52 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 20:52 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 20:48 otto@deploy2002: Synchronized wmf-config/ext-EventStreamConfig.php: Config: gerrit:983939 Add message_key_fields to page_content_change stream (T338231) (duration: 06m 32s)
- 20:31 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 20:31 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 20:19 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 20:19 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 17:14 inflatador: bking@kafka-jumbo1007 kafka topics --alter --topic codfw.mediawiki.cirrussearch.page_rerender.v1 --partitions 5 T351503
- 17:12 inflatador: bking@kafka-jumbo1007 kafka topics --alter --topic eqiad.mediawiki.cirrussearch.page_rerender.v1 --partitions 5 T351503
- 17:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2074.codfw.wmnet with OS bullseye
- 16:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to mc-gp[12]00[123] - akosiaris@cumin1001"
- 16:55 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to mc-gp[12]00[123] - akosiaris@cumin1001"
- 16:54 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: gerrit:983756 Bumping portals to master (T128546) (duration: 06m 28s)
- 16:52 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
- 16:52 akosiaris@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
- 16:52 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
- 16:48 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: gerrit:983756 Bumping portals to master (T128546) (duration: 06m 08s)
- 16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2076']
- 16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2075']
- 16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2074']
- 16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2079']
- 16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2077']
- 16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2080']
- 16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2078']
- 16:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2080']
- 16:35 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2080
- 16:34 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:34 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to mc2042-mc2055 - akosiaris@cumin1001"
- 16:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2080
- 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be1077
- 16:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be1077
- 16:33 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to mc2042-mc2055 - akosiaris@cumin1001"
- 16:31 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
- 16:28 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
- 16:28 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
- 16:25 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
- 16:25 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
- 16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2080']
- 16:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2080']
- 16:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2079']
- 16:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2078']
- 16:22 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2077']
- 16:22 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2076']
- 16:21 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2075']
- 16:21 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2074']
- 16:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2079']
- 16:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2080']
- 16:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2079']
- 16:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be2079']
- 16:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2080']
- 16:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2078']
- 16:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2077']
- 16:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2076']
- 16:18 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2076']
- 16:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2075']
- 16:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2074']
- 16:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2079']
- 16:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2079']
- 16:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2078']
- 16:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2077']
- 16:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2075']
- 16:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2074']
- 16:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2080.mgmt.codfw.wmnet with reboot policy FORCED
- 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2079.mgmt.codfw.wmnet with reboot policy FORCED
- 16:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2076.mgmt.codfw.wmnet with reboot policy FORCED
- 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2078.mgmt.codfw.wmnet with reboot policy FORCED
- 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2077.mgmt.codfw.wmnet with reboot policy FORCED
- 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2075.mgmt.codfw.wmnet with reboot policy FORCED
- 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2074.mgmt.codfw.wmnet with reboot policy FORCED
- 15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2080.mgmt.codfw.wmnet with reboot policy FORCED
- 15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2079.mgmt.codfw.wmnet with reboot policy FORCED
- 15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with reboot policy FORCED
- 15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2077.mgmt.codfw.wmnet with reboot policy FORCED
- 15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2076.mgmt.codfw.wmnet with reboot policy FORCED
- 15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2075.mgmt.codfw.wmnet with reboot policy FORCED
- 15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2074.mgmt.codfw.wmnet with reboot policy FORCED
- 15:42 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2074-2080 to codfw - jhancock@cumin2002"
- 15:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2074-2080 to codfw - jhancock@cumin2002"
- 15:37 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
- 15:36 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
- 15:16 fabfur: repooling cp4037 (T352876)
- 15:16 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp4037.ulsfo.wmnet
- 15:16 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp4037.ulsfo.wmnet
- 15:04 urbanecm@deploy2002: Finished scap: Backport for gerrit:983229 Configure and enable StatsLib for production (T343024), gerrit:983529 Revert "util.main: Don't use mw.Map(), use a native Map() instead" (T353571 T353076) (duration: 10m 20s)
- 14:58 urbanecm@deploy2002: cwhite and urbanecm and chlod: Continuing with sync
- 14:55 urbanecm@deploy2002: cwhite and urbanecm and chlod: Backport for gerrit:983229 Configure and enable StatsLib for production (T343024), gerrit:983529 Revert "util.main: Don't use mw.Map(), use a native Map() instead" (T353571 T353076) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:53 urbanecm@deploy2002: Started scap: Backport for gerrit:983229 Configure and enable StatsLib for production (T343024), gerrit:983529 Revert "util.main: Don't use mw.Map(), use a native Map() instead" (T353571 T353076)
- 14:52 urbanecm@deploy2002: Finished scap: Backport for gerrit:981714 Enable action blocks for zhwiki (T353120) (duration: 08m 58s)
- 14:47 urbanecm@deploy2002: milkydefer and urbanecm: Continuing with sync
- 14:45 moritzm: installing nagios-plugins-contrib bugfix updates from Bookworm point release
- 14:45 moritzm: installing nagios-plugins-contrib bugfix updates
- 14:44 urbanecm@deploy2002: milkydefer and urbanecm: Backport for gerrit:981714 Enable action blocks for zhwiki (T353120) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:44 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@d275e4f]: (no justification provided) (duration: 00m 32s)
- 14:44 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@d275e4f]: (no justification provided)
- 14:43 urbanecm@deploy2002: Started scap: Backport for gerrit:981714 Enable action blocks for zhwiki (T353120)
- 14:43 urbanecm@deploy2002: Finished scap: Backport for gerrit:982873 Add a testing stream for page-prediction-change events (T349919), gerrit:983178 CheckUser: Enable read new for event tables migration everywhere (T341829) (duration: 19m 00s)
- 14:37 urbanecm@deploy2002: dreamyjazz and aikochou and urbanecm: Continuing with sync
- 14:36 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
- 14:35 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
- 14:34 urbanecm@deploy2002: dreamyjazz and aikochou and urbanecm: Backport for gerrit:982873 Add a testing stream for page-prediction-change events (T349919), gerrit:983178 CheckUser: Enable read new for event tables migration everywhere (T341829) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:24 urbanecm@deploy2002: Started scap: Backport for gerrit:982873 Add a testing stream for page-prediction-change events (T349919), gerrit:983178 CheckUser: Enable read new for event tables migration everywhere (T341829)
- 14:13 moritzm: installing node-undici security updates
- 13:15 moritzm: installing intel-microcode security updates on buster hosts
- 13:08 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye
- 12:56 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 12:55 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 12:52 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 12:51 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 12:50 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 12:50 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 12:45 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
- 12:41 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
- 12:27 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-canary
- 12:26 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-canary
- 12:26 kamila@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 12:25 kamila@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 12:24 kamila@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 12:23 kamila@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 12:20 kamila@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 12:20 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
- 12:20 kamila@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 12:19 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
- 12:19 kamila@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:18 kamila@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:14 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 12:13 kamila@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 12:12 Emperor: restart swift-proxy and envoyproxy on ms-fe1012
- 12:10 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 12:09 kamila@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 12:04 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 12:03 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 12:01 moritzm: installing ncurses security updates
- 11:59 kamila@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:58 kamila@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 11:51 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
- 11:51 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
- 11:41 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
- 11:41 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
- 11:39 moritzm: installing qemu security updates on bookworm
- 11:38 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
- 11:37 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
- 11:36 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
- 11:36 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
- 10:56 moritzm: restarting apache/FPM on mw canaries to pick up gnutls update
- 10:52 moritzm: installing gnutls28 security updates
- 10:47 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
- 10:44 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
- 10:39 moritzm: installing jetty9 security updates
- 10:29 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 10:29 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 10:17 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
- 10:13 XioNoX: remove VRRP pinning on cr1-eqiad/cr2-eqiad/cr2-codfw
- 10:09 moritzm: installing Linux 6.1.67 updates on Bookworm hosts
- 09:45 XioNoX: make eqiad-codfw 100G link primary
- 09:10 vgutierrez: vgutierrez@acmechief1002:~$ sudo -i keyholder arm - T352242
2023-12-17
- 12:59 elukey: restart kubelet on ml-serve1001 (errors while syncing old containers)
2023-12-16
- 01:21 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@fb10de1]: (no justification provided) (duration: 00m 10s)
- 01:21 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@fb10de1]: (no justification provided)
- 00:44 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@63804c4]: (no justification provided) (duration: 00m 25s)
- 00:44 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@63804c4]: (no justification provided)
- 00:05 jhathaway: unbreaking my puppet change with, https://gerrit.wikimedia.org/r/c/operations/puppet/+/983504
2023-12-15
- 23:46 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@9600237]: (no justification provided) (duration: 00m 27s)
- 23:46 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@9600237]: (no justification provided)
- 23:06 milimetric@deploy2002: Finished deploy [airflow-dags/platform_eng@160d0f0]: (no justification provided) (duration: 00m 25s)
- 23:06 milimetric@deploy2002: Started deploy [airflow-dags/platform_eng@160d0f0]: (no justification provided)
- 22:42 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 22:42 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 22:03 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@5090fdc]: (no justification provided) (duration: 00m 25s)
- 22:03 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@5090fdc]: (no justification provided)
- 21:48 milimetric@deploy2002: Finished deploy [analytics/refinery@eeb98ac] (thin): Syncing changes to HDFS (duration: 00m 06s)
- 21:48 milimetric@deploy2002: Started deploy [analytics/refinery@eeb98ac] (thin): Syncing changes to HDFS
- 21:48 milimetric@deploy2002: Finished deploy [analytics/refinery@eeb98ac]: Syncing changes to HDFS (duration: 81m 46s)
- 21:26 mutante: running puppet on all prometheus*
- 20:26 milimetric@deploy2002: Started deploy [analytics/refinery@eeb98ac]: Syncing changes to HDFS
- 15:44 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 15:25 klausman@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 15:01 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 15:00 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 14:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
- 14:46 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 100%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54482 and previous config saved to /var/cache/conftool/dbconfig/20231215-144624-arnaudb.json
- 14:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
- 14:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
- 14:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
- 14:40 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 14:39 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 14:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: candidate master proper repooling', diff saved to https://phabricator.wikimedia.org/P54481 and previous config saved to /var/cache/conftool/dbconfig/20231215-143812-arnaudb.json
- 14:31 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 80%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54480 and previous config saved to /var/cache/conftool/dbconfig/20231215-143118-arnaudb.json
- 14:27 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 14:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on db2194.codfw.wmnet with reason: production freeze will occur before cookbook is finished
- 14:27 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 14:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on db2194.codfw.wmnet with reason: production freeze will occur before cookbook is finished
- 14:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: candidate master proper repooling', diff saved to https://phabricator.wikimedia.org/P54479 and previous config saved to /var/cache/conftool/dbconfig/20231215-142307-arnaudb.json
- 14:16 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 40%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54478 and previous config saved to /var/cache/conftool/dbconfig/20231215-141613-arnaudb.json
- 14:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: candidate master proper repooling', diff saved to https://phabricator.wikimedia.org/P54477 and previous config saved to /var/cache/conftool/dbconfig/20231215-140802-arnaudb.json
- 14:07 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 14:07 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 14:01 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 20%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54476 and previous config saved to /var/cache/conftool/dbconfig/20231215-140108-arnaudb.json
- 13:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
- 13:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
- 13:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: candidate master proper repooling', diff saved to https://phabricator.wikimedia.org/P54475 and previous config saved to /var/cache/conftool/dbconfig/20231215-135257-arnaudb.json
- 13:52 arnaudb@cumin1001: dbctl commit (dc=all): 'depool db2179 to repool w/ api', diff saved to https://phabricator.wikimedia.org/P54474 and previous config saved to /var/cache/conftool/dbconfig/20231215-135228-arnaudb.json
- 13:46 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 10%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54473 and previous config saved to /var/cache/conftool/dbconfig/20231215-134603-arnaudb.json
- 13:39 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: Test upgrade GitLab Replica with insufficient API key
- 13:39 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Test upgrade GitLab Replica with insufficient API key
- 12:55 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
- 12:55 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
- 12:25 hashar@deploy2002: Finished deploy [integration/docroot@7f6c112]: doc: add integration/tox-jenkins-override - T353515 (duration: 00m 06s)
- 12:25 hashar@deploy2002: Started deploy [integration/docroot@7f6c112]: doc: add integration/tox-jenkins-override - T353515
- 11:28 hashar@deploy2002: Finished deploy [gerrit/gerrit@304c63a]: wm-pcc: only act on Puppet repositories - T353181 (duration: 00m 08s)
- 11:28 hashar@deploy2002: Started deploy [gerrit/gerrit@304c63a]: wm-pcc: only act on Puppet repositories - T353181
- 10:56 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 10:54 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 10:52 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 09:05 moritzm: installing Linux 6.1.67 packages on Bookworm hosts
- 08:56 XioNoX: shutdown already down IPv6 BGP session from ulsfo to the office
2023-12-14
- 23:17 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host acmechief1002.eqiad.wmnet with OS bookworm
- 23:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief1002.eqiad.wmnet with reason: host reimage
- 22:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief1002.eqiad.wmnet with reason: host reimage
- 22:48 brett@cumin2002: START - Cookbook sre.hosts.reimage for host acmechief1002.eqiad.wmnet with OS bookworm
- 21:24 ssastry@deploy2002: Finished scap: Backport for gerrit:982845 Revert "Temporarily disable isPreview in Parsoid's rendering" (duration: 10m 38s)
- 21:18 ssastry@deploy2002: ssastry: Continuing with sync
- 21:14 ssastry@deploy2002: ssastry: Backport for gerrit:982845 Revert "Temporarily disable isPreview in Parsoid's rendering" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:13 ssastry@deploy2002: Started scap: Backport for gerrit:982845 Revert "Temporarily disable isPreview in Parsoid's rendering"
- 20:52 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
- 20:51 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
- 20:51 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 20:51 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 20:51 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
- 20:50 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
- 20:50 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
- 20:50 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
- 20:50 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
- 20:49 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
- 20:48 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
- 20:48 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
- 20:48 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 20:47 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 20:47 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
- 20:46 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
- 20:46 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
- 20:46 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
- 20:45 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
- 20:45 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
- 20:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wdqs[1009-1010].eqiad.wmnet
- 20:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[1009-1010].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
- 20:40 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
- 20:39 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
- 20:39 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 20:39 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 20:39 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
- 20:38 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
- 20:38 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
- 20:38 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
- 20:38 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
- 20:37 ryankemper@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[1009-1010].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
- 20:37 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
- 20:31 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
- 20:23 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs[1009-1010].eqiad.wmnet
- 20:06 jmm@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM moscovium.eqiad.wmnet
- 20:02 jmm@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM moscovium.eqiad.wmnet
- 19:12 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.9 refs T350085
- 19:03 brennen: 1.42.0-wmf.9 (T350085) status: no current blockers, although we should keep an eye on T353400. rolling to all wikis.
- 18:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54462 and previous config saved to /var/cache/conftool/dbconfig/20231214-183508-arnaudb.json
- 18:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54461 and previous config saved to /var/cache/conftool/dbconfig/20231214-183459-arnaudb.json
- 18:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54460 and previous config saved to /var/cache/conftool/dbconfig/20231214-182003-arnaudb.json
- 18:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54459 and previous config saved to /var/cache/conftool/dbconfig/20231214-181954-arnaudb.json
- 18:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54458 and previous config saved to /var/cache/conftool/dbconfig/20231214-180458-arnaudb.json
- 18:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54457 and previous config saved to /var/cache/conftool/dbconfig/20231214-180449-arnaudb.json
- 17:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54456 and previous config saved to /var/cache/conftool/dbconfig/20231214-174953-arnaudb.json
- 17:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54455 and previous config saved to /var/cache/conftool/dbconfig/20231214-174944-arnaudb.json
- 17:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54453 and previous config saved to /var/cache/conftool/dbconfig/20231214-173448-arnaudb.json
- 17:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54452 and previous config saved to /var/cache/conftool/dbconfig/20231214-173439-arnaudb.json
- 17:24 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 17:23 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 17:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54451 and previous config saved to /var/cache/conftool/dbconfig/20231214-171943-arnaudb.json
- 17:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54450 and previous config saved to /var/cache/conftool/dbconfig/20231214-171934-arnaudb.json
- 17:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 8%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54449 and previous config saved to /var/cache/conftool/dbconfig/20231214-170438-arnaudb.json
- 17:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 8%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54448 and previous config saved to /var/cache/conftool/dbconfig/20231214-170428-arnaudb.json
- 16:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 4%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54446 and previous config saved to /var/cache/conftool/dbconfig/20231214-164925-arnaudb.json
- 16:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 4%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54445 and previous config saved to /var/cache/conftool/dbconfig/20231214-164921-arnaudb.json
- 16:43 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 16:43 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 16:43 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 16:42 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 16:42 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 16:42 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 16:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 2%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54444 and previous config saved to /var/cache/conftool/dbconfig/20231214-163420-arnaudb.json
- 16:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 2%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54443 and previous config saved to /var/cache/conftool/dbconfig/20231214-163416-arnaudb.json
- 16:24 akosiaris: updates of all wikikube services done T352906
- 16:20 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 16:18 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 16:18 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 16:18 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
- 16:17 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 16:17 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 16:17 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/similar-users: apply
- 16:17 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/similar-users: apply
- 16:17 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/similar-users: apply
- 16:16 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
- 16:16 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/similar-users: apply
- 16:16 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/similar-users: apply
- 16:16 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/similar-users: apply
- 16:15 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
- 16:15 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
- 16:14 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:14 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
- 16:14 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
- 16:14 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:14 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:13 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
- 16:13 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
- 16:13 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:12 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:12 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
- 16:12 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
- 16:11 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 16:11 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:11 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
- 16:10 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
- 16:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
- 16:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
- 16:09 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 16:09 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
- 16:09 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
- 16:09 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
- 16:08 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
- 16:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
- 16:08 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
- 16:08 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
- 16:07 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
- 16:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
- 16:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
- 16:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
- 16:06 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 16:06 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
- 16:06 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 16:06 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 16:05 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 16:05 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 16:05 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 16:05 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
- 16:04 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
- 16:04 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
- 16:04 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: apply
- 16:04 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: apply
- 16:03 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
- 16:03 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
- 16:03 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: apply
- 16:03 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
- 16:03 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 16:02 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
- 16:02 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
- 16:02 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
- 16:02 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
- 16:02 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
- 16:02 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 16:01 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 16:01 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
- 16:01 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
- 16:00 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 16:00 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 16:00 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
- 15:59 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
- 15:59 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
- 15:59 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
- 15:59 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
- 15:58 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
- 15:58 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
- 15:57 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
- 15:57 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
- 15:57 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 15:57 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 15:57 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
- 15:57 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
- 15:57 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
- 15:56 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
- 15:56 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 15:55 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
- 15:55 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
- 15:54 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
- 15:54 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts planet2002.codfw.wmnet
- 15:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:54 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: planet2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
- 15:54 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
- 15:54 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
- 15:54 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
- 15:54 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
- 15:54 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
- 15:53 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
- 15:53 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
- 15:53 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/mathoid: apply
- 15:53 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
- 15:53 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/media-analytics: apply
- 15:53 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
- 15:53 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: planet2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
- 15:53 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
- 15:52 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
- 15:52 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
- 15:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
- 15:51 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
- 15:51 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
- 15:51 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
- 15:51 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
- 15:51 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
- 15:51 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
- 15:51 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
- 15:50 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
- 15:50 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
- 15:50 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
- 15:50 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
- 15:50 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
- 15:50 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
- 15:50 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
- 15:50 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
- 15:49 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
- 15:49 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
- 15:49 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
- 15:49 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
- 15:49 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
- 15:49 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
- 15:48 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
- 15:48 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
- 15:48 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
- 15:48 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
- 15:48 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
- 15:48 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
- 15:46 dzahn@cumin2002: START - Cookbook sre.dns.netbox
- 15:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1149.eqiad.wmnet onto db1249.eqiad.wmnet
- 15:42 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
- 15:42 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
- 15:42 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
- 15:42 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
- 15:42 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
- 15:42 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
- 15:42 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts planet2002.codfw.wmnet
- 15:40 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
- 15:40 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
- 15:40 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
- 15:40 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
- 15:40 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
- 15:39 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
- 15:35 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 15:35 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
- 15:31 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@4946bb7]: (no justification provided) (duration: 00m 48s)
- 15:30 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@4946bb7]: (no justification provided)
- 15:29 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:28 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:28 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:28 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:28 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:28 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 15:28 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 15:28 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 15:28 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 15:28 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 15:28 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
- 15:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
- 15:27 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
- 15:27 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
- 15:27 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
- 15:27 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
- 15:17 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
- 15:17 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
- 15:16 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
- 15:16 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
- 15:15 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
- 14:46 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 14:45 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 14:45 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 14:45 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 14:44 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 14:44 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 14:43 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 14:43 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 14:22 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 14:22 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 14:07 moritzm: installing ruby-rails-html-sanitizer security updates
- 14:01 moritzm: installing ruby-loofah security updates
- 13:56 moritzm: installing reportbug bugfix updates on buster
- 13:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1137.eqiad.wmnet
- 13:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:54 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 13:53 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
- 13:52 moritzm: installing netty security updates
- 13:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1148.eqiad.wmnet
- 13:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:51 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
- 13:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1132.eqiad.wmnet
- 13:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1132.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
- 13:50 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1132.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
- 13:48 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
- 13:44 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1148.eqiad.wmnet
- 13:43 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1137.eqiad.wmnet
- 13:42 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1132.eqiad.wmnet
- 13:42 arnaudb@cumin1001: dbctl commit (dc=all): 'decommissionning hosts', diff saved to https://phabricator.wikimedia.org/P54437 and previous config saved to /var/cache/conftool/dbconfig/20231214-134203-arnaudb.json
- 13:21 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1134.eqiad.wmnet onto db1234.eqiad.wmnet
- 13:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1134 in db1234 for T344036', diff saved to https://phabricator.wikimedia.org/P54436 and previous config saved to /var/cache/conftool/dbconfig/20231214-131913-arnaudb.json
- 13:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: provisionning db1234.eqiad.wmnet - T344036
- 13:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: provisionning db1234.eqiad.wmnet - T344036
- 13:17 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: provisionning db1234.eqiad.wmnet - T344036
- 13:17 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: provisionning db1234.eqiad.wmnet - T344036
- 13:12 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1149.eqiad.wmnet onto db1249.eqiad.wmnet
- 13:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1149 in db1249 for T344036', diff saved to https://phabricator.wikimedia.org/P54435 and previous config saved to /var/cache/conftool/dbconfig/20231214-131017-arnaudb.json
- 13:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: provisionning db1249.eqiad.wmnet - T344036
- 13:09 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: provisionning db1249.eqiad.wmnet - T344036
- 13:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: provisionning db1249.eqiad.wmnet - T344036
- 13:08 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: provisionning db1249.eqiad.wmnet - T344036
- 12:45 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
- 12:45 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
- 12:42 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
- 12:10 cgoubert@deploy2002: Finished scap: Deploying php-fpm-exporter 0.0.3 - 982431, mw-api-int: replicas x125% - 982841 (duration: 04m 16s)
- 12:05 cgoubert@deploy2002: Started scap: Deploying php-fpm-exporter 0.0.3 - 982431, mw-api-int: replicas x125% - 982841
- 12:03 cgoubert@deploy2002: sync-world aborted: Deploying php-fpm-exporter 0.0.3 - 982431, mw-api-int: replicas x125% - 982841 (duration: 00m 02s)
- 12:03 cgoubert@deploy2002: Started scap: Deploying php-fpm-exporter 0.0.3 - 982431, mw-api-int: replicas x125% - 982841
- 12:01 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
- 12:01 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
- 11:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54434 and previous config saved to /var/cache/conftool/dbconfig/20231214-115332-arnaudb.json
- 11:51 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 11:49 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1002.eqiad.wmnet
- 11:42 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1002.eqiad.wmnet
- 11:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54433 and previous config saved to /var/cache/conftool/dbconfig/20231214-113826-arnaudb.json
- 11:31 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1001.eqiad.wmnet
- 11:30 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2002.codfw.wmnet
- 11:25 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1001.eqiad.wmnet
- 11:24 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2002.codfw.wmnet
- 11:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54432 and previous config saved to /var/cache/conftool/dbconfig/20231214-112321-arnaudb.json
- 11:12 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica to new version
- 11:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54431 and previous config saved to /var/cache/conftool/dbconfig/20231214-110816-arnaudb.json
- 11:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54430 and previous config saved to /var/cache/conftool/dbconfig/20231214-110754-arnaudb.json
- 11:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54429 and previous config saved to /var/cache/conftool/dbconfig/20231214-110733-arnaudb.json
- 11:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54428 and previous config saved to /var/cache/conftool/dbconfig/20231214-110714-arnaudb.json
- 11:06 _joe_: restarted apache2 on lists1001
- 10:58 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 100%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54427 and previous config saved to /var/cache/conftool/dbconfig/20231214-105814-arnaudb.json
- 10:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54426 and previous config saved to /var/cache/conftool/dbconfig/20231214-105311-arnaudb.json
- 10:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54425 and previous config saved to /var/cache/conftool/dbconfig/20231214-105248-arnaudb.json
- 10:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54424 and previous config saved to /var/cache/conftool/dbconfig/20231214-105228-arnaudb.json
- 10:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54423 and previous config saved to /var/cache/conftool/dbconfig/20231214-105209-arnaudb.json
- 10:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update codfw-eqiad transport ptr - ayounsi@cumin1001"
- 10:45 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update codfw-eqiad transport ptr - ayounsi@cumin1001"
- 10:43 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 90%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54422 and previous config saved to /var/cache/conftool/dbconfig/20231214-104308-arnaudb.json
- 10:42 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
- 10:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 15%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54421 and previous config saved to /var/cache/conftool/dbconfig/20231214-103806-arnaudb.json
- 10:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54420 and previous config saved to /var/cache/conftool/dbconfig/20231214-103743-arnaudb.json
- 10:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54419 and previous config saved to /var/cache/conftool/dbconfig/20231214-103723-arnaudb.json
- 10:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54418 and previous config saved to /var/cache/conftool/dbconfig/20231214-103704-arnaudb.json
- 10:28 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 80%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54417 and previous config saved to /var/cache/conftool/dbconfig/20231214-102803-arnaudb.json
- 10:26 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 10:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54416 and previous config saved to /var/cache/conftool/dbconfig/20231214-102301-arnaudb.json
- 10:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54415 and previous config saved to /var/cache/conftool/dbconfig/20231214-102238-arnaudb.json
- 10:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54414 and previous config saved to /var/cache/conftool/dbconfig/20231214-102218-arnaudb.json
- 10:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54413 and previous config saved to /var/cache/conftool/dbconfig/20231214-102159-arnaudb.json
- 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new cumin1002 host - jmm@cumin2002"
- 10:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new cumin1002 host - jmm@cumin2002"
- 10:14 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
- 10:14 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
- 10:14 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
- 10:14 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
- 10:14 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
- 10:13 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
- 10:12 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 70%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54412 and previous config saved to /var/cache/conftool/dbconfig/20231214-101258-arnaudb.json
- 10:12 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
- 10:12 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
- 10:11 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
- 10:11 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
- 10:11 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
- 10:11 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
- 10:08 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
- 10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 5%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54411 and previous config saved to /var/cache/conftool/dbconfig/20231214-100756-arnaudb.json
- 10:07 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
- 10:07 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
- 10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54410 and previous config saved to /var/cache/conftool/dbconfig/20231214-100733-arnaudb.json
- 10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54409 and previous config saved to /var/cache/conftool/dbconfig/20231214-100713-arnaudb.json
- 10:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
- 10:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
- 10:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54408 and previous config saved to /var/cache/conftool/dbconfig/20231214-100654-arnaudb.json
- 10:06 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
- 10:05 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
- 10:05 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
- 10:05 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
- 10:04 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
- 10:04 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
- 10:04 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
- 10:00 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 09:59 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 09:59 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 09:58 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 09:58 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 09:58 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 09:57 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 60%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54407 and previous config saved to /var/cache/conftool/dbconfig/20231214-095753-arnaudb.json
- 09:56 godog: remove >= 3 months old thanos blocks for prometheus/ops in eqiad/codfw and only for a single replica - T351927
- 09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 5%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54406 and previous config saved to /var/cache/conftool/dbconfig/20231214-095228-arnaudb.json
- 09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 5%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54405 and previous config saved to /var/cache/conftool/dbconfig/20231214-095208-arnaudb.json
- 09:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 5%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54404 and previous config saved to /var/cache/conftool/dbconfig/20231214-095149-arnaudb.json
- 09:51 hashar: Restarting CI Jenkins
- 09:49 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 09:49 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 09:49 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 09:49 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
- 09:49 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 09:48 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
- 09:42 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 50%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54402 and previous config saved to /var/cache/conftool/dbconfig/20231214-094248-arnaudb.json
- 09:40 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 09:39 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 09:39 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 09:38 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 09:38 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 09:38 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cumin1002.eqiad.wmnet with OS bullseye
- 09:30 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica to new version
- 09:27 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 40%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54401 and previous config saved to /var/cache/conftool/dbconfig/20231214-092743-arnaudb.json
- 09:27 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
- 09:27 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
- 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cumin1002.eqiad.wmnet with reason: host reimage
- 09:25 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
- 09:24 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
- 09:24 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
- 09:24 akosiaris: update all the other services. T352906
- 09:24 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
- 09:24 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 09:24 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 09:22 godog: delete raw replica blocks for prometheus/ops (only one replica) in codfw - T351927
- 09:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin1002.eqiad.wmnet with reason: host reimage
- 09:21 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 09:20 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 09:20 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 09:20 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 09:20 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 09:19 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 09:12 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 30%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54400 and previous config saved to /var/cache/conftool/dbconfig/20231214-091238-arnaudb.json
- 09:12 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cumin1002.eqiad.wmnet with OS bullseye
- 09:10 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host cumin1002.eqiad.wmnet
- 09:10 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cumin1002.eqiad.wmnet with OS bullseye
- 09:10 apergos: UTC morning backport and config window done
- 09:09 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 09:08 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 09:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 09:07 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1182.eqiad.wmnet onto db1233.eqiad.wmnet
- 09:07 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2001.codfw.wmnet
- 09:06 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 09:06 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 09:06 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 09:03 ariel@deploy2002: Finished scap: Backport for gerrit:982415 RunSingleJob.php: Stop writing to $wgCommandLineMode (T353262) (duration: 09m 05s)
- 09:00 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2001.codfw.wmnet
- 08:57 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 20%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54399 and previous config saved to /var/cache/conftool/dbconfig/20231214-085733-arnaudb.json
- 08:56 ariel@deploy2002: ariel and matmarex: Continuing with sync
- 08:56 ariel@deploy2002: ariel and matmarex: Backport for gerrit:982415 RunSingleJob.php: Stop writing to $wgCommandLineMode (T353262) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:54 ariel@deploy2002: Started scap: Backport for gerrit:982415 RunSingleJob.php: Stop writing to $wgCommandLineMode (T353262)
- 08:47 ariel@deploy2002: Finished scap: Backport for gerrit:982414 RunSingleJob.php: Remove overly complicated error handling (T353262) (duration: 08m 39s)
- 08:42 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 10%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54398 and previous config saved to /var/cache/conftool/dbconfig/20231214-084228-arnaudb.json
- 08:40 ariel@deploy2002: matmarex and ariel: Continuing with sync
- 08:39 ariel@deploy2002: matmarex and ariel: Backport for gerrit:982414 RunSingleJob.php: Remove overly complicated error handling (T353262) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:38 ariel@deploy2002: Started scap: Backport for gerrit:982414 RunSingleJob.php: Remove overly complicated error handling (T353262)
- 08:35 ariel@deploy2002: Finished scap: Backport for gerrit:982441 Remove references to refreshMessageBlobs.php (T314947) (duration: 10m 20s)
- 08:34 XioNoX: drain eqiad-codfw Arelion link for 100G migration
- 08:27 ariel@deploy2002: ariel and matmarex: Continuing with sync
- 08:26 ariel@deploy2002: ariel and matmarex: Backport for gerrit:982441 Remove references to refreshMessageBlobs.php (T314947) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:24 ariel@deploy2002: Started scap: Backport for gerrit:982441 Remove references to refreshMessageBlobs.php (T314947)
- 08:20 ariel@deploy2002: Finished scap: Backport for gerrit:971967 use virtual db domain for CentralAuth and GlobalBlocking (T348486) (duration: 10m 33s)
- 08:13 ariel@deploy2002: ariel: Continuing with sync
- 08:11 ariel@deploy2002: ariel: Backport for gerrit:971967 use virtual db domain for CentralAuth and GlobalBlocking (T348486) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:10 ariel@deploy2002: Started scap: Backport for gerrit:971967 use virtual db domain for CentralAuth and GlobalBlocking (T348486)
- 08:08 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
- 08:02 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
- 08:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cumin1002.eqiad.wmnet with OS bullseye
- 08:01 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
- 08:00 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
- 07:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM cumin1002.eqiad.wmnet - jmm@cumin2002"
- 07:54 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM cumin1002.eqiad.wmnet - jmm@cumin2002"
- 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cumin1002.eqiad.wmnet on all recursors
- 07:54 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache cumin1002.eqiad.wmnet on all recursors
- 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cumin1002.eqiad.wmnet - jmm@cumin2002"
- 07:53 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cumin1002.eqiad.wmnet - jmm@cumin2002"
- 07:50 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 07:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host cumin1002.eqiad.wmnet
- 07:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
- 07:49 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
- 07:48 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1182.eqiad.wmnet onto db1233.eqiad.wmnet
- 07:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 03:24 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testhost2001.codfw.wmnet with OS bullseye
- 03:06 bvibber: cleanupOrphanedTranscodes complete. requeueTranscodes continues... forever and ever and ever
- 02:54 bvibber: brion running cleanupOrphanedTranscodes on commonswiki on mwmaint2002
- 01:26 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gitlab1003.wikimedia.org with reason: upgrade gitlab1003 to new version https://phabricator.wikmedia.org/T353375
- 01:25 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gitlab1003.wikimedia.org with reason: upgrade gitlab1003 to new version https://phabricator.wikmedia.org/T353375
- 01:04 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab1003.wikimedia.org with reason: upgrade gitlab1003 to new version https://phabricator.wikmedia.org/T353375
- 01:04 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab1003.wikimedia.org with reason: upgrade gitlab1003 to new version https://phabricator.wikmedia.org/T353375
- 00:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
- 00:40 dzahn@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release
- 00:40 dzahn@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
- 00:38 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release
- 00:38 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
- 00:34 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=93) on GitLab host gitlab1003.wikimedia.org with reason: security release
- 00:34 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
- 00:18 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts planet1002.eqiad.wmnet
- 00:18 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 00:18 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: planet1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
- 00:17 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: planet1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
- 00:15 dzahn@cumin2002: START - Cookbook sre.dns.netbox
- 00:11 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts planet1002.eqiad.wmnet
2023-12-13
- 23:48 brett@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host acmechief1002.eqiad.wmnet
- 23:48 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host acmechief1002.eqiad.wmnet with OS bookworm
- 23:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
- 23:21 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1107.eqiad.wmnet with OS bookworm
- 23:17 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply new extra plugins - bking@cumin2002 - T353270
- 23:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1107.eqiad.wmnet with reason: host reimage
- 23:02 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1107.eqiad.wmnet with reason: host reimage
- 22:58 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1006.eqiad.wmnet with OS bullseye
- 22:58 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
- 22:57 jhuneidi@deploy2002: Finished scap: Backport for gerrit:982867 Update wgStatsTarget to port 9125 (T240685), gerrit:982925 [BC] Enable desktop diff and history pages on mobile (T350181 T353388) (duration: 09m 42s)
- 22:57 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
- 22:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1005.eqiad.wmnet with OS bullseye
- 22:54 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
- 22:53 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
- 22:50 jhuneidi@deploy2002: jhuneidi and jdlrobson and cwhite: Continuing with sync
- 22:49 jhuneidi@deploy2002: jhuneidi and jdlrobson and cwhite: Backport for gerrit:982867 Update wgStatsTarget to port 9125 (T240685), gerrit:982925 [BC] Enable desktop diff and history pages on mobile (T350181 T353388) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:48 jhuneidi@deploy2002: Started scap: Backport for gerrit:982867 Update wgStatsTarget to port 9125 (T240685), gerrit:982925 [BC] Enable desktop diff and history pages on mobile (T350181 T353388)
- 22:47 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
- 22:45 jhuneidi@deploy2002: Finished scap: Backport for [[gerrit:982835|tests: Use MediaWikiIntegrationTestCase::setGroupPermissions (T353210)]], [[gerrit:982834|Temporarily disable isPreview in Parsoid's rendering]] (duration: 10m 08s)
- 22:45 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1107.eqiad.wmnet with OS bookworm
- 22:45 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
- 22:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1006.eqiad.wmnet with reason: host reimage
- 22:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1004.eqiad.wmnet with OS bullseye
- 22:40 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
- 22:39 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1005.eqiad.wmnet with reason: host reimage
- 22:39 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
- 22:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1107.eqiad.wmnet with OS bookworm
- 22:38 jhuneidi@deploy2002: ssastry and jhuneidi: Continuing with sync
- 22:38 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
- 22:37 jhuneidi@deploy2002: ssastry and jhuneidi: Backport for [[gerrit:982835|tests: Use MediaWikiIntegrationTestCase::setGroupPermissions (T353210)]], [[gerrit:982834|Temporarily disable isPreview in Parsoid's rendering]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:37 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1006.eqiad.wmnet with reason: host reimage
- 22:36 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1005.eqiad.wmnet with reason: host reimage
- 22:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host acmechief1002.eqiad.wmnet with OS bookworm
- 22:35 jhuneidi@deploy2002: Started scap: Backport for [[gerrit:982835|tests: Use MediaWikiIntegrationTestCase::setGroupPermissions (T353210)]], [[gerrit:982834|Temporarily disable isPreview in Parsoid's rendering]]
- 22:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1004.eqiad.wmnet with reason: host reimage
- 22:24 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1006.eqiad.wmnet with OS bullseye
- 22:24 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1005.eqiad.wmnet with OS bullseye
- 22:22 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1004.eqiad.wmnet with reason: host reimage
- 22:18 jhuneidi@deploy2002: Finished scap: Backport for [[gerrit:982857|Partially undeploy Reader Demographics 2 survey (T344393)]], [[gerrit:955015|Enable $wgStatsTarget for requests to mwdebug (T240685)]] (duration: 12m 33s)
- 22:18 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM acmechief1002.eqiad.wmnet - brett@cumin2002"
- 22:17 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM acmechief1002.eqiad.wmnet - brett@cumin2002"
- 22:16 brett@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) acmechief1002.eqiad.wmnet on all recursors
- 22:16 brett@cumin2002: START - Cookbook sre.dns.wipe-cache acmechief1002.eqiad.wmnet on all recursors
- 22:16 brett@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:16 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM acmechief1002.eqiad.wmnet - brett@cumin2002"
- 22:15 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM acmechief1002.eqiad.wmnet - brett@cumin2002"
- 22:12 brett@cumin2002: START - Cookbook sre.dns.netbox
- 22:11 brett@cumin2002: START - Cookbook sre.ganeti.makevm for new host acmechief1002.eqiad.wmnet
- 22:11 jhuneidi@deploy2002: dani and jhuneidi and cwhite: Continuing with sync
- 22:09 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1004.eqiad.wmnet with OS bullseye
- 22:07 jhuneidi@deploy2002: dani and jhuneidi and cwhite: Backport for [[gerrit:982857|Partially undeploy Reader Demographics 2 survey (T344393)]], [[gerrit:955015|Enable $wgStatsTarget for requests to mwdebug (T240685)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:05 jhuneidi@deploy2002: Started scap: Backport for [[gerrit:982857|Partially undeploy Reader Demographics 2 survey (T344393)]], [[gerrit:955015|Enable $wgStatsTarget for requests to mwdebug (T240685)]]
- 22:01 jhuneidi@deploy2002: Finished scap: Backport for [[gerrit:982244|Restore fixed width and height, direction of arrow on change list pages (T352456 T353099)]] (duration: 10m 28s)
- 21:59 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply new extra plugins - bking@cumin2002 - T353270
- 21:54 jhuneidi@deploy2002: jhuneidi and jdlrobson: Continuing with sync
- 21:52 jhuneidi@deploy2002: jhuneidi and jdlrobson: Backport for [[gerrit:982244|Restore fixed width and height, direction of arrow on change list pages (T352456 T353099)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:50 jhuneidi@deploy2002: Started scap: Backport for [[gerrit:982244|Restore fixed width and height, direction of arrow on change list pages (T352456 T353099)]]
- 21:04 cstone: civicrm upgraded from 834606ef to e2d49d10
- 20:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts planet1002.eqiad.wmnet
- 20:33 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:32 dzahn@cumin1001: START - Cookbook sre.dns.netbox
- 20:28 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts planet1002.eqiad.wmnet
- 19:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2031.codfw.wmnet
- 19:31 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2031.codfw.wmnet
- 19:19 brennen@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.9 refs T350085 (duration: 07m 29s)
- 19:12 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.9 refs T350085
- 19:03 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply new extra plugins - bking@cumin2002 - T353270
- 19:01 brennen: 1.42.0-wmf.9 (T350085) status: no blockers, rolling to group1
- 18:07 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 18:07 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 18:06 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 18:05 rzl@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 17:58 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 17:57 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 17:44 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply new extra plugins - bking@cumin2002 - T353270
- 17:27 rzl@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 17:25 rzl@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 16:56 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore1006']
- 16:56 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore1005']
- 16:56 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore1004']
- 16:55 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore1006']
- 16:55 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore1005']
- 16:55 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore1004']
- 16:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1148.eqiad.wmnet onto db1248.eqiad.wmnet
- 16:39 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 16:39 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new extra plugins - bking@cumin2002 - T353270
- 16:38 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 16:38 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore1006.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 100%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54395 and previous config saved to /var/cache/conftool/dbconfig/20231213-163657-arnaudb.json
- 16:36 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore1005.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:36 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 16:36 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 16:35 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 16:35 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 16:34 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 16:34 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 16:31 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore1004.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:30 vriley@cumin1001: START - Cookbook sre.hosts.provision for host sessionstore1006.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:29 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sessionstore1006
- 16:28 vriley@cumin1001: START - Cookbook sre.hosts.provision for host sessionstore1005.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:27 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sessionstore1006
- 16:27 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 16:26 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sessionstore1005
- 16:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 16:25 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sessionstore1005
- 16:23 vriley@cumin1001: START - Cookbook sre.hosts.provision for host sessionstore1004.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:22 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sessionstore1004
- 16:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 90%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54394 and previous config saved to /var/cache/conftool/dbconfig/20231213-162152-arnaudb.json
- 16:20 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sessionstore1004
- 16:19 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 16:19 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 16:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 16:16 ladsgroup@deploy2002: Finished scap: Backport for [[gerrit:982824|Fix my email in the key list]] (duration: 08m 45s)
- 16:15 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
- 16:15 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
- 16:14 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new extra plugins - bking@cumin2002 - T353270
- 16:13 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
- 16:12 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
- 16:12 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
- 16:11 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
- 16:10 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 16:09 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 16:09 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 16:09 ladsgroup@deploy2002: ladsgroup: Backport for [[gerrit:982824|Fix my email in the key list]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:08 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:07 ladsgroup@deploy2002: Started scap: Backport for [[gerrit:982824|Fix my email in the key list]]
- 16:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 80%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54393 and previous config saved to /var/cache/conftool/dbconfig/20231213-160647-arnaudb.json
- 16:05 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:05 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
- 16:04 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
- 16:04 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
- 16:04 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:04 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
- 16:04 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
- 16:03 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/blubberoid: apply
- 16:03 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:03 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:01 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:01 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:01 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
- 16:01 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
- 16:00 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 16:00 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 16:00 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
- 16:00 akosiaris: upgrade apertium, blubberoid everywhere to use the latest service_proxy image, 1.23.10-2-s4-20231203 T352906
- 15:59 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/apertium: apply
- 15:59 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
- 15:59 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
- 15:59 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/apertium: apply
- 15:59 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/apertium: apply
- 15:58 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/apertium: apply
- 15:58 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
- 15:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1132.eqiad.wmnet onto db1232.eqiad.wmnet
- 15:58 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
- 15:57 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
- 15:56 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
- 15:56 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
- 15:52 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
- 15:51 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
- 15:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 70%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54392 and previous config saved to /var/cache/conftool/dbconfig/20231213-155142-arnaudb.json
- 15:51 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
- 15:51 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
- 15:50 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
- 15:49 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
- 15:46 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
- 15:45 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
- 15:44 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
- 15:43 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
- 15:40 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
- 15:39 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
- 15:39 claime: Deploying shellbox: update php-fpm-exporter version - 982432
- 15:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 60%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54389 and previous config saved to /var/cache/conftool/dbconfig/20231213-153636-arnaudb.json
- 15:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1147.eqiad.wmnet
- 15:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1147.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
- 15:35 Amir1: tagging 1.41.0-rc.0 in core
- 15:35 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1147.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
- 15:34 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:34 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:33 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
- 15:28 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1147.eqiad.wmnet
- 15:25 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1129.eqiad.wmnet
- 15:25 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:25 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1129.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
- 15:24 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1129.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
- 15:21 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
- 15:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 50%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54387 and previous config saved to /var/cache/conftool/dbconfig/20231213-152131-arnaudb.json
- 15:17 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
- 15:16 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1129.eqiad.wmnet
- 15:15 ladsgroup@deploy2002: Finished scap: Backport for [[gerrit:982499|docroot: Add my pgp key]] (duration: 09m 50s)
- 15:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1128.eqiad.wmnet
- 15:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1128.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
- 15:12 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1128.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
- 15:10 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
- 15:09 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 15:07 ladsgroup@deploy2002: ladsgroup: Backport for [[gerrit:982499|docroot: Add my pgp key]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 40%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54386 and previous config saved to /var/cache/conftool/dbconfig/20231213-150626-arnaudb.json
- 15:06 ladsgroup@deploy2002: Started scap: Backport for [[gerrit:982499|docroot: Add my pgp key]]
- 15:05 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1128.eqiad.wmnet
- 15:04 arnaudb@cumin1001: dbctl commit (dc=all): 'decommission db1128 29 and 47', diff saved to https://phabricator.wikimedia.org/P54385 and previous config saved to /var/cache/conftool/dbconfig/20231213-150425-arnaudb.json
- 15:00 Lucas_WMDE: UTC afternoon backport+config window done
- 15:00 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for [[gerrit:982105|CheckUser: Enable read new for event tables migration on group1 (T341829)]] (duration: 08m 29s)
- 14:53 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and dreamyjazz: Continuing with sync
- 14:53 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and dreamyjazz: Backport for [[gerrit:982105|CheckUser: Enable read new for event tables migration on group1 (T341829)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:51 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for [[gerrit:982105|CheckUser: Enable read new for event tables migration on group1 (T341829)]]
- 14:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 30%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54384 and previous config saved to /var/cache/conftool/dbconfig/20231213-145121-arnaudb.json
- 14:49 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for [[gerrit:982653|Utilities/Yaml: Use string as value with ini_set (T348496)]] (duration: 19m 09s)
- 14:43 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 14:43 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 14:42 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and abi: Continuing with sync
- 14:42 hashar: Restarted Gerrit on gerrit1003 and gerrit2002
- 14:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 20%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54383 and previous config saved to /var/cache/conftool/dbconfig/20231213-143616-arnaudb.json
- 14:33 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 14:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and abi: Backport for [[gerrit:982653|Utilities/Yaml: Use string as value with ini_set (T348496)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:30 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for [[gerrit:982653|Utilities/Yaml: Use string as value with ini_set (T348496)]]
- 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
- 14:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 10%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54381 and previous config saved to /var/cache/conftool/dbconfig/20231213-142111-arnaudb.json
- 14:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
- 14:02 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1148.eqiad.wmnet onto db1248.eqiad.wmnet
- 14:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1148 in db1248 for T344036', diff saved to https://phabricator.wikimedia.org/P54380 and previous config saved to /var/cache/conftool/dbconfig/20231213-140017-arnaudb.json
- 13:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: provisionning db1248.eqiad.wmnet - T344036
- 13:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: provisionning db1248.eqiad.wmnet - T344036
- 13:57 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: provisionning db1248.eqiad.wmnet - T344036
- 13:53 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
- 13:53 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 13:51 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
- 13:51 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
- 13:50 moritzm: installing postgresql-11 security updates
- 13:49 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
- 13:48 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
- 13:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1129 in db1233 for T344036', diff saved to https://phabricator.wikimedia.org/P54379 and previous config saved to /var/cache/conftool/dbconfig/20231213-134632-arnaudb.json
- 13:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: provisionning db1233.eqiad.wmnet - T344036
- 13:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: provisionning db1233.eqiad.wmnet - T344036
- 13:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: provisionning db1233.eqiad.wmnet - T344036
- 13:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: provisionning db1233.eqiad.wmnet - T344036
- 13:27 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1132.eqiad.wmnet onto db1232.eqiad.wmnet
- 13:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1132 in db1232 for T344036', diff saved to https://phabricator.wikimedia.org/P54376 and previous config saved to /var/cache/conftool/dbconfig/20231213-132511-arnaudb.json
- 13:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: provisionning db1232.eqiad.wmnet - T344036
- 13:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: provisionning db1232.eqiad.wmnet - T344036
- 13:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: provisionning db1232.eqiad.wmnet - T344036
- 13:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: provisionning db1232.eqiad.wmnet - T344036
- 13:05 godog: delete raw replica blocks for prometheus/ops (only one replica) in eqiad - T351927
- 12:55 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1211.eqiad.wmnet onto db1226.eqiad.wmnet
- 12:42 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 12:42 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 12:40 moritzm: installing OpenSSH security updates on bullseye
- 12:25 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 12:25 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 12:16 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 12:16 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 12:11 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 12:11 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 12:10 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 12:09 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 12:08 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 12:08 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 12:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1233.eqiad.wmnet with OS bookworm
- 12:02 vgutierrez: setting cp4037 as inactive - T352876
- 11:49 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1233.eqiad.wmnet with reason: host reimage
- 11:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1233.eqiad.wmnet with reason: host reimage
- 11:37 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 11:36 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 11:33 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db1233.eqiad.wmnet with OS bookworm
- 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
- 11:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
- 11:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
- 11:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
- 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5004.eqsin.wmnet
- 11:00 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
- 10:50 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
- 10:49 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1211.eqiad.wmnet onto db1226.eqiad.wmnet
- 10:48 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5004.eqsin.wmnet
- 10:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
- 10:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
- 10:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
- 10:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
- 10:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1226.eqiad.wmnet with OS bookworm
- 10:31 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 10:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 10:24 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 10:24 claime: Updating mw-debug prometheus-php-fpm-exporter to 0.0.3
- 10:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1226.eqiad.wmnet with reason: host reimage
- 10:11 hashar@deploy2002: Finished deploy [releng/jenkins-deploy@77b3681] (releasing): Rename jenkins-slave to jenkins-agent - T254646 (duration: 00m 42s)
- 10:11 hashar@deploy2002: Started deploy [releng/jenkins-deploy@77b3681] (releasing): Rename jenkins-slave to jenkins-agent - T254646
- 10:10 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1226.eqiad.wmnet with reason: host reimage
- 10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 100%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54374 and previous config saved to /var/cache/conftool/dbconfig/20231213-100708-arnaudb.json
- 10:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54373 and previous config saved to /var/cache/conftool/dbconfig/20231213-100651-arnaudb.json
- 10:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54372 and previous config saved to /var/cache/conftool/dbconfig/20231213-100555-arnaudb.json
- 10:00 moritzm: failover ganeti master in eqsin to ganeti5007
- 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
- 09:57 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db1226.eqiad.wmnet with OS bookworm
- 09:56 hashar: Disabled puppet agent on contint1002, contint2002, releases1003 and releases2003 to progressively deploy https://gerrit.wikimedia.org/r/922555
- 09:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
- 09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 90%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54371 and previous config saved to /var/cache/conftool/dbconfig/20231213-095203-arnaudb.json
- 09:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 90%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54370 and previous config saved to /var/cache/conftool/dbconfig/20231213-095146-arnaudb.json
- 09:50 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 90%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54369 and previous config saved to /var/cache/conftool/dbconfig/20231213-095049-arnaudb.json
- 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5005.eqsin.wmnet
- 09:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5005.eqsin.wmnet
- 09:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
- 09:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 80%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54368 and previous config saved to /var/cache/conftool/dbconfig/20231213-093658-arnaudb.json
- 09:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 80%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54367 and previous config saved to /var/cache/conftool/dbconfig/20231213-093641-arnaudb.json
- 09:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 80%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54366 and previous config saved to /var/cache/conftool/dbconfig/20231213-093544-arnaudb.json
- 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
- 09:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
- 09:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
- 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
- 09:25 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
- 09:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 09:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 09:25 brouberol: increasing pod max requested memory to a higher value than the container max requested memory for dse-k8s-eqiad - T351722
- 09:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 70%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54365 and previous config saved to /var/cache/conftool/dbconfig/20231213-092153-arnaudb.json
- 09:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 70%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54364 and previous config saved to /var/cache/conftool/dbconfig/20231213-092136-arnaudb.json
- 09:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 70%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54363 and previous config saved to /var/cache/conftool/dbconfig/20231213-092039-arnaudb.json
- 09:20 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
- 09:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
- 09:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 60%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54362 and previous config saved to /var/cache/conftool/dbconfig/20231213-090648-arnaudb.json
- 09:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 60%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54361 and previous config saved to /var/cache/conftool/dbconfig/20231213-090631-arnaudb.json
- 09:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 60%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54360 and previous config saved to /var/cache/conftool/dbconfig/20231213-090534-arnaudb.json
- 08:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 202120
- 08:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 202120
- 08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 50%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54359 and previous config saved to /var/cache/conftool/dbconfig/20231213-085143-arnaudb.json
- 08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54358 and previous config saved to /var/cache/conftool/dbconfig/20231213-085125-arnaudb.json
- 08:50 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54357 and previous config saved to /var/cache/conftool/dbconfig/20231213-085027-arnaudb.json
- 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
- 08:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3856
- 08:48 XioNoX: delete bgp group Confed_drmrs from cr1-esams - T347892
- 08:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3856
- 08:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
- 08:44 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 46997
- 08:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46997
- 08:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 40%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54356 and previous config saved to /var/cache/conftool/dbconfig/20231213-083638-arnaudb.json
- 08:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 40%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54355 and previous config saved to /var/cache/conftool/dbconfig/20231213-083620-arnaudb.json
- 08:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 40%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54354 and previous config saved to /var/cache/conftool/dbconfig/20231213-083522-arnaudb.json
- 08:30 XioNoX: delete bgp group Confed_esams from cr2-drmrs - T347892
- 08:25 mlitn@deploy2002: Finished scap: Backport for gerrit:979113 No custom UW licensing config (duration: 09m 43s)
- 08:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 30%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54353 and previous config saved to /var/cache/conftool/dbconfig/20231213-082133-arnaudb.json
- 08:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 30%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54352 and previous config saved to /var/cache/conftool/dbconfig/20231213-082115-arnaudb.json
- 08:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 30%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54351 and previous config saved to /var/cache/conftool/dbconfig/20231213-082017-arnaudb.json
- 08:18 mlitn@deploy2002: mlitn: Continuing with sync
- 08:17 mlitn@deploy2002: mlitn: Backport for gerrit:979113 No custom UW licensing config synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:16 mlitn@deploy2002: Started scap: Backport for gerrit:979113 No custom UW licensing config
- 08:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1020.eqiad.wmnet with OS bookworm
- 08:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 20%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54350 and previous config saved to /var/cache/conftool/dbconfig/20231213-080628-arnaudb.json
- 08:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54349 and previous config saved to /var/cache/conftool/dbconfig/20231213-080610-arnaudb.json
- 08:06 moritzm: installing openssh security updates
- 08:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54348 and previous config saved to /var/cache/conftool/dbconfig/20231213-080512-arnaudb.json
- 07:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1020.eqiad.wmnet with reason: host reimage
- 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1020.eqiad.wmnet with reason: host reimage
- 07:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 10%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54347 and previous config saved to /var/cache/conftool/dbconfig/20231213-075123-arnaudb.json
- 07:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54346 and previous config saved to /var/cache/conftool/dbconfig/20231213-075105-arnaudb.json
- 07:50 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54345 and previous config saved to /var/cache/conftool/dbconfig/20231213-075006-arnaudb.json
- 07:43 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1211.eqiad.wmnet onto db1226.eqiad.wmnet
- 07:40 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1020.eqiad.wmnet with OS bookworm
- 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1021.eqiad.wmnet with OS bookworm
- 06:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1021.eqiad.wmnet with reason: host reimage
- 06:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1021.eqiad.wmnet with reason: host reimage
- 05:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1021.eqiad.wmnet with OS bookworm
- 03:41 hashar@deploy2002: Finished deploy [gerrit/gerrit@9bf8914]: Add a banner for the 2023 developer survey - T351109 (duration: 00m 08s)
- 03:41 hashar@deploy2002: Started deploy [gerrit/gerrit@9bf8914]: Add a banner for the 2023 developer survey - T351109
2023-12-12
- 23:56 ejegg: donorwiki upgraded from f7407053 to bc49e5a6
- 23:26 tzatziki: removing 2 files for legal compliance
- 23:05 tzatziki: removing 2 files for legal compliance
- 22:57 mutante: planet - switched to eqiad and bookworm backend (T348392 T345617) - https://meta.wikimedia.org/wiki/Planet_Wikimedia
- 22:43 mutante: planet2003 - manually upgrade rawdog package to 3.0.2 T348392
- 21:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on planet1003.eqiad.wmnet with reason: debugging
- 21:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on planet1003.eqiad.wmnet with reason: debugging
- 21:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on planet1003.eqiad.wmnet with reason: debugging
- 21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on planet2003.codfw.wmnet with reason: debugging
- 21:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on planet2003.codfw.wmnet with reason: debugging
- 21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on planet2003.codfw.wmnet with reason: reimage
- 21:18 samtar@deploy2002: Finished scap: Backport for gerrit:980963 Add stream config for Android article instruments (T351292) (duration: 11m 59s)
- 21:10 samtar@deploy2002: cjming and samtar: Continuing with sync
- 21:07 samtar@deploy2002: cjming and samtar: Backport for gerrit:980963 Add stream config for Android article instruments (T351292) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:06 samtar@deploy2002: Started scap: Backport for gerrit:980963 Add stream config for Android article instruments (T351292)
- 20:42 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 20:40 rzl@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 20:38 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 20:37 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 20:33 rzl@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 20:30 rzl@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 20:28 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 20:17 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 20:05 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 20:04 rzl@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 19:59 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: enable new wmf-elasticsearch-search-plugins - bking@cumin2002 - T353270
- 19:57 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 19:56 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 19:46 ryankemper@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 19:46 ryankemper@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 19:43 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.9 refs T350085
- 19:33 brennen@deploy2002: Finished scap: Backport for gerrit:982237 ParserOutput::getText(): do not clone ParserOutput when invoking pipeline (T353257) (duration: 09m 41s)
- 19:26 brennen@deploy2002: brennen and ssastry: Continuing with sync
- 19:25 brennen@deploy2002: brennen and ssastry: Backport for gerrit:982237 ParserOutput::getText(): do not clone ParserOutput when invoking pipeline (T353257) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 19:23 brennen@deploy2002: Started scap: Backport for gerrit:982237 ParserOutput::getText(): do not clone ParserOutput when invoking pipeline (T353257)
- 19:18 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: enable new wmf-elasticsearch-search-plugins - bking@cumin2002 - T353270
- 19:08 brennen: 1.42.0-wmf.9 (T350085) status: deploying a fix for T353257 and then will proceed to group0.
- 19:03 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: enable new wmf-elasticsearch-search-plugins - bking@cumin2002 - T353270
- 19:03 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: enable new wmf-elasticsearch-search-plugins - bking@cumin2002 - T353270
- 18:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host phab2002.codfw.wmnet with OS bullseye
- 18:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab2002.codfw.wmnet with reason: host reimage
- 18:32 rzl@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 18:31 rzl@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 18:29 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 18:28 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 18:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet with reason: host reimage
- 18:12 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host phab2002.codfw.wmnet with OS bullseye
- 18:10 mutante: reimaging phab2002 (stand-by phorge server) with bullseye - T327068
- 17:42 ejegg: fundraising civicrm upgraded from 8c107215 to 834606ef
- 17:33 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:33 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt sessionstore - jclark@cumin1001"
- 17:32 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt sessionstore - jclark@cumin1001"
- 17:32 ejegg: payments-wiki upgraded from 1d24dc90 to c1181b95
- 17:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testhost2001.codfw.wmnet with OS bullseye
- 17:30 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 17:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on phab2002.codfw.wmnet with reason: reimage
- 17:16 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 17:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on phab2002.codfw.wmnet with reason: reimage
- 17:16 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 17:16 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 17:13 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 17:13 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 17:13 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 16:34 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ml-staging2001.codfw.wmnet with reason: Waiting for hardware install
- 16:33 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ml-staging2001.codfw.wmnet with reason: Waiting for hardware install
- 16:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2001.codfw.wmnet
- 16:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes1060']
- 16:19 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1060']
- 16:15 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2001.codfw.wmnet
- 16:05 brennen@deploy2002: Finished deploy [phabricator/deployment@c243cc2]: deploy to phab1004 for T353274 (duration: 00m 48s)
- 16:04 brennen@deploy2002: Started deploy [phabricator/deployment@c243cc2]: deploy to phab1004 for T353274
- 16:04 brennen@deploy2002: Finished deploy [phabricator/deployment@c243cc2]: test deploy to phab2002 for T353274 (duration: 00m 32s)
- 16:03 brennen@deploy2002: Started deploy [phabricator/deployment@c243cc2]: test deploy to phab2002 for T353274
- 16:03 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1004.eqiad.wmnet with reason: Phabricator deploys
- 16:03 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1004.eqiad.wmnet with reason: Phabricator deploys
- 16:00 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
- 15:59 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
- 15:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
- 15:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1137.eqiad.wmnet onto db1237.eqiad.wmnet
- 15:30 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 15:30 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 15:30 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 15:29 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 15:28 cgoubert@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:28 cgoubert@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:27 cgoubert@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 15:27 cgoubert@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 15:27 cgoubert@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 15:26 cgoubert@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 15:25 cgoubert@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:25 cgoubert@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 15:25 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:24 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:23 cgoubert@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 15:22 cgoubert@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 15:22 cgoubert@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:21 cgoubert@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 15:21 claime: Deploying new calico BGPPeers for codfw rows a/b - T352893
- 14:54 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1137.eqiad.wmnet onto db1237.eqiad.wmnet
- 14:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1137 in db1237 for T344036', diff saved to https://phabricator.wikimedia.org/P54339 and previous config saved to /var/cache/conftool/dbconfig/20231212-145205-arnaudb.json
- 14:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1237.eqiad.wmnet with reason: provisionning db1237.eqiad.wmnet - T344036
- 14:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1237.eqiad.wmnet with reason: provisionning db1237.eqiad.wmnet - T344036
- 14:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: provisionning db1237.eqiad.wmnet - T344036
- 14:50 dcausse: restarting blazegraph on wdqs1012 (BlazegraphFreeAllocatorsDecreasingRapidly)
- 14:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: provisionning db1237.eqiad.wmnet - T344036
- 14:48 phuedx: UTC afternoon backport window done
- 14:47 phuedx@deploy2002: Finished scap: Backport for gerrit:982178 Partially undeploy Reader Demographics 2 survey (T344393) (duration: 24m 33s)
- 14:39 phuedx@deploy2002: phuedx and dani: Continuing with sync
- 14:35 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1211.eqiad.wmnet onto db1226.eqiad.wmnet
- 14:35 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on netbox-dev2002.codfw.wmnet with reason: Restoring DB from backup on netbox-dev2002
- 14:34 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on netbox-dev2002.codfw.wmnet with reason: Restoring DB from backup on netbox-dev2002
- 14:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1211 in db1226 for T344036', diff saved to https://phabricator.wikimedia.org/P54336 and previous config saved to /var/cache/conftool/dbconfig/20231212-143233-arnaudb.json
- 14:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
- 14:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
- 14:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
- 14:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
- 14:24 phuedx@deploy2002: phuedx and dani: Backport for gerrit:982178 Partially undeploy Reader Demographics 2 survey (T344393) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:22 phuedx@deploy2002: Started scap: Backport for gerrit:982178 Partially undeploy Reader Demographics 2 survey (T344393)
- 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
- 13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
- 13:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 13:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 13:45 brouberol: increasing max container memory requests in dse-k8s from 3GB to 8GB - T351722
- 13:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1128.eqiad.wmnet onto db1228.eqiad.wmnet
- 13:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1129.eqiad.wmnet onto db1229.eqiad.wmnet
- 13:16 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2002.codfw.wmnet
- 13:16 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1002.eqiad.wmnet
- 13:09 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1002.eqiad.wmnet
- 13:09 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2002.codfw.wmnet
- 13:06 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1147.eqiad.wmnet onto db1247.eqiad.wmnet
- 13:00 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1001.eqiad.wmnet
- 12:57 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2001.codfw.wmnet
- 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup1011.eqiad.wmnet
- 12:55 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1001.eqiad.wmnet
- 12:53 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
- 12:52 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
- 12:51 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2001.codfw.wmnet
- 12:46 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host backup1011.eqiad.wmnet
- 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup1010.eqiad.wmnet
- 12:45 jayme: increasing memory of ganeti instance kubemaster2001.codfw.wmnet from 4G to 12G (requires reboot) - T353233
- 12:38 claime: Uncordoning kubernetes10[59-62].eqiad.wmnet - T353135
- 12:37 claime: Pooling kubernetes10[59-62].eqiad.wmnet - T353135
- 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup2011.codfw.wmnet
- 12:24 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host backup2011.codfw.wmnet
- 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup2010.codfw.wmnet
- 12:03 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host backup2010.codfw.wmnet
- 11:43 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
- 11:43 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
- 11:28 moritzm: installing postgresql-11 security updates
- 10:50 samtar@deploy2002: Finished scap: Backport for gerrit:981423 testwiki: Enable the Edit Recovery feature (T353041) (duration: 09m 51s)
- 10:47 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1129.eqiad.wmnet onto db1229.eqiad.wmnet
- 10:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1129 in db1229 for T344036', diff saved to https://phabricator.wikimedia.org/P54335 and previous config saved to /var/cache/conftool/dbconfig/20231212-104404-arnaudb.json
- 10:43 samtar@deploy2002: samtar and samwilson: Continuing with sync
- 10:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: provisionning db1229.eqiad.wmnet - T344036
- 10:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: provisionning db1229.eqiad.wmnet - T344036
- 10:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: provisionning db1229.eqiad.wmnet - T344036
- 10:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: provisionning db1229.eqiad.wmnet - T344036
- 10:41 samtar@deploy2002: samtar and samwilson: Backport for gerrit:981423 testwiki: Enable the Edit Recovery feature (T353041) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 10:40 samtar@deploy2002: Started scap: Backport for gerrit:981423 testwiki: Enable the Edit Recovery feature (T353041)
- 10:30 moritzm: installing nghttp2 security updates
- 10:16 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 10:15 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 10:13 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
- 10:13 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
- 10:09 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
- 10:09 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
- 10:05 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
- 10:04 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
- 10:04 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
- 10:04 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
- 09:57 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1128.eqiad.wmnet onto db1228.eqiad.wmnet
- 09:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 clone from db1128 ', diff saved to https://phabricator.wikimedia.org/P54334 and previous config saved to /var/cache/conftool/dbconfig/20231212-095352-arnaudb.json
- 09:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: provisionning db1228.eqiad.wmnet - T344036
- 09:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: provisionning db1228.eqiad.wmnet - T344036
- 09:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: provisionning db1228.eqiad.wmnet - T344036
- 09:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: provisionning db1228.eqiad.wmnet - T344036
- 09:43 moritzm: installing ca-certificates-java updates from Bookworm point release
- 09:08 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1147.eqiad.wmnet onto db1247.eqiad.wmnet
- 09:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1147 in db1247 for T344036', diff saved to https://phabricator.wikimedia.org/P54333 and previous config saved to /var/cache/conftool/dbconfig/20231212-090652-arnaudb.json
- 09:05 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 09:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: provisionning db1247.eqiad.wmnet - T344036
- 09:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: provisionning db1247.eqiad.wmnet - T344036
- 09:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: provisionning db1247.eqiad.wmnet - T344036
- 09:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: provisionning db1247.eqiad.wmnet - T344036
- 08:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: server BGP in netbox plugin - ayounsi@cumin1001
- 08:48 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: server BGP in netbox plugin - ayounsi@cumin1001
- 08:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2135,2160].codfw.wmnet,db[1176,1217].eqiad.wmnet with reason: m5 ipoid maintenance
- 08:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2135,2160].codfw.wmnet,db[1176,1217].eqiad.wmnet with reason: m5 ipoid maintenance
- 07:52 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
- 07:52 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
- 07:50 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
- 07:49 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
- 07:17 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 4800
- 07:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4800
- 06:46 marostegui@deploy2002: Finished scap: Backport for gerrit:982226 Revert "ProductionServices.php: Promote pc2014 as master of pc1" (duration: 09m 00s)
- 06:38 marostegui@deploy2002: marostegui: Continuing with sync
- 06:38 marostegui@deploy2002: marostegui: Backport for gerrit:982226 Revert "ProductionServices.php: Promote pc2014 as master of pc1" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 06:37 marostegui@deploy2002: Started scap: Backport for gerrit:982226 Revert "ProductionServices.php: Promote pc2014 as master of pc1"
- 06:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2011.codfw.wmnet with OS bookworm
- 06:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2011.codfw.wmnet with reason: host reimage
- 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2011.codfw.wmnet with reason: host reimage
- 06:00 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2011.codfw.wmnet with OS bookworm
- 05:59 marostegui@deploy2002: Finished scap: Backport for gerrit:982206 ProductionServices.php: Promote pc2014 as master of pc1 (T351787) (duration: 08m 35s)
- 05:52 marostegui@deploy2002: marostegui: Continuing with sync
- 05:52 marostegui@deploy2002: marostegui: Backport for gerrit:982206 ProductionServices.php: Promote pc2014 as master of pc1 (T351787) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 05:51 marostegui@deploy2002: Started scap: Backport for gerrit:982206 ProductionServices.php: Promote pc2014 as master of pc1 (T351787)
- 05:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: pc1 master switch T351787
- 05:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: pc1 master switch T351787
- 04:58 mwpresync@deploy2002: Pruned MediaWiki: 1.42.0-wmf.5 (duration: 02m 17s)
- 04:55 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.9 refs T350085 (duration: 53m 03s)
- 04:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.9 refs T350085
2023-12-11
- 22:39 jdrewniak@deploy2002: Finished scap: Backport for gerrit:982162 [Vector] Deploy the Zebra CSS refactor under feature flag (T353008) (duration: 12m 14s)
- 22:32 jdrewniak@deploy2002: jdrewniak: Continuing with sync
- 22:28 jdrewniak@deploy2002: jdrewniak: Backport for gerrit:982162 [Vector] Deploy the Zebra CSS refactor under feature flag (T353008) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:26 jdrewniak@deploy2002: Started scap: Backport for gerrit:982162 [Vector] Deploy the Zebra CSS refactor under feature flag (T353008)
- 22:23 ladsgroup@deploy2002: Finished scap: Backport for gerrit:981737 api: Add support for pagelinks migration in ApiQueryBacklinks::runSecondQuery (T351237) (duration: 10m 42s)
- 22:15 ladsgroup@deploy2002: jforrester and ladsgroup: Continuing with sync
- 22:14 ladsgroup@deploy2002: jforrester and ladsgroup: Backport for gerrit:981737 api: Add support for pagelinks migration in ApiQueryBacklinks::runSecondQuery (T351237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:12 ladsgroup@deploy2002: Started scap: Backport for gerrit:981737 api: Add support for pagelinks migration in ApiQueryBacklinks::runSecondQuery (T351237)
- 22:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on wdqs1015.eqiad.wmnet with reason: T347355
- 22:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 18:00:00 on wdqs1015.eqiad.wmnet with reason: T347355
- 18:34 claime: Raised replicas for mw-web
- 18:32 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 18:32 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 18:32 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 18:32 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 18:32 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 18:31 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:48 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 17:47 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 17:47 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 17:45 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 17:45 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 17:43 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 17:43 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 17:42 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 17:04 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: gerrit:982110 Bumping portals to master (T128546) (duration: 08m 15s)
- 17:01 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2004.codfw.wmnet with OS bullseye
- 17:00 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 16:57 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 16:56 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: gerrit:982110 Bumping portals to master (T128546) (duration: 10m 12s)
- 16:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1060.eqiad.wmnet with OS bullseye
- 16:49 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1061.eqiad.wmnet with OS bullseye
- 16:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1062.eqiad.wmnet with OS bullseye
- 16:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1059.eqiad.wmnet with OS bullseye
- 16:42 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
- 16:39 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
- 16:27 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1060.eqiad.wmnet with reason: host reimage
- 16:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2002.codfw.wmnet with OS bullseye
- 16:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 16:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1061.eqiad.wmnet with reason: host reimage
- 16:23 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1062.eqiad.wmnet with reason: host reimage
- 16:22 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
- 16:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2005.codfw.wmnet with OS bullseye
- 16:21 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 16:20 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 16:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1059.eqiad.wmnet with reason: host reimage
- 16:19 otto@deploy2002: Synchronized wmf-config/ext-EventStreamConfig.php: Config: gerrit:968344 Enable canary events for all MediaWiki event streams (T266798) (duration: 08m 25s)
- 16:18 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1062.eqiad.wmnet with reason: host reimage
- 16:18 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1061.eqiad.wmnet with reason: host reimage
- 16:17 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1060.eqiad.wmnet with reason: host reimage
- 16:16 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1059.eqiad.wmnet with reason: host reimage
- 16:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 16:13 vgutierrez: rolling restart of pybal on lvs1020 and lvs1017 effectively enabling IPIP encapsulation on ncredir@eqiad - T351069
- 16:10 ottomata: enabling canary events for all mediawiki state change event streams - T266798
- 16:03 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1062.eqiad.wmnet with OS bullseye
- 16:03 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1061.eqiad.wmnet with OS bullseye
- 16:02 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1060.eqiad.wmnet with OS bullseye
- 16:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
- 16:01 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1059.eqiad.wmnet with OS bullseye
- 16:01 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 16:00 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 15:59 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 15:58 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 15:57 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:57 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:56 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
- 15:55 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 15:55 claime: homer lsw1-*eqiad* commit "Put kubernetes10[59-62] in production - T353135"
- 15:55 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2002.codfw.wmnet with reason: host reimage
- 15:55 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 15:55 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:54 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 15:53 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:53 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:53 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 15:51 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2002.codfw.wmnet with reason: host reimage
- 15:49 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:48 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:41 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
- 15:39 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2006.codfw.wmnet with OS bullseye
- 15:39 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 15:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1143.eqiad.wmnet
- 15:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1143.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
- 15:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
- 15:32 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1143.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
- 15:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
- 15:30 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
- 15:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
- 15:25 brouberol: provisioning TLS certificates for the spark-history and spark-history-test namespaces in dse-k8s-eqiad - T352639
- 15:25 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1143.eqiad.wmnet
- 15:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:23 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 15:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1142.eqiad.wmnet
- 15:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1142.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
- 15:20 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1142.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
- 15:18 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
- 15:12 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1142.eqiad.wmnet
- 15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
- 15:04 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
- 15:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1141.eqiad.wmnet
- 15:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1141.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
- 15:01 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1141.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
- 14:57 milimetric@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
- 14:57 milimetric@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
- 14:57 milimetric@deploy2002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
- 14:56 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
- 14:56 milimetric@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
- 14:53 milimetric@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
- 14:53 milimetric@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
- 14:53 arnaudb@cumin1001: dbctl commit (dc=all): 'decommission db1141 42 and 43', diff saved to https://phabricator.wikimedia.org/P54330 and previous config saved to /var/cache/conftool/dbconfig/20231211-145300-arnaudb.json
- 14:52 milimetric@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
- 14:52 milimetric@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
- 14:51 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1141.eqiad.wmnet
- 14:51 milimetric@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
- 14:51 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 14:50 milimetric@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
- 14:50 otto@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 14:49 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 14:49 milimetric@deploy2002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
- 14:48 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 14:48 milimetric@deploy2002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
- 14:47 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
- 14:46 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 14:45 otto@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 14:45 ottomata: deploying changeprop to pick up https://phabricator.wikimedia.org/T351247
- 14:37 TheresNoTime: close UTC afternoon backport window
- 14:25 samtar@deploy2002: Finished scap: Backport for gerrit:981726 hewikivoyage: update vector 2022 wordmark and tagline (T351981) (duration: 10m 35s)
- 14:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1138.eqiad.wmnet - arnaudb@cumin1001"
- 14:17 samtar@deploy2002: samtar and anzx: Continuing with sync
- 14:16 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1138.eqiad.wmnet - arnaudb@cumin1001"
- 14:15 samtar@deploy2002: samtar and anzx: Backport for gerrit:981726 hewikivoyage: update vector 2022 wordmark and tagline (T351981) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:14 samtar@deploy2002: Started scap: Backport for gerrit:981726 hewikivoyage: update vector 2022 wordmark and tagline (T351981)
- 14:11 samtar@deploy2002: Finished scap: Backport for gerrit:979986 Enable read new on group0 wikis (T341829) (duration: 07m 57s)
- 14:05 samtar@deploy2002: samtar and dreamyjazz: Continuing with sync
- 14:05 samtar@deploy2002: samtar and dreamyjazz: Backport for gerrit:979986 Enable read new on group0 wikis (T341829) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:03 samtar@deploy2002: Started scap: Backport for gerrit:979986 Enable read new on group0 wikis (T341829)
- 13:59 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
- 13:58 arnaudb@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 13:56 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
- 13:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
- 13:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
- 13:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 13:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 13:27 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts db1138.eqiad.wmnet
- 13:26 arnaudb@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 13:25 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
- 13:22 arnaudb@cumin1001: dbctl commit (dc=all): 'decommission db1138', diff saved to https://phabricator.wikimedia.org/P54328 and previous config saved to /var/cache/conftool/dbconfig/20231211-132250-arnaudb.json
- 13:20 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1138.eqiad.wmnet
- 13:17 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: decomission pre downtime
- 13:17 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: decomission pre downtime
- 13:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
- 13:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
- 13:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 13:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 12:57 claime: Rebuilding production-images for python3-build-bookworm - T352733
- 12:12 urbanecm@deploy2002: Finished scap: Backport for gerrit:981734 Revert "Growth: Enable Welcome survey user research for ar/en/es" (T351266) (duration: 08m 20s)
- 12:11 brouberol: Adding spark-history(-test).svc.eqiad.wmnet CNAMEs pointing to k8s-ingress-dse.svc.eqiad.wmnet. - T352639
- 12:05 urbanecm@deploy2002: urbanecm: Continuing with sync
- 12:05 urbanecm@deploy2002: urbanecm: Backport for gerrit:981734 Revert "Growth: Enable Welcome survey user research for ar/en/es" (T351266) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:03 urbanecm@deploy2002: Started scap: Backport for gerrit:981734 Revert "Growth: Enable Welcome survey user research for ar/en/es" (T351266)
- 11:20 vgutierrez: rolling restart of pybal on lvs3010 and lvs3008 effectively enabling IPIP encapsulation on ncredir@esams - T351069
- 11:18 claime: sudo confctl --object-type discovery select 'name=eqiad,dnsdisc=k8s-ingress-dse' set/pooled=true - T352639
- 11:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 11:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 11:12 brouberol: Add discovery records for the k8s-ingress-dse LVS service - T352639
- 10:55 dcausse: (properly) restarting blazegraph on wdqs1006 (BlazegraphFreeAllocatorsDecreasingRapidly)
- 10:54 cgoubert@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs[1019-1020].eqiad.wmnet} and A:lvs (T352639)
- 10:50 cgoubert@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[1019-1020].eqiad.wmnet} and A:lvs (T352639)
- 10:46 claime: Running puppet on O:lvs::balancer - T352639
- 10:45 claime: Disabling puppet on O:lvs::balancer - T352639
- 10:42 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
- 10:42 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
- 10:42 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
- 10:38 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
- 10:38 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
- 10:38 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
- 10:37 claime: Repooling dse-k8s-worker nodes - sudo confctl select "service=kubesvc,cluster=dse-k8s" set/pooled=yes - T352639
- 10:03 jayme: removed cergen certs of all k8s services from private puppet in commit d36a97a - T300033
- 09:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38753
- 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38753
- 09:55 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
- 09:55 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
- 09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1547
- 09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1547
- 09:50 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
- 09:50 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
- 09:44 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
- 09:44 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
- 08:43 kostajh: UTC morning deploys done
- 08:43 kharlan@deploy2002: Finished scap: Backport for gerrit:976252 ClusterConfig: Rename `isTest()` to `isDebug()` for consistency (T347366), gerrit:981424 IPInfo: Add comment clarifying $wgIPInfoGeoIP2EnterprisePath (T304604) (duration: 22m 02s)
- 08:40 dcausse: restarted blazegraph on wdqs1006 (BlazegraphFreeAllocatorsDecreasingRapidly)
- 08:36 kharlan@deploy2002: kharlan and d3r1ck01: Continuing with sync
- 08:22 kharlan@deploy2002: kharlan and d3r1ck01: Backport for gerrit:976252 ClusterConfig: Rename `isTest()` to `isDebug()` for consistency (T347366), gerrit:981424 IPInfo: Add comment clarifying $wgIPInfoGeoIP2EnterprisePath (T304604) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:21 kharlan@deploy2002: Started scap: Backport for gerrit:976252 ClusterConfig: Rename `isTest()` to `isDebug()` for consistency (T347366), gerrit:981424 IPInfo: Add comment clarifying $wgIPInfoGeoIP2EnterprisePath (T304604)
- 08:16 kharlan@deploy2002: Finished scap: Backport for gerrit:979969 MediaModeration: Set MediaModerationDeveloperMode to false (duration: 09m 55s)
- 08:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: reboot for upgrade
- 08:15 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: reboot for upgrade
- 08:09 kharlan@deploy2002: kharlan: Continuing with sync
- 08:07 kharlan@deploy2002: kharlan: Backport for gerrit:979969 MediaModeration: Set MediaModerationDeveloperMode to false synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:06 kharlan@deploy2002: Started scap: Backport for gerrit:979969 MediaModeration: Set MediaModerationDeveloperMode to false
- 07:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: reboot for upgrade
- 07:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: reboot for upgrade
- 07:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade
- 07:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade
- 07:24 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade
- 07:24 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade
- 07:12 marostegui: Failover m3-master from dbproxy1020 to dbproxy1026 T351864
- 07:12 marostegui: Failover m3-master from dbproxy1020 to dbproxy1026 org
- 06:44 marostegui@deploy2002: Finished scap: Backport for gerrit:981729 Revert "ProductionServices.php: Promote pc1014 to pc1" (duration: 08m 22s)
- 06:37 marostegui@deploy2002: marostegui: Continuing with sync
- 06:37 marostegui@deploy2002: marostegui: Backport for gerrit:981729 Revert "ProductionServices.php: Promote pc1014 to pc1" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 06:35 marostegui@deploy2002: Started scap: Backport for gerrit:981729 Revert "ProductionServices.php: Promote pc1014 to pc1"
- 06:35 _joe_: update sirenbot to 0.3.7
- 06:34 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
- 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1011.eqiad.wmnet with OS bookworm
- 06:29 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
- 06:26 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
- 06:19 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
- 06:16 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
- 06:13 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
- 06:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: host reimage
- 06:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: host reimage
- 06:07 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
- 06:07 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
- 05:55 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1011.eqiad.wmnet with OS bookworm
- 05:54 marostegui@deploy2002: Finished scap: Backport for gerrit:981710 ProductionServices.php: Promote pc1014 to pc1 (T351787) (duration: 16m 54s)
- 05:47 marostegui@deploy2002: marostegui: Continuing with sync
- 05:46 marostegui@deploy2002: marostegui: Backport for gerrit:981710 ProductionServices.php: Promote pc1014 to pc1 (T351787) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 05:37 marostegui@deploy2002: Started scap: Backport for gerrit:981710 ProductionServices.php: Promote pc1014 to pc1 (T351787)
- 05:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: pc1 master switch T351787
- 05:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: pc1 master switch T351787
2023-12-09
- 15:53 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
- 15:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2006.codfw.wmnet with OS bullseye
- 15:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2005.codfw.wmnet with OS bullseye
- 01:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
- 00:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
- 00:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
- 00:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
- 00:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
- 00:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
- 00:47 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
- 00:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
- 00:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
- 00:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2005.codfw.wmnet with OS bullseye
- 00:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
- 00:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
- 00:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
- 00:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2006.codfw.wmnet with OS bullseye
- 00:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
- 00:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
- 00:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
- 00:01 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 00:01 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
2023-12-08
- 23:49 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 23:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 23:48 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 23:48 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 23:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 23:47 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2003.codfw.wmnet with OS bullseye
- 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 23:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
- 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2003.codfw.wmnet with reason: host reimage
- 23:04 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 23:03 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 23:02 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2003.codfw.wmnet with reason: host reimage
- 22:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2003.codfw.wmnet with OS bullseye
- 22:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2006.codfw.wmnet with OS bullseye
- 22:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
- 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
- 22:26 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 22:26 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 21:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
- 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2001.codfw.wmnet with OS bullseye
- 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2001.codfw.wmnet with reason: host reimage
- 21:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2001.codfw.wmnet with reason: host reimage
- 21:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bullseye
- 20:02 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 20:02 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 18:27 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 18:27 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 18:26 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 18:26 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 17:09 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 17:08 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 16:49 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 16:49 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 16:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1015.eqiad.wmnet with reason: T347355
- 16:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1015.eqiad.wmnet with reason: T347355
- 16:08 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 15:50 milimetric@deploy2002: Finished deploy [airflow-dags/platform_eng@049cf03]: (no justification provided) (duration: 00m 52s)
- 15:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
- 15:49 milimetric@deploy2002: Started deploy [airflow-dags/platform_eng@049cf03]: (no justification provided)
- 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
- 15:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
- 15:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
- 15:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
- 15:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
- 15:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
- 15:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
- 15:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
- 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
- 15:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
- 15:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
- 15:09 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
- 14:44 XioNoX: drain eqiad-codfw lumen transport for maintenance - T342502
- 14:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/echoserver: apply
- 14:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/echoserver: apply
- 14:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 14:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 12:55 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 12:55 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 12:42 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 12:42 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 11:40 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 11:40 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 10:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54322 and previous config saved to /var/cache/conftool/dbconfig/20231208-101337-arnaudb.json
- 09:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P54321 and previous config saved to /var/cache/conftool/dbconfig/20231208-095830-arnaudb.json
- 09:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P54320 and previous config saved to /var/cache/conftool/dbconfig/20231208-094324-arnaudb.json
- 09:41 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 09:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 09:41 brouberol: Creating the echoserver namespace in dse-k8s-eqiad - T353004
- 09:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 09:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54319 and previous config saved to /var/cache/conftool/dbconfig/20231208-092817-arnaudb.json
- 09:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54318 and previous config saved to /var/cache/conftool/dbconfig/20231208-091628-arnaudb.json
- 09:16 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 09:16 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 07:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 237
- 07:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 237
- 06:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
- 06:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
- 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T343198)', diff saved to https://phabricator.wikimedia.org/P54317 and previous config saved to /var/cache/conftool/dbconfig/20231208-062636-ladsgroup.json
- 06:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P54316 and previous config saved to /var/cache/conftool/dbconfig/20231208-061130-ladsgroup.json
- 05:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P54315 and previous config saved to /var/cache/conftool/dbconfig/20231208-055623-ladsgroup.json
- 05:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T343198)', diff saved to https://phabricator.wikimedia.org/P54314 and previous config saved to /var/cache/conftool/dbconfig/20231208-054116-ladsgroup.json
- 05:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1231 (T343198)', diff saved to https://phabricator.wikimedia.org/P54313 and previous config saved to /var/cache/conftool/dbconfig/20231208-050624-ladsgroup.json
- 05:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 05:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 04:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 04:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P54312 and previous config saved to /var/cache/conftool/dbconfig/20231208-041826-ladsgroup.json
- 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P54311 and previous config saved to /var/cache/conftool/dbconfig/20231208-040319-ladsgroup.json
- 03:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P54310 and previous config saved to /var/cache/conftool/dbconfig/20231208-034813-ladsgroup.json
- 03:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P54309 and previous config saved to /var/cache/conftool/dbconfig/20231208-033306-ladsgroup.json
- 03:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P54308 and previous config saved to /var/cache/conftool/dbconfig/20231208-030005-ladsgroup.json
- 03:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
- 02:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
- 02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P54307 and previous config saved to /var/cache/conftool/dbconfig/20231208-025942-ladsgroup.json
- 02:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P54306 and previous config saved to /var/cache/conftool/dbconfig/20231208-024435-ladsgroup.json
- 02:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P54305 and previous config saved to /var/cache/conftool/dbconfig/20231208-022929-ladsgroup.json
- 02:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
- 02:19 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
- 02:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
- 02:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
- 02:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sessionstore2004']
- 02:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2004']
- 02:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
- 02:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
- 02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P54304 and previous config saved to /var/cache/conftool/dbconfig/20231208-021422-ladsgroup.json
- 02:12 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
- 01:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P54303 and previous config saved to /var/cache/conftool/dbconfig/20231208-012115-ladsgroup.json
- 01:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
- 01:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
- 01:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P54302 and previous config saved to /var/cache/conftool/dbconfig/20231208-012051-ladsgroup.json
- 01:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P54301 and previous config saved to /var/cache/conftool/dbconfig/20231208-010545-ladsgroup.json
- 00:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P54300 and previous config saved to /var/cache/conftool/dbconfig/20231208-005038-ladsgroup.json
- 00:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1036.eqiad.wmnet with OS bullseye
- 00:44 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:43 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1037.eqiad.wmnet with OS bullseye
- 00:43 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:42 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1035.eqiad.wmnet with OS bullseye
- 00:38 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:37 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:36 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1038.eqiad.wmnet with OS bullseye
- 00:36 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P54299 and previous config saved to /var/cache/conftool/dbconfig/20231208-003532-ladsgroup.json
- 00:35 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti1036.eqiad.wmnet with reason: host reimage
- 00:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1037.eqiad.wmnet with reason: host reimage
- 00:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1035.eqiad.wmnet with reason: host reimage
- 00:19 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1038.eqiad.wmnet with reason: host reimage
- 00:16 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1037.eqiad.wmnet with reason: host reimage
- 00:16 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1036.eqiad.wmnet with reason: host reimage
- 00:15 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1035.eqiad.wmnet with reason: host reimage
- 00:15 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1038.eqiad.wmnet with reason: host reimage
- 00:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1038.eqiad.wmnet with OS bullseye
- 00:00 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1037.eqiad.wmnet with OS bullseye
- 00:00 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1036.eqiad.wmnet with OS bullseye
- 00:00 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1035.eqiad.wmnet with OS bullseye
2023-12-07
- 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P54298 and previous config saved to /var/cache/conftool/dbconfig/20231207-235333-ladsgroup.json
- 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P54297 and previous config saved to /var/cache/conftool/dbconfig/20231207-235310-ladsgroup.json
- 23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1061.eqiad.wmnet with OS bullseye
- 23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1062.eqiad.wmnet with OS bullseye
- 23:52 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1059.eqiad.wmnet with OS bullseye
- 23:52 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1060.eqiad.wmnet with OS bullseye
- 23:52 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 23:52 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 23:47 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 23:47 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 23:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P54296 and previous config saved to /var/cache/conftool/dbconfig/20231207-233802-ladsgroup.json
- 23:23 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 23:23 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 23:23 ryankemper@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P54295 and previous config saved to /var/cache/conftool/dbconfig/20231207-232256-ladsgroup.json
- 23:21 ryankemper@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 23:21 ryankemper@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 23:21 ryankemper@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 23:17 ryankemper@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 23:15 ryankemper@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P54294 and previous config saved to /var/cache/conftool/dbconfig/20231207-230749-ladsgroup.json
- 23:05 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 22:58 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 22:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp4037.ulsfo.wmnet
- 22:53 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 22:48 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 22:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1061.eqiad.wmnet with reason: host reimage
- 22:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1060.eqiad.wmnet with reason: host reimage
- 22:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1062.eqiad.wmnet with reason: host reimage
- 22:33 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1059.eqiad.wmnet with reason: host reimage
- 22:31 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1061.eqiad.wmnet with reason: host reimage
- 22:30 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1062.eqiad.wmnet with reason: host reimage
- 22:30 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1060.eqiad.wmnet with reason: host reimage
- 22:29 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1059.eqiad.wmnet with reason: host reimage
- 22:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P54293 and previous config saved to /var/cache/conftool/dbconfig/20231207-222656-ladsgroup.json
- 22:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 22:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 22:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P54292 and previous config saved to /var/cache/conftool/dbconfig/20231207-222633-ladsgroup.json
- 22:22 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 22:22 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 22:20 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 22:20 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 22:19 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 22:19 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 22:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1062.eqiad.wmnet with OS bullseye
- 22:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1061.eqiad.wmnet with OS bullseye
- 22:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1060.eqiad.wmnet with OS bullseye
- 22:15 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1059.eqiad.wmnet with OS bullseye
- 22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1061.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1060.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1062.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1059.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P54291 and previous config saved to /var/cache/conftool/dbconfig/20231207-221127-ladsgroup.json
- 22:10 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 22:10 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 21:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1062.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1061.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1060.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1059.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P54290 and previous config saved to /var/cache/conftool/dbconfig/20231207-215620-ladsgroup.json
- 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P54289 and previous config saved to /var/cache/conftool/dbconfig/20231207-214114-ladsgroup.json
- 21:38 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@049cf03]: (no justification provided) (duration: 00m 28s)
- 21:37 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@049cf03]: (no justification provided)
- 21:31 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1082.eqiad.wmnet with OS bullseye
- 21:31 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 21:23 jdrewniak@deploy2002: Finished scap: Backport for gerrit:980951 Enable Vector beta feature for all wikis (T351339), gerrit:981337 [beta] ores-extension: enable revertrisk model for enwiki (T348298), gerrit:976911 Enable action blocks in Serbian Wikipedia (T351873) (duration: 09m 54s)
- 21:17 jdrewniak@deploy2002: zoranzoki21 and isaranto and jdlrobson and jdrewniak: Continuing with sync
- 21:15 jdrewniak@deploy2002: zoranzoki21 and isaranto and jdlrobson and jdrewniak: Backport for gerrit:980951 Enable Vector beta feature for all wikis (T351339), gerrit:981337 [beta] ores-extension: enable revertrisk model for enwiki (T348298), gerrit:976911 Enable action blocks in Serbian Wikipedia (T351873) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:13 jdrewniak@deploy2002: Started scap: Backport for gerrit:980951 Enable Vector beta feature for all wikis (T351339), gerrit:981337 [beta] ores-extension: enable revertrisk model for enwiki (T348298), gerrit:976911 Enable action blocks in Serbian Wikipedia (T351873)
- 21:06 otto@deploy2002: Synchronized wmf-config/ext-EventStreamConfig.php: Config: gerrit:977075 Remove eventlogging_FeaturePolicyViolation and _SpecialMuteSubmit EventStreamConfig (T329718) (duration: 09m 16s)
- 21:02 dcausse: restarting blazegraph on wdqs2017 (BlazegraphFreeAllocatorsDecreasingRapidly)
- 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P54288 and previous config saved to /var/cache/conftool/dbconfig/20231207-205817-ladsgroup.json
- 20:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 20:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P54287 and previous config saved to /var/cache/conftool/dbconfig/20231207-205753-ladsgroup.json
- 20:56 otto@deploy2002: Synchronized wmf-config/ext-EventLogging.php: Config: gerrit:977075 Remove eventlogging_FeaturePolicyViolation and _SpecialMuteSubmit EventLogging config (T329718) (duration: 07m 07s)
- 20:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P54286 and previous config saved to /var/cache/conftool/dbconfig/20231207-204247-ladsgroup.json
- 20:30 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp4037.ulsfo.wmnet
- 20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P54285 and previous config saved to /var/cache/conftool/dbconfig/20231207-202740-ladsgroup.json
- 20:27 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 20:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P54283 and previous config saved to /var/cache/conftool/dbconfig/20231207-201234-ladsgroup.json
- 20:06 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1082.eqiad.wmnet with reason: host reimage
- 20:05 urandom: bootstrap Cassandra/restbase2030-a — T352468
- 20:02 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1082.eqiad.wmnet with reason: host reimage
- 20:01 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 20:01 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 19:59 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 19:59 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 19:49 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1082.eqiad.wmnet with OS bullseye
- 19:38 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 19:38 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 19:35 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: graph split experiments T350106
- 19:35 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: graph split experiments T350106
- 19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P54282 and previous config saved to /var/cache/conftool/dbconfig/20231207-192949-ladsgroup.json
- 19:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 19:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P54281 and previous config saved to /var/cache/conftool/dbconfig/20231207-192926-ladsgroup.json
- 19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P54280 and previous config saved to /var/cache/conftool/dbconfig/20231207-191420-ladsgroup.json
- 18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P54279 and previous config saved to /var/cache/conftool/dbconfig/20231207-185913-ladsgroup.json
- 18:45 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 18:45 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P54278 and previous config saved to /var/cache/conftool/dbconfig/20231207-184406-ladsgroup.json
- 18:42 mutante: puppetmaster1001 - revoke cert for miscweb.discovery.wmnet
- 18:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 18:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 18:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 18:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P54277 and previous config saved to /var/cache/conftool/dbconfig/20231207-180427-ladsgroup.json
- 18:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 18:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 18:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 18:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 17:58 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs1024.eqiad.wmnet
- 17:57 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs1024.eqiad.wmnet
- 17:40 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
- 17:40 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
- 17:39 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
- 17:38 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
- 17:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
- 17:09 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:09 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cleanup logstash/kibana records T299700 - herron@cumin1001"
- 17:08 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cleanup logstash/kibana records T299700 - herron@cumin1001"
- 17:05 herron@cumin1001: START - Cookbook sre.dns.netbox
- 16:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 16:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 16:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
- 16:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
- 16:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
- 16:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
- 16:38 brouberol@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
- 16:27 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 16:27 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 16:26 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 16:26 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 16:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 16:24 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 16:24 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 16:23 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 16:09 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 16:09 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 16:02 sukhe: run dummy authdns-update on dns6001
- 16:00 milimetric@deploy2002: Finished deploy [analytics/refinery@8b8f178] (thin): hotfix: sqoop (duration: 00m 07s)
- 16:00 milimetric@deploy2002: Started deploy [analytics/refinery@8b8f178] (thin): hotfix: sqoop
- 15:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54274 and previous config saved to /var/cache/conftool/dbconfig/20231207-155712-arnaudb.json
- 15:55 milimetric@deploy2002: Finished deploy [analytics/refinery@8b8f178]: hotfix: sqoop (duration: 10m 08s)
- 15:53 sukhe: running authdns-update with broken resolv.conf on dns6001
- 15:48 sukhe: clear out dns6001 resolv.conf to check for SSH config-based authdns-update
- 15:45 milimetric@deploy2002: Started deploy [analytics/refinery@8b8f178]: hotfix: sqoop
- 15:45 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 15:44 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 15:44 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 15:44 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 15:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P54273 and previous config saved to /var/cache/conftool/dbconfig/20231207-154205-arnaudb.json
- 15:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2006.codfw.wmnet with OS bullseye
- 15:36 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2005.codfw.wmnet with OS bullseye
- 15:29 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
- 15:28 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
- 15:28 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
- 15:27 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
- 15:27 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
- 15:27 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
- 15:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P54272 and previous config saved to /var/cache/conftool/dbconfig/20231207-152659-arnaudb.json
- 15:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
- 15:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cp4037.ulsfo.wmnet
- 15:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54271 and previous config saved to /var/cache/conftool/dbconfig/20231207-151152-arnaudb.json
- 15:08 klausman@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
- 15:08 klausman@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
- 15:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54270 and previous config saved to /var/cache/conftool/dbconfig/20231207-150750-arnaudb.json
- 15:07 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
- 15:07 klausman@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
- 15:07 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
- 15:07 klausman@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
- 15:06 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 15:06 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 15:04 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 15:03 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 15:02 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 15:01 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
- 15:01 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 15:00 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
- 14:53 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 14:53 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 14:53 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 14:53 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 14:53 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
- 14:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
- 14:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
- 14:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
- 14:48 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
- 14:48 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
- 14:41 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp4037.ulsfo.wmnet
- 14:32 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 14:31 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 14:30 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 14:29 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 14:26 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 14:26 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 14:26 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 13:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
- 13:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
- 13:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 13:52 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 13:49 ladsgroup@deploy2002: Finished scap: Backport for gerrit:980483 api: Only force backlink namespace index when there is one ns only (T351237) (duration: 10m 55s)
- 13:42 ladsgroup@deploy2002: jforrester and ladsgroup: Continuing with sync
- 13:40 ladsgroup@deploy2002: jforrester and ladsgroup: Backport for gerrit:980483 api: Only force backlink namespace index when there is one ns only (T351237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:38 ladsgroup@deploy2002: Started scap: Backport for gerrit:980483 api: Only force backlink namespace index when there is one ns only (T351237)
- 13:34 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 13:34 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 13:34 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 13:33 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 13:32 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 13:32 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 13:31 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 13:31 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 13:27 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 13:27 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 13:25 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
- 13:25 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 13:25 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 13:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: sync
- 13:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 13:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 13:19 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 13:18 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 13:10 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
- 13:09 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 13:09 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
- 13:09 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
- 13:09 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 13:08 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 13:07 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
- 13:07 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
- 12:52 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 12:52 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 12:48 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 12:48 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 12:47 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 12:47 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 12:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1001.eqiad.wmnet
- 12:18 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 12:18 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 12:17 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 12:17 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 12:17 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 12:16 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 12:13 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1001.eqiad.wmnet
- 11:51 btullis@deploy2002: Finished deploy [analytics/refinery@b6499b1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b6499b17] (duration: 03m 17s)
- 11:48 btullis@deploy2002: Started deploy [analytics/refinery@b6499b1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b6499b17]
- 11:33 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 11:33 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 11:30 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 11:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 11:30 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
- 11:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
- 11:17 klausman@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
- 11:17 klausman@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
- 11:14 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 11:14 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 11:13 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 11:13 klausman@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
- 11:12 klausman@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
- 11:10 aikochou@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 11:10 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
- 11:01 brouberol@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
- 10:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
- 10:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
- 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: cluster::management
- 10:53 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
- 10:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 10:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 10:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
- 10:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
- 10:45 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: cluster::management
- 10:38 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 10:38 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 10:35 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
- 10:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
- 10:34 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 10:34 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 10:33 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 10:33 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 10:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
- 10:32 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
- 10:27 brouberol@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
- 10:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 10:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 10:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 10:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 09:42 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 09:42 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 09:41 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 09:40 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 09:40 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 09:39 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 31 days, 0:00:00 on sretest1001.eqiad.wmnet with reason: WIP nftables
- 08:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 31 days, 0:00:00 on sretest1001.eqiad.wmnet with reason: WIP nftables
- 08:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4005.wikimedia.org
- 08:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4005.wikimedia.org
- 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
- 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
- 06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1119.eqiad.wmnet
- 06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1119.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
- 06:52 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1119.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
- 06:50 marostegui@cumin1001: START - Cookbook sre.dns.netbox
- 06:44 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1119.eqiad.wmnet
- 06:35 marostegui: Failover m5-master from dbproxy1021 to dbproxy1027 T351864
- 00:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1082.eqiad.wmnet with OS bullseye
- 00:53 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1081.eqiad.wmnet with OS bullseye
- 00:53 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1080.eqiad.wmnet with OS bullseye
- 00:53 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:53 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
2023-12-06
- 23:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1082.eqiad.wmnet with OS bullseye
- 23:47 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 23:42 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 23:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1081.eqiad.wmnet with reason: host reimage
- 23:23 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1080.eqiad.wmnet with reason: host reimage
- 23:20 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1081.eqiad.wmnet with reason: host reimage
- 23:19 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1080.eqiad.wmnet with reason: host reimage
- 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 34 days, 0:00:00 on wdqs1024.eqiad.wmnet with reason: T352878
- 23:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 34 days, 0:00:00 on wdqs1024.eqiad.wmnet with reason: T352878
- 22:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1081.eqiad.wmnet with OS bullseye
- 22:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1082.eqiad.wmnet with OS bullseye
- 22:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1080.eqiad.wmnet with OS bullseye
- 22:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1080']
- 22:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1082']
- 22:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1081']
- 22:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1080']
- 22:43 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1081']
- 22:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1081']
- 22:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1080']
- 22:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1082']
- 22:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1081']
- 22:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1080']
- 22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1082.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1080.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1081.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:56 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1082.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1080.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:51 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1081.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:51 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:50 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
- 21:50 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
- 21:47 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 21:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1082.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:43 urbanecm@deploy2002: Finished scap: Backport for gerrit:980477 Correct links to beta feature (T352826), gerrit:980517 Beta Features: Allow Vector 2022 typography feature (T351339) (duration: 10m 51s)
- 21:36 urbanecm@deploy2002: urbanecm and jdlrobson: Continuing with sync
- 21:35 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1082.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:35 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:35 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
- 21:34 urbanecm@deploy2002: urbanecm and jdlrobson: Backport for gerrit:980477 Correct links to beta feature (T352826), gerrit:980517 Beta Features: Allow Vector 2022 typography feature (T351339) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:34 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
- 21:33 urbanecm@deploy2002: Started scap: Backport for gerrit:980477 Correct links to beta feature (T352826), gerrit:980517 Beta Features: Allow Vector 2022 typography feature (T351339)
- 21:32 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 21:31 urbanecm@deploy2002: Finished scap: Backport for gerrit:980920 DiscussionTools: Rename config (duration: 10m 01s)
- 21:25 urbanecm@deploy2002: esanders and urbanecm: Continuing with sync
- 21:22 urbanecm@deploy2002: esanders and urbanecm: Backport for gerrit:980920 DiscussionTools: Rename config synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:21 urbanecm@deploy2002: Started scap: Backport for gerrit:980920 DiscussionTools: Rename config
- 21:20 urbanecm@deploy2002: Finished scap: Backport for gerrit:978531 Enable DT visual enhancements on pages with __NEWSECTIONLINK__ (T352232) (duration: 10m 43s)
- 21:13 urbanecm@deploy2002: urbanecm and esanders: Continuing with sync
- 21:11 urbanecm@deploy2002: urbanecm and esanders: Backport for gerrit:978531 Enable DT visual enhancements on pages with __NEWSECTIONLINK__ (T352232) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:09 urbanecm@deploy2002: Started scap: Backport for gerrit:978531 Enable DT visual enhancements on pages with __NEWSECTIONLINK__ (T352232)
- 20:55 ejegg: fundraising civicrm upgraded from 6ca683b2 to 8c107215
- 19:07 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host wdqs1024.eqiad.wmnet
- 18:55 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1024.eqiad.wmnet
- 18:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1024.eqiad.wmnet with reason: T352878
- 18:53 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1024.eqiad.wmnet with reason: T352878
- 18:18 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2001.codfw.wmnet with OS bullseye
- 18:02 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
- 17:47 ejegg: standalone SmashPig upgraded from 83d509ed to fc74ccca
- 17:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
- 17:34 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 17:17 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp4037.ulsfo.wmnet
- 17:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4037.ulsfo.wmnet with OS bullseye
- 17:06 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp4037.ulsfo.wmnet
- 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cp4037.ulsfo.wmnet
- 17:05 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp4037.ulsfo.wmnet
- 16:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bullseye
- 16:51 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 16:41 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
- 16:40 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
- 16:40 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
- 16:40 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
- 16:29 urandom: bootstrapping Cassandra/restbase2020-a — T352468
- 16:07 milimetric@deploy2002: Finished deploy [airflow-dags/platform_eng@db1cb48]: in order to run the querypage job (duration: 01m 28s)
- 16:05 milimetric@deploy2002: Started deploy [airflow-dags/platform_eng@db1cb48]: in order to run the querypage job
- 15:56 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: sync
- 15:56 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: sync
- 15:52 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:51 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
- 15:48 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:48 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 15:47 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:46 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 15:46 urandom: restarting Cassandra on aqs2001-{a,b,c} (testing puppet 7 migration)
- 15:41 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: sessionstore
- 15:39 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:39 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
- 15:38 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
- 15:38 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:38 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:37 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:35 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:34 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: sessionstore
- 15:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2001.codfw.wmnet with OS bullseye
- 15:32 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bullseye
- 15:30 jforrester@deploy2002: Finished scap: Backport for gerrit:980512 Beta Features: Move ULS Compact Links to only the wikis it's enabled on, gerrit:980883 Beta Features: Drop Popups, deployed everywhere for ages (duration: 11m 33s)
- 15:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd2003.mgmt.codfw.wmnet with reboot policy FORCED
- 15:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
- 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd2001.mgmt.codfw.wmnet with reboot policy FORCED
- 15:28 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
- 15:28 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2003.mgmt.codfw.wmnet with reboot policy FORCED
- 15:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
- 15:28 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
- 15:28 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2001.mgmt.codfw.wmnet with reboot policy FORCED
- 15:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cephosd2003.mgmt.codfw.wmnet with reboot policy FORCED
- 15:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
- 15:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cephosd2001.mgmt.codfw.wmnet with reboot policy FORCED
- 15:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: restbase::production
- 15:23 sukhe: depool cp4037 for reimage testing: T350179
- 15:23 jforrester@deploy2002: jforrester: Continuing with sync
- 15:21 jforrester@deploy2002: jforrester: Backport for gerrit:980512 Beta Features: Move ULS Compact Links to only the wikis it's enabled on, gerrit:980883 Beta Features: Drop Popups, deployed everywhere for ages synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2003.mgmt.codfw.wmnet with reboot policy FORCED
- 15:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
- 15:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2001.mgmt.codfw.wmnet with reboot policy FORCED
- 15:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['testhost2001']
- 15:19 jforrester@deploy2002: Started scap: Backport for gerrit:980512 Beta Features: Move ULS Compact Links to only the wikis it's enabled on, gerrit:980883 Beta Features: Drop Popups, deployed everywhere for ages
- 15:14 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['testhost2001']
- 15:04 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: restbase::production
- 15:02 moritzm: installing mariadb bugfix updates from Bookworm point release (as packaged in Debian, unrelated to wmf-mariadb packages)
- 14:43 moritzm: installing debian-archive-keyring updates from Bookworm point release
- 14:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dnsbox
- 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:32 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:23 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dnsbox
- 14:21 fabfur: repooling cp4052 after reimage (bookworm -> bullseye) due to possible impacting T352744
- 13:48 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 13:48 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 13:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4052.ulsfo.wmnet
- 13:45 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 13:45 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 13:37 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1001.eqiad.wmnet with OS bookworm
- 13:37 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1001"
- 13:20 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4052.ulsfo.wmnet
- 13:12 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@bfd944e]: Add metrics configuration TEST [airflow-dags@bfd944e4] (duration: 00m 11s)
- 13:12 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@bfd944e]: Add metrics configuration TEST [airflow-dags@bfd944e4]
- 13:08 moritzm: installing systemd bugfix updates from Bookworm point release
- 12:52 mvernon@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1001"
- 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4041.ulsfo.wmnet
- 12:41 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4041.ulsfo.wmnet
- 12:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
- 12:33 moritzm: installing pam bugfix updates from Bookworm point release
- 12:30 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
- 12:15 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1001.eqiad.wmnet with OS bookworm
- 11:48 hnowlan: rollback changeprop-jobqueue
- 11:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: druid::analytics::worker
- 11:43 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:42 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 11:41 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:40 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 11:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: druid::analytics::worker
- 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4044.ulsfo.wmnet
- 11:16 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4044.ulsfo.wmnet
- 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4050.ulsfo.wmnet
- 10:38 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4050.ulsfo.wmnet
- 10:26 moritzm: installing gtk+3.0 bug fix updates from Bookworm point release
- 08:49 godog: test rsyslog version from bullseye-backports on centrallog - T351710
- 08:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54264 and previous config saved to /var/cache/conftool/dbconfig/20231206-084928-arnaudb.json
- 08:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P54263 and previous config saved to /var/cache/conftool/dbconfig/20231206-083422-arnaudb.json
- 08:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P54262 and previous config saved to /var/cache/conftool/dbconfig/20231206-081915-arnaudb.json
- 08:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4047.ulsfo.wmnet
- 08:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54261 and previous config saved to /var/cache/conftool/dbconfig/20231206-080409-arnaudb.json
- 07:54 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4047.ulsfo.wmnet
- 07:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54260 and previous config saved to /var/cache/conftool/dbconfig/20231206-075333-arnaudb.json
- 07:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
- 07:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
- 07:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T348183)', diff saved to https://phabricator.wikimedia.org/P54259 and previous config saved to /var/cache/conftool/dbconfig/20231206-075309-arnaudb.json
- 07:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P54258 and previous config saved to /var/cache/conftool/dbconfig/20231206-073803-arnaudb.json
- 07:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P54257 and previous config saved to /var/cache/conftool/dbconfig/20231206-072256-arnaudb.json
- 07:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T348183)', diff saved to https://phabricator.wikimedia.org/P54256 and previous config saved to /var/cache/conftool/dbconfig/20231206-070749-arnaudb.json
- 06:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T348183)', diff saved to https://phabricator.wikimedia.org/P54255 and previous config saved to /var/cache/conftool/dbconfig/20231206-062922-arnaudb.json
- 06:29 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 06:29 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 06:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T348183)', diff saved to https://phabricator.wikimedia.org/P54254 and previous config saved to /var/cache/conftool/dbconfig/20231206-062859-arnaudb.json
- 06:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P54252 and previous config saved to /var/cache/conftool/dbconfig/20231206-061352-arnaudb.json
- 05:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P54251 and previous config saved to /var/cache/conftool/dbconfig/20231206-055846-arnaudb.json
- 05:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T348183)', diff saved to https://phabricator.wikimedia.org/P54250 and previous config saved to /var/cache/conftool/dbconfig/20231206-054339-arnaudb.json
- 05:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T348183)', diff saved to https://phabricator.wikimedia.org/P54249 and previous config saved to /var/cache/conftool/dbconfig/20231206-053321-arnaudb.json
- 05:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
- 05:33 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
- 05:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T348183)', diff saved to https://phabricator.wikimedia.org/P54248 and previous config saved to /var/cache/conftool/dbconfig/20231206-053256-arnaudb.json
- 05:19 denisse@deploy2002: Finished deploy [librenms/librenms@f049593]: Upgrade T351616 (duration: 00m 09s)
- 05:19 denisse@deploy2002: Started deploy [librenms/librenms@f049593]: Upgrade T351616
- 05:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P54247 and previous config saved to /var/cache/conftool/dbconfig/20231206-051750-arnaudb.json
- 05:09 ejegg: fundraising civicrm upgraded from 6bb8a67f to 6ca683b2
- 05:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P54246 and previous config saved to /var/cache/conftool/dbconfig/20231206-050243-arnaudb.json
- 04:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T348183)', diff saved to https://phabricator.wikimedia.org/P54245 and previous config saved to /var/cache/conftool/dbconfig/20231206-044737-arnaudb.json
- 04:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T348183)', diff saved to https://phabricator.wikimedia.org/P54244 and previous config saved to /var/cache/conftool/dbconfig/20231206-043718-arnaudb.json
- 04:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 04:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 04:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 04:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 04:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54243 and previous config saved to /var/cache/conftool/dbconfig/20231206-043638-arnaudb.json
- 04:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P54242 and previous config saved to /var/cache/conftool/dbconfig/20231206-042132-arnaudb.json
- 04:14 ejegg: standalone (payments listener) SmashPig upgraded from f24afba3 to 83d509ed
- 04:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P54241 and previous config saved to /var/cache/conftool/dbconfig/20231206-040625-arnaudb.json
- 03:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54240 and previous config saved to /var/cache/conftool/dbconfig/20231206-035119-arnaudb.json
- 03:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54239 and previous config saved to /var/cache/conftool/dbconfig/20231206-034045-arnaudb.json
- 03:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 03:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 03:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54238 and previous config saved to /var/cache/conftool/dbconfig/20231206-034022-arnaudb.json
- 03:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P54237 and previous config saved to /var/cache/conftool/dbconfig/20231206-032516-arnaudb.json
- 03:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P54236 and previous config saved to /var/cache/conftool/dbconfig/20231206-031009-arnaudb.json
- 02:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54235 and previous config saved to /var/cache/conftool/dbconfig/20231206-025503-arnaudb.json
- 02:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54234 and previous config saved to /var/cache/conftool/dbconfig/20231206-024108-arnaudb.json
- 02:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
- 02:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
- 02:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T348183)', diff saved to https://phabricator.wikimedia.org/P54233 and previous config saved to /var/cache/conftool/dbconfig/20231206-024045-arnaudb.json
- 02:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P54232 and previous config saved to /var/cache/conftool/dbconfig/20231206-022538-arnaudb.json
- 02:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P54231 and previous config saved to /var/cache/conftool/dbconfig/20231206-021031-arnaudb.json
- 02:08 eileen: civicrm upgraded from 7fb98ee8 to 6bb8a67f
- 02:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 02:00 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 01:59 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 01:58 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 01:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T348183)', diff saved to https://phabricator.wikimedia.org/P54230 and previous config saved to /var/cache/conftool/dbconfig/20231206-015519-arnaudb.json
- 01:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 01:51 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 01:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T348183)', diff saved to https://phabricator.wikimedia.org/P54229 and previous config saved to /var/cache/conftool/dbconfig/20231206-014506-arnaudb.json
- 01:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 01:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 01:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T348183)', diff saved to https://phabricator.wikimedia.org/P54228 and previous config saved to /var/cache/conftool/dbconfig/20231206-014443-arnaudb.json
- 01:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 01:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
- 01:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
- 01:40 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 01:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
- 01:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
- 01:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
- 01:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
- 01:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
- 01:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
- 01:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 01:29 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 01:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P54227 and previous config saved to /var/cache/conftool/dbconfig/20231206-012936-arnaudb.json
- 01:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2003.codfw.wmnet with OS bullseye
- 01:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
- 01:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bullseye
- 01:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
- 01:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
- 01:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
- 01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
- 01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
- 01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
- 01:21 eileen: civicrm upgraded from d8238788 to 7fb98ee8
- 01:20 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 01:20 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
- 01:17 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 01:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P54226 and previous config saved to /var/cache/conftool/dbconfig/20231206-011430-arnaudb.json
- 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
- 01:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
- 01:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
- 01:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
- 01:03 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 01:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 00:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 00:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T348183)', diff saved to https://phabricator.wikimedia.org/P54225 and previous config saved to /var/cache/conftool/dbconfig/20231206-005923-arnaudb.json
- 00:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T348183)', diff saved to https://phabricator.wikimedia.org/P54224 and previous config saved to /var/cache/conftool/dbconfig/20231206-004820-arnaudb.json
- 00:48 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 00:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 00:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T348183)', diff saved to https://phabricator.wikimedia.org/P54223 and previous config saved to /var/cache/conftool/dbconfig/20231206-004756-arnaudb.json
- 00:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P54222 and previous config saved to /var/cache/conftool/dbconfig/20231206-003249-arnaudb.json
- 00:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P54221 and previous config saved to /var/cache/conftool/dbconfig/20231206-001742-arnaudb.json
- 00:17 ejegg: civicrm upgraded from 297a091d to d8238788
- 00:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T348183)', diff saved to https://phabricator.wikimedia.org/P54220 and previous config saved to /var/cache/conftool/dbconfig/20231206-000236-arnaudb.json
2023-12-05
- 23:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T348183)', diff saved to https://phabricator.wikimedia.org/P54219 and previous config saved to /var/cache/conftool/dbconfig/20231205-235213-arnaudb.json
- 23:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 23:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 23:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 23:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 23:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T348183)', diff saved to https://phabricator.wikimedia.org/P54218 and previous config saved to /var/cache/conftool/dbconfig/20231205-234425-arnaudb.json
- 23:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P54217 and previous config saved to /var/cache/conftool/dbconfig/20231205-232918-arnaudb.json
- 23:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P54216 and previous config saved to /var/cache/conftool/dbconfig/20231205-231412-arnaudb.json
- 22:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T348183)', diff saved to https://phabricator.wikimedia.org/P54215 and previous config saved to /var/cache/conftool/dbconfig/20231205-225905-arnaudb.json
- 22:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T348183)', diff saved to https://phabricator.wikimedia.org/P54214 and previous config saved to /var/cache/conftool/dbconfig/20231205-224838-arnaudb.json
- 22:48 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
- 22:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
- 22:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T348183)', diff saved to https://phabricator.wikimedia.org/P54213 and previous config saved to /var/cache/conftool/dbconfig/20231205-224816-arnaudb.json
- 22:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P54212 and previous config saved to /var/cache/conftool/dbconfig/20231205-223309-arnaudb.json
- 22:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P54211 and previous config saved to /var/cache/conftool/dbconfig/20231205-221803-arnaudb.json
- 22:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T348183)', diff saved to https://phabricator.wikimedia.org/P54210 and previous config saved to /var/cache/conftool/dbconfig/20231205-220256-arnaudb.json
- 22:01 jforrester@deploy2002: Finished scap: Backport for gerrit:977785 Define the corresponding stream for scroll (T350883), gerrit:978947 Add stream config for *webuiactions via Metrics Platform (T351298) (duration: 19m 01s)
- 21:53 jforrester@deploy2002: ksarabia and jforrester and cjming: Continuing with sync
- 21:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T348183)', diff saved to https://phabricator.wikimedia.org/P54209 and previous config saved to /var/cache/conftool/dbconfig/20231205-215135-arnaudb.json
- 21:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
- 21:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
- 21:43 jforrester@deploy2002: ksarabia and jforrester and cjming: Backport for gerrit:977785 Define the corresponding stream for scroll (T350883), gerrit:978947 Add stream config for *webuiactions via Metrics Platform (T351298) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
- 21:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
- 21:42 jforrester@deploy2002: Started scap: Backport for gerrit:977785 Define the corresponding stream for scroll (T350883), gerrit:978947 Add stream config for *webuiactions via Metrics Platform (T351298)
- 21:40 jforrester@deploy2002: Finished scap: Backport for gerrit:979704 [Zebra] Make .vector-column-start cache compatible (T347712 T351830), gerrit:980467 Fix nonzebra sticky container scrolling behavior and scrollable indicator (T352464) (duration: 12m 50s)
- 21:35 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
- 21:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
- 21:34 jforrester@deploy2002: jdlrobson and jforrester and jdrewniak: Continuing with sync
- 21:30 jforrester@deploy2002: jdlrobson and jforrester and jdrewniak: Backport for gerrit:979704 [Zebra] Make .vector-column-start cache compatible (T347712 T351830), gerrit:980467 Fix nonzebra sticky container scrolling behavior and scrollable indicator (T352464) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:27 jforrester@deploy2002: Started scap: Backport for gerrit:979704 [Zebra] Make .vector-column-start cache compatible (T347712 T351830), gerrit:980467 Fix nonzebra sticky container scrolling behavior and scrollable indicator (T352464)
- 21:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
- 21:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
- 21:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T348183)', diff saved to https://phabricator.wikimedia.org/P54208 and previous config saved to /var/cache/conftool/dbconfig/20231205-212707-arnaudb.json
- 21:27 jforrester@deploy2002: Finished scap: Backport for gerrit:980028 Deploy VectorClientPreferences to beta on pl,fr,ca,fa,tr wikis (T351339) (duration: 13m 44s)
- 21:19 jforrester@deploy2002: bwang and jforrester: Continuing with sync
- 21:13 jforrester@deploy2002: Started scap: Backport for gerrit:980028 Deploy VectorClientPreferences to beta on pl,fr,ca,fa,tr wikis (T351339)
- 21:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P54207 and previous config saved to /var/cache/conftool/dbconfig/20231205-211200-arnaudb.json
- 21:11 jforrester@deploy2002: Finished scap: Backport for gerrit:972263 Revert "Do not try to use Thumbor on beta" (T344605), gerrit:980009 nlwikivoyage: Drop Listings extension (T352696), gerrit:980047 Drop Listings extension from Wikivoyages where unused (T352719) (duration: 08m 45s)
- 21:04 jforrester@deploy2002: tgr and jforrester: Continuing with sync
- 21:04 jforrester@deploy2002: tgr and jforrester: Backport for gerrit:972263 Revert "Do not try to use Thumbor on beta" (T344605), gerrit:980009 nlwikivoyage: Drop Listings extension (T352696), gerrit:980047 Drop Listings extension from Wikivoyages where unused (T352719) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:02 jforrester@deploy2002: Started scap: Backport for gerrit:972263 Revert "Do not try to use Thumbor on beta" (T344605), gerrit:980009 nlwikivoyage: Drop Listings extension (T352696), gerrit:980047 Drop Listings extension from Wikivoyages where unused (T352719)
- 20:58 inflatador: bking@prometheus1006 disable puppet for troubleshooting T347355
- 20:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P54206 and previous config saved to /var/cache/conftool/dbconfig/20231205-205654-arnaudb.json
- 20:53 inflatador: bking@prometheus1006 reload prometheus-blackbox service T347355
- 20:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T348183)', diff saved to https://phabricator.wikimedia.org/P54205 and previous config saved to /var/cache/conftool/dbconfig/20231205-204147-arnaudb.json
- 20:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1219 (T348183)', diff saved to https://phabricator.wikimedia.org/P54204 and previous config saved to /var/cache/conftool/dbconfig/20231205-203158-arnaudb.json
- 20:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
- 20:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
- 20:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T348183)', diff saved to https://phabricator.wikimedia.org/P54203 and previous config saved to /var/cache/conftool/dbconfig/20231205-203136-arnaudb.json
- 20:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P54202 and previous config saved to /var/cache/conftool/dbconfig/20231205-201629-arnaudb.json
- 20:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P54201 and previous config saved to /var/cache/conftool/dbconfig/20231205-200123-arnaudb.json
- 19:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T348183)', diff saved to https://phabricator.wikimedia.org/P54200 and previous config saved to /var/cache/conftool/dbconfig/20231205-194616-arnaudb.json
- 19:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1218 (T348183)', diff saved to https://phabricator.wikimedia.org/P54199 and previous config saved to /var/cache/conftool/dbconfig/20231205-193627-arnaudb.json
- 19:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
- 19:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
- 19:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T348183)', diff saved to https://phabricator.wikimedia.org/P54198 and previous config saved to /var/cache/conftool/dbconfig/20231205-193604-arnaudb.json
- 19:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P54197 and previous config saved to /var/cache/conftool/dbconfig/20231205-192057-arnaudb.json
- 19:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P54196 and previous config saved to /var/cache/conftool/dbconfig/20231205-190551-arnaudb.json
- 18:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T348183)', diff saved to https://phabricator.wikimedia.org/P54195 and previous config saved to /var/cache/conftool/dbconfig/20231205-185044-arnaudb.json
- 18:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1207 (T348183)', diff saved to https://phabricator.wikimedia.org/P54194 and previous config saved to /var/cache/conftool/dbconfig/20231205-184108-arnaudb.json
- 18:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
- 18:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
- 18:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54193 and previous config saved to /var/cache/conftool/dbconfig/20231205-184045-arnaudb.json
- 18:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P54192 and previous config saved to /var/cache/conftool/dbconfig/20231205-182539-arnaudb.json
- 18:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS bullseye
- 18:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P54191 and previous config saved to /var/cache/conftool/dbconfig/20231205-181032-arnaudb.json
- 17:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54190 and previous config saved to /var/cache/conftool/dbconfig/20231205-175526-arnaudb.json
- 17:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
- 17:49 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
- 17:46 vgutierrez: rolling restart of text|secondary LVS on drmrs effectively enabling IPIP encapsulation for ncredir@drmrs - T351069
- 17:29 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 17:29 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 17:29 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 17:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
- 17:28 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 17:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['testhost2001']
- 17:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['testhost2001']
- 17:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['testhost2001']
- 17:11 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bullseye
- 17:00 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1002.eqiad.wmnet with OS bookworm
- 16:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54189 and previous config saved to /var/cache/conftool/dbconfig/20231205-165503-arnaudb.json
- 16:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 16:54 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 16:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T348183)', diff saved to https://phabricator.wikimedia.org/P54188 and previous config saved to /var/cache/conftool/dbconfig/20231205-165439-arnaudb.json
- 16:52 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
- 16:52 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bullseye
- 16:47 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
- 16:42 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['testhost2001']
- 16:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host testhost2001.mgmt.codfw.wmnet with reboot policy FORCED
- 16:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P54187 and previous config saved to /var/cache/conftool/dbconfig/20231205-163933-arnaudb.json
- 16:37 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1002.eqiad.wmnet with reason: host reimage
- 16:34 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1002.eqiad.wmnet with reason: host reimage
- 16:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P54186 and previous config saved to /var/cache/conftool/dbconfig/20231205-162426-arnaudb.json
- 16:24 claime: Rolling back k8s-ingress-dse - restarting pybal on lvs1019 - T352639
- 16:18 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 16:18 claime: Rolling back k8s-ingress-dse - restarting pybal on lvs1020 - T352639
- 16:18 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 16:18 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 16:17 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 16:14 samtar@deploy2002: Finished scap: Backport for gerrit:959327 .well-known: Add F-Droid signature to assetlinks.json (T346951) (duration: 07m 53s)
- 16:11 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
- 16:09 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
- 16:09 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
- 16:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T348183)', diff saved to https://phabricator.wikimedia.org/P54185 and previous config saved to /var/cache/conftool/dbconfig/20231205-160920-arnaudb.json
- 16:09 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
- 16:08 samtar@deploy2002: samtar: Continuing with sync
- 16:08 samtar@deploy2002: samtar: Backport for gerrit:959327 .well-known: Add F-Droid signature to assetlinks.json (T346951) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:07 samtar@deploy2002: Started scap: Backport for gerrit:959327 .well-known: Add F-Droid signature to assetlinks.json (T346951)
- 16:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host testhost2001.mgmt.codfw.wmnet with reboot policy FORCED
- 16:00 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding testhost2001 to codfw - jhancock@cumin2002"
- 15:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding testhost2001 to codfw - jhancock@cumin2002"
- 15:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T348183)', diff saved to https://phabricator.wikimedia.org/P54184 and previous config saved to /var/cache/conftool/dbconfig/20231205-155858-arnaudb.json
- 15:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 15:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 15:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
- 15:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
- 15:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T348183)', diff saved to https://phabricator.wikimedia.org/P54183 and previous config saved to /var/cache/conftool/dbconfig/20231205-155814-arnaudb.json
- 15:57 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:56 hnowlan@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
- 15:56 hnowlan@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
- 15:56 hnowlan@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
- 15:56 hnowlan@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
- 15:53 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4040.ulsfo.wmnet
- 15:49 claime: sudo confctl select "service=kubesvc,cluster=dse-k8s" set/pooled=inactive - T352639
- 15:45 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4040.ulsfo.wmnet
- 15:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P54182 and previous config saved to /var/cache/conftool/dbconfig/20231205-154308-arnaudb.json
- 15:42 moritzm: installing monitoring-plugins bugfix updates from Bookworm point release
- 15:42 claime: Manually restarting pybal on lvs1020 - T352639
- 15:39 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1002.eqiad.wmnet with OS bookworm
- 15:31 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1471.eqiad.wmnet with OS bullseye
- 15:29 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sessionstore2005']
- 15:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2005']
- 15:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2005.mgmt.codfw.wmnet with reboot policy FORCED
- 15:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sessionstore2006']
- 15:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2006']
- 15:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2005.mgmt.codfw.wmnet with reboot policy FORCED
- 15:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P54181 and previous config saved to /var/cache/conftool/dbconfig/20231205-152801-arnaudb.json
- 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host aqs2001.codfw.wmnet
- 15:22 claime: Manually restarting pybal on lvs1019 - T352639
- 15:21 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 15:20 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 15:18 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 15:17 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 15:16 claime: Manually restarting pybal on lvs1020 - T352639
- 15:15 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
- 15:15 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host aqs2001.codfw.wmnet
- 15:15 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
- 15:13 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1471.eqiad.wmnet with reason: host reimage
- 15:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T348183)', diff saved to https://phabricator.wikimedia.org/P54180 and previous config saved to /var/cache/conftool/dbconfig/20231205-151255-arnaudb.json
- 15:12 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 15:11 cgoubert@cumin1001: END (FAIL) - Cookbook sre.loadbalancer.restart-pybal (exit_code=1) rolling-restart of pybal on P{lvs[1018,1020].eqiad.wmnet} and A:lvs (T352639)
- 15:11 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 15:10 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1471.eqiad.wmnet with reason: host reimage
- 15:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2005.mgmt.codfw.wmnet with reboot policy FORCED
- 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
- 15:06 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 15:06 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 15:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2006.mgmt.codfw.wmnet with reboot policy FORCED
- 15:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4043.ulsfo.wmnet
- 15:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T348183)', diff saved to https://phabricator.wikimedia.org/P54179 and previous config saved to /var/cache/conftool/dbconfig/20231205-150243-arnaudb.json
- 15:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
- 15:02 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
- 15:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T348183)', diff saved to https://phabricator.wikimedia.org/P54178 and previous config saved to /var/cache/conftool/dbconfig/20231205-150220-arnaudb.json
- 15:01 cgoubert@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[1018,1020].eqiad.wmnet} and A:lvs (T352639)
- 14:58 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
- 14:58 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1471.eqiad.wmnet with OS bullseye
- 14:57 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
- 14:57 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
- 14:57 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
- 14:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2006.mgmt.codfw.wmnet with reboot policy FORCED
- 14:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2005.mgmt.codfw.wmnet with reboot policy FORCED
- 14:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
- 14:54 brouberol: adding k8s-ingress-dse backend to LVS - T352639
- 14:52 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4043.ulsfo.wmnet
- 14:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P54177 and previous config saved to /var/cache/conftool/dbconfig/20231205-144714-arnaudb.json
- 14:45 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
- 14:45 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
- 14:44 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sessionstore2004-6 to codfw - jhancock@cumin2002"
- 14:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sessionstore2004-6 to codfw - jhancock@cumin2002"
- 14:41 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:41 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 14:41 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 14:40 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 14:40 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 14:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: redis::misc::master
- 14:38 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 14:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2002']
- 14:35 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 14:32 urbanecm@deploy2002: Finished scap: Backport for gerrit:979698 User impact: update quantizeViews to process small series of view data (T352349), gerrit:979700 Add maintenance script to import existing files to scan table (T350863), gerrit:979701 Only allow drawing and bitmap media types to be scanned (T352234) (duration: 08m 55s)
- 14:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P54176 and previous config saved to /var/cache/conftool/dbconfig/20231205-143207-arnaudb.json
- 14:30 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: redis::misc::master
- 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2002']
- 14:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
- 14:26 urbanecm@deploy2002: kharlan and urbanecm: Continuing with sync
- 14:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
- 14:25 urbanecm@deploy2002: kharlan and urbanecm: Backport for gerrit:979698 User impact: update quantizeViews to process small series of view data (T352349), gerrit:979700 Add maintenance script to import existing files to scan table (T350863), gerrit:979701 Only allow drawing and bitmap media types to be scanned (T352234) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
- 14:23 urbanecm@deploy2002: Started scap: Backport for gerrit:979698 User impact: update quantizeViews to process small series of view data (T352349), gerrit:979700 Add maintenance script to import existing files to scan table (T350863), gerrit:979701 Only allow drawing and bitmap media types to be scanned (T352234)
- 14:20 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T348183)', diff saved to https://phabricator.wikimedia.org/P54175 and previous config saved to /var/cache/conftool/dbconfig/20231205-141701-arnaudb.json
- 14:13 urbanecm@deploy2002: Finished scap: Backport for gerrit:980357 Growth: Enable Welcome survey user research for ar/en/es (T351266) (duration: 09m 33s)
- 14:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T348183)', diff saved to https://phabricator.wikimedia.org/P54174 and previous config saved to /var/cache/conftool/dbconfig/20231205-140742-arnaudb.json
- 14:07 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 14:07 urbanecm@deploy2002: urbanecm: Continuing with sync
- 14:07 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 14:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T348183)', diff saved to https://phabricator.wikimedia.org/P54173 and previous config saved to /var/cache/conftool/dbconfig/20231205-140720-arnaudb.json
- 14:06 urbanecm@deploy2002: urbanecm: Backport for gerrit:980357 Growth: Enable Welcome survey user research for ar/en/es (T351266) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:06 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
- 14:05 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
- 14:04 urbanecm@deploy2002: Started scap: Backport for gerrit:980357 Growth: Enable Welcome survey user research for ar/en/es (T351266)
- 14:03 moritzm: installing cups security updates
- 13:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P54172 and previous config saved to /var/cache/conftool/dbconfig/20231205-135213-arnaudb.json
- 13:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4048.ulsfo.wmnet
- 13:50 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1078.eqiad.wmnet with OS bullseye
- 13:50 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 13:48 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 13:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1079.eqiad.wmnet with OS bullseye
- 13:48 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 13:48 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1470.eqiad.wmnet with OS bullseye
- 13:44 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 13:43 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1465.eqiad.wmnet with OS bullseye
- 13:41 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4048.ulsfo.wmnet
- 13:38 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1464.eqiad.wmnet with OS bullseye
- 13:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P54171 and previous config saved to /var/cache/conftool/dbconfig/20231205-133706-arnaudb.json
- 13:30 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1470.eqiad.wmnet with reason: host reimage
- 13:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1078.eqiad.wmnet with reason: host reimage
- 13:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1076.eqiad.wmnet with OS bullseye
- 13:27 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 13:26 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1470.eqiad.wmnet with reason: host reimage
- 13:26 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 13:24 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1465.eqiad.wmnet with reason: host reimage
- 13:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1079.eqiad.wmnet with reason: host reimage
- 13:24 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1078.eqiad.wmnet with reason: host reimage
- 13:23 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1079.eqiad.wmnet with reason: host reimage
- 13:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T348183)', diff saved to https://phabricator.wikimedia.org/P54169 and previous config saved to /var/cache/conftool/dbconfig/20231205-132200-arnaudb.json
- 13:21 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1465.eqiad.wmnet with reason: host reimage
- 13:21 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1464.eqiad.wmnet with reason: host reimage
- 13:18 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1464.eqiad.wmnet with reason: host reimage
- 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: redis::misc::slave
- 13:14 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1470.eqiad.wmnet with OS bullseye
- 13:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T348183)', diff saved to https://phabricator.wikimedia.org/P54168 and previous config saved to /var/cache/conftool/dbconfig/20231205-131240-arnaudb.json
- 13:12 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
- 13:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
- 13:10 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
- 13:09 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1079.eqiad.wmnet with OS bullseye
- 13:08 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1465.eqiad.wmnet with OS bullseye
- 13:07 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1076.eqiad.wmnet with reason: host reimage
- 13:06 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2435.codfw.wmnet with OS bullseye
- 13:06 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1464.eqiad.wmnet with OS bullseye
- 13:04 cmooney@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:04 cmooney@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update entry for sretest2003. - cmooney@cumin2002"
- 13:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
- 13:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
- 13:04 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1076.eqiad.wmnet with reason: host reimage
- 13:04 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 13:04 cmooney@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update entry for sretest2003. - cmooney@cumin2002"
- 13:03 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 13:02 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1463.eqiad.wmnet with OS bullseye
- 12:59 cmooney@cumin2002: START - Cookbook sre.dns.netbox
- 12:58 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2434.codfw.wmnet with OS bullseye
- 12:57 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: redis::misc::slave
- 12:56 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
- 12:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
- 12:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T348183)', diff saved to https://phabricator.wikimedia.org/P54167 and previous config saved to /var/cache/conftool/dbconfig/20231205-125641-arnaudb.json
- 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4042.ulsfo.wmnet
- 12:50 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2424.codfw.wmnet with OS bullseye
- 12:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
- 12:47 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2435.codfw.wmnet with reason: host reimage
- 12:47 ladsgroup@deploy2002: Finished scap: Backport for gerrit:980370 Set migration of pagelinks on large wikis of s5 to read new (T351237) (duration: 12m 30s)
- 12:45 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 12:45 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2423.codfw.wmnet with OS bullseye
- 12:45 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 12:44 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1463.eqiad.wmnet with reason: host reimage
- 12:42 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2435.codfw.wmnet with reason: host reimage
- 12:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P54165 and previous config saved to /var/cache/conftool/dbconfig/20231205-124134-arnaudb.json
- 12:41 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1463.eqiad.wmnet with reason: host reimage
- 12:40 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 12:39 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2434.codfw.wmnet with reason: host reimage
- 12:37 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:980370 Set migration of pagelinks on large wikis of s5 to read new (T351237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:36 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2434.codfw.wmnet with reason: host reimage
- 12:34 ladsgroup@deploy2002: Started scap: Backport for gerrit:980370 Set migration of pagelinks on large wikis of s5 to read new (T351237)
- 12:32 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4042.ulsfo.wmnet
- 12:31 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2424.codfw.wmnet with reason: host reimage
- 12:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4051.ulsfo.wmnet
- 12:28 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1463.eqiad.wmnet with OS bullseye
- 12:28 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2424.codfw.wmnet with reason: host reimage
- 12:27 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 12:26 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2423.codfw.wmnet with reason: host reimage
- 12:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P54164 and previous config saved to /var/cache/conftool/dbconfig/20231205-122628-arnaudb.json
- 12:26 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 12:25 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2435.codfw.wmnet with OS bullseye
- 12:24 moritzm: installing unbound bugfix updates from Bookworm point release
- 12:23 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2423.codfw.wmnet with reason: host reimage
- 12:22 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4051.ulsfo.wmnet
- 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4039.ulsfo.wmnet
- 12:18 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2434.codfw.wmnet with OS bullseye
- 12:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T348183)', diff saved to https://phabricator.wikimedia.org/P54163 and previous config saved to /var/cache/conftool/dbconfig/20231205-121121-arnaudb.json
- 12:10 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2424.codfw.wmnet with OS bullseye
- 12:07 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 12:07 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 12:06 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2423.codfw.wmnet with OS bullseye
- 12:04 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4039.ulsfo.wmnet
- 12:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T348183)', diff saved to https://phabricator.wikimedia.org/P54162 and previous config saved to /var/cache/conftool/dbconfig/20231205-120206-arnaudb.json
- 12:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
- 12:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
- 12:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T348183)', diff saved to https://phabricator.wikimedia.org/P54161 and previous config saved to /var/cache/conftool/dbconfig/20231205-120145-arnaudb.json
- 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4049.ulsfo.wmnet
- 11:53 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 11:52 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 11:51 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 11:51 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 11:50 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4049.ulsfo.wmnet
- 11:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P54160 and previous config saved to /var/cache/conftool/dbconfig/20231205-114638-arnaudb.json
- 11:40 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 11:40 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 11:40 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 11:40 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 11:38 ladsgroup@deploy2002: Finished scap: Backport for gerrit:979920 Bump ParserCache TTL back to 30 days (T280604) (duration: 07m 47s)
- 11:33 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 11:32 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 11:32 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 11:32 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:979920 Bump ParserCache TTL back to 30 days (T280604) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P54159 and previous config saved to /var/cache/conftool/dbconfig/20231205-113132-arnaudb.json
- 11:30 ladsgroup@deploy2002: Started scap: Backport for gerrit:979920 Bump ParserCache TTL back to 30 days (T280604)
- 11:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1023.eqiad.wmnet with OS bookworm
- 11:17 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:16 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 11:16 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T348183)', diff saved to https://phabricator.wikimedia.org/P54158 and previous config saved to /var/cache/conftool/dbconfig/20231205-111625-arnaudb.json
- 11:16 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 11:15 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:15 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 11:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1023.eqiad.wmnet with reason: host reimage
- 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1023.eqiad.wmnet with reason: host reimage
- 11:08 hnowlan@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
- 11:08 hnowlan@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
- 11:07 hnowlan@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
- 11:07 hnowlan@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
- 11:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T348183)', diff saved to https://phabricator.wikimedia.org/P54157 and previous config saved to /var/cache/conftool/dbconfig/20231205-110448-arnaudb.json
- 11:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
- 11:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
- 11:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T348183)', diff saved to https://phabricator.wikimedia.org/P54156 and previous config saved to /var/cache/conftool/dbconfig/20231205-110426-arnaudb.json
- 11:02 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be1002.eqiad.wmnet with OS bookworm
- 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bookworm
- 10:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P54155 and previous config saved to /var/cache/conftool/dbconfig/20231205-104919-arnaudb.json
- 10:45 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 10:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P54154 and previous config saved to /var/cache/conftool/dbconfig/20231205-103413-arnaudb.json
- 10:21 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1002.eqiad.wmnet with OS bookworm
- 10:20 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1003.eqiad.wmnet with OS bookworm
- 10:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T348183)', diff saved to https://phabricator.wikimedia.org/P54153 and previous config saved to /var/cache/conftool/dbconfig/20231205-101906-arnaudb.json
- 10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T348183)', diff saved to https://phabricator.wikimedia.org/P54152 and previous config saved to /var/cache/conftool/dbconfig/20231205-100744-arnaudb.json
- 10:07 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
- 10:07 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
- 10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T348183)', diff saved to https://phabricator.wikimedia.org/P54151 and previous config saved to /var/cache/conftool/dbconfig/20231205-100722-arnaudb.json
- 10:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15305
- 10:02 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
- 10:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15305
- 09:57 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
- 09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 63927
- 09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P54150 and previous config saved to /var/cache/conftool/dbconfig/20231205-095215-arnaudb.json
- 09:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 63927
- 09:42 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bookworm
- 09:37 brouberol: running authdns-update on dns1004.wikimedia.org - T352639
- 09:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P54149 and previous config saved to /var/cache/conftool/dbconfig/20231205-093709-arnaudb.json
- 09:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T348183)', diff saved to https://phabricator.wikimedia.org/P54148 and previous config saved to /var/cache/conftool/dbconfig/20231205-092202-arnaudb.json
- 09:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T348183)', diff saved to https://phabricator.wikimedia.org/P54147 and previous config saved to /var/cache/conftool/dbconfig/20231205-091232-arnaudb.json
- 09:12 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
- 09:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
- 09:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 58952
- 09:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 58952
- 09:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
- 09:03 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
- 08:59 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 08:26 marostegui: Failover m2-master dbproxy1023.eqiad.wmnet -> dbproxy1025.eqiad.wmnet T351864
- 06:55 vgutierrez: rolling restart of text|secondary LVS on eqsin effectively enabling IPIP encapsulation for ncredir@eqsin - T351069
- 06:23 marostegui: Failover m5 from db1119 to db1176 - T352631
- 06:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2135,2160].codfw.wmnet,db[1119,1176,1217].eqiad.wmnet with reason: m5 master switch T352631
- 06:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2135,2160].codfw.wmnet,db[1119,1176,1217].eqiad.wmnet with reason: m5 master switch T352631
- 01:18 mutante: LDAP - added user xqt to group nda (T348520)
- 01:12 ejegg: payments-wiki upgraded from 5284fc99 to 1d24dc90
- 00:06 eevans@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host restbase2028.codfw.wmnet
2023-12-04
- 23:53 eevans@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host restbase2028.codfw.wmnet
- 23:52 eevans@cumin1001: START - Cookbook sre.puppet.migrate-host for host restbase2028.codfw.wmnet
- 22:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T348183)', diff saved to https://phabricator.wikimedia.org/P54146 and previous config saved to /var/cache/conftool/dbconfig/20231204-225336-arnaudb.json
- 22:53 eileen: civicrm upgraded from 83816165 to 297a091d
- 22:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P54145 and previous config saved to /var/cache/conftool/dbconfig/20231204-223830-arnaudb.json
- 22:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P54144 and previous config saved to /var/cache/conftool/dbconfig/20231204-222323-arnaudb.json
- 22:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T348183)', diff saved to https://phabricator.wikimedia.org/P54142 and previous config saved to /var/cache/conftool/dbconfig/20231204-220817-arnaudb.json
- 22:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2189 (T348183)', diff saved to https://phabricator.wikimedia.org/P54141 and previous config saved to /var/cache/conftool/dbconfig/20231204-220345-arnaudb.json
- 22:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
- 22:03 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
- 22:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T348183)', diff saved to https://phabricator.wikimedia.org/P54140 and previous config saved to /var/cache/conftool/dbconfig/20231204-220322-arnaudb.json
- 21:58 ebernhardson@deploy2002: Finished scap: Backport for gerrit:979693 Always load transcode state from db when opting in to primary db (duration: 08m 37s)
- 21:52 ebernhardson@deploy2002: ebernhardson and brion: Continuing with sync
- 21:51 ebernhardson@deploy2002: ebernhardson and brion: Backport for gerrit:979693 Always load transcode state from db when opting in to primary db synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:50 ebernhardson@deploy2002: Started scap: Backport for gerrit:979693 Always load transcode state from db when opting in to primary db
- 21:49 ebernhardson@deploy2002: Finished scap: Backport for gerrit:979155 cirrus: Enable event bus bridge on more wikis (T352335) (duration: 09m 23s)
- 21:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P54138 and previous config saved to /var/cache/conftool/dbconfig/20231204-214816-arnaudb.json
- 21:47 ryankemper: T351503 Setting partition count to 5: `ryankemper@kafka-main2001:~$ kafka topics --alter --topic codfw.mediawiki.cirrussearch.page_rerender.v1 --partitions 5`
- 21:47 ryankemper: T351503 Setting partition count to 5: `ryankemper@kafka-main2001:~$ kafka topics --alter --topic eqiad.mediawiki.cirrussearch.page_rerender.v1 --partitions 5`
- 21:42 ebernhardson@deploy2002: ebernhardson: Continuing with sync
- 21:41 ebernhardson@deploy2002: ebernhardson: Backport for gerrit:979155 cirrus: Enable event bus bridge on more wikis (T352335) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:39 ebernhardson@deploy2002: Started scap: Backport for gerrit:979155 cirrus: Enable event bus bridge on more wikis (T352335)
- 21:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P54137 and previous config saved to /var/cache/conftool/dbconfig/20231204-213309-arnaudb.json
- 21:27 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 21:27 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 21:19 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1077.eqiad.wmnet with OS bullseye
- 21:19 pt1979@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
- 21:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T348183)', diff saved to https://phabricator.wikimedia.org/P54136 and previous config saved to /var/cache/conftool/dbconfig/20231204-211803-arnaudb.json
- 21:14 pt1979@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
- 21:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T348183)', diff saved to https://phabricator.wikimedia.org/P54135 and previous config saved to /var/cache/conftool/dbconfig/20231204-211305-arnaudb.json
- 21:12 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
- 21:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
- 21:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54134 and previous config saved to /var/cache/conftool/dbconfig/20231204-211241-arnaudb.json
- 21:09 ryankemper: T351503 Setting partition count to 5: `ryankemper@kafka-main1001:~$ kafka topics --alter --topic codfw.mediawiki.cirrussearch.page_rerender.v1 --partitions 5`
- 21:06 ryankemper: T351503 Setting partition count to 5: `ryankemper@kafka-main1001:~$ kafka topics --alter --topic eqiad.mediawiki.cirrussearch.page_rerender.v1 --partitions 5`
- 20:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P54133 and previous config saved to /var/cache/conftool/dbconfig/20231204-205735-arnaudb.json
- 20:53 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1077.eqiad.wmnet with reason: host reimage
- 20:50 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1077.eqiad.wmnet with reason: host reimage
- 20:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P54132 and previous config saved to /var/cache/conftool/dbconfig/20231204-204228-arnaudb.json
- 20:36 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
- 20:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54131 and previous config saved to /var/cache/conftool/dbconfig/20231204-202722-arnaudb.json
- 19:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1079.eqiad.wmnet with OS bullseye
- 19:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1076.eqiad.wmnet with OS bullseye
- 19:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1078.eqiad.wmnet with OS bullseye
- 19:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1077.eqiad.wmnet with OS bullseye
- 19:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54130 and previous config saved to /var/cache/conftool/dbconfig/20231204-194103-arnaudb.json
- 19:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 19:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 19:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T348183)', diff saved to https://phabricator.wikimedia.org/P54129 and previous config saved to /var/cache/conftool/dbconfig/20231204-194039-arnaudb.json
- 19:37 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 19:37 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 19:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P54128 and previous config saved to /var/cache/conftool/dbconfig/20231204-192532-arnaudb.json
- 19:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
- 19:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
- 19:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1079.eqiad.wmnet with OS bullseye
- 19:20 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
- 19:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P54126 and previous config saved to /var/cache/conftool/dbconfig/20231204-191026-arnaudb.json
- 19:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1079.eqiad.wmnet with OS bullseye
- 19:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1078.eqiad.wmnet with OS bullseye
- 19:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1077.eqiad.wmnet with OS bullseye
- 18:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T348183)', diff saved to https://phabricator.wikimedia.org/P54125 and previous config saved to /var/cache/conftool/dbconfig/20231204-185519-arnaudb.json
- 18:52 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1076.eqiad.wmnet with OS bullseye
- 18:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
- 18:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
- 18:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1079.eqiad.wmnet with OS bullseye
- 18:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T348183)', diff saved to https://phabricator.wikimedia.org/P54124 and previous config saved to /var/cache/conftool/dbconfig/20231204-184630-arnaudb.json
- 18:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
- 18:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
- 18:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54123 and previous config saved to /var/cache/conftool/dbconfig/20231204-184607-arnaudb.json
- 18:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P54122 and previous config saved to /var/cache/conftool/dbconfig/20231204-183100-arnaudb.json
- 18:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P54121 and previous config saved to /var/cache/conftool/dbconfig/20231204-181554-arnaudb.json
- 18:02 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1077.eqiad.wmnet with OS bullseye
- 18:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54120 and previous config saved to /var/cache/conftool/dbconfig/20231204-180047-arnaudb.json
- 17:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
- 17:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1078.eqiad.wmnet with OS bullseye
- 17:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54119 and previous config saved to /var/cache/conftool/dbconfig/20231204-175448-arnaudb.json
- 17:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
- 17:54 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
- 17:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T348183)', diff saved to https://phabricator.wikimedia.org/P54118 and previous config saved to /var/cache/conftool/dbconfig/20231204-175426-arnaudb.json
- 17:41 ladsgroup@deploy2002: Finished scap: Backport for gerrit:979692 Category: Stop locking thousands of rows (T352628) (duration: 08m 07s)
- 17:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P54117 and previous config saved to /var/cache/conftool/dbconfig/20231204-173919-arnaudb.json
- 17:35 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 17:34 ladsgroup@deploy2002: ladsgroup: Backport for gerrit:979692 Category: Stop locking thousands of rows (T352628) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 17:33 ladsgroup@deploy2002: Started scap: Backport for gerrit:979692 Category: Stop locking thousands of rows (T352628)
- 17:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P54116 and previous config saved to /var/cache/conftool/dbconfig/20231204-172413-arnaudb.json
- 17:19 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1076']
- 17:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
- 17:18 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1079']
- 17:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
- 17:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1079']
- 17:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
- 17:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1079']
- 17:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
- 17:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1079']
- 17:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
- 17:14 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1079']
- 17:12 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1076']
- 17:12 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
- 17:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
- 17:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
- 17:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
- 17:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T348183)', diff saved to https://phabricator.wikimedia.org/P54115 and previous config saved to /var/cache/conftool/dbconfig/20231204-170906-arnaudb.json
- 17:09 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
- 17:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
- 17:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T348183)', diff saved to https://phabricator.wikimedia.org/P54114 and previous config saved to /var/cache/conftool/dbconfig/20231204-170604-arnaudb.json
- 17:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 17:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 17:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
- 17:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
- 17:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T348183)', diff saved to https://phabricator.wikimedia.org/P54113 and previous config saved to /var/cache/conftool/dbconfig/20231204-170525-arnaudb.json
- 16:52 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: gerrit:979990 Bumping portals to master (T128546) (duration: 07m 45s)
- 16:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P54112 and previous config saved to /var/cache/conftool/dbconfig/20231204-165018-arnaudb.json
- 16:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 33604
- 16:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 33604
- 16:44 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: gerrit:979990 Bumping portals to master (T128546) (duration: 06m 40s)
- 16:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P54111 and previous config saved to /var/cache/conftool/dbconfig/20231204-163511-arnaudb.json
- 16:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T348183)', diff saved to https://phabricator.wikimedia.org/P54110 and previous config saved to /var/cache/conftool/dbconfig/20231204-162005-arnaudb.json
- 16:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T348183)', diff saved to https://phabricator.wikimedia.org/P54109 and previous config saved to /var/cache/conftool/dbconfig/20231204-161408-arnaudb.json
- 16:14 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
- 16:13 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
- 16:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54108 and previous config saved to /var/cache/conftool/dbconfig/20231204-161346-arnaudb.json
- 15:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P54107 and previous config saved to /var/cache/conftool/dbconfig/20231204-155840-arnaudb.json
- 15:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1076.eqiad.wmnet with OS bullseye
- 15:48 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:48 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 15:47 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:47 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 15:46 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:45 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 15:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P54105 and previous config saved to /var/cache/conftool/dbconfig/20231204-154333-arnaudb.json
- 15:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54104 and previous config saved to /var/cache/conftool/dbconfig/20231204-152826-arnaudb.json
- 15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1077']
- 15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1078']
- 15:03 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
- 15:02 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1077']
- 15:02 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1077']
- 15:02 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1078']
- 15:02 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1077']
- 15:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
- 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4046.ulsfo.wmnet
- 14:51 vgutierrez: upload tcp-mss-clamper 0.4 to apt.wm.o (bookworm)
- 14:50 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be1077
- 14:50 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ms-be1077
- 14:47 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
- 14:46 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4046.ulsfo.wmnet
- 14:46 Lucas_WMDE: UTC afternoon backport+config window done
- 14:46 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:977196 Create new namespaces and namespace aliases for bd.wikimedia.org (T351903) (duration: 11m 48s)
- 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4038.ulsfo.wmnet
- 14:43 sukhe: running authdns-update for CR 979976 [revert of T349665]
- 14:40 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and mdsshakil: Continuing with sync
- 14:37 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4038.ulsfo.wmnet
- 14:36 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and mdsshakil: Backport for gerrit:977196 Create new namespaces and namespace aliases for bd.wikimedia.org (T351903) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:34 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:977196 Create new namespaces and namespace aliases for bd.wikimedia.org (T351903)
- 14:33 sukhe: running authdns-update for T352579
- 14:32 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for gerrit:979914 Enable read new for event tables migration on testwiki (T341829) (duration: 10m 42s)
- 14:32 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
- 14:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54103 and previous config saved to /var/cache/conftool/dbconfig/20231204-142754-arnaudb.json
- 14:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
- 14:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
- 14:25 lucaswerkmeister-wmde@deploy2002: dreamyjazz and lucaswerkmeister-wmde: Continuing with sync
- 14:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
- 14:24 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
- 14:22 lucaswerkmeister-wmde@deploy2002: dreamyjazz and lucaswerkmeister-wmde: Backport for gerrit:979914 Enable read new for event tables migration on testwiki (T341829) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 14:21 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for gerrit:979914 Enable read new for event tables migration on testwiki (T341829)
- 14:21 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 14:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 14:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 14:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T348183)', diff saved to https://phabricator.wikimedia.org/P54102 and previous config saved to /var/cache/conftool/dbconfig/20231204-141848-arnaudb.json
- 14:15 jforrester@deploy2002: Finished scap: Backport for gerrit:979362 wikifunctionswiki: Disable thumbnail in Vector search (T352532), gerrit:979180 wikifunctionswiki: Add ability for sysops to manage Functioneer (T352495) (duration: 07m 41s)
- 14:10 jforrester@deploy2002: jforrester and terasail: Continuing with sync
- 14:09 jforrester@deploy2002: jforrester and terasail: Backport for gerrit:979362 wikifunctionswiki: Disable thumbnail in Vector search (T352532), gerrit:979180 wikifunctionswiki: Add ability for sysops to manage Functioneer (T352495) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:08 jforrester@deploy2002: Started scap: Backport for gerrit:979362 wikifunctionswiki: Disable thumbnail in Vector search (T352532), gerrit:979180 wikifunctionswiki: Add ability for sysops to manage Functioneer (T352495)
- 14:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P54101 and previous config saved to /var/cache/conftool/dbconfig/20231204-140341-arnaudb.json
- 13:59 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 13:59 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 13:58 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 13:57 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 13:56 moritzm: installing postgresql-13 security updates
- 13:52 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:52 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P54100 and previous config saved to /var/cache/conftool/dbconfig/20231204-134835-arnaudb.json
- 13:43 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
- 13:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T348183)', diff saved to https://phabricator.wikimedia.org/P54099 and previous config saved to /var/cache/conftool/dbconfig/20231204-133328-arnaudb.json
- 13:30 moritzm: installing dbus security updates on buster
- 13:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1222 (T348183)', diff saved to https://phabricator.wikimedia.org/P54098 and previous config saved to /var/cache/conftool/dbconfig/20231204-132859-arnaudb.json
- 13:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
- 13:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
- 13:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T348183)', diff saved to https://phabricator.wikimedia.org/P54097 and previous config saved to /var/cache/conftool/dbconfig/20231204-132836-arnaudb.json
- 13:22 moritzm: installing libde265 security updates
- 13:22 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 13:22 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 13:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P54096 and previous config saved to /var/cache/conftool/dbconfig/20231204-131329-arnaudb.json
- 13:06 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 13:05 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 13:05 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 13:04 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 12:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P54095 and previous config saved to /var/cache/conftool/dbconfig/20231204-125823-arnaudb.json
- 12:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T348183)', diff saved to https://phabricator.wikimedia.org/P54094 and previous config saved to /var/cache/conftool/dbconfig/20231204-124316-arnaudb.json
- 12:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T348183)', diff saved to https://phabricator.wikimedia.org/P54093 and previous config saved to /var/cache/conftool/dbconfig/20231204-124037-arnaudb.json
- 12:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
- 12:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
- 12:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54092 and previous config saved to /var/cache/conftool/dbconfig/20231204-124015-arnaudb.json
- 12:35 urbanecm@deploy2002: Finished scap: Backport for gerrit:979690 User impact: sort datestring keys to ascending alphanumeric order (T352349 T351898) (duration: 09m 43s)
- 12:29 urbanecm@deploy2002: urbanecm: Continuing with sync
- 12:28 urbanecm@deploy2002: urbanecm: Backport for gerrit:979690 User impact: sort datestring keys to ascending alphanumeric order (T352349 T351898) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host an-druid1005.eqiad.wmnet
- 12:25 urbanecm@deploy2002: Started scap: Backport for gerrit:979690 User impact: sort datestring keys to ascending alphanumeric order (T352349 T351898)
- 12:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P54091 and previous config saved to /var/cache/conftool/dbconfig/20231204-122508-arnaudb.json
- 12:19 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:19 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host an-druid1005.eqiad.wmnet
- 12:18 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1027.eqiad.wmnet with OS bookworm
- 12:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P54090 and previous config saved to /var/cache/conftool/dbconfig/20231204-121002-arnaudb.json
- 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host druid1011.eqiad.wmnet
- 12:00 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host druid1011.eqiad.wmnet
- 11:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1027.eqiad.wmnet with reason: host reimage
- 11:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54089 and previous config saved to /var/cache/conftool/dbconfig/20231204-115455-arnaudb.json
- 11:54 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2422.codfw.wmnet with OS bullseye
- 11:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1027.eqiad.wmnet with reason: host reimage
- 11:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54088 and previous config saved to /var/cache/conftool/dbconfig/20231204-115217-arnaudb.json
- 11:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
- 11:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
- 11:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T348183)', diff saved to https://phabricator.wikimedia.org/P54087 and previous config saved to /var/cache/conftool/dbconfig/20231204-115154-arnaudb.json
- 11:51 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1462.eqiad.wmnet with OS bullseye
- 11:43 elukey@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
- 11:43 elukey@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
- 11:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 44592
- 11:42 elukey@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
- 11:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 44592
- 11:42 elukey@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
- 11:40 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 11:39 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 11:39 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bookworm
- 11:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P54086 and previous config saved to /var/cache/conftool/dbconfig/20231204-113648-arnaudb.json
- 11:36 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2422.codfw.wmnet with reason: host reimage
- 11:33 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1462.eqiad.wmnet with reason: host reimage
- 11:32 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2422.codfw.wmnet with reason: host reimage
- 11:30 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1462.eqiad.wmnet with reason: host reimage
- 11:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P54085 and previous config saved to /var/cache/conftool/dbconfig/20231204-112141-arnaudb.json
- 11:17 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1462.eqiad.wmnet with OS bullseye
- 11:15 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2422.codfw.wmnet with OS bullseye
- 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: eventschemas::service
- 11:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T348183)', diff saved to https://phabricator.wikimedia.org/P54084 and previous config saved to /var/cache/conftool/dbconfig/20231204-110635-arnaudb.json
- 11:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T348183)', diff saved to https://phabricator.wikimedia.org/P54083 and previous config saved to /var/cache/conftool/dbconfig/20231204-110156-arnaudb.json
- 11:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
- 11:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
- 11:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54082 and previous config saved to /var/cache/conftool/dbconfig/20231204-110134-arnaudb.json
- 10:54 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: eventschemas::service
- 10:51 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:51 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add service records for the k8s-ingress-dse endpoints - btullis@cumin1001"
- 10:50 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add service records for the k8s-ingress-dse endpoints - btullis@cumin1001"
- 10:48 btullis@cumin1001: START - Cookbook sre.dns.netbox
- 10:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P54081 and previous config saved to /var/cache/conftool/dbconfig/20231204-104628-arnaudb.json
- 10:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 23856
- 10:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 23856
- 10:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 63927
- 10:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 63927
- 10:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 31898
- 10:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 31898
- 10:37 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58952
- 10:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58952
- 10:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 44592
- 10:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 44592
- 10:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4800
- 10:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4800
- 10:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 33604
- 10:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 33604
- 10:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 142505
- 10:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 142505
- 10:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 398446
- 10:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 398446
- 10:32 jayme: upgrade istio (buster -> bullseye) on wikikube codfw - T351933
- 10:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15305
- 10:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15305
- 10:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19165
- 10:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P54080 and previous config saved to /var/cache/conftool/dbconfig/20231204-103121-arnaudb.json
- 10:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19165
- 10:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 237
- 10:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 237
- 10:28 jayme: upgrade istio (buster -> bullseye) on wikikube eqiad - T351933
- 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 35 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
- 10:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 35 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
- 10:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1022.eqiad.wmnet with OS bookworm
- 10:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138997
- 10:17 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 138997
- 10:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54079 and previous config saved to /var/cache/conftool/dbconfig/20231204-101615-arnaudb.json
- 10:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54078 and previous config saved to /var/cache/conftool/dbconfig/20231204-101143-arnaudb.json
- 10:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 10:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 10:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T348183)', diff saved to https://phabricator.wikimedia.org/P54077 and previous config saved to /var/cache/conftool/dbconfig/20231204-101120-arnaudb.json
- 10:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1022.eqiad.wmnet with reason: host reimage
- 09:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1022.eqiad.wmnet with reason: host reimage
- 09:58 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy GRACEFUL
- 09:57 godog: roll-restart prometheus/k8s to apply size-based retention - T351179
- 09:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P54076 and previous config saved to /var/cache/conftool/dbconfig/20231204-095614-arnaudb.json
- 09:49 volans@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy GRACEFUL
- 09:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P54075 and previous config saved to /var/cache/conftool/dbconfig/20231204-094107-arnaudb.json
- 09:36 elukey: upgrade istio (buster -> bullseye) on ml-serve-codfw - T351933
- 09:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T348183)', diff saved to https://phabricator.wikimedia.org/P54074 and previous config saved to /var/cache/conftool/dbconfig/20231204-092600-arnaudb.json
- 09:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T348183)', diff saved to https://phabricator.wikimedia.org/P54073 and previous config saved to /var/cache/conftool/dbconfig/20231204-092136-arnaudb.json
- 09:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 09:21 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 09:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
- 09:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
- 09:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54072 and previous config saved to /var/cache/conftool/dbconfig/20231204-092054-arnaudb.json
- 09:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P54070 and previous config saved to /var/cache/conftool/dbconfig/20231204-090547-arnaudb.json
- 08:58 elukey: upgrade istio (buster -> bullseye) on ml-serve-eqiad - T351933
- 08:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P54069 and previous config saved to /var/cache/conftool/dbconfig/20231204-085041-arnaudb.json
- 08:50 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bookworm
- 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM moscovium.eqiad.wmnet
- 08:48 elukey: upgrade istio (buster -> bullseye) on aux-k8s-eqiad - T351933
- 08:45 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bookworm
- 08:43 elukey: upgrade istio (buster -> bullseye) on dse-k8s-eqiad - T351933
- 08:39 urbanecm@deploy2002: Finished scap: Backport for gerrit:979686 hewikivoyage: add tagline (T351981), gerrit:979223 azwiki: Enable $wgMinervaEnableSiteNotice (T352621), gerrit:978522 trwikivoyage: update wordmark (T352329) (duration: 09m 49s)
- 08:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54068 and previous config saved to /var/cache/conftool/dbconfig/20231204-083534-arnaudb.json
- 08:33 urbanecm@deploy2002: urbanecm and anzx: Continuing with sync
- 08:31 urbanecm@deploy2002: urbanecm and anzx: Backport for gerrit:979686 hewikivoyage: add tagline (T351981), gerrit:979223 azwiki: Enable $wgMinervaEnableSiteNotice (T352621), gerrit:978522 trwikivoyage: update wordmark (T352329) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54067 and previous config saved to /var/cache/conftool/dbconfig/20231204-083102-arnaudb.json
- 08:30 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
- 08:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
- 08:29 urbanecm@deploy2002: Started scap: Backport for gerrit:979686 hewikivoyage: add tagline (T351981), gerrit:979223 azwiki: Enable $wgMinervaEnableSiteNotice (T352621), gerrit:978522 trwikivoyage: update wordmark (T352329)
- 08:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
- 08:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
- 08:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T348183)', diff saved to https://phabricator.wikimedia.org/P54066 and previous config saved to /var/cache/conftool/dbconfig/20231204-082758-arnaudb.json
- 08:25 oblivian@deploy2002: Finished scap: Backport for gerrit:979488 Add throttle rule for editathon (T352569) (duration: 18m 04s)
- 08:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM moscovium.eqiad.wmnet
- 08:23 _joe_: clearing throttle cache for T352569
- 08:18 oblivian@deploy2002: oblivian: Continuing with sync
- 08:17 oblivian@deploy2002: oblivian: Backport for gerrit:979488 Add throttle rule for editathon (T352569) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P54065 and previous config saved to /var/cache/conftool/dbconfig/20231204-081251-arnaudb.json
- 08:11 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bookworm
- 08:10 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bookworm
- 08:07 oblivian@deploy2002: Started scap: Backport for gerrit:979488 Add throttle rule for editathon (T352569)
- 07:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P54064 and previous config saved to /var/cache/conftool/dbconfig/20231204-075745-arnaudb.json
- 07:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bookworm
- 07:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T348183)', diff saved to https://phabricator.wikimedia.org/P54063 and previous config saved to /var/cache/conftool/dbconfig/20231204-074238-arnaudb.json
- 07:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T348183)', diff saved to https://phabricator.wikimedia.org/P54062 and previous config saved to /var/cache/conftool/dbconfig/20231204-073957-arnaudb.json
- 07:39 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
- 07:39 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
- 07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1176.eqiad.wmnet with OS bookworm
- 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
- 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
- 07:07 kart_: Updated MinT to 2023-11-21-115852-production
- 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1176.eqiad.wmnet with OS bookworm
- 06:57 marostegui: Failover m5 from db1176 to db1119 - T332155
- 06:49 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
- 06:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2135,2160].codfw.wmnet,db[1119,1176,1217].eqiad.wmnet with reason: m5 master switch T352505
- 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2135,2160].codfw.wmnet,db[1119,1176,1217].eqiad.wmnet with reason: m5 master switch T352505
- 06:44 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
- 06:33 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
- 06:28 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
- 06:14 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
- 06:11 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
- 06:08 kart_: Updated cxserver to 2023-12-04-055024-production (T270060, T350773, T352620)
- 06:06 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 06:05 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 06:03 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 06:02 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 05:59 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 05:58 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 04:43 ryankemper: [WDQS] Clearing `BlazegraphFreeAllocatorsDecreasingRapidly` -> `ryankemper@wdqs1007:~$ sudo systemctl restart wdqs-blazegraph`
- 00:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1006.eqiad.wmnet
- 00:09 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.eqiad.wmnet
2023-12-02
- 01:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1078.eqiad.wmnet with OS bullseye
- 01:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1079.eqiad.wmnet with OS bullseye
- 01:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1077.eqiad.wmnet with OS bullseye
- 01:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1076.eqiad.wmnet with OS bullseye
- 00:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
- 00:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1079.eqiad.wmnet with OS bullseye
- 00:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
- 00:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
- 00:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
- 00:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
- 00:14 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
- 00:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
- 00:14 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
- 00:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
- 00:14 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1076']
- 00:14 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1076']
- 00:14 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1076']
- 00:14 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1076']
- 00:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
- 00:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
- 00:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
- 00:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
- 00:13 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
- 00:13 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
- 00:13 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
- 00:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
2023-12-01
- 22:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:14 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:14 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
- 22:13 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
- 22:11 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 22:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:31 cstone: payments-wiki upgraded from b37ab50e to 5284fc99
- 19:35 inflatador: bking@wdqs1006 rebooting unresponsive host
- 18:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ceph2001.codfw.wmnet with OS bullseye
- 17:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ceph2001.codfw.wmnet with OS bullseye
- 16:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ceph2001.codfw.wmnet with OS bullseye
- 16:39 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1005.eqiad.wmnet with OS bookworm
- 16:26 dancy@deploy2002: Installation of scap version "4.65.0" completed for 537 hosts
- 16:26 dancy@deploy2002: Installing scap version "4.65.0" for 537 hosts
- 16:25 dancy@deploy2002: install-world aborted: (duration: 00m 50s)
- 16:24 dancy@deploy2002: Installing scap version "4.65.0" for 569 hosts
- 16:24 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt1046.eqiad.wmnet
- 16:10 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1046.eqiad.wmnet
- 16:07 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
- 16:04 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
- 16:01 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:01 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to scandium - akosiaris@cumin1001"
- 16:00 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to scandium - akosiaris@cumin1001"
- 15:58 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
- 15:58 akosiaris: give AAAA and PTR records to scandium T271142
- 15:57 akosiaris: give AAAA and PTR records to all rdb hosts (only 50% had it previously)
- 15:56 dancy@deploy2002: Installing scap version "4.65.0" for 570 hosts
- 15:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA records to the rest of the 50% of rdb hosts - akosiaris@cumin1001"
- 15:54 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA records to the rest of the 50% of rdb hosts - akosiaris@cumin1001"
- 15:52 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
- 15:51 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[1009-1010].eqiad.wmnet
- 15:51 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:51 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rdb[1009-1010].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - akosiaris@cumin1001"
- 15:50 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rdb[1009-1010].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - akosiaris@cumin1001"
- 15:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1005.eqiad.wmnet with OS bookworm
- 15:45 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
- 15:42 urbanecm: mwmaint2002: mwscript extensions/Flow/maintenance/FlowFixInconsistentBoards.php --wiki=frwiki # T352550
- 15:38 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:38 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 15:36 akosiaris@deploy2002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 07m 24s)
- 15:31 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:31 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 15:28 moritzm: added Kamila to pwstore
- 15:21 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[1009-1010].eqiad.wmnet
- 15:19 topranks: moving esams CR interconnect to 4x10G breakout cable T347403
- 14:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
- 14:27 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
- 14:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 14:27 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 14:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 14:27 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 14:26 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
- 14:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
- 14:26 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 14:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 14:26 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 14:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 14:26 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 14:26 akosiaris: cleanup rdb1009 from all deployment charts
- 14:26 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 14:26 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 14:26 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 14:25 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 14:25 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 14:20 hashar@deploy2002: Finished deploy [integration/docroot@88f69cc]: doc: link to the Gearman Java library (duration: 00m 05s)
- 14:20 hashar@deploy2002: Started deploy [integration/docroot@88f69cc]: doc: link to the Gearman Java library
- 14:18 hashar@deploy2002: Finished deploy [integration/docroot@1c2de6b]: doc: link to Disovery parent pom (duration: 00m 06s)
- 14:18 hashar@deploy2002: Started deploy [integration/docroot@1c2de6b]: doc: link to Disovery parent pom
- 14:09 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 14:08 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 14:05 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 14:05 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 14:03 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 14:03 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 13:48 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
- 13:48 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
- 13:32 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
- 13:31 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
- 13:30 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 13:30 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 13:28 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 13:28 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 13:27 taavi: run prometheus provision-fs on prometheus2* to create file system for cloud instance T350010
- 13:13 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 13:13 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 12:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 12:39 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 12:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flerovium.eqiad.wmnet
- 12:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flerovium.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 12:36 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flerovium.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 12:34 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1046.eqiad.wmnet with OS bookworm
- 12:33 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 12:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts flerovium.eqiad.wmnet
- 12:17 XioNoX: add BGP custom field to Netbox - T306649
- 12:07 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
- 12:03 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
- 12:03 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2211 hosts
- 12:02 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2211 hosts
- 11:49 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
- 11:30 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on cr[1-2]-codfw,cr[1-2]-codfw IPv6 with reason: resetting line card
- 11:30 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on cr[1-2]-codfw,cr[1-2]-codfw IPv6 with reason: resetting line card
- 11:29 topranks: Reset card 1/0 in cr1-codfw T350159
- 11:22 topranks: Disabling BGP peering to AS1299 prior to reset of card 1/0 in cr1-codfw T350159
- 11:09 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2 hosts
- 11:09 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2 hosts
- 11:04 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2 hosts
- 11:04 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2 hosts
- 11:00 topranks: Draining cr1-codfw transport to cr3-eqsin to reset card 1/0 T350159
- 10:59 topranks: Resetting circuit preference for transports landing on card 1/1 cr1-codfw T350159
- 10:55 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
- 10:49 moritzm: installing wireshark security updates on bookworm
- 10:37 topranks: Moving VRRP active gateway for codfw row A/B vlans from cr1-codfw to cr2-codfw to reconfigure card 1/1 T350159
- 10:35 topranks: draining codfw<->eqiad transport link to reconfigure card 1/1 in cr1-codfw T350159
- 10:34 topranks: draining codfw<->eqdfw transport link to reconfigure card 1/1 in cr1-codfw T350159
- 10:30 akosiaris@deploy2002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 07m 12s)
- 10:08 godog: add 60GB to prometheus/k8s in codfw
- 09:51 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2 hosts
- 09:51 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2 hosts
- 09:45 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2211 hosts
- 09:44 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2211 hosts
- 09:20 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
- 09:05 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
- 08:59 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
- 08:57 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
- 08:50 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
- 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1026.eqiad.wmnet with OS bookworm
- 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1026.eqiad.wmnet with reason: host reimage
- 07:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1026.eqiad.wmnet with reason: host reimage
- 07:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1026.eqiad.wmnet with OS bookworm
- 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2135.codfw.wmnet with OS bookworm
- 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2135.codfw.wmnet with reason: host reimage
- 06:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2135.codfw.wmnet with reason: host reimage
- 05:56 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2135.codfw.wmnet with OS bookworm
- 05:37 marostegui: Failover m3 from db1119 to db1159 - T352360
- 05:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2134,2160].codfw.wmnet,db[1119,1159,1217].eqiad.wmnet with reason: m3 master switchover T352149
- 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2134,2160].codfw.wmnet,db[1119,1159,1217].eqiad.wmnet with reason: m3 master switchover T352149
- 02:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2109.codfw.wmnet with OS bookworm
- 02:31 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 02:28 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 02:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2107.codfw.wmnet with OS bookworm
- 02:27 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 02:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2108.codfw.wmnet with OS bookworm
- 02:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 02:24 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 02:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 02:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2106.codfw.wmnet with OS bookworm
- 02:17 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 02:16 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 02:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2105.codfw.wmnet with OS bookworm
- 02:16 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 02:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 02:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2109.codfw.wmnet with reason: host reimage
- 02:07 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2109.codfw.wmnet with reason: host reimage
- 02:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2108.codfw.wmnet with reason: host reimage
- 02:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2107.codfw.wmnet with reason: host reimage
- 02:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2108.codfw.wmnet with reason: host reimage
- 01:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2107.codfw.wmnet with reason: host reimage
- 01:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2106.codfw.wmnet with reason: host reimage
- 01:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2003']
- 01:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2001']
- 01:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2106.codfw.wmnet with reason: host reimage
- 01:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2105.codfw.wmnet with reason: host reimage
- 01:51 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2105.codfw.wmnet with reason: host reimage
- 01:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2109.codfw.wmnet with OS bookworm
- 01:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2108.codfw.wmnet with OS bookworm
- 01:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2002']
- 01:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2104.codfw.wmnet with OS bookworm
- 01:40 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 01:40 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2002']
- 01:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2107.codfw.wmnet with OS bookworm
- 01:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2002']
- 01:40 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2003']
- 01:40 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2002']
- 01:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2001']
- 01:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ceph2003']
- 01:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ceph2002']
- 01:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ceph2001']
- 01:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2002']
- 01:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2001']
- 01:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2003']
- 01:38 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 01:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
- 01:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2106.codfw.wmnet with OS bookworm
- 01:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
- 01:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
- 01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2103.codfw.wmnet with OS bookworm
- 01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 01:32 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2105.codfw.wmnet with OS bookworm
- 01:32 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 01:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2102.codfw.wmnet with OS bookworm
- 01:31 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 01:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 01:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2100.codfw.wmnet with OS bookworm
- 01:29 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 01:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2101.codfw.wmnet with OS bookworm
- 01:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 01:28 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
- 01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
- 01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
- 01:22 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2104.codfw.wmnet with reason: host reimage
- 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ceph2001-3 to codfw - jhancock@cumin2002"
- 01:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ceph2001-3 to codfw - jhancock@cumin2002"
- 01:18 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2104.codfw.wmnet with reason: host reimage
- 01:17 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 01:14 foks: removing 120 files for legal compliance
- 01:11 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2103.codfw.wmnet with reason: host reimage
- 01:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2100.codfw.wmnet with reason: host reimage
- 01:07 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2102.codfw.wmnet with reason: host reimage
- 01:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2100.codfw.wmnet with reason: host reimage
- 01:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2101.codfw.wmnet with reason: host reimage
- 01:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2101.codfw.wmnet with reason: host reimage
- 00:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2104.codfw.wmnet with OS bookworm
- 00:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2103.codfw.wmnet with OS bookworm
- 00:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2102.codfw.wmnet with OS bookworm
- 00:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2101.codfw.wmnet with OS bookworm
- 00:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2100.codfw.wmnet with OS bookworm
- 00:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2098.codfw.wmnet with OS bookworm
- 00:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2099.codfw.wmnet with OS bookworm
- 00:38 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2097.codfw.wmnet with OS bookworm
- 00:38 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2094.codfw.wmnet with OS bookworm
- 00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 00:35 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 00:25 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 00:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1107.eqiad.wmnet with OS bookworm
- 00:22 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
- 00:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 00:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2099.codfw.wmnet with reason: host reimage
- 00:14 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 00:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2099.codfw.wmnet with reason: host reimage
- 00:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1105.eqiad.wmnet with OS bookworm
- 00:09 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:08 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2098.codfw.wmnet with reason: host reimage
- 00:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1107.eqiad.wmnet with OS bookworm
- 00:05 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:03 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2098.codfw.wmnet with reason: host reimage
- 00:01 krinkle@deploy2002: Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 06m 37s)
- 00:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2094.codfw.wmnet with reason: host reimage
Other archives
2000s
- Archive 1: 2004 Jun - 2004 Sep
- Archive 2: 2004 Oct - 2004 Nov
- Archive 3: 2004 Dec - 2005 Mar
- Archive 4: 2005 Apr - 2005 Jul
- Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
- Archive 6: 2005 Nov - 2006 Feb
- Archive 7: 2006 Mar - 2006 Jun
- Archive 8: 2006 Jul - 2006 Sep
- Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
- Archive 10: 2007 Feb - 2007 Jun
- Archive 11: 2007 Jul - 2007 Dec
- Archive 12: 2008 Jan - 2008 Jul
- Archive 12a: 2008 Aug
- Archive 12b: 2008 Sept
- Archive 13: 2008 Oct - 2009 Jun
- Archive 14: 2009 Jun - 2009 Dec
2010s
- Archive 15: 2010 Jan - 2010 Jun
- Archive 16: 2010 Jul - 2010 Oct
- Archive 17: 2010 Nov - 2010 Dec
- Archive 18: 2011 Jan - 2011 Jun
- Archive 19: 2011 Jul - 2011 Dec
- Archive 20: 2011 Dec - 2012 Jun, with revision history 2007-02-21 to 2012-03-27
- Archive 21: 2012 Jul - 2013 Jan
- Archive 22: 2013 Jan - 2013 Jul
- Archive 23: 2013 Aug - 2013 Dec
- Archive 24: 2014 Jan - 2014 Mar
- Archive 25: 2014 April - 2014 September
- Archive 26: 2014 October - 2014 December
- Archive 27: 2015 January - 2015 July
- Archive 28: 2015 August - 2015 December
- Archive 29: 2016 January - 2016 May
- Archive 30: 2016 June - 2016 August
- Archive 31: 2016 September - 2016 December
- Archive 32: 2017 January - 2017 July
- Archive 33: 2017 August - 2017 December
- Archive 34: 2018 January - 2018 April
- Archive 35: 2018 May - 2018 August
- Archive 36: 2018 September - 2018 December
- Archive 37: 2019 January - 2019 April
- Archive 38: 2019 May - 2019 August
- Archive 39: 2019 September - 2019 December
2020s
- Archive 40: 2020 January - 2020 April
- Archive 41: 2020 May - 2020 July
- Archive 42: 2020 August - 2020 November
- Archive 43: 2020 December
- Archive 44: 2021 January - 2021 April
- Archive 45: 2021 May - 2021 July
- Archive 46: 2021 August - 2021 October
- Archive 47: 2021 November - 2021 December
- Archive 48: 2022 January
- Archive 49: 2022 February
- Archive 50: 2022 March
- Archive 51: 2022 April 1-15
- Archive 52: 2022 April 16-30
- Archive 53: 2022 May
- Archive 54: 2022 June
- Archive 55: 2022 July
- Archive 56: 2022 August
- Archive 57: 2022 September
- Archive 58: 2022 October
- Archive 59: 2022 November 1-15
- Archive 60: 2022 November 16-30
- Archive 61: 2022 December
- Archive 62: 2023 January
- Archive 63: 2023 February
- Archive 64: 2023 March
- Archive 65: 2023 April
- Archive 66: 2023 May
- Archive 67: 2023 June
- Archive 68: 2023 July
- Archive 69: 2023 August 1-15
- Archive 70: 2023 August 16-31
- Archive 71: 2023 September
- Archive 72: 2023 October
- Archive 73: 2023 November
- Archive 74: 2023 December
- Archive 75: 2024 January
- Archive 76: 2024 February
- Archive 77: 2024 March
- Archive 78: 2024 April
- Archive 79: 2024 May 1-15
- Archive 80: 2024 May 16-31
This article is issued from Wikimedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.