
Special events such as BFCM (Black Friday / Cyber Monday).

  • Hi Gustavo, welcome to DevOps SE. It's going to be a little hard to answer your question as it is; is there any further information you can provide to help us understand what you're trying to achieve? See https://devops.stackexchange.com/help/how-to-ask for some tips; in particular, knowing what you've come up with so far will help us more specifically hone in on what you're after. – Tim Malone Mar 24 '19 at 06:52
  • Hi Tim, thanks for the reference. I'm trying to establish how SREs in other companies (other than mine/Google) approach things such as https://cloud.google.com/blog/products/management-tools/tune-up-your-sli-metrics-cre-life-lessons. – Gustavo Franco Mar 25 '19 at 04:05

1 Answer


It sounds as though the concept of "burst limits" may apply here. If high reliability is needed during foreseen special events, it is probably also needed during unforeseen events of a generally similar duration. I suggest combining a broad-timescale SLO (e.g. 99.9% successful responses per month) with either sliding SLOs that depend on demand or fixed-limit SLOs that cap the absolute number of failures instead of a rate. An example of the first might be 99.95% successful responses at 1000 qps and 99.99% at 5000 qps. An example of the second might be 99.9% successful responses, with no more than 5 failures per second regardless of load.
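
As a rough illustration of how these three kinds of SLO could be checked against per-second request counts, here is a minimal Python sketch. The `Second` record, the tier table, and all function names are hypothetical and only mirror the example figures above; a real setup would read these counts from whatever monitoring system is in use.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Second:
    """Request counts observed during one second (hypothetical record shape)."""
    total: int
    failed: int


# Assumed values, copied from the example figures in the answer.
BASELINE_TARGET = 0.999        # broad-timescale SLO, e.g. per month
MAX_FAILURES_PER_SECOND = 5    # fixed-limit SLO, independent of load
# Sliding SLO tiers: (minimum qps, required success ratio), highest first.
TIERS: List[Tuple[int, float]] = [(5000, 0.9999), (1000, 0.9995)]


def target_for(qps: int) -> float:
    """Pick the success target that applies at a given load."""
    for min_qps, target in TIERS:
        if qps >= min_qps:
            return target
    return BASELINE_TARGET


def success_ratio(seconds: Sequence[Second]) -> float:
    """Fraction of successful responses across the whole window."""
    total = sum(s.total for s in seconds)
    failed = sum(s.failed for s in seconds)
    return 1.0 if total == 0 else (total - failed) / total


def cap_violations(seconds: Sequence[Second]) -> List[int]:
    """Indices of seconds that exceeded the fixed failure cap."""
    return [i for i, s in enumerate(seconds)
            if s.failed > MAX_FAILURES_PER_SECOND]


def tier_report(seconds: Sequence[Second]) -> Dict[float, bool]:
    """Group seconds by their load tier and check each tier's target."""
    groups: Dict[float, List[Second]] = {}
    for s in seconds:
        groups.setdefault(target_for(s.total), []).append(s)
    return {t: success_ratio(g) >= t for t, g in groups.items()}


if __name__ == "__main__":
    # 1000 seconds at 1000 qps, with one second containing 6 failures.
    window = [Second(1000, 0)] * 999 + [Second(1000, 6)]
    print(success_ratio(window) >= BASELINE_TARGET)
    print(cap_violations(window))
    print(tier_report(window))
```

In this sample window the rate-based targets are still met, yet the fixed cap flags the one second with 6 failures. That is the point of pairing the two: a short burst of errors can hide inside a monthly percentage but not behind an absolute limit.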