supportworld, service management, ITSM
Somewhere along the way, birds got a bad reputation. Not as cute as kittens or playful as puppies, birds are...boring. Before the ornithologists out there get too upset, I want to point out that “boring” isn’t bad. In fact, for incident response, “boring” is actually a pretty desirable characteristic because the last thing your business needs is an encounter with a black swan.
How do you get from “black swan” to “boring”? Many companies pursue an aggressive modernization strategy, bringing in modern tools like Slack and PagerDuty and expecting transformation to just happen. And while many people, including some ITIL supporters, view modernization as a worthless exercise (i.e., it’s “for the birds”), I cling to the Oxford definition of modernization:
The process of adapting something to modern needs or habits.
In other words, if your business isn’t being affected by modern needs or habits, don’t modernize.
On the other hand, if your customer demands are increasing, or your organization is transforming to reach customers differently, or your operational complexity is rising as more developers get involved, then let me share a couple of important ways to keep incident response as boring as possible. These techniques are based on PagerDuty’s open-sourced incident response documentation.
When US Airways Flight 1549 struck a flock of geese leaving LaGuardia airport, Captain Chesley “Sully” Sullenberger had to make a quick decision based on how bad things were. As the dialog goes:
Captain Sully: Mayday, mayday, mayday, this is Cactus 1549. Hit birds, we’ve lost thrust in both engines. We’re turning back towards LaGuardia.
Air Traffic Controller: Okay, you need to return to LaGuardia?
Captain Sully: We’re unable [to land]. We may end up on the Hudson.
Note that it wasn’t the air traffic controller making that call; Captain Sully was the expert on the situation at hand. This is similar to DevOps culture, which advocates for service ownership and makes the experts (the developers themselves) accountable for the customer’s experience with their service. The DevOps model of ownership greatly improves triage and tightens feedback loops when responding to customer-impacting or business-impacting issues. In contrast, in IT operations, there often seems to be a desire to put triage teams or Level 1 support teams as the first point of contact.
Another element to simplifying triage is measuring what impacts your business. Developers and operations engineers often get caught responding to metrics that lack outside-in context, such as CPU load or API responsiveness. While those can be helpful in diagnosing a problem, they won’t help you triage.
At PagerDuty, one of our most critical capabilities is ensuring that customers receive timely notifications. As a result, one of our key metrics is measuring end-to-end notification latency. If our notification latency starts to creep up, we jump into incident response (SEV-2) immediately, even when all of our servers seem to be behaving normally. Meanwhile, for Amazon, that relevant business metric might be “orders per second” instead of notifications.
Your metric might be different—the key thing is to choose a metric that reflects your business and your customers’ expectations. When you can glance at that metric in your triage process and get a quick gauge of what's going on, it will speed up your ability to answer questions about an incident’s business impact.
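As a rough illustration of triaging on a business metric rather than on server health, the sketch below flags a possible SEV-2 when end-to-end notification latency creeps up. The threshold value, the use of a median, and the function name are assumptions made for this example; the article does not give PagerDuty's actual numbers or implementation.

```python
from statistics import median

# Hypothetical threshold in seconds; the article does not state a real value.
SEV2_LATENCY_SECONDS = 60.0


def triage_from_latency(samples):
    """Suggest a severity from recent end-to-end notification latency samples.

    Returns "SEV-2" when the typical (median) latency is at or above the
    assumed threshold, otherwise None. CPU load and similar internal
    metrics are deliberately not consulted here.
    """
    if not samples:
        return None
    typical = median(samples)
    if typical >= SEV2_LATENCY_SECONDS:
        return "SEV-2"
    return None
```

A glance at one function like this answers the business-impact question quickly; diagnostic metrics can still be consulted afterwards to find the cause.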
While triage is a nuanced process with many inputs, don’t forget that the output should be a decision on how to respond. If the business metrics say it’s a SEV-2 but it feels like a SEV-1, then respond as if it’s a SEV-1. Take command by keeping everyone focused on mitigating the customer impact. Additionally, when training incident commanders, decision-making is a critical point of emphasis—one of the easiest ways to waste time on a response call is to discuss incident severities.
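One way to encode the "respond as if it's a SEV-1" rule is to always take the more severe of the metric-driven severity and the incident commander's judgment, so that no time is spent debating the label. This is a minimal sketch under that assumption; the `SEV-3` level and the names used are illustrative and do not come from the article.

```python
# Lower rank means more severe. SEV-3 is included only for illustration.
SEVERITY_RANK = {"SEV-1": 1, "SEV-2": 2, "SEV-3": 3}


def decide_severity(metric_severity, commander_judgment):
    """Return the more severe of the two inputs, without further discussion."""
    return min(
        metric_severity,
        commander_judgment,
        key=lambda sev: SEVERITY_RANK[sev],
    )
```

For example, `decide_severity("SEV-2", "SEV-1")` yields `"SEV-1"`, matching the case described above.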
You may have noticed that I haven’t used the term “incident manager” here, which I see frequently in centralized operational models. Instead, PagerDuty took inspiration from the National Incident Management System (NIMS) model, which is considered state of the art when it comes to incident response, albeit in a rather different vertical than IT operations. Using that model, we defined our roles to ensure that the incident commander could be most effective; in contrast, most other models typically have incident managers focus more on scribe and communications liaison activities.
In addition to keeping everyone focused, incident commanders drive decision-making, take input from subject matter experts, and quickly establish consensus to keep things moving toward resolution. Incident commanders also assign tasks to specific people with a specific time allotment and follow up when tasks are completed—this helps avoid the bystander effect that so often plagues emergency responders.
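The habit of assigning each task to a specific person with a specific time allotment, and then following up, can be sketched as a small data structure. Everything here (the `Task` fields, the `needs_follow_up` helper) is a hypothetical illustration, not part of PagerDuty's documentation or tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Task:
    description: str
    owner: str            # one named person, never "someone"
    assigned_at: datetime
    allotment: timedelta  # how long the owner has before a follow-up
    done: bool = False

    def due_at(self):
        """Time at which the incident commander should follow up."""
        return self.assigned_at + self.allotment


def needs_follow_up(tasks, now):
    """Return the unfinished tasks whose time allotment has elapsed."""
    return [t for t in tasks if not t.done and now >= t.due_at()]
```

Because every task has exactly one owner and one deadline, nobody on the call can assume that someone else is handling it.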
Not everyone is cut out to be an incident commander. You need to be as cool as a penguin, with a bird’s-eye view of your systems, and be able to maintain authority in the face of stressed-out executives who tend to “swoop and poop” on your response team. The good news is that the process can be practiced.
We take the opportunity to practice our incident response process through our chaos engineering experiments—what we call “Failure Fridays.” In fact, you can even make it fun by using games such as Keep Talking and Nobody Explodes! Just remember: After you make it fun, make sure you make it boring! Incident response is best for you and your business when it’s boring.
Dave Cliffe is a bird-brained software engineer who has led various product management and product marketing initiatives at PagerDuty. Before PagerDuty, he filled a variety of roles at Microsoft (on the Azure team) and Amazon (launching their grocery business).
© Copyright HDAA. All rights reserved.
ITIL® and PRINCE2® are registered trade marks of AXELOS Limited, used under permission of AXELOS Limited. All rights reserved.
RESILIA™ is a trade mark of AXELOS Limited, used under permission of AXELOS Limited. All rights reserved.
The Swirl logo™ is a trade mark of AXELOS Limited, used under permission of AXELOS Limited. All rights reserved.
DevOps Foundation®, is a registered mark of the DevOps Institute.
HDI® is a Registered Trade Mark. HDAA is the Australasian Gold Partner of HDI®.
KCS℠ is a Service Mark of the Consortium for Service Innovation™.