We must analyze human-caused errors without judgment to reveal what shortcomings in processes and automation created situations ripe for problems.
In problem management, we find that problems in any given environment, application, or platform can have many causes. One of the most frustrating, especially for a problem with a large impact, is human error.
Recently, I was surprised to see several articles arguing that human error isn’t a valid root cause. The running thought was that for human error to occur, something else had to fall short first: automation, controls, or procedures.
I would argue that you can’t always rule out human error as a root cause. As English poet Alexander Pope said, “To err is human…”
To prevent future instances of human error, we must understand the scenario that allowed it to happen in the first place. Conducting a “blameless postmortem” is a very effective way to build the best possible picture of what occurred. At the end of the day, unless it’s a pattern of ineptitude for management to address, we shouldn’t care who broke it; we want to understand how we got there.
What steps, gaps, and inefficiencies set the stage for the error? When we know this, we are better equipped to prevent the next person from stumbling into the same issue. To get this information, you must make sure the person behind the error understands that the goal is learning, not blame. Trust and communication are essential; until we have this information, we cannot effectively evaluate opportunities for prevention and control.
Automation in place of human action is a great solution to human error, where it’s possible. Depending on the size of the firm, the skill of the workforce, or the nature of the action, it isn’t always an option. Additionally, I would argue that, by and large, automation is innovation created to reduce manual work; preventing human error is often a welcome side effect rather than the original goal.
Controls and procedures are always great options for reducing the risk of human error. However, they are often created or enhanced only after an error has occurred; they come with maturity and reflection. During a problem review, if a person’s action, or lack thereof, causes a problem, the steps leading to and from that action must be reviewed to determine what can be implemented or improved, automation aside, to prevent a similar occurrence. Was the process fully documented? Is there an opportunity for a secondary review before the action is completed? Are there systemic controls in place to prevent or identify a misstep before it causes a problem?
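To make that last question concrete, here is a minimal sketch of what a systemic control might look like in practice: a hypothetical Python maintenance script that refuses to run a destructive step against production unless the operator retypes the target and a second reviewer signs off. Every name in it (the environment labels, purge_stale_records, require_second_review) is an illustrative assumption, not a prescribed tool or API.

```python
# Hypothetical guardrail: a destructive maintenance step that cannot be
# completed by a single unchecked human action. All names here are
# illustrative assumptions, not a real tool's interface.

ENVIRONMENTS = {"dev", "staging", "prod"}

def require_confirmation(action: str, target: str) -> bool:
    """Force the operator to retype the target environment, catching
    slips like running a prod command in the wrong terminal."""
    answer = input(f"Type the environment name to confirm '{action}' on {target}: ")
    return answer.strip().lower() == target

def require_second_review(action: str, target: str) -> bool:
    """Simulate a secondary review: a second person must enter their
    initials before a production-impacting step proceeds."""
    reviewer = input(f"Second reviewer initials for '{action}' on {target}: ")
    return len(reviewer.strip()) >= 2

def purge_stale_records(target: str) -> None:
    if target not in ENVIRONMENTS:
        raise ValueError(f"Unknown environment: {target!r}")

    # Systemic control 1: the operator must retype the target.
    if not require_confirmation("purge_stale_records", target):
        print("Confirmation failed; nothing was changed.")
        return

    # Systemic control 2: production changes require a second set of eyes.
    if target == "prod" and not require_second_review("purge_stale_records", target):
        print("No second review; aborting.")
        return

    print(f"Purging stale records in {target}...")  # real work would go here

if __name__ == "__main__":
    purge_stale_records("prod")
```

Neither check makes the error impossible, but each turns a silent slip into a visible, interruptible moment, which is exactly what a systemic control is meant to do.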
It’s rare that we can fully insulate our systems from every scenario that could occur, but when we look for these opportunities, our inspection should be thorough. It’s also important that we do not isolate the review to the specific step that failed; the surrounding environment and similar processes should be analyzed as well. Are there other similar scenarios just waiting for the next stumble?
At the end of the day, we’ll likely never see an environment so flawless in its design that it runs error-free. While we can test changes to our environment, we often set expectations and criteria through the lens of how things are supposed to work; it’s when the unexpected happens that we learn the most. Humans are the most unpredictable variable, and we should be thoughtful and intentional in designs that expect or require human interaction. When problems occur at the hands of our people, if we take the time to talk, reflect, and analyze, we can take a more holistic approach to preventing human error.