Overhaul 17b: Undeniable Truth of Overhaul 22
Introduction
This is the second installment in the Undeniable Truths of Overhaul series on Undeniable Truth 22: Learning rapidly is a superpower. The most important job of a crew in overhaul isn’t supporting the shipyard (SY). I wrote in a previous post that it was, but it is time to reveal the truth (you can handle it). The crew can’t consistently support the shipyard unless they continuously improve, learn from mistakes, and anticipate problems. That’s their most important job.
The key points from the first post on learning (Undeniable Truths of Overhaul 17a):
Errors in overhaul are inevitable; effectively learning how to be safer isn’t.
Questioning an organization’s deeply held beliefs, myths, and assumptions about error isn’t career-enhancing.
Problems and errors are clues about deep organizational problems and unresolved goal conflicts.
Learning differently in the SY is essential. The standard qualification model, “go to training, study manuals, take tests, stand under-instruction watches, copy what others do, pass oral boards, and go,” isn’t sufficient. Although seldom acknowledged, the effectiveness of this learning model assumes the existence of something that is missing in overhaul: a cadre of experienced personnel, outnumbering trainees, who can watch them closely until their performance isn’t so scary. In overhaul, everyone is learning at the same time, which has important implications for risk management.
The following is one way to emphasize the differences between normal operating conditions of Navy warships and overhaul environments that make learning difficult:
Leave It to Beaver Land: living in a neighborhood of well-maintained houses (normal ops) and manicured yards (practiced procedures), populated by model families (robust and stable designs, mostly), and wise parents (experienced personnel) aided by kindly neighbors (staff and HQ support) who are ready to step in to rescue you (constrained pathways for error) when you get sucked into a crazy scheme hatched by Eddie Haskell.
Rocky Horror SY Show: unsuspecting crew members caught with a flat (ship in need of repair), entering a spooky castle (the SY), terrorized by aliens (SY personnel) performing scary rituals (SY procedures), having to undress (ship systems torn apart), blundering into monsters (temporary systems and constant oversight) so that everything is new to everyone all at once.
In this post I briefly describe why I became dissatisfied with the standard approach to one aspect of overhaul performance: learning from error. Then, I describe a new way to learn from error: Error Management.
Let me take you on a journey …
Thinking Differently
This section describes how I changed my thinking about error. It is autobiographical, not an attempt to persuade. I wanted to help people learn better and improve faster in overhaul. I expected to learn how to do this from the people who had done it before me. I didn’t plan on completely rethinking my ideas about learning.
I used to think of errors as facts. People did the wrong thing and needed to be held accountable for it. Accountability was an essential part of learning. I internalized this belief by copying the people around me. The procedure said “Do X” and someone did Y. Consequences included reprimands, re-training, tighter procedures, more supervision, etc. Following the accountability (it was always “training,” not “punishment”), leaders moved on to the next thing.
No one questioned whether this was a good model for learning from error. I internalized the dogma and practiced it like a pro. Social pressure to behave in the ways approved by authority figures probably had a lot to do with it. Having a questioning attitude is tolerated only to a point. Some topics are off limits (i.e., undiscussables) (Argyris, 1994).
My change in thinking about responses to errors started with a question. As I was preparing to be the Reactor Officer (RO) aboard a CVN undergoing Refueling Complex Overhaul (which should always be written as “Refueling COMPLEX Overhaul”), I wondered, “What do people know about achieving high levels of safe performance in complex environments (SYs)?” I knew that the answer wouldn’t be simple because I had watched, and been a member of, crews that struggled in overhaul (understatement of the century). Surely, experienced people could teach me about this.
The question wasn’t as simple as I thought. The focus of RO pipeline training and most Navy training is technical (the reasons for this are beyond the scope of this post). Amid the technical details coming through a firehose, it felt like something was missing. The performance issues I had seen flatten crews and delay overhauls never came up. That seemed strange.
The people doing the technical training were very smart, but they couldn’t help me. When I asked about improving safe operations in the SY, they stared at me, said it was someone else’s job (those people stared too), or changed the subject to another technical issue. I stopped asking because it seemed to make people uncomfortable. I was on my own.
Being on my own to learn how to improve safety in the SY wasn’t scary or lonely. It was “permission granted” to figure it out for myself. I like those kinds of challenges.
I began with reading. At the suggestion of a mentor who didn’t scare easily, I read everything I could find about the cognitive demands of complex systems, managing error, team coordination, and human performance. The study areas included:
high reliability organizing,
human error,
judgment and decision making,
tactical decision making under stress (a research program funded by the Navy in the 1980s), and
Crew Resource Management (the foundation of the revolution in aviation safety that began in the 1980s).
While I can provide a bibliography for those wanting to know what I read, there are two caveats. The first is that the changes in my thinking about human performance can’t be summed up as: “Do the reading that I did, and you’ll change too.” This section is a description of my question, my dissatisfaction, my unlearning, and my new understanding of human error. I’m not trying to convince readers of anything. Even if they read the same books, there is much research suggesting that they wouldn’t reach the same conclusions I did.
The second caveat is that I wasn’t motivated by a higher moral purpose. I knew I couldn’t transform the department’s overhaul quality of life into unicorns and rainbows. Being in the SY sucks for the crew. This is one reason why JOs keep writing Proceedings articles asking to be excused. I wanted to know more about being safer in overhaul because I didn’t want to suffer through lurching from one crisis to another. I had done that and it made overhaul suckage worse.
I learned a lot from reading academics' and practitioners' writings. Newly aware of ideas I hadn’t been exposed to before, I moved on to studying reports of actual SY experience. I read every CVN overhaul problem report I could find (“Who does that?”). I had lots to read.
The problems I read about seemed to fall into broad categories. They came up again and again. I created a taxonomy of error-likely situations from the reports (also available on request). I was confident these situations would come up on my CVN, and I wanted the department to be able to prepare.
The corrective actions in the problem reports followed the standard error-response model that I knew well:
identify the perpetrators,
assign a “root cause” (the one thing),
lecture the guilty on what they should’ve known,
train the uninvolved about what the perps should’ve done,
add more rules or training or supervision to prevent that particular error, and
exhort people to try harder or pay more attention, “so we won't have this problem again.”
This model is the gold standard for problem response (not just in the Navy). Senior leaders I observed throughout my career talked and behaved as if these actions worked, which is why they kept taking them and writing about it.
After reading about CVN overhaul experiences, I had two serious concerns. First, the error-response model that was so familiar to me was entirely at odds with my reading. Second, the actions people took didn’t make things better. The problem reports documented that crews had the same problems, overhaul after overhaul. It was in black and white. I observed the same thing as an ED in the shipyard and as an operator before becoming an ED. Navy operators weren’t getting safer in overhaul and weren’t learning from the mistakes of their predecessors. Isn’t that the reason for writing the reports, to learn? These were fundamental disconnects.
Was I the first person in the Navy to think that the standard model for responding to error was ineffective? Was I going to have better results just trying harder? It didn’t seem likely.
My conclusion was that bad-apples thinking and simplistic models of causality work, but only to a point. They are like sticking your finger in a dike to stop a single leak. In the SY, you’re likely to run out of fingers before you run out of problems.
The standard model worked for specific cases. It affected the people who took particular actions in one context. But doing more of the same eventually hits a ceiling: safety approaches an asymptote because the standard response constrains general learning. Learning under the standard model of problem response wasn’t generalizable beyond the narrow circumstances and the people involved (the utility of “brief all crews” is a myth). This is a twofer: counterintuitive and heresy against the error-response dogma.
There is nothing either good or bad, but thinking makes it so. —Hamlet, Act II, Scene 2.
As Hamlet trenchantly observed, how you think about your experience is fundamental. Things gone wrong aren’t good or bad. They’re clues. When something goes wrong (or nearly goes wrong) in overhaul, it is either evidence of “sailors behaving badly” or, if you look deeper, an opportunity to improve. What you learn depends on how you think about experience. Even labeling an action as an “error” is problematic because it depends on knowing how things turned out (hindsight) and on (potentially) biased moral judgments (e.g., that errors come from “bad apples”).
I changed my thinking about error. I stopped believing that “you become safer by correcting the errors of the operators who touched it last.” I considered the possibility that “robust safety improvement is only possible if you look beyond the ‘obvious’ errors of operators to find and correct deeper issues.” It seemed like modifying training, event preparation, and ways of thinking about risk (i.e., all components of the system design) would make my life suck much less than telling people to “be more careful next time.”
The next section describes a different approach to error and learning for improving safety.
A Different Approach: Error Management
I’ve established that my desire to improve safety in overhaul wasn’t a moral crusade. Developing people and helping them learn to operate more safely was a matter of self-preservation. From the reading I had done, one approach with promise was Error Management (EM). In this section, I’ll limit myself to a high-level description; there are plenty of resources that describe EM in detail. I’ll give specific examples of applying it in the next post because this one is already longer than I prefer.
At its core, EM is a system for thinking differently about error (Helmreich, 1998). The central tenet of EM is that most errors are unintentional, resulting from insufficient knowledge or from poor decision processes. It is difficult for leaders to influence actions that people did not intend, which is why the standard responses to error tend to be ineffective. In the end, blaming people may be emotionally satisfying (if we’re honest about it, it is very satisfying), but it has little impact on their future susceptibility to error (Reason, 1997).
EM has been practiced in commercial and military aviation for decades. It focuses on building resilience through systematic efforts to collect, analyze, understand, and act on the sources of error (operators are just the fall guys). Actions to reduce error at the source include changing policies, improving procedures, training differently, mentoring, pre-task checks, and improved risk mitigation (recognizing error-likely situations like the list I created). Errors will still occur, but studying them can help the entire department or ship adopt practices and ways of thinking that enhance safety.
An argument against EM might be, “We already do that.” Perhaps. Another possibility is that an organization’s existing vocabulary (“loss of situational awareness”), models (“switch” theory, “root” cause), and thinking (if doing what you’ve always done without considering alternatives is thinking) are part of the problem. These aren’t EM. In fact, they block us from becoming as safe as we could be. Commercial and military aviation didn’t become safer by doing more of what they were already doing. Neither can leaders in overhaul.
Once I decided to think differently about error, I had to choose how to act differently. That will be the topic of the next post.
Conclusion
It is difficult for leaders to keep people from making errors in situational assessments, using the wrong procedure, or taking the wrong action when their skills are inadequate. These aren’t causes, “root” or otherwise. They are consequences of deeper issues.
“Bad apples” thinking, believing that people make mistakes because they are careless or lazy, isn’t a law of nature. Believing that people who commit errors are defective (need an “upgrade”) is a choice, learned through experience and social pressure to conform to organizational safety dogma. Leaders can make different choices to improve how people learn from error. Choosing Error Management is a better option.
The best thing leaders can do to improve sailor quality of life in overhaul is to help the shipyard finish on time. It’s an iron law of warship overhaul that slower crew learning delays completion: crews who don’t learn well have more problems and work stoppages. If your primary goal is finishing your overhaul on time, improving crew learning and resilience is essential.
In the next post, I’ll give examples of the practices my team and I used to manage and learn from error. We certainly weren’t error free, but we had substantially fewer incidents and work stoppages than the ship that preceded us.
References
Argyris, C. (1994). Good communication that blocks learning. Harvard Business Review, 72(4), 77-85.
Helmreich, R. L. (1998). Error management as organisational strategy. In Proceedings of the IATA Human Factors Seminar (pp. 1-7). Bangkok, Thailand, April 20-22, 1998.
Reason, J. (1997). Managing the risks of organizational accidents. Ashgate.