

Overhaul 17c: Undeniable Truth of Overhaul 22


Introduction

This is my third and final post about Undeniable Truth of Overhaul 22: Learning is a superpower. The series began with the claim (backed by plenty of evidence, but I won’t point fingers) that slow learning delays overhauls. This is intuitively sensible. Crews that make more errors have more work stoppages and sometimes hurt people badly. Backward-looking responses to error, such as searches for root cause, confidence in material and process design (“We have good procedures, but people aren’t following them.”), and blaming operators work up to a point, but they constrain more general learning. There are cognitive reasons for this (beyond the scope of this post).

This post describes forward-looking approaches to error guided by Error Management (EM) practiced by a CVN in RCOH. EM doesn’t seek to control or reduce errors, but rather to manage the situations and practices that make them likely (Reason, 1997).

Important note: smaller ships with fewer resources than a CVN won’t be able to do everything described in this post. The CO must choose what’s practicable for their resources, time, and energy. This post gives leaders information to choose wisely. The next section defines Error Management and the layered approach our CVN used to implement it.

Less important note: the complete reference list for all three parts of Overhaul 17 is here.

A Layered Approach to Error Management (EM)

Error Management is a systematic effort to cultivate operational resilience (Helmreich, 1998; Reason, 1997). The principles of EM are:

  • problem investigation motivated by the belief that unexpected outcomes are clues about deeper issues, not evidence of defective humans (this is best illustrated by prioritizing the identification and reduction of error-inducing situations and non-punitive responses to most errors),

  • monitoring to identify and understand the types of error that occur,

  • training in risk identification, and error avoidance and management strategies,

  • training for monitors and leaders in assessing and reinforcing error management practices,

  • problem reporting that prioritizes learning, which demands policies for error response that distinguish “well meant shortcomings from heedless or stupid blunder” (Buell, 1900).

Our CVN implemented Error Management (EM) in three layers of increasing complexity. Spoiler alert: if you want to change how people understand and respond to error, don’t start by telling them, “Everything you know is wrong.” This is certain to generate confusion, frustration, and resistance.

A layered approach is the best way to build on what people already know. It connects the EM ideas introduced later to concepts that middle and senior managers (particularly Chief Petty Officers and Limited Duty Officers in the Navy) have already internalized. A senior leader depends on line managers in the organization to make things “work” day-to-day. The last thing a leader wants at the outset of a change process is to untether line leaders from familiar ways of operating and sensemaking. Alienating people whose support you need to implement change is a poor strategy.

Taking a layered approach to EM had three advantages. First, it wasn’t scary. The foundational layer was nothing new for experienced personnel. I emphasized the fundamentals of safe operations after relieving as RO because people expect a new leader to emphasize fundamentals.

Second, it permitted a progression from familiar, concrete concepts supporting safe operations (like believing your indications, observing system response, etc.) to more thoughtful strategies for qualifications, monitoring, communications, and finally, watchteam coordination. A leader can’t implement more advanced strategies of EM unless people have a solid grasp of the fundamentals of safe performance.

Third, I could quickly demonstrate the principles of EM in action. Errors, problems, and preparations for the next event don’t pause when a new leader arrives. Problems allowed me to demonstrate that I was looking beyond simplistic, “root cause” approaches to understanding error.

The three layers of EM implementation were:

  • Layer 0 Foundation: standard theory, systems, and operations knowledge of the hardware (from Navy technical documentation). This was augmented with instruction in and regular reinforcement of well-established cultural values like questioning attitude, anticipating system response, and independence of process checks (tagout, valve lineup second checkers). Some of these values are defined in Navy manuals, but most are not. Cultural values are sometimes emphasized in qualifications, but more often only after things have gone wrong (backward-looking). We taught the cultural values proactively as table stakes for safe operations.

  • Layer 1 Individual Resilience: principles of human performance and reliability like watchstanding principles, formality, expectations for supervision, speaking up (like watchteam backup, but more), seriously considering concerns, crew rest, etc. Much of this information about resilient practices is in Navy documentation, but so widely dispersed as to be functionally invisible. I synthesized what I wanted to teach based on extensive study before reporting to the ship.

The focus of the training I conducted on individual resilience was diligent, thoughtful action for safe operations. I frequently illustrated principles of individual resilience by noting their absence from recent own ship problems, problems on other ships, and organizational accidents (Challenger and Columbia space shuttles, Chernobyl, etc.).

  • Layer 2 Team Resilience: cross training for casualty response, forceful backup (both providing and supporting), senior enlisted as watchstanding monitors and mentors, Crew Resource Management, high reliability organizing practices, tools like Pre-Mortems, and TADMUS.*

*TADMUS, Tactical Decision Making Under Stress, was research funded by the Navy after the USS VINCENNES crew mistook an Iranian commercial airliner for a hostile target and launched a missile that killed everyone aboard.

Resistance happens and should be expected in any change initiative. I introduced concepts and tools for managing and learning from error that senior personnel in our organization, regulators, and leaders on fleet staffs had never heard of (“What’s a premortem?”). I did receive pushback, some of it very strong. My customary response was, “I can connect everything I say, do, and teach to the fundamentals. Can you?” This was another reason for starting with the fundamentals.

Our CVN had problems just like every other Navy ship in overhaul. What was different was how we learned from them. Most of our problems fit into categories I had derived from the overhaul problem reports I studied. We sought to leverage our errors and causal analyses to learn faster and build better practices for safer operations.

The next section describes the practical implementations of Error Management.

Error Management in Action

In this section, I provide some examples of our EM practices. They focused on guiding people to think differently about error. We leveraged every problem for learning, enhancing resilience, improving decision making, training for increased resilience, and anticipation. Remember, reducing error isn’t the goal. You reduce error to reduce work stoppages, prevent new work from damaging equipment, and shorten schedule delays necessary to get your act together.

Leveraging Problems for Learning

An extensive study of overhaul problem reports convinced me that people weren’t learning from error in overhaul. Individual ships may have been, but the same errors were occurring over and over on other ships. Since the purpose of documenting problems and investigations is learning, we changed how we did it.

The problem reports that I read were hard to learn from. Since there wasn’t a required format or writing process, we designed ours to enhance our learning. Our changes:

  • Preparation—before assigning a problem report author (customarily the senior leader involved), I asked the senior people involved what their goals were, how they assessed the situation, and how they selected their course of action. Which watchstanding fundamentals did they think applied (not “how they failed”)?

  • Format—in many problem reports, it can be difficult to identify what went wrong. To deepen our analyses, our reports clearly indicated the problems in the chronology with labels: Problem 1, Problem 2, etc. Authors addressed each in a Problem Summary section immediately after the chronology. The analysis included the applicable watchstanding principles and expectations for supervision.

  • Each corrective action had to relate to at least one of the problems, which was indicated in parentheses after the action.

  • The report author had to explain to me how they planned to change their future performance based on their understanding of the watchstanding and supervision principles.

  • Report writers had to note similarities to prior problems and why we repeated the problem, if applicable.

  • Before accepting a report, personnel in training read it and I discussed it with them. If they couldn’t understand the problems, the principles involved, or what they would do differently, the author revised it until they could (this was learning about learning).

  • An action to review and discuss recent problem reports was added to qualifications. Only people whom I had trained to my standards could sign for this requirement.

Enhancing Resilience

A hallmark of High Reliability Organizing (HRO) is commitment to resilience (Weick and Sutcliffe, 2007). In my series on the fundamentals of HRO, I wrote about questioning attitude’s contribution to resilience (here and here). As I noted in the previous section, questioning attitude was part of the fundamental layer of Error Management. In this section, I will describe resilience in greater depth.

Resilience is an organization’s capability to not come unraveled (stay “raveled”?) when the unexpected happens, as it always does. It has two components: anticipation and having skillful ways to deal with the unexpected (the things you didn’t know that you didn’t know).

Examples of Improving Anticipation

  • We adapted the pre-evolution brief format for overhaul and trained on its use with specific examples. We reviewed it while preparing for procedures. Aside—Why any ship entering overhaul needs to develop its own pre-evolution brief format is beyond my understanding. The Type Commander representative or equivalent should hand the last ship’s format (reviewed by shipyard testers) to the CO with the message, “Use this. Don’t subtract anything. Tell me what’s missing.”

  • All our briefs included standard communications and reports for watchstanders to use during the procedure.

  • For procedures with elevated risk, duty section leaders used Navy ORM and Premortems to prepare. Both are focused on improving awareness of what could go wrong before an event. They surface risks that might not be accounted for in the plan. You can learn all you need to know about doing a premortem from a web search using the term “gary klein premortem.”

  • We reviewed recent test experiences and consulted SY test leaders for their perspectives about common problems (our ship and others).

Skillfully Dealing with the Unexpected

  • We did interactive briefs, of course, and added specific questions such as, “What indications will you monitor? What will you do if …?”

  • Briefs included stop points if things did not go as expected.

  • We decided how much supervision was necessary for procedures and where to position it at the department, not the watch team, level. Sometimes, I asked the Immediate Superior in Command (ISIC, title varies by ship type) for monitoring assistance. Personnel weren’t often available, but it’s better to ask.

  • We had junior personnel shadow senior or more experienced watchstanders (cross training). We had junior watch officers shadow watch supervisors or senior enlisted monitors. I gave guidance to the mentors about what to teach, but there wasn’t a lesson plan.

  • We used senior enlisted personnel for monitoring new procedures (or those with elevated risk) and mentoring personnel.

Improving Decision Making

Two important sources of error are non-reflective decision making and poor mental models (common among junior watchstanders). Sometimes people choose risky courses of action because they either don’t believe alternatives are possible or think alternatives would be unacceptable to leaders (“Never delay the SY” vs. “Sometimes STOP is good progress”). Normally, people learn to consider risks and make better decisions through trial and error (like touching a hot stove), but relying on trial-and-error learning in overhaul is risky. Inexperience and error can cause delays from critiques, work stoppages, and equipment damage (starting a pump with the discharge valve shut, oh no!).

It isn’t easy to teach people to be more reflective when making decisions, so we tried to improve the quality of supervisor decisions in several ways. First, we explicitly taught conservative decision making (no hand waving). The essence of conservative decision making is to recognize and challenge all uncertainty. We used recent examples of both conservative and risky decision making from SY procedures.

All decision makers are subject to cognitive biases. Inexperienced watchstanders can have flawed mental models of complex situations that arise in SYs. Under conditions of uncertainty, people tend to have too much confidence in their situational assessments and skills (Skala, 2008).

When faced with situations outside the scope of normal experience, we prompted supervisors to stop, set stable conditions, and do a quick “huddle,” possibly including the Shift Test Supervisor and the senior duty officer. The purpose of the huddle was to compare understandings of the situation and risk before proceeding cautiously or getting further clarification. We identified criteria for recognizing uncertainty and used the pre-event brief checklist to establish “stop and evaluate” points before starting a procedure.

Second, I told supervisors to be ready to justify everything they did on watch as if they expected to see me at any time. I regularly visited personnel in workspaces to ask what they were doing and why. I listened, asked about goals, and gave feedback on their plan and suggested alternatives, when appropriate. I used care not to tell them what to do if I could avoid it. Some research suggests that the expectation of an evaluation of one’s actions leads to more deliberative decision making.

Third, if a supervisor assigned someone to take an action that led to a problematic outcome, I asked them questions like, “How did you establish confidence that this person could accomplish what you asked them to do?” “What alternatives did you consider?” “What else was going on that seemed more important?” These questions generated defensive behavior until I demonstrated that a) no punishment (including harsh language) would be forthcoming, b) I wanted them to think deliberately about the choices they made, and c) I wanted to understand how to change future watchstander training and mentoring. I must have had some impact on decision making because a junior officer gave me a paperweight with the words “Make good choices” engraved on it.

Training

In addition to the training noted above, other training in support of EM took several forms:

  • Interactive reviews of recent problem reports and non-reportable problems (not read aloud, verbatim, in a monotone). I monitored how this was done and provided frequent demonstrations and feedback until supervisors reviewed the reports the way I wanted (I may write more about this in the future).

  • RO-led seminars for newly qualified supervisory watchstanders with experienced watchstanders. The purpose was to prepare them for the complex watchstanding environment they were about to enter. The format was designed to help them identify risks and reflect on their own decision making. The Chief Test Engineer and Shift Test Engineers participated. There was homework that required reviewing SY test management policies.

Other Resilience Tools

We employed many other High Reliability Organizing (HRO) and Crew Resource Management (CRM) practices such as:

  • Day before duty briefs: reviews of monitoring program findings, upcoming SY procedures, and support requirements. Watchstanders were updated on changes since their last duty day, procedure updates, schedule revisions, and upcoming events.

  • Day after duty reviews: duty section assessments of their performance, signs of watchstander overload, surprises, close calls, and opportunities for improvement. This was more learning about learning.

  • Stationing an LDO and senior enlisted on the backshift for weeks at the start of integrated testing (before shift work began, and continuing through the end of the test program) to teach risk management and mentor operational skills. Some ships assign a “midnight cowboy” to be a watch officer on the backshift during periods of intensive testing as well.

I couldn’t be present at all the duty section briefs, so I trained duty section leaders on my standards and asked SY test leaders to participate.

In all these actions in support of Error Management, my senior leaders and I sought synergy and positive feedback. Frequent training on fundamentals, their application, and consequences when they weren’t used made them more salient. Constantly referring to them during procedure preparation, mentoring on-watch personnel, and doing problem analysis gave people practice applying them. I discussed our EM practices and our goals for using them with SY test engineering leaders frequently to solicit advice, suggestions, and watch team backup when we fell short. Day before and after duty reviews improved resilience through planning ahead for risks and learning what went well, what didn’t, and why.

Finally, we developed a thoughtful, structured process for integrating new personnel into our EM processes, particularly those with experience from other ships. They needed to understand our ship’s standards and possibly unlearn non-reflective practices that were the norm on their previous ships. This was one of our biggest training challenges.

Conclusion

It is hard to know where to start with Error Management (EM). That’s why I wrote this post. Everyone “knows” the backward-looking practices of blaming, shaming, and retraining. Forward-looking practices that cultivate organizational resilience, not so much.

Perhaps “Where to start?” is the wrong question. The more important question is, “Why aren’t we getting safer?” That’s the question that started me on the path of EM. I just had to be careful about saying it aloud. We didn’t start with all the practices identified in the previous section. We learned what to add and what to stop doing as we went.

There were four features of our EM implementation. The first was adopting a non-punitive policy toward error. The second was using a layered approach for introducing the concepts. Only after I was sure that people had a solid grounding in the fundamentals of safe operations did I proceed to the next two layers. We integrated more advanced notions of individual and team reliability incrementally after that. The third feature was taking advantage of positive feedback between the actions. We leveraged our day-to-day experience (and problems) to reinforce existing or new processes for training, qualifications, mentoring, anticipating, problem analysis, and learning. Finally, we included self and team assessments to learn about our learning. Our non-punitive responses to error were crucial. We needed information to improve. We could only get the performance and insight we needed by making it clear that we wouldn’t punish people for trying to accomplish their jobs in accordance with regulations and SOPs as they understood them.

The overhaul environment is too changeable, unfamiliar, and unpredictable to rely on unreflective adherence to traditional approaches (assuming that the other guy knows what he’s doing). The most challenging overhaul situations fall outside the conditions that normal operating procedures assume. We sought to engage the brains and commitment of all department personnel in developing, critiquing, implementing, improving, and owning our shipyard support. One of my biggest challenges was to insist that people figure out what to do and resist frequent entreaties, under time and workload pressure, to be simply told what to do.
