HRO 9: Collision at Sea - The HRO Perspective - Failure
This post focuses on the last of Weick and Sutcliffe’s (2007) three principles of High Reliability Organizing (HRO) related to problem anticipation: Preoccupation with Failure (i.e., Failure). The three principles are:
* Operations (Sensitivity to Operations),
* Simplification (Reluctance to Simplify), and
* Failure (Preoccupation with Failure)
* Weick, K. E., & Sutcliffe, K. M. (2007). Managing the unexpected: Assuring high performance in an age of complexity. Jossey-Bass.
I provided my perspective on Sensitivity to Operations and Reluctance to Simplify in prior posts.
There is some overlap between my perspective on Preoccupation with Failure and Weick and Sutcliffe (2007), but my ideas go deeper by describing the assumptions, preparations, mindsets, and key behaviors that I think are necessary to enact a preoccupation with failure.
Assumptions
A preoccupation with failure is built on assumptions about the system being managed and the people in it. These assumptions are the foundation for the design of an organization’s personnel practices (developing people as operators and leaders), work processes (how things are done to produce reliable outcomes), and documentation (who records what, how it is reported, and how it is evaluated). My list of the essential assumptions supporting a preoccupation with failure is:
Monitoring performance is as important as robust system design. Having a system design that incorporates reliable hardware and procedures is not sufficient for HRO. Operators must regularly verify the adequacy of system performance through monitoring hardware (e.g. taking logs on operating equipment) and personnel (observing operators doing their work either on the job or through training activities that mimic job requirements).
For an Officer of the Deck (OOD) of a Navy ship underway, monitoring includes comparing what they see to reports from lookouts, weighing Combat Information Center (CIC) recommendations against their own judgment of the situation, knowing the capabilities and limitations of all Bridge equipment, and asking operators how they would recognize an equipment failure at their watch station and what their response would be.
Problems, errors, and deviations, shortened to “problems” for the rest of this post, happen. For this reason, processes for HRO must include cross-checks, formality (in direct proportion to the risk involved), and people backing each other up to catch errors quickly and minimize their impact. A preoccupation with failure on the Bridge means that operators believe errors in transferring control of steering could have catastrophic consequences, and that such transfers must be carefully controlled by training, procedures, and supervision. The Bridge team of the USS JOHN S MCCAIN lacked formality for splitting the HELM and LEE HELM controls at the Ship Control Console and for managing the transfer of steering control between the Bridge and After Steering just before the collision.
Problems can have up- and downstream consequences. People practicing HRO look deeper into the system for problems even when it’s painful to do so (e.g., identifying leadership shortcomings, creating work stoppages or delays). All problem investigations involve asking “Where else do we have this problem?” or “How far does this problem go?” This was an important lesson from the loss of the USS THRESHER (Holwitt, 2018; US Navy Court of Inquiry).
* Holwitt, J. I. (2018). The Loss of USS Thresher: Technological and Cultural Change and the Cold War US Navy. Journal of Military History, 82(3). Retrieved from http://search.ebscohost.com/login.aspx?direct=true&profile=ehost&scope=site&authtype=crawler&jrnl=08993718&AN=130321455&h=o0N1TF4lEIIriroEQ0ATVw91ISHU9DZQpyhoOJlynlnqjUkaBzt9ylMl7FhE0Af9O7ryFwdPKiIyXmxzYprwmQ%3D%3D&crl=f&casa_token=mNuD-T8yb0kAAAAA:-eUweaYBt3L2zQ52ygr3p-zGJ-WI1NPC513E_fRlA7GwjBxkGCuiZ_ZMUD2bUzKtBl1modjac6FYj3U
* Navy Court of Inquiry (NCOI) into Loss of USS Thresher (SSN 593), retrieved from https://news.usni.org/tag/thresher-inquiry
Problems provide invaluable insight into system health. Even robustly designed and very reliable systems have flaws that are inconceivable to designers and operators until a problem occurs.
These assumptions are about reality and the way the world works for people in HROs. It is possible that many people in HROs don’t think much about the assumptions and couldn’t articulate them easily if asked. They learn them through practice and accept them as given. That was my experience over 30 years in the Navy.
Preparation
Some preparation is essential to practice a preoccupation with failure effectively. The most important kinds are:
Detailed system technical knowledge that enables operators to anticipate outcomes and recognize problems. This knowledge helps operators recognize what is normal and when something is not. It gives them greater insight into where problems can appear and what indications they might present.
Audits and surveillances must be expandable and recursive. Audits (checking whether a procedure accomplishes its purpose) and surveillances (watching people actually execute a procedure) must be able to grow when people discover problems. Where might the problems lead? What were the causes? They must also be recursive, capable of being adapted and updated if they fail to find problems or encounter a problem they weren’t designed to find.
Processes must exist for problem investigation, documentation, corrective action, and reporting to others. Problem investigations need to be formal, robust, and flexible. Formal means there should be a written procedure or, for low-risk problems, general guidelines for analysis, reporting, and corrective action. Robust means they can deal with multi-layered, systemic issues as well as misunderstandings between operators. Flexible means they can be tailored to the nature of the problem and the risk it presents. If you do a formal investigation using detailed procedures for every problem you encounter, you’ll never get any work done.
To stimulate a greater preoccupation with failure, I used to ask people “what are the single points of failure?” in processes they were executing or owned. In other words, what were the essential things people had to “get right” in a particular process to achieve success? Sometimes this wasn’t easy for them because I often got the answer, “I just follow the procedure.” To help people think differently, I could turn the question around and ask “if you wanted this process to fail, what would you do?” That elicited raised eyebrows as well as more creative thinking.
Mindsets
There are qualitative issues, called mindsets, associated with a preoccupation with failure that need to be part of how an organization does business. A mindset reflects the assumptions, beliefs, goals, and expectations people use as rules to guide their attitudes and practice in a particular field (Fang, Kang, & Liu, 2004). A mindset can be thought of as a set of guidelines that people use to understand and react to the world. Mindsets can be shared to varying degrees within organizations or specific fields like the U.S. Navy.
* Fang, F., Kang, S. P., & Liu, S. (2004). Measuring mindset change in the systemic transformation of education. Association for Educational Communications and Technology, 298-304.
A mindset is not a collection of behaviors, but rather a collection of cognitive processes that influence what people notice and accept as actionable, and how they understand situations and events (Fang, Kang, & Liu, 2004). Mindsets are mental routines that people use to get things done. They are seldom taught to members of an organization explicitly, but rather are learned by watching the behavior and interpreting the motivations of other members of the organization as they do their work and respond to problems.
My brief list of mindsets that support a preoccupation with failure is:
Swiftness. People practicing highly reliable organizing are “preoccupied swiftly” (Weick and Sutcliffe, 2007, p. 57). They investigate problems while memories are fresh. They review logs and other data quickly. They review lessons from drills and observed evolutions with the personnel involved right away. Audits and surveillances have time limits for performance and review by leaders. Leaders doing HRO worry about the “age” of outstanding corrective actions from past problems.
Sensitivity. People try to combat the natural tendency toward complacency when things are going well. One of the worst things a person practicing high reliability organizing can be accused of is a lack of sensitivity (to alarms, reports of problems, data out of specification, etc.).
Suspicion. People operate as if failure might be lurking around any corner. They are suspicious of their own mindsets because they believe mindsets can narrow perceptions, breed intolerance of competing views, and entrench over-learned practices for reacting to problems. People trying to practice HRO are suspicious when problems are NOT found because it could mean members are normalizing unexpected outcomes. They interpret the absence of problems to mean that people aren’t looking hard enough.
Learning (“Knowledge is Good”). Organizations practicing HRO place a premium on learning: constantly refreshing their detailed technical knowledge, learning to do their job correctly, continuously improving, learning to operate as a member of a team, and learning from mistakes committed by themselves and others. Since things don’t “go wrong” often in reliable systems, the operators in them place great emphasis on extracting maximum learning when they do.
Leaders need to be alert to the possibility that problem investigation reports can become *obstacles* to learning. If an organization is only seeking to document WHAT went wrong (a bureaucratic, non-HRO mindset), the reports may leave out valuable context because it’s considered “obvious” to experienced operators. Junior personnel need more context to understand WHY things went wrong. To fight this, I used to take draft incident reports to trainees and have them discuss their understanding with me. I would often insist on changes based on these conversations.
Key Behaviors
Armed with assumptions, preparation, and mindsets, a preoccupation with failure requires specific behaviors for enactment. The most important are:
Anticipation. Prepared with detailed technical knowledge of the system, people need to anticipate the outcome of every action they take (called “system response”). This is easier with equipment than with HR practices, which can have long time delays between a change and indications of its impact.
Formality. The organization has documented and observable standards for operations, training, and administration. People are trained on the standards, take them seriously, and correct infractions they observe even among their peers. There is documentation for procedures, problem investigation, and communications. Audits evaluate adherence to the standards.
Reporting. People are taught to speak up about their errors and those of others, problems they notice, and their concerns about what “might” be wrong. They could be the only person who notices something important. If they have doubt about a concern, the only way they can “tune” this perception is to tell someone.
Investigating. People are just as concerned about understanding WHY something happened as understanding WHAT happened. If you don’t know how a situation made sense to the operators at the time a problem occurred, then it becomes more likely you will use simplistic folk models like “loss of situational awareness” and the clarity of hindsight to “explain” problems (Dekker, 2004). Assigning causality to “loss of situational awareness” explains nothing.
* Dekker, S. (2004). Ten questions about human error: A new view of human factors and system safety. CRC Press.
Follow up. Finally, people in organizations seeking higher reliability act quickly to “close the loop” on problems: investigate, determine actionable causes, implement corrective action, and check the effectiveness of those actions. Many system accident investigations find long tails of “things gone wrong” that were ignored or dismissed as “anomalies” unlikely to reoccur. A critical component of follow up is what Rickover called “facing facts.” You have to be willing to change your assumptions and mindsets if your investigations indicate they have become invalid or ineffective.
None of these behaviors are intuitive and all require considerable moral courage. They must be taught, practiced, and supported through an organization’s professional development processes.
Conclusion
It is important to acknowledge that having a preoccupation with failure is not normal. It requires assumptions, preparation, mindsets, and behaviors that NO ONE practices without focused training and extensive experience (mostly making mistakes). Humans have a tendency to assume all is well in the absence of problems. Human awareness is tuned to dismiss small problems because, most of the time, they *are* small and inconsequential. You have to teach people to recognize small problems (not so hard) and relentlessly investigate them (harder), even at the risk of disrupting their plans and operations (very hard).
In my next post, I will return to my analysis of the collision between the USS JOHN S MCCAIN (DDG 56) and Motor Vessel ALNIC MC on 21 August 2017 in the Straits of Singapore. I have more to write about what could be learned for future practice despite the dismal analyses in the official reports. Collisions are still going to happen, but they don’t have to be on your watch.