
HRO 9a Applied HRO-Collision at Sea

This post begins a departure from my prior posts on important, but under-examined, aspects of High Reliability Organizing (HRO). The focus of this mini-series will be the collision between the USS JOHN S MCCAIN and Motor Vessel ALNIC MC on 21 August 2017 in the Straits of Singapore. As we Sailors like to say, a collision at sea can ruin your whole day. In this and all other Navy-related posts, I follow Navy convention and capitalize Sailor(s).

References

(a) Chief of Naval Operations. (2017). Memorandum for distribution, Enclosure (2) report on the collision between USS John S McCain (DDG 56) and motor vessel Alnic MC. https://www.doncio.navy.mil/FileHandler.ashx?id=12011

(b) National Transportation Safety Board. (2019). Maritime accident report: Collision between US Navy destroyer John S McCain and tanker Alnic MC, Singapore Strait, 5 miles northeast of Horsburgh Lighthouse, August 21, 2017 (NTSB/MAR-19/01, PB2019-100970). https://www.ntsb.gov/investigations/accidentreports/reports/mar1901.pdf

(c) International Regulations for Preventing Collisions at Sea (COLREGS). (1972). https://en.wikisource.org/wiki/International_Regulations_for_Preventing_Collisions_at_Sea

My analysis of the events leading up to the collision of these two ships applies High Reliability Organizing (HRO) principles to an organizational accident, defined as “situations in which latent conditions (arising from such aspects as management decision practices, or cultural influences) combine adversely with local triggering events (such as weather, location, etc.) and with active failures (errors and/or procedural violations) committed by individuals or teams at the sharp end of an organization, to produce the accident” (Reason, 1997, p. 1).

* Reason, J. (1997). Managing the risks of organizational accidents. Routledge.

Expanding on some of the terms of Reason’s definition may be helpful for those not familiar with it. Latent conditions exist before the accident, but they are so far in the background, so accepted without question (“it has always been that way”), or so unknown that the people involved aren’t aware of them until it is too late, or until they are answering questions for a Navy Board of Inquiry. To paraphrase Donald Rumsfeld, you go into a crisis with the organization, training, role systems, procedures, and equipment condition you have, not those you might wish you had (Rumsfeld, 2004). Organizations do have ways to influence (though probably not control) all of these things before the accident.

Latent conditions can be in place for years before a triggering event makes them salient. The triggering event is what makes the latent conditions suddenly, glaringly obvious or extremely harmful. Organizations have almost no control over triggering events; they just happen.

Active failures are things the human operators do just before the accident. I think it is a mistake to see active failures as directly causing the accident or collision; rather, combined with the triggering event, they unleash the latent conditions and a great deal of damage. Unlike latent conditions, active failures don’t exist in the background. They are easy to spot after the organizational accident and are usually the superficial causes people identify for it. In my analysis, we’ll go a lot deeper than this.

Active failures are inevitable, unpredictable, and resist simple solutions like telling people to “be more careful” or blaming them for losing “situational awareness,” whatever that means. If my unwillingness to pretend that I understand what it means to “lose situational awareness” irks you, take a deep breath for now. I’ll have more to say about loss of situational awareness later in this series.
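The interplay of the three ingredients can be made concrete with a toy sketch. The Python below is purely illustrative: the conditions listed and the probabilities are numbers I made up to show the shape of Reason’s model, not anything drawn from the JSM reports.

```python
import random

# Toy rendering of Reason's organizational-accident model.
# All conditions and probabilities here are invented for illustration.

LATENT_CONDITIONS = [
    "procedures that assume a different equipment lineup",
    "watchstanders rotated in without system-specific training",
    "console configuration accepted because it has always been that way",
]

def triggering_event() -> bool:
    """Triggering events (dense traffic, weather) just happen;
    the organization has almost no control over them."""
    return random.random() < 0.05

def active_failure() -> bool:
    """Errors or violations at the sharp end; inevitable and
    unpredictable, so modeled as a nonzero background rate."""
    return random.random() < 0.10

def organizational_accident() -> bool:
    # Latent conditions alone do nothing visible. The accident
    # happens only when a trigger and an active failure arrive
    # while the latent conditions are in place.
    return bool(LATENT_CONDITIONS) and triggering_event() and active_failure()

trials = 100_000
accidents = sum(organizational_accident() for _ in range(trials))
print(f"accident rate over {trials} watches: {accidents / trials:.4%}")
```

The point of the sketch is that the latent conditions were true on every one of the simulated watches; the accident is rare only because the trigger and the active failure rarely arrive together.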

The best thing organizations can do about active failures is create procedures that balance the need for efficiency with systems, processes, and equipment that add redundancy so human errors can be spotted and corrected quickly. Every organization needs to be good at getting things done with the fewest people and in the shortest time practicable, or it doesn’t stay in business very long. Even government organizations need to exhibit some kind of efficiency, or the people at the top don’t keep their jobs for long.

Redundancy is not a “cure” for human error. Redundancy without limit, if such a thing could be said to exist outside the minds of people who focus their post-accident energy on what the operators “should have” done, carries two risks. First, nothing gets done because everyone is checking everyone else all the time. Second, ever-increasing levels of redundancy in procedures and equipment lead to complex, often unpredictable interactions among system components, which can make system performance impossible to predict. This was one of Perrow’s warnings that I noted in the Normal Accident Theory (NAT) and HRO post.
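A back-of-the-envelope calculation shows the diminishing returns. Assuming, purely for illustration, that each independent check catches an error with probability 0.9 and that every added layer carries a fixed coordination cost, the chance an error slips through drops steeply at first and then barely at all:

```python
# Illustrative numbers only: each independent checker catches an error
# with probability p_catch; each added layer costs the same in
# coordination. Safety returns shrink while cost grows linearly.

p_catch = 0.9          # assumed per-check detection probability
cost_per_layer = 1.0   # assumed relative coordination cost per layer

for layers in range(1, 6):
    p_missed = (1 - p_catch) ** layers  # error escapes every check
    print(f"{layers} check(s): P(undetected) = {p_missed:.5f}, "
          f"coordination cost = {layers * cost_per_layer:.1f}")
```

Note the buried assumption: the checks are independent. Perrow’s warning is precisely that piling on redundancy creates interactions that break that independence, so the real curve is worse than this toy one.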

Reason’s definition of organizational accident is also a model of accident causality. Recall from my post on NAT and HRO that system accidents are undesirable events involving unanticipated interactions arising from multiple failures in systems (Perrow, 1984, p. 70). Reason’s definition breaks those “unanticipated interactions among multiple failures” down into three categories: latent conditions, triggering events, and active failures.

* Perrow, C. (1984). Normal accidents: Living with high-risk technologies. Princeton University Press.

I conclude with a brief description of the collision, the causes chosen by the Navy and NTSB investigators (causes are never “found”; investigators choose them), and a brief description of where I plan to go in future posts in this series.

The USS JOHN S MCCAIN (JSM) collided with Motor Vessel ALNIC MC on 21 August 2017 in the Straits of Singapore. The collision killed 10 Sailors, injured 48 more, and left the ship with damage estimated at more than $100M. There were no fatalities aboard ALNIC, and that ship suffered approximately $225,000 worth of damage.

The senior Navy leaders who investigated the collision chose the following causes:

• “Loss of situational awareness in response to mistakes in the operation of the JOHN S MCCAIN’s steering and propulsion system, while in the presence of a high density of maritime traffic.

• Failure to follow the International Nautical Rules of the Road (Ref (c)), a system of rules to govern the maneuvering of vessels when risk of collision is present.

• Watchstanders operating the JOHN S MCCAIN’s steering and propulsion systems had insufficient proficiency and knowledge of the systems” (Ref (a)).

In the National Transportation Safety Board (NTSB) report, their investigators chose as the probable cause of the collision “a lack of effective operational oversight of the destroyer by the US Navy, which resulted in insufficient training and inadequate bridge operating procedures” (Ref (b), p. 39). This is a BIG difference from the Navy investigation report. Note that the NTSB used the words “oversight of the destroyer by the US Navy,” (emphasis added) which the report identifies as being Navy organizations outside the lifelines of the ship.

The NTSB added contributing causes:

  • “The [JSM’s] bridge team’s loss of situation awareness and failure to follow loss of steering emergency procedures, which included the requirement to inform nearby traffic of their perceived loss of steering. 

  • The operation of the steering system in backup manual mode, which allowed for an unintentional, unilateral transfer of steering control” (Ref (b), p. 39). This is not mentioned by the Navy investigation report.
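To see why an “unintentional, unilateral transfer” is a design hazard, consider a deliberately simplified sketch of two transfer policies. Nothing below reflects the actual JSM steering system; the station names and rules are hypothetical, chosen only to contrast a transfer that requires acknowledgment at the receiving station with one that does not.

```python
class SteeringControl:
    """Toy model of steering-control transfer between two stations.
    Entirely hypothetical; not the JSM's actual system."""

    def __init__(self, require_ack: bool):
        self.in_control = "HELM"
        self.require_ack = require_ack
        self.pending = None

    def request_transfer(self, station: str) -> None:
        if self.require_ack:
            self.pending = station  # held until explicitly accepted
        else:
            # Unilateral: control moves the instant one station acts,
            # whether or not anyone at the other station notices.
            self.in_control = station

    def acknowledge(self) -> None:
        if self.pending is not None:
            self.in_control, self.pending = self.pending, None

# Without acknowledgment, a single action at one console moves control
# while the helmsman may still believe he has the wheel.
unilateral = SteeringControl(require_ack=False)
unilateral.request_transfer("LEE_HELM")
print(unilateral.in_control)  # LEE_HELM: transfer already happened

# With acknowledgment, both stations must participate in the handoff.
guarded = SteeringControl(require_ack=True)
guarded.request_transfer("LEE_HELM")
print(guarded.in_control)     # still HELM
guarded.acknowledge()
print(guarded.in_control)     # LEE_HELM only after explicit acceptance
```

The design point is that a two-party handoff makes the transfer an event both stations know about, while a unilateral one lets the system state and the operators’ mental models silently diverge.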

In the next several posts, I will provide a detailed review of the sequence of events prior to the collision with comments, identify deficiencies or gaps in both reports, and add some final thoughts about what can be learned from the event for High Reliability Organizing (HRO). What I won’t do (if I can help it) is use hindsight to tell you what the operators should have done.

Karl Weick argued (Weick, 1990) that after many system accidents (Perrow, 1984) or organizational accidents (Reason, 1997), we know a lot about how many people died and who went to jail. We know much less about the processes by which these crises are set in motion. Weick wrote that we lack an understanding of the ways in which separate small failures become linked. We know that single-cause incidents are rare, but we don’t know how small events become linked to produce a disastrous outcome. What I seek to accomplish in this series of posts is to draw attention to the small things that aren’t obvious from the reports and to their implications for HRO.

* Weick, K. E. (1990). The vulnerable system: An analysis of the Tenerife air disaster. Journal of Management, 16(3), 571–593.
