
Reliability For Complex Artificial Intelligence Design

(Investigator(s): Robert J Marks II, PhD, and Daniel Diaz, PhD)

The more complex a system, the greater the number of its performance contingencies. Contingency growth can be exponential with respect to system complexity. The goal of this research is to establish guidelines that mitigate the explosion of undesirable contingencies; reliable performance of the end product is the ultimate aim of AI design ethics. Examples of undesirable contingencies in AI are ubiquitous.

Here is a partial list in order of increasing danger:

  • During IBM Watson’s ultimate win over two human contestants on the quiz show Jeopardy, a curious thing happened. A clue under the category NAME THE DECADE was “The first modern crossword is published and Oreo cookies are introduced.” A human contestant, Ken Jennings, was first to hit the buzzer and responded, “What are the 20’s?” The quizmaster, Alex Trebek, ruled the response incorrect. IBM Watson then buzzed in and gave the exact same response: “What are the 20’s?” Trebek replied, “No. Ken said that.” Watson’s programmers had made no provision to keep it from repeating an answer that had already been ruled incorrect. The duplicate response was an unintended contingency of the IBM software.
  • A deep convolutional neural network was trained to detect wolves. After some curious results, the programmers did some forensics and discovered undesirable bias in the training data: the pictures of wolves all contained snow, and a misclassified picture of a bear also contained snow. In training, the neural network had learned to detect the presence or absence of snow; the features of the animals themselves played no role in the classification.
  • A potentially serious issue for self-driving cars is the misclassification of objects such as plastic bags. A stationary plastic bag can be classified as a large rock, while a wind-blown plastic bag can be mistaken for a deer. These are unintended contingencies of the self-driving car’s software.
  • A more serious problem with self-driving cars is failure to sense pedestrians. In 2018, an Uber self-driving car in Tempe, Arizona struck and killed pedestrian Elaine Herzberg. The incident was an unintended contingency and was preventable. Steven Shladover, a UC Berkeley research engineer, noted, “I think the sensors on the vehicles should have seen the pedestrian well in advance.” The death was a tragic example of an unintended contingency in a complex AI system. Unintended contingencies remain a major obstacle in the development of general (level 5) self-driving cars. Some developers, believing the problem is insurmountable, have given up.
  • During the height of the Cold War, the Soviets deployed a satellite early-warning system called Oko to watch for incoming missiles fired from the United States. On September 26, 1983, Oko detected incoming missiles. At a military base outside Moscow, sirens blared and Oko directed the Soviet officers to launch a thermonuclear counterstrike. Doing so would have resulted in millions of deaths. The officer in charge, Lieutenant Colonel Stanislav Petrov, felt something was fishy. After informing his superiors of his hunch that Oko was not operating correctly, Petrov did not obey the Oko order. Upon further investigation, Oko was found to have mistakenly interpreted sunlight reflecting off clouds as incoming missiles. There was no U.S. missile attack. Petrov’s skepticism of Oko’s alarm may have saved millions of lives.

Unintended contingencies are a serious problem in the development of AI. Contingency counts can increase exponentially with respect to the number of sensors and environment types. The examples above illustrate unexpected and undesirable contingencies. How can proper performance of AI systems be guaranteed? Absolute guarantees are not possible, but, in legal parlance, desired performance must be assured “beyond a reasonable doubt” in cases with potentially serious consequences, like the crash of a self-driving car. In less serious cases, like the proper response to an oral query made to Amazon’s Alexa, performance is acceptable when there is “a preponderance of evidence” of proper operation. Quantifying these guidelines is required. Doing so falls in the realm of reliability engineering.
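One standard reliability-engineering device for putting numbers on such guidelines is the zero-failure confidence bound: if a system passes n independent test scenarios without a failure, then with confidence C its true per-scenario failure probability is at most 1 − (1 − C)^(1/n). The sketch below is purely illustrative; the mapping of legal evidentiary standards to confidence levels and the test counts are hypothetical assumptions, not results of this project.

```python
# Illustrative sketch: a classical zero-failure reliability bound.
# If n independent test scenarios all pass, then with confidence C the true
# per-scenario failure probability p satisfies (1 - p)**n >= 1 - C, i.e.
#     p <= 1 - (1 - C)**(1 / n).
# The evidentiary standards and test counts below are hypothetical examples.

def max_failure_probability(n_passed_tests: int, confidence: float) -> float:
    """Upper bound on failure probability after n failure-free tests."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_passed_tests)

if __name__ == "__main__":
    # Hypothetical mapping of evidentiary standards to statistical confidence.
    standards = {"preponderance of evidence": 0.51,
                 "beyond a reasonable doubt": 0.99}
    for label, conf in standards.items():
        for n in (100, 10_000, 1_000_000):
            bound = max_failure_probability(n, conf)
            print(f"{label:>28}: n={n:>9,}  p_fail <= {bound:.2e}")
```

The bound makes the intuition concrete: the stricter the evidentiary standard and the rarer the tolerable failure, the more failure-free testing is required before the guideline can be declared satisfied.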

Our initial results identifying sources of contingency explosion will be used to propose best practices for AI design. Exponential contingency growth is characteristic of conjunctive, or tightly coupled, design. Loosely coupled systems, the result of disjunctive design, display linear contingency growth. Pure disjunctive design is typically not an option, so a compromise must be struck.
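As a back-of-the-envelope illustration of why coupling matters, the sketch below (a minimal example with hypothetical component and state counts, not a model from the project) compares contingency counts under conjunctive design, where every combination of component states must be handled, with disjunctive design, where components can be validated one at a time.

```python
# Illustrative sketch: contingency counts for a system of n components,
# each with k possible states or operating conditions.
#
# Conjunctive (tightly coupled) design: behavior depends on the joint state of
# all components, so every combination is a distinct contingency -> k**n.
# Disjunctive (loosely coupled) design: components are handled independently,
# so contingencies add rather than multiply -> n*k.

def conjunctive_contingencies(n_components: int, states_per_component: int) -> int:
    """Every joint state is a separate contingency (exponential growth)."""
    return states_per_component ** n_components

def disjunctive_contingencies(n_components: int, states_per_component: int) -> int:
    """Each component is validated independently (linear growth)."""
    return n_components * states_per_component

if __name__ == "__main__":
    k = 4  # hypothetical states per component (e.g., sensor modes)
    for n in (2, 5, 10, 20):
        print(f"n={n:2d}  conjunctive={conjunctive_contingencies(n, k):>15,}"
              f"  disjunctive={disjunctive_contingencies(n, k):>5,}")
```

Even with only four states per component, twenty tightly coupled components yield on the order of a trillion joint contingencies, while the loosely coupled count stays at eighty, which is why a compromise between the two design styles must be engineered deliberately.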

AI system reliability can potentially be measured by applying the idea of active information [48]. Active information measures the degree of domain expertise infused into a design and into the use of the AI. There are three stages to consider: (1) design, (2) test, and (3) deploy. Each stage requires its own form of domain expertise. Our work will yield useful and needed guidelines for designing, testing, and deploying advanced AI systems.
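To indicate roughly how such a measurement could be carried out, the sketch below assumes the definition of active information used in the active-information literature, I₊ = log₂(q/p), where p is the probability that a blind, unassisted baseline succeeds and q is the probability that the expertise-infused system succeeds. The numeric values in the example are hypothetical and serve only to show the calculation.

```python
import math

# Illustrative sketch, assuming the standard definition of active information:
#     I_plus = log2(q / p)
# where p is the success probability of a blind (baseline) search and q is the
# success probability once domain expertise has been infused into the system.
# Positive I_plus means the infused expertise helps; the bits quantify how much.

def active_information(p_baseline: float, q_assisted: float) -> float:
    """Active information, in bits, contributed by the infused expertise."""
    if not (0.0 < p_baseline <= 1.0 and 0.0 < q_assisted <= 1.0):
        raise ValueError("probabilities must lie in (0, 1]")
    return math.log2(q_assisted / p_baseline)

if __name__ == "__main__":
    # Hypothetical numbers: a blind baseline succeeds 1 time in 1,000,000,
    # while the expert-designed system succeeds 95% of the time.
    print(f"I_plus = {active_information(1e-6, 0.95):.1f} bits")
```

In this framing, separate estimates of p and q at the design, test, and deploy stages would each yield a stage-specific active-information figure, giving a common currency for comparing how much domain expertise each stage contributes.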