Fault Diagnosis in Multiprocessor Systems (Reprise)

Anton Dahbura, Johns Hopkins University

Several results are given for the problem of identifying the set of faulty processors in a multiprocessor system on the basis of a given collection of results from tests that the system's processors perform on one another.

For the general case of bounded combinations of permanent and intermittent faults, known as hybrid fault situations, necessary and sufficient conditions are given for identifying a processor as faulty in spite of unapplied tests and intermittencies. Based on this approach, a design for intermittent/transient-upset tolerant systems is given. For the special case of all permanent faults, a class of systems is characterized in which the set of faulty processors can be identified in a straightforward manner based on any given collection of test results. Finally, it is shown that the classic t_p-diagnosable systems, introduced in the 1960s by Preparata, Metze, and Chien, possess heretofore unknown graph-theoretic properties relative to minimum vertex cover sets and maximum matchings. An O(n^2.5) algorithm is given which exploits these properties to identify the set of faulty processors in a t_p-diagnosable system.
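For readers unfamiliar with the Preparata-Metze-Chien model, the following is a minimal sketch of the permanent-fault identification problem. It is a brute-force search, not the O(n^2.5) algorithm from the talk. It assumes the usual reading of the model: a fault-free tester reports the true status of the processor it tests, while a faulty tester may report anything. The names `ring_tests` and `consistent_fault_sets`, the 5-processor test arrangement, and the example fault set are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

def ring_tests(n, t):
    """Illustrative test assignment: processor i tests i+1, ..., i+t (mod n)."""
    return [(i, (i + k) % n) for i in range(n) for k in range(1, t + 1)]

def consistent_fault_sets(n, syndrome, t):
    """Return every fault set of size <= t that could have produced `syndrome`.

    `syndrome` maps (tester, tested) -> 0 (pass) or 1 (fail).
    Only results reported by presumed fault-free testers constrain a candidate.
    """
    found = []
    for size in range(t + 1):
        for candidate in combinations(range(n), size):
            faulty = set(candidate)
            if all(outcome == (tested in faulty)
                   for (tester, tested), outcome in syndrome.items()
                   if tester not in faulty):
                found.append(faulty)
    return found

# Example: 5 processors, at most 2 faults, processors 1 and 3 actually faulty.
n, t, actual = 5, 2, {1, 3}
syndrome = {}
for tester, tested in ring_tests(n, t):
    if tester in actual:
        syndrome[(tester, tested)] = 0  # a faulty tester's report is arbitrary; assume "pass"
    else:
        syndrome[(tester, tested)] = 1 if tested in actual else 0

print(consistent_fault_sets(n, syndrome, t))
```

In this small arrangement only one candidate of size at most 2 agrees with every report from a fault-free tester, which is the uniqueness property that makes diagnosis possible. The exhaustive search grows combinatorially with the number of processors, which is why a polynomial-time algorithm matters.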

Note: This talk is a 30th-anniversary reprise of my PhD dissertation defense.

Speaker Biography

Anton (Tony) Dahbura received the BSEE, MSEE, and PhD in Electrical Engineering and Computer Science from the Johns Hopkins University in 1981, 1982, and 1984, respectively. From 1983 until 1996 he was a researcher at AT&T Bell Laboratories, was an Invited Lecturer in the Department of Computer Science at Princeton University, and served as Research Director of the Motorola Cambridge Research Center in Cambridge, Massachusetts. From 1996 to 1999 he was a consultant to Digital Equipment Corporation’s (now HP) Cambridge Research Laboratory, where he pioneered research and development in mobile, wireless, and wearable computing. From 1996 to 2012 he served at Hub Labels, Inc. as Corporate Vice President. In January 2012 he was named Interim Executive Director of the Johns Hopkins University Information Security Institute in Baltimore. From 2000 to 2002 he served as Chair of the Johns Hopkins University Engineering Alumni, and in 2004 he was the recipient of the Johns Hopkins Heritage Award for his service to the University. He chaired the Johns Hopkins Computer Science Department Advisory Board from 1998 until 2012 and also served on the Johns Hopkins University Whiting School of Engineering National Advisory Council during that time.