On Synchronizing Distributed Activities Despite Transient and Byzantine Failures

Danny Dolev

We address the problem of designing algorithms that self-stabilize while at the same time tolerate an eventual fraction of permanent Byzantine failures. The presence of Byzantine nodes presents a special challenge due to the “ambition” of malicious nodes to hamper stabilization when the system tries to recover from a corrupted “transient” state.

We present a distributed “Pulse Synchronization” protocol, which targets at invoking regular and tightly synchronized pulses in such an environment, without using any assumption about the existence of any external timing reference. The system may be in an arbitrary state in which the communication network may behave arbitrarily and in which any number of the nodes may behave arbitrarily. The pulse synchronization algorithm will eventually converge, once the communication network resumes delivering messages within a bounded time (some d time units), and the fraction of permanent Byzantine nodes, f, obeys the n >= 3f+1 ratio (for a network of size n). The attained pulse synchronization tightness is 3d with a convergence time of O(f) communication rounds.

In the talk we present a Clock Synchronization algorithm that takes advantage of the Pulse Synchronization protocol. We will also discuss a general scheme to convert a Byzantine algorithm into a self-stabilizing Byzantine algorithm, using the Pulse Synchronization protocol.