Future requirements for computing speed, system reliability, and cost-effectiveness call for alternative computer organizations to replace the traditional von Neumann organization. As computing networks come into being, one of the latest dreams is now possible: distributed computing.
Distributed computing gives users transparent access to as much computing power and data as a given task requires, while achieving high performance and reliability.
The subject of distributed computing is diverse, and many researchers are investigating various issues concerning the structure of hardware and the design of distributed software. Distributed System Design defines a distributed system as one that looks to its users like an ordinary system, but runs on a set of autonomous processing elements (PEs) where each PE has a separate physical memory space and the message transmission delay is not negligible. With close cooperation among these PEs, the system supports an arbitrary number of processes and dynamic extensions.
Distributed System Design outlines the main motivations for building a distributed system, including:
inherently distributed applications
performance/cost
resource sharing
flexibility and extendibility
availability and fault tolerance
scalability
Presenting basic concepts, problems, and possible solutions, this reference serves graduate students in distributed system design as well as computer professionals analyzing and designing distributed/open/parallel systems.
Chapters discuss:
the scope of distributed computing systems
general distributed programming languages and a CSP-like distributed control description language (DCDL)
expressing parallelism, interprocess communication and synchronization, and fault-tolerant design
two approaches describing a distributed system: the time-space view and the interleaving view
mutual exclusion and related issues, including election, bidding, and self-stabilization
prevention and detection of deadlock
reliability, safety, and security as well as various methods of handling node, communication, Byzantine, and software faults
efficient interprocessor communication mechanisms, both in general and under specific constraints such as adaptiveness, deadlock-freedom, and fault-tolerance
virtual channels and virtual networks
load distribution problems
synchronization of access to shared data while supporting a high degree of concurrency