Overview of
Distributed Computing
[ One of the goals of this chapter is to have it grow into the definative
source of distributed computing information. Such a source of information
is does not currently exist on the net. Think mini-textbook. ]
Overview
Computers started being connected to one another through networks so that data
such as files and email could be exchanged. Over time, more and more of the
computer's capabilities are being shared over the network creating the realm
of distributed computing.
Distributed computing is simply applying the two old sayings to the realm of
computer resources. The first, "Many hands make light work" refers to the idea
that you can take a task and break it down so that many different people or
computers can work on it at the same time. Then it only takes a small amount
of work by each computer (or human) to complete the bigger task.
The second saying, "The whole is greater than the sum of its parts" applies to
the fact that when a number of computers work in conjunction the result is
something that can not be achieved by the computers working alone. One of
the benefits that a distributed system has is that it can continue to operate
even if some of its parts are missing, this resulting in much better reliability
of the system than a single computer could ever provide. Working together a
group of humans can lift a stone that none of them could lift alone, working
together a group of computers can perform calculations none of them could have
completed alone during a lifetime.
A traditional operating system on a standalone computer controls the hardware
of that computer, and provides a nice abstracted interface to applications
that run on that computer. A network operating system works with a standalone
operating system to provide communication facilities to the applications that
run on that computer. A network operating system usually is not defined
separately but combined into the overall lump of things called the operating
system. An application running on a computer with network connectivity has
to know what other computers are out there and how to communicate with them.
A distributed operating system takes the abstraction to a higher level, and
allows hides from the application where things are. The application can use
things on any of many computers just as if it were one big computer.
A distributed operating system will also provide for some sort of security
across these multiple computers, as well as control the network communication
paths between them. A distributed operating system can be created by merging
these functions into the traditional operating system, or as another
abstraction layer on top of the traditional operating system and network
operating system.
Any operating system, including distributed operating systems, provide a number
of services. First, they control what application gets to use the CPU and
handle switching control between multiple applications. They also manage use
of RAM and disk storage. Controlling who has access to which resources of
the computer (or computers) is another issue that the operating system handles.
In the case of distributed systems, all of these items need to be coordinated
for multiple machines. As systems grow larger handling them can be complicated
by the fact that not one person controls all of the machines so the security
policies on one machine may not be the same as on another.
Some problems can be broken down into very tiny pieces of work that can be
done in parallel. Other problems are such that you need the results of step
one to do step two and the results of step two to do step three and so on.
These problems can not be broken down into as small of work units. Those
things that can be broken down into very small chunks of work are called
fine-grained and those that require larger chunks are called coarse-grain.
When distributing the work to be done on many CPUs there is a balancing act
to be followed. You don't want the chunk of work to be done be so small
that it takes too long to send the work to another CPU because then it is
quicker to just have a single CPU do the work, You also don't want the chunk
of work to be done to be too big of a chunk because then you can't spread it
out over enough machines to make the thing run quickly.
A computer system with multiple processors in a single machine can handle
very fine-grained problems well, while a system built from computers
distributed over the Internet can handle only coarse-grained problems.
Computer systems centered around a small fast local network can handle
problems somewhere in the middle.
Cosm is an example of a distributed system built on top of traditional
operating systems. It scales from handling systems on a local network
to systems built on the Internet. Cosm provides the programmer with a
framework for writing distributed programs (what constitutes the
cores in Cosm).
Another example of a distributed computing system is Beowolf. Unfortunately,
Beowolf is a well-known name, but it is not as well known when and where
Beowolf makes sense to use. Beowolf is designed to work with very fine-grained
problems, and only with a very small and very fast computer network. It does
not scale to thousands of machines working together or to slower communication
mechanisms such as going over the Internet.
v0.01 by Daniel Oelke
© Mithral Communications & Design Inc.
1995-2010.
All rights reserved.
Mithral® and Cosm® are trademarks of
Mithral Communications & Design Inc.