Distributed Operating Systems

Distributed (Operating) Systems
Introduction
Schedule
Sessions
1. Introduction: Distributed systems
(Hardware/Software issues)
2. Process management in clusters: Load
balancing and job scheduling
3. Distributed communications
4. Distributed services
Scenarios
• High-performance solutions for scientific
applications (process management)
• Distributed systems for transactional
services
Timetable (hourly grid from 8:00 to 17:00; each entry is listed at the slot where it starts):
Time | Mon | Tue
8:00 | | 3-comm
10:00 | 1-Intro | 4-serv
12:00 | LUNCH
14:00 | 2-proc | Scenario 2
16:00 | Scenario 1 |
María S. Pérez
Fernando Pérez
José María Peña
Bibliography
• Distributed Systems: Concepts and Design
G. Coulouris, J. Dollimore, T. Kindberg; Addison-Wesley, 2001
• Distributed Systems: Principles and Paradigms
A. S. Tanenbaum, M. Van Steen; Prentice-Hall, 2007
• Distributed Operating Systems: Concepts & Practice
D. L. Galli; Prentice-Hall, 2000
• Distributed Operating Systems & Algorithms
R. Chow, T. Johnson; Addison-Wesley, 1997
• Distributed Computing: Principles and Applications
M.L. Liu; Addison-Wesley, 2004
Distributed (Operating) Systems
Introduction and
Concepts
Distributed System (DS)
• Hardware: Network-connected processors without shared
physical memory:
– Loosely-coupled system
– No common clock
– Processor-dependent I/O systems
– Independent failures of system components
– Heterogeneous system
• Goal of this seminar: Distributed System Software
– Distributed Operating Systems (classical view)
– Software interface that hides distributed system complexity:
• Single System Image
Advantages and Drawbacks
• Advantages:
– Cost/performance ratio
– Parallel processing: high performance
– Fault tolerance: high availability
– Scalable, open and heterogeneous
– Most appropriate for originally distributed applications
– E.g., geographically distributed enterprise
• Drawbacks:
– More complex software development
– Network connection problems: latency, bandwidth and availability
– Security
New Paradigms for DS
• Cluster Computing:
– Dedicated systems:
• High performance.
• High availability.
– Homogeneous system:
• Nodes.
• LAN (generalist or specific).
– Open issues: Coupling degree, distributed services.
• Grid Computing:
– Resource sharing and idle processor usage.
– Restricted to some specific tasks.
– Different scopes:
• Inter-departmental grids.
• Inter-organization grids.
– Open issues: Coordination, security and dynamic changes.
Operating System Support
1. OS for Distributed Systems:
• Requirements
• Characteristics
2. Distributed Systems
3. Parallel/Distributed OS:
• Operating Systems Parallelisation
• Distributed System Services
• Microkernels
Distributed Architectures
A distributed system is a collection of independent computers
presented to the user as a single computer.
Distributed Computer Architectures:
– Flynn’72: SISD, SIMD, MISD, MIMD
– Johnson’88: UMA, NUMA, NORMA
Distributed System Application
• Internet Services: e-mail, news, web, ...
• Corporate networks or intranets.
• Parallel processing:
– Massive processing (+efficiency).
– Distributed topology (distributed-nature problems)
• Distributed massive data management.
• High performance multimedia.
• Industrial and control systems.
• Real-time systems.
<and many others...>
Distributed System Profile
Distributed systems have:
1. No common clock: Message and co-ordination aspects.
2. Global concurrency: Real parallel execution.
3. Independent failures: Partial failures.
Distributed system usage:
1. Collaborative processing: combined features and services.
2. Parallel processing: massive or high-performance calculation.
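Without a common clock, nodes can still agree on an order of events by stamping messages with logical counters. The sketch below shows one well-known approach, Lamport logical clocks; it is an illustration only, and the class and method names are hypothetical rather than taken from the slides.

```python
class LamportClock:
    """Logical clock: orders events without any shared physical clock."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the counter.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Jump past the sender's timestamp so that the receive event
        # is always ordered after the matching send event.
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                         # local event on node a
stamp = a.send()                 # message leaves a carrying its timestamp
b.tick()                         # unrelated local event on node b
received_at = b.receive(stamp)   # b's clock moves past the sender's stamp
print(stamp, received_at)
```

The only guarantee is that a receive is stamped later than its send; events that never exchange messages may get timestamps in any relative order.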
System Requirements
Collaborative systems
– Openness
– Scalability
– Reliability
– Transparency
– Security
Parallel systems
– Performance
– Scalability
– Reliability
– Transparency
– Security
Common characteristics but different hardware
platforms and applications.
All of them DISTRIBUTED
Operating System Distribution
• Operating systems for multiprocessors with shared memory
(SMP):
– Software tightly coupled
– Hardware tightly coupled
• Distributed operating systems (DOS):
– Software tightly coupled
– Hardware loosely coupled
• Network operating system:
– Software loosely coupled
– Hardware loosely coupled
Operating Systems for SMPs
Architectures with multiple processors (2 to 8) and uniform-access
shared memory (SMP: Symmetric Multiprocessors)
Characteristics:
– “Small” variations of the traditional OS versions.
– There is only one copy of the OS.
– Concurrency with real parallelism (≠ time sharing).
– Commercial versions (Linux, WinNT, Solaris, AIX, ...).
– Different problems: kernel code running on multiple processors
(concurrent system calls), synchronisation mechanisms (spin-locks),
optimisation and scheduling (processor affinity), ...
Distributed Operating Systems (DOS)
A distributed operating system manages a group of processors
interconnected by a communication network and hides their
complexity, presenting a “virtual uniprocessor” to the user.
Characteristics:
– It runs on a distributed system, making it appear as a
centralised system.
– Transparency: it must hide the complexity of distribution.
– It is easier said than done.
– Experimental systems reach this goal only partially.
– Failures make the users complain.
Distributed Operating Systems (DOS)
Problems:
– Each node has a copy of the OS: Which tasks are performed locally
and which globally?
– How is mutual exclusion achieved without shared memory?
– How are deadlocks detected without a global state?
– Process scheduling: Each operating system copy has its own task
queue (process migration).
– How is a single directory tree defined?
– Problems due to the lack of a common clock, partial failures and
heterogeneity.
Main result:
– New concepts have been developed and they are useful for other
domains.
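To make the deadlock question concrete: each node only sees which of its local processes is waiting for which other process, so a cycle that spans nodes is invisible locally. A minimal sketch follows, assuming a hypothetical coordinator that collects the per-node wait-for edges and searches the merged graph for a cycle. The function name, the process names and the three-node scenario are all invented for illustration.

```python
def has_cycle(edges):
    """Return True if the wait-for graph, given as (waiter, holder) pairs, has a cycle."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, []).append(holder)
        graph.setdefault(holder, [])

    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    colour = {p: WHITE for p in graph}

    def visit(p):
        colour[p] = GREY
        for q in graph[p]:
            if colour[q] == GREY:          # back edge: q is on the current path
                return True
            if colour[q] == WHITE and visit(q):
                return True
        colour[p] = BLACK
        return False

    return any(colour[p] == WHITE and visit(p) for p in graph)


# Each node reports only the waits it can observe locally.
node_a = [("P1", "P2")]   # P1 waits for a resource held by P2
node_b = [("P2", "P3")]
node_c = [("P3", "P1")]

local_deadlocks = [has_cycle(n) for n in (node_a, node_b, node_c)]
global_deadlock = has_cycle(node_a + node_b + node_c)
print(local_deadlocks, global_deadlock)
```

In a real system the edges are collected at slightly different moments, so the merged graph may not correspond to any state the system was actually in. That is exactly the "no global state" difficulty named above, and this sketch does not address it.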
DOS Evolution
• First network operating systems:
– New network services in a conventional OS
– E.g.: UNIX 4BSD (≈1980)
• New network functionalities:
– Sun’s ONC (≈1985): includes NFS, RPC, NIS
• First DOS:
– New OS based on conventional (monolithic) versions.
– E.g.: Sprite, University of California, Berkeley (≈1988)
• DOS based on μ-kernel. E.g.:
– Mach, CMU (≈1986)
– Amoeba, designed by Tanenbaum (≈1984)
– Chorus, INRIA, France (≈1988)
Network Operating Systems
Network of loosely coupled computers that share resources, with
no external control over the hardware/software of each node.
Characteristics:
– No virtual uniprocessor view is presented (independent nodes).
– Each node runs its own copy of an OS (possibly a different one).
– Conventional OS + network utilities.
– Communication protocols for resource sharing and high-level service
access.
– From rcp/rlogin to Sun’s Open Network Computing (ONC).
Cooperative Systems
High-level, service-oriented software systems that require
communication mechanisms to build upper-level services.
Characteristics:
– A degree of transparency is provided but the single-system view is
not presented. Autonomous independent systems.
– They are built on middleware (CORBA, DCE, COM+, ...)
– These systems are designed as a combination of multiple services
offered by different network elements.
Middleware
Middleware:
– Software layer over the operating system that provides standard
distributed services.
– Open systems independent of the vendor.
– Hardware and OS independent.
Examples:
– DCE (Open Group).
– CORBA (OMG).
– ...
[Figure: a single middleware layer spanning three nodes, each with its own OS on its own hardware.]
Single System Image (SSI)
The illusion, created by hardware/software, that presents a
collection of resources as one.
– Hardware SSI: DEC Memory Channel or SMPs
– Operating System: DOS or Gluing layer
– Application and Services: Middleware (many levels).
Every SSI has a boundary.
Why is SSI useful?
• It is easy to program/use:
– Traditional programming, known interfaces.
– Low-level issues hidden.
• Allows centralized and distributed management depending on
task requirement.
• (Potentially) provides:
– Fault tolerance.
– Scalability.
– Modular improvement.
Operating System Layers
A simplified view of an operating system has the following
layers:
• Hardware.
• Kernel.
• System services.
• Application programs.
• Users.
[Figure: the same layers drawn as a stack, with Users at the top and Hardware at the bottom.]
Kernel Responsibilities
Monolithic Kernels:
Many OS functionalities inside the kernel:
scheduler, memory manager, drivers, file systems...
[Figure: Services on top of the Kernel, on top of the Computer.]
μ-Kernels:
Many OS tasks are performed outside the kernel.
Remaining: (i) process communication, (ii) memory
management, (iii) low-level management and
scheduling and (iv) low-level I/O
[Figure: Services on top of the μ-Kernel, on top of the Computer.]
Distributed Services:
Distributed system structure. Depending on
the level: distributed operating systems,
network operating systems or cooperative systems.
[Figure: Services spanning three μ-Kernels.]
Operating System on Distributed Systems
System | MPPs | SMPs | Clusters | Distributed
Size | 100s – 1000s | 10s | 100s or less | 10s – 1000s
OS | N x kernels | Single OS kernel | N x OS platforms | N x OS platforms
OS type | Specific purpose | Special variants of standard OSs | Standard OS plus tools (not always) | Standard OS and special tools
Communic. | Message / DSM | Shared Memory | Message passing (e.g.: MPI) | Message passing or middleware
Scheduling | Single queue | Single queue | Multiple queues coordinated | Independent queues
Single System Image (SSI)
Tools for Distributed/Cluster Systems
• Operating system:
– Modular/Layered Monolithic
– Based on μ-Kernels
• Runtime systems:
– Parallel file systems or I/O libraries
– Distributed shared memory software
• Resource management:
– Process scheduling tools
– Load balancing
• Applications:
– Management and administration tools.
– Processing tasks and jobs
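As an illustration of the load-balancing item under resource management, the sketch below places each job on the node with the least accumulated load. It is one simple greedy heuristic, not the method of any particular tool; the function name, the job costs and the node names are hypothetical, and job costs are assumed to be known in advance.

```python
import heapq


def assign_jobs(job_costs, node_names):
    """Greedy placement: each job goes to the node with the least accumulated load."""
    heap = [(0.0, name) for name in node_names]   # (current load, node)
    heapq.heapify(heap)
    placement = {name: [] for name in node_names}
    # Placing the most expensive jobs first usually gives a more even result.
    for job, cost in sorted(job_costs.items(), key=lambda kv: -kv[1]):
        load, name = heapq.heappop(heap)          # least-loaded node so far
        placement[name].append(job)
        heapq.heappush(heap, (load + cost, name))
    loads = {name: sum(job_costs[j] for j in jobs)
             for name, jobs in placement.items()}
    return placement, loads


jobs = {"j1": 8, "j2": 7, "j3": 6, "j4": 5, "j5": 4}
placement, loads = assign_jobs(jobs, ["node0", "node1"])
print(placement, loads)
```

The heuristic is cheap but not optimal: for these costs a perfectly even split exists, and the greedy pass does not find it. Real schedulers also have to cope with costs that are unknown or change at run time.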
Distributed Operating Systems
Hardware and
Software Overview
Concept of Cluster
• Alternative to traditional supercomputing facilities.
• Instead of traditional systems:
– Specific hardware.
– High cost.
– Slow hardware development.
– Painful software development.
• The use of general-purpose systems provides:
– Commodity hardware (commercial off-the-shelf: COTS).
– Moderate cost.
– Fast hardware development.
– Even more painful software development.
Concept of Cluster
Cluster: Hardware system based on commodity hardware
connected by a dedicated (high-performance) network.
– Nodes: PCs or workstations (SMPs).
– Network: From high-speed networks to specific hardware.
Mysterious acronyms:
– PoPCs: Pile of PCs
– COWs: Clusters of workstations
– CLUMPS: Clusters of multiprocessors
– NOWs: Networks of workstations
– ....
Hardware Characteristics
• Nodes:
– Processor: Intel Pentium, AMD Athlon, Compaq Alpha, IBM
PowerPC, Sun SuperSPARC (3-4... GHz)
– Memory: SDRAM, DDR or similar (2-8 GB)
– Storage: SCSI or RAID
• Network:
– Key element.
– It can account for 50% or more of the system cost.
– Cheap alternative: Ethernet (100-1000 Mb/s)
Cluster Networks (I)
• General purpose network technologies:
– Improvement in network bandwidth.
– Only small improvements in latency
→ Not well suited
• Low-latency protocols:
– Active Messages (Berkeley): “Zero-copy” synchronous model. GAM.
– Fast Messages (Illinois): Reliable, in-order Active Messages.
– VMMC (Princeton): Distributed shared memory pages (DSM).
– U-Net (Cornell): Virtual interfaces for memory pages.
– BIP (ENS Lyon): Low-latency basic interface.
Cluster Networks (II)
• Cluster communication standards:
– VIA: Hardware interface (native/emulated) for communications. Maps
physical memory regions and virtual network interfaces. MPI versions
over VIA.
– InfiniBand: I/O hardware standard (2.5 Gbps) using one-way
connections. 6 communication models. Uses RDMA and IPv6.
• Network hardware:
– Ethernet, Fast Ethernet, Gigabit Ethernet: Cheap but limited. Collision
problems. VIA emulations.
– Giganet (cLAN): Implementation over VIA (1.26 Gbps)
– Myrinet: Low-latency programmable networks. Cut-through routing
and failure detection. GM protocol.
– Others: QsNet, ServerNet, SCI, ATM, Fibre Channel, HIPPI, ATOLL,...
Technology Comparison
Technology | Gigabit Ethernet | Giganet | Myrinet | QsNet | SCI | ServerNet2
MPI bandwidth, stable (MB/s) | 35-50 | 105 | 140 | 208 | 80 | 65
MPI latency (μs) | 100-200 | 20-40 | ~18 | 5 | 6 | 20.2
Maximum number of nodes | 1000’s | 1000’s | 1000’s | 1000’s | 1000’s | 64k
VIA support | Win/Linux | Win/Linux | Over GM | None | Software | Hardware
MPI support type | MPICH over MVIA or TCP | Third parties | Third parties | Quadrics or Compaq | Third parties | Compaq or third parties
© Amy Apon / Mark Baker 2000
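To see why both rows matter, the latency and bandwidth figures from the table can be combined in a simple linear cost model, time = latency + size / bandwidth. This model is an assumption made here for illustration, not something the slide states. Where the table gives a range, the sketch uses the midpoint, and the function and dictionary names are hypothetical.

```python
# Values come from the comparison table; where the table gives a range,
# the midpoint is used (an assumption made only for this illustration).
NETWORKS = {
    # name: (MPI latency in microseconds, MPI bandwidth in MB/s)
    "Gigabit Ethernet": (150.0, 42.5),
    "Giganet": (30.0, 105.0),
    "Myrinet": (18.0, 140.0),
    "QsNet": (5.0, 208.0),
    "SCI": (6.0, 80.0),
    "ServerNet2": (20.2, 65.0),
}


def transfer_time_us(size_bytes, latency_us, bandwidth_mb_s):
    """Linear cost model: fixed latency plus size divided by bandwidth."""
    # 1 MB/s is taken as 10**6 bytes/s, i.e. exactly 1 byte per microsecond,
    # so bytes divided by MB/s is already a time in microseconds.
    return latency_us + size_bytes / bandwidth_mb_s


for name, (lat, bw) in NETWORKS.items():
    small = transfer_time_us(64, lat, bw)
    large = transfer_time_us(1_000_000, lat, bw)
    print(f"{name:17s} 64 B: {small:8.1f} us   1 MB: {large:9.1f} us")
```

Under this model short messages are dominated by latency and long ones by bandwidth, so the ranking of two networks can flip with message size: SCI beats Myrinet for a 64-byte message but loses for a 1 MB transfer.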
Software Development (I)
• Operating Systems:
– Linux:
• Free, cheap, fast and rapidly developed.
• e.g., Beowulf
– Solaris:
• Good parallelism support and good network services.
• e.g., Solaris MC
– AIX:
• Powerful and well-optimized software development tools.
• e.g., SP2
– Windows:
• Why not?
• e.g., Wolfpack
Software Development (II)
• Middleware and SSI:
– SSI (Single System Image): The whole cluster is presented as a
single uniprocessor machine.
– Layered development:
• Hardware (Local).
• Operating system (μkernel) or gluing level: GLUnix or MOSIX
• Application, services and middleware: CODINE
– Common services (desirable):
• Single access point.
• Single file hierarchy.
• Single management point.
• Single network connection.
• Single work-management service.
• Single user interface.
• Single I/O space.
• Single process space.
• Checkpointing.
• Process migration.
Software Development (III)
• Programming tools:
– Thread support: Pthreads or OpenMP
– Message passing in clusters:
• MPI: MPICH or LAM/MPI.
• PVM: Worse performance but more features.
– DSM: Distributed shared memory:
• Software: TreadMarks, Linda or Nanos
• Hardware: DASH or Merlin
– Parallel debuggers
– Instrumentation tools.
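The message-passing style listed above can be sketched without a cluster: workers share no data and cooperate only by sending and receiving messages. The example below is an in-process stand-in that uses Python threads and queues in place of ranks and a network; it does not use MPI or PVM, and the names `worker` and `parallel_sum` are hypothetical.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    # Receive a chunk of work, compute a partial result, send it back.
    chunk = inbox.get()
    outbox.put((rank, sum(chunk)))


def parallel_sum(data, n_workers):
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(r, inboxes[r], results))
               for r in range(n_workers)]
    for t in threads:
        t.start()
    # "Scatter": send each worker its slice of the data as a message.
    for r in range(n_workers):
        inboxes[r].put(data[r::n_workers])
    # "Gather": collect exactly one partial sum per worker.
    partials = dict(results.get() for _ in range(n_workers))
    for t in threads:
        t.join()
    return sum(partials.values())


total = parallel_sum(list(range(100)), 4)
print(total)
```

On a real cluster the queues would be replaced by the send, receive and collective operations of an MPI implementation such as MPICH, with one process per node instead of one thread per worker.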
Software Development (IV)
• Administration tools:
– Remote management:
• Administrative commands: install software, copy files.
• Process-level resource management.
• User list and other system information: NIS.
• e.g., SP2 tools, Cluster Command & Control (C3)
– Scheduling systems:
• Work queues and workload management
• Resource supervision.
• e.g., CODINE, Condor, PBS (Portable Batch System)
Input/Output System
• I/O Crisis:
– Exponential growth of CPU power (Moore’s law).
– I/O systems grow much more slowly.
– The I/O phase is the current bottleneck of high-performance systems.
• Solution based on I/O parallelism:
– Parallel I/O systems: MPI-IO
– Parallel filesystems: ParFiSys, GPFS
– Intelligent I/O: Armada, Panda
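One common way to obtain I/O parallelism is to stripe a file round-robin across several I/O servers, so that a large request touches many disks at once. The sketch below shows only the address arithmetic, under that striping assumption. It does not describe how ParFiSys or GPFS actually lay out data, and the function names and parameters are hypothetical.

```python
def locate(offset, stripe_size, n_servers):
    """Map a byte offset in a striped file to (server index, offset on that server)."""
    stripe = offset // stripe_size        # index of the stripe unit in the file
    server = stripe % n_servers           # stripe units are dealt out round-robin
    local_stripe = stripe // n_servers    # units already stored on that server
    return server, local_stripe * stripe_size + offset % stripe_size


def split_request(offset, length, stripe_size, n_servers):
    """Break one contiguous request into per-server pieces that can proceed in parallel."""
    pieces = []
    end = offset + length
    while offset < end:
        server, local = locate(offset, stripe_size, n_servers)
        # Stop at the end of the current stripe unit or of the request.
        chunk = min(stripe_size - offset % stripe_size, end - offset)
        pieces.append((server, local, chunk))
        offset += chunk
    return pieces


pieces = split_request(offset=100, length=200, stripe_size=64, n_servers=4)
for server, local, chunk in pieces:
    print(f"server {server}: {chunk} bytes at local offset {local}")
```

Here a 200-byte request starting at offset 100 is split into four pieces, one per server, which is what lets the servers work on it concurrently.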