RES Node Policies - BSC-CNS
Performance and Monitoring
Red Española de Supercomputación
Operations Department
Barcelona Supercomputing Center
Foreword
All information contained in this document refers to BSC's and RES's internal procedures, scripts, and developments. This information is confidential and must not be published or distributed.
Index
● Introduction
● RES Node Architecture
● RES Node Policies
● Monitoring
Introduction
● Resource Manager
  ● Handles any allocatable resource (check, start application, stop application, ...)
● Scheduler
  ● Decides which job to run at every moment, based on the defined priorities and policies
● IBM's LoadLeveler was our de facto Resource Manager + Scheduler solution
● Since June 2007, the MareNostrum production tools are:
  ● Slurm as Resource Manager (open source)
  ● Moab as Scheduler (from Cluster Resources)
RES Node Architecture
SYSTEM ARCHITECTURE
[Diagram: head node (cluster management), login nodes (users' job control commands), blade centers and servers, all connected to GPFS]
RES Node Architecture
COMPONENTS DEPLOYED
[Diagram: Moab and slurmctld run on the head node (cluster management); each blade-center server and each login node runs a local slurmd daemon; users' job control commands are issued from the login nodes; all nodes share GPFS]
RES Node Policies
INTRODUCTION
● MareNostrum's CPU time is divided and prioritized to guarantee access for:
  ● Projects assigned by the Access Committee (70%)
  ● The site's own projects (20%)
  ● Other (10%)
● The scheduling policies must guarantee this distribution of consumption at the end of every period and every year
RES Node Policies
ACCESS COMMITTEE
● For every project, the Scientific Committee provides:
  ● Number of hours (in thousands)
  ● Class of hours:
    ● A - maximum priority; should run before everything else
    ● B - runs when there are no A jobs, or to fill the gaps
● To accomplish this, BSC:
  ● Defines an internal 'Class C':
    ● for users who have used up all their A and/or B time
    ● such jobs run only if there are no suitable A or B jobs in the queue
  ● Establishes manual priority-management rules:
    ● "An 'A+B' project that uses up its A time is moved to B"
    ● "An A-only or B-only project that uses up all its time is moved to C"
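The two manual priority-management rules above can be sketched as a small function. The `Project` structure and the `demote` helper are illustrative assumptions, not BSC code; only the rules themselves come from the slides.

```python
# Sketch of the manual priority-management rules described above.
# The Project structure and class names are illustrative, not BSC code.

from dataclasses import dataclass

@dataclass
class Project:
    classes: set        # granted classes, e.g. {"A", "B"}
    hours_left: dict    # remaining thousands of hours per class

def demote(p: Project) -> set:
    """Return the classes the project may run in after applying the rules."""
    active = set(p.classes)
    # Rule 1: an 'A+B' project that uses up its A time is moved to B.
    if {"A", "B"} <= active and p.hours_left.get("A", 0) <= 0:
        active.discard("A")
    # Rule 2: an A-only or B-only project that uses up all its time
    # is moved to the internal Class C.
    if len(active) == 1 and all(p.hours_left.get(c, 0) <= 0 for c in active):
        active = {"C"}
    return active
```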
RES Node Policies
JOB PRIORITY MODEL
● Job priority is evaluated from weighted components:
  CREDENTIAL + FAIR-SHARE + SERVICE
RES Node Policies
CREDENTIALS - JOB PRIORITY MODEL
● Job priority is evaluated from weighted components:
  CREDENTIAL + FAIR-SHARE + SERVICE

  CREDWEIGHT   1
  QOSWEIGHT    1000
  GROUPWEIGHT  10
  USERWEIGHT   1

● This sets priority depending on the group, the user, and the Quality of Service (QOS)
RES Node Policies
FAIR-SHARE - JOB PRIORITY MODEL
● Job priority is evaluated from weighted components:
  CREDENTIAL + FAIR-SHARE + SERVICE

  FSWEIGHT              100
  FSUSERWEIGHT          1
  FSGROUPWEIGHT         10
  FSINTERVAL            07:00:00:00
  FSDEPTH               16
  FSDECAY               0.95
  FSPOLICY              DEDICATEDPES
  FSTREEISPROPORTIONAL  TRUE
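With the parameters above, Moab retains 16 seven-day fair-share windows (FSDEPTH, FSINTERVAL) and decays older usage by 0.95 per window (FSDECAY), with usage measured in dedicated processor-seconds (FSPOLICY DEDICATEDPES). A minimal sketch of that decay-weighting, with an invented `effective_usage` helper:

```python
# Sketch of decay-weighted fair-share usage over the retained windows.
# usage[0] is the current 7-day window, usage[1] the previous one, etc.,
# each measured in dedicated processor-seconds (FSPOLICY DEDICATEDPES).

FSDEPTH = 16
FSDECAY = 0.95

def effective_usage(usage):
    """Sum the retained windows, decaying each older window by FSDECAY."""
    windows = usage[:FSDEPTH]
    return sum(FSDECAY ** i * u for i, u in enumerate(windows))
```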
RES Node Policies
FAIR-SHARE TREE - COMMITTEE BRANCH

  Root
  ├── projects (70)
  │   ├── class_a (1000)
  │   ├── class_b (100)
  │   └── class_c (2)
  ├── bsc (20)
  └── other (10)

● Initial group share == number of thousands of hours granted by the Access Committee
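With FSTREEISPROPORTIONAL set, each node's effective fair-share target is its share relative to its siblings, scaled by its parent's target. A sketch using the tree above (the `targets` helper and the two-level dict encoding are illustrative assumptions):

```python
# Sketch of proportional fair-share targets for the tree above.
# Each entry maps a branch to (share, children); shares are the numbers
# shown in the tree (projects/bsc/other follow the 70/20/10 split).

tree = {
    "projects": (70, {"class_a": 1000, "class_b": 100, "class_c": 2}),
    "bsc":      (20, {}),
    "other":    (10, {}),
}

def targets(tree):
    """Return each node's effective target as a fraction of the machine."""
    out = {}
    total = sum(share for share, _ in tree.values())
    for name, (share, children) in tree.items():
        out[name] = share / total
        child_total = sum(children.values()) or 1
        for cname, cshare in children.items():
            # A child's target is proportional to its share among siblings.
            out[cname] = out[name] * cshare / child_total
    return out
```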
RES Node Policies
SERVICE - JOB PRIORITY MODEL
● Job priority is evaluated from weighted components:
  CREDENTIAL + FAIR-SHARE + SERVICE

  SERVICEWEIGHT    1
  QUEUETIMEWEIGHT  100

● This sets priority depending on the time the job has spent in the queue
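Putting the three component slides together, the overall model can be sketched as a weighted sum. The weights are the ones shown on the credential, fair-share, and service slides; the per-job inputs (QOS/group/user priorities, fair-share factors, queue time) are invented for illustration and do not reflect how Moab computes them internally.

```python
# Sketch of the full job-priority model: a component weight scales a
# weighted sum of its subcomponents, and the three components are added.

CREDWEIGHT, QOSWEIGHT, GROUPWEIGHT, USERWEIGHT = 1, 1000, 10, 1
FSWEIGHT, FSGROUPWEIGHT, FSUSERWEIGHT = 100, 10, 1
SERVICEWEIGHT, QUEUETIMEWEIGHT = 1, 100

def job_priority(qos, group, user, fs_group, fs_user, queue_hours):
    credential = CREDWEIGHT * (QOSWEIGHT * qos
                               + GROUPWEIGHT * group
                               + USERWEIGHT * user)
    fairshare = FSWEIGHT * (FSGROUPWEIGHT * fs_group
                            + FSUSERWEIGHT * fs_user)
    service = SERVICEWEIGHT * QUEUETIMEWEIGHT * queue_hours
    return credential + fairshare + service
```

With QOSWEIGHT at 1000, the QOS dominates the credential component, which matches the class-based (A/B/C) prioritization described earlier.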
Basic Needs - Monitoring
● System monitoring
  ● Diagnostics (anomaly detection)
● Application monitoring
  ● State of running jobs (performance)
  ● Accounting
● Sources
  ● Specific software (Ganglia)
  ● Queue system
  ● In-house software
● Frequency
  ● High, but not excessive
  ● Minimize interference with running jobs
  ● Start and end of each run
Centro Nacional de Supercomputación
Tools - System Monitoring
● Ganglia
  ● System monitoring
    ● CPU load
    ● Memory/swap usage
    ● Network usage
    ● ...
  ● Additional information can be sent in
    ● From scripts
  ● Components
    ● Gmond - local daemon
    ● Gmetad - remote collector
    ● Web interface
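A gmond daemon serves the cluster state as XML over TCP (port 8649 by default), which is what gmetad and query tools consume. A minimal sketch of parsing that format; the XML snippet is a hand-written example in the gmond style, not real MareNostrum output, and `metrics_by_host` is an invented helper:

```python
# Sketch: parse gmond-style XML (as served on TCP port 8649) into a
# {host: {metric: value}} dict. The sample below is illustrative only.

import xml.etree.ElementTree as ET

SAMPLE = """<GANGLIA_XML VERSION="3.1">
 <CLUSTER NAME="MareNostrum" OWNER="BSC">
  <HOST NAME="node1" REPORTED="0">
   <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
   <METRIC NAME="mem_free" VAL="1024" TYPE="uint32" UNITS="KB"/>
  </HOST>
 </CLUSTER>
</GANGLIA_XML>"""

def metrics_by_host(xml_text):
    """Extract every host's metrics from a gmond XML dump."""
    root = ET.fromstring(xml_text)
    return {host.get("NAME"): {m.get("NAME"): m.get("VAL")
                               for m in host.findall("METRIC")}
            for cluster in root.findall("CLUSTER")
            for host in cluster.findall("HOST")}
```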
Tools - System Monitoring
● Ganglia
  ● Strengths
    ● Lightweight local daemon
    ● Easy to modify (open source)
  ● Weaknesses
    ● Information is broadcast
    ● Collector is not easily scalable
  ● BSC-CNS modifications
    ● Modified Gmond (additional metrics)
    ● Automatic configuration generation
    ● Broadcast limited to the blade center
    ● Development of a scalable collector
    ● Development of query tools
Tools - Execution Environment
● Developments at BSC-CNS
  ● Prologue
    ● Verification of node state
      ● Drivers, network, file systems, hardware, ...
    ● Automatic cancellation of the job on failure
    ● Removal of the node from the queue system on failure
    ● Propagation of information to the user's initial script through environment variables
      ● Master node, list of nodes
    ● Generation of accounting information
  ● Epilogue
    ● Locating and killing leftover user processes
    ● Verification of node state and reconfiguration if needed
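The prologue's node-state verification can be sketched as below. The check list, mount points, and interface names are illustrative assumptions; the real BSC scripts also cancel the job and drain the node on failure (e.g. via scancel/scontrol), which is omitted here.

```python
# Sketch of the kind of health checks a job prologue might run before a
# job starts. Paths and interface names are illustrative assumptions.

import os

def node_health_checks(required_mounts=("/gpfs",), required_ifaces=("ib0",)):
    """Return a list of problems found; an empty list means the node is OK."""
    problems = []
    for mount in required_mounts:
        # File-system check: the path must actually be a mount point.
        if not os.path.ismount(mount):
            problems.append(f"filesystem not mounted: {mount}")
    for iface in required_ifaces:
        # Network check: the interface must be registered with the kernel.
        if not os.path.isdir(f"/sys/class/net/{iface}"):
            problems.append(f"network interface missing: {iface}")
    return problems
```

On failure the prologue would report these problems, cancel the job, and remove the node from the queue system, as described above.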
Thank you!
www.bsc.es