LAGO DISTRIBUTED NETWORK OF DATA REPOSITORIES
H. Asorey1,2, A. Martínez-Méndez*3,4, L.A. Núñez3,5 and A. Valbuena-Delgado3
for the LAGO Collaboration6
1 Laboratorio Detección de Partículas y Radiación, Centro Atómico Bariloche & Instituto Balseiro, Bariloche, Argentina
2 Sede Andina, Universidad Nacional de Río Negro, S.C. de Bariloche, Argentina.
3 Escuela de Física, Universidad Industrial de Santander, Bucaramanga, Colombia.
4 Escuela de Ingeniería de Sistemas, Universidad Industrial de Santander, Bucaramanga, Colombia.
5 Departamento de Física, Universidad de Los Andes, Mérida, Venezuela.
6 http://lagoproject.org, see the full list of members and institutions at http://lagoproject.org/collab.html
∗ Speaker
October 06, 2016
H. Asorey, A. Martínez-Méndez*, L.A. Núñez and A. Valbuena-Delgado for the LAGO Collaboration
LAGO Distributed Network Of Data Repositories
Universidad Industrial de Santander
October 06, 2016
Summary
1. The LAGO (Latin American Giant Observatory) Collaboration
2. LAGO-Virtual
   The DART Challenge
   Data curation
   LAGOData metadata
   PID and LAGOData
   OAI-PMH Protocol
   lagoproject.uis.edu.co
3. DSpace Extensions
The LAGO Collaboration
Research field
  high energy phenomena
  space weather
  atmospheric radiation at ground level
Collaboration
  More than eighty Ibero-American researchers
  More than thirty institutions
  10 countries
The Latin American Astroparticle Network
LAGO Data
Two types of data: measured and simulated
Measured data has 4 quality levels: raw data, preliminary, data quality and high quality
Massive data production:
  raw (1.5 TB/year/detector)
  sims (3 TB/year/site)
LAGO data challenge: the DART (Data Accessibility, Reproducibility and Trustworthiness) initiative
Deploying the LAGO-CORSIKA implementation on GRID
1. Asorey, H. et al., "LAGO: the Latin American Giant Observatory", 2015
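The production rates quoted on this slide allow a quick estimate of yearly storage needs. The sketch below is only an illustration: the function name and the detector and site counts are assumptions, since the slides do not state how many detectors or simulation sites are active.

```python
RAW_TB_PER_DETECTOR_YEAR = 1.5  # raw data rate quoted on the slide
SIM_TB_PER_SITE_YEAR = 3.0      # simulation rate quoted on the slide


def yearly_volume_tb(n_detectors: int, n_sites: int) -> float:
    """Estimate the total data volume (in TB) produced in one year."""
    if n_detectors < 0 or n_sites < 0:
        raise ValueError("counts must be non-negative")
    raw = n_detectors * RAW_TB_PER_DETECTOR_YEAR
    sims = n_sites * SIM_TB_PER_SITE_YEAR
    return raw + sims


# Hypothetical network size, chosen only for illustration.
print(yearly_volume_tb(n_detectors=10, n_sites=10))  # 45.0
```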
LAGO Astroparticle Network
2. Asorey, H. et al., "Data accessibility, reproducibility and trustworthiness with LAGO data repository",
Proceedings of the 34th International Cosmic Ray Conference, PoS(ICRC2015)672, 2015
The DART Challenge
DART Challenge
The DART (Data Accessibility, Reproducibility and Trustworthiness) challenge was
launched by CHAIN-REDS (Coordination and Harmonisation of Advanced
e-infrastructure for Research and Education Data Sharing) to allow researchers to
access and efficiently use worldwide distributed resources (e.g., computing, storage,
data, services, tools, applications).
Figure: The DART challenge. (Barbera, R. et al., "CHAIN-REDS DART Challenge")
The DART Challenge
Accessibility
Figure: Asorey, H. et al., "Data accessibility, reproducibility and trustworthiness with LAGO data
repository", Proceedings of the 34th International Cosmic Ray Conference, PoS(ICRC2015)672, 2015
The DART Challenge
Reproducibility
Figure: Asorey, H. et al., "Data accessibility, reproducibility and trustworthiness with LAGO data
repository", Proceedings of the 34th International Cosmic Ray Conference, PoS(ICRC2015)672, 2015
Data curation
Data curation through DSpace
DSpace
  Open source software
  Handles many types of content
  Dublin Core metadata schema
  Supports OAI-PMH, a protocol for metadata harvesting
  Generally used for institutional repositories (libraries)
Figure: Usage of Open Access Repository Software, Worldwide (www.opendoar.org)
Data curation
LAGOData metadata
"Metadata" is data about data. Metadata articulates a context for objects of interest –
"resources" such as MP3 files, library books, or satellite images – in the form of
"resource descriptions" [http://dublincore.org]
Dublin Core Base Schema
  Author
  Date (Available, Accessioned, Issued)
  Identifier
  Type
  Title
LAGO Schema
  Data Version; Channel Trigger Level; PMT Voltage; GPS; Item Site (Latitude,
  Longitude, Altitude...)
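To make the two schemas concrete, here is a minimal sketch of a record that carries a subset of the Dublin Core base fields and LAGO-specific fields listed above. The class name, the key names (for example `lago.pmt_voltage`), the units and all sample values are assumptions for illustration; the slides give only the field labels, not how LAGOData names or stores them.

```python
from dataclasses import asdict, dataclass


@dataclass
class LagoRecord:
    # Dublin Core base fields named on the slide
    author: str
    date_issued: str
    identifier: str
    type: str
    title: str
    # LAGO schema fields named on the slide (key names and units are assumptions)
    data_version: str
    channel_trigger_level: int
    pmt_voltage: float
    site_latitude: float
    site_longitude: float
    site_altitude: float


DC_FIELDS = {"author", "date_issued", "identifier", "type", "title"}


def to_flat_metadata(record: LagoRecord) -> dict:
    """Flatten a record into 'schema.field' keys, one schema prefix per field."""
    flat = {}
    for name, value in asdict(record).items():
        schema = "dc" if name in DC_FIELDS else "lago"
        flat[f"{schema}.{name}"] = value
    return flat


# Hypothetical item; every value below is made up for illustration.
example = LagoRecord(
    author="LAGO Collaboration",
    date_issued="2016-10-06",
    identifier="11456/LAGO-104",
    type="Dataset",
    title="Example raw data file",
    data_version="1",
    channel_trigger_level=100,
    pmt_voltage=1500.0,
    site_latitude=7.14,
    site_longitude=-73.12,
    site_altitude=956.0,
)
for key, value in to_flat_metadata(example).items():
    print(key, "=", value)
```

Keeping the base and domain fields under separate prefixes mirrors the split on the slide: generic harvesters can read the `dc` part, while the `lago` part carries the detector context.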
Data curation
PID and LAGOData
To mint, provide and resolve persistent identifiers (PIDs), LAGOData uses the GRNET
Handle service. One of its advantages is the ability to cover a wide range of digital
objects.
PID Syntax
<PREFIX>/<SUFFIX> (e.g. http://hdl.grnet.gr:8002/11456/LAGO-104)
Figure: PID Service by GRNET (image from the CLARIN project, http://www.clarin.eu)
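The PREFIX/SUFFIX syntax can be illustrated by splitting the example handle from this slide into its two parts. This is a small sketch using only the Python standard library; the helper name `split_handle` is hypothetical and not part of any GRNET or Handle client.

```python
from urllib.parse import urlparse


def split_handle(url: str) -> tuple[str, str]:
    """Return (prefix, suffix) from a Handle URL of the form .../<PREFIX>/<SUFFIX>."""
    path = urlparse(url).path.strip("/")
    # Split on the first slash only, so a suffix containing slashes stays whole.
    prefix, sep, suffix = path.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a <PREFIX>/<SUFFIX> handle: {url!r}")
    return prefix, suffix


print(split_handle("http://hdl.grnet.gr:8002/11456/LAGO-104"))
# ('11456', 'LAGO-104')
```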
Data curation
OAI-PMH Protocol
The OAI Protocol for Metadata Harvesting (OAI-PMH) defines a mechanism for
harvesting records containing metadata from repositories.
Data Provider
  A Data Provider maintains one or more repositories (web servers) that support the
  OAI-PMH as a means of exposing metadata.
Service Provider
  A Service Provider issues OAI-PMH requests to data providers and uses the
  metadata as a basis for building value-added services.
Figure: Basic functioning of OAI-PMH (image from the CLARIN project, http://www.clarin.eu)
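The exchange between the two roles can be sketched from the service provider side: build a `ListRecords` request and read the Dublin Core titles out of the response. The base URL and the sample response below are assumptions made up for illustration (the actual OAI endpoint of the LAGO repository is not given in the slides), and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard OAI-PMH 2.0 and Dublin Core element namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical data provider endpoint.
BASE_URL = "http://repository.example.org/oai/request"

# Hand-written sample of what a data provider might return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example LAGO data item</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build the ListRecords request a service provider would issue."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(response_xml: str) -> list[str]:
    """Collect the first dc:title of every record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    titles = []
    for record in root.iterfind(".//oai:record", NS):
        title = record.find(".//dc:title", NS)
        if title is not None:
            titles.append(title.text)
    return titles


print(list_records_url(BASE_URL))
print(extract_titles(SAMPLE_RESPONSE))  # ['Example LAGO data item']
```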
Data curation
lagoproject.uis.edu.co
DSpace Extensions
To extend the functionality of the software and meet specific needs, the following
features are in production:
  Uploads: a Django module to upload multiple records to the server, plus a backend
  script to ingest the records into the repository
  Downloads: a modified JSP interface to allow multiple downloads
  Standardization: a VM (Virtual Machine) with a default installation
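The slides do not say how the backend script hands records to DSpace. One common route for batch ingestion is DSpace's Simple Archive Format, in which each item is a directory holding a `dublin_core.xml` file, a `contents` manifest and the data files. The sketch below writes such a directory under that assumption; the function name, file names and metadata values are hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_saf_item(item_dir: Path, metadata: dict, files: dict) -> None:
    """Write one item directory: dublin_core.xml, data files and a contents manifest.

    metadata maps (element, qualifier) to a value; files maps a file name to bytes.
    """
    item_dir.mkdir(parents=True, exist_ok=True)

    # One <dcvalue> per metadata field.
    root = ET.Element("dublin_core")
    for (element, qualifier), value in metadata.items():
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # The data files, then a manifest listing one file name per line.
    for name, payload in files.items():
        (item_dir / name).write_bytes(payload)
    (item_dir / "contents").write_text("\n".join(files) + "\n", encoding="utf-8")


# Build a two-item batch in a temporary directory (all values are placeholders).
with tempfile.TemporaryDirectory() as tmp:
    batch = Path(tmp)
    for i in range(2):
        write_saf_item(
            batch / f"item_{i:03d}",
            {("title", "none"): f"Hypothetical LAGO run {i}"},
            {f"run_{i}.dat": b"placeholder"},
        )
    print(sorted(p.name for p in batch.iterdir()))  # ['item_000', 'item_001']
```

A batch directory laid out this way is the kind of input DSpace's command-line batch importer works with; whether the LAGO upload module produces this layout is not stated in the talk.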
Remarks
High volumes of data
Access
Authorship
Potentiation
Thanks...