LAGO DISTRIBUTED NETWORK OF DATA REPOSITORIES

H. Asorey (1,2), A. Martínez-Méndez* (3,4), L.A. Núñez (3,5) and A. Valbuena-Delgado (3) for the LAGO Collaboration (6)

1 Laboratorio Detección de Partículas y Radiación, Centro Atómico Bariloche & Instituto Balseiro, Bariloche, Argentina.
2 Sede Andina, Universidad Nacional de Río Negro, S.C. de Bariloche, Argentina.
3 Escuela de Física, Universidad Industrial de Santander, Bucaramanga, Colombia.
4 Escuela de Ingeniería de Sistemas, Universidad Industrial de Santander, Bucaramanga, Colombia.
5 Departamento de Física, Universidad de Los Andes, Mérida, Venezuela.
6 http://lagoproject.org, see the full list of members and institutions at http://lagoproject.org/collab.html
* Speaker

October 06, 2016

H. Asorey, A. Martínez-Méndez*, L.A. Núñez and A. Valbuena-Delgado for the LAGO Collaboration. LAGO Distributed Network of Data Repositories. Universidad Industrial de Santander, October 06, 2016.

Summary
1 The LAGO (Latin American Giant Observatory) Collaboration
2 LAGO-Virtual
   The DART Challenge
   Data curation
      LAGOData metadata
      PID and LAGOData
      OAI-PMH Protocol
      lagoproject.uis.edu.co
3 DSpace Extensions

The LAGO Collaboration
Research field:
- High energy phenomena
- Space weather
- Atmospheric radiation at ground level
Collaboration:
- More than eighty Ibero-American researchers
- More than thirty institutions
- 10 countries
The Latin American Astroparticle Network
LAGO Data
- Two types of data: measured and simulated
- Measured data: four quality levels (raw data, preliminary, data quality and high quality)
- Massive data production: raw (1.5 TB/year/detector), simulations (3 TB/year/site)
- LAGO data challenge: the DART (Data Accessibility, Reproducibility and Trustworthiness) initiative
- Deploying the LAGO-CORSIKA implementation on the GRID [1]

[1] Asorey, H. et al., "LAGO: the Latin American Giant Observatory", 2015.

LAGO Astroparticle Network [2]

[2] Asorey, H. et al., "Data accessibility, reproducibility and trustworthiness with LAGO data repository", Proceedings of the 34th International Cosmic Ray Conference, PoS(ICRC2015)672, 2015.

The DART Challenge
The DART (Data Accessibility, Reproducibility and Trustworthiness) challenge was launched by CHAIN-REDS (Coordination and Harmonisation of Advanced e-Infrastructures for Research and Education Data Sharing) to allow researchers to access and efficiently use worldwide distributed resources (i.e., computing, storage, data, services, tools, applications).
Figure: The DART challenge (Barbera, R. et al., "CHAIN-REDS DART Challenge").

The DART Challenge: Accessibility
Figure: Asorey, H.
et al., "Data accessibility, reproducibility and trustworthiness with LAGO data repository", Proceedings of the 34th International Cosmic Ray Conference, PoS(ICRC2015)672, 2015.

The DART Challenge: Reproducibility
Figure: Asorey, H. et al., "Data accessibility, reproducibility and trustworthiness with LAGO data repository", Proceedings of the 34th International Cosmic Ray Conference, PoS(ICRC2015)672, 2015.

Data curation: data curation through DSpace
DSpace:
- Open-source software
- Supports many types of content
- Uses the Dublin Core metadata schema
- Supports OAI-PMH, a protocol for metadata harvesting
- Generally used for institutional repositories (libraries)
Figure: Usage of open access repository software worldwide (www.opendoar.org).

Data curation: LAGOData metadata
"Metadata" is data about data. "Metadata articulates a context for objects of interest – 'resources' such as MP3 files, library books, or satellite images – in the form of 'resource descriptions'." [http://dublincore.org]
Dublin Core base schema: Author; Date (Available, Accessioned, Issued); Identifier; Type; Title
LAGO schema: Data Version; Channel; Trigger Level; PMT Voltage; GPS; Item; Site (Latitude, Longitude, Altitude...)
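The slide lists the metadata fields but not how they are stored. The following is a minimal sketch that assumes DSpace's (schema, element, qualifier) metadata model and a hypothetical `lago` schema. All of its element names (`lago.site.latitude`, `lago.pmt.voltage`, ...) are invented for illustration. It shows how one record could be checked and flattened before ingest:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# One metadata value in DSpace's (schema, element, qualifier, value) model.
Triple = Tuple[str, str, Optional[str], str]

# Dublin Core base fields that every record is expected to carry.
DC_REQUIRED = ("author", "title", "identifier", "dc_type", "date_issued")


@dataclass
class LagoRecord:
    """Hypothetical in-memory view of one LAGOData item."""

    # Dublin Core base schema
    author: str
    title: str
    identifier: str      # e.g. the item's Handle URL
    dc_type: str
    date_issued: str     # ISO 8601 date, e.g. "2016-10-06"
    # LAGO schema (illustrative element names, not the real registry)
    data_version: str
    channel: str
    trigger_level: str
    pmt_voltage: str
    gps: str
    latitude: float      # site latitude, degrees
    longitude: float     # site longitude, degrees
    altitude_m: float    # site altitude, metres

    def validate(self) -> None:
        """Raise ValueError if a required field is empty or a coordinate is out of range."""
        for name in DC_REQUIRED:
            if not getattr(self, name).strip():
                raise ValueError(f"missing Dublin Core field: {name}")
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")

    def to_triples(self) -> List[Triple]:
        """Validate, then flatten the record into metadata triples.

        dc.date.accessioned and dc.date.available are omitted because
        DSpace normally adds them itself when the item is installed.
        """
        self.validate()
        return [
            ("dc", "contributor", "author", self.author),
            ("dc", "title", None, self.title),
            ("dc", "identifier", "uri", self.identifier),
            ("dc", "type", None, self.dc_type),
            ("dc", "date", "issued", self.date_issued),
            ("lago", "data", "version", self.data_version),
            ("lago", "channel", None, self.channel),
            ("lago", "trigger", "level", self.trigger_level),
            ("lago", "pmt", "voltage", self.pmt_voltage),
            ("lago", "gps", None, self.gps),
            ("lago", "site", "latitude", str(self.latitude)),
            ("lago", "site", "longitude", str(self.longitude)),
            ("lago", "site", "altitude", str(self.altitude_m)),
        ]
```

Validating before flattening means a malformed record is rejected before any file is written, rather than surfacing later as an ingest error.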
Data curation: PID and LAGOData
To provide, resolve and mint persistent identifiers (PIDs), LAGOData uses the GRNET Handle service. One of its advantages is that it can cover a wide range of digital objects.
PID syntax: <PREFIX>/<SUFFIX> (e.g. http://hdl.grnet.gr:8002/11456/LAGO-104, where 11456 is the prefix and LAGO-104 the suffix)
Figure: PID service by GRNET (image from the CLARIN project, http://www.clarin.eu).

Data curation: OAI-PMH Protocol
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) defines a mechanism for harvesting records containing metadata from repositories. It distinguishes two roles:
- Data Provider: maintains one or more repositories (web servers) that support OAI-PMH as a means of exposing metadata.
- Service Provider: issues OAI-PMH requests to data providers and uses the metadata as a basis for building value-added services.
Figure: Basic functioning of OAI-PMH (image from the CLARIN project, http://www.clarin.eu).

Data curation: lagoproject.uis.edu.co
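The sketch below makes the data-provider and service-provider roles of OAI-PMH concrete. It is a minimal service-provider sketch that uses only the Python standard library. It builds OAI-PMH `ListRecords` requests, parses the `oai_dc` records in a response, and splits a Handle PID into prefix and suffix. The slides do not give the repository's OAI-PMH address, so any endpoint URL passed in is a placeholder. Network fetching is left out so that the parsing can be checked offline.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# XML namespaces defined by OAI-PMH 2.0 and the oai_dc metadata format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH, so
    metadataPrefix is sent only on the first request.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text: str) -> Tuple[List[Dict[str, List[str]]], Optional[str]]:
    """Return (records, resumption_token).

    Each record maps a Dublin Core element name to its values.
    Deleted records carry no metadata and are skipped.
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        header = rec.find("oai:header", NS)
        if header is not None and header.get("status") == "deleted":
            continue
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            continue
        fields: Dict[str, List[str]] = {}
        for child in dc:
            name = child.tag.split("}", 1)[-1]  # drop the namespace URI
            fields.setdefault(name, []).append((child.text or "").strip())
        records.append(fields)
    token = root.find("oai:ListRecords/oai:resumptionToken", NS)
    text = (token.text or "").strip() if token is not None else ""
    # An empty <resumptionToken/> (or no token at all) marks the last page.
    return records, (text or None)


def split_handle(pid: str) -> Tuple[str, str]:
    """Split a Handle PID, bare or inside a resolver URL, into (prefix, suffix)."""
    path = urllib.parse.urlparse(pid).path if "://" in pid else pid
    prefix, sep, suffix = path.lstrip("/").partition("/")
    if not (prefix and sep and suffix):
        raise ValueError(f"not a Handle PID: {pid!r}")
    return prefix, suffix
```

A harvester would work through the list page by page:
- fetch each URL (for example with `urllib.request.urlopen`);
- pass the response body to `parse_list_records`;
- repeat with the returned token until it is `None`.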
DSpace Extensions
To extend the functionality of the software and meet specific needs, the following features are in production:
- Uploads: a Django module to upload multiple records to the server, and a back-end script to ingest the records into the repository.
- Downloads: a modified JSP interface to allow multiple downloads.
- Standardization: a VM (virtual machine) with a default installation.
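The slides do not describe how the back-end ingest script works. One common route for batch ingest into DSpace is its Simple Archive Format (SAF), which DSpace's batch import tool then loads. An SAF batch has one directory per item, and each directory holds:
- a `contents` file listing the data files;
- a `dublin_core.xml` file;
- one `metadata_<schema>.xml` file per additional schema.

The sketch below assumes that route and writes a single SAF item directory. A hypothetical `lago` schema would first have to be registered in DSpace, and the file names used with the sketch are illustrative.

```python
import os
import shutil
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Optional, Tuple

# (schema, element, qualifier, value); a None qualifier is written as "none".
Triple = Tuple[str, str, Optional[str], str]


def _schema_tree(schema: str, values: List[Triple]) -> ET.ElementTree:
    """Build the <dublin_core> document for one metadata schema."""
    root = ET.Element("dublin_core")
    if schema != "dc":
        root.set("schema", schema)  # non-DC schemas are named explicitly
    for _, element, qualifier, value in values:
        dcvalue = ET.SubElement(
            root, "dcvalue", {"element": element, "qualifier": qualifier or "none"}
        )
        dcvalue.text = value
    return ET.ElementTree(root)


def write_saf_item(item_dir: str, metadata: Iterable[Triple],
                   bitstreams: Iterable[str]) -> None:
    """Write one item directory in DSpace Simple Archive Format."""
    os.makedirs(item_dir, exist_ok=True)

    # Group triples by schema: dc -> dublin_core.xml, others -> metadata_<schema>.xml
    by_schema: Dict[str, List[Triple]] = {}
    for triple in metadata:
        by_schema.setdefault(triple[0], []).append(triple)
    for schema, values in by_schema.items():
        name = "dublin_core.xml" if schema == "dc" else f"metadata_{schema}.xml"
        _schema_tree(schema, values).write(
            os.path.join(item_dir, name), encoding="utf-8", xml_declaration=True
        )

    # Copy each data file next to the metadata and list it in 'contents'.
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as fh:
        for path in bitstreams:
            shutil.copy2(path, item_dir)
            fh.write(os.path.basename(path) + "\n")
```

Several such item directories placed under one parent directory form a single import batch.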
Remarks
- High volumes of data
- Access
- Authorship
- Enhancement

Thanks...