Nov. 12, 2020 – National and International Trends in Research Storage at Scale

When: Thursday, Nov 12, 2020 – 9AM Pacific Time/5PM UTC

Click here to watch the seminar

Tune in for updates from influential projects including Germany’s Helmholtz Federated IT Services (HIFIS), the EOSC-Nordic project of the European Open Science Cloud (EOSC), and US initiatives.

Ilya Baldin

Anita Nikolich

FABRIC and FAB

Abstract:

The Internet is broken. FABRIC and FAB are large NSF-funded projects that will allow researchers and practitioners to re-imagine a future, better Internet by enabling cutting-edge and exploratory research in networking, cybersecurity, distributed computing and storage, machine learning, IoT and science applications. During the four-year construction phase, a national and international testbed will be built in support of this vision. One of the goals of FABRIC is to serve as the conduit for collecting, storing and sharing experimental data. However, since this is not the core mission of FABRIC, the project is wrestling with the best approach to storing and sharing the vast amounts of data that can be collected from large-scale experiments on FABRIC infrastructure. We'll discuss our main short- and long-term storage challenges and our strategic goals in this area, and explore some ideas for solving them in order to provide a better experience for researchers and more robust data for experiments.

Bio:

Ilya Baldin leads RENCI’s network research and infrastructure programs. He is a networking researcher with a wide range of interests, including high-speed optical network architectures, cross-layer interactions, novel signaling schemes and network security. Before coming to RENCI, Baldin was the principal scientist at the Center for Advanced Network Research at the Research Triangle Institute and a network research engineer in the Advanced Network Research group at MCNC, where he was a member and leader of a number of federally funded research efforts. He holds Ph.D. and M.S. degrees in computer science from North Carolina State University.

Bio:

Anita Nikolich is a Research Scientist and Director of Research Innovation at the Information School at UIUC. She served as Program Director for Cybersecurity at the National Science Foundation (NSF), was the Executive Director of Infrastructure at the University of Chicago, and has held a variety of research and operational roles in industry and government. She is the co-organizer of the DEFCON AI Village, a 2020-21 AAAS Leadership Fellow in AI Public Engagement, and serves on the ARIN Advisory Committee. She does work in cryptocurrency security and analytics and remains optimistic about bringing together hackers, academia, industry and government to make the world a better place.

 

Slides:

osn-20201112-nikolich

Uwe Jandt

Federated data storage for Helmholtz Research & Friends

Abstract:

The platform “Helmholtz Federated IT Services” (HIFIS) [1] has been established to provide common access to IT resources within the German Helmholtz Association [2], as well as training and support for professional and sustainable scientific software development. The Helmholtz Cloud Services offered by HIFIS include services for large data transfers, high-performance computing, and documentation and collaboration tools of all kinds – the need for the latter has been demonstrated forcefully this year. From the very beginning of HIFIS, the requirements of all Helmholtz scientific communities – including scientific platforms such as the Helmholtz Imaging Platform and the Helmholtz Artificial Intelligence Cooperation Unit – have been surveyed extensively, allowing services to be shaped and operated according to their needs.

One central component in such a federated service landscape is large-scale distributed storage that enables high-throughput data handling and transparent access through easy integration with existing systems. In the context of HIFIS, the dCache project has been integrated into the Helmholtz Cloud, allowing users from all Helmholtz centres – and their collaboration partners – to access storage either directly or via connected services, e.g., HPC. The dCache identity management integrates seamlessly into the authentication and authorization infrastructure (AAI) of the Helmholtz Cloud using OpenID Connect. On top of this, dCache features advanced authorization delegation using macaroons, which makes it possible to share access rights between research groups at definable levels without sharing secrets or creating additional accounts.
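To make the macaroon-based sharing concrete: dCache documents requesting a macaroon by POSTing a JSON body to a WebDAV door. The sketch below builds such a request body; the door URL and path are hypothetical placeholders, and the exact caveat strings should be checked against the dCache documentation for the deployed version.

```python
import json

# Hypothetical dCache WebDAV door and directory (placeholders, not real endpoints).
DOOR = "https://dcache.example.org:2880"
PATH = "/data/experiment-42/"

# A macaroon request restricting the resulting token to read-only
# listing/download of one directory tree, valid for one day.
request_body = {
    "caveats": [
        "activity:DOWNLOAD,LIST",   # allowed operations only
        f"path:{PATH}",             # confine the token to this subtree
    ],
    "validity": "P1D",              # ISO 8601 duration: one day
}

# The request would be sent as:
#   POST {DOOR}{PATH}
#   Content-Type: application/macaroon-request
# with `request_body` as the JSON payload; the response carries a bearer
# macaroon a collaborator can use without needing any dCache account.
payload = json.dumps(request_body)
print(payload)
```

Because the macaroon itself embeds these caveats, the receiving group can use it directly or further attenuate it before passing it on, which is what makes secret-free delegation possible.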

Central storage for computing in a federated landscape inevitably raises the question of connection latency between dCache and (remote) computing sites. dCache provides a user-transparent workflow for buffering and pre-fetching data transfers via small, remotely deployed dCache instances with integrated caching nodes, which effectively and drastically reduces the performance losses caused by latency. In addition to smart, dynamic caching, dCache supports large-scale data ingress and egress through seamless integration with transfer services such as CERN’s File Transfer Service [3] and Globus [4]. These services allow scientists to move large volumes of data in a reliable and managed fashion. They are further enhanced by data placement services, such as Rucio [5], that allow scientists to build complex, dynamic rules describing where data is needed to support scientific workflows.

Last but definitely not least, dCache generates events to which agents can subscribe. dCache sends these events under various circumstances, including data ingest and data access. This can be used to trigger automatic replication of uploaded data and to provide fine-grained usage monitoring. The mechanism also allows scientists to build workflows and processing pipelines that are invoked when data is added or updated – for example, model fitting and classification, file normalization, metadata extraction and data catalog maintenance.
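dCache delivers such events over Server-Sent Events (SSE) from its REST frontend. The sketch below parses SSE frames and dispatches a pipeline step on an ingest event; the event name and payload fields are illustrative placeholders rather than the exact dCache schema.

```python
import json

def parse_sse(stream_text):
    """Yield (event_name, data) pairs from raw Server-Sent Events text."""
    event, data_lines = "message", []
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:      # blank line terminates a frame
            yield event, "\n".join(data_lines)
            event, data_lines = "message", []

def handle(event, data):
    """Trigger a (hypothetical) pipeline step when new data arrives."""
    payload = json.loads(data)
    if event == "ingest":                    # placeholder event name
        return f"extract metadata from {payload['path']}"
    return f"ignore {event}"

# A captured stream would normally come from the SSE endpoint; here we
# feed one hand-written frame to show the dispatch.
sample = (
    "event: ingest\n"
    'data: {"path": "/data/run-001.dat"}\n'
    "\n"
)
actions = [handle(e, d) for e, d in parse_sse(sample)]
print(actions)  # → ['extract metadata from /data/run-001.dat']
```

An agent subscribed this way reacts to uploads as they happen, which is how the replication, metadata-extraction and catalog-maintenance pipelines described above can be driven without polling.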

In our talk, we showcase the benefits of involving the scientific communities from the very beginning in order to sustainably foster cross-institutional and multi-disciplinary research.

[1] https://hifis.net

[2] https://www.helmholtz.de/en/research/information-data-science/helmholtz-incubator/

[3] https://fts.web.cern.ch/fts/

[4] https://www.globus.org/

[5] https://rucio.cern.ch/

 

Bio:

Uwe Jandt works at DESY, Germany. He coordinates one of the five so-called Incubator platforms of the German Helmholtz Association, namely Helmholtz Federated IT Services, HIFIS for short. Uwe Jandt has a background in information technology, medical imaging and bioinformatics.

 

Slides:

osn-20201112-jandt

Lene Krol Andersen

Constructing the European Open Science Cloud via regional building blocks

Abstract:

EOSC-Nordic is one of four regional implementation projects building a complete European Open Science Cloud. The ambition behind EOSC-Nordic is to facilitate the coordination of EOSC-relevant initiatives within the Nordic and Baltic countries and to exploit synergies to achieve greater harmonisation of policy and service provisioning across these countries, in compliance with agreed EOSC standards and practices. In doing so, the project seeks to establish the Nordic and Baltic countries as frontrunners in the take-up of the EOSC concept, principles and approach. EOSC-Nordic brings together a strong consortium of 24 partners, including e-Infrastructure providers, research-performing organisations and expert networks with national mandates for the provision of research services and open science policy, wide experience of engaging with the research community, and the ability to mobilise national governments, funding agencies, international bodies, global initiatives and high-level experts on EOSC strategic matters.

The talk will present highlights and challenges from the first year of the EOSC-Nordic project’s lifetime.

Bio:

Lene Krøl Andersen is the project manager of the EOSC-Nordic project, one of the four regional European Open Science Cloud implementation projects funded through the European Commission’s Horizon 2020 programme. EOSC-Nordic is the first infrastructure project to implement and integrate the actual European Open Science Cloud in the Nordic and Baltic region. It is a three-year project running until September 2022, with a budget of six million EUR and 24 partners in 10 different countries.

The Danish e-Infrastructure Cooperation (DeiC), situated in Denmark, is Lene Krøl Andersen’s home institution. At DeiC, Lene heads the section for international projects. Lene Krøl Andersen is one of the 12 members appointed by the Governing Board of the EuroHPC Joint Undertaking to the Infrastructure Advisory Group (INFRAG). This advisory group advises the Governing Board on the acquisition and operation of the supercomputers, drawing up and regularly updating the draft multiannual strategic agenda for such acquisitions. She has recently taken on the position of Chair of DiSSCo’s Technical Advisory Board. DiSSCo is a new pan-European research infrastructure of natural science collections. Lene Krøl Andersen has a research background in the natural sciences and holds an MBA degree.

 

Slides:

osn-20201112-andersen