Apr. 29, 2021 – The Role of Data Sharing and Distributed Storage in Research – Visions for the Future

When: Thursday, April 29th, 2021 – 9AM Pacific Time/4PM UTC to 10:15AM Pacific Time/5:15PM UTC

Christine Kirkpatrick

Panel Moderator

Division Director, Research Data Services

San Diego Supercomputer Center, UC San Diego

Bio:

Christine Kirkpatrick oversees the San Diego Supercomputer Center’s (SDSC) Research Data Services division, which manages infrastructure, networking, and services for research projects of regional and national scope. Kirkpatrick focuses on the implementation of research computing services, with an emphasis on operational cyberinfrastructure (CI) at scale. Kirkpatrick founded and hosts the US GO FAIR Office at SDSC, is Co-PI and Co-ED of the West Big Data Innovation Hub, Co-PI of the Open Storage Network, and PI of the EarthCube Office (ECO). She co-chairs the FAIR Data Object Forum and serves on the Technical Advisory Board (TAB) of the Research Data Alliance (RDA), the external Advisory Board of the European Open Science Cloud (EOSC) Nordic, and the National Academies of Sciences’ U.S. National Committee for the Committee on Data (CODATA).

Ryan Abernathey

Analyzing Big Earth System Data with OSN and Pangeo Cloud

Associate Professor, Columbia University

Abstract:

Pangeo is an open source platform for interactive analysis of large, complex datasets. Pangeo is used primarily in Earth System Science, where satellite observations, simulations, and other data sources generate petabytes of data per month, creating bottlenecks in analysis workflows. To meet this challenge, Pangeo has deployed data-proximate analysis hubs, open to a broad community of researchers, in the commercial cloud (AWS, Google Cloud, MS Azure). Our software stack, based on Jupyter, Xarray, Dask, and other scientific Python tools, operates natively and scalably on multidimensional arrays stored in object storage in the Zarr format. However, commercial cloud object storage is very expensive and penalizes moving data across cloud boundaries. In this talk, I will demonstrate hybrid workflows that use the commercial cloud for computing and the Open Storage Network (OSN) for storage. The excellent performance of OSN suggests it can be a viable alternative to commercial object storage for scientific data.

Bio:

Ryan is a computational physical oceanographer who leads the Ocean Transport Group, whose mission is to advance scientific understanding of how stuff moves around the ocean and how this transport influences Earth’s large-scale climate and ecosystems. This research involves working with satellite data, numerical simulations, and observational datasets. Ryan is an enthusiastic advocate for open source scientific software and is an active contributor to the Pangeo Project, a community platform for Big Data geoscience.

Ian Foster

Services for Science

Professor, University of Chicago

Abstract:

Today’s science infrastructure comprises observatories and instrumentation, (super)computers, commercial clouds, and other resources of unprecedented power. New capabilities such as autonomous laboratories are on the horizon. It is now time to upscale science environments by deploying science services of comparable sophistication. I use examples from Globus and elsewhere to illustrate how such services can be constructed and used, and discuss lessons learned for scientific software architecture, dissemination, and sustainability.

Bio:

Dr. Foster is a Senior Scientist and Distinguished Fellow, and director of the Data Science and Learning Division, at Argonne National Laboratory, and the Arthur Holly Compton Distinguished Service Professor of Computer Science at the University of Chicago. His research deals with distributed, parallel, and data-intensive computing technologies, and innovative applications of those technologies to scientific problems in domains such as materials science, climate change, and biomedicine.

Ed Lazowska

Don't Be Stupid: Use the Commercial Cloud

Professor, University of Washington

Abstract:

We should be negotiating with the commercial cloud providers for acceptable terms (financial and other), rather than building our own private cloud. The path we are pursuing is the road to ruin.

Bio:

Dr. Lazowska's research and teaching concern the design, implementation, and analysis of high-performance computing and communication systems. For the first ten years of his career, Lazowska's principal focus was computer system performance: the development of effective performance evaluation techniques, and the use of these techniques to gain insight into significant computer systems and computer system design issues. Lazowska then turned his attention to the design and implementation of distributed and parallel computer systems, work that yielded a number of widely embraced approaches to kernel and system design in areas such as thread management, high-performance local and remote communication, load sharing, cluster computing, and the effective use of the underlying architecture by the operating system. Current research includes information technology to support sustainable rural development, data architecture for the Ocean Observatories Initiative, control theory applied to computer system management, and evolving a broad research agenda in Network Science & Engineering.

Alex Szalay

Science at the Mid-Scale

Professor, Johns Hopkins University

Abstract:

The talk will discuss how the emergence of many mid-scale science projects is changing the computational needs of the US science community. These projects bring new automated instruments capable of generating big data sets, yet do not have enough resources to build their own dedicated computational infrastructure. This trend is here to stay and demonstrates the need to fill the “missing middle” in today’s cyberinfrastructure.

Bio:

Alexander Szalay is a Bloomberg Distinguished Professor and Professor in the Department of Computer Science. He is the Director of the Institute for Data Intensive Science. He is a cosmologist, working on the statistical measures of the spatial distribution of galaxies and galaxy formation. He is a Corresponding Member of the Hungarian Academy of Sciences and a Fellow of the American Academy of Arts and Sciences. In 2004 he received an Alexander von Humboldt Award in Physical Sciences, and in 2007 the Microsoft Jim Gray Award. In 2008 he became Doctor Honoris Causa of Eötvös University, Budapest.