CSDMS3.0 - Bridging Boundaries

Pangeo: Scalable Geoscience Tools in Python — Xarray, Dask, and Jupyter

Joseph Hamman

NCAR, United States

Earth scientists face serious challenges when working with large datasets. Pangeo is a rapidly growing community initiative and open-source software ecosystem for scalable geoscience using Python. Three of Pangeo’s core packages are 1) Jupyter, a web-based tool for interactive computing, 2) Xarray, a data model and toolkit for working with N-dimensional labeled arrays, and 3) Dask, a flexible parallel computing library. When combined with distributed computing, these tools can help geoscientists perform interactive analysis on datasets up to petabytes in size. In this interactive tutorial we will demonstrate how to use this platform with real science examples from hydrology, remote sensing, and oceanography. Participants will follow along using Jupyter notebooks to interact with Xarray and Dask running on Google Cloud Platform.
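
To give a flavor of how Xarray and Dask fit together, the sketch below builds a small labeled array, splits it into Dask chunks, and computes a time mean lazily. It is only an illustration and not taken from the tutorial notebooks: the `temperature` variable, the grid sizes, and the random data are assumptions made up for this example, and it runs on a local machine rather than on a distributed cluster.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical synthetic data standing in for a gridded geoscience variable.
time = pd.date_range("2000-01-01", periods=365, freq="D")
lat = np.linspace(-85.0, 85.0, 18)
lon = np.linspace(0.0, 350.0, 36)
rng = np.random.default_rng(0)

temperature = xr.DataArray(
    rng.normal(loc=15.0, scale=5.0, size=(len(time), len(lat), len(lon))),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="temperature",
)

# Splitting along time turns the array into a Dask-backed array (requires dask).
chunked = temperature.chunk({"time": 73})

# Label-based selection and reduction only build a task graph at this point.
tropics_mean = chunked.sel(lat=slice(-30, 30)).mean("time")

# compute() runs the graph and returns an in-memory, NumPy-backed result.
result = tropics_mean.compute()
print(result.sizes)
```

In a Jupyter notebook, the same expressions can be evaluated cell by cell; with a distributed Dask cluster attached, the `compute()` call would be spread across workers instead of running locally.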

Of interest for:
  • Cyberinformatics and Numerics Working Group
  • Artificial Intelligence & Machine Learning Initiative