CSDMS 2022: Environmental Extremes and Earthscape Evolution

Xarray for Scalable Scientific Data Analysis

Anderson Banihirwe

National Center for Atmospheric Research, United States

This tutorial introduces Xarray, a Python library that provides (1) data structures for multi-dimensional labeled arrays and (2) a toolkit for scalable analysis of large, complex datasets using Dask, which extends the SciPy ecosystem (e.g., NumPy, Pandas, Scikit-Learn) to larger-than-memory or distributed environments. Attendees should be comfortable with basic Python programming (e.g., data structures and functions). Some prior exposure to Python data science libraries (e.g., NumPy, Pandas) is helpful. No specific domain knowledge is required to participate effectively.
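The snippet below is a minimal sketch of the "labeled array" idea. It assumes `xarray` and `numpy` are installed, and the variable name `temperature` and its data values are hypothetical. It builds a small `DataArray` with named dimensions and coordinates, selects a point by coordinate label rather than by integer position, and averages over a dimension by name rather than by axis number. For larger-than-memory data, a `DataArray` can be converted to a Dask-backed array with `.chunk()`, which requires `dask` to be installed. That step is left out here to keep the example minimal.

```python
import numpy as np
import xarray as xr

# Hypothetical example data: 3 time steps x 2 latitudes x 4 longitudes
temperature = xr.DataArray(
    np.arange(24, dtype=float).reshape(3, 2, 4),
    dims=("time", "lat", "lon"),
    coords={
        "time": [0, 1, 2],
        "lat": [10.0, 20.0],
        "lon": [100.0, 110.0, 120.0, 130.0],
    },
    name="temperature",
)

# Select by coordinate label (lat=20.0, lon=110.0) instead of integer position
point_series = temperature.sel(lat=20.0, lon=110.0)

# Reduce over a named dimension instead of remembering that "time" is axis 0
time_mean = temperature.mean(dim="time")

print(point_series.values)  # one value per time step
print(time_mean.dims)       # remaining dimensions after the reduction
```

Because dimensions carry names, the same `.mean(dim="time")` call keeps working even if the array's axes are later reordered.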

Please acknowledge the original contributors when you use this material. If there are any copyright issues, please let us know and we will respond as soon as possible.

Of interest for:
  • Terrestrial Working Group
  • Coastal Working Group
  • Marine Working Group
  • Education and Knowledge Transfer (EKT) Working Group
  • Cyberinformatics and Numerics Working Group
  • Hydrology Focus Research Group
  • Chesapeake Focus Research Group
  • Critical Zone Focus Research Group
  • Human Dimensions Focus Research Group
  • Geodynamics Focus Research Group
  • Ecosystem Dynamics Focus Research Group
  • Coastal Vulnerability Initiative
  • Continental Margin Initiative
  • Artificial Intelligence & Machine Learning Initiative
  • Modeling Platform Interoperability Initiative
  • River Network Modeling Initiative