PyData Global is hosting multiple sprints for open source projects. Each sprint will be led by a project maintainer who will guide contributors throughout the session.

What are sprints?

Development sprints offer an opportunity to enhance and contribute to open source projects in a focused session with the project maintainers. It is a fun exercise that helps open source projects to improve with the help of the open source community.

Who can participate?

All experience levels are welcome to participate. Contribution guides and environment setup instructions are provided with each sprint.

Sprints Schedule

You can access the schedule of the sprints here.

Which projects are sprinting?

At PyData Global 2020, 13 projects will each hold a separate sprint session led by some of their core developers and maintainers!


Bokeh

Go to NumFOCUS academy page.

Bokeh is an interactive visualization library for modern web browsers. It provides elegant, concise construction of versatile graphics, and affords high-performance interactivity over large or streaming datasets. Bokeh can help anyone who would like to quickly and easily make interactive plots, dashboards, and data applications.

Sprint leaders

Pavithra Eswaramoorthy

Pavithra is a member of Bokeh’s core team and works on some outreach initiatives.

Bryan Van de Ven

Bryan is a co-creator and core team member of Bokeh and also works on the open-source RAPIDS project at NVIDIA.

Timo Cornelius Metzger

Timo is a freelance journalist and technical writer. He loves doing things with words and data and is currently working on overhauling Bokeh’s documentation.

Flow Forecast

Go to NumFOCUS academy page.

Flow Forecast is an open-source deep learning library for time series. It provides state-of-the-art models and cutting-edge concepts along with easy-to-understand interpretability metrics, cloud provider integration, and model serving capabilities.

Sprint leaders

Isaac Godfried

I’m a deep learning practitioner focused on applying A.I. to real-world problems in health, climate, and agriculture. Most of my research focuses on time series forecasting, transfer learning, and multi-modal learning (though I occasionally dip into NLP). As most of my solutions need to perform in real-world scenarios, a lot of my time also goes to building high-quality DL frameworks that integrate with cloud services, employ good coding practices, and make it easy to ship models and track their performance in production. I’m currently leading a team of researchers at CoronaWhy to forecast COVID-19 spread and am developing an easy-to-use deep learning framework for time series.

Kriti Mahajan

I am a Data Scientist focusing on the development of easy-to-use, scalable, and fair ML-based solutions for public policy problems. I am currently working at a startup called Monsoon CreditTech, which uses ML to help financial institutions make better lending decisions. I am also a core member of the research team at CoronaWhy and one of the core developers of flow-forecast (an open-source, general-purpose deep learning library for time series forecasting). I previously worked with the Reserve Bank of India (the Indian central bank), where my work sat at the intersection of academic research and production ML models to help policymakers in India, focusing primarily on forecasting inflation using deep learning and developing an early warning system to identify companies before they default.


Hypothesis

Go to NumFOCUS academy page.

Hypothesis is a family of testing libraries which let you write tests parameterized by a source of examples. A Hypothesis implementation then generates simple and comprehensible examples that make your tests fail. This simplifies writing your tests and makes them more powerful at the same time, by letting software automate the boring bits and do them to a higher standard than a human would, freeing you to focus on the higher level test logic.
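
To make the idea concrete, here is a toy sketch of property-based testing written with only the Python standard library. It is not Hypothesis's API or implementation: `buggy_max`, `random_list`, `shrink`, `minimize`, and `find_minimal_counterexample` are hypothetical names invented for this illustration, and the greedy shrinking strategy is an assumption chosen for brevity. The sketch generates random inputs, checks a property, and reduces any failing input to a simpler one that still fails.

```python
import random


def buggy_max(xs):
    """Hypothetical function under test: wrong when every element is negative."""
    best = 0  # bug: should start from xs[0]
    for x in xs:
        if x > best:
            best = x
    return best


def property_holds(xs):
    """Property: the result must agree with the built-in max."""
    return buggy_max(xs) == max(xs)


def random_list(rng):
    """Draw a non-empty list of small integers."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(1, 10))]


def shrink(xs):
    """Yield simpler variants: one element dropped, or one element halved toward zero."""
    if len(xs) > 1:
        for i in range(len(xs)):
            yield xs[:i] + xs[i + 1:]
    for i, x in enumerate(xs):
        if x != 0:
            yield xs[:i] + [int(x / 2)] + xs[i + 1:]


def minimize(xs, prop):
    """Greedily replace a failing input with a simpler one that still fails."""
    improved = True
    while improved:
        improved = False
        for candidate in shrink(xs):
            if not prop(candidate):
                xs, improved = candidate, True
                break
    return xs


def find_minimal_counterexample(prop=property_holds, tries=1000, seed=0):
    """Search random inputs; return a minimized failing input, or None."""
    rng = random.Random(seed)
    for _ in range(tries):
        xs = random_list(rng)
        if not prop(xs):
            return minimize(xs, prop)
    return None


if __name__ == "__main__":
    # Any failing input for this particular bug shrinks to the single-element list [-1].
    print(find_minimal_counterexample())
```

In this sketch the test author only states the property; generating inputs and simplifying the failure are automated. Hypothesis itself offers far richer input generation and shrinking than this toy version.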

Sprint leader

Zac Hatfield-Dodds


Kedro

Go to NumFOCUS academy page.

Kedro is an open-source Python framework that applies software engineering best practices to data and machine learning pipelines. You can use it, for example, to optimise the process of taking a machine learning model into a production environment. You can use Kedro to organise a single-user project running in a local environment, or to collaborate within a team on an enterprise-level project.

Sprint leader

Yetunde Dada


Matplotlib

Go to NumFOCUS academy page.

Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+.

Sprint leader

Thomas A Caswell

Thomas is a soft-matter physicist who now develops software for scientists. He builds data acquisition, management, and analysis tools at NSLS-II at BNL, is a core maintainer of h5py, and is the current Project Lead of Matplotlib.


Modin

Go to NumFOCUS academy page.

Modin: speed up your pandas workflows by changing a single line of code.

Sprint leader

Devin Petersohn

Devin Petersohn is a fifth-year Computer Science PhD student at the UC Berkeley RISELab. In the early years of his PhD program, he focused on building tools for large-scale genomics. More recently, Devin has focused on making data science more accessible to domain experts. As part of this work, Devin created Modin, which he has been developing for the past two years.


NetworkX

Go to NumFOCUS academy page.

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.

Sprint leader

Mridul Seth


pandas

Go to NumFOCUS academy page.

pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with structured (tabular, multidimensional, potentially heterogeneous) and time series data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.

Sprint leader

Marco Gorelli

Marco is a Data Scientist at the Samsung R&D Institute UK. Outside of work, he is a maintainer of pandas (a data wrangling platform for Python widely adopted in the scientific computing community) and co-author of nbQA (a tool and pre-commit hook to run any standard Python code quality tool on a Jupyter Notebook). He holds an MSc in Mathematics and Foundations of Computer Science from the University of Oxford.


PyTorch-Ignite

Go to NumFOCUS academy page.

PyTorch-Ignite is a high-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.

Sprint leader

Victor FOMIN

Victor is an active open-source contributor to the PyTorch ecosystem and a core developer and maintainer of the PyTorch-Ignite library. He is a member of the Toulouse Data Science meetup team. He is currently a software engineer at Quansight and previously worked as a deep learning applications engineer in computer vision.

VolEsti: a sampling and volume approximation library

Go to NumFOCUS academy page.

VolEsti is a C++ library for volume approximation and sampling of convex bodies (e.g. polytopes), with an R interface and a limited Python interface. VolEsti is part of the GeomScale project.

Sprint leaders

Vissarion Fisikopoulos

Vissarion has a decade of experience in research and development of algorithms in both academia and industry. His interests lie at the intersection of geometric computing, optimization, statistical computing, mathematical software, and algorithm engineering. He holds a PhD in computer science, has authored more than 20 scientific publications in top-ranked journals and conferences, and has given more than 40 talks at international conferences, invited seminars, and technology events. He is a co-author, maintainer, and contributor in several open-source projects (https://vissarion.github.io).

Apostolos Chalkis

Apostolos is a PhD student in Computer Science at the Department of Informatics and Telecommunications of the National and Kapodistrian University of Athens. His scientific and research interests are:

  • Markov chains for sampling from high dimensional multivariate distributions.
  • Volume estimation of convex and non-convex bodies in high dimensions.
  • Crisis detection and portfolio performance evaluation in large stock markets.
  • Randomized methods for convex optimization.
  • Optimized implementation of state-of-the-art geometric algorithms.

Elias Tsigaridas

Elias is a permanent research scientist at INRIA Paris. He is an expert in computational nonlinear algebra and geometry with extensive experience in mathematical software. He has published 36 peer-reviewed journal papers and 44 peer-reviewed conference papers, and has given many invited lectures.


napari

Go to NumFOCUS academy page.

napari is a fast, interactive, multi-dimensional image viewer for Python. It’s designed for browsing, annotating, and analyzing large multi-dimensional images. It’s built on top of Qt (for the GUI), vispy (for performant GPU-based rendering), and the scientific Python stack (numpy, scipy).

Sprint leader

Nicholas Sofroniew

Nicholas has a background in neuroscience, microscopy, and data analysis, and a passion for open source software in science. He’s on the steering council for napari, a fast, interactive, multi-dimensional image viewer for Python, and leads the imaging tech team at the Chan Zuckerberg Initiative, a philanthropy with a focus on accelerating science.


sktime

Go to NumFOCUS academy page.

sktime is a Python machine learning toolbox for time series with a unified interface for multiple learning tasks. It currently supports forecasting, time series classification, and time series regression.

Sprint leader

Markus Löning

Markus is a PhD student at UCL and core developer of sktime, a scikit-learn compatible Python toolbox for machine learning with time series. He was a visiting student at The Alan Turing Institute, where he began development of sktime in collaboration with researchers from UCL and the University of East Anglia. He holds a Master’s degree in Economics & Philosophy.


TerminusDB

Go to NumFOCUS academy page.

TerminusDB is an open-source model-driven graph database that stores data like Git. It is designed for knowledge graph representation and is a native revision control database.

Sprint leader

Cheuk Ting Ho

Cheuk has worked as a Data Scientist at one of the world's biggest wholesalers in the travel business, an AWS-partnered consultancy that delivers machine learning models, a startup aiming to revolutionise revenue management with data science, and a global bank using machine learning to investigate financial crime. Cheuk now works in a team of developers building a revolutionary graph database. She regularly contributes to the community by giving AI and deep learning workshops and organizing sprints for open source projects. She also contributes to open source projects including pandas, Keras, scikit-learn, and dateutil, and maintains her own open-source library, PicknMix.