Data processing pipelines for Small Big Data


Go to NumFOCUS academy page.

Small Big Data is a grey area in data science between “it fits in memory” and 100 TB. Some of the tools built for big data are overkill at this scale, and they may require specialized expertise that not every organization has. In contrast, many of the libraries and paradigms used for small data can become expensive when deployed to the cloud. How can we process large-ish data fast and efficiently?
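One common answer to that question (offered here as an illustrative sketch, not as the talk's specific approach) is out-of-core processing: streaming a dataset in fixed-size chunks and combining partial aggregates, so memory stays bounded regardless of total file size. The file, column names, and chunk size below are made up for demonstration; a small in-memory CSV stands in for a file too large to load at once.

```python
import io
import pandas as pd

# Synthetic stand-in for a large on-disk CSV (hypothetical columns).
csv_data = io.StringIO(
    "user,amount\n" + "\n".join(f"u{i % 3},{i}" for i in range(10))
)

# Stream the data in fixed-size chunks and merge per-chunk aggregates,
# so only one chunk is ever resident in memory at a time.
totals = {}
for chunk in pd.read_csv(csv_data, chunksize=4):
    for user, amount in chunk.groupby("user")["amount"].sum().items():
        totals[user] = totals.get(user, 0) + amount

print(totals)
```

The same pattern scales from a laptop to a modest cloud VM without pulling in a distributed framework, which is exactly the middle ground the talk describes.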


Esteban J. G. Gabancho

I’m a software engineer with 9+ years of experience in full-stack web development, operations, and leading small and medium-sized projects. Python is my preferred programming language, but I don’t shy away from using a different one. Although I have worked mainly in backend and DevOps roles, I also enjoy working on the UI end of things, and I have come to love JavaScript, together with some of its frameworks. During the past few years, I have also worn a “data engineer hat,” which has allowed me to broaden my knowledge of the data science domain and some of its peculiarities.

Anthony Franklin, PhD

Accomplished advanced analytics expert and consultant. Serial entrepreneur and co-founder of Fanalytical Inc. Former Div. I college football player and lifelong academic.