New Member Introductions

Hello! If you are new to this community and would like to introduce yourself, please feel free to do so here.

Ideas for your introduction:

  • List of projects you maintain or contribute to
  • Where you work or any other motivations you have for contributing to open source
  • Topics of interest that brought you here
6 Likes

My name is ZJ and I make {diskframe}, a larger-than-RAM data manipulation framework for R, designed for a single-machine setting.

I am currently working at a Melbourne Machine Learning start-up.

I also make various Julia data science and machine learning projects, including:

  • JDF.jl - a dataframe serialization format written in Julia
  • JLBoost.jl - an implementation of XGBoost-like algorithms in Julia
  • SortingLab.jl - fast sorting algorithms in Julia
3 Likes

Hi, my name is Nick Poorman. I’m a Director managing the Data Platform teams of a large data-driven company in Charlotte, NC.

Since I don’t get time to write code at work anymore, I’ve been spending my free time working on data projects involving Go. Some recent side projects:

1 Like

Hi everyone! I’m Sean Law and I am the creator of STUMPY: a powerful and scalable Python library for modern time series analysis. STUMPY efficiently computes something called the matrix profile, which can be used for a variety of time series data mining tasks such as:

  • pattern/motif (approximately repeated subsequences within a longer time series) discovery
  • anomaly/novelty (discord) discovery
  • shapelet discovery
  • semantic segmentation
  • density estimation
  • time series chains (temporally ordered set of subsequence patterns)
  • and more …

Whether you are an academic, data scientist, software developer, or time series enthusiast, STUMPY is straightforward to install via conda/pip. Our goal is to allow you to get to your time series insights faster. See documentation for more information.
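
For readers who have not met the matrix profile before, here is a minimal brute-force sketch of the idea in plain Python. It is not STUMPY's implementation or API: the function names `znormalize` and `naive_matrix_profile` are hypothetical, the exclusion zone of `m // 4` is an assumption made for this sketch, and the quadratic loop is only practical for tiny inputs. For each subsequence of length `m`, it records the z-normalized Euclidean distance to the nearest other subsequence that does not overlap it too closely.

```python
import math


def znormalize(window):
    """Rescale a window to zero mean and unit standard deviation."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        # Constant window: there is no shape to compare, so use all zeros.
        return [0.0] * n
    return [(x - mean) / std for x in window]


def naive_matrix_profile(series, m):
    """Brute-force matrix profile (illustrative only, O(n^2 * m)).

    Entry i is the distance from subsequence i to its nearest neighbor,
    ignoring neighbors inside a small exclusion zone around i.
    """
    n_sub = len(series) - m + 1
    windows = [znormalize(series[i:i + m]) for i in range(n_sub)]
    exclusion = max(1, m // 4)  # assumption chosen for this sketch
    profile = []
    for i in range(n_sub):
        best = math.inf
        for j in range(n_sub):
            if abs(i - j) <= exclusion:
                continue  # skip trivial matches with overlapping neighbors
            best = min(best, math.dist(windows[i], windows[j]))
        profile.append(best)
    return profile


# Toy series: the shape [0, 1, 3, 1, 0] appears at index 0 and again at index 8.
series = [0, 1, 3, 1, 0, 5, 2, 7, 0, 1, 3, 1, 0, 4, 6]
profile = naive_matrix_profile(series, m=5)
for start, distance in enumerate(profile):
    print(start, round(distance, 3))
```

Low values in the profile mark repeated shapes (motifs), here at indices 0 and 8, while the highest values mark subsequences unlike anything else in the series (discords).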

4 Likes

Hi @nickpoorman! I wasn’t aware of the Go Parquet effort. I’m sure that other Go Arrow developers would be interested (we can discuss further in a new topic on here, or on dev@arrow.apache.org).

2 Likes

Hi, I’m Hani Safadi. I’m a researcher and educator teaching Java and Python to business school students. I am interested in studying open-source organizing. I’m also a developer leading the Open Data Innovation Project, a visual text analytics web platform.

1 Like

Hi, I’m Samuel.

Great to have a place for tool developers to discuss!

I did my PhD at pharmb.io working on improving reproducibility of early drug discovery, especially using scientific workflow systems. During this process, I developed a few libraries I’m trying to maintain today (with wildly varying success at that):

  • SciLuigi - A wrapper around Luigi to add named inputs/outputs, allowing the dependency DAG to be defined in one place, separate from tasks (I haven’t had time to update it in a while)
  • SciPipe - A Go-based scientific workflow library with 0 dependencies, based on flow-based programming ideas. It was needed because none of the tools we tried at the time had dynamic scheduling combined with unlimited nesting of parameter sweeps.
  • RDFIO - Semantic MediaWiki plugin for importing RDF datasets into MediaWiki/SMW. More for “really small data” use cases.

These days I’m working as a consultant in AI/ML at Savantic AB in Stockholm, while doing the occasional software / web dev project through our own company RIL Partner AB.

I’m trying hard to find more time to maintain these libraries though as I have various personal projects I would like to pursue using them.

1 Like

Hello good people, I’m Philippe Nguyen, somewhere between data scientist and software engineer at L2F in Lausanne, Switzerland.

I worked on developing Giotto, a collection of open-source Python libraries like:

I’m also contributing to Nixpkgs, a collection of packages for the Nix language (including the declarative operating system NixOS).

2 Likes

Hi, I’m Max.

I am a PhD student at KTH Royal Institute of Technology in Stockholm, Sweden. I mainly work on arcon, a streaming-first analytics engine in Rust. I also contribute to kompact, a hybrid actor + component model framework that our research group uses for arcon.

2 Likes

Hello, my name is Leslie and I’m a technical writer in the IaaS space (Linode). I’m looking to learn more about Data Science both for personal projects and to be able to write more about it. Linode’s docs are open source and I contribute there every day.

1 Like

Hey all! My name is Michael Chow and I’m working on a data analysis library called siuba.

Some key points on siuba:

  • It’s a port of dplyr (and tidyverse packages) to Python
  • It draws inspiration from the package Ibis
  • A difference between it and other dplyr ports is that it has functionality to speed up grouped operations.

I’m taking a sabbatical through 2020 to focus on open source tools and to co-direct the non-profit Code for Philly, so I am excited to connect with other people working on data science tools!

1 Like

I’m Holden :) I’ve mostly worked on Apache Spark (with some detours into Apache Beam). I’m currently exploring the wonderful world of non-Spark distributed data processing tools in my free time.

3 Likes

Hey all! Thanks Wes for setting this up, I think it’s long overdue.

I work at Quansight and am working on a library for writing domain specific languages in Python and translating/evaluating them with pattern matching, using type annotations to also support static analysis with Mypy. It’s called metadsl.

3 Likes

Hey everyone! Congrats Wes, it looks like a great initiative!

My name is Ivan and I work at Quansight. Over the last few years, as part of my work there, I have made a few contributions to projects like Jupyter, PyTorch and Ibis-framework.

I also contribute as a reviewer in the PyOpenSci project.

I love topics related to Open Science so I am very excited to be here :)

3 Likes

Hi All. I am Bargava and I run a startup focused on helping small retailers do experimentation (A/B testing) better.

This is based on two open source packages that I am developing: PyBandit and Recoflow.

I spend my time between Bangalore and Austin.

2 Likes

Hello, everyone! My name is Paige Bailey, and I’m a product manager in Research & Machine Intelligence at Google.

I work on a variety of projects in the open-source data science and machine learning ecosystem (most notably TensorFlow and increasingly JAX).

I’m based in the Bay Area (Mountain View) and am always happy to grab coffee.

4 Likes

Hi, I’m Yaw. I’m co-founder of ODK. We make mobile data collection software for social good. Think Survey Monkey, but for election monitors, polio vaccinators, and climate scientists working in disconnected settings.

I’m here to find ways to work with the broader community, particularly on sustainability.

3 Likes

'ello! I’m Hannah. I’m one of the folks trying to redesign the Matplotlib data model to be more flexible and data structure agnostic and would love to hear from y’all about use cases and the like and also feedback on what y’all think it should look like/how it should work/etc.

3 Likes

Hey I’m Julien
Projects I’m involved in / maintaining:

  • Apache Parquet
  • Marquez
  • Apache Arrow
  • Apache Iceberg
  • Apache Heron
  • Apache Pig
2 Likes