New Member Introductions

Hi! I’m Julia, I work at Saturn Cloud (think Databricks for Dask). I contribute to HoloViz, although I used to do this more when I worked at Anaconda. I am also trying to help make sense of the Python visualization landscape (pyviz.org), and I co-organize PyData Philly.

3 Likes

Hi. My name is Thiago and I work on vespa.ai. I am interested in the intersection of search engines and machine learning. Vespa is written in C++ and Java and has native tensor evaluation support. This seems like an interesting initiative.

1 Like

:wave: Hi, I’m Cam Davidson-Pilon. Almost all of my professional and personal work has been in Python. I maintain the lifetimes and lifelines libraries (and some other small ones). I really love automatic differentiation (shout-out to JAX and autograd), optimizing algorithms, and debugging.

Some of my goals for joining this group are to learn how to integrate my libraries with the other parts of the Python stack, to learn about new ideas and algorithms, and to share stories about our times deep in the guts of the computer.
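As a taste of why automatic differentiation is so pleasant: the forward-mode core can be sketched with dual numbers in a few lines. This is toy illustration code, not the API of JAX or autograd:

```python
# Toy forward-mode automatic differentiation via dual numbers.
# A dual number carries a value plus the derivative of everything
# computed so far with respect to one seeded input variable.
class Dual:
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def grad(f):
    """Return a function computing df/dx at x by seeding deriv = 1."""
    return lambda x: f(Dual(x, 1.0)).deriv


# d/dx (3x^2 + 2x) = 6x + 2, so at x = 4 the derivative is 26.
f = lambda x: 3 * x * x + 2 * x
print(grad(f)(4.0))  # 26.0
```

Reverse-mode (what autograd and JAX actually use for gradients of many-input functions) records the computation graph and replays it backwards, but the dual-number trick is the same chain rule, applied eagerly.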

2 Likes

Hi, I’m Chewxy.

I’m based in Sydney, AU.

The projects I maintain revolve around the Go data science/machine learning space. Here’s a non-exhaustive list:

  • gorgonia.org/gorgonia - Expression graph engine with built-in support for symbolic and automatic differentiation - like TensorFlow and PyTorch had a baby.
  • gorgonia.org/tensor - generic multidimensional arrays, engine for Gorgonia
  • gorgonia.org/golgi - a neural network library built on top of Gorgonia
  • gorgonia.org/cu - CUDA library
  • lingo - NLP library, similar to Spacy.
  • stl - Seasonal trend decomposition with LOESS
  • dmmclust - clustering of short texts

I have several others that I am trying to open source - an ALS library for matrix factorization, an attentional neural network library, and several others that I can’t remember.

2 Likes

Oh sweet. I used lifelines in my work a few years back. Thanks for the library!

2 Likes

Hi, I’m Yetu, a product manager based in London.

  • I work for QuantumBlack, an advanced analytics unit that is part of McKinsey & Company.
  • I maintain QB’s first open source project called Kedro - a Python library that helps data scientists and data engineers build reproducible, versioned and maintainable data pipelines.
  • I just want to stay up-to-date with new developments in the Python data processing space.

Hit me up for coffee, if you’re in the area!

2 Likes

Hello all,

Adam from https://konduit.ai. We work on Eclipse Deeplearning4j.
I serve on the board of the Eclipse Foundation and lead the Eclipse Deeplearning4j project.
Konduit is also heavily involved in the TensorFlow SIG-JVM.
We support https://github.com/bytedeco/javacpp and https://github.com/bytedeco/javacpp-presets,
allowing the same kind of low-level pointer math you get from Python, but in Java as well.

JavaCPP is the base of the new packaging for TensorFlow Java: https://github.com/tensorflow/java
We are also working on defining a new ndarray format for the JVM in collaboration with the https://djl.ai/ team at AWS.

We’re currently working on supporting ONNX on the JVM as well.
All of this gets put into our project https://github.com/KonduitAI/konduit-serving

Our team is remote all over the world. I live in Tokyo, Japan, and we have people everywhere from Germany to various parts of SE Asia. Happy to help people in the JVM ecosystem and with interop with the Python world.

2 Likes

Hi, I’m Saul! For the past several years I’ve been pouring my passion into VisiData, a terminal-based data exploration multi-tool. Previously I designed the .xd file format for crosswords (which led to a scandal). In my day job, I’m developing an embedded algorithms platform for real-time biosignal processing. I primarily use C and Python, though I carry an old flame for Forth, and a nascent awe for the APL family of languages.

I’m interested in:

  • the process of cleaning and restructuring data
  • file format design
  • CLI and TUI design
    • tabular user interfaces in particular
  • data games like SQL Murder Mystery and The Command Line Murders
  • data pipelines (long-term, low-bandwidth, low-budget)

Please reach out if any of the above catches your eye! I love connecting and collaborating and I’m constantly surprised at the inspiration I can get from having a fresh conversation with someone.

2 Likes

Hi! I’m Carol Willing. I’m a core developer of Python and Jupyter (JupyterHub, Binder, and education) as well as a Steering Council member.

I’m currently working at a startup providing Continuous Verification Tools for Kafka.

Thanks Wes for creating this platform for discussion.

5 Likes

Hi! I am Joris. I am a Python developer (and I regularly teach as well), core developer of pandas, and maintainer of GeoPandas (for working with geospatial data in pandas). I have also contributed to scikit-learn (though not that much anymore nowadays). More recently I have been working on Apache Arrow (mostly pyarrow), since I am now working part-time at Ursa Labs (with Wes).

4 Likes

Hello, I’m Patrick Ryan and I explain how to hijack AI systems via data poisoning, identifying and exploiting flimsy hyperparameters, and identifying psychological weaknesses and cognitive biases in individual data scientists.

I’ve helped make public 900+ internal documents from Google explaining how they use poor data science habits to justify politically-motivated AI policies.

If you think you’re not safe, let me know!

1 Like

Hi, my name is Antoine Pitrou. I’m a CPython core developer and an Apache Arrow core developer and project manager. I’ve also been a core developer on Numba and Dask.distributed.

3 Likes

Josh Patterson here, I’m the director of engineering for RAPIDS at NVIDIA. I write little code these days, but my main job is to make sure we do our best to align with other communities and promote standards.

8 Likes

Tom Drabas. I’m a Senior Data Scientist/Software Engineer (depending on how you look at it ;)) at Microsoft. I’ve been an avid user and advocate of OSS projects (everything from pandas, NumPy, scikit-learn, and Spark/PySpark to, most recently, RAPIDS and Dask) and have started contributing back a bit recently, mostly by filing bugs against RAPIDS but also by co-developing dask.distributed.cloud_provider.AzureMLCluster (soon to be PR’d to dask_cloudprovider): https://github.com/drabastomek/dask-cloudprovider

3 Likes

Hi Everyone,

My name is Doug, I’m a Ph.D. student in particle physics (almost finished with it). I have one general-purpose open source software project: pygram11, a blazing fast histogramming library for Python. I also write a lot of software for my research using the SciPy/PyData stack. I enjoy writing code in C++ and Python, and I can handle a bit of C. I’m also a big Emacs fan. The amount of time I’ve spent writing software during grad school has helped me discover that engineering software tools for data analysis is what I actually want to do. I try to keep a small technical blog active at https://ddavis.io/.

1 Like

Hi,

My name is Maarten Breddels. I was an astronomer working with large datasets (Gaia), which led me to create Vaex: https://github.com/vaexio/vaex/. Vaex is (now) an out-of-core dataframe library, sharing many of the data ideas of Apache Arrow: columnar, immutable data, zero-copy, memory-mapping. It furthermore tries to do things as lazily as possible (virtual columns, lazy filtering) to save time/memory, and focuses on fast data aggregations and visualization of them.
It seems very similar to {diskframe}, which was new to me.
A side effect of Vaex was the creation of ipyvolume (3d volumes), which led me into the ipywidgets world, which in turn led me to freelancing/consulting.
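To illustrate the out-of-core idea, here is a conceptual sketch using plain NumPy (illustrative only, not Vaex’s actual API): the data stays memory-mapped on disk rather than loaded into RAM, a "virtual column" is just a deferred expression, and aggregations stream over the file in chunks:

```python
# Sketch of the out-of-core / lazy-aggregation idea using plain NumPy.
import numpy as np

# Write some sample data to disk, then memory-map it read-only, so the
# full column is never materialized in RAM.
np.arange(1_000_000, dtype=np.float64).tofile("column_x.bin")
x = np.memmap("column_x.bin", dtype=np.float64, mode="r")

# "Virtual column": a deferred expression, evaluated per chunk on demand.
virtual_r = lambda chunk: chunk * 2.0 + 1.0

# Streaming aggregation: sum the virtual column chunk by chunk.
chunk_size = 100_000
total = 0.0
for start in range(0, len(x), chunk_size):
    total += virtual_r(x[start:start + chunk_size]).sum()

print(total / len(x))  # mean of 2x + 1 over 0..999999 -> 1000000.0
```

Vaex wraps this pattern in a dataframe API, so expressions like a virtual column or a filtered mean compose lazily and only the aggregation pass ever touches the data.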

6 Likes

Hi all! My name is Alex Baden and I’m a software engineer at OmniSci working on OmniSciDB (https://github.com/omnisci/omniscidb), an open source in-memory SQL database designed for parallel hardware, namely NVIDIA GPUs. While our internal format differs (slightly) from Apache Arrow, we support outputting Arrow buffers and have an interface designed for sharing the memory of an Arrow buffer via IPC. We also rely entirely on Apache Calcite for our query parser and cost-based relational algebra optimization. Prior to OmniSci, I was a grad student at Johns Hopkins University, also working on open source software – this time, a web viewer called Neuroglancer (https://github.com/google/neuroglancer). A somewhat different angle on data processing, but the design of the viewer allows for exploring terabytes of imaging data through a web browser (obviously relying on some tricks, including pre-computed downsampling of the source data). I am always happy to talk OSS across a wide range of fields and am looking forward to participating in this community!

1 Like

Hi! I’m Stefane Fermigier, a long-time Python user (since ~1996 IIRC). In 1996, while working on my thesis on number theory, I developed Python bindings to PVM (the ancestor of MPI) and to PARI/GP.

I’ve since founded two companies that produce open source software in Python, the last one (and only one in which I’m still involved) being Abilian, and founded the PyData Paris conference which became PyParis in 2017.

More recently, I have been working on the Wendelin project and more specifically on OlaPy which aims to provide standards-based (MDX, XMLA) OLAP access to Python-based data analysis engines.

Hello! My name is Cody - I’m a PM (and pretend data scientist) in Microsoft’s AI Platform, working on our Azure Machine Learning service with a focus on OSS data preparation for ML. Some others and I have been working on enabling the use of AzureML as a cloud provider for Dask - we currently have a fork here that we hope to contribute back into Dask soon!

Looking forward to learning from this community :slight_smile:

1 Like

Wow this is an impressive and inspiring group of people, thank you Wes for setting this up.

My name is Fred Monroe (@313v on twitter); I sometimes contribute in small ways to the fast.ai codebase.

I helped set up a non-profit, wamri.ai, that matches up data scientists with people in medicine (research and practice) - everything we produce is open source.

I sometimes collaborate with scientists at the Salk Institute. We created an open source tool that helps people working with electron and fluorescence microscopes train custom nets to do super-resolution on the images/movies coming off the microscope - it’s called PSSR on GitHub - I’m limited to two links or I would include it here.

I don’t one hundred percent fit the profile of many of you as a core big-data package maintainer - but I aspire to, because I’m actively working on tools for generically working with single-cell RNA sequencing data that I hope to release soon.

I am an active user of many of the tools on this page and hope to release life sciences packages that integrate well with many of these tools (I’m also working with the relatively large ZINC15 database, for example).

Please feel free to kick me off if I don’t quite fit the vision of membership here - but I promise to keep my head down and not be too distracting, and I appreciate the opportunity to learn from you all. I’m a long-time fan of many of the packages on this page and am very grateful to all of you.

4 Likes