Python Deferred Evaluation

I wanted to open a thread here about adding support to Python for creating deferred evaluation systems:

I wrote about this a bit in a blog post a few months ago, under the “Python Control Flow” header. The basic issue, as I understand it, is that Python has no nice way of overriding control flow, so different libraries have to take different workarounds here. I’ll try to outline them:
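
To make this concrete, here’s a minimal sketch of why an if statement can’t be deferred (the Symbol class is just an illustration of a deferred-expression object, not any particular library): the interpreter calls __bool__ on the condition and demands a concrete True/False on the spot, so a symbolic object has no hook to capture “both branches” for later.

    class Symbol:
        """A toy deferred-expression object."""
        def __init__(self, name):
            self.name = name

        def __gt__(self, other):
            # Operators can be overridden to build up a deferred expression...
            return Symbol(f"({self.name} > {other!r})")

        def __bool__(self):
            # ...but `if` immediately forces the expression to a real bool.
            raise TypeError(f"cannot eagerly evaluate symbolic {self.name}")

    x = Symbol("x")
    expr = x > 3      # fine: builds Symbol('(x > 3)'), evaluation deferred
    if x > 3:         # TypeError: there is no way to override or defer `if`
        pass

This is the same wall NumPy users hit with “the truth value of an array is ambiguous”.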

  • Functional Control Flow: One way to work around this is to simply ignore Python’s control flow operators and build all your operations as functions (see the short sketch just after this list). This is the direction taken by NumPy, Pandas, and the original TensorFlow library. The pro is that it’s conceptually rather simple; the con is that it forces users to stick to a subset of Python without control flow.
  • Parsing Bytecode: Numba, on the other hand, takes a different approach: it parses the bytecode of the function you decorate and has its own transpiler for a subset of Python.
  • Rewrite as functional form: The approach taken by the TF 2.0 autograph library, documented in the “The 800 Pound Python in the Machine Learning Room” paper and the snek-LMS repo, is a bit of a hybrid. I believe it relies on the AST rather than bytecode parsing, but instead of building some intermediate IR from it directly, it rewrites the function to replace the control flow with a functional form.
  • Rewrite bytecode on the fly: The codetransformer library lets you modify the bytecode before the interpreter executes it. It seems closest to modifying the CPython interpreter itself, without requiring that.
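
To illustrate the first (“functional control flow”) approach, here’s a minimal sketch using NumPy: the branch is handed to the library as data instead of being written as a Python if/else, so the library is free to defer, vectorize, or compile it.

    import numpy as np

    x = np.arange(5)

    # Per-element Python control flow would be eager and scalar:
    # result = [xi * 2 if xi > 2 else xi for xi in x]

    # The functional form expresses the same branch as a single library call:
    result = np.where(x > 2, x * 2, x)   # array([0, 1, 2, 6, 8])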
1 Like

A solid quotation system for Python has been my white whale for a long, long time, so I’m interested every time it rears up again. Glad to see Guido encouraging development on this front, but… I think there’s a widely divergent set of use cases here.

At this point in time, my views have evolved since the ~2015 days. I look at:

(1) the development of massive, complex, “big” tech stacks to encapsulate all the dynamism intrinsic to EDA & numerical modeling computation (whether “shove it all into a big docker” or “here’s my Ur-Framework to rebuild z/OS on k8s”);

(2) the imminent explosion in novel hardware (with new HW ISAs, and - most importantly - vastly differing memory hierarchies);

(3) the influx of extreme novitiates into cut-n-paste-code-culture, and the business propensity to spend oodles of money on “hopeware” (i.e. an App That Turns Excel Jockeys Into Unicorns);

… and I wonder if my white whale of “quote operator & embedded DSL” is truly a windmill no longer worth tilting at.

We’ve always been able to wrap Python with an import hook to do NSE. (Jon Riehl’s Mython was a more complete, less hacky demonstration of a similar concept.)
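
For reference, here’s a minimal sketch of that kind of import hook. This is my own illustration, not Mython’s actual machinery, and the NegateConstants transform plus the mydsl_ module-name prefix are made-up placeholders for whatever NSE rewrite you’d really want. A meta-path finder swaps in a loader that rewrites a module’s AST before it is compiled:

    import ast
    import importlib.abc
    import importlib.machinery
    import sys

    class NegateConstants(ast.NodeTransformer):
        """Toy transform: negate numeric literals (stand-in for a real NSE rewrite)."""
        def visit_Constant(self, node):
            if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
                return ast.copy_location(ast.Constant(-node.value), node)
            return node

    class RewritingLoader(importlib.machinery.SourceFileLoader):
        def source_to_code(self, data, path, *, _optimize=-1):
            # Parse the source, rewrite the AST, then compile it as usual.
            tree = ast.parse(data, filename=path)
            tree = ast.fix_missing_locations(NegateConstants().visit(tree))
            return compile(tree, path, "exec")

    class RewritingFinder(importlib.abc.MetaPathFinder):
        def __init__(self, prefix):
            self.prefix = prefix  # only hook modules whose names start with this

        def find_spec(self, fullname, path, target=None):
            if not fullname.startswith(self.prefix):
                return None
            # Let the normal path machinery locate the module, then swap the loader.
            spec = importlib.machinery.PathFinder.find_spec(fullname, path)
            if spec is None or not isinstance(spec.loader, importlib.machinery.SourceFileLoader):
                return None
            spec.loader = RewritingLoader(spec.loader.name, spec.loader.path)
            return spec

    sys.meta_path.insert(0, RewritingFinder("mydsl_"))

With the finder installed, something like "import mydsl_pipeline" would load mydsl_pipeline.py through the rewriting loader, while every other import goes through the normal machinery untouched.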

Embedded DSLs and quoting expressions are great… for people who want to apply SQL-ish concepts to Python data structures. But (as Jeremy Howard noted in the above Twitter thread), F# and friends have much more powerful primitives, because they are built on a more robust, functional language core.

I guess what I’m saying is: even if we do manage to bolt a quote operator onto this language, whose fundamental syntax emerged from constraints & needs that are two decades past, would it be better and more efficient to invest that energy into a new, smaller, data-oriented language? One that would be backwards compatible (“compile down” to Python), but which forms a coherent, solid abstraction plateau that is a more livable space for both “data science novices” and maintainers of the language.

2 Likes

Also it’s Friday night and I’ve been drinking a little bit, so I may not be in my most cheerful, optimistic place.

But I look around and all I see are bandaids upon bandaids, and while I can absolutely appreciate all of the benefits of a Python quote operator, at this point I can’t help but wonder if it’s still just a bandaid over a much deeper chasm between [modern] people who want to solve [modern] data-intensive modeling problems and [modern] computers.

1 Like