A crash course in Python “comprehensions” and “generators”

121 points by rbanffy 3 years ago

pedrovhb 3 years ago

The author conflates generators[0] with generator expressions[1]. He's correct in that generator expressions lazily evaluate and lists don't, but actual generators are a much more powerful construct - they're functions which use the `yield` keyword. They're most frequently used as iterators, but they can also communicate with the outside state via receiving values (`x = yield y`), being explicitly closed with `.close()`, or having an exception raised at the yield point with `.throw()`.
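To make those three entry points concrete, here is a minimal sketch. `running_total` is a made-up generator for illustration, not something from the article:

```python
def running_total():
    """Hypothetical generator: yields the running sum of values sent in."""
    total = 0
    try:
        while True:
            try:
                value = yield total      # receives whatever .send() passes in
            except ValueError:
                total = 0                # .throw(ValueError) resets the sum
                continue
            total += value
    finally:
        print("generator closed")        # runs when .close() is called


gen = running_total()
first = next(gen)            # prime the generator: run to the first yield
a = gen.send(5)              # resumes at the yield with value = 5
b = gen.send(10)             # resumes again with value = 10
c = gen.throw(ValueError)    # exception is raised at the yield and handled inside
gen.close()                  # raises GeneratorExit at the yield; finally block runs
```

Note that the generator has to be advanced to its first `yield` (here with `next`) before a non-None value can be sent in.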

They're so powerful in fact, that asyncio is implemented under the hood with a lot of generator cleverness. You rarely see it, but the `__await__` dunder method which powers `await` is just a synchronous function which returns a generator.
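A small sketch of that idea, assuming a toy awaitable called `Ready` (a hypothetical name, not an asyncio class). It is driven once by hand with `.send()` and once by a real event loop:

```python
import asyncio


class Ready:
    """Hypothetical awaitable that suspends once, then returns a value."""

    def __init__(self, value):
        self.value = value

    def __await__(self):
        # An ordinary synchronous method; the `yield` makes it a generator.
        yield              # suspend once, handing control to whoever drives us
        return self.value  # becomes the result of `await Ready(...)`


async def main():
    return await Ready(42)


# Drive the coroutine by hand, roughly the way an event loop would.
coro = main()
suspended = coro.send(None)        # runs up to the bare yield
try:
    coro.send(None)                # resume; the coroutine finishes
except StopIteration as stop:
    manual_result = stop.value     # the return value rides on StopIteration

# The same awaitable under a real event loop.
loop_result = asyncio.run(main())
```

This only illustrates the protocol; real asyncio awaitables yield futures rather than a bare `yield`.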

Generator expressions are useful when there are memory constraints that preclude using a list comprehension, but list comprehensions are actually faster because of CPython's internal optimizations.
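The memory side of that trade-off is easy to check. A rough sketch follows; exact sizes vary by Python version, and `sys.getsizeof` counts only the container, not the items in it. The speed side can be measured with `timeit` on your own workload.

```python
import sys

n = 100_000
squares_list = [i * i for i in range(n)]   # all n results exist in memory at once
squares_gen = (i * i for i in range(n))    # nothing has been computed yet

list_size = sys.getsizeof(squares_list)    # grows with n
gen_size = sys.getsizeof(squares_gen)      # small, independent of n

total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)          # consumes the generator one item at a time
```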

[0] https://peps.python.org/pep-0255/ [1] https://peps.python.org/pep-0289/

henrydark 3 years ago
Generators and generator expressions are useful every time your api involves an unbounded sequence.
I once wrote a library of signal processing components based on the premise that the raw input signal is such a sequence, and its concrete type was Iterable[np.ndarray] (iterable of chunks of audio signal). Since the signal is unbounded - we just continue sampling from, say, a microphone forever - the types can't involve finite sequences like lists. A general iterable works here, and then components that transform the sequence are themselves implemented as generators, or if they are very simple, as generator expressions.
Consider the following function for amplifying a signal:
```
from typing import Iterable
import numpy as np
def amplify(signal: Iterable[np.ndarray], a: float):
    return (a * chunk for chunk in signal)
```
You can flip the whole infinite streams interface and talk about "pushing" chunks around, and it's an esthetic choice.
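A sketch of how such components compose, using plain lists of floats in place of np.ndarray so it runs without NumPy. `microphone` and `clip` are hypothetical stand-ins, not part of the library described above:

```python
from itertools import count, islice
from typing import Iterable, Iterator, List


def microphone(chunk_size: int = 4) -> Iterator[List[float]]:
    """Hypothetical unbounded source: yields chunks of samples forever."""
    for start in count(0, chunk_size):
        yield [float(start + i) for i in range(chunk_size)]


def amplify(signal: Iterable[List[float]], a: float) -> Iterator[List[float]]:
    # Simple stage: a generator expression is enough.
    return ([a * x for x in chunk] for chunk in signal)


def clip(signal: Iterable[List[float]], limit: float) -> Iterator[List[float]]:
    """A stage written as a generator function."""
    for chunk in signal:
        yield [max(-limit, min(limit, x)) for x in chunk]


# Nothing runs until chunks are pulled from the end of the pipeline.
pipeline = clip(amplify(microphone(), 2.0), limit=10.0)
first_two = list(islice(pipeline, 2))
```

Because every stage is lazy, the unbounded source is only sampled as far as the consumer asks for.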
enragedcacti 3 years ago

> but they can also communicate with the outside state via receiving values (`x = yield y`)
I'm dumbfounded that I have written python for as long as I have without ever encountering this, TIL!
https://peps.python.org/pep-0342/
- photochemsyn 3 years ago
  
  Just found an interesting example explaining how that works, I wasn't aware of it either (this site also has nice 'yield from' explainer):
  https://lerner.co.il/2020/05/08/making-sense-of-generators-c...
  > "The above code looks a bit weird, in that “yield’ is on the right side of an assignment statement. This means that “yield” must be providing a value to the generator. Where is it getting that value from?
  Answer: From the “send” method, which can be invoked in place of the “next” function. The “send” method works just like “next”, except that you can pass any Python data structure you want into the generator. And whatever you send to the generator is then assigned to “x”."
  
  memco 3 years ago
  
  Knew about it, but didn't quite understand how to use send until I read through the link you posted. Thanks for that!
  Regarding that article specifically I think it could've done better service to 'yield from' had it mentioned recursive generators. One example would be flattening nested lists: have a flatten function that checks if its argument is iterable: if not just yield the thing else yield from flatten for every item in the iterable.
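A minimal sketch of that recursive flatten. Treating strings as atoms is my own assumption, added because a one-character string is itself iterable and would recurse forever:

```python
from collections.abc import Iterable


def flatten(item):
    """Yield the leaves of an arbitrarily nested iterable, depth first."""
    # Assumption: str/bytes count as leaves, not as containers.
    if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
        for sub in item:
            yield from flatten(sub)   # delegate to the recursive generator
    else:
        yield item


flat = list(flatten([1, [2, [3, 4]], "five", (6,)]))
```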
- tomn 3 years ago
  
  Back in the days before async/await, twisted (an asynchronous I/O framework) used these to emulate async/await -- you could yield a deferred value (like a promise/future), and it would pass it back when the value was available, saving some effort making your state explicit and writing many callbacks:
  https://twisted.org/documents/21.2.0/api/twisted.internet.de...
- knlb2022 3 years ago
  
  I used this in AoC for the first time a couple of years ago, as a consequence of diving deeply into asyncio. https://explog.in/static/aoc2019/AoC23.html
  
  memco 3 years ago
  
  I'm glad to see someone with the same idea was able to take it to completion. Life got in the way for me and I never got to implement that part. If I ever get back around to it I will consult your solution if I get stuck.
nijave 3 years ago

Besides memory, they're also great for an expensive operation. You can do pagination with network calls, disk I/O, etc
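Something like this sketch, where `fetch_page` is a hypothetical stand-in for the expensive call (it just slices a local list and counts how often it is invoked):

```python
from itertools import islice


def fetch_page(page, page_size=3):
    """Hypothetical stand-in for an expensive network or disk call."""
    data = list(range(10))
    fetch_page.calls += 1
    start = page * page_size
    return data[start:start + page_size]


fetch_page.calls = 0


def paginate(fetch):
    """Yield items one by one, fetching the next page only when needed."""
    page = 0
    while True:
        items = fetch(page)
        if not items:          # an empty page means we ran out of data
            return
        yield from items
        page += 1


# Only as many pages are fetched as the consumer actually needs.
first_four = list(islice(paginate(fetch_page), 4))
calls_made = fetch_page.calls
```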
charlieflowers 3 years ago

To be fair, I don’t see a conflation here. He chose not to elaborate on generator functions, but that appears to be due to a defensible conscious choice about article scope.
Imo it’s a well written, technically accurate article.

Waterluvian 3 years ago

I have a rule for comprehensions that I try not to break: they should only have two phrases in them. Whether that’s two “for x in y” or one “for” and one “if”. Anything more becomes confusing and illegible.

Complicated comprehensions are not clever or impressive, they’re annoying.

code_biologist 3 years ago

You can abuse it, but the lineage of comprehensions (Haskell's list monad, SQL) make it clear that longer examples are certainly useful.
Haskell list comprehensions limited to two clauses, SQL queries limited to a single FROM and a WHERE, or a FROM and a JOIN sound pretty limited. I use Python comprehensions to do those same kinds of data querying without leaving the Python context, so seems weird to limit myself for the same reason.
Using imperative constructs certainly isn't wrong, but they tend to crawl off the right hand side of the page.
- maxbond 3 years ago
  
  Yeah, I personally find a triply-nested comprehension to be more readable than a triply-nested loop. Once you get comfortable with the syntax, it's nice to be able to fit it compactly on your screen, so that your eyes can roll all over it without having to scroll. I agree with the general principle that terseness can be a detriment to readability, but sometimes it's nice for everything to be in one place.
  I suppose this is only true in combination with verbose function names that capture the logic of the operation you're performing, with the comprehension expressing how this logic is composed. In my mind, comprehensions are sugar over map(), filter(), etc., and before I was using comprehensions I had a mess of nested calls to these functions with lots of lambdas. Comprehensions are a big step up from that as far as readability goes.
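For comparison, the same small operation written both ways (the `words` list is made-up sample data):

```python
words = ["alpha", "", "beta", "gamma", ""]

# Nested calls with lambdas: read inside-out, filter first, then map.
via_map_filter = list(map(lambda w: w.upper(), filter(lambda w: len(w) > 0, words)))

# The comprehension states the transform, the source, and the condition in order.
via_comprehension = [w.upper() for w in words if len(w) > 0]
```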
- im3w1l 3 years ago
  
  Tbh I often find I would like to use variables in sql. Something like
  americans := SELECT username FROM users WHERE country = "us";
  SELECT url, visits FROM posts, americans ON posts.author = americans.username ORDER BY visits DESC LIMIT 10;
  
  hackandthink 3 years ago
  
  Common Table Expressions are often good enough:
  With americans as (select ...) select from ..., americans ...
  https://www.draxlr.com/blogs/common-table-expressions-and-it...
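A runnable sketch of the `americans` query as a CTE, using Python's built-in sqlite3 module so it needs no server. The table layout and sample rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (username TEXT, country TEXT);
    CREATE TABLE posts (url TEXT, author TEXT, visits INTEGER);
    INSERT INTO users VALUES ('ann', 'us'), ('bob', 'de'), ('cat', 'us');
    INSERT INTO posts VALUES
        ('/a', 'ann', 30), ('/b', 'bob', 99), ('/c', 'cat', 50), ('/d', 'ann', 10);
""")

# The WITH clause names the subquery so the main query can refer to it.
rows = conn.execute("""
    WITH americans AS (
        SELECT username FROM users WHERE country = 'us'
    )
    SELECT url, visits
    FROM posts
    JOIN americans ON posts.author = americans.username
    ORDER BY visits DESC
    LIMIT 10
""").fetchall()
conn.close()
```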
  
  code_biologist 3 years ago
  
  The lack of variables is a major pain point for many devs when learning SQL. CTEs address one aspect. It's maybe not 1:1 or ideal, but lateral joins are a great way to get in-query variables and are extremely powerful for controlling cardinality!
  https://sqlfordevs.com/for-each-loop-lateral-join
  
  im3w1l 3 years ago
  
  Oh huh, I didn't know that was a thing. Exactly what I wanted.
  
  bit_for_a_byte 3 years ago
  
  It wouldn’t work as a variable, you’d get race conditions unless you explicitly wrapped both query statements in a transaction.
  It could work as an expression which gets injected into the 2nd query. So the second query would effectively have a nested SELECT statement as defined by the americans expression.
  But this has the downside that if you reuse americans you have to know that it will be recomputed at each query. So unless you know the semantics of these variable-expression things over time you get race conditions. The solution again would be to wrap everything in a transaction, but at that point you just have to constantly think about transactions, which you don’t if your query is a one-liner.
  Also good luck writing a query planner that has to efficiently take into account variables.
  
  hyencomper 3 years ago
  
  For cases like these, I use sql views - CREATE VIEW view as SELECT * .. Then DROP VIEW view.
ddejohn 3 years ago

My rule has always been that if it doesn't fit on one line it gets re-written, or if practical, I will factor out what I can into helpers. As soon as a comprehension spills over into a new line, the readability tanks for me.
- 83457 3 years ago
  
  With proper indentation, a comp with long variable names or a 2-part if is very readable for me when the if is on its own line. Put the from on its own line and it is like a SQL statement.
TheAdamist 3 years ago

Yeah, I did a multi-level nested one once, including else clauses, and it didn't actually assign to anything, only side effects. Felt clever at the time.
But it was completely incomprehensible and unmaintainable. So got replaced by me later.
- Waterluvian 3 years ago
  
  omg. A comprehension for side effects only. That’s so horrible I wish I could see it. :)
  
  leetrout 3 years ago
  
  I worked with a guy that did this.
  Instead of writing for loops he would write comprehensions everywhere because "list comprehensions are pythonic".
  Think like:
  [send_welcome_email(u) for u in users]
  Just hanging out in the middle of the module or function.
  
  83457 3 years ago
  
  I prefer
  sent = [(u, send_welcome_email(u)) for u in users]
  
  Ultimatt 3 years ago
  
  I prefer
  not_sent = users - {u for u in users if send_welcome_email(u)}
  
  83457 3 years ago
  
  woah
  
  Waterluvian 3 years ago
  
  Just make the function return the argument passed into it. ;)
  
  BeetleB 3 years ago
  
  I do it all the time :-)
  Just make sure it isn't consuming a lot of RAM and it's totally OK.
cuteboy19 3 years ago

Yes, the equivalent itertools expression is much more easily readable
- code_biologist 3 years ago
  
  What do you mean by the equivalent itertools expression? It's not clear to me how an expression like this is made more readable by itertools. I'm not sure how itertools is even relevant.
  sum( a + b for a in range(10) if a % 2 == 0 for b in range(10) )
  
  cuteboy19 3 years ago
  
  I don't even understand what this code does but I was referring to itertools.chain for the nested comprehension in the post
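For anyone else puzzling over that expression: the clauses read left to right as nested statements, outermost first. A sketch of the equivalent loop:

```python
compact = sum(a + b for a in range(10) if a % 2 == 0 for b in range(10))

# Same thing unrolled: each clause becomes one level of nesting.
total = 0
for a in range(10):
    if a % 2 == 0:             # the `if` filters a before the inner loop runs
        for b in range(10):
            total += a + b
```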
- Waterluvian 3 years ago
  
  Yes!! itertools. Always be aware of itertools.

RobertoG 3 years ago

Well, I learned a few details from the article, so, thanks.

On the other hand, and maybe it's me being too literal, I think it gives the wrong impression about what a generator is. It has nothing to do with one liners using brackets. My generators use 'yield' and look like a 'for' and they are still generators.

gompertz 3 years ago

For those interested... Generators first appeared in Icon programming language in 1977. https://en.m.wikipedia.org/wiki/Icon_%28programming_language... (ctrl+f Generators and Goal Directed evaluation)

Incredible it has taken 40+ years for some languages to just catch up to the concept!

antod 3 years ago

40yrs? Python has had generators for approx 20yrs already. Unless you're talking about a different language.

antirez 3 years ago

I always thought list comprehension shows the problem with Python: too many ad hoc ideas. It's a language that looks a lot more like a sum of many tools than a few well chosen orthogonal ideas. Ruby is a much better language with a much worse implementation, community, set of libraries.

js2 3 years ago

I find chain much easier to read than nested comprehensions:

    from itertools import chain
    list(chain(*trees))
    list(chain.from_iterable(trees))
    list(chain(*dog_breeds.values()))
    list(chain.from_iterable(dog_breeds.values()))

Jensson 3 years ago

Nested comprehensions are a more powerful construct, you can do more with it than chain or the itertools module. Chain is cleaner for this case since it is weaker, so it is clearer what it does, but since list comprehension is more powerful you ought to learn it first and chain is just a nicety.
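A sketch of the difference, with a made-up `trees` value standing in for the article's data. For plain flattening the two agree; once a filter or transform is involved, the comprehension handles it in the same expression:

```python
from itertools import chain

trees = [["oak", "elm"], ["pine"], []]   # hypothetical sample data

# Pure flattening: chain and the nested comprehension give the same result.
chained = list(chain.from_iterable(trees))
nested = [leaf for branch in trees for leaf in branch]

# Flatten, filter, and transform in one pass, which chain alone does not do.
short_upper = [leaf.upper() for branch in trees for leaf in branch if len(leaf) == 3]
```

`chain.from_iterable` also consumes the outer iterable lazily, whereas `chain(*trees)` has to unpack it up front.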
- js2 3 years ago
  
  I’ve been using Python as my primary language since before it had list comprehensions and I still find nested comprehensions to be hard to comprehend, hard to compose, and hard to refactor.
  I usually resort to building up generator expressions, using itertools, or writing traditional loops.
  $0.02.

aunty_helen 3 years ago

Nice article. I learnt some stuff.

WRT length of a statement, look at how many operations are being done. If it’s doing something like filtering that should be 2-3 ops on the same line. Perfect application of a comprehension with an if statement.

If it’s 5+ different things jammed into a comprehension, you’re failing PR. Don’t think it needs to hit 2 LOC before refactoring.

Build a list, take a property, call a function with that value, iterate, if something, else something, all on one line?? Aren’t you clever. Now, let’s write professional code.

xwowsersx 3 years ago

   In a REPL session, "_" means "the previous output"

Wow TIL. After all these years. I would always just arrow up and go to the beginning of the line to bind to a variable if I missed it the first time. Nice to learn something new, however small!

JonathanMerklin 3 years ago

I just checked and this is true for the Node.js REPL (Tried in v12, v14, v16, and v18) as well, which I'm also just learning for the first time (and I had the same workflow as you before this). Neat!
- xwowsersx 3 years ago
  
  Ha! Funny that you'd even think to check. In Scala and Python (I'm guessing other langs as well), the underscore can be used for ignored variables, among other things. Never thought it had another usage in the REPL. Actually really useful!
  
  JonathanMerklin 3 years ago
  
  Yeah, I've used it in Haskell for ignored variables, holes, type wildcards, etc.
  Another interesting little rabbit hole that this little conversation led me to: In the Chrome (and Firefox) developer tools, the variable for the previous output is "$_", which I imagine is the case because of how common it was to assign the main export of the Underscore.js library to "_" (and in the days before a lot of websites would mostly e.g. have their site's code in a webpack-induced closure, they would e.g. grab underscore (or lodash, in those times?) from a CDN and pollute the global scope).
  Since _ is a valid identifier, it also turns out that in Node.js's REPL, it warns you when you clobber _, but (weirdly) not if the clobbering is with a block-scoped declaration.
      $ node
      Welcome to Node.js v18.12.1.
      Type ".help" for more information.
      > "asdf"
      'asdf'
      > _
      'asdf'
      > var _ = 1
      Expression assignment to _ now disabled.
      undefined
      > "asdf"
      'asdf'
      > _
      1

      $ node
      Welcome to Node.js v18.12.1.
      Type ".help" for more information.
      > const _ = 1
      undefined
      > _
      1
      > "asdf"
      'asdf'
      > _
      1
  And obviously, there's no warning for assigning to it outside of the REPL (`node -e "var _ = 2;"`), which makes obvious sense to me.

buttocks 3 years ago

I love explainers like these because my Python code looks like C without the curly braces and semicolons.

pizza 3 years ago

Neat trick with generators: refactoring out related looping structures across multiple different functions, for greater reusability
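A small sketch of that refactoring, with invented names: the "walk the in-bounds neighbors of a grid cell" loop lives in one generator, and each consumer only states what it does with the cells.

```python
def neighbors(grid, row, col):
    """Hypothetical helper: yield the in-bounds 4-neighbors of a cell."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
            yield r, c


def sum_neighbors(grid, row, col):
    return sum(grid[r][c] for r, c in neighbors(grid, row, col))


def max_neighbor(grid, row, col):
    return max(grid[r][c] for r, c in neighbors(grid, row, col))


grid = [[1, 0, 1],
        [0, 5, 0],
        [1, 1, 0]]

center_sum = sum_neighbors(grid, 1, 1)   # neighbors of the middle cell
edge_max = max_neighbor(grid, 0, 1)      # top edge: only three neighbors exist
```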

henron 3 years ago

I really prefer working with functional transformations over containers (including iterators).

saeranv 3 years ago

What's a functional transformation?
- ghostwriter 3 years ago
  
  fmap, fold, traverse, join
  
  saeranv 3 years ago
  
  I guess I'm lacking some background to understand this.
  So, I recognize fold, from the above operations ('reduce' in python), what makes it a "functional transformation"?
  I'm googling function/functional transformations, and the results refer to translating, flipping or scaling a function, like transforming f(x) to f(x + 2) or f(x) + 2. It doesn't seem like it has anything to do with what you guys are referring to, but correct me if I'm wrong.
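Rough Python counterparts of three of those names, as a sketch (the mapping is loose, and `traverse` has no direct standard-library equivalent that I know of):

```python
from functools import reduce
from itertools import chain

numbers = [1, 2, 3, 4]

# fmap: apply a function to every element, keeping the container's shape.
doubled = list(map(lambda x: x * 2, numbers))

# fold: collapse the container into one value with a combining function.
total = reduce(lambda acc, x: acc + x, numbers, 0)

# join: flatten one level of nesting.
flat = list(chain.from_iterable([[1, 2], [3], []]))
```

"Functional" here refers to functional programming: each step takes a container and a function and returns a new value instead of mutating anything in a loop.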
  
  yababa_y 3 years ago
  
  function as in functional programming. https://youtu.be/2MXyNS33t2k is an in-depth video on how c++ and haskell sequence transformations work and contrast.

ggm 3 years ago

A parenthetical comment: I've always liked the imagery a "crash" course gives me:

Either it's the course of treatment you get after a crash

Or it's the course of actions which is going to lead to a crash.

eyelidlessness 3 years ago

I’ve always taken it to mean it’s fast and unthorough, which I suppose aligns with your latter case but I think it’s a specialized usage still.
- ggm 3 years ago
  
  It's usually the least worst choice for somebody thrown into unexpected situations. Deliberately walking to a crash course on anything to avoid the long learning path is like autodidactism: it works, kinda, and kinda not.
  I should add I'm a serial offender.