You can also explicitly specify which arguments need to be keyword only using the KW_ONLY sentinel type annotation:
This is a great feature for those who want a mix of positional and keyword-only arguments.
I should have mentioned originally (and I've since updated my post) that this and the kw_only= flag both require Python 3.10 and higher, so code that works with older versions can't opt into it yet.
I've never seen this before. But yes, this does work. And that's very nice.
Thank you for the tip.
I like keyword-only arguments, but they become tedious too quickly - especially when the variable names already match (fn(x=x, y=y, z=z)). I wish Python had JavaScript’s shorthand property syntax. (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...).
JS's shorthand property syntax is lovely and elegant. Python can't really adopt it though as it clashes with the set syntax (which is such a niche use case it really shouldn't have a special syntax).
You could do something like f(**{x, y, z}) with just that. Not the prettiest, but at least it would be DRY.
But in general Python devs seem to prefer "explicit" ad-hoc syntax for each use case instead of composable "primitives". Which is approaching a kind of C++ situation where relatively few users know the syntax comprehensively.
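To make the clash concrete: in today's Python, `{x, y, z}` already builds a set of the values, so the names are gone by the time `**` sees it. A small sketch, where `f` and the `pick` helper are hypothetical names used only for illustration (the helper is one possible workaround, not something proposed in the thread):

```python
def f(*, x, y, z):
    return x + y + z


def pick(namespace, *names):
    """Hypothetical helper: build a kwargs dict from variable names."""
    return {name: namespace[name] for name in names}


x, y, z = 1, 2, 3

# Braces around bare names build a set of the values; the names are lost.
values = {x, y, z}

# `**` needs a mapping, so unpacking a set fails at call time.
try:
    f(**{x, y, z})
except TypeError as exc:
    error = str(exc)


def caller():
    x, y, z = 1, 2, 3
    # Each name is written once instead of as x=x, y=y, z=z.
    return f(**pick(locals(), "x", "y", "z"))


result = caller()
```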
Julia uses a semicolon prefix in a tuple to denote this restructuring. I think that could fit into Python without breaking the whole ecosystem.
Python would probably just use the asterisk to mimic the syntax used to declare kw-only kwargs.
def f(x, *, y): ...
f(foo, *, y)
> But in general Python devs seem to prefer "explicit" ad-hoc syntax for each use case instead of composable "primitives".
I hope you don't consider JS as having "composable primitives". To back up my point: there is nothing similar to underscore-like libraries for Python. Everything is covered by stdlib and syntax.
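For a sense of what "covered by stdlib and syntax" can look like, here is a rough sketch of a few underscore-style operations (pluck, filter, groupBy, reduce) written with comprehensions and standard-library modules only. The sample data is made up for illustration:

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter

users = [
    {"name": "ada", "team": "core", "age": 36},
    {"name": "bob", "team": "docs", "age": 25},
    {"name": "cy", "team": "core", "age": 41},
]

# Roughly _.pluck(users, 'name')
names = [u["name"] for u in users]

# Roughly _.filter(users, u => u.age > 30)
over_30 = [u["name"] for u in users if u["age"] > 30]

# Roughly _.groupBy(users, 'team'); groupby needs input sorted by the same key.
team_of = itemgetter("team")
by_team = {
    team: [u["name"] for u in group]
    for team, group in groupby(sorted(users, key=team_of), key=team_of)
}

# Roughly _.reduce(users, (acc, u) => acc + u.age, 0)
total_age = reduce(lambda acc, u: acc + u["age"], users, 0)
```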
> for(in) / for(of)
Cough, cough.
Libraries like underscore are possible because of JS having composable primitives.
JS has many, many warts, and the for-loop is a good example of this. And _.each is a good example of how the core of JS allows for working around it.
But underscore is literally a bunch of functions taking lambdas. Python also has functions and lambdas. [1] shows it could provide the same interface. I don't get what composable primitives JS has.
[1]: https://pydash.readthedocs.io/en/latest/
Python 3.14 adds that shorthand, but with a slightly different (and IMO uglier) syntax:
https://peps.python.org/pep-0736/
edit: never mind, that PEP was rejected :/
That PEP says “rejected”
Thankfully so. As much as I want a shorthand, this syntax wasn't it.
By the look of it, the feeling I get is that a decent and convenient syntax proposal has been bikeshedded into rejection. Some particular form of keyword arguments at invocation would definitely make quite a few of my scripts more readable.
there should definitely be an OCaml-like equivalent of "~argument"
it's also a gentle force towards consistent naming, i.e. having the same name at the caller and callee sites.
Oh man, the AWS Boto3 library does this for a huge number of calls, and it’s awful.
“What parameters does this take?” you ask, “why, it takes ‘kwargs’” responds the docs and your IDE.
How incredibly helpful!
That's annoying for sure. Though a different problem.
All the kw_only=True argument for dataclasses does is require that you pass any fields you want to provide as keyword arguments instead of positional arguments when instantiating a dataclass. So:
    obj = MyDataclass(a=1, b=2, c=3)
Instead of:
    obj = MyDataclass(1, 2, 3)  # This would be an error with kw_only=True
The problem you're describing in boto3 (and a lot of other API bindings, and a lot of more layered Python code) is that methods often take in **kwargs and pass them down to a common function that's handling them. From the caller's perspective, **kwargs is a black box with no details on what's in there. Without a docstring or an understanding of the call chain, it's not helpful.
Python sort of has a fix for this now, which is to use a TypedDict to define all the possible values in the **kwargs, like so:
    from typing import TypedDict, Unpack
    class MyFuncKwargs(TypedDict):
        arg1: str
        arg2: str
        arg3: int | None
    def my_outer_func(
        **kwargs: Unpack[MyFuncKwargs],
    ) -> None:
        _my_inner_func(**kwargs)
    def _my_inner_func(
        *,
        arg1: str,
        arg2: str,
        arg3: int | None,
    ) -> None:
        ...
By defining a TypedDict and typing **kwargs, the IDE and docs can do a better job of showing what arguments the function really takes, and validating them.
Also useful when the function is just a wrapper around serializing **kwargs to JSON for an API, or something.
But this feature is far from free to use. The more functions you have, the more of these you need to create and maintain.
Ideally, a function could type **kwargs as something like:
And then the IDEs and other tooling can just reference that function. This would help make the problem go away for many of the cases where **kwargs is used and passed around.
TypedDicts are so underutilized in general. I'm using them a lot even for simpler scripts.
I don't see a point in using them in new code when I could just use a dataclass (or Pydantic in certain contexts). I've only found them useful when interfacing with older code that uses dicts for structured data.
Boto3 in general is just an utter pain in the ass to work with. It's like they did everything they could to go out of their way to make sure your IDE couldn't figure out how to do anything.
Like, why the hell did they use strings to decide what kind of client you want? Why make us do `s3 = boto3.client('s3')` instead of `s3 = boto3.client.s3()` or something similar? It means my IDE can't figure out what the type of `s3` is.
Everything about it is so unpythonic. Functions and classes in it might use the Python snake_case, but keyword arguments use PascalCase.
I've often considered writing a wrapper around it to provide a sane interface to the AWS API, but the amount of functionality is so vast that it would be a massive undertaking.
When I was plagued by having to use elasticsearch, the python client we used had the same issue. Every function took kwargs.
In boto3 it helps to add a stubs package for development (and type checking).
Next time I am subjected to Python, I’ll use this for sure, thank you.
Is there a reason to use data classes over pedantic base models anymore?
I guess some prefer to stick with the stdlib instead of third party libs.
Also, dataclasses feels more straightforward and less "magic" to me (in the sense that it is more or less "just" a way to avoid boilerplate for class definition, while pydantic does way more "magic" stuff like de-/serialization and validation, and adding numerous methods and attributes to the classes).
Speed and size, mainly. If you don't need the data validation there's no reason to use pydantic, it's a huge dependency
I’ve never really gotten along with Pydantic. Something about it just doesn’t feel ergonomic.
If I need something more than dataclasses, I’ll normally go for attrs/cattrs. Dataclasses were originally based on attrs, so it’s not much of a leap.
speed? Not pulling in a huge dependency?
did you mean: "pydantic base models" ?
Yeah haha I got autocorrected
Are there other ways to get this argument-enforcing behaviour in functions, not just data classes?
Keyword-only arguments? Yes:
https://peps.python.org/pep-3102/
More recently, Python also added support for positional-only parameters:
https://peps.python.org/pep-0570/
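A short sketch of both PEPs in one signature, using a hypothetical `resize` function (the name and parameters are made up for illustration). Parameters before `/` are positional-only and parameters after `*` are keyword-only:

```python
def resize(image, /, *, width, height, keep_aspect=True):
    """`image` is positional-only (PEP 570); the rest are keyword-only (PEP 3102)."""
    return (image, width, height, keep_aspect)


ok = resize("cat.png", width=640, height=480)

errors = []
bad_calls = (
    lambda: resize("cat.png", 640, 480),  # keyword-only passed positionally
    lambda: resize(image="cat.png", width=640, height=480),  # positional-only passed by keyword
)
for bad_call in bad_calls:
    try:
        bad_call()
    except TypeError as exc:
        errors.append(str(exc))
```

Both bad calls fail with a `TypeError` at call time, so the enforcement does not depend on a type checker.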
You could argue the same thing (forcing kwargs) for all Python functions/methods, although, that would make using your APIs very annoying. The `__init__` method for dataclasses is just another method like any other.
As a general rule of thumb, I only start forcing kwargs once I'm looking at above 4-5 arguments, or if the arguments are similar enough that forcing kwargs makes the calling code more readable. For a small number of distinct arguments, forcing kwargs as a blanket rule makes the code verbose for little gain IMO.
> that would make using your APIs very annoying
For Objective-C, using named parameters is the only way to call methods. I don't think I've read much criticism of this particular aspect. IMO it's actually a good thing and increases readability quite a bit.
In JavaScript/TypeScript React codebases, using objects as a poor man's named parameters is also a very popular approach.
I'd also like to add that there seems to be a recent trend of IDEs adding a hint for every parameter, somewhat simulating named parameters. So when you write `mymethod(value)`, it'll display it as `mymethod(param:value)`.
So maybe it's not that annoying.
The only little thing I'd like to borrow from JavaScript is the "shortcut", so you could replace `x=x` with `x` if your local variable happens to have the same name as the parameter (which happens often enough).
To be pedantic, Objective-C doesn't have named parameters. Method names are composed of multiple parts, each corresponding to a parameter. Such design contributes to the method's readability and memorability. In contrast, Python methods have their own names, and parameter names are chosen as an afterthought. While there's no reason why Python methods couldn't be named in accordance with parameter names, unfortunately it hasn't been a part of Python's culture.
I find that anything above 2 arguments benefits from explicit keyword notation. With 4-5 arguments, especially when most of them are of the same type, it can be difficult to tell which is which.
> You could argue the same thing (forcing kwargs) for all Python functions/methods, although, that would make using your APIs very annoying. The `__init__` method for dataclasses is just another method like any other.
While that is self-evident at a technical level, it is not quite so from a clarity / documentary perspective: “normal” functions and methods can often hint at their parameters through their naming, but it is uncommon for types, for which the composite tends to be much more of an implementation detail.
Of course neither rule is universal e.g. the composite is of prime importance for newtypes, and indeed they often use tuple-style types or have special support with no member names.
> Positional arguments means a caller can use MyDataClass(1, 'foo', False), and if you remove/reorder any of these arguments, you’ll break those callers unexpectedly. By forcing callers to use MyDataClass(x=1, y='foo', z=False), you remove this risk.
This is an awesome way to prevent future breaking changes!
...but unfortunately, adding this to an existing project would also likely result in breaking changes haha
That's always the challenge when iterating on interfaces that other people depend on.
What we do is go through a deprecation phase. Our process is:
* We provide compatibility with the old signature for 2 major releases.
* We document the change and the timeline clearly in the docstring.
* The function gets decorated with a helper that checks the call, and if any keyword-only arguments are provided as positional, it warns and converts them to keyword-only.
* After 2 major releases, we move fully to the new signature.
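The steps above can be sketched with the standard library alone. This is a rough illustration under stated assumptions, not how any particular library implements it: `warn_positional_kwonly` is a hypothetical name, and the sketch assumes the decorated function has no `*args` parameter.

```python
import functools
import inspect
import warnings


def warn_positional_kwonly(warning_cls=DeprecationWarning):
    """Hypothetical helper: accept keyword-only args positionally, but warn."""

    def decorator(func):
        params = list(inspect.signature(func).parameters.values())
        positional_kinds = (
            inspect.Parameter.POSITIONAL_ONLY,
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
        )
        n_positional = sum(p.kind in positional_kinds for p in params)
        kwonly_names = [
            p.name for p in params if p.kind == inspect.Parameter.KEYWORD_ONLY
        ]

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            extra = args[n_positional:]
            if extra:
                if len(extra) > len(kwonly_names):
                    raise TypeError(
                        f"{func.__name__}() got too many positional arguments"
                    )
                moved = kwonly_names[: len(extra)]
                warnings.warn(
                    f"Pass {', '.join(moved)} to {func.__name__}() "
                    "as keyword arguments.",
                    warning_cls,
                    stacklevel=2,
                )
                # Convert the extra positionals to keywords, in declaration order.
                kwargs.update(zip(moved, extra))
                args = args[:n_positional]
            return func(*args, **kwargs)

        return wrapper

    return decorator


@warn_positional_kwonly()
def my_func(*, a, b, c):
    return (a, b, c)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = my_func(1, 2, c=3)  # still works during the deprecation window
    new_style = my_func(a=1, b=2, c=3)  # no warning
```

Once the deprecation window ends, removing the decorator leaves the plain keyword-only signature in place.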
We built a Python library called housekeeping (https://github.com/beanbaginc/housekeeping) to help with this. One of the things it contains is a decorator called `@deprecate_non_keyword_only_args`, which takes a deprecation warning class and a function using the signature we're moving to. That decorator handles the check logic and generates a suitable, consistent deprecation message.
That normally looks like:
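    @deprecate_non_keyword_only_args(MyDeprecationWarning)
    def my_func(*, a, b, c):
        ...
But this is a bit more tricky with dataclasses, since `__init__()` is generated automatically. Fortunately, it can be patched after the fact. A bit less clean, but doable.
So here's how we'd handle this case with dataclasses:
    from dataclasses import dataclass
    from housekeeping import BaseRemovedInWarning, deprecate_non_keyword_only_args
    class RemovedInMyProject20Warning(BaseRemovedInWarning):
        product = 'MyProject'
        version = '2.0'
    @dataclass(kw_only=True)
    class MyDataclass:
        a: int
        b: int
        c: str
    MyDataclass.__init__ = deprecate_non_keyword_only_args(
        RemovedInMyProject20Warning
    )(MyDataclass.__init__)
Call it with some positional arguments:
    dc = MyDataclass(1, 2, c='hi')
and you'd get:
    testdataclass.py:26: RemovedInMyProject20Warning: Positional arguments `a`, `b` must be passed as keyword arguments when calling `__main__.MyDataclass.__init__()`. Passing as positional arguments will be required in MyProject 2.0.
      dc = MyDataclass(1, 2, c='hi')
We'll probably add explicit dataclass support to this soon, since we're starting to move to kw_only=True for dataclasses.
Shouldn't you also be able to patch MyDataclass in a class decorator (on top of/after @dataclass)?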
Yeah, that's the approach we'll be taking in housekeeping. I didn't want to complicate the example any more than I already did :)
Even better is ... just not breaking the interface, leave the code be.
Add parameters at the end. Or just add a new function. Or even a new module if things get too messy for your liking.
Just leave the old users in peace. If you are worried about breaking stuff, it means people are using the function you wrote and find value in it. Why change it at all? I think it is very rare that the reasons for breaking are so compelling that they are worth the trouble.
But the problem here is that with a dataclass you never explicitly defined any parameters. It's the dataclass decorator that defined them for the constructor based on the order of the fields. So, yes, the solution is to never re-order the fields and only add new ones to the end, but this can be a surprising requirement because normally the order of fields in a class doesn't matter to users of the class.
Nice! Explicit is better.
I need another blog like this but for slots=True
I tried using python dataclasses in some projects, the only complaint is that I can not directly pass any dict to a dataclass but have to lint only supported keys.
I think a better way to init a dataclass would be something like `mydataclass(my_dict, lint=True)` instead of passing values as kwargs.
What do you mean pass a dict? As in kwargs?
Works fine for a dataclass with arg1 and arg2 attributes.
Doesn't work recursively?
At that point, you probably want proper validation with something like Pydantic, where you can use MyType.model_validate(dict).
I mean if `arg2` wasn't defined in MyDataclass, there would be an error. You have to exclude fields first.
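For reference, a minimal sketch of both behaviours: plain `**` unpacking rejects unknown keys, and a small helper can drop them first. `from_dict` is a hypothetical helper written for this example, not part of the `dataclasses` module:

```python
from dataclasses import dataclass, fields


@dataclass
class MyDataclass:
    arg1: int
    arg2: str = "default"


def from_dict(cls, data):
    """Hypothetical helper: build `cls` from `data`, ignoring unknown keys."""
    allowed = {f.name for f in fields(cls) if f.init}
    return cls(**{k: v for k, v in data.items() if k in allowed})


payload = {"arg1": 1, "arg2": "two", "extra": "not a field"}

try:
    MyDataclass(**payload)  # plain unpacking rejects the unknown key
except TypeError as exc:
    error = str(exc)

obj = from_dict(MyDataclass, payload)
```

This only handles one level: nested dicts are passed through as dicts rather than converted to nested dataclasses, which is where a validation library starts to pay off.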