There are some situations with tricky lifetime issues that are almost impossible to write without this pattern. Trying to break code out into functions would force you to name all the types (not even possible for closures) or use generics (which can lead to difficulties specifying all required trait bounds), and `drop()` on its own is of no use since it doesn't affect the lexical lifetimes.
Conversely, I use this "block pattern" a lot, and sometimes it causes lifetime issues:
let foo: &[SomeType] = {
    let mut foo = vec![];
    // ... initialize foo ...
    &foo
};
This doesn't work: the memory is owned by the Vec, whose lifetime is tied to the block, so the slice is invalid outside of that block. To be fair, it's probably best to just make foo a Vec, and turn it into a slice where needed.
Unless I'm misunderstanding, you'd have the same lifetime issue if you tried to move the block into a function, though. I think the parent comment's point is that it causes fewer issues than abstracting to a separate function, not necessarily compared to inlining everything.
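A minimal sketch of the fix suggested above: let the block yield the owned Vec and borrow a slice only where one is needed. The names `sum` and `total` and the pushed values are made up for illustration.

```rust
/// Takes a slice, like any consumer that only needs to read the data.
fn sum(values: &[u32]) -> u32 {
    values.iter().sum()
}

/// Hypothetical caller using the block pattern with an owned result.
fn total() -> u32 {
    let foo: Vec<u32> = {
        let mut foo = vec![];
        for i in 1..=3 {
            foo.push(i * 10);
        }
        foo // move the Vec out of the block instead of yielding `&foo`
    };
    // The Vec is owned by the outer scope, so borrowing it here is fine.
    sum(&foo)
}
```

Here `total()` evaluates to 60; the mutable binding never leaves the block, and the outer `foo` is immutable.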
More significantly the new variables x and y in the block are Drop'd at the end of the block rather than at the end of the function. This can be significant if:
- Drop does something, like close a file or release a lock, or
- x and y don't have Send and/or Sync, and you have an await point in the function or are doing multi-threaded stuff
This is why you should almost always use std::sync::Mutex rather than tokio::sync::Mutex. The guard returned by std's Mutex (std::sync::MutexGuard) isn't Send, so the compiler will complain if you hold it across an await in a future that has to be Send. Usually you don't want mutexes held across an await.
I have been using this in a web application that acquires a lock, retrieves and returns a few variables to the outer scope, and then immediately unlocks the mutex again.
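A rough sketch of that lock-in-a-block shape, using only std. The `Session` struct and `read_session` function are hypothetical stand-ins for whatever the application actually guards.

```rust
use std::sync::Mutex;

/// Hypothetical shared state.
struct Session {
    user: String,
    visits: u32,
}

fn read_session(state: &Mutex<Session>) -> (String, u32) {
    let (user, visits) = {
        let mut guard = state.lock().unwrap();
        guard.visits += 1;
        // Copy out what the outer scope needs while the lock is held.
        (guard.user.clone(), guard.visits)
    }; // `guard` is dropped here, which unlocks the mutex

    // The mutex is free again, so a non-blocking lock attempt succeeds.
    debug_assert!(state.try_lock().is_ok());
    (user, visits)
}
```

Everything after the block works with plain owned values, so nothing in the rest of the function can accidentally extend the critical section.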
Can this also affect stack usage? Like if `x` gets dropped before `y` is introduced, can `y` reuse `x`'s stack space (let's assume they are same size/alignment). Or does the compiler already do that if it can see that one is not used after the other is introduced?
Our codebase is full of this pattern and I love it. Every time I get to clean up temporaries and expose an immutable variable outside of the setup, it makes me way too happy.
A lot of the time it looks like this:
let config = {
    let config = get_config_bytes();
    let mut config = Config::from(config);
    config.do_something_mut();
    config.do_another_mut();
    config
};
It helps that the specific pattern of redeclaring a variable just to change its mutability for the remainder of its scope is about the least objectionable use of shadowing possible.
All of these are however poor solutions to the problem, because they're not true nested functions — they can access arbitrary variables defined outside their scope. Python at least restricts their modification, but Go doesn't. I'm guessing in Rust it's at least explicit in some way?
In any case, the real solution here is to simply allow proper nested functions that behave exactly like freestanding functions in that they can only access what's passed to them:
This way you can actually reason about that block of code in isolation—same effect as when calling a freestanding function, except this doesn't expose the nested function to callers outside the parent function, which is valuable.
Blocks being expressions is one of the features of the Rust language I really love (and yes I know it's not something Rust invented, but it's still not in many other popular languages).
That last example is probably my biggest use of it because I hate having variables being unnecessarily mutable.
For those who might not have seen it, you can use this to make a `while` act like a `do-while` loop by putting the entire body in the boolean clause (and then putting an empty block for the actual body):
// double the value of `x` until it's at least 10
while { x = x * 2; x < 10 } {}
This isn't something that often will end up being more readable compared to another way to express it (e.g. an unconditional `loop` with a manual `break`, or refactoring the body into a separate function to be called once before entering the loop), but it's a fun trick to show people sometimes.
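To make the comparison concrete, here is a small sketch of the trick next to the plain `loop`/`break` form. The function names are made up for illustration, and both versions assume `x` starts above zero (with zero, neither loop terminates).

```rust
/// do-while via a block in the condition: the body runs at least once.
fn double_until_ten_while(mut x: u32) -> u32 {
    while { x *= 2; x < 10 } {}
    x
}

/// The same logic as an unconditional loop with a manual break.
fn double_until_ten_loop(mut x: u32) -> u32 {
    loop {
        x *= 2;
        if x >= 10 {
            break;
        }
    }
    x
}
```

Starting from 1, both return 16; starting from 12, both return 24, because the doubling happens once before the condition is ever checked.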
I typically use closures to do this in other languages, but the syntax is always so cumbersome. You get the "dog balls" that Douglas Crockford always called them:
```
const config = (() => {
    const raw_data = ...
    ...
    return compiled;
})();
const result = config.whatever;
// carry on
return result;
```
Really wish blocks were expressions in more languages.
Yes, I constantly use this pattern in C++/JavaScript, although I haven't tested how performant it is in the former (what does the compiler even do with such an expression?)
I think this comes from functional programming. I'd just call it "everything is an expression" (which isn't quite true in Rust but it's a lot more true than it is in traditional imperative languages like C++ and Python).
Not mentioned in the article but kinda neat: you can label such a block and break out of it, too! The break takes an argument that becomes the value of the block that is broken out of.
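For anyone who hasn't seen it, a minimal sketch of a labeled block with `break` carrying a value (available on stable since Rust 1.65). The function `describe` and the label `'kind` are invented for this example.

```rust
fn describe(n: i32) -> String {
    // Each `break 'kind value` leaves the block early with that value.
    let kind = 'kind: {
        if n < 0 {
            break 'kind "negative";
        }
        if n == 0 {
            break 'kind "zero";
        }
        if n % 2 == 0 {
            break 'kind "even";
        }
        "odd" // value of the block when no break was taken
    };
    // Unlike an early `return`, control continues here after the block.
    format!("{n} is {kind}")
}
```

So `describe(4)` gives "4 is even" and `describe(-3)` gives "-3 is negative".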
I just learned this one, and am gradually starting to use it! It applies for loops too. I saw it in ChatGPT code, and had to stop and look it up. Rust is a big language, for worse and for better.
I wouldn't call Rust "a big language" because of labeled break. This is a pretty standard language feature, you can do the same in C (and therefore C++), Go, Javascript, Java, C#...
Those languages aren't expression-oriented, so you would need to assign the result to a previously-initialized variable in a higher scope. But that just makes this pattern clunkier in those languages. This subthread is about jumping to labels, which is a relatively obscure yet widespread feature supported by many languages (though C and Go allow forward jumps, and the rest only allow backward jumps, since the latter ensures that control flow does not become irreducible).
... is something to be used very sparingly. I reckon I write a new one about once a year.
Very often if you think harder you realise you didn't want this, you should write say, a function (from which you can return) or actually you didn't want to break early at all. Not always, but often. If you write more "break 'label value" than just break then you are almost certainly Doing It Wrong™.
I haven't put it into practice yet, but there is a pattern I use regularly which I plan to replace with the labeled one: I set a flag at the top of the outer loop, which contains an inner loop. The inner loop can set this flag. Directly past the inner loop, I check the flag, then break. I am pretty sure this is exactly what the labeled break is for.
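A sketch of how that flag pattern might look with a labeled break instead. The `find` function and the grid shape are assumptions made up for the example.

```rust
/// Returns the (row, column) of the first cell equal to `target`.
fn find(grid: &[Vec<i32>], target: i32) -> Option<(usize, usize)> {
    let mut found = None;
    'rows: for (r, row) in grid.iter().enumerate() {
        for (c, &cell) in row.iter().enumerate() {
            if cell == target {
                found = Some((r, c));
                // Leaves both loops at once; no flag to set and re-check.
                break 'rows;
            }
        }
    }
    found
}
```

In a function this small a plain `return` would do the same job; the label earns its keep when more work follows the loops in the same function.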
I got so used to taking advantage of this feature in my side projects that my work Kotlin code is now full of “run {}” blocks. Even with a GCed language, it’s very nice to restrict variable lifetimes without needing to split the logic out to its own function.
Each does different things, and Rust also has plenty of them. and_then(), or(), or_else(), then(), the list goes on. Kotlin just implements them more widely.
Actually, Kotlin's with() and apply() are more powerful than what Rust can provide. Then again, Rust isn't designed with OO in mind, so you probably shouldn't use those patterns in Rust anyway.
I think you've misunderstood the point they were making by addressing the number as if it was the only concern and then only mentioning the actual point they were trying to make as if it were an incidental afterthought. I don't think it's likely they're criticizing five functions in the standard library is too many, but that having five special functions with certain semantics that only apply to them is too many. The methods you mention in Rust are all in the first category; you could easily write them yourself for any type you define without needing to resort to wrapping any of them. It's not clear to me that someone could write a function in Kotlin with special scoping semantics around an object without resorting to wrapping one of those functions.
These all heavily rely on Kotlin's ability to write an extension function for any class. When you write `with(x) { something() }` you're extending the type of `x` (be that int, List<String>, or SomeObject) with an anonymous method, and passing that as a second parameter.
Consider the signature here:
public inline fun <T, R> with(receiver: T, block: T.() -> R): R
The first object is a generic object T, which can be anything. The second is a member function of T that returns R, which again can be just about anything, as long as it operates on T and returns R.
Let does it kind of differently:
public inline fun <T, R> T.let(block: (T) -> R): R
This is an extension method that applies to every single class as T isn't restricted, so as long as this function is in scope (it's in the standard library so it will be), every single object will have a let() method. The only parameter, block, is a lambda that takes T and returns R.
So for instance:
val x = makeFoo()
with (x) {
    bar = 4
}
is syntactic sugar for something like:
fun Foo.anonymous() {
    this.bar = 4
}
val x = makeFoo()
with(x, Foo::anonymous)
You could absolutely write any of these yourself. For instance, consider this quick example I threw together: https://pl.kotl.in/S-pHgvxlX
The type inference is doing a lot of heavy lifting, i.e. taking a lambda and automatically turning it into an anonymous extension function, but it's nothing that you cannot do yourself. In fact, a wide range of libraries write what might look like macros in Kotlin by leveraging this and the fact you can define your own inline operators (i.e. https://pl.kotl.in/TZB0zA1Jr).
This isn't possible in many other languages because taking a generic type definition and letting it possibly apply to every single existing type is not exactly popular. Combined with Kotlin's ability to extend nullable types (i.e. this = null) as well makes for a language system that wouldn't work in many other flexible languages.
Fair enough, I retract my previous comment. Unfortunately there seem to be a lot of pieces that are unfamiliar here so I'm not really able to understand parts of this but I trust that you understood what I was saying well enough to know that it was wrong.
I agree. I started with (scope) blocks in Rust, but keep the habit in Kotlin with the run scope function. Since run takes no arguments, it feels like the closest equivalent to Rust scopes (compared to other Kotlin scope functions, which also keep their local variables from polluting the rest of the function body).
This seems like a great way to group semantically-related statements, reduce variable leakage, and reduce the potential to silently introduce additional dependencies on variables. Seems lighter weight (especially from a cognitive load perspective) than lambdas. Appropriate for when there is a single user of the block -- avoids polluting the namespace with additional functions. Can be easily turned into a separate function once there are multiple users.
This is one of those natural consequences of "everything is an expression" languages that I really like! I like more explicit syntax like Zig's labelled blocks, but any of these are cool.
Try this out, you can actually (technically) assign a variable to `continue` like:
let x = continue;
Funnily enough, one of the few things that are definitely always a statement are `let` statements! Except, you also have `let` expressions, which are technically different, so I guess that's not really a difference at all.
I'm not sure why you picked continue here? All the diverging control flow instructions have the same type, ! aka "Never". In stable Rust you're not allowed to use its name but it's "just" an empty type and you can easily make one of those yourself - an enum with no variants.
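A short sketch of both points, with made-up names (`parse_all`, `Void`, `unwrap_infallible`): a diverging expression such as `continue` fits wherever a value is expected, and an enum with no variants behaves like a home-made empty type.

```rust
/// An enum with no variants: no value of this type can ever exist.
enum Void {}

/// Keeps only the inputs that parse as integers.
fn parse_all(inputs: &[&str]) -> Vec<i32> {
    let mut out = Vec::new();
    for s in inputs {
        // Both arms must produce an i32; `continue` has type `!`,
        // which coerces to any type, so the match type-checks.
        let n: i32 = match s.parse() {
            Ok(n) => n,
            Err(_) => continue,
        };
        out.push(n);
    }
    out
}

/// With an uninhabited error type, the Err arm can never run.
fn unwrap_infallible(r: Result<i32, Void>) -> i32 {
    match r {
        Ok(v) => v,
        // An empty match on an empty enum is exhaustive.
        Err(never) => match never {},
    }
}
```

`parse_all(&["1", "x", "3"])` yields `vec![1, 3]`: the unparsable entry is skipped by the `continue` arm.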
> Here’s a little idiom that I haven’t really seen discussed anywhere, that I think makes Rust code much cleaner and more robust.
> I don’t know if there’s an actual name for this idiom; I’m calling it the “block pattern” for lack of a better word.
This idiom has been discussed and codified in various languages for many years. For example, Scala has supported the same thusly:
val foo: Int = {
    val one = 1
    val two = 2
    one + two
}
Java (the language) has also supported[0] similar semantics.
That's all fine until later on, probably in some obscure loop, `i_think_this_is_setup` is used without you noticing.
Instead doing something like this tells the reader that it will be used again:
i_think_this_is_setup = even_more_stuff
the_thing = begin
    setup_a = some_stuff
    setup_b = some_more_stuff
    run_setup(setup_a, setup_b, i_think_this_is_setup)
end
I now don't mentally have to keep track of what `setup_a` or `setup_b` are anymore and, since the writer made a conscious effort not to put it in the block, you will take an extra look for it in the outer scope.
function abc() {
    let a = 1
    {
        let b = 2
    }
    console.log(typeof a)
    console.log(typeof b)
}
abc()
Used to do this occasionally for exactly the same reasons- don't leave dangling variables junking up your scope, and don't make weirdo functions with parameter passing that you'll only ever call once!
Clojure also has the threading macro -> and ->> which are great at converting exactly the same type of code into a stream of modifications instead of breaking out everything into variables. Naming things can be very useful sometimes but sometimes it is entirely gratuitous and distracting to have
let input = read_input();
let trimmed_input = input.trim();
let trimmed_uppercase_input = trimmed_input.to_uppercase();
...
The extra variable names are almost completely boilerplate and make it also annoying to reorder things.
In Clojure you can do
(-> (read-input) string/trim string/upcase)
And I find that so much more readable and refactorable.
I like it. IIFEs always make me nervous because they look like they beg to be removed if you don't know why they are used. Using an explicit function such as `run` looks much more intentional, and provide a single intuitive place (the documentation of the `run` function) to explain the pattern.
The first example given is not at all convincing. It is clear as the sky that loading the config file should be a separate function of its own. Coupling sending HTTP requests with it makes no sense.
The second example "erasure of mutability" makes more sense. But this effectively makes it a Rust-specific pattern.
It's essentially an inline function with only 1 client. Can be a preference for inline readability and automatically enforces there are no other clients of the "function".
I use this all the time. It's features like these that sell Rust for me honestly; even if you wrapped your whole program in `unsafe` it would still be a massively better language than C++ or C.
I feel like indentation is a really useful structural signal that has been hijacked, in C-family languages, by unnecessarily strict conventions and most recently by autoformatters, to correspond exclusively to language structure, when it could be used for semantic structure as well (or occasionally instead).
Much of the value of this block pattern is that it makes the scope of the intermediate variables clear, so that you have no doubt that you don’t need to keep them in mind outside that scope.
But it’s also about logical grouping of concepts. And that you can achieve with simple ad hoc indentation:
fn foo(cfg_file: &str) -> anyhow::Result<()> {
    // Load the configuration from the file.
        // Cached regular expression for stripping comments.
        static STRIP_COMMENTS: LazyLock<Regex> = LazyLock::new(|| {
            RegexBuilder::new(r"//.*").multi_line(true).build().expect("regex build failed")
        });
        // Load the raw bytes of the file.
        let raw_data = fs::read(cfg_file)?;
        // Convert to a string to the regex can work on it.
        let data_string = String::from_utf8(&raw_data)?;
        // Strip out all comments.
        let stripped_data = STRIP_COMMENTS.replace(&config_string, "");
        // Parse as JSON.
        let config = serde_json::from_str(&stripped_data)?;
    // Do some work based on this data.
    send_http_request(&config.url1)?;
    send_http_request(&config.url2)?;
    send_http_request(&config.url3)?;
    Ok(())
}
(Aside: that code is dreadful. None of the inner-level comments are useful, and should be deleted (one of them is even misleading). .multi_line(true) does nothing here (it only changes the meanings of ^ and $; see also .dot_matches_new_line(true)). There is no binding config_string (it was named data_string). String::from_utf8 doesn’t take a reference. fs::read_to_string should have been used instead of fs::read + String::from_utf8. Regex::replace_all was presumably intended.)
It might seem odd if you’re not used to it, but I’ve been finding it useful for grouping, especially in languages that aren’t expression-oriented. Tooling may be able to make it foldable, too.
I’ve been making a lightweight markup language for the last few years, and its structure (meaning things like heading levels, lists, &c.) has over time become almost entirely indentation-based. I find it really nice. (AsciiDoc is violently flat. reStructuredText is mostly indented but not with headings. Markdown is mostly flat with painfully bad and footgunny rules around indentation.)
—⁂—
A related issue. You frequently end up with multiple levels of indentation where you really only want one. A simple case I wrote yesterday in Svelte and was bothered by:
$effect(() => {
    if (loaded) {
        … lots of code …
    }
});
In some ancient code styles it might have been written like this instead:
$effect(() => { if (loaded) {
    … lots of code …
} });
Not the prettiest due to the extra mandatory curlies, but it’s fine, and the structure reasonable. In Rust it’s nicer:
effect(|| if loaded {
    … lots of code …
});
But rustfmt would insist on returning it to this disappointment:
effect(|| {
    if loaded {
        // … lots of code …
    }
});
Perhaps the biggest reason around normalising indentation and brace practice was bugs like the “goto fail” one. I think there’s a different path: make the curly braces mandatory (like Rust does), and have tooling check that matching braces are at the same level of indentation. Then the problem can’t occur. Once that’s taken care of, I really see no reason not to write things more compactly, when you decide it is nicer, which I find quite frequently compared with things like rustfmt.
I would like to see people experiment with indentation a bit more.
—⁂—
One related concept from Microsoft: regions. Cleanest in C♯, `#region …` / `#endregion` pragmas which can introduce code folding or outlining or whatever in IDEs.
I think the technique is important to have in your vocabulary, but I think the examples given are a weak sell.
In the example given, I would have preferred to extract to a method: what if I want to load the config from somewhere else? And perhaps the specifics of stripping comments could have been extracted to a more semantically aptly named post-processing method.
I see the argument that when extracted to a function, that you don’t need to go hunting for it. But if we look at the example with the block, I still see a bunch of detail about how to load the config, and then several lines using it. What’s more important in that context: the specifics of the loading of config, or the specifics of how requests are formed using the loaded config?
The fact that you need to explain what’s happening with comments is a smell. Properly named variables and methods would obviate the need for the comments and would introduce semantic meaning thru names.
I think blocks are useful when you are referencing a lot of local variables and also have fairly localized meaning within the method. For example, you can write a block to capture a bunch of values for logging context, then you can call that block in every log line to get a logging context based on current method state. It totally beats extracting a logging context method that consumes many variables and is unlikely to be reused outside of the calling method, and yet you get delayed evaluation and single point of definition for it.
So yes to the pattern, but needs a better example.
> what if I want to load the config from somewhere else?
There are DRY and WET principles. We can argue which one of them is better, but moving something used exactly once into a method just out of anxiety that you might need it again seems a little too much to me. I move things into functions that are called once, but iff it makes my code clearer. That can happen when the code is already complicated and long.
The block allows you to localize the code, and refactoring it into a separate function will be trivial. You don't need to check whether all the variables are temporary; you just see the block, copy/paste it, add a function header, and then add a function call at the place where the block was before. No thinking and no research is needed. Veni, vidi, vici.
> The fact that you need to explain what’s happening with comments is a smell.
It is an example for the article, taken out of context. You'd better comment it for the sake of your readers.
> I think blocks are useful when you are referencing a lot of local variables and also have fairly localized meaning within the method.
I do it each time I need a temporary variable. I hate variables that exist but are not used; they make it harder to read the code, because you need to track temporaries through all the code to confirm that they are temporaries. So even if I have just two local variables (not "a lot of") and one of them is temporary, I'd probably localize the temporary one even further into its own block. What really matters is code readability: if the function has just three lines, it doesn't matter, but it becomes really ugly if the lifetime of a variable overshoots its usefulness by 20 lines of dense code.
The other thing is mutability/immutability: you can drop mutability when returning a value from a block. Mutability makes reasoning harder, so dropping it when you don't need it anymore is a noble deed. It can and will reduce the complexity of reading the code. You'll thank yourself many times later, when faced with the necessity to reread your own code.
There is the code and there is the process of devising the code. You cannot understand the former without reverse engineering the latter. So, when you write code, the more of your intentions are encoded somehow in your code, the easier it will be to read your code. If you create temporary variables just to parse config with the final goal to get the parsed config in a variable, then you'd better encode it. You can add comments, like "we need to parse config and for that we need three temporary variables", or you can localize those three temporary variables in a block.
This is a great addition to the best patterns and practices in Rust. Worth noting and using. In JavaScript there's the proposal of "do expressions" which accomplish the same.
I have one better: the try block pattern.
https://doc.rust-lang.org/beta/unstable-book/language-featur...
Can this just be done as a lambda that is immediately evaluated? It's just much more verbose.
That prevents other control flow mechanisms (return, break) from operating past the function boundary. In general, I avoid single-callsite functions as much as possible (including the iterator api) for this reason.
It sounds like you're fighting the language - Rust is sort of FP-light and you're encouraged to return a null/error value from the intermediate calculation instead of doing an early return from the outer scope. It's a nice and easy to follow way to structure the code IME. Yes, it's more verbose when an early return would have been just right - so be it.
For the case where `try` is useful over the functional form (i.e. parent's situation of having a desired Result, plus some unrelated early-returning), that ends up with nested `Result`s though, i.e. spamming an `Ok(Ok(x))` on all the non-erroring cases, which gets ugly fast.
Why couldn't you flatten it?
Wouldn't that also move any referenced variables too? Unlike the block example that would make this code not identical to what it's replacing.
No, unless you ask for it via the `move` keyword in front of the closure.
This works fine: https://play.rust-lang.org/?version=stable&mode=debug&editio...
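A tiny sketch of the borrowing behaviour described in this subthread (the function name `demo` is made up): without `move`, a closure that only reads a variable borrows it, so the variable stays usable afterwards.

```rust
fn demo() -> (usize, String) {
    let name = String::from("config");
    // No `move` keyword: the closure only needs `&name` to call `len()`,
    // so it captures `name` by shared reference.
    let len = (|| name.len())();
    // `name` was borrowed, not moved, so it can still be returned here.
    (len, name)
}
```

The capture mode is inferred from how the body uses the variable, so a closure whose body consumes `name` (for example by passing it to `drop`) would still move it even without the keyword.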
My instinct is this would get hairy much faster if you want to actually close over variables compared to using a block.
Not sure if that is relevant to your point, but: For better and for worse, closing over any outer scope variables is syntactically free in Rust lambdas. You just access them.
It's syntactically free, but it can cause borrow-checker errors that cause your code to outright fail to compile.
Yes, exactly. My concerns were semantic, not syntactic.
If the verbose return type syntax can't be elided, I think it's more or less dead as a pattern.
I want that stabilized so bad but it's not been really moving forward.
There's some active work recently on fixing blocking issues, e.g.:
https://github.com/rust-lang/rust/pull/148725
https://github.com/rust-lang/rust/pull/149489
I was not a fan when I first saw it but I'm becoming desperate to have it the more Rust I write.
Out of curiosity why can’t a block just do this natively?
Because it would massively alter language semantics? It converts returns from the nearest function into returns from the nearest (try) block.
Because then you couldn't use ? to propagate errors if they occurred inside any loops or branches within the function, which would be a significant limitation.
#![feature(try_blocks)]
You only live once.
I tried it recently; from memory, error inference through it wasn't that great.
That's exactly what's currently being fixed before stabilizing it.
One of the first things I tried in Rust a couple of years ago coming from Haskell. Unfortunately it's still not stabilized :(
Why does this need special syntax? Couldn't blocks do this if the expression returns a result in the end?
Try blocks let you encapsulate the early-return behavior of Try-returning operations so that they don't leak through to the surrounding function. This lets you use the ? operator 1. when the Try type doesn't match that of the function this is taking place in 2. when you want to use ? to short circuit, but don't want to return from the enclosing function. For instance, in a function returning Result<T,E>, you could have a try block where you do a bunch of operations with Option and make use of the ? operator, or have ? produce an Err without returning from the enclosing function. Without try blocks, you pretty much need to define a one-off closure or function so that you can isolate the use of ? within its body.
The best part of try blocks is the ability to use the ? operator within them. Any block can return a result, but only function blocks (and try blocks) can propagate an Err with the ? operator.
Not without being able to use the ? operator.
The closest thing I can think of that will let you return a result from within a separate scope using a set of foo()? calls would be a lambda function that's called immediately, but that has its own problems when it comes to moving and it probably doesn't compile to very fast code either. Something like https://play.rust-lang.org/?version=stable&mode=debug&editio...
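A minimal sketch of that immediately-invoked closure workaround on stable Rust, not the contents of the linked playground. The function `first_two_sum` and its error message are invented for illustration; the closure's return type is written out explicitly.

```rust
fn first_two_sum(values: &[&str]) -> Result<i32, String> {
    // Inside the closure, `?` works on Option and only exits the closure,
    // even though the enclosing function returns a Result.
    let sum = (|| -> Option<i32> {
        let a = values.first()?.parse::<i32>().ok()?;
        let b = values.get(1)?.parse::<i32>().ok()?;
        Some(a + b)
    })();

    // Still inside `first_two_sum`: decide what a missing sum means here.
    match sum {
        Some(s) => Ok(s),
        None => Err("need two integers".to_string()),
    }
}
```

With the unstable feature enabled, the closure and its call could presumably become a `try { ... }` block; the closure form also stops `return` and `break` from reaching the outer function, which is the limitation raised elsewhere in this thread.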
One reason is that would be a breaking change.
Now that is pretty cool.
Ah yes, do-notation.
There are some situations with tricky lifetime issues that are almost impossible to write without this pattern. Trying to break code out into functions would force you to name all the types (not even possible for closures) or use generics (which can lead to difficulties specifying all required trait bounds), and `drop()` on its own is of no use since it doesn't affect the lexical lifetimes.
Conversely, I use this "block pattern" a lot, and sometimes it causes lifetime issues:
This doesn't work: the memory is owned by the Vec, whose lifetime is tied to the block, so the slice is invalid outside of that block. To be fair, it's probably best to just make foo a Vec, and turn it into a slice where needed.
Unless I'm misunderstanding, you'd have the same lifetime issue if you tried to move the block into a function, though. I think the parent comment's point is that it causes fewer issues than abstracting to a separate function, not necessarily compared to inlining everything.
Avoiding that kind of use after free problem is exactly why people choose Rust, isn’t it?
There is some experimental work for that here I believe:
https://doc.rust-lang.org/beta/unstable-book/language-featur...
AFAIU it essentially creates a variable in inner scope but defers drop to the outer scope so that you can return the reference
More significantly the new variables x and y in the block are Drop'd at the end of the block rather than at the end of the function. This can be significant if:
- Drop does something, like close a file or release a lock, or
- x and y don't have Send and/or Sync, and you have an await point in the function or are doing multi-threaded stuff
This is why you should almost always use std::sync::Mutex rather than tokio::sync::Mutex. std's Mutex isn't Sync/Send, so the compiler will complain if you hold it across an await. Usually you don't want mutexes held across an await.
oops: Of course the Mutex is Sync/Send, that's the whole point of a Mutex. It's the std::sync::MutexGuard that's not.
I have been using this in a web application that acquires a lock, retrieves and returns a few variables to the outer scope, and then immediately unlocks the mutex again.
Can this also affect stack usage? Like if `x` gets dropped before `y` is introduced, can `y` reuse `x`'s stack space (let's assume they are same size/alignment). Or does the compiler already do that if it can see that one is not used after the other is introduced?
Conceivably, yes.
Our codebase is full of this pattern and I love it. Every time I get to clean up temporaries and expose an immutable variable outside of the setup, it makes me way too happy.
A lot of the time it looks like this:
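One plausible shape for this, as a sketch only: `allowed_ports` and its contents are invented for illustration. The mutable temporary lives inside the block and only an immutable binding is visible afterwards.

```rust
fn allowed_ports(raw: &[u16]) -> Vec<u16> {
    let ports = {
        let mut p = raw.to_vec();
        p.sort_unstable();
        p.dedup();
        p
    };
    // ports.push(22); // would not compile: `ports` is not declared `mut`
    ports
}

fn main() {
    println!("{:?}", allowed_ports(&[8080, 443, 80, 443]));
}
```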
You can also de-mut-ify a variable by simply shadowing it with an immutable version of itself:
let mut data = foo(); data.mutate(); let data = data;
May be preferable for short snippets where adding braces, the yielded expression, and indentation is more noise than it's worth.
Variable shadowing felt wrong for a while because it's considered verboten in so many other environments. I use it fairly liberally in rust now.
It helps that the specific pattern of redeclaring a variable just to change its mutability for the remainder of its scope is about the least objectionable use of shadowing possible.
Cute, essentially equivalent to Python's inner functions and Go's closures, e.g. in Go:
All of these are however poor solutions to the problem, because they're not true nested functions: they can access arbitrary variables defined outside their scope. Python at least restricts their modification, but Go doesn't. I'm guessing in Rust it's at least explicit in some way?
In any case, the real solution here is to simply allow proper nested functions that behave exactly like freestanding functions in that they can only access what's passed to them:
This way you can actually reason about that block of code in isolation, with the same effect as when calling a freestanding function, except this doesn't expose the nested function to callers outside the parent function, which is valuable.
Blocks being expressions is one of the features of the Rust language I really love (and yes I know it's not something Rust invented, but it's still not in many other popular languages).
That last example is probably my biggest use of it because I hate having variables being unnecessarily mutable.
In my opinion it's the 'correct' design, I don't see any advantage from not doing this.
For those who might not have seen it, you can use this to make a `while` act like a `do-while` loop by putting the entire body in the boolean clause (and then putting an empty block for the actual body):
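A small sketch of the trick, using a made-up `digits` function. The whole loop body sits in the condition block, whose final expression decides whether to go around again; the real body is the empty `{}`:

```rust
// Counts decimal digits; the body must run once even for n == 0.
fn digits(mut n: u32) -> u32 {
    let mut count = 0;
    while {
        // "Loop body": executed before the condition is evaluated.
        count += 1;
        n /= 10;
        // The block's final expression is the loop condition.
        n != 0
    } {}
    count
}

fn main() {
    println!("{} {}", digits(0), digits(12345));
}
```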
This isn't something that often will end up being more readable compared to another way to express it (e.g. an unconditional `loop` with a manual `break`, or refactoring the body into a separate function to be called once before entering the loop), but it's a fun trick to show people sometimes.
I love that this is part of the syntax.
I typically use closures to do this in other languages, but the syntax is always so cumbersome. You get the "dog balls" that Douglas Crockford always called them:
```
const config = (() => {
  const raw_data = ...
})();
const result = config.whatever;
// carry on
return result;
```
Really wish blocks were expressions in more languages.
By the by, code blocks on here are denoted by two leading spaces on each line
Interesting that you can use blocks in JS:
But I don’t see a way to get the result out of it. As soon as you try to use it in an expression, it will treat it as an object and fail to parse.
Yes, I constantly use this pattern in C++/JavaScript, although I haven't tested how performant it is in the former (what does the compiler even do with such an expression?)
At least in simple cases the compiler will just inline the closure, as if it never existed. There shouldn't be any measurable overhead.
https://github.com/tc39/proposal-do-expressions
(Not to be confused with do notation)
Block expression https://doc.rust-lang.org/reference/expressions/block-expr.h...
Also in Kotlin, Scala, and nim.
I think this comes from functional programming. I'd just call it "everything is an expression" (which isn't quite true in Rust but it's a lot more true than it is in traditional imperative languages like C++ and Python).
Not mentioned in the article but kinda neat: you can label such a block and break out of it, too! The break takes an argument that becomes the value of the block that is broken out of.
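A short sketch of a labeled block, with an invented `classify` function. As far as I know this has been stable since Rust 1.65; `break 'label value` leaves the block early and makes `value` the block's result:

```rust
fn classify(input: &str) -> &'static str {
    let kind = 'check: {
        if input.is_empty() {
            break 'check "empty";
        }
        if input.chars().all(|c| c.is_ascii_digit()) {
            break 'check "number";
        }
        // Falling off the end yields the final expression as usual.
        "text"
    };
    kind
}

fn main() {
    println!("{} {} {}", classify(""), classify("42"), classify("4x"));
}
```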
I just learned this one, and am gradually starting to use it! It applies for loops too. I saw it in ChatGPT code, and had to stop and look it up. Rust is a big language, for worse and for better.
I wouldn't call Rust "a big language" because of labeled break. This is a pretty standard language feature, you can do the same in C (and therefore C++), Go, Javascript, Java, C#...
Those languages don't treat blocks as expressions, so you really can't do the same thing there. Something very similar, yes. But not the same.
Those languages aren't expression-oriented, so you would need to assign the result to a previously-initialized variable in a higher scope. But that just makes this pattern clunkier in those languages. This subthread is about jumping to labels, which is a relatively obscure yet widespread feature supported by many languages (though C and Go allow forward jumps, and the rest only allow backward jumps, since the latter ensures that control flow does not become irreducible).
Very often if you think harder you realise you didn't want this, you should write say, a function (from which you can return) or actually you didn't want to break early at all. Not always, but often. If you write more "break 'label value" than just break then you are almost certainly Doing It Wrong™.
Not having put it into practice yet, there is a pattern I use regularly which I plan to replace with the labeled one: I set a flag at the top of the outer loop, which contains an inner loop. The inner loop can set this flag. Directly past the inner loop, I check for the flag, then break. I am pretty sure this is exactly what the labeled break is for.
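Roughly, yes. A sketch of the flag-free version, assuming a hypothetical `find_cell` search over a grid: the label on the outer loop lets the inner loop leave both at once.

```rust
fn find_cell(grid: &[Vec<i32>], target: i32) -> Option<(usize, usize)> {
    let mut found = None;
    'rows: for (r, row) in grid.iter().enumerate() {
        for (c, &cell) in row.iter().enumerate() {
            if cell == target {
                found = Some((r, c));
                break 'rows; // leaves both loops, no flag to check afterwards
            }
        }
    }
    found
}

fn main() {
    let grid = vec![vec![1, 2], vec![3, 4]];
    println!("{:?}", find_cell(&grid, 3));
}
```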
GCC adds similar syntax as an extension to C: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html
It's used all throughout the Linux kernel and useful for macros.
The best part of statement expressions is that a return there returns from the function itself, not from the statement expr.
I use that with macros to return something akin to std::expected, while keeping the code on the happy path, as with exceptions.
I got so used to taking advantage of this feature in my side projects that my work Kotlin code is now full of “run {}” blocks. Even with a GCed language, it’s very nice to restrict variable lifetimes without needing to split the logic out to its own function.
It's idiomatic in Kotlin as well!
https://kotlinlang.org/docs/scope-functions.html
So many options why oh why. let run with also apply
Each does different things, and Rust also has plenty of them. and_then(), or(), or_else(), then(), the list goes on. Kotlin just implements them more widely.
Actually, Kotlin's with() and apply() are more powerful than what Rust can provide. Then again, Rust isn't designed with OO in mind, so you probably shouldn't use those patterns in Rust anyway.
I think you've misunderstood the point they were making by addressing the number as if it was the only concern and then only mentioning the actual point they were trying to make as if it were an incidental afterthought. I don't think it's likely they're criticizing five functions in the standard library is too many, but that having five special functions with certain semantics that only apply to them is too many. The methods you mention in Rust are all in the first category; you could easily write them yourself for any type you define without needing to resort to wrapping any of them. It's not clear to me that someone could write a function in Kotlin with special scoping semantics around an object without resorting to wrapping one of those functions.
The Kotlin functions are actually quite easy to write, they're all written in standard Kotlin.
also: https://github.com/JetBrains/kotlin/blob/2.3.0/libraries/std...
apply: https://github.com/JetBrains/kotlin/blob/2.3.0/libraries/std...
let: https://github.com/JetBrains/kotlin/blob/2.3.0/libraries/std...
with: https://github.com/JetBrains/kotlin/blob/2.3.0/libraries/std...
run (two overloads): https://github.com/JetBrains/kotlin/blob/2.3.0/libraries/std... and https://github.com/JetBrains/kotlin/blob/2.3.0/libraries/std...
These all heavily rely on Kotlin's ability to write an extension function for any class. When you write `with(x) { something() }` you're extending the type of `x` (be that int, List<String>, or SomeObject) with an anonymous method, and passing that as a second parameter.
Consider the signature here:
The first object is a generic object T, which can be anything. The second is a member function of T that returns R, which again can be just about anything, as long as it operates on T and returns R.
Let does it kind of differently:
This is an extension method that applies to every single class as T isn't restricted, so as long as this function is in scope (it's in the standard library so it will be), every single object will have a let() method. The only parameter, block, is a lambda that takes T and returns R.
So for instance:
is syntactic sugar for something like:
You could absolutely write any of these yourself. For instance, consider this quick example I threw together: https://pl.kotl.in/S-pHgvxlX
The type inference is doing a lot of heavy lifting, i.e. taking a lambda and automatically turning it into an anonymous extension function, but it's nothing that you cannot do yourself. In fact, a wide range of libraries write what might look like macros in Kotlin by leveraging this and the fact you can define your own inline operators (i.e. https://pl.kotl.in/TZB0zA1Jr).
This isn't possible in many other languages because taking a generic type definition and letting it possibly apply to every single existing type is not exactly popular. Combined with Kotlin's ability to extend nullable types (i.e. this = null) as well makes for a language system that wouldn't work in many other flexible languages.
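For comparison, Rust can get a similar "method on every type" effect through a blanket trait implementation. This is only a sketch: the `Scope` trait and its `let_`/`also` methods are invented here to mirror the Kotlin names and are not part of the standard library.

```rust
// Hypothetical trait, loosely mirroring Kotlin's `let` and `also`.
trait Scope: Sized {
    /// Pass `self` to the closure and return whatever the closure returns.
    fn let_<R>(self, block: impl FnOnce(Self) -> R) -> R {
        block(self)
    }

    /// Run a side effect on a borrow of `self`, then hand `self` back.
    fn also(self, block: impl FnOnce(&Self)) -> Self {
        block(&self);
        self
    }
}

// Blanket impl: every sized type gets both methods while the trait is in scope.
impl<T> Scope for T {}

fn main() {
    let doubled = 5_i32.let_(|n| n * 2);
    let v = vec![1, 2, 3].also(|v| println!("len = {}", v.len()));
    println!("{doubled} {v:?}");
}
```

Unlike Kotlin's `with` and `apply`, the closure here still has to name its argument; there is no implicit receiver.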
Fair enough, I retract my previous comment. Unfortunately there seem to be a lot of pieces that are unfamiliar here so I'm not really able to understand parts of this but I trust that you understood what I was saying well enough to know that it was wrong.
I agree, I started with (scope) blocks in Rust, but keep the habit in Kotlin with the run scope function. Since run takes no arguments, it feels like the closest equivalent to Rust scopes (compared to other Kotlin scope functions, which also keep their local variables from polluting the rest of the function body).
This seems like a great way to group semantically-related statements, reduce variable leakage, and reduce the potential to silently introduce additional dependencies on variables. Seems lighter weight (especially from a cognitive load perspective) than lambdas. Appropriate for when there is a single user of the block -- avoids polluting the namespace with additional functions. Can be easily turned into a separate function once there are multiple users.
This is also somewhat common in c++ with immediate-invoked lambdas
The same pattern can also be useful in Rust for early returning Result<_,_> errors (you cannot `let x = foo()?` inside of a normal block like that).
would fail to compile, or worse: would return out of the entire method if the surrounding method had return type Result<_,i32>. On the other hand, runs just fine.
Hopefully try blocks will allow using ? inside of expression blocks in the future, though.
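A sketch of the closure form on stable Rust, using a made-up `sum_or_zero` function. Inside the immediately-invoked closure, `?` returns from the closure rather than from the enclosing function, which is not required to return a `Result` at all:

```rust
use std::num::ParseIntError;

fn sum_or_zero(a: &str, b: &str) -> i32 {
    // `?` short-circuits out of the closure only.
    let parsed = (|| -> Result<i32, ParseIntError> {
        let x = a.trim().parse::<i32>()?;
        let y = b.trim().parse::<i32>()?;
        Ok(x + y)
    })();

    // The outer function decides what an error means here.
    parsed.unwrap_or(0)
}

fn main() {
    println!("{} {}", sum_or_zero("2", " 3"), sum_or_zero("2", "x"));
}
```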
A blog post for it from a prominent c++er https://herbsutter.com/2013/04/05/complex-initialization-for...
Yeah but languages that make you resort to this then don't let you simply return from the block.
And the workarounds often make the pattern be a net loss in clarity.
This is one of those natural consequences of "everything is an expression" languages that I really like! I like more explicit syntax like Zig's labelled blocks, but any of these are cool.
Try this out, you can actually (technically) assign a variable to `continue` like:
let x = continue;
Funnily enough, one of the few things that are definitely always a statement are `let` statements! Except, you also have `let` expressions, which are technically different, so I guess that's not really a difference at all.
I'm not sure why you picked continue here? All the diverging control flow instructions have the same type, ! aka "Never". In stable Rust you're not allowed to use its name but it's "just" an empty type and you can easily make one of those yourself - an enum with no variants.
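A sketch of both points, with invented names (`Void`, `first_even`, `unwrap_infallible`). The first function relies on `continue` having type `!`, which coerces to whatever the other branch needs; the second uses a variant-less enum as a stand-in for `!` on stable:

```rust
// An empty type: no variants, so no value of it can ever exist.
enum Void {}

fn first_even(xs: &[i32]) -> Option<i32> {
    for &x in xs {
        // The `else` branch diverges, so the `if` as a whole is an i32.
        let even: i32 = if x % 2 == 0 { x } else { continue };
        return Some(even);
    }
    None
}

fn unwrap_infallible(r: Result<i32, Void>) -> i32 {
    match r {
        Ok(v) => v,
        // An empty match is exhaustive because `Void` has no variants.
        Err(never) => match never {},
    }
}

fn main() {
    println!("{:?} {}", first_even(&[1, 3, 4, 5]), unwrap_infallible(Ok(7)));
}
```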
From the article:
This idiom has been discussed and codified in various languages for many years. For example, Scala has supported the same thusly:
Java (the language) has also supported[0] similar semantics.
Good to see Rust supports this technique as well.
0 - https://docs.oracle.com/javase/tutorial/java/javaOO/initial....
I often employ this pattern in Ruby using `.tap` or a `begin` block.
It barely adds any functionality but it's useful for readability because of the same reasons in the OP.
It helps because I've been bitten by code that did this:
That's all fine until later on, probably in some obscure loop, `i_think_this_is_setup` is used without you noticing.
Instead doing something like this tells the reader that it will be used again:
I now don't mentally have to keep track of what `setup_a` or `setup_b` are anymore and, since the writer made a conscious effort not to put it in the block, you will take an extra look for it in the outer scope.
JavaScript chiming in...
Used to do this occasionally for exactly the same reasons: don't leave dangling variables junking up your scope, and don't make weirdo functions with parameter passing that you'll only ever call once!
Clojure also has the threading macro -> and ->> which are great at converting exactly the same type of code into a stream of modifications instead of breaking out everything into variables. Naming things can be very useful sometimes but sometimes it is entirely gratuitous and distracting to have
let input = read_input(); let trimmed_input = input.trim(); let trimmed_uppercase_input = trimmed_input.to_uppercase();
...
The extra variable names are almost completely boilerplate and make it also annoying to reorder things.
In Clojure you can do
(-> (read-input) string/trim string/upcase)
And I find that so much more readable and refactorable.
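For what it's worth, method chaining gets Rust fairly close to the threaded form when the steps happen to be methods. A sketch, where `read_input` is a hypothetical stand-in for whatever actually produces the string:

```rust
// Hypothetical input source, hard-coded for this sketch.
fn read_input() -> String {
    "  hello world \n".to_string()
}

fn normalized() -> String {
    // No intermediate names: each step feeds the next.
    read_input().trim().to_uppercase()
}

fn main() {
    println!("{}", normalized());
}
```

This only works when each step is a method on the previous result; free functions still need nesting or intermediate bindings.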
We do this via run in TS:
Can you clarify why do you prefer this over an IIFE `(() => {...})()`?
I like it. IIFEs always make me nervous because they look like they beg to be removed if you don't know why they are used. Using an explicit function such as `run` looks much more intentional, and provide a single intuitive place (the documentation of the `run` function) to explain the pattern.
The first example given is not at all convincing. It is clear as the sky that loading the config file should be a separate function of its own. Coupling sending HTTP requests with it makes no sense.
The second example "erasure of mutability" makes more sense. But this effectively makes it a Rust-specific pattern.
It's essentially an inline function with only 1 client. Can be a preference for inline readability and automatically enforces there are no other clients of the "function".
Reminds me of Brian Will's OOP rant video from 2016. He advocates exactly for this pattern: https://www.youtube.com/watch?v=QM1iUe6IofM&t=2235s
This is Rust's OCaml roots showing :)
> This is why I generally avoid C’s “bottom-up” strategy for organizing code.
I think the author misunderstood something....
Yeah, language choice and the way you organise your code seem orthogonal to me
When would you use the block pattern vs creating a new function?
I use this all the time. It's features like these that sell Rust for me honestly; even if you wrapped your whole program in `unsafe` it would still be a massively better language than C++ or C.
C++ lambdas can be used to achieve a similar result, not as pretty though https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines... But in general I agree!
The fact that you can't return from there makes for a huge difference, though.
Many languages use this idiom. Some popular ones even. So while it's good that Rust joined them, it's hardly a differentiator.
It's a differentiator wrt C and C++, is what I said.
Yes, sadly this isn't a part of standard C or C++.
It is available as a language extension in Clang and GCC and widely used (e.g. by the Linux kernel).
Unfortunately it is not supported by the third major compiler out there so many projects can't or don't want to use it.
In Rust everything is an expression, yes.
Almost everything.
I feel like indentation is a really useful structural signal that has been hijacked, in C-family languages, by unnecessarily strict conventions and most recently by autoformatters, to correspond exclusively to language structure, when it could be used for semantic structure as well (or occasionally instead).
Much of the value of this block pattern is that it makes the scope of the intermediate variables clear, so that you have no doubt that you don’t need to keep them in mind outside that scope.
But it’s also about logical grouping of concepts. And that you can achieve with simple ad hoc indentation:
(Aside: that code is dreadful. None of the inner-level comments are useful, and should be deleted (one of them is even misleading). .multi_line(true) does nothing here (it only changes the meanings of ^ and $; see also .dot_matches_new_line(true)). There is no binding config_string (it was named data_string). String::from_utf8 doesn’t take a reference. fs::read_to_string should have been used instead of fs::read + String::from_utf8. Regex::replace_all was presumably intended.)
It might seem odd if you’re not used to it, but I’ve been finding it useful for grouping, especially in languages that aren’t expression-oriented. Tooling may be able to make it foldable, too.
I’ve been making a lightweight markup language for the last few years, and its structure (meaning things like heading levels, lists, &c.) has over time become almost entirely indentation-based. I find it really nice. (AsciiDoc is violently flat. reStructuredText is mostly indented but not with headings. Markdown is mostly flat with painfully bad and footgunny rules around indentation.)
—⁂—
A related issue. You frequently end up with multiple levels of indentation where you really only want one. A simple case I wrote yesterday in Svelte and was bothered by:
In some ancient code styles it might have been written like this instead:
Not the prettiest due to the extra mandatory curlies, but it’s fine, and the structure reasonable. In Rust it’s nicer:
But rustfmt would insist on returning it to this disappointment:
Perhaps the biggest reason around normalising indentation and brace practice was bugs like the “goto fail” one. I think there’s a different path: make the curly braces mandatory (like Rust does), and have tooling check that matching braces are at the same level of indentation. Then the problem can’t occur. Once that’s taken care of, I really see no reason not to write things more compactly, when you decide it is nicer, which I find quite frequently compared with things like rustfmt.
I would like to see people experiment with indentation a bit more.
—⁂—
One related concept from Microsoft: regions. Cleanest in C♯, `#region …` / `#endregion` pragmas which can introduce code folding or outlining or whatever in IDEs.
Scala has this too, it's extremely useful
I think the technique is important to have in your vocabulary, but I think the examples given are a weak sell.
In the example given, I would have preferred to extract to a method: what if I want to load the config from somewhere else? And perhaps the specifics of stripping comments could have been extracted to a more semantically apt post-processing method.
I see the argument that when extracted to a function, that you don’t need to go hunting for it. But if we look at the example with the block, I still see a bunch of detail about how to load the config, and then several lines using it. What’s more important in that context: the specifics of the loading of config, or the specifics of how requests are formed using the loaded config?
The fact that you need to explain what’s happening with comments is a smell. Properly named variables and methods would obviate the need for the comments and would introduce semantic meaning thru names.
I think blocks are useful when you are referencing a lot of local variables and also have fairly localized meaning within the method. For example, you can write a block to capture a bunch of values for logging context, then you can call that block in every log line to get a logging context based on current method state. It totally beats extracting a logging context method that consumes many variables and is unlikely to be reused outside of the calling method, and yet you get delayed evaluation and single point of definition for it.
So yes to the pattern, but needs a better example.
> what if I want to load the config from somewhere else?
There are DRY and WET principles. We can argue which one of them is better, but to move something used exactly once to a method just out of anxiety that you might need it again seems to me a little bit too much. I move things into functions that are called once, but iff it makes my code clearer. It can happen when code is already complicated and long.
The block allows you to localize the code, and refactoring it into a separate function will be trivial. You need not check if all the variables are temporary, you just see the block, copy/paste it, add a function header, and then add a function call at the place where the block was before. No thinking and no research is needed. Veni, vidi, vici.
> The fact that you need to explain what’s happening with comments is a smell.
It is an example for the article taken out of a context. You'd better comment it for the sake of your readers.
> I think blocks are useful when you are referencing a lot of local variables and also have fairly localized meaning within the method.
I do it each time I need a temporary variable. I hate variables that exist but are not used, they make it harder to read the code, you need to track temporaries through all the code to confirm that they are temporaries. So even if I have just two local variables (not "a lot of") and one of them is temporary, I'd probably localize the temporary one even further into its own block. What really matters is code readability: if the function has just three lines, it doesn't matter, but it becomes really ugly if the lifetime of a variable overshoots its usefulness by 20 lines of dense code.
The other thing is mutability/immutability: you can drop mutability when returning a value from a block. Mutability makes reasoning harder, so dropping it when you don't need it anymore is a noble deed. It can and will reduce the complexity of reading the code. You'll thank yourself many times later, when faced with necessity to reread your own code.
There is the code and there is the process of devising the code. You cannot understand the former without reverse engineering the latter. So, when you write code, the more of your intentions are encoded somehow in your code, the easier it will be to read your code. If you create temporary variables just to parse config with the final goal to get the parsed config in a variable, then you'd better encode it. You can add comments, like "we need to parse config and for that we need three temporary variables", or you can localize those three temporary variables in a block.
This is a great addition to the best patterns and practices in Rust. Worth noting and using. In JavaScript there's the proposal of "do expressions" which accomplishes the same.
Obligatory use: it’s a block I guess
Voluntary use: I know this one. It’s a pattern now.