What is `Box<str>` and how is it different from `String` in Rust?

215 points by asimpletune 3 years ago

I recently gave a Rust workshop to Kotlin and Swift developers. Strings in Rust are a really, really difficult topic for complete newcomers, because in most languages they're understood as a basic type, whereas in Rust they require having read half the Rust book to grasp.

Consider: I can teach a lot of Rust basics with `usize`: defining functions, calling functions, enums, because it's `Copy` and because there's only one type. String requires knowing about &str, which requires knowing about deref, which requires knowing about the coercion (&String -> &str); it also requires understanding lifetimes, moving, heap and stack, and cloning. Then, if you want to work with the file system, you also need to understand Paths, OsString and AsRef.
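To make that chain concrete, here is a minimal sketch of the conversions a newcomer runs into in the first hour. The function names (`shout`, `extension_of`) are made up for illustration; the standard-library calls are real.

```rust
use std::path::Path;

// Takes a borrowed string slice; callers keep ownership of their data.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

// Accepts anything that can be viewed as a Path: &str, String, PathBuf, ...
fn extension_of<P: AsRef<Path>>(p: P) -> Option<String> {
    p.as_ref()
        .extension()
        .map(|ext| ext.to_string_lossy().into_owned())
}

fn main() {
    let owned: String = String::from("hello"); // heap-allocated, owned
    let literal: &str = "world"; // borrowed, stored in the binary

    // &String coerces to &str via Deref, so both calls type-check.
    println!("{} {}", shout(&owned), shout(literal));

    // `owned` was only borrowed above, so it is still usable here.
    let copy = owned.clone(); // explicit heap copy
    let moved = owned; // ownership moves; `owned` is unusable from here on
    println!("{copy} {moved}");

    println!("{:?}", extension_of("notes.txt"));
    println!("{:?}", extension_of(String::from("archive.tar.gz")));
}
```

That is deref coercion, borrowing, cloning, moving and AsRef in about thirty lines, none of which comes up when the example type is `usize`.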

With Kotlin and Swift, for all these things, you really just need one type, String, and you handle it just like usize.

It is really a bit of a hurdle for new developers coming from higher level languages (especially if they just give it a quick try).

nicoburns 3 years ago

On the plus side, String makes a really good example to explain ownership, moving, stack vs heap, etc. All of which you need at least a basic understanding of to do anything non-trivial in Rust.
I kind of feel like it goes without saying that Rust isn't ideal for beginners. For developers who already have a good knowledge of other languages I feel like learning about these things shouldn't be a problem, as becoming familiar with these concepts is one of the main benefits of learning Rust.
- smaddox 3 years ago
  
  > I kind of feel like it goes without saying that Rust isn't ideal for beginners.
  I think that depends on, first, what the goal is, and second, what you're comparing to. I think Rust is easier on beginners, in many ways, than C. And C is easier on beginners, in many ways, than assembly or machine code. But if you want to really understand computer programming, starting at machine code or at least assembly isn't a crazy way to start.
  
  msla 3 years ago
  
  > But if you want to really understand computer programming, starting at machine code or at least assembly isn't a crazy way to start.
  I've long suspected that the CS field was founded on two approaches: The people who started from EE and worked their way up, and the people who started from Math and worked their way down. The former people think assembly is the "real" way to approach software, and probably view C++ as "very high-level", whereas the latter people think everyone should start with a course on the lambda calculus and type systems and gradually ease into Haskell, work down to Lisp, and then maybe deign to learn Python for *shudder* numerical work.
  
  kenward 3 years ago
  
  Your comment reminded me of this article[1] that has probably been posted plenty of times on HN. You've described both the "hacker" and the "mathematician" tribes.
  [1] https://josephg.com/blog/3-tribes/
  
  nicoburns 3 years ago
  
  I'd argue there's also a 3rd foundation of CS: language. Programming languages really are languages in the general sense of the word, and their purpose is to allow humans to effectively communicate with machines. Focussing on optimising that communication is the 3rd approach.
  
  eru 3 years ago
  
  Oh, that's just a subtype of mathematician.
  Basically in computing, there's mathematicians who want to deal with languages and those who want to deal with numbers.
  Math itself is all about communication. Finding a new theorem is neat, but it's only proper math once you found a proof for it that you can communicate to other people.
  
  nicoburns 3 years ago
  
  > I think Rust is easier on beginners, in many ways, than C. And C is easier on beginners, in many ways, than assembly or machine code. But if you want to really understand computer programming, starting at machine code or at least assembly isn't a crazy way to start.
  I mean sure. But equally, starting with Python isn't a crazy way to start. And Python is a much easier language to learn than any of those (esp. if you want to actually create something practical with it).
  
  hgomersall 3 years ago
  
  Sure, but if your objective is systems programming, you'll probably quickly get to the point of realising python is not the right choice.
  
  nicoburns 3 years ago
  
  If your objective is specifically systems programming then you'll quickly outgrow python, but I'm not convinced that makes it the wrong starting point. For systems programming you'll likely need both high-level and low-level programming concepts. Learning low-level first is absolutely a valid path, but my point is that going high-level first is equally valid. People on the internet like to make out like someone who starts out by learning Python is incapable of later learning low-level concepts, but if anything they're at an advantage compared with someone with no programming experience at all.
  
  zarzavat 3 years ago
  
  It’s easier to learn low level first because then you are going “downhill” as you move into higher level languages. For a Python programmer, the borrow checker is terrifying and confounding: “Why do I have to do this lifetime nonsense just to make a string? I don’t have to do that in Python!”
  For a C programmer, the Rust borrow checker is likely to elicit feelings of relief and jubilation “You mean I don’t have to spend another afternoon in valgrind tracking down use after free bugs? Awesome!”
  
  nicoburns 3 years ago
  
  My experience hanging out in r/rust for the last few years is that it’s C, C++, and Java programmers who have the most trouble with the borrow checker. Many C/C++ programmers are used to being able to play fast and loose with pointers, and find the borrow checker restrictive because it won’t let them use the patterns they’re used to using. Java programmers are used to an object oriented style with mutable references to objects all over the place (which you can’t do in Rust).
  OTOH languages like python (and especially JavaScript where pipelines of pure data transformations are already idiomatic) don’t use references much anyway, so use of them is new and doesn’t come with so many expectations.
  
  zarzavat 3 years ago
  
  There’s C++ programmers and then there’s C++ programmers.
  The C++ programmers who have been using std::unique_ptr have absolutely the easiest time learning Rust as Rust is basically C++11 smart pointers on steroids. These C++ programmers usually have the opposite problem, there’s things you are allowed to do in Rust, that you can’t safely do in C++ without copying.
  Then there are C++ programmers who are still using C++98 style for whatever reason (commonly gamedev) and they don’t benefit from any similarities because their style of C++ is so old that there aren’t any similarities.
  
  nicoburns 3 years ago
  
  > There’s C++ programmers and then there’s C++ programmers.
  Indeed, there's a subset of C++ (and C) programmers who get Rust almost immediately, because they've effectively been informally using a similar ownership model in their C++ (/C) code already. But there's another subset who write "fast and loose" C++ (/C) on the principle that "it's fine if it works at runtime", and they tend to really struggle.
  
  eru 3 years ago
  
  I started with BASIC on a C64. BASIC was always supposed to damage your brain.
  So far, C, Scheme, Python, Haskell, Erlang, Rust etc haven't been too hard. (Though I'm not sure I'm cut out for C++'s antics or Java's dependency injection.)
  
  nvrspyx 3 years ago
  
  This is just my opinion, but I can't imagine systems programming being the objective of any beginner. A beginner probably wouldn't even be able to differentiate systems programming from applications programming.
  
  eru 3 years ago
  
  Depends. I can totally imagine someone who has no clue about programming, but is fascinated by what the demoscene people do with a Super Nintendo.
  Extremely low level programming would be their motivating goal, even if they don't know it yet.
  
  blub 3 years ago
  
  The domain drives programming language choice, so if they’re fascinated by the demoscene they’d look into C or assembly, I imagine.
  On the other hand, if they’re fascinated about cryptocurrencies, Rust would be a logical choice.
  
  benj111 3 years ago
  
  I can't speak for others but I started programming to understand how all the low level stuff works.
  I think there's merit in learning assembler because you then appreciate what all the other languages are doing under the hood, but then python is also good for getting something done quickly.
  I suppose it's what scratches your itch. If you want to print out "hello world" quickly and easily then python is a better bet. If you want to know the mechanics, assembler is the better bet.
  
  jollybean 3 years ago
  
  When we teach software we're starting with 'functions', 'variables' and 'algorithms', not specific kinds of programming.
  
  hgomersall 3 years ago
  
  And ownership, types, generics and concurrency... Essentially, we're teaching some aspects of computer science. The more you can express, arguably the more you can learn. I certainly learned a huge amount when learning rust. It would have been valuable to me to know that 15 years ago.
  
  jollybean 3 years ago
  
  I'd argue that ownership and even generics are not really fundamental to computer science.
  
  pjmlp 3 years ago
  
  Depends, if writing a compiler is still considered systems programming in modern times.
  https://www.amazon.com/Writing-Interpreters-Compilers-Raspbe...
  
  less_less 3 years ago
  
  Compilers are their own beast — I wouldn't put them with systems code. They're pretty different from an OS, BLAS, machine learning kernel, game engine, network stack, database or what have you. There's not as much buffer management, speed and memory aren't usually as critical, you don't make direct syscalls, many structures are graphs rather than arrays, etc. They often aren't even multithreaded.
  It's also popular to write compilers in distinctly non-"systems-y" languages, most notably Standard ML but also eg Haskell, and lots of languages are self-hosted.
  
  travisgriggs 3 years ago
  
  I think compilers are a strong sub field of meta programming.
  I see meta programming as anything that deals directly or indirectly with ASTs. Compilers (er parsers and linkers) make ASTs and transform them. Refactoring engines (should) work with them.
  Sadly, I’m not sure what language I’d recommend as ideal for this (meta programming) nowadays. I think it’s zen cool when a language like Lisp or Zig or Elixir slides sideways in and out of meta programming, but that doesn’t mean they have good ASTs to work with, or that they’re ideal as a pseudo language to manipulate them. It should be something that is both not too complex, but also not so zen abstract that you have to bootstrap meaningful things endlessly.
  I personally liked the Smalltalk AST. But I couldn’t speak to its pedagogical or industry value as a meta programming environment.
  
  eru 3 years ago
  
  Are you talking about the tools that eg Lisp has to manipulate arbitrary ASTs, or are you talking about the AST required to represent Lisp code?
  Racket (a Lisp) and ML-family languages like Haskell are really good at manipulating arbitrary ASTs and similar structures. ML was even invented to do exactly that, the letters stand for meta-language.
  I do agree that the structures you need to represent Haskell would be rather complicated: Haskell is a rather complex language after all.
  I'm not sure what you mean by having to bootstrap meaningful things endlessly? I can understand that eg C is pretty limited, so you have to put in lots of effort to bootstrap to something meaningful. But most higher level languages would be doing just fine. And you can also use libraries:
  Eg no need to write your own parser from scratch, if you can just use a parser combinator library like parsec. Also no need for a beginner to write native code generation from scratch: just use a library to interface with llvm.
  
  eru 3 years ago
  
  Yes, compilers at most have a bit of overlap with systems programming, but don't really have that much to do with it.
  The overlap comes from two places:
  (1) If you want to generate fast code, you need to know how your target works. That's pretty low level, and could be considered systems programming?
  (2) Many people want compilers themselves to run fast. Here again, the way to get the most speed out of your programs is often to go low level (and again we could see low level as a synonym for systems programming?)
  
  pjmlp 3 years ago
  
  At which level would you place the OS linker required by the compiler?
  Given that Bastion was written in XNA and was widely successful, then from your list C# is a systems programming language.
  Or maybe Java is one, given Minecraft and bare metal deployments like those sold by PTC and Aicas.
  
  amelius 3 years ago
  
  > Compilers are their own beast — I wouldn't put them with systems code.
  Unless your systems code runs using a JIT.
  
  tialaramex 3 years ago
  
  Beginning with machine code for some simple architecture (maybe RISC-V these days?) might be one good route in.
  I can also see (having experienced it myself, albeit I already knew C etc. these were not requirements and many of my classmates did not) beginning with a pure functional language where all the practicalities are abstracted entirely.
  Today the University where I learned this begins with Java, which I am confident is the wrong choice, but the person who part-designed their curriculum, and is a friend, disagrees with me and he's the one getting paid to teach them.
- thejosh 3 years ago
  
  Rust is a hard language to learn, but once I got over the initial hump of learning all the things that don't exist in higher languages/that I've forgotten about it's a great language.
  The best thing I did was start doing minor contributions to opensource Rust libraries, I did a small PR for arrow2 and polars which taught me a lot, and also the maintainers and contributors there are incredibly friendly and helpful.
- jollybean 3 years ago
  
  'Becoming familiar with Rust, is one of the main benefits of learning Rust'
  The problem is that we seem to believe there is something 'inherent' about the way Rust solves these problems, and that by doing so with Rust lifetimes, we solve some kind of inherent issue in software.
  This makes us believe that Rust = Value in some ways which I believe is wrong.
  Rust is only 'one way' - and it may be far more complicated than it needs to be and in many cases more complicated than it is worth.
  I feel that Rust may be an experiment, our 'first attempt' at advanced compile time safety, like it's a kind of C++ whereas the Java/C# of memory safety will happen once we've ingested all of these lessons.
  Funny I feel that with Rust 'There's a much smaller and clearer language struggling to come out' like what Stroustrup said about C++.
  
  hedora 3 years ago
  
  Coming from a C++ background makes me think Rust is the smaller and clearer language that came out of C++.
  The only successful C++ code bases I've worked on were multithreaded, and most were also asynchronous.
  If you assume those design decisions are required, then you'll find that programmers have to reinvent the borrow checker in their head and manually apply it to code in whatever language they're using.
  That's incredibly boring and tedious, and people are demonstrably terrible at doing it. Rust automates away all that stuff.
  I don't know of any approach to threading or I/O that doesn't rely heavily on reasoning about ownership and mutability. In that sense there really is only one way to do it.
  
  eru 3 years ago
  
  > I don't know of any approach to threading or I/O that doesn't rely heavily on reasoning about ownership and mutability. In that sense there really is only one way to do it.
  Erlang's message passing style is interesting here. And so is eg Software Transactional Memory that you can try in Haskell.
  Have a look at deterministic parallelism, too.
  
  jcelerier 3 years ago
  
  > Erlang's message passing style is interesting here.
  it just hides who owns the message data object though, no ? The problem is still here.
  For instance, I work on C++ software where there is a lot of message passing across threads, but the OS thread on which "malloc" and "free" are called for the messages and any data they may contain really, really matters: some threads must never call "malloc" or "free" under any circumstance. Would I be able to manage this with Erlang, knowing that every thread should be able to send messages to pretty much any other?
  Also, looking at that code... https://www.erlang.org/blog/message-passing/ I don't really see what this gains me over the trivial C++ solution:
      processing_thread.message_queue.send([=] (Context& ctx) {
        int x = ... some computation ...
        ctx.ui_thread.message_queue.send([x] {
          ... show some info ...
        });
      });
  which can become even simpler with coroutines in C++20 if one does not like callback style
  
  eru 3 years ago
  
  > it just hides who owns the message data object though, no ? The problem is still here.
  Erlang messages have semantics as if they are copied. (Because that's the only way to do it over a network, and Erlang wants to be agnostic over whether two 'processes' are on the same machine or not.)
  Erlang doesn't need to worry too much about ownership, because it has a garbage collector.
  If you want to explicitly control which thread uses malloc/free, then Erlang isn't at the right level of language in this case.
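  For comparison, a rough Rust sketch of the same pattern (this is not Erlang, and `total_len_via_channel` is a made-up name): sending over a channel moves ownership instead of copying, so "who owns the message" is answered by the type system rather than hidden behind a GC.

  ```rust
  use std::sync::mpsc;
  use std::thread;

  // Spawns a worker that receives owned Strings and returns their total length.
  fn total_len_via_channel(messages: Vec<String>) -> usize {
      let (tx, rx) = mpsc::channel::<String>();

      let worker = thread::spawn(move || {
          // The receiving thread owns each message it pulls off the channel,
          // so each String is dropped (and freed) on this thread.
          rx.iter().map(|msg| msg.len()).sum::<usize>()
      });

      for msg in messages {
          // `send` moves the String; the sender cannot touch it afterwards.
          tx.send(msg).expect("worker hung up");
      }
      drop(tx); // closing the channel ends the worker's iterator

      worker.join().expect("worker panicked")
  }

  fn main() {
      let n = total_len_via_channel(vec!["ab".to_string(), "cde".to_string()]);
      println!("{n}");
  }
  ```

  Note that in this sketch the allocation happens on the sending thread and the free on the receiving one, which is exactly the kind of detail the malloc/free question above is about.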
  
  mwcampbell 3 years ago
  
  > Erlang messages have semantics as if they are copied. (Because that's the only way to do it over a network, and Erlang wants to be agnostic over whether two 'processes' are on the same machine or not.)
  I think that approach could be called pessimization. The assumption that some components are on the same machine, or better yet the same address space, unlocks some significant optimizations. Sometimes you can be sure that a particular system won't need to scale beyond one machine. And of course, the more efficient the implementation is, the bigger a system can get without having to scale beyond one machine. (Aside: I was reading earlier today about how Uber heavily uses microservices. I wonder if Uber could be a monolith running on one machine given the right architecture and implementation.)
  
  dralley 3 years ago
  
  Perhaps that's true, but the telecommunications industry is the heaviest user of Erlang and it seems to work fine for them. I'm assuming they optimize for "must not go down, EVER" over raw performance, and the Erlang approach works well for that.
  
  deepsun 3 years ago
  
  > I wonder if Uber could be a monolith running on one machine given the right architecture and implementation.
  You seem to simplify Uber's tasks to only ridesharing. But there are many other typical tasks for large tech orgs. For example (never worked at Uber, taking from my imagination):
  1. Handle ride-sharing business (the easiest part maybe).
  2. Make a pipeline to extract data for legal requests from courts, processing all Uber's historical logs.
  3. CEO asked to make a forecast (like chart), using billing data from their financial processor.
  4. Serve video and quizzes and check tests during driver's onboarding process.
  5. Monitor/alert internal network for network security purposes.
  6. Gather some analytics over Uber's cloud resources spent and give recommendations on where to optimize.
  Not sure how to pack those all into a monolith.
  
  eru 3 years ago
  
  > Aside: I was reading earlier today about how Uber heavily uses microservices. I wonder if Uber could be a monolith running on one machine given the right architecture and implementation.
  Throughput-wise, that might be possible for all I know.
  For latency reasons, you'd probably want to have at least one monolith per city. Similar objections apply for redundancy and resilience.
  Splitting your system into individual services ensures that there are well-defined interfaces between the different parts.
  In principle, you can always wring more performance out of a system, if you are allowed to break down these interfaces. But that makes maintaining and further developing these systems harder and harder to do for mere mortals.
  Have a look at how much of a kludgy mess our genome is for an example.
  ---
  In principle, you could write something as individual services but compile them into a single binary that runs in a single memory space on bare metal (see unikernel architecture). A sufficiently strong type system could make sure that any one service being buggy wouldn't bring down the rest of the application.
  You could probably even work on a system like the above that still allows you to replace individual parts without restarting everything else.
  This could be much faster and smaller than traditional micro-services that live in separated processes or even machines, but even this utopian system would still be hampered by the conceptual walls between services that can only talk to each other via well-defined interfaces. (Even if your compiler is smart enough to do a lot of inlining etc.)
  ---
  The discussion reminds me of exokernels. Have you heard of them?
  Basically the idea is that traditionally operating systems are supposed to have at least two functions: abstract hardware, and safely multiplex different uses and users.
  Serving two masters means serving neither of them well.
  So the exokernel people say: let's just use libraries for abstracting over hardware, and let the operating system present the hardware as raw as possible and concentrate on safely multiplexing.
  See eg https://www.classes.cs.uchicago.edu/archive/2019/winter/3310...
  
  jollybean 3 years ago
  
  "you'll find that programmers have to reinvent the borrow checker in their head and manually apply it to code in whatever language they're using."
  There are definitely other ways to do things in concurrent programming, such as isolating data structures to a single thread etc. and other forms of synchronization etc..
  
  verdagon 3 years ago
  
  This is actually what we're doing in Vale. [0]
  Rust is a stellar language, and there's a lot we as a field can learn from it. Its borrow checker is pretty amazing for optimization, but it can be a detriment for a program's overall architecture, not to mention the difficult learning curve, which people tend to underestimate as an issue.
  I want Rust to grow and succeed in the realms it's uniquely suited for, but I think we can make a general purpose programming language that combines its features in a new way and brings its strengths to the rest of the programming world.
  For example, we found a way to recreate the borrow checker based on regions [1] on top of a foundation of shared mutability with reference counting (or generational references [2] in Vale's case).
  Another example is inspired by Rust's RefCell; we found a way to decouple it from Rust's usual aliasability-xor-mutability rules to make it more flexible, in something we call Hybrid-Generational Memory. [3]
  If we succeed, then we'll have found a way to get the borrow checker's benefits without its complexity. We're a little over halfway done with implementing the region borrow checker [4] and haven't broken ground on HGM yet, but we're well on our way.
  Hopefully Vale will be the smaller, clearer language that's struggling to come out of Rust!
  [0] https://vale.dev/
  [1] https://verdagon.dev/blog/zero-cost-refs-regions
  [2] https://verdagon.dev/blog/generational-references
  [3] https://verdagon.dev/blog/hybrid-generational-memory
  [4] https://github.com/ValeLang/Vale/tree/master/Backend/src/reg...
  
  avgcorrection 3 years ago
  
  Big claims. Are all of these things as zero-cost as the things that Rust uses?
  
  verdagon 3 years ago
  
  Rust has some overhead for safety (bounds checking) and Vale will have some too (generation checking) but it's hard to say how much. If we get within even 2% of Rust, we'll call it a win for the simplicity we can bring to the programmer. Fingers crossed!
  
  amelius 3 years ago
  
  How efficient is this in a multithreaded environment? Can immutable data be accessed from multiple threads without much cost?
  
  verdagon 3 years ago
  
  Similar to Rust, each thread would have separate memory. Data would be lent or passed to other threads via channels, mutexes, or structured concurrency.
  
  amelius 3 years ago
  
  What if you have a large map that represents an in-memory read-only database, and you wish to share its contents between different threads, where each thread can perform complicated queries? Would you need to duplicate it?
  
  jollybean 3 years ago
  
  I can't speak for the author, but it would seem to me that some kind of mutex or whatever can be used in situations where you need it.
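  For what it's worth, in Rust itself (this says nothing about how Vale does it) a read-only map needs neither a mutex nor a copy: scoped threads can all borrow it. A small sketch, where `parallel_lookups` is a made-up helper:

  ```rust
  use std::collections::HashMap;
  use std::thread;

  // Runs one lookup per key on its own thread, all borrowing the same map.
  fn parallel_lookups(db: &HashMap<String, u32>, keys: &[&str]) -> Vec<Option<u32>> {
      thread::scope(|s| {
          // Spawn every lookup first so they can run concurrently.
          let handles: Vec<_> = keys
              .iter()
              .map(|&key| s.spawn(move || db.get(key).copied()))
              .collect();
          // Then collect the results in the original key order.
          handles
              .into_iter()
              .map(|h| h.join().expect("lookup thread panicked"))
              .collect()
      })
  }

  fn main() {
      let mut db = HashMap::new();
      db.insert("alice".to_string(), 30);
      db.insert("bob".to_string(), 25);
      let results = parallel_lookups(&db, &["alice", "carol", "bob"]);
      println!("{results:?}");
  }
  ```

  A mutex (or `RwLock`) only becomes necessary once some thread needs to mutate the map.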
  
  jollybean 3 years ago
  
  That is extremely cool, 'region locking' is something I've wondered about myself.
  Good luck.
  
  verdagon 3 years ago
  
  Thanks!
  
  amelius 3 years ago
  
  Yes. Rust doesn't even have a good GUI yet, which is telling because it clearly shows that while Rust may be good for some systems stuff, it may not be very good at everything else.
  
  dureuill 3 years ago
  
  I mean another explanation is that GUIs take a tremendous amount of time and investment, especially in today's environment.
  C++ was out in 1985, and qt's first version wasn't until 1995, with kde starting in 1996. And these were simpler times.
  I'm not at all convinced that lack of GUI is a rust specific problem.
  
  mwcampbell 3 years ago
  
  I think a more charitable interpretation is that the developers currently working on Rust GUI toolkits are still trying to figure out what a good Rust-native GUI looks like. I'm optimistic that the results will be worth the wait.
  
  jollybean 3 years ago
  
  It really doesn't make a whole lot of sense to build a UI in rust.
  UIs generally change a lot, errors aren't so critical, and there are types of abstractions and complexity that just don't exist at the lower layers.
  You could do some of the 'core' stuff in rust, but people building UIs would want to use a higher level language.
  Imagine writing websites in C. Why would you want to do that, even if you could (aside from novelty).
  It's quite beyond the scope of rust.
- bsder 3 years ago
  
  > I kind of feel like it goes without saying that Rust isn't ideal for beginners.
  I agree if you mean: "Beginners should be using Python, Ruby, Java, etc."
  People who suggest that "Beginners should be using C." have blocked out from their memory the trauma that is "Segmentation Fault (core dumped)."
  
  dralley 3 years ago
  
  Counterpoint: Beginners should use C just long enough to be traumatized by "Segmentation Fault (core dumped).", so that they can appreciate the affordances that higher level languages offer (but hopefully not long enough that they run away from the whole subject)
tialaramex 3 years ago

I think I'd recommend teaching Move semantics not Copy semantics from the outset, because Move semantics work fine everywhere in Rust and the Copy semantics are just an optimisation. As you've found, if you teach Copy then for types which aren't Copy you now need to teach Move.
Languages like Kotlin and Swift are doing a lot of lifting to deliver this behaviour for String, and of course they can't keep it up, so students who've done more than a little Kotlin or Swift will be aware of the idea of "reference semantics" in those languages where most of the objects they use do not have the behaviour they've seen in String which is instead pretending to be a value type like an integer.
Again, if you only teach Move, you're fine. After not very long a student will wonder how they can duplicate things (since they didn't know Copy), and you can show them Clone. Clone works everywhere. Is cloning a usize idiomatic Rust? No it is not. Does it work just fine anyway? Of course it does! And of course Clone is implemented for String, and for most types beginners will ever see.
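A short sketch of what that looks like in practice (`duplicate` is a made-up helper, not a std function): every assignment is read as a move, and `clone` is the one tool for "I need this twice", whether the type is `usize` or `String`.

```rust
// Works for any Clone type: returns two independent values.
fn duplicate<T: Clone>(value: &T) -> (T, T) {
    (value.clone(), value.clone())
}

fn main() {
    let n: usize = 7;
    let s = String::from("seven");

    let moved_n = n; // think of this as a move...
    let moved_s = s; // ...exactly like this one

    // Need the value twice? Clone it. Same call for both types.
    let (n1, n2) = duplicate(&moved_n);
    let (s1, s2) = duplicate(&moved_s);
    println!("{n1} {n2} {s1} {s2}");
}
```

Clippy will point out that cloning a `usize` is unnecessary, which makes a natural moment to introduce Copy.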
- hgomersall 3 years ago
  
  Are copy semantics always used in place of move semantics for a Copy type? I didn't know that.
  
  tialaramex 3 years ago
  
  Literally all that Copy does is it says after assignment the moved-from variable can still be used. So in this sense, sure, these semantics are "always used". But if you don't use the variable after assigning from it, you could also say the semantics aren't used in this case. Does that help? Copy does a lot less than many people think it does.
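  A minimal sketch of that difference, with made-up types `Point` and `Label`:

  ```rust
  #[derive(Debug, Clone, Copy, PartialEq)]
  struct Point {
      x: i32,
      y: i32,
  }

  #[derive(Debug, Clone, PartialEq)]
  struct Label {
      text: String,
  }

  // Only compiles because Point is Copy: `p` is used again after `let q = p`.
  fn reuse_after_assign(p: Point) -> (Point, Point) {
      let q = p;
      (p, q)
  }

  fn main() {
      let (p, q) = reuse_after_assign(Point { x: 1, y: 2 });
      println!("{} {} {} {}", p.x, p.y, q.x, q.y);

      let a = Label { text: String::from("origin") };
      let b = a; // same kind of assignment, but Label is not Copy
      // println!("{}", a.text); // would not compile: borrow of moved value `a`
      println!("{}", b.text);
  }
  ```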
  If you're a low level person it's apparent this is because Copy types are just some bits and their meaning is literally in those bits, Copy the bits and you've copied the meaning. Thus, this "it still works after assignment" Copy behaviour is just how things would work naturally for such types. But Rust doesn't require programmers (and especially beginners) to grok that.
  It's possible to explain Copy semantics first in a way that's easier to grasp for people coming from, say, Java, but that's only half the picture because your students will soon need Move semantics which are different. Thus I recommend instead explaining Move semantics from the outset (which will be harder) and only introducing Copy as an optimisation.
  I think this might even be better for students coming from C++, because C++ move semantics are a horrible mess, so underscoring that Move is the default in Rust and it's fine to think of every assignment as Move in Rust will avoid them getting the idea that there must be secret magic somewhere, there isn't, C++ hacked these semantics in to a finished language which didn't previously have Move and that's why it's a mess.
  I'm less sure for people coming from low-level C. I can imagine if you're going to work with no_std on bare metal you might actually do just fine working almost entirely with Copy types and you probably need actual bona fide pointers (not just references) and so you end up needing to know what's "really" going on anyway. If you're no_std you don't have a String type anyway, nor do you have Box, and thus you can't write Box<str> either, although &str still works fine if you've burned some strings into your firmware or whatever.
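  Where an allocator is available, the difference the thread title asks about is easy to poke at. A small sketch (`freeze` is a made-up name; the sizes printed are what current std gives, and the exact layout of String is not a documented guarantee):

  ```rust
  use std::mem::size_of;

  // Freezes a String into a Box<str>, giving up any spare capacity.
  fn freeze(s: String) -> Box<str> {
      s.into_boxed_str()
  }

  fn main() {
      let word = size_of::<usize>();
      // String carries pointer + length + capacity; Box<str> and &str
      // are fat pointers carrying pointer + length only.
      println!("String:   {} words", size_of::<String>() / word);
      println!("Box<str>: {} words", size_of::<Box<str>>() / word);
      println!("&str:     {} words", size_of::<&str>() / word);

      let mut s = String::with_capacity(64);
      s.push_str("hello");
      let boxed = freeze(s); // can no longer grow in place
      println!("{boxed} ({} bytes)", boxed.len());

      // Converting back gives a growable String again.
      let mut back: String = boxed.into_string();
      back.push('!');
      println!("{back}");
  }
  ```

  So Box<str> is an owned string that cannot grow in place, in exchange for dropping the capacity field.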
  
  afdbcreid 3 years ago
  
  This isn't really something you usually encounter, but I have to bring this cute example:
      pub fn foo() -> impl FnOnce() {
          let non_copy: String = String::new();
          let copy: i32 = 123;
          || {
              drop(non_copy); // Works
              drop(copy); // error[E0373]
          }
      }
  https://play.rust-lang.org/?version=stable&mode=debug&editio...
  
  nyanpasu64 3 years ago
  
  This is another reason that I find (optional but first-class) explicit lambda captures (adding syntax for C++'s approach) is better, because it prevents this kind of implicit and surprising behavior.
  
  nynx 3 years ago
  
  There’s no implicit behavior here. It won’t compile if you don’t explicitly move the string into the closure.
  
  nyanpasu64 3 years ago
  
  The implicit behavior is that using a string moves it into the closure, and does compile even without `move || {...non_copy}` (which surprised even me), whereas using an integer takes a reference, and you have to "slap on `move` to make it work".
  
  afdbcreid 3 years ago
  
  The surprising part is that it does not compile with `Copy`, not that it compiles with non-`Copy`. The rules are that if a move is necessary then it is performed. The tricky part is that a move is not necessary for `Copy` types, because you can capture a reference and still get an owned value.
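  A sketch of the usual fix, using a made-up `make_closure` that returns its captures so the effect is visible: adding `move` forces every capture to be by value, so the `Copy` integer is copied into the closure instead of borrowed from a stack frame that is about to go away.

  ```rust
  // `move` forces both captures to be by value, so the closure can outlive
  // this function. Without it, `copy` would be captured by reference.
  fn make_closure() -> impl FnOnce() -> (String, i32) {
      let non_copy: String = String::from("owned");
      let copy: i32 = 123;
      move || (non_copy, copy)
  }

  fn main() {
      let f = make_closure();
      let (s, n) = f();
      println!("{s} {n}");
  }
  ```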
lijogdfljk 3 years ago

Makes me wonder if there could be room for a SimpleString library.
I love/use Rust. I don't think any of this is complicated. BUT, i'm a big fan of just "clone your problems away" for beginner Rust users. Going knee deep into techniques which merely reduce memory usage when people likely don't actually care - at all - about it just feels wrong to me.
So yea, maybe a cursed library where SimpleString is just some niceties around some Cow + Arc thing which is also Copy. Hell, you could probably just apply it to Vec and who knows what else.
Anyway, clearly not something i'm advocating anyone _really_ use. But it seems a nice way to make stuff "Just Work" in the beginning.
- kzrdude 3 years ago
  
  Some weird construction around Cow + Arc that is also Copy is not really possible in Rust, I'm sorry to report. No way to implement it and even if you could (you technically "can" by reimplementing most of Cow and Arc) - the result is not useful, the destructor of it doesn't work.
  
  nyanpasu64 3 years ago
  
  You can't override moving to run a copy constructor, and this is usually a good thing, as much as having one would be convenient for Rc/Arc (where cloning is an incref rather than an actual deep clone).
  
  lijogdfljk 3 years ago
  
  Huh, i figured you could actually just implement `Copy` yourself (ie on the SimpleString). Can you not for this? Now you have me curious hah
  
  kzrdude 3 years ago
  
  You can't if the parts are not Copy. I wrote something hand wavy about reimplementing Cow/Arc but that doesn't really work either - atomics are not Copy, so that's another building block you need for Arc but can't have. Not mentioning the allocating bits because those are the obvious ones. Can't define a destructor for a Copy type, so literally nothing works.
  
  lijogdfljk 3 years ago
  
  TIL. I had just figured you could manually make an unsafe Copy impl or something haha.
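
  A small sketch of the constraint described in this exchange, using made-up types `Point` and `SharedText`. A struct can derive `Copy` only when every field is `Copy`; a struct holding an `Arc<str>` has to settle for `Clone`, and each clone is an explicit reference-count increment.

  ```rust
  use std::sync::Arc;

  // All fields are `Copy`, so the derive is accepted.
  #[derive(Clone, Copy, Debug, PartialEq)]
  pub struct Point {
      pub x: i32,
      pub y: i32,
  }

  // `Arc<str>` is `Clone` but not `Copy`, so this type can only derive `Clone`.
  // Adding `Copy` to the derive list here would be rejected by the compiler.
  #[derive(Clone, Debug, PartialEq)]
  pub struct SharedText {
      pub text: Arc<str>,
  }

  // Cloning bumps the reference count; the string bytes are not duplicated.
  pub fn share(original: &SharedText) -> (SharedText, usize) {
      let copy = original.clone();
      let count = Arc::strong_count(&copy.text);
      (copy, count)
  }
  ```

  So the closest thing to a "SimpleString" is probably a type like `SharedText` with `.clone()` calls written out, not an implicit copy.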
- nicoburns 3 years ago
  
  Isn't the SimpleString just String? Most of the complications disappear if you avoid str entirely. The one simplification which I would like to see added is support for string literals that produce a String. (s"foobar" syntax has been proposed).
- codedokode 3 years ago
  
  But Rust is designed to write high-performance code. If you don't care about performance, you don't really need Rust. Swift or Go seem more readable and easier to use.
  
  Santosh83 3 years ago
  
  Maybe, but Rust is also targeting higher levels of the stack like Webassembly, so I wouldn't say Rust is only meant for maximum bare metal performance/safety. Lots of higher level applications are also now being written (or rewritten) in Rust, so a canonical "simple string" crate won't be amiss I feel.
  
  solar-ice 3 years ago
  
  > Rust is also targeting higher levels of the stack like Webassembly
  The /reasonable/ Rust vision of Webassembly is "you write parts of your application in this when they need to be faster than javascript". You really shouldn't be writing your whole web application in Rust, frankly, there's no good reason to. (I write Rust for a living, and I reach for other languages the moment I'm trying to do Normal Web Stuff.)
  And if people are using wasm as a sandbox outside the browser, instead of a reasonable VM with all the garbage collection niceties, they're probably doing it wrong unless there is a reason to require developers to write low-level, performant code without a GC.
  
  howinteresting 3 years ago
  
  Swift is well-designed but is virtually non-existent outside of Apple platforms, so it doesn't have nearly the third-party ecosystem that Rust does. Go has the third-party ecosystem but is poorly designed and doesn't have basic language features like sum types.
  Rust is likely the best combination of thought-out design and ecosystem support that exists in a programming language today.
  
  pjmlp 3 years ago
  
  Rust is also pretty much focused on Linux workloads, mostly.
  Also the Apple ecosystem has plenty of third parties, including commercial libraries.
  
  jeroenhd 3 years ago
  
  Interestingly, Microsoft is also pushing Rust quite hard with special API packages, tutorials, and even some IDE integration. Windows tools are often closed source, though, so you'll probably never notice it if your favourite tool uses Rust or not.
  
  pjmlp 3 years ago
  
  What IDE integration? VSCode plugins aren't done by Microsoft, finally they got a spot on MSDN docs as of last VSCode release.
  Currently there are no plans for proper VS tooling on par with C++ across all Windows development workloads.
  If you mean Rust/WinRT, the new toy from the folks that produced C++/WinRT, after they successfully managed to kill C++/CX and turn the UWP development experience back to the glory days of Visual C++ 6.0 with ATL 3.0 (apparently very dear to their hearts), then it isn't something I look forward to ever using, if C++/WinRT is any indication of their understanding about developer productivity.
  Or do you mean the Azure Sphere OS, with the whole security marketing sales pitch while having a C-only SDK available, and only now evaluating if Rust is something they would support in addition to C.
  Working really hard? Doesn't look like it.
  
  mwcampbell 3 years ago
  
  > Rust/WinRT, the new toy
  First, it's now called Rust for Windows (or windows-rs), because the scope has expanded to include (fairly low-level) bindings for all the Win32 and COM APIs in the Windows SDK.
  I strongly object to calling it a toy just because it doesn't have certain RAD conveniences. I'm happily using that crate (though currently only the Win32/COM portion) in the native parts of a Windows app I'm working on [1]. Yes, I have to use a lot of unsafe when working with the Win32 and COM APIs, but I keep my unsafe blocks as small as possible, and try to do as much as possible using safe Rust abstractions like slices and the various string types.
  FWIW, windows-rs recently removed the bindings for the Windows.UI.Xaml namespace, on the grounds that they were nearly unusable, and because the in-box Windows.UI.Xaml is no longer Microsoft's recommendation for app developers. I think the latter policy is a mistake, because some of us actually want to produce something with a minimal memory footprint, using a GUI toolkit that's already in the working set on all Windows machines. But maybe someone can bring back Windows.UI.Xaml bindings in a separate crate.
  > they successfully managed to kill C++/CX
  IMO C++/CX was a mistake. It was an attempt to make C++ into something it was not. From what I've read [2], C++/CX spectacularly failed to uphold C++'s zero-overhead principle, and the code size of newer shell components in Windows 10+ suffered as a result. Yes, I felt some of the frustration of working with the current (or at least, mid-2020) tooling for C++/WinRT when I was mentoring an intern on the Windows accessibility team at Microsoft (which I left later that year). But I think the Windows developer platform team had to do what was right for runtime efficiency. Developers who value rapid development over runtime efficiency can still use C#.
  > the glory days of Visual C++ 6.0 with ATL 3.0
  They were glorious, for code footprint. But like you, I wouldn't want to write new code using ATL (and on the Windows accessibility team at Microsoft, we avoided that as much as practical). Luckily, C++/WinRT is a big improvement over ATL, thanks to modern C++. It's just a step backward from C++/CX for some RAD conveniences that were really a better fit for a language like C#.
  And to bring it back to Rust, I think Rust's macro system allows for some powerful high-level tooling on top of the WinRT bindings. For example, I think we could have a macro-based DSL for succinctly declaring a tree of UI elements, and that syntax could be more integrated with the rest of the language than the markup/code dichotomy that was introduced with WPF. Such tooling could also avoid the trap of boilerplate-heavy code generation that's typical of the Visual Studio wizards, and be available to developers who don't use VS.
  [1]: The non-native part is using Electron. I'd love to write the whole thing in Rust, but Electron in general, and access to Chromium's full WebRTC stack in particular, is just too convenient.
  [2]: https://devblogs.microsoft.com/oldnewthing/20220606-00/?p=10...
  
  pjmlp 3 years ago
  
  A wall of text, ignoring that C# developers do have to use C++/WinRT, as not everything is exposed. I guess WinDev is too busy writing IDLs with notepad like tooling.
  It is a toy when PAYING customers have to deal with that crap tooling versus C++/CX.
  And all the Rust stuff is not "hard at work" as suggested by OP, when it is a small subset of what Visual Studio has in the box for .NET languages and C++.
  
  howinteresting 3 years ago
  
  I'm not sure what you're talking about. Rust works great on Windows, unlike Go.
  
  pjmlp 3 years ago
  
  Only if the only thing you care about are CLI applications, same applies to Go.
  
  burntsushi 3 years ago
  
  > Rust is also pretty much focused on Linux workloads, mostly.
  This is blatantly false. Stop spreading misinformation.
  
  pjmlp 3 years ago
  
  Can you please prove me wrong with a nice set of COM components written in Rust?
  Or maybe a well-known watchOS or iOS app.
  Proper production stuff, not proof of concept or WIP random GitHub repos.
  
  zRedShift 3 years ago
  
  >iOS
  https://www.reddit.com/r/rust/comments/vj40qq/noumenal_my_3d...
  Does this count?
  And we’re soon shipping Rust components to several hundred thousands of iOS (and Android, but it’s technically Linux) users at my day job, so there’s that.
  As for Windows, I wouldn’t know, that’s your speciality.
  But I’m pretty sure there is quite a sizable amount of VSCode users on Windows, all of whom are running a certain Rust utility written by the person you’re replying to.
  
  pjmlp 3 years ago
  
  A headless library plugged into Electron, basically a CLI.
  I know pretty well to whom I have replied.
  Yes, Noumenal does count.
  
  nicoburns 3 years ago
  
  Dropbox ships Rust for core syncing functionality in their main desktop sync client across Windows, macOS and Linux.
  Also, this thread was recently posted about how easy it was to get a Rust script written on linux working on Windows (no work was required at all - it just worked) https://www.reddit.com/r/rust/comments/vl1xpg/short_story_of...
  
  pjmlp 3 years ago
  
  With an Electron based UI...
  
  nicoburns 3 years ago
  
  Well sure, I don't think anyone's claiming that Rust has a good UI story. But that's very different from saying it doesn't work well on Windows.
  
  burntsushi 3 years ago
  
  That's what we call moving the goalposts. You clearly have something more specific in mind, so say that, instead of generalizing and making shit up.
  I don't have an exact count, but my guess is that my software is deployed to millions of Windows workstations. That doesn't happen if Rust was "pretty much focused on Linux workloads, mostly."
  
  pjmlp 3 years ago
  
  Nope you asserted, I asked for clarifications.
  
  burntsushi 3 years ago
  
  You asserted with zero qualification: "Rust is also pretty much focused on Linux workloads, mostly."
  Which is, like I said, bullshit. You asked for clarification, and you got some. They don't address your use cases, which is fine and fair to call out, but they very clearly show that there is a focus on Windows, contrary to your very broad claim.
  
  mwcampbell 3 years ago
  
  Isn't Cloudflare using BoringTun in their Warp app? Rust might not yet be a good choice for the whole app, but IMO it's great for cross-platform non-UI code.
  
  steveklabnik 3 years ago
  
  Yes, they are.
  
  pjmlp 3 years ago
  
  Swift is pretty much about performance, as replacement for C, C++ and Objective-C in the Apple ecosystem, it is even on Apple's official sites.
  What Apple isn't willing to do is sacrifice productivity while achieving that goal.
  
  astrange 3 years ago
  
  Making memory management obnoxious by not handling copying for you doesn't really encourage performance. It just encourages people to prematurely optimize things that don't matter.
  
  lijogdfljk 3 years ago
  
  Okay, maybe (i disagree, but w/e) - but the point is for learning.
  You'd have a difficult time learning Rust with Go, hah.
lumost 3 years ago

Rust strings are difficult for others coming from statically typed and low level languages as well.
It’s one of the types programmers will most often encounter, and yet it’s one of the most obtuse topics within rust.
- k__ 3 years ago
  
  I remember strings being "not so easy" in C/C++ too.
  
  oconnor663 3 years ago
  
  I think the big differences are that copying and reference taking are automatic and invisible in C++. So a lot of APIs taking string or string& will "just work" for the beginners, and you can delay the part where you talk about how different those things are.
  This sounds like a minor difference, but I've met lots of developers who do meaningful work in C++ but who don't know what a copy constructor is. I get the impression that there's an enormous difference between being a C++ "user" vs a "library writer", because there's so much automatic stuff happening under the covers.
  Rust tends to have a bit less invisible complexity, I think, but some of that difference is just making the complexity visible (like reference taking), which effectively frontloads it onto beginners. It's a tough tradeoff.
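
  To make the "visible complexity" point concrete, here is a rough sketch with invented function names (`count_words`, `shout`, `demo`). Where C++ would pick between a reference and a copy from the parameter type alone, the Rust call site has to spell out `&` or `.clone()`.

  ```rust
  // Borrows the text; the caller keeps ownership.
  pub fn count_words(text: &str) -> usize {
      text.split_whitespace().count()
  }

  // Takes ownership; the caller must give up the String or clone it.
  pub fn shout(mut text: String) -> String {
      text.make_ascii_uppercase();
      text
  }

  pub fn demo() -> (usize, String, String) {
      let greeting = String::from("hello rust world");
      let words = count_words(&greeting); // explicit borrow, no copy
      let loud = shout(greeting.clone()); // explicit copy of the buffer
      (words, loud, greeting)             // `greeting` is still usable
  }
  ```

  Passing `greeting` to `shout` without the `.clone()` would also compile, but then `greeting` could not be returned afterwards, which is the part beginners tend to trip over.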
  
  rocqua 3 years ago
  
  I occasionally port my C code to C++ just for more ergonomic strings.
- jokethrowaway 3 years ago
  
  After haskell strings, rust strings actually felt reasonable
  
  eru 3 years ago
  
  Alas, Haskell's built-in strings are horrible. Anything serious would use Data.Text or Data.ByteString (depending on the situation).
  Unfortunately, most of the language and libraries got written before reasonable strings were available. So you still see lots of things taking linked-list strings.
  
  lumost 3 years ago
  
  Rust could probably benefit from a collections style reintroduction of String and maybe one or two other common data structures. To simplify the language.
  It’s not so much a language problem as a rough implementation of a hyper common data structure.
klabb3 3 years ago

Don't worry. As soon as you explain to them that appending to a PathBuf is O(1) amortized they'll come around, and it will scale much better for all their GB-sized file paths.
I guess this adds a prerequisite on complexity theory but nobody should go anywhere near advanced data structures like strings with less than a bachelor in CS.
xarope 3 years ago

I've not delved into Rust much, but strings in any language are "hard", because underlying the representation is an array/list/heap/vector/whatever, which means thread safety/garbage collection/cache coherency issues, etc.
Just that some languages make it "easy" until the footgun, whereas it seems like Rust presents the problems right at the start?
brundolf 3 years ago

I wrote a blog post that tries to bridge this gap: https://www.brandons.me/blog/why-rust-strings-seem-hard
astonex 3 years ago

I remember trying out some Rust years ago; I stopped after seeing how difficult String vs str vs &str was.
I think if I tried again today I could probably grasp it now I have a better understanding of concepts like string views, and encodings.
mlindner 3 years ago

I don't understand people's trouble with Strings. They're not any more difficult than what's in C++ and they have a lot fewer footguns comparatively.
- mytherin 3 years ago
  
  That's just not true. In C++ you can use std::string everywhere and everything will mostly "just work". In Rust there are 5+ string types that are each subtly different, support subtly different operations and cannot be used interchangeably. The standard library will hand you different types of strings depending on which functions you are calling, so you cannot avoid learning about all of these types either.
  Now in many ways the Rust design is nicer for low-level programming, since it allows you to avoid making copies all over the place, and has certain added safety benefits around e.g. unicode validation. But it is certainly way more difficult to use than std::string.
  When I was learning Rust it certainly took me a while to wrap my head around all of these string types and their various limitations - and I have a low-level programming background and understand exactly why all of these string types exist and why these limitations are there. I have to imagine this will be a major road-block to learning Rust for people from e.g. a Python or Javascript background who are not familiar with all these low-level details.
  
  afdbcreid 3 years ago
  
  It is more difficult for beginners. Experienced C++ programmers also manage their strings carefully. Rust just makes it explicit, which is a trade off it chooses in lots of cases (the most famous being ownership). It is a trade-off: it makes it harder for beginners but easier to use correctly.
agumonkey 3 years ago

Rust has one uphill battle in mainstream adoption: a lot of things only make sense if you have written bare metal code. If not, it can be very confusing.
Ericson2314 3 years ago

It's not good for beginners to get weird errors for file paths which aren't valid Unicode.
- mlindner 3 years ago
  
  Since when would a beginner be hitting file paths that aren't valid Unicode? The only systems that aren't Unicode are old legacy systems. (Or is even now Windows still broken?)
  
  burntsushi 3 years ago
  
  Neither Linux nor Windows require their file paths to be in any particular encoding. (With some restrictions, like no interior NUL bytes, among others.)
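
  A sketch of what that looks like from Rust, assuming a Unix target (it relies on the Unix-only `OsStrExt` trait, so it will not build on Windows). The helper name `describe` is made up. `Path::to_str` returns `None` when the path is not valid UTF-8, while `to_string_lossy` substitutes U+FFFD for the bad bytes.

  ```rust
  use std::ffi::OsStr;
  use std::os::unix::ffi::OsStrExt; // Unix-only extension trait
  use std::path::Path;

  // Returns (strict conversion, lossy conversion) for a path built from raw bytes.
  pub fn describe(raw: &[u8]) -> (Option<String>, String) {
      let path = Path::new(OsStr::from_bytes(raw));
      let strict = path.to_str().map(str::to_owned);
      let lossy = path.to_string_lossy().into_owned();
      (strict, lossy)
  }
  ```

  For example, `describe(b"caf\xE9.txt")` (a Latin-1 encoded name) yields `None` for the strict conversion, which is the case `Path` and `OsString` exist to handle.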

codedokode 3 years ago

Is there official documentation about what `str` (without an ampersand) is? For example, documentation [1] says that `str` is a "string slice" (without explaining what "string slice" means), and then goes on with a description of &str.

And a book on Rust [2] says:

> A string slice is a reference to part of a String

This seems wrong, because &str can reference static strings which are not String. And if str, or "string slice" is a "reference", then &str is a reference to a reference?

And later:

> The type that signifies “string slice” is written as &str

But the documentation said that "string slice" is str, not &str.

Also, I wonder, what do square brackets mean when they are used without an ampersand (as s[0..2] instead of &s[0..2])?

Also, is an ampersand in &str the same as an ampersand in &u8 (meaning an immutable reference to u8) or does it have other meaning?

[1] https://doc.rust-lang.org/std/primitive.str.html

[2] https://doc.rust-lang.org/book/ch04-03-slices.html#string-sl...

LegionMammal978 3 years ago

> Is there official documentation about what `str` (without an ampersand) is? For example, documentation [1] says that `str` is a "string slice" (without explaining what "string slice" means), and then goes on with a description of &str.
A `str` is really just a `[u8]` with extra semantics. Thus, a `&str` is really a `&[u8]`, a `&mut str` is a `&mut [u8]`, a `Box<str>` is a `Box<[u8]>`, etc. So we call it a "string slice", since it mostly acts like a regular `[T]` slice.
In general, the term "slice" can either refer to the unsized type `[T]` or the reference `&[T]`/`&mut [T]` interchangeably. You could also call the latter a "slice reference" where the distinction is important; e.g., a `Box<[T]>` would be a "boxed slice", while `Box<&[T]>` would be a "boxed slice reference" or "boxed reference to a slice". But most of the time, the correct meaning can be inferred from context.
> Also, I wonder, what do square brackets mean when they are used without an ampersand (as s[0..2] instead of &s[0..2])?
`s[0..2]` is a place expression that refers to the raw `str` subslice. But since `str` is an unsized type [0], it cannot appear on its own; it must appear behind some reference type. Thus, `&s[0..2]` creates a `&str`, and `&mut s[0..2]` creates a `&mut str`. However, the ampersand isn't always necessary: you can write `s[0..2].to_owned()` to use the `str` as a method receiver, which implicitly creates a reference.
[0] https://doc.rust-lang.org/book/ch19-04-advanced-types.html#d...
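
The points above can be sketched as follows. The function name `slices` and the sample text are only for illustration; note that range indices on a `str` are byte offsets and must fall on character boundaries.

```rust
pub fn slices() -> (String, Vec<u8>, bool) {
    let s = String::from("héllo");

    // `s[0..3]` is a place expression of type `str`; borrowing it gives `&str`.
    // "h" is 1 byte and "é" is 2 bytes in UTF-8, so bytes 0..3 are "hé".
    let prefix: &str = &s[0..3];

    // A method call auto-borrows the unsized `str`, so no `&` is needed.
    let owned: String = s[0..3].to_owned();

    // The bytes underneath are an ordinary `[u8]` slice.
    let bytes: Vec<u8> = prefix.as_bytes().to_vec();

    // Not every `[u8]` is a valid `str`: a lone 0xC3 is rejected.
    let invalid = std::str::from_utf8(&[0xC3]).is_err();

    (owned, bytes, invalid)
}
```

The "extra semantics" are the UTF-8 validity guarantee: `as_bytes` always succeeds, while going from bytes back to `str` has to be checked.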
ruuda 3 years ago

The & in &str is like the & in &[u8], str is like [u8] (an unsized type), not like u8. A &str is a "fat pointer" (pointer + length), unlike &u8 which is a regular "thin" pointer.
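
One way to see the thin vs. fat pointer difference is to compare sizes. This is a sketch, not a layout guarantee: the helper name `pointer_words` is made up, and the numbers are what current Rust produces on common targets.

```rust
use std::mem::size_of;

// Sizes in machine words (one word = size_of::<usize>() bytes).
pub fn pointer_words() -> (usize, usize, usize) {
    let word = size_of::<usize>();
    (
        size_of::<&u8>() / word,   // thin pointer: address only
        size_of::<&str>() / word,  // fat pointer: address + length
        size_of::<&[u8]>() / word, // fat pointer: address + length
    )
}
```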
avgcorrection 3 years ago

I don’t have a full explanation of it. But you always use `&str` because you always want to borrow (or lend out?) the string slice.
The doc says:
```
    A &str is made up of two components: a pointer to some bytes, and a length.
```
This must mean that `str` is the string proper. But you can’t use it like `String` since `str` is not owned. So passing `str` to a function doesn’t make sense. Another difference is that `str` has no capacity since it’s a slice and not a vector (seems like `String` is implemented as `Vec<u8>`).
I’ve always found it weird that if you make a short word for “String” and use a lower-case letter then it becomes the borrowed version.
- steveklabnik 3 years ago
  
  > I’ve always found it weird that if you make a short word for “String” and use a lower-case letter then it becomes the borrowed version.
  The reason for this is that String is a standard library type, but str is a primitive type in the language. So they follow those respective conventions.

sirwhinesalot 3 years ago

It's unfortunate that strings are badly named in rust. They got that better with Path and PathBuf.

str is fixed size, like a Java String

String is growable, like a Java StringBuilder

After that, we get into memory ownership, with &str not owning memory, and Box<str> owning memory, but you rarely need the latter, so it's really &str vs String that you need to care about.

EDIT: changed immutable to fixed and mutable to growable to better reflect the real difference, though typically you almost always use immutable &str and &mut String. I thank the commenters below for pointing it out, I don't want to make the problem even more confusing than it already is.
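
A short sketch of the fixed-size vs. growable distinction, with a made-up function name `grow_then_freeze`. It builds text in a `String`, then converts it to a `Box<str>` once it no longer needs to grow.

```rust
pub fn grow_then_freeze() -> (usize, Box<str>) {
    // `String` is growable: it can append in place, reallocating as needed.
    let mut buf = String::from("Box");
    buf.push_str("<str>");
    let len_before = buf.len();

    // `Box<str>` owns its bytes but has no spare capacity and no `push_str`.
    let frozen: Box<str> = buf.into_boxed_str();
    (len_before, frozen)
}
```

Both values own their heap memory; only the `String` can change length.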

_0w8t 3 years ago

String in Rust is very similar to std::string in C++, while str is std::string_view except it is safe to use.
StringBuffer in Java is not like String in Rust. In particular, one cannot pass a StringBuffer in Java to a function taking String, while both Rust and C++ allow implicitly converting a heap-backed string into the corresponding read-only view.
- sirwhinesalot 3 years ago
  
  Strings in Java own their memory, they aren't views, they're closer to Box<str>. That's why you can't implicitly convert a StringBuilder into one.
  I know this, I'm not the one you need to explain it to, it's Rust newbies. So many problems would have been avoided with Str/StrBuf or StrView/Str, but now the ship has sailed.
  
  rrobukef 3 years ago
  
  Strings in Java share their memory with other substrings of the same allocation. They are views.
  
  cesarb 3 years ago
  
  IIRC, that used to be the case, but recent Java releases changed it so that memory is no longer shared with substrings. The former behavior could cause some extreme memory leaks (unless you were very careful to always manually duplicate each substring); a one-character substring could keep a multi-megabyte memory allocation alive. See for instance https://stackoverflow.com/questions/33893655/string-substrin... which discusses this issue.
  
  masklinn 3 years ago
  
  That was changed years ago (the slicing optimisation was removed in Java 7 or something) because it caused too many memory leaks.
  It was also a (bad) optimisation, it was never part of the language model or semantics.
- kevincox 3 years ago
  
  But why is it &str not StringView? The main reason I see is that the lifetimes are more natural as &'a str instead of StringView<'a>.
  I don't think this inconsistency is worth it. It is really just a hack that reinterprets the (vtable, ptr) of an unsized type as (len, ptr). The only other types that do this are slices.
  Or maybe the real reason is that Rust can't be generic over lifetimes, so you would need StringView and StringViewMut. But even that it doesn't seem like a huge deal to me for a simpler "language" (my understanding is that these hacks are actually done in std, but it's been a while since I looked at the code).
- MrBuddyCasino 3 years ago
  
  They added the CharSequence interface to address this, both StringBuilder and String implement it. It's not widely used though.
Blikkentrekker 3 years ago

I find that this explanation does not do the topic justice.
The important part is that `str` is a dynamically sized type, as it's called. It is simply a region of memory, of any size, containing UTF-8. Since it is dynamically sized, various constraints are placed on it, which in practice come down to this: it can only really be passed around at runtime behind a pointer, and it is hard to put directly on the stack.
`String` is three words, two words are æquivalent to a “fat pointer” to a `str`, as in one word for the address, and the other for the size, which is how Rust deals with dynamically sized types in general, and the third word denotes the capacity of memory allocated to the `String` which it uses to know when to reallocate.
`str` is neither mutable nor immutable; that isn't part of its type. `&str` is immutable, and `&mut str` is mutable. It's perfectly possible in Rust to mutate a `str` if one obtains a mutable, or perhaps better called exclusive, reference to it somehow, but the mutations that can be performed are very limited since the size cannot easily grow.
This is where `String` comes in: it guarantees that the space after the `str` it points to, up to the size given by its “capacity” third word, is not used by anything else, and thus it can grow more easily through manipulations.
There are some limited mutation methods on `&mut str` in Rust, such as `make_ascii_uppercase`, which converts all lowercase ascii letters to uppercase, which is perfectly fine, since this operation is guaranteed to not ever increase the size of the `str`, but with unicode such a guarantee no longer applies and one needs a `String`.
That being said, yes, I would have favored for `String` to be called `StrBuf`, and `Vec` `SliceBuf` instead.
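
A rough sketch of both claims, with made-up helper names. The sizes are what current Rust produces on common targets rather than a documented guarantee, and `uppercase_in_place` only needs a `&mut str`, so it works on a `Box<str>` as well as on a `String`.

```rust
use std::mem::size_of;

// Sizes in machine words: Box<str> is pointer + length,
// String is pointer + length + capacity.
pub fn layout_words() -> (usize, usize) {
    let word = size_of::<usize>();
    (size_of::<Box<str>>() / word, size_of::<String>() / word)
}

// Mutates through an exclusive reference without changing the length.
// Non-ASCII characters are left untouched.
pub fn uppercase_in_place(text: &mut str) {
    text.make_ascii_uppercase();
}
```

A full Unicode uppercase can change the byte length, which is why `str::to_uppercase` returns a new `String` instead of mutating in place.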
- sirwhinesalot 3 years ago
  
  Sure, if you want to be truly specific about it and not do a Java analogy ;)
Arnavion 3 years ago

String used to be StrBuf first. The rename to String was intentional because String was the more commonly known name in other languages.
https://rust-lang.github.io/rfcs/0060-rename-strbuf.html
- sirwhinesalot 3 years ago
  
  Unfortunately, judging by the fact so many people are still confused about it, it was a mistake. Having a shorthand for something (str) and that thing (String) be different things was dumb, and someone brought that up in the discussion at the time but I guess hindsight is 20/20.
  C++ has std::string and std::string_view, which makes loads more sense.
  Java and C# have StringBuilder and String.
  Go has strings.Builder and string.
  Objective-C/Cocoa has NSMutableString and NSString.
  Ada has Unbounded_String, Bounded_String and Fixed_String for different use cases.
  Rust has by far the worst naming.
  
  oconnor663 3 years ago
  
  It has downsides, but it also has upsides. Calling it "String" doesn't tell you much about what exactly it does or how it relates to other types, but it does tell you "hey if you want to read and write text then you need to learn about this type". If it was called StrBuf or something, I might worry about new learners seeing that name and assuming it was some advanced concept they don't need yet.
  
  kzrdude 3 years ago
  
  I guess C++ has the best names after all, Rust should have emulated those (except it couldn't - string_view came after Rust and maybe even was inspired by Rust.)
  
  cmrdporcupine 3 years ago
  
  Chromium's C++ StringPiece dates back to at least 2012, and pretty sure Google had something similar (I forget the original name) to it in Google3's C++ base library which became abseil's string_view before that even.
  And Boost had string_ref at least as far back as 2013.
  https://chromium.googlesource.com/chromium/src/base/+/master...
  https://github.com/abseil/abseil-cpp/blob/master/absl/string...
  https://github.com/steinwurf/boost/commits/master/boost/util...
  All of this is arguably before Rust's influence would have been anything to speak of.
  
  berkut 3 years ago
  
  Google had StringPiece and LLVM had StringRef in 2012 and provided similar functionality.
  https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n34...
  No mention of Rust there I can see.
  
  astrange 3 years ago
  
  > Objective-C/Cocoa has NSMutableString and NSString.
  This (or value typed strings) is an important optimization that most other systems don't properly have. If you know the string is immutable, so it won't ever have to be resized, you can do a single allocation (string+buffer) instead of having an extra pointer dereference (string->buffer).
  
  afdbcreid 3 years ago
  
  I don't know Objective-C/Cocoa, but is it when you know the size at compile time? If yes, you can use an array.
  
  ratmice 3 years ago
  
  No it is when you know the size during runtime, or at time of construction.
  there is a 3rd class for constant strings known at compile time but it is mostly hidden behind @"" compiler syntax.
  
  astrange 3 years ago
  
  NS(Mutable)Strings aren’t “classes” but “class clusters”. Making them classes would be an inappropriate mix of interface and implementation. Of course, despite it being a bad idea, almost every other data structure library does it that way.
  Constant strings have the same memory layout as other strings more-or-less but have different layouts for ASCII and Unicode, and in memory immutable strings have different backing classes for tagged pointer storage and for UNIX paths.
  
  astrange 3 years ago
  
  “At compile time” is too strict; that’s a constant string. And an array isn’t a string, it’s an array.
  (A string is only a byte array because everything is a byte array.)
- howinteresting 3 years ago
  
  This was a mistake. Having str and StrBuf would have been significantly less confusing than str and String.
  
  steveklabnik 3 years ago
  
  I often joke that this is the only change I'd desire for a Rust 2.0.
  
  OJFord 3 years ago
  
  What about aliasing it, marking String as deprecated in docs, 'please use StrBuf'? (Clippy warning, etc.)
  
  steveklabnik 3 years ago
  
  In theory you could do something like this, but it would be a lot of churn for a questionable amount of gain. I probably wouldn't support it today; Rust is past being able to make these sorts of changes imho.
  
  howinteresting 3 years ago
  
  Could it be done as part of a new edition, with cargo fix to support it?
  
  steveklabnik 3 years ago
  
  Standard library can't vary per edition, generally speaking. Maybe there'd be a way to get around it in this case, I haven't thought much about it, honestly.
- lifthrasiir 3 years ago
  
  Note that this is a very old RFC and doesn't have much context and discussion compared to later RFCs. It is worthwhile to read the actual discussion that happened [1].
  [1] https://github.com/rust-lang/rfcs/pull/60
marcosdumay 3 years ago

> but you rarely need the latter
AFAIK, it's because people go with String when what they actually mean is Box<str>. Since they have similar costs, nobody ever sees the need to change it, and the String type does have a much better name.
But the need is there all the time. People just satisfy it differently.
- sirwhinesalot 3 years ago
  
  I think it's mainly because unlike Java, where a StringBuilder is effectively an optimisation over concatenating Strings, in Rust managing that memory would be a total pain, so you tend to keep the mutable thing around.
  Once that happens, Box<str> becomes kinda unnecessary. There are many cases where it would be the correct type, for example reading from a file in a read-only manner, but most of the time you're going to be doing something to that text, so it makes more sense to just load it up as a String already and avoid the unnecessary copy.
  Either way, it's mostly a naming problem. &str/String sucks :(
- afdbcreid 3 years ago
  
  It's not just the naming: I prefer String if I don't care about the data structure size, since converting a String to Box<str> can be costly (allocation and copy if the capacity isn't exact) and carrying a capacity is free pretty much all of the time.
aliceryhl 3 years ago

The difference has to do with ownership, and it has nothing to do with mutability. For both types, you can mutate them given a mutable reference, and you can't given an immutable reference.
For example, a `&mut str` can be modified via various methods such as `make_ascii_uppercase`.
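A small sketch of that point. The function `shout` is a made-up helper; `make_ascii_uppercase` is the std method mentioned above:

```rust
/// Upper-case ASCII letters in place through a `&mut str`.
/// Works for any owner that can lend out a `&mut str`.
fn shout(text: &mut str) {
    text.make_ascii_uppercase();
}

fn main() {
    let mut boxed: Box<str> = "hello".into();
    let mut string = String::from("hello");

    shout(&mut boxed);  // &mut Box<str> coerces to &mut str
    shout(&mut string); // &mut String coerces to &mut str

    println!("{boxed} {string}");
}
```

Both owners can be mutated in place. What `Box<str>` lacks is the ability to grow without reallocating.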
- sirwhinesalot 3 years ago
  
  Nope, not ownership either, Box<str> and String both own their memory, the difference is fixed size vs growable :)
  But you're right, I edited my post to reflect this, the Java analogy is pretty strained as it is.
  
  Macha 3 years ago
  
  I believe the parent poster was comparing &str and String, not Box<str> and String.
  
  sirwhinesalot 3 years ago
  
  Yeah in my original post (before the edit) I made a similar simplification with immutable vs mutable (since that's how they are commonly used), but the correct distinction is str vs String. You can have &str and &String and you can have &mut str and &mut String, and Box<str> and String. All cases exist.
nicoburns 3 years ago

Personally I'd prefer String/StringView (and potentially Path and PathView), but I guess that ship has sailed.
- bmacho 3 years ago
  
  Why StringView? What is that supposed to mean?
  
  nicoburns 3 years ago
  
  It’s a view into a string. Its relationship to a string would be similar to a database view’s relationship to a table.
  
  Thiez 3 years ago
  
  That seems like a lot of extra typing for the rest of our lives to prevent a one-time moment of confusion for newbies.

umanwizard 3 years ago

This might clarify the situation, for C or C++ folks:

    // heap-allocated, fixed-size
    struct BoxStr {
        unsigned length;
        // INVARIANT: this points to a heap allocation of length bytes, and is valid utf8
        unsigned char *data;
    };

    // heap-allocated, resizable
    struct String {
        unsigned length;
        unsigned capacity;
        // INVARIANT: heap allocation of capacity bytes, the first length of which are valid utf8
        unsigned char *data;
    };

Of course you could resize BoxStr, but only by reallocating `data` to the exact desired length every time, which will kill your asymptotic complexity.
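On the Rust side, the field counts can be checked with `std::mem::size_of`. This is a rough sketch assuming current compilers and mainstream targets; the helper `words` is made up, and the real fields are `usize`-sized rather than `unsigned`:

```rust
use std::mem::size_of;

/// Size of `T` measured in machine words (`usize`-sized units).
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    // Box<str> and &str are both a (pointer, length) pair.
    println!("Box<str>: {} words", words::<Box<str>>());
    println!("&str:     {} words", words::<&str>());
    // String adds a capacity field; std does not promise a field order.
    println!("String:   {} words", words::<String>());
}
```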

tylerhou 3 years ago
Is your first example really equivalent to Box<str>? I would have expected something like
```
    using BoxStr = std::unique_ptr<Str>;
```
where Str is defined as
```
    struct Str {
      size_t len;
      char data[];
    };
```
The difference is that the len is stored on the heap, and the data is stored inline with the length. Unfortunately C++ does not support flexible array members so this syntax is not actually valid.
Edit: Never mind, after reading the article, Rust does use the parent comment's representation, because Box holds a “fat” pointer to str, which stores its length on the stack. So BoxStr is the correct equivalent, because &[u8] is not equivalent to u8*, it’s equivalent to std::span<u8>.
- steveklabnik 3 years ago
  
  Your parent is correct, the length is stored alongside the pointer, not on the heap with its data. This is true for any "dynamically sized type," not just Box<str>. &str is also a (pointer, length) pair, for example.

jez 3 years ago

Do any of the string types in the Rust standard library implement the same sort of small string optimization that C++ libraries implement for std::string? (explained here[1])

Some quick searching turned up a few rust-lang internals posts and GitHub issues, but it was hard to see whether anything came of them.

I understand that it’s probably possible to implement a comparable String API in a crate that uses small string optimizations, but being able to avoid a dedicated crate makes interoperability with other libraries much easier.

[1] https://tc-imba.github.io/posts/cpp-sso/

steveklabnik 3 years ago

Rust's standard library strings cannot because of a specific API, as_mut_vec, which is incompatible with the internal representation necessary to do SSO.
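To illustrate: `String::as_mut_vec` lends out a `&mut Vec<u8>`, so there has to be a real `Vec<u8>` inside every `String` to lend. A sketch, where `push_ascii` is a made-up helper and not a std API:

```rust
/// Append a raw ASCII byte through the `Vec<u8>` backing a `String`.
/// `push_ascii` is an illustrative helper, not part of std.
fn push_ascii(s: &mut String, byte: u8) {
    assert!(byte.is_ascii(), "only ASCII keeps the string valid UTF-8");
    // SAFETY: a single ASCII byte is valid UTF-8 on its own, so appending
    // it to already-valid contents keeps the String valid.
    unsafe { s.as_mut_vec().push(byte) };
}

fn main() {
    let mut s = String::from("hello");
    push_ascii(&mut s, b'!');
    println!("{s}");
}
```

A string stored inline in the struct would have no `Vec<u8>` to hand out here without first moving its contents to the heap.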
- mwcampbell 3 years ago
  
  Do you think including this API was a mistake?
  
  steveklabnik 3 years ago
  
  No. There’s a discussion below about SSO. I generally agree with it. It’s a trade off, not a universal improvement. Keeping things basic for the basic case is a good thing.
  
  mwcampbell 3 years ago
  
  Yet the standard library's HashMap and BTreeMap are very sophisticated. IMO, the more optimizations are built into the standard library, the more effectively Rust can achieve its goal of empowering everyone to build reliable and efficient software.
  The as_mut_vec function doesn't even really prevent Rust's standard String type from implementing SSO. That function, which is already unsafe, could just come with a warning in the documentation that calling it will force the string into its heap-allocated form.
  But no, I don't care enough about this to do the work myself, so I'll say no more.
  
  steveklabnik 3 years ago
  
  > Yet the standard library's HashMap and BTreeMap are very sophisticated.
  "Sophistication" isn't really the argument I'm making, though I can see how you may think that. I should have said something more like "when a tradeoff isn't clear, you shouldn't make it, by default, for everyone." Because that's the issue here: it's not clear that SSO is universally an advantage. So forcing that on everyone isn't the right way to go, in my opinion. Reasonable people may differ, of course :)
  (It's also because this wouldn't end up enforcing SSO on String, but on all Vecs. While SSO is often useful for strings, it may not be for every vector that exists, so you'd also have to take that into consideration.)
  > could just come with a warning in the documentation that calling it will force the string into its heap-allocated form.
  If you made that change, it would then conflict with the name of the method: as_ is defined as a free conversion.
  
  mwcampbell 3 years ago
  
  > If you made that change, it would then conflict with the name of the method: as_ is defined as a free conversion.
  Ah, I wasn't aware of that convention. I'm sure that means I failed to do adequate research before jumping into this discussion.
  
  steveklabnik 3 years ago
  
  It's all good :) I think that may be a bit harsh on yourself. If you're curious: https://rust-lang.github.io/api-guidelines/naming.html#ad-ho...
aaaaaaaaaaab 3 years ago

https://github.com/rust-lang/rust/issues/20198
24bytes 3 years ago

https://github.com/ParkMyCar/compact_str
https://old.reddit.com/r/rust/comments/t33hxp/announcing_com...
edflsafoiewq 3 years ago

Not in std, no.

the__alchemist 3 years ago

I'm working on a PC-based configuration for a drone flight controller. PC-side is std Rust with a stack available. Firmware is `no-std`, running on a microcontroller. It has waypoints you can program when connected to a PC using USB. They have names that need to be represented as some sort of string.

I'm using `u8` arrays for the strings on both sides; seems the easiest to serialize, and Rust has `str::from_utf8` etc to handle conversion to/from the UI.

`String` is unsupported on the MCU side since there's no allocation. I find this low-level approach ergonomic given it's easy to [de]serialize over USB.
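Roughly what that looks like, as a sketch rather than the actual firmware code. `NAME_LEN`, `encode_name`, `decode_name`, and the zero-padding convention are all assumptions for illustration; on the `no-std` side `core::str` would replace `std::str`:

```rust
use std::str;

const NAME_LEN: usize = 16; // hypothetical fixed field width

/// Pack a name into a zero-padded fixed-size buffer.
/// Returns None if the name does not fit.
fn encode_name(name: &str) -> Option<[u8; NAME_LEN]> {
    let bytes = name.as_bytes();
    if bytes.len() > NAME_LEN {
        return None;
    }
    let mut buf = [0u8; NAME_LEN];
    buf[..bytes.len()].copy_from_slice(bytes);
    Some(buf)
}

/// Borrow the name back out, stopping at the first zero byte.
/// No allocation: the returned &str points into `buf`.
fn decode_name(buf: &[u8; NAME_LEN]) -> Result<&str, str::Utf8Error> {
    let end = buf.iter().position(|&b| b == 0).unwrap_or(NAME_LEN);
    str::from_utf8(&buf[..end])
}

fn main() {
    let buf = encode_name("Waypoint A").expect("name fits");
    println!("{:?}", decode_name(&buf));
}
```

The UTF-8 check happens once, at the boundary where bytes become a `&str`.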

dochtman 3 years ago

The tl;dr doesn't quite make sense to me. To me the core difference is that a Box<str> takes one less word on the stack, because by virtue of the str being immutable it doesn't need to track the capacity of the allocation as distinct from the length. This is analogous to Box<[u8]> vs Vec<u8> (and in fact those are the same data types except for the guarantee of valid UTF-8).
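The analogy can be made concrete with the std conversions between the two families. A sketch, with `to_bytes` and `to_text` as made-up helper names:

```rust
/// Drop the UTF-8 guarantee: the same allocation, now typed as plain bytes.
fn to_bytes(text: Box<str>) -> Box<[u8]> {
    text.into_boxed_bytes()
}

/// Regain the guarantee, which requires validating the bytes.
fn to_text(bytes: Box<[u8]>) -> Option<Box<str>> {
    String::from_utf8(bytes.into_vec())
        .ok()
        .map(String::into_boxed_str)
}

fn main() {
    let bytes = to_bytes("h\u{e9}llo".into());
    println!("{} bytes", bytes.len());
    println!("{:?}", to_text(bytes));
}
```

Going from text to bytes is infallible; going back can fail, which is the only real difference between the two types.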

tines 3 years ago

C++ programmer here: which one guarantees valid utf8, and why would a primitive container make guarantees about the values it's storing?
- pornel 3 years ago
  
  The guarantee exists to speed up UTF-8 processing, so that it can safely assume working with whole codepoints/sequences (without extra out of bounds checks for every byte) and to ensure you can always losslessly roundtrip every string to and from other Unicode encodings without introducing any special notion of a broken character. There's also a security angle in this: text-processing algorithms may have different strategies for recovering from broken UTF-8, which could be exploited to fool parsers (e.g. if a 4-byte UTF-8 sequence has only 3 bytes matching, do you advance by 3 or 4 bytes?).
  Having the "valid UTF-8" state being part of the type system means it needs to be checked only once when the instance is created (which can be compile-time for constants), and doesn't have to be re-checked later, even if the string is mutated. Unlike a generic bag of bytes, the public interface on string won't allow making it invalid UTF-8.
- Animats 3 years ago
  
  "str" and "String" guarantee UTF-8. To make a String from an array of bytes, call
  pub fn from_utf8(vec: Vec<u8, Global>) -> Result<String, FromUtf8Error>
  which consumes the input Vec and returns it unmodified (now wrapped as a String) if it's valid UTF-8, or reports an error if it's not. There are a number of related functions in this family, such as
  pub fn from_utf8_lossy(v: &[u8]) -> Cow<'_, str>
  which takes in a slice of bytes and checks if it's a UTF-8 string. If it is, it returns the original str. Otherwise it makes a copy with any errors replaced with the Unicode error character.
  Vec<u8> and array slices such as &[u8] are primitive containers - they can store any sequence of u8 values. String is more like an object with access methods.
- ntoskrnl 3 years ago
  
  > why would a primitive container make guarantees about the values it's storing
  If you know you have valid UTF-8, you can safely skip bounds checks when decoding a codepoint that spans multiple bytes.
- lifthrasiir 3 years ago
  
  Everything labelled as "string" is a valid UTF-8 string in Rust, and to my knowledge this decision was made very early in the history of Rust (before 0.1). Many "modern" languages (including modern enough C++) have a distinction between Unicode strings and byte strings, whatever they are called, and Rust just followed suit.
  
  M2Ys4U 3 years ago
  
  OsStr and OsString aren't necessarily UTF-8, the data for those are "in the operating system’s preferred representation".
  str/String and CStr/CString are defined as UTF-8, though.
  
  puffoflogic 3 years ago
  
  > in the operating system’s preferred representation
  This is unfortunately misleading and a common misconception about OsStr. The documentation now explains:
  > OsStr losslessly represents a borrowed reference to a platform string. However, this representation is not necessarily in a form native to the platform
  What this means is that valid sequences of Unicode scalars are encoded as utf8 in OsStr on both Windows and Linux. The difference between OsStr and str is that the former can round-trip with the native encoding; that means that for Windows, there's a special way it encodes unpaired surrogates (wtf8) and on Linux it's actually just an arbitrary byte sequence.
  This choice of representation means that on every platform: you can always borrow an OsStr from a str (as_ref) and you can sometimes borrow a str from an OsStr (to_str). This cross-borrowing wouldn't be possible if OsStr were UCS-2 on Windows.
- afdbcreid 3 years ago
  
  Note that if you don't want this guarantee, you can use [u8] and Vec<u8> (and Box<[u8]>). This does not support all string methods unfortunately, but there's work to do that.
tialaramex 3 years ago

One notable difference is that ToOwned for &str gives you a String, whereas ToOwned for &[u8] gives you a [u8] by cloning the slice you have.
In fact all four standard library types that are ToOwned without invoking Clone are more or less strings (str, CStr, OsStr, Path)
- afdbcreid 3 years ago
  
  What? No: https://doc.rust-lang.org/1.61.0/src/alloc/slice.rs.html#854.... ToOwned for [T] gives you Vec<T>.
  
  tialaramex 3 years ago
  
  Huh. TIL
  Actually now that I think harder about this, WTF did I think was really going on here before. If we clone the slice to make an array, where does the array go? We don't know how big that array is, so we can't put it on the stack.
  Yeah, that was crazy. Thanks for pointing it out.
- kzrdude 3 years ago
  
  Perspective on ToOwned: It's just Clone with an extension to a number of DST types that can't be Clone themselves. They are dynamically sized types, hence no surprise that they are string-like.
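A short sketch of where this sub-thread ends up. The generic helper `own` is made up; the `ToOwned` impls it exercises are the std ones:

```rust
use std::path::{Path, PathBuf};

/// Generic over any borrowed type that has an owned counterpart.
fn own<T: ToOwned + ?Sized>(borrowed: &T) -> T::Owned {
    borrowed.to_owned()
}

fn main() {
    let s: String = own("text");             // str  -> String
    let v: Vec<u8> = own(&b"bytes"[..]);     // [u8] -> Vec<u8>
    let p: PathBuf = own(Path::new("/tmp")); // Path -> PathBuf
    let n: i32 = own(&7_i32);                // Clone types own as themselves
    println!("{s:?} {v:?} {p:?} {n}");
}
```

The owned form of an unsized type has to live on the heap, which is why `[u8]` maps to `Vec<u8>` rather than to an array.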

OJFord 3 years ago

If OP is here, then in this listing:

    let boxed_str: Box<str> = "hello".into();
    println!("size of boxed_str on stack: {}", std::mem::size_of_val(&boxed_str));

    let s = String::from("hello!");
    println!("size of string on stack: {}", std::mem::size_of_val(&s));

I know it's not the point and doesn't make a difference, but you might want to make the two 'strings' the same (not with & without '!'), just to be clearer.

jollybean 3 years ago

I really loathe to read this.

I love the borrow checker, but I despise arbitrary complexity.

I feel that Rust is far more complicated than it needs to be, and that this is our 'first attempt' with borrowing and lifetimes, and that there is an 'easier way'.

Engineers love to solve problems, if we can solve the 'life time' issue, we 'feel' very accomplished. And I mean literally a 'feeling'.

We are not bound to business and customer success in the same way, and so we tend to value one issue over the other due to our instinct.

I'm still skeptical about the extra time being worth the added cost of development in so many cases.

hota_mazi 3 years ago

Rust just opens up the kimono and exposes bare metal details about strings.

A string with quotes goes into the executable.

A `String` exists at runtime and is an entire string.

An `&str` is a slice of a string: it points to an existing string for a certain number of characters.

Not all programs or programmers require this amount of knowledge, but if you want to understand programming at a deeper level, Rust is a great tool to dive into the deep end to understand complex topics about programming languages.
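A minimal sketch of those three cases; `first_word` is a made-up helper:

```rust
/// Return the first whitespace-separated word as a borrowed slice.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    let literal: &'static str = "hello world"; // bytes live in the executable
    let owned: String = literal.to_uppercase(); // built on the heap at runtime

    // Both calls borrow a slice; neither copies the text.
    let a = first_word(literal);
    let b = first_word(&owned);
    println!("{a} {b}");

    // Slice bounds count bytes, not characters: the accented letter
    // below takes two bytes, so the first two characters are bytes 0..3.
    let accented = "h\u{e9}llo";
    println!("{}", &accented[0..3]);
}
```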

FullyFunctional 3 years ago

This is missing a conversation about https://lib.rs/crates/compact_str (and a few alternatives like it). TL;DR: String takes the space of three pointers, that is, 24 bytes on 64-bit archs. compact_str fits up to 24 byte strings in the same space and reverts to String for longer strings.

ADD: that is, avoids heap allocation for those, unlike both Box<str> and String.

tialaramex 3 years ago

Box<str> is still going to be smaller if you know how big the text is because (unlike CompactString and String) it doesn't need to carry a capacity value. In exchange of course you can't append things to it (without re-allocating)
CompactString is a very clever† SSO implementation, and I'll remember it is there if I run into a situation where it might help but I firmly agree with Rust's choice not to implement the SSO optimisation in the standard library's String type.
† Storing 23 bytes of UTF-8 as one of several representations in a 24 byte data structure makes sense, you can see how to write a fairly safe SSO optimisation for Rust which does that, but the CompactString scheme relies on the fact Rust's strings are by definition UTF-8 encoded to squeeze the discriminant into the same space as the last possible byte of an actual UTF-8 string, so it can store a 24 byte value like "ABCDEFGHIJKLMNOPQRSTUVWX" inline despite also distinguishing the case where it needs a heap pointer for larger strings. That's very clever.
- rtfeldman 3 years ago
  
  > I firmly agree with Rust's choice not to implement the SSO optimisation in the standard library's String type.
  Out of curiosity, why is that?
  I don't know much about how or why that decision was made, but I'm curious.
  
  lifthrasiir 3 years ago
  
  SSO means that pretty much every string operation has multiple code paths, which can be highly unpredictable. Basically it is a trade-off between memory usage and performance, and the standard library is not really a good place to make that trade-off. By comparison, a lot of C++ code (still) copies strings all over the place for no good reason, so SSO in the standard library has a much greater appeal.
pornel 3 years ago

A nice thing is that all string types have &str as the lowest common denominator, so even if you use SSO or on-stack or any other fancy string type, it's automatically compatible with almost everything.
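For instance, a function written against `&str` accepts all of the std owners below without any conversion code. This is a sketch with a made-up function `byte_len`; third-party string types typically join in by implementing `Deref<Target = str>`:

```rust
use std::borrow::Cow;
use std::rc::Rc;

/// Accepts any string type that can lend out a `&str`.
fn byte_len(text: &str) -> usize {
    text.len()
}

fn main() {
    let literal = "abc";
    let string = String::from("abc");
    let boxed: Box<str> = "abc".into();
    let shared: Rc<str> = Rc::from("abc");
    let cow: Cow<str> = Cow::Borrowed("abc");

    // Deref coercion turns each `&Owner` into `&str` at the call site.
    let total = byte_len(literal)
        + byte_len(&string)
        + byte_len(&boxed)
        + byte_len(&shared)
        + byte_len(&cow);
    println!("{total}");
}
```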

Thorentis 3 years ago

This type of complexity is a huge downside for a language. A modern language, designed with all the prior knowledge we now have about language design, should not be this complicated for something as essential as a String.

For this reason, I do not like Rust. For similar reasons, I do not like Go. And I'm extremely disappointed that despite being developed so recently, they make these same mistakes.

I honestly think Python is one of the best languages (note I am strictly talking about the language, let's put aside performance). If there was work done to make it compilable I would be very happy. But even then, for my own uses this isn't an issue. I just wish there was something as fantastic in the system language space.

jmillikin 3 years ago

The complexity is inherent to the problem domain. Rust has many string types because the word "string" has been overloaded to mean any of dozens of different internal representations, each of which has different performance or correctness tradeoffs. The programmer's ability to optimize would be sharply limited if they couldn't choose between String, Box<str>, CString, OsString, and so on.
That's why, in my opinion, any attempt to create a "simple" systems language is doomed to failure. You can't have the simplicity of Python and also be able to manually assign values to specific registers. It's better to carve out niches in the design space: low-level (C / C++ / Rust), general-purpose (Java / C# / Go), scripting (Python / Ruby / Lua).
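A small sketch of how the invariants differ between some of these types, using only std. The helper names `fits_in_cstring` and `is_unicode` are made up:

```rust
use std::ffi::{CString, OsStr, OsString};

/// True if `text` can be handed to C as a NUL-terminated string.
fn fits_in_cstring(text: &str) -> bool {
    CString::new(text).is_ok()
}

/// True if a platform string happens to be valid Unicode.
fn is_unicode(os: &OsStr) -> bool {
    os.to_str().is_some()
}

fn main() {
    // String: must be valid UTF-8, but interior NUL bytes are allowed.
    let s = String::from("a\0b");
    // CString: no interior NUL allowed; a terminator is appended.
    println!("{}", fits_in_cstring(&s));
    // OsString: whatever the platform accepts; any str converts into it.
    let os = OsString::from("plain.txt");
    println!("{}", is_unicode(&os));
}
```

Each type encodes a different promise, so converting between them is exactly where the checks happen.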
saghm 3 years ago

To be fair, Python had to make a _ton_ of breaking changes to the way their strings worked that arguably are still reverberating in the industry (python2 is still not fully gone, although it's finally starting to fade away).
I'm also not convinced that it's possible to differentiate the performance aspects from the way that Rust does strings; unless you're willing to just sacrifice a decent amount of performance by default, I'm not sure there's any way to avoid multiple string types. I don't think that the Rust string API is any sort of platonic ideal for a systems language; I just haven't really seen any alternatives to needing both a mutable, heap-modified string type alongside an immutable reference string type that doesn't require allocation to create. C++ has `std::string_view` alongside `std::string`, Java has both `String` and `StringBuffer`; I think the pain in Rust has a lot more to do with the ambiguity when just saying the word "string" out loud and some confusion about how `str` essentially requires using some sort of reference, making `&String` a weird type that 99% of the time you ignore and the other 1% you need to work around due to some type inference in a closure or something.
unclad5968 3 years ago

What's the alternative? If there was just one string type, "str", then you would have to implement your own "String" anyway for any strings you intend to manipulate.
Computers are complex. Sometimes you need a heap allocated vector of bytes, and sometimes you need a heap allocated array of bytes.
- SAI_Peregrinus 3 years ago
  
  And sometimes you need to interact with an OS that uses ASCII for strings, and sometimes one that uses UTF-8, and sometimes one that uses UCS-2, and maybe (if you've been very bad and need to use an IBM Z) to one that uses EBCDIC or something even worse.
  C defines a string as just a bunch of bytes that represent some text (no embedded NULLs).
  Rust defines a string (in general) as just a bunch of bytes that represent some text in some encoding specified by the specific string type. So even without the stack vs heap distinction you'd still have tons of string types, because there are tons of string encodings.
Matl 3 years ago

> For similar reasons, I do not like Go.
Go is a GC'ed language that has a single string type btw.

n8henrie 3 years ago

Many thanks to OP for some real-life examples of using rust-lldb! I keep wanting to learn to use it, and I think this was really helpful.

sampo 3 years ago

Title is: What is Box<str> and how is it different from String in Rust?

dang 3 years ago

Fixed now. Thanks!