To get this fast path down to about 1μs, you could do the following:
- Remove the tries. If the detection must work exactly as it does now, you can store a small array of the prefixes and suffixes and compare branchlessly against all of them. A comparison looks like a vector load, then `movemask(cmpeq(prefix, needle) | cmpeq(prefix, all_zeros)) == scalar_all_ones`, for the common case where the needle fits into a vector register (almost all words). edit: you can aggregate consecutive results of individual compares using cmov with the condition `this_mask == scalar_all_ones`, but a few branches on this would also be fine. This requires that the prefixes be sorted in descending order of length, so that the first match you find is the longest. (See the SIMD sketch after this list.)
- Don't heap allocate strings several times during the query.
- Use a cheaper hash than SipHash. A reasonable hash for short strings is two rounds of AES; ahash is an off-the-shelf hasher that does something like this.
- Use a hash table or hasher interface that lets you separate the hashing from the hash table lookups.
- Use a hash table that lets you prefetch the slot for a given hash. Compute your 3 hashes, prefetch the corresponding slots, then query the strings. If you get the chance to process more than one needle at a time, you can hide more latency here: the optimal stride for prefetching ahead of your queries into a large hash table is significantly more than 3. (A sketch of the hash-first, look-up-later pattern follows this list.)
- Handling of compound words is omitted for brevity, but the approach is similar to the above.
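
As a rough illustration of the first bullet, here is a minimal sketch of the branch-light prefix check, assuming x86_64 with SSE2, prefixes zero-padded to 16 bytes and sorted longest-first, and needles padded the same way; the function name and array layout are made up for illustration, not taken from the article:

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Returns the index of the first (i.e. longest, given the sort order) prefix
/// that matches the needle, comparing all 16 bytes at once. Uses only SSE2,
/// which is baseline on x86_64.
#[cfg(target_arch = "x86_64")]
unsafe fn longest_prefix_match(prefixes: &[[u8; 16]], needle: &[u8; 16]) -> Option<usize> {
    let needle_v = _mm_loadu_si128(needle.as_ptr() as *const __m128i);
    let zero = _mm_setzero_si128();
    for (i, prefix) in prefixes.iter().enumerate() {
        let prefix_v = _mm_loadu_si128(prefix.as_ptr() as *const __m128i);
        // A byte lane passes if it matches the needle or the prefix has already
        // ended (zero padding): cmpeq(prefix, needle) | cmpeq(prefix, all_zeros).
        let ok = _mm_or_si128(
            _mm_cmpeq_epi8(prefix_v, needle_v),
            _mm_cmpeq_epi8(prefix_v, zero),
        );
        // movemask == all-ones means every lane passed, i.e. the needle starts
        // with this prefix.
        if _mm_movemask_epi8(ok) == 0xFFFF {
            return Some(i);
        }
    }
    None
}
```

This version takes a branch per compare rather than aggregating results with cmov, which, as noted above, is fine for a handful of prefixes.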
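And a sketch of the hash-first, look-up-later pattern from the hashing bullets, assuming the `ahash` and `hashbrown` crates (`BuildHasher::hash_one` for standalone hashing, `HashTable::find` for probing with a precomputed hash); `Dict` and `contains_all` are hypothetical names, and the slot prefetch itself is omitted because it needs a table that exposes slot addresses:

```rust
use std::hash::BuildHasher;

use ahash::RandomState;
use hashbrown::HashTable;

/// Hypothetical dictionary wrapper, not the article's actual structure.
struct Dict {
    build_hasher: RandomState,
    table: HashTable<Box<str>>,
}

impl Dict {
    fn new(words: impl IntoIterator<Item = String>) -> Self {
        let build_hasher = RandomState::new();
        let mut table = HashTable::new();
        for w in words {
            let hash = build_hasher.hash_one(w.as_str());
            // `insert_unique` takes the precomputed hash plus a hasher used for
            // rehashing on resize.
            table.insert_unique(hash, w.into_boxed_str(), |v| build_hasher.hash_one(&**v));
        }
        Dict { build_hasher, table }
    }

    /// Hash every needle up front, then do the lookups. With a table that
    /// exposes slot addresses, the gap between the two passes is where the
    /// prefetches would go.
    fn contains_all(&self, needles: &[&str]) -> bool {
        let hashes: Vec<u64> = needles
            .iter()
            .map(|n| self.build_hasher.hash_one(*n))
            .collect();
        needles
            .iter()
            .zip(&hashes)
            .all(|(n, &h)| self.table.find(h, |v| &**v == *n).is_some())
    }
}
```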
The document splitting for 38M Hacker News posts also seems like it should complete in about a minute. I'm not sure what's going on there.
Title:
> 300μs typo correction for 1.3M words
Article:
> 300μs for correctly spelled queries and ~5ms/word for misspellings
Aren't these contradictory?
The important number that changed here was the drop from 30ms to 300μs for correctly spelled queries. 30ms for typos wasn't something we considered a problem, and improving that to ~5ms was really a side benefit of fixing things for correct spellings.
Additional latency for actual typos seems OK, but degradation for correctly spelled queries felt really bad. What really damaged the UX for us was latency spiking on search queries where it didn't need to.
Sure, but why does the title claim 300μs typo correction?
That's a good point, so we did a s/correct/detect/ on the title above. Thanks!
The title says 300μs typo detection.
How? If the query doesn't finish in 300μs, return error :)
Agreed. Confirming correct spellings is the base case of the system and what the title refers to, but that's not clear.
Also, 300µs _per word_, looking up a word in a corpus of 1.3M words. Not 300µs for correcting 1.3M words!
It's a little odd that they're using `lazy_static` now that OnceCell and LazyLock are in the standard library.
It's not that weird considering LazyLock has only been in stable Rust since 1.80.0, which is only ~6 weeks old!
It's a great tip though. As an out-of-the-loop, casual Rust programmer who has internalized using `lazy_static!` for this purpose, I probably never would have noticed this change short of `lazy_static` emitting compiler warnings directing me to the std lib equivalents (so... maybe never?).
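For reference, a minimal sketch of the std-only equivalent (stable since Rust 1.80), using a hypothetical `STOP_WORDS` set rather than anything from the article:

```rust
use std::collections::HashSet;
use std::sync::LazyLock;

// Built lazily on first access, just like the `lazy_static!` version,
// but with no external crate.
static STOP_WORDS: LazyLock<HashSet<&'static str>> =
    LazyLock::new(|| ["the", "and", "of"].into_iter().collect());

fn main() {
    assert!(STOP_WORDS.contains("the"));
}
```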
Tools like GPT and Copilot still generate `lazy_static` a lot, since it makes up the majority of the training data.
Yeah... We might still be kinda rust noobs lol. Should clean that up.