I do something like this in Kyudosudoku[0] to store the undo history in localStorage, but I took it a few steps further than this author as I was pursuing different goals:
* I don't use bit or byte boundaries. If I need to store a field with 11 possible values, I go (n*11) + value. You'd think that would be a performance bottleneck, but it turns out major browsers are hella fast at this, at least when handling the amount of data in one Kyudosudoku undo item, which isn't a lot. The obvious downside is that you can't index into the data, you can only decode the whole thing, but I always want to decode a whole undo item at a time anyway.
* The main goal was for the representation to be as compact as possible so as to fill up as little of the user's localStorage as possible. Using hexadecimal for this is a massive waste. So I try to use the entire range of Unicode characters that are 16 bits in UTF-16 (that's the BMP minus surrogates, but to be safe I avoid the C0 control characters too). (I reserve the space character so that I can concatenate the entire undo history into a single string and still split it apart later without having to decode all of it.) This means I essentially encode the BigInt in base-63453. Again, dividing and moduloing huge numbers by 63453 seems like a performance nightmare, but it looks like modern browsers handle it fine, as the undo/redo feature is perfectly usable.
Clever, but it seems like you're still encoding states separately. For an undo/redo stack, you can probably do much better spacewise with some sort of delta-encoding, either by explicitly storing the changes made to the grid (or more accurately, the inverse of changes, so you can undo them) or by comparing the new state with the last one whenever saveUndo() is called and storing only the difference.
That is true, and I did think about that, but the current implementation is more than adequate for my years of playing Kyudosudoku, so I considered it not worth the implementation effort.
I'd be skeptical of the performance of any getBits() implementation. With the traditional approach ((X >> A) & B), you'll have to redundantly bit-shift all the data above the field you're aiming for. The other option would be to mask prior to the shift, but then you have a redundant copy of the data below the field.
Also, there's no great story for getting a large byte blob into or out of a BigInt, short of converting it all to ASCII hex and back. I ran into that limitation when attempting to use BigInts for arbitrary-precision arithmetic.
For simple data storage, I'd rather just use a Uint32Array and split/merge bits over element boundaries as appropriate.
For anyone interested in this space, Structurae[0] provides a well tested 'binary' protocol for efficiently storing and transmitting well-structured data, among many other useful and compact data structures (BigBitField, BitArray, etc).
According to the author that wasn't a design goal:
> While looking all this up, I found Justin Fagnani’s article Composite Map Keys in JavaScript with Bitsets, which is mostly unrelated to what I’m doing here
In fact, the author specifically wants his records to be mutable, which precludes using them as Set or Map keys (at least at the same time).
So immutable bigints seem like sort of the wrong tool for the job: changing any single field requires creating a completely new number (and while in theory a compiler might be able to optimize away the copy, in practice I doubt this happens).
Maybe just implement a hash function and some buckets and return an interned object? You could implement something with the same functionality but have a lot more flexibility if you’re not relying on cramming things into a bigint.
I suspect the "find all keys with identical values" example would be fastest if they'd use a struct-of-arrays approach where each key gets its own typed array and objects are identified by a unique index. Just iterate over each array and as soon as a value isn't eqoal to the next then that key is thrown out.
Does basically leverage the fact that JS VMs have access to all SIMD instructions for better optimizations and WASM does not? Otherwise, you can do something similar with array buffers, data views and a couple of fast WASM functions.
Looks very similar to Flatbuffer. It’s an interesting approach to store immutable data in a memory efficient way and it’s fast to serialise and deserialise.
I do something like this in Kyudosudoku[0] to store the undo history in localStorage, but I took it a few steps further than this author as I was pursuing different goals:
* I don't use bit or byte boundaries. If I need to store a field with 11 possible values, I go (n*11) + value. You'd think that would be a performance bottleneck, but it turns out major browsers are hella fast at this, at least when handling the amount of data in one Kyudosudoku undo item, which isn't a lot. The obvious downside is that you can't index into the data, you can only decode the whole thing, but I always want to decode a whole undo item at a time anyway.
* The main goal was for the representation to be as compact as possible so as to fill up as little of the user's localStorage as possible. Using hexadecimal for this is a massive waste. So I try to use the entire range of Unicode characters that are 16 bits in UTF-16 (that's the BMP minus surrogates, but to be safe I avoid the C0 control characters too). (I reserve the space character so that I can concatenate the entire undo history into a single string and still split it apart later without having to decode all of it.) This means I essentially encode the BigInt in base-63453. Again, dividing and moduloing huge numbers by 63453 seems like a performance nightmare, but it looks like modern browsers handle it fine, as the undo/redo feature is perfectly usable.
[0] https://kyudosudoku.timwi.de/
Clever, but it seems like you're still encoding states separately. For an undo/redo stack, you can probably do much better spacewise with some sort of delta-encoding, either by explicitly storing the changes made to the grid (or more accurately, the inverse of changes, so you can undo them) or by comparing the new state with the last one whenever saveUndo() is called and storing only the difference.
That is true, and I did think about that, but the current implementation is more than adequate for my years of playing Kyudosudoku, so I considered it not worth the implementation effort.
I wonder if it would be worth compressing the bigint as an Uint8Array
So app state -> bigint -> bytearray -> compressed bytearray -> bigint -> packed utf16 string.
Probably it would but help much
I'd be skeptical of the performance of any getBits() implementation. With the traditional approach ((X >> A) & B), you'll have to redundantly bit-shift all the data above the field you're aiming for. The other option would be to mask prior to the shift, but then you have a redundant copy of the data below the field.
Also, there's no great story for getting a large byte blob into or out of a BigInt, short of converting it all to ASCII hex and back. I ran into that limitation when attempting to use BigInts for arbitrary-precision arithmetic.
For simple data storage, I'd rather just use a Uint32Array and split/merge bits over element boundaries as appropriate.
For anyone interested in this space, Structurae[0] provides a well tested 'binary' protocol for efficiently storing and transmitting well-structured data, among many other useful and compact data structures (BigBitField, BitArray, etc).
[0]: https://github.com/zandaqo/structurae
How does this compare to using ArrayBuffer/DataView, which seems like the more obvious choice for this sort of thing?
This wouldn't require any bitwise operations to store/retrieve fields, assuming they are all byte-aligned.
They are more obvious, but it wouldn't work in Set or as Map keys, which seems to be a design goal here.
According to the author that wasn't a design goal:
> While looking all this up, I found Justin Fagnani’s article Composite Map Keys in JavaScript with Bitsets, which is mostly unrelated to what I’m doing here
In fact, the author specifically wants his records to be mutable, which precludes using them as Set or Map keys (at least at the same time).
So immutable bigints seem like sort of the wrong tool for the job: changing any single field requires creating a completely new number (and while in theory a compiler might be able to optimize away the copy, in practice I doubt this happens).
Oh yeah. Huh. I guess I read that wrong.
Maybe just implement a hash function and some buckets and return an interned object? You could implement something with the same functionality but have a lot more flexibility if you’re not relying on cramming things into a bigint.
I suspect the "find all keys with identical values" example would be fastest if they'd use a struct-of-arrays approach where each key gets its own typed array and objects are identified by a unique index. Just iterate over each array and as soon as a value isn't eqoal to the next then that key is thrown out.
Does basically leverage the fact that JS VMs have access to all SIMD instructions for better optimizations and WASM does not? Otherwise, you can do something similar with array buffers, data views and a couple of fast WASM functions.
Looks very similar to Flatbuffer. It’s an interesting approach to store immutable data in a memory efficient way and it’s fast to serialise and deserialise.