The media engagement index is based on a user's past activity on a site, but Chrome has a special list of "preloaded" sites that are allowed to autoplay video even without any prior media engagement.
The preloaded list is in the source code (https://github.com/chromium/chromium/blob/master/chrome/brow...), but it's encoded as a finite state automaton, which makes it a bit difficult to enumerate the list of whitelisted domains.
I made a small Python script to unpack the DAFSA in preloaded_data.pb.
Here is the code: https://gist.github.com/NeatMonster/e9cdb01441a3cd842e6a20fd...
And here is the plain-text list: https://gist.github.com/NeatMonster/e9cdb01441a3cd842e6a20fd...
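For anyone curious what "unpacking a DAFSA" involves, here is a minimal sketch of the idea on a toy automaton. The dict-based layout and the names EDGES, ACCEPTING and enumerate_words are made up for illustration; this is not Chromium's binary encoding and not the script linked above. Enumerating the list is just a depth-first walk that emits the accumulated prefix whenever it reaches an accepting node:

```python
from typing import Dict, Iterator, Set

# Toy DAFSA: node id -> {edge label: next node id}.
# Assumption for illustration only; NOT Chromium's on-disk format.
# Nodes 1 and 3 are shared by several words, which is what makes it a DAFSA
# rather than a plain trie.
EDGES: Dict[int, Dict[str, int]] = {
    0: {"a": 1, "b": 2},
    1: {".": 3},
    2: {"b": 1},
    3: {"com": 4, "org": 4},
    4: {},
}
ACCEPTING: Set[int] = {4}


def enumerate_words(
    edges: Dict[int, Dict[str, int]],
    accepting: Set[int],
    node: int = 0,
    prefix: str = "",
) -> Iterator[str]:
    """Yield every word the acyclic automaton accepts, via depth-first search."""
    if node in accepting:
        yield prefix
    # Sort the edges so the output order is deterministic.
    for label, nxt in sorted(edges[node].items()):
        yield from enumerate_words(edges, accepting, nxt, prefix + label)


if __name__ == "__main__":
    for word in enumerate_words(EDGES, ACCEPTING):
        print(word)
```

Because the graph is acyclic the walk always terminates; the real work in a decoder is parsing the packed byte format into something like EDGES first.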
One has to wonder whether they intentionally obfuscate this list. It sounds like they “trained” a browser, and captured the resulting state. I’m sure you can argue this makes things more fair (we trained it using real world behavior!), but I really can’t give them the benefit of the doubt anymore.
It's generated by a Python script [0] from a list of URLs, but the input list doesn't seem to be included in the Chromium source (only the binary output of this tool).
[0] https://github.com/chromium/chromium/blob/615d5eed47c10d8890...
> The pre-seeded site list is generated based on the global percentage of site visitors who train Chrome to allow autoplay for that site; a site will be included on the list if a sizable majority of site visitors permit autoplay on it. The list is algorithmically generated, rather than manually curated, and with no minimum traffic requirement. With the implementation of the autoplay policy for Web Audio in M71, Web Audio playback is also included in calculating the MEI score for a given site.
https://www.chromium.org/audio-video/autoplay/autoplay-pre-s...
Will this not have some kind of self-reinforcing behavior, as the measurements are biased towards sites that are currently unmuted by default?
According to the MEI document, it actively measures user behavior, and one of the most important measures is whether a video is unmuted. From the document:
“The MEI is meant to allow media heavy websites (e.g. YouTube, Netflix) that rely on autoplay for their core experience. It is a non-goal to allow websites with a “good media behaviour” to autoplay without restrictions”
It doesn’t sound too good, and still doesn’t really explain how everything is seeded.
If it's an FSA, can someone at least convert it to a regular expression or some other more readable format?
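As a rough sketch of what that conversion could look like: for an acyclic automaton, a regular expression falls out of a simple recursion over the edges. The toy EDGES/ACCEPTING data and the to_regex helper below are hypothetical, not Chromium's format, and the code assumes every node can reach an accepting node.

```python
import re
from typing import Dict, Set

# Toy acyclic automaton: node id -> {edge label: next node id}.
# Assumption for illustration only; NOT Chromium's encoding.
EDGES: Dict[int, Dict[str, int]] = {
    0: {"a": 1, "b": 2},
    1: {".": 3},
    2: {"b": 1},
    3: {"com": 4, "org": 4},
    4: {},
}
ACCEPTING: Set[int] = {4}


def to_regex(edges: Dict[int, Dict[str, int]], accepting: Set[int], node: int = 0) -> str:
    """Build a regex for the words accepted from `node`.

    Assumes the automaton is acyclic and trimmed (no dead-end nodes).
    """
    branches = [
        re.escape(label) + to_regex(edges, accepting, nxt)
        for label, nxt in sorted(edges[node].items())
    ]
    if not branches:
        return ""  # leaf node: nothing more to match
    body = "(?:" + "|".join(branches) + ")"
    # An accepting node with outgoing edges may also stop here.
    return body + "?" if node in accepting else body


if __name__ == "__main__":
    pattern = to_regex(EDGES, ACCEPTING)
    print(pattern)
    print(bool(re.fullmatch(pattern, "bb.org")))
```

Shared suffixes get duplicated in the output, so for a large list the regex would likely be far less readable than the plain-text enumeration.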
Is there no way to decode it?
Take a list of the top X websites and feed every one of them into it.
The preimage space is finite and easily enumerated.
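A minimal sketch of that brute-force approach, assuming you can only ask "does the automaton accept this string?". Everything here (build_trie, accepts, recover, the example domains) is hypothetical; a plain trie stands in for the real automaton.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Automaton = Tuple[Dict[int, Dict[str, int]], Set[int]]


def build_trie(words: Iterable[str]) -> Automaton:
    """Build a character trie (an uncompressed stand-in for the DAFSA)."""
    edges: Dict[int, Dict[str, int]] = {0: {}}
    accepting: Set[int] = set()
    for word in words:
        node = 0
        for ch in word:
            if ch not in edges[node]:
                new_id = len(edges)
                edges[new_id] = {}
                edges[node][ch] = new_id
            node = edges[node][ch]
        accepting.add(node)
    return edges, accepting


def accepts(automaton: Automaton, word: str) -> bool:
    """Membership test: follow one edge per character from the root."""
    edges, accepting = automaton
    node = 0
    for ch in word:
        nxt = edges[node].get(ch)
        if nxt is None:
            return False
        node = nxt
    return node in accepting


def recover(candidates: Iterable[str], is_listed: Callable[[str], bool]) -> List[str]:
    """Keep only the candidate domains the membership oracle accepts."""
    return [c for c in candidates if is_listed(c)]


if __name__ == "__main__":
    hidden = build_trie(["example.com", "video.example"])  # pretend this is opaque
    top_sites = ["example.com", "example.org", "video.example", "news.example"]
    print(recover(top_sites, lambda d: accepts(hidden, d)))
```

This only recovers entries that happen to be in the candidate list, so anything outside the top X would be missed; walking the automaton directly gives the complete list.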
neatmonster wrote a script to decode the list and then shared links to the results here:
https://news.ycombinator.com/item?id=24819473