Generating Pixels One by One | Svelte Hacker News

Surely the ideal choice for the

> learned token embeddings

for the pixel intensities in M3 would be... a single scalar value between 0 and 1? (That is, a "vector" of length 1, consisting of just the original pixel intensity.)

Am I misunderstanding something? Or is this maybe a case of "Yes, you could just do that much simpler thing in this simplified example, but in general you need the complicated approach that we're trying to demonstrate"?

archermarks 6 months ago

Really nice article! This clarified a lot to me about how positional embeddings work and how other useful context might be embedded in other models.