It's a specification of essentially a complex graph of mathematical operations. If there's a function called
def mult(a,b): return a*b
it's not much more informative to write:
def mult(activation_a, activation_b): return activation_a*activation_b
Many of these functions are not much more complex than that, and the names along with their comments are more than sufficient given familiarity with the literature. If you think familiarity with the literature is unreasonable, it's still not clear what could improve code like this in reasonable space. "This is a linear function, which means that it satisfies f(x+a)=f(x)+f(a)"? "This is the attention head, it acts as a mask on the sequence input"? It would be like complaining that someone made a tree class and didn't put a comment explaining what a leaf node is. Code readability always assumes some reader context and minimum pre-existing knowledge (as do all forms of technical communication).
When math is involved, it's much easier to read code with short variable and function names.