A downloadable analysis

When a network embeds a token, it has to be able to unembed it again. Giving the same task to a human would be like saying: "I am thinking of a word or part of a word. You can now decide on 700 questions that I have to answer to determine that word. Also, spelling and pronunciation don't exist; you have to find it through its meaning alone."
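
As a rough sketch of that round trip (all sizes here are hypothetical, and the matrices are random stand-ins rather than trained weights), the unembedding has to recover which token it was given nothing but those ~700 numbers:

```python
import numpy as np

# Hypothetical sizes: ~700 dimensions per token, a smallish vocabulary.
vocab_size, d_model = 10_000, 700
rng = np.random.default_rng(0)

W_E = rng.standard_normal((vocab_size, d_model))  # embedding matrix (random stand-in)
W_U = W_E.T                                       # tied unembedding, one common choice

token_id = 1234
x = W_E[token_id]                   # embed: the token becomes ~700 numbers ("answers")
logits = x @ W_U                    # unembed: score every vocabulary entry
recovered = int(np.argmax(logits))  # the network has to get token_id back from x alone
```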

Given that framing, there are a bunch of questions that it intuitively makes sense to ask (is it a noun? is it part of a name? is it about a place?), and we'd strongly expect to find a lot of those represented in the embedding.
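
One way to picture what "a question" means here: a direction in embedding space, with the answer read off as a dot product. This is only an illustration; the direction below is invented, whereas in a real model you would find it with something like a linear probe.

```python
import numpy as np

d_model = 700
rng = np.random.default_rng(1)

# One intuitive "question" = one direction in embedding space (made up here).
is_plural = rng.standard_normal(d_model)
is_plural /= np.linalg.norm(is_plural)

token_vec = rng.standard_normal(d_model)  # stand-in for a real token embedding

# The "answer" is how far the embedding points along that direction.
print(f"is-plural score: {token_vec @ is_plural:.2f}")
```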

Also, any computation the network does will cut into this space, so it shares dimensions with the semantic meanings and gives rise to new meanings like "this word is the third in the sentence".
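
A minimal sketch of that sharing, assuming learned-positional-embedding style and random stand-in vectors: the positional information is simply added into the same vector, so later layers have to read both kinds of meaning out of the same ~700 dimensions.

```python
import numpy as np

d_model, seq_len = 700, 16
rng = np.random.default_rng(2)

token_vec = rng.standard_normal(d_model)            # stand-in semantic embedding
pos_vecs = rng.standard_normal((seq_len, d_model))  # stand-in positional vectors

# "This word is the third in the sentence" now lives in the same 700 numbers
# as the word's semantic content; downstream computation reads both from here.
residual = token_vec + pos_vecs[2]
```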

All of this is more a property of the information theory of a language than of any specific transformer, so we'd expect overlap between transformers trained on the same language.

Download

Transformer dimensional analysis - How to use this document.pdf 36 kB

Install instructions

Download the PDF, then go to the link.
