Transformers Learn Shortcuts to Automata (arxiv.org)
37 points by bmc7505 on April 26, 2023 | 2 comments


This looks like a very interesting paper that takes the rare approach of actually trying to understand what all the cool new language models are doing at a fundamental level.

Does anyone with more knowledge of the relevant mathematics (group theory and so on) care to chime in?


This paper is a very good advertisement for Krohn-Rhodes theory, which shows how automata decompose into simpler automata. I think it's a somewhat obscure topic within math (among people who aren't semigroup theorists), so I was happy to be exposed to it.

It's a bit shocking that they got Transformers to actually learn the theoretical low-depth algorithms for simulating automata, but looking closer at their results, we can see that the parts I would intuitively expect to be hard to learn (e.g. parity) are fairly brittle.
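
For anyone wondering what a "low depth algorithm for simulating an automaton" could look like, here is a minimal sketch using parity as the example. It is not the paper's construction and involves no Transformer; it only illustrates the general idea that composing transition functions is associative, so the final state can be computed in roughly log2(n) rounds of pairwise combination instead of n sequential steps. The names `TRANSITIONS`, `compose`, `run_sequential`, and `run_log_depth` are made up for illustration.

```python
# Parity automaton: states {0, 1}. Reading a 1 flips the state, reading a 0 keeps it.
# Each input symbol is stored as a transition map: a tuple where index = current
# state and value = next state. (Illustrative encoding, not taken from the paper.)
TRANSITIONS = {
    0: (0, 1),  # identity map
    1: (1, 0),  # swap map
}


def compose(f, g):
    """Return the transition map for 'apply f first, then g'."""
    return tuple(g[s] for s in f)


def run_sequential(bits, start=0):
    """Step through the input one symbol at a time (depth n)."""
    state = start
    for b in bits:
        state = TRANSITIONS[b][state]
    return state


def run_log_depth(bits, start=0):
    """Combine adjacent transition maps level by level (about log2(n) levels)."""
    maps = [TRANSITIONS[b] for b in bits]
    if not maps:
        return start
    while len(maps) > 1:
        # Pair up neighbours; order is preserved, so associativity is all we need.
        paired = [compose(maps[i], maps[i + 1]) for i in range(0, len(maps) - 1, 2)]
        if len(maps) % 2:
            paired.append(maps[-1])  # odd element carries over to the next level
        maps = paired
    return maps[0][start]


if __name__ == "__main__":
    bits = [1, 0, 1, 1]
    print(run_sequential(bits), run_log_depth(bits))
```

Both functions should agree on every input; the second one just reorganizes the work into a balanced tree, which is the kind of shortcut a shallow parallel model could in principle exploit.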



