Sure! So it’s hopefully clear that the notion of constrained grammar is not novel (see every comment on here of people name-dropping their implementation from two months ago).
The novelty here is “instead of checking whether every token is allowed” to create a finite state machine that defines which tokens are allowable at each generation step. This lets them not check every token at every step.
The trick of creating an FSM to efficiently check next-token grammar is what allowed FlashText to run circles around standard regex stuff. Even FlashText guy acknowledged the shoulders he stood on, etc.
Let’s be super clear here, none of these standards apply when you’re building good ole libraries. But putting out a paper really elevates what you’re on the hook for. Most folks that write papers are dying to acknowledge the shoulders they stand on - it’s part of the toxic humility we all engage in.
Yep! But that’s sort of my point, and maybe this is just some misplaced academic shit of mine but if you’re going to write a paper then “easily had this idea and never discovered the paper” just doesn’t fly.
Almost all academic work is derivative tweaks of yesterday’s work, yet we still fall over ourselves to cite this stuff.
The novelty here is “instead of checking whether every token is allowed” to create a finite state machine that defines which tokens are allowable at each generation step. This lets them not check every token at every step.
The trick of creating an FSM to efficiently check next-token grammar is what allowed FlashText to run circles around standard regex stuff. Even FlashText guy acknowledged the shoulders he stood on, etc.
Let’s be super clear here, none of these standards apply when you’re building good ole libraries. But putting out a paper really elevates what you’re on the hook for. Most folks that write papers are dying to acknowledge the shoulders they stand on - it’s part of the toxic humility we all engage in.
Again - shill OSS all day - I’ll upvote it.