Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Sure! So it’s hopefully clear that the notion of constrained grammar is not novel (see every comment on here of people name-dropping their implementation from two months ago).

The novelty here is “instead of checking whether every token is allowed” to create a finite state machine that defines which tokens are allowable at each generation step. This lets them not check every token at every step.

The trick of creating an FSM to efficiently check next-token grammar is what allowed FlashText to run circles around standard regex stuff. Even FlashText guy acknowledged the shoulders he stood on, etc.

Let’s be super clear here, none of these standards apply when you’re building good ole libraries. But putting out a paper really elevates what you’re on the hook for. Most folks that write papers are dying to acknowledge the shoulders they stand on - it’s part of the toxic humility we all engage in.

Again - shill OSS all day - I’ll upvote it.



By "standard regex" stuff I take it you mean the standard regex stuff Python standard library comes with?

I mean going from standard regex to NFA to DFA is already more sophisticated than that one, it's _quite_ oldschool and gives you linear time matching: https://en.wikipedia.org/wiki/Thompson%27s_construction https://en.wikipedia.org/wiki/Powerset_construction

And what I mean to say by this as they could have easily have had this idea and never had discovered the whitepaper you referenced.


Yep! But that’s sort of my point, and maybe this is just some misplaced academic shit of mine but if you’re going to write a paper then “easily had this idea and never discovered the paper” just doesn’t fly.

Almost all academic work is derivative tweaks of yesterday’s work, yet we still fall over ourselves to cite this stuff.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: