Hacker News | cyclecycle's comments

This has become such a problem in scholarly publishing that we've spent the last couple of years building a business that provides citation checking: https://groundedai.company/

What’s the hallucination rate of your AI?

So far we basically just take a very rule-based approach and try not to use LLMs as much as possible. We extract and parse the citations using various ML and rule-based approaches, run a set of predetermined queries, and apply several fuzzy-matching approaches to the metadata components. On top of that we have a set of rules about risk levels: what we should have found or matched given the type of source, which venue we should have found it in, etc.

So there are absolutely a bunch of tasks that could be evaluated/benchmarked, but "hallucination rate" isn't a particularly applicable or interesting metric for how good the tool is.
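As a rough illustration of per-component fuzzy matching feeding a risk level, here is a minimal Python sketch. It is not Grounded AI's implementation: the `CitationMeta` record, the `difflib`-based scoring, the 0.9/0.5 thresholds and the `EXPECTED_IN_INDEX` table are all hypothetical, chosen only to show how component scores and the source type could combine into a risk label.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class CitationMeta:
    # Hypothetical record for one parsed reference (authors as surnames only)
    title: str
    authors: list[str] = field(default_factory=list)
    year: int | None = None
    venue: str = ""


def normalise(text: str) -> str:
    # Lowercase, turn punctuation into spaces, collapse runs of whitespace
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    # difflib ratio in [0, 1]; 1.0 means the normalised strings are identical
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def compare(cited: CitationMeta, found: CitationMeta) -> dict[str, float]:
    # Score each metadata component separately so a mismatch can be localised
    cited_authors = {normalise(a) for a in cited.authors}
    found_authors = {normalise(a) for a in found.authors}
    union = cited_authors | found_authors
    return {
        "title": similarity(cited.title, found.title),
        "authors": len(cited_authors & found_authors) / len(union) if union else 0.0,
        "year": 1.0 if cited.year is not None and cited.year == found.year else 0.0,
        "venue": similarity(cited.venue, found.venue),
    }


# Hypothetical rule table: would we expect an index lookup to find this source type?
EXPECTED_IN_INDEX = {"journal-article": True, "conference-paper": True, "report": False}


def risk_level(source_type: str, scores: dict[str, float] | None) -> str:
    expected = EXPECTED_IN_INDEX.get(source_type, False)
    if scores is None:
        # Nothing found: suspicious mainly when this type of source should be indexed
        return "high" if expected else "medium"
    if scores["title"] >= 0.9 and scores["authors"] >= 0.5:
        return "low"  # title and authors agree well enough
    if scores["title"] >= 0.9:
        return "medium"  # right title, wrong authors: a possible "Frankenstein" citation
    return "high" if expected else "medium"


cited = CitationMeta("Attention is all you need", ["Vaswani", "Shazeer"], 2017, "NeurIPS")
found = CitationMeta(
    "Attention Is All You Need.",
    ["Vaswani", "Shazeer", "Parmar"],
    2017,
    "Advances in Neural Information Processing Systems",
)
scores = compare(cited, found)
print(risk_level("conference-paper", scores))  # low
```

Scoring components separately, rather than one similarity over the whole reference string, is what lets a rule distinguish "not found at all" from "found, but with someone else's author list".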

That said, we do use various LLMs (mostly local, small, fine-tuned models for things like NER, parsing, metadata comparison, etc.), and they can and do hallucinate. But we put very hard constraints on the validation: for example, any extraction result that doesn't match 1:1 back to the input text is discarded. So again, rather than measuring hallucination risk, we prefer hard constraints.
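A minimal Python sketch of that kind of hard constraint, assuming the model returns a flat mapping from field name to extracted string. The function names, field names and example reference are made up for illustration; the only rule it encodes is the one stated above: a value that cannot be found verbatim in the input text is discarded.

```python
from __future__ import annotations


def locate(source: str, value: str) -> tuple[int, int] | None:
    # (start, end) character span of value in source, or None if it is absent
    if not value:
        return None
    start = source.find(value)
    return None if start == -1 else (start, start + len(value))


def keep_grounded(source: str, extracted: dict[str, str]) -> dict[str, tuple[int, int]]:
    # Keep only fields whose value appears verbatim (1:1) in the input text
    spans: dict[str, tuple[int, int]] = {}
    for label, value in extracted.items():
        span = locate(source, value)
        if span is not None:
            spans[label] = span
    return spans


# Made-up reference string and model output, for illustration only
ref = "Smith, J. (2019). Deep citation graphs. Journal of Imaginary Results, 12(3), 45-67."
model_output = {
    "author": "Smith, J.",
    "year": "2019",
    "title": "Deep citation graphs",
    "journal": "Journal of Imaginary Research",  # not in the input, so it is dropped
}
kept = keep_grounded(ref, model_output)
print(sorted(kept))  # ['author', 'title', 'year']
```

Returning character spans instead of the strings themselves means any later stage can point back to exactly where in the reference each value came from.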


Great to see this picked up.

Nick Morley from Grounded AI here (https://groundedai.company)

We ran the analysis in collaboration with Nature :)


Nick Morley from Grounded AI here (https://groundedai.company)

We collaborated with Nature to study the extent of fake/"Frankenstein" citations in the scholarly literature from the top five publishers (Springer, Elsevier, Wiley, Sage, Taylor & Francis).

We estimate that hundreds of thousands of papers in 2025 were affected by hallucinated citations.

As part of the work, we analysed 20k papers generated with the ChatGPT API to figure out which citation errors are characteristic of generative AI use, and used that to classify the errors we saw in the wild.

The world's gone mad, publishing is in a nuts state, the training data is poisoned!


Yeah, this makes sense. We run a citation verification service and give publishers data along the lines of "this citation could be fake", but we don't currently capture any "action" or "measured result", so I guess that's what we need to expand into next.


Trying to think about our best chance of making a lasting contribution (therefore actually surviving?) as a startup.

I'm a big fan of focusing on verification tools ("Verification is all you need"?) but who knows.

I'm sure I'm missing something in my perspective. Here to expand my aperture.


A classic case.

I work on Veracity (https://groundedai.company/veracity/), which does citation checking for academic publishers. I see stuff like this all the time in paper submissions. Publishers are inundated.


Don’t publishers ban authors who attempt such shenanigans?


That's basically what we're doing with app.studyrecon.ai.

What we've found is that vector similarity is often not the final solution. It is still only a crude proxy for the true goal: how informative or useful a result is in relation to the user's goal/query. It works okay, but we're definitely seeing a need for more rigorous LLM post-processing to enrich the result set.

Which, yes, means the time adds up quickly!
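The two-stage pattern described above (vector similarity to shortlist, then a more expensive pass to judge usefulness) can be sketched in Python roughly as follows. This is not how app.studyrecon.ai works internally: `judge` is a hypothetical stand-in for an LLM call, replaced here by a toy word-overlap score so the example runs on its own, and the 2-D vectors stand in for real embeddings.

```python
from __future__ import annotations

import math
from collections.abc import Callable


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def shortlist(query_vec: list[float], docs: dict[str, list[float]], k: int) -> list[str]:
    # Stage 1: cheap vector similarity, used only to pick k candidates
    return sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)[:k]


def rerank(query: str, candidates: list[str], judge: Callable[[str, str], float]) -> list[str]:
    # Stage 2: re-score each candidate with a costlier judge (e.g. one LLM call each);
    # if the calls run one after another, latency grows roughly linearly with k
    return sorted(candidates, key=lambda c: judge(query, c), reverse=True)


def overlap_judge(query: str, doc: str) -> float:
    # Toy stand-in for an LLM usefulness rating: count shared words
    return float(len(set(query.split()) & set(doc.split())))


docs = {
    "survey of citation errors": [1.0, 0.0],
    "citation errors in generated text": [0.9, 0.1],
    "protein folding": [0.0, 1.0],
}
top = shortlist([1.0, 0.05], docs, k=2)
print(top)  # ['survey of citation errors', 'citation errors in generated text']
print(rerank("errors in generated citations", top, overlap_judge))
# ['citation errors in generated text', 'survey of citation errors']
```

Keeping `k` small is the main lever on cost here, since the shortlist is the only part that ever reaches the expensive judge.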


Here's my attempt at a simple explanation of transformers. I would love feedback on whether I've got it right and how I could improve it. Cheers


Some personal reflections on the direction of my work.


We're working on this at Grounded AI (https://www.groundedai.company/contact-us). We'd love to help you if we can. Feel free to contact me (email is on my profile page)

