Hacker News | jayers's comments

How is this, in principle, any different from the DOJ using a subpoena to get customer records from an adult store that was allegedly selling illegal explicit material?

Just because you use the internet to commit the crime doesn't make it not a crime.


I’d say a big difference is that in your example the thing that was supposedly sold was entirely illegal to possess for any reason.

The case being discussed is one where someone might be able to use the product to break the law.

So it’s more like demanding that Home Depot, Walmart, Amazon give the names of every American who’s ever bought a crowbar because the DOJ has heard that some people are breaking into buildings with crowbars.

It has been alleged that the government doesn’t want to prosecute these people who are the ones committing the crimes, they “just want to talk” in order to prosecute the company. Not sure I’d trust that without a signed immunity agreement. If I were forced to speak to these goons, I certainly wouldn’t say a word unless they gave me one of those - regardless of what I was using the gadget for.


Journalists by definition cannot be anonymous. That's why it's a dangerous job.


> I expect next go-around they'll want to pay some big design agency for a custom site; it'll probably be six figures. I don't know how I should approach that discussion. Any ideas?

Keep your ear to the ground and when you start to hear rumblings of this happening, pay a skilled freelancer to update the old website (or just build a new one if it's easier) to fit the new marketing director's taste. Solve marketing's problem, save the company a bunch of money, be the hero.


Yeah, that's the obvious thing to do. My fear is that we (more directly: I) have burned trust. After all, they moved us away from an internal solution because marketing didn't like it; how can I persuade them that a new internal solution wouldn't be going backwards?

You'll have to trust me: from a customer-facing POV the old site was miles better than the new; it's only from an "Owner"-facing POV that the new one is preferable.

Um, I guess that hints at the answer: any new proposal has to start with their (Marketing's) experience of the product - flatter them to the max. This should probably start with my being a little more attuned to their specific frustrations with their new solution. Hmmm.

Thanks for thinking this through with me, HN. Would welcome any other advice.


It's funny: publishing work offline in books and magazines is perhaps more anonymous in the age of AI.

I pasted in a number of passages from books on my bookshelf. Predictably, stuff that I read for my English degree in university is largely in the training data and easily identifiable. Stuff from regional authors, or anything slightly adjacent to the cultural mainstream, makes no impression.


To clarify, because a number of posts here sort of suggest the confusion:

The article here isn't about the LLM recognizing works that were in the training data, e.g., The Old Man and the Sea off the shelf. It's about pegging the author of novel texts, like, say, some letter written by Hemingway that gets discovered next week and was never before digitized.


Yes, that makes sense. However, unless there's a significant corpus of an author in the training data it won't recognize them. One of the passages that I fed into Claude was from the book Leepike Ridge by ND Wilson. Wilson has written online and in print quite a bit, but Claude couldn't guess the author and guessed that it was a passage from a noir crime novel.

Wilson is a fairly idiosyncratic writer with a distinct style, yet even still Claude couldn't guess correctly from a currently published book.

I suspect that what's going on here (as others are suggesting in this thread) is that Claude is in some way biased towards certain sets of authors by its training.


It is for now.

But I'm sure the scanning operations will start scouring the earth even harder for any books unaffected by slop containing niche knowledge and text in order for their models to have an edge over the ones trained only on pirate collections and the Internet.

I wonder if secondhand bookshops and deceased estates are seeing bulk buyers of their stock suddenly appearing. Maybe broke governments/municipalities will start selling them entire libraries and archives to ingest.


Whether it's profitable remains to be seen, but it is undoubted that the potential resources in the solar system are (pun intended) astronomically valuable. Getting at them is "just" an engineering problem.


That seems like an unfounded inference. Plenty of animals have more neurons than humans but lesser cognitive and language abilities. Language has a lot to do with the structure of the brain in addition to neuron count.


One thing I've learned by following a link from elsewhere in this thread is that while the total count of neurons in an animal's nervous system is not a good proxy for intelligence, the count of neurons in the forebrain is. By that measure, only the orca ranks higher than humans [1].

That doesn't mean language ability is a natural outcome of crossing a certain threshold of brain complexity; if anything it's more likely the other way around: this complexity being driven by highly social behavior and communication.

1. https://en.wikipedia.org/wiki/List_of_animals_by_number_of_n...


Language also has a lot to do with what we do. We do more complex things than animals, so we say more complex things than animals. The biggest difference in the evolution of human language versus the evolution of elephant language might just be that we have thumbs.


https://www.nature.com/articles/s41559-025-02855-9

Birds have areas of the brain that we would consider language-like, both for native bird communication and, I would speculate, for human-to-bird communication.

If you have ever owned a parrot this is blatantly obvious, since they actively communicate and vocalize both observations and needs/desires.


This isn't even a hard question. The movie theater is open but movies that are rated R are not. In this case, Reddit.com is a movie theater, subreddits are movies. The website might be open, but not every subreddit is. This is in fact how Reddit already operates; age verification is just a joke right now.


I'm incredibly dubious of the conclusions of this researcher. Claude Opus was used to gather and analyze all of the data.

I am not skeptical of any of the research; the sources seem to be cited properly. I am skeptical that this researcher has thought through or verified their conclusions in a systematic and reliable fashion. This part gives it away: "Research period: 2026-03-11 to present." This individual dropped his investigative report two days after beginning research!

Yes, AI is an incredibly good research assistant and can help speed up the tasks of finding sources and indexing sources. The person behind this investigation has not actually done their due diligence to grok and analyze this data on their own, and therefore I can't trust that the AI analysis isn't poisoned by the prompter's implicit biases.


I agree. I tried reading some of the documents and they're full of this:

> LIMITATION: Direct PDF downloads returned 403 errors. ProPublica Schedule I viewer loads data dynamically (JavaScript), preventing extraction via WebFetch. The 2024 public disclosure copy on sixteenthirtyfund.org was also blocked.

> Tech Transparency Project report: The article "Inside Meta's Spin Machine on Kids and Social Media" at techtransparencyproject.org likely contains detailed ConnectSafely/Meta funding analysis but was blocked (403)

The least they could have done is read their own reports and then provided the documents to the LLM. Instead they just let it run and propose connections, asked it to generate some graphs, and then hit publish.


Some of these are also just like really weak? One of them for example seems to be some random employee at FB donating ~$1k to a politician and calling that a link. The entire "Proven Findings" is all over the place and provides no coherence. I don't think it's a particular secret that Meta would prefer age verification be done at the OS level so I'm not really sure what the added claim here is.

> A Meta employee (Jake Levine, Product Manager) contributed $1,175 to ASAA sponsor Matt Ball's campaign apparatus on June 2, 2025. Source: Colorado TRACER bulk data.

> No direct Meta PAC contributions to any ASAA sponsor across Utah, Louisiana, Texas, or Colorado. Source: FollowTheMoney.org multi-state search.

While it is true that Meta has funded groups that advocate for age verification, a lot of them also appear to have other actors so it's not like this is some pure Meta thing as some of the other commenters are suggesting.


This is a fascinating report, not because of the content or even quality of the report, but because of the way it was generated. It is an AI generated report dumped into GitHub and has made it onto the front page of Hacker News with over 1,000 upvotes and many comments.

This type of GitHub-based open-source research project will become more common as more people use tools like Claude Code or Codex for research.


It's not slop when it confirms my biases. /s


Hmmm.

_GPT, prioritize truth over comfort, challenge assumptions, and avoid flattery. And analyze the patterns of biases in my prompts, and then don’t do that… or something_

Give it time, we’ll come up with something.


I came here to say that this is pretty much my view having poked around a little bit as well.

This file does not exactly fill me with confidence: https://github.com/upper-up/meta-lobbying-and-other-findings...

In one part of the report, there seems to be this implicit assumption that Linux and Horizon OS (Meta's VR OS) are somehow comparable and that Meta will be better equipped than Linux if age verification is required.

It doesn't explicitly say "This will allow Horizon OS to become the de facto OS and Linux will die out" but that seems to be the impression I'm getting which uhh... would make zero sense.

More broadly, this entire report (and others like it) is extremely annoying in that I've seen some Reddit comments either taking "lots of text" as a signal of quality or asking "Does anyone have proof that these claims are inaccurate" which is:

a) Of course entirely backwards as far as burden of proof

b) Not even the right rubric because it's not facts versus lies, it's manufactured intent/correlations versus real life intent/correlations (i.e., bullshit versus not)

All of this could be factually true without Meta being smart enough to play 5D chess.


>taking "lots of text" as a signal of quality

Or of authority, when they're not equipped to evaluate the data first-hand.

The Gish gallop technique in debate overwhelms opponents with so many arguments that they're unable to address them all before the time limit. Reports presented like this are functionally that, but against reader comprehension and attention.

Similarly, being the first, loudest, or only voice making a claim is unreasonably effective at establishing perception of authority, where being unchallenged is tantamount to correctness. This also goes both ways; censorship in media, for instance, can be used to promote narratives by silencing competing views, like platforms selectively amplifying certain topics to frame them as more proven and widely supported than they might actually be.

It's unfortunate that inexpert execution often positions well-meaning and potentially correct arguments to be discredited and derided by prepared opponents before their merits can be established. In this case, it may be true that Meta organized a well-coordinated shadow campaign for legislation using technically legal channels, but I'm sure they've anticipated this at some point, or are relying on the inertia of the system and initial buy-in to force the course.


Concur. The data is not independent of the conclusions reached, and it feels very much like Reddit research (à la the Boston bombing).

In this case they have named individuals and firms as well, without the degree of diligence that such callouts should warrant.

In its current state, I would count it as a prelude to witch hunts.


As someone who was actually hurt by being exposed to pornography as a child (though this was on the wild internet before Discord), I think you're being both histrionic and downplaying the dangers posed to children online.


That does not mean everybody systematically is "hurt" like you were. That is a very dangerous extrapolation.


Of course not everybody is systematically hurt nor did I claim that. About 11% of porn users are addicts. Porn addiction is real and incredibly difficult. But even when you are not addicted, regular porn usage has negative effects on mental health (lower impulse control, higher rates of depression and anxiety). It stands to reason that this effect would be exaggerated in children and teenagers.

At best, pornography is akin to alcohol and cigarettes. We regulate the ability of minors to access these things for obvious reasons. I see no reason why internet pornography is different.


This is the crux of it. The digital world doesn't produce value except when it eases the production of real goods. Software Development as a field is strange: it can only produce value when it is used to make production of real goods more efficient. We can use AI to cut out bureaucratic work, which then means that all that is left is real work: craftsmanship, relationship building, design, leadership.

There are plenty of "human in the loop" jobs still left. I certainly don't want furniture designed by AI, because there is no possible way for an AI to understand my particular fleshly requirements (AI simply doesn't have the wetware required to understand human tactile needs). But the bureaucratic jobs will mostly be automated away, and good riddance. They were killing the human spirit.


> Software Development as a field is strange: it can only produce value when it is used to make production of real goods more efficient. We can use AI to cut out bureaucratic work, which then means that all that is left is real work: craftsmanship, relationship building, design, leadership.

That's a really odd take. Software is merely a way of ingesting data and producing information. And information often has intrinsic value. This can scale from simple things like avoiding the minor annoyance of forgetting your umbrella, to avoiding deaths/millions of dollars in losses due to ships sinking in storms.

Now the long term value of software does approach zero, because it can usually be duplicated quite easily.


Extraction and manufacturing are considered the primary and secondary economic sectors. In a closed loop system, tertiary and onward sectors, like services and technology, cannot exist without the primary and secondary.

