News agencies and websites have their content reproduced on Google Search and Ne...

dwaite · on Aug 7, 2023

Publishers specifically provide that content, however, via techniques like Open Graph. They can already control how much text or which images are displayed in results. They can also indicate they don't want indexing at all.

Yet they publish articles with the entire headline and backing images marked for free display on Google/Facebook. Almost like they are trying to help the search engines to attract traffic.
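For anyone who hasn't looked at what publishers actually put in their pages, here is a rough Python sketch of how a preview generator might read those hints. The sample page, the `news.example` URL, and the names `PreviewMetaParser` and `read_preview_hints` are made up for illustration; the `og:` properties and the `max-snippet` / `max-image-preview` / `noindex` robots directives are the real publisher-side controls, but how any particular search engine consumes them is not shown here.

```python
from html.parser import HTMLParser


class PreviewMetaParser(HTMLParser):
    """Collects Open Graph properties and robots directives from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.open_graph = {}  # e.g. {"og:title": "..."}
        self.robots = []      # e.g. ["max-snippet:50", "noindex"]

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        content = attr.get("content") or ""
        prop = attr.get("property") or ""
        if prop.startswith("og:"):
            # Content the publisher explicitly offers for link previews.
            self.open_graph[prop] = content
        elif (attr.get("name") or "").lower() == "robots":
            # Comma-separated directives limiting or forbidding previews/indexing.
            self.robots.extend(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


# Hypothetical article markup, not taken from any real site.
SAMPLE_PAGE = """
<html><head>
<meta property="og:title" content="Example headline">
<meta property="og:image" content="https://news.example/lead.jpg">
<meta name="robots" content="max-snippet:50, max-image-preview:standard">
</head><body></body></html>
"""


def read_preview_hints(html):
    """Return (open_graph_dict, robots_directive_list) for an HTML document."""
    parser = PreviewMetaParser()
    parser.feed(html)
    return parser.open_graph, parser.robots


if __name__ == "__main__":
    og, robots = read_preview_hints(SAMPLE_PAGE)
    print(og)
    print(robots)
```

A page that wanted no indexing at all would carry `<meta name="robots" content="noindex">` instead, which this sketch would surface as `["noindex"]`.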

warning26 · on Aug 6, 2023

Then ban that, don't ban linking.

mattstir · on Aug 6, 2023

Who's banning linking? The new law tried to get tech companies to pay to show the links with previews (similar to Australia and France, etc).

To be clear, the bill is fundamentally broken. It would require Google or Facebook to pay simply when links to news sites are served, rather than for reproducing or condensing the material in the news article as a "preview" sort of thing (as in other countries). The bill isn't banning links explicitly, but the government should have seen this coming.

dwaite · on Aug 7, 2023

> The bill isn't banning links explicitly, but the government should have seen this coming.

Perhaps they could have listened when the two companies they are trying to impose this on provided suggestions, or maybe modeled it more closely after legislation in other countries where such media taxes have been successful.

rvnx · on Aug 6, 2023

From the perspective of the newspapers:

Google is exploiting copyrighted content to make profit and create audience.

If the search index of Google was empty, the people wouldn't use Google, it's that simple.

So the newspapers are asking for royalties for feeding that search index.

gruez · on Aug 6, 2023

But they could easily opt out with robots.txt. They want to have their cake and eat it too. They want the free traffic from search engines AND want Google to pay them for the privilege of bringing them traffic.

zlg_codes · on Aug 7, 2023

robots.txt is not an actual Internet standard, and there are no standard controls for when a bot ignores its contents. You're on your own to protect your pages.

Granted, I have `disallow: /` in mine because I don't want my stuff scraped, but I still see Googlebot sometimes, and others, try to crawl my site.

It's not a very effective opt-out, because it requires the 'attacker' to honor the file's settings.

Feel free to enable it on your own server and watch the logs for a few days after sharing some links.
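To make the "it requires the bot to honor it" point concrete, here is a minimal sketch of the check a well-behaved crawler would run before fetching a page, using Python's standard `urllib.robotparser`. The helper name `may_fetch`, the `example.com` URL, and the `SomeOtherBot` agent are assumptions for illustration. Nothing on the server enforces the answer: a crawler that never runs this check simply fetches the page anyway.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is purely advisory: it only matters if the crawler calls it
    and respects the result.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The blanket opt-out: every crawler is asked to stay away from every path.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# A targeted opt-out: only Googlebot is asked to stay away.
BLOCK_GOOGLEBOT = "User-agent: Googlebot\nDisallow: /\n"

if __name__ == "__main__":
    page = "https://example.com/some-post"
    print(may_fetch(BLOCK_ALL, "Googlebot", page))
    print(may_fetch(BLOCK_GOOGLEBOT, "Googlebot", page))
    print(may_fetch(BLOCK_GOOGLEBOT, "SomeOtherBot", page))
```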

18pfsmt · on Aug 8, 2023

Have you tried crying about the fact that you are getting traffic to your website that nobody would see otherwise?

I hope daddy government treats you well.

Count on Big Brother to save you, and they will beat you down when it's time as well.

warning26 · on Aug 6, 2023

That's like demanding that Rand McNally pay a fee to each city they print onto their maps. Or demanding that World Book Encyclopedia pay a royalty fee to every entity they write an article about.

Noting that something exists and including it in reference material should by no means incur royalties.