
Hello, I am the author of the scrape. I did it mostly to try it out, but who knows, maybe it will be useful to someone.

I went through the description pages like http://thepiratebay.se/torrent/$i, incrementing $i and saving the magnet link whenever Pirate Bay didn't return a 404 error. I went through the pages as a logged-out user, though, so that might be why I got only 1.5m torrents.
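The approach described above (walk the numeric IDs, skip the 404s, keep the magnet link) can be sketched in Perl roughly like this. It is not the pastebin script: the magnet-link regex, the hypothetical sub names scan_ids and extract_magnet, and the use of the core HTTP::Tiny module are all assumptions. The fetcher is passed in as a code ref so the loop can be exercised without hitting the network.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Pull the first magnet URI out of a description page's HTML.
# ASSUMPTION: the link appears as href="magnet:?...". HTML entities such as
# &amp; are left undecoded here.
sub extract_magnet {
    my ($html) = @_;
    return $html =~ m{href="(magnet:\?[^"]+)"} ? $1 : undef;
}

# Walk torrent IDs $from..$to and collect id => magnet for every page that exists.
# $fetch takes a URL and returns an HTTP::Tiny-style response hash
# (status, success, content).
sub scan_ids {
    my ($from, $to, $fetch) = @_;
    my %magnets;
    for my $i ($from .. $to) {
        my $res = $fetch->("http://thepiratebay.se/torrent/$i");
        next if $res->{status} == 404;   # no torrent with this ID
        next unless $res->{success};     # other errors are skipped, not retried
        my $magnet = extract_magnet($res->{content});
        $magnets{$i} = $magnet if defined $magnet;
    }
    return \%magnets;
}
```

Usage might look like: my $http = HTTP::Tiny->new; my $found = scan_ids(1, 1000, sub { $http->get($_[0]) }); A real run over millions of IDs would also want parallelism, retries and some politeness delay, which this sketch leaves out.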

I didn't know Pirate Bay has hidden porn torrents; there are TONS of porn torrents in the scrape already.

The script is in Perl; I will post it to Pastebin in a moment.

edit: all right, the script itself is here: http://pastebin.com/8RXXthXB

As you can see, it's not very complicated.



I think it's a great idea, and a nifty hack.

It might be an idea to release a diff against this once a week, and write a quick script to grab it, keeping the list up to date.
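A weekly diff like the one suggested above could be as simple as a set difference between two dumps. This is a minimal sketch that assumes each dump is a text file with one magnet link per line; the sub names and file names are hypothetical. It reports removed entries too, since torrents also disappear from the site.

```perl
use strict;
use warnings;

# Read a dump file (ASSUMED format: one magnet link per line) into an array ref.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return \@lines;
}

# Compare last week's list with this week's.
# Returns (added, removed) as array refs, each keeping the input order.
sub weekly_diff {
    my ($old, $new) = @_;
    my %in_old = map { $_ => 1 } @$old;
    my %in_new = map { $_ => 1 } @$new;
    my @added   = grep { !$in_old{$_} } @$new;
    my @removed = grep { !$in_new{$_} } @$old;
    return (\@added, \@removed);
}
```

For example: my ($added, $removed) = weekly_diff(read_lines('magnets-old.txt'), read_lines('magnets-new.txt')); The grabbing script on the other end would then append $added to its copy and drop $removed.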


I am thinking of releasing new versions once a week and putting the hash of the torrent of the newest version on some public site. (Say, some Twitter account.)

But it would still be more of a proof of concept than anything really useful; the comments and descriptions ARE important.

edit: The more I think about it, the less useful it sounds.

First, the information about seeders varies constantly, especially for new torrents.

Also, it STILL depends on a single point of failure: the Pirate Bay itself. If TPB goes down for any reason, I will have nothing to scrape this from and it will all fall apart anyway.

Plus, I think the Pirate Bay itself should publish dumps like this. It would probably be much easier on their database than being scraped anyway :)


I like the idea of a weekly Twitter update with the master magnet hash. I feel like the purpose would be less about the usefulness of the string of characters and more about proving a point.


The porn torrents are only hidden from naive searchers; all the pages for them are still accessible if you've got a direct link to them, so your scraper should've picked all of them up.


I tried to run the script, but I get an error (I added "use diagnostics" for more info, so line 13 refers to line 11 of your script, and line 27 to line 25):

  Can't use an undefined value as an ARRAY reference at piratebay_magnet_scrape.pl line 13 (#1)
      (F) A value used as either a hard reference or a symbolic reference must
      be a defined value. This helps to delurk some insidious errors.

  Uncaught exception from user code:
      Can't use an undefined value as an ARRAY reference at piratebay_magnet_scrape.pl line 13.
   at piratebay_magnet_scrape.pl line 13
      main::__ANON__(20697, 0, undef, 0, 0) called at /usr/share/perl5/Parallel/ForkManager.pm line 354
      Parallel::ForkManager::on_finish('Parallel::ForkManager=HASH(0x9cd7ac8)', 20697, 0, undef, 0, 0) called at /usr/share/perl5/Parallel/ForkManager.pm line 333
      Parallel::ForkManager::wait_one_child('Parallel::ForkManager=HASH(0x9cd7ac8)', undef) called at /usr/share/perl5/Parallel/ForkManager.pm line 285
      Parallel::ForkManager::start('Parallel::ForkManager=HASH(0x9cd7ac8)') called at piratebay_magnet_scrape.pl line 27
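Judging only from the trace above (not from the script itself), the run_on_finish callback received five arguments and no sixth one, so the data reference a child is supposed to hand back via $pm->finish($exit_code, $data_ref) was undef when the callback dereferenced it. That pattern suggests the installed Parallel::ForkManager may be an older release that doesn't pass child data back, though that is a guess. A defensive callback at least avoids the crash; the names collect_child_result and @all_magnets are hypothetical.

```perl
use strict;
use warnings;

# Magnets gathered from all children (hypothetical name).
my @all_magnets;

# Callback for Parallel::ForkManager's run_on_finish. Arguments are
# (pid, exit code, ident, exit signal, core dump, data reference); the last one
# is whatever the child passed to $pm->finish(...), and is undef when the child
# sent nothing, so check it before dereferencing.
sub collect_child_result {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_ref) = @_;
    if (!defined $data_ref) {
        warn "child $pid (exit code $exit_code) returned no data\n";
        return;
    }
    push @all_magnets, @{$data_ref};
    return;
}
```

It would be wired up with $pm->run_on_finish(\&collect_child_result); and each child ending in $pm->finish(0, \@magnets);. If the module really is too old to pass data back, the guard only turns the crash into warnings; upgrading Parallel::ForkManager from CPAN, or having each child write to its own file, would be needed to actually collect the results.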



