If I could feed this an article and have it generate headlines based on the text of that article (and they were any good), there is a solid chance I would pay real money for that service.
Headlines are an absolute pain, and as the article says, they're decidedly unoriginal most of the time. I can't see an obvious reason that an AI would be much worse at creating them than a human.
I've always wondered if sites do this the other way around. A site invests a lot in the content it creates, so you would think that, to get the most out of it, it would serve the same article with different headlines to different demographics. Knowing from Facebook which headlines you have clicked through in the past could indicate how to write future headlines to get you again.
There are many, many headline formulae in the world (some of my favourites are in John Caples' book "Tested Advertising Methods", first published in 1932) but they still take time to iterate through and hone for each article.
People don't have a craving for this kind of crap, that is they don't actively search for it. It works by exploiting the brain. It's the publishing equivalent to junk food. We know it's awful. We know it's bad for us. But we struggle to not consume it because it's cheap and it pings our reward systems.
Actually, I think that the clickbait junk makes us think that it will ping our reward systems. For me, at least, it doesn't really reward me very well (even in a junk food way).
Maybe this means that the real clickbait trash is training me not to click on it, so I don't need the fake to do so?
I cured myself of clickbait headlines after I clicked on a few and learned to expect no content on the other side. It's a simple association, really. You click on something X-y, you get no reward, you learn not to waste time on X-looking things.
For you certainly. But the proof of the pudding is in the tasting. Clickbait drives an insane amount of traffic and it shows no slowing down. Cosmo has been successfully using "clickbait" titles for decades on the cover of their magazines.
Of course. I'm not denying the effectiveness of this technique, just providing an n=1 datapoint. Maybe my personal idiosyncrasies make me immune to that particular type of traffic-driving technique (I have no doubt I'm vulnerable to other methods).
Part of our reward systems is what was initially named the "pleasure centre".
When the "pleasure centre" in the brain was first identified, it got its name because stimulating it was thought to cause pleasure: rodents given the choice between stimulating it and other activities would stimulate the pleasure centre even over eating.
But as it turns out, the main function of stimulating this area is strong cravings and compulsion. You may get some pleasure from giving in to the cravings, but the cravings are independent of whether or not there's a "real" reward at the end of it.
Reminds me of a comment I read a few weeks back when an ex-drug addict was describing how the anticipation of using drugs was often more rewarding than the use itself. Which explains the pleasures in drug use rituals.
I've just banned myself from visiting many of the worst offenders. Though it's getting really hard, any more, to find sites that won't sink to that level.
Tl;dr: No. Wrong output format, wrong training set, wrong input.
To create a classifier that does that, you'd need a labeled set - i.e. someone would have to go through and say "this headline is 3 clickbaits. This other headline is 8 clickbaits". You could also sort between clickbaity and non-clickbaity, but that would still require manual work.
You could get that programmatically through a few different means, but you'd need a lot more than just headlines.
It also probably wouldn't be a good idea to use an RNN - it doesn't suit the data format well. It'd be better to use a non-recurrent (feed-forward) neural network or logistic regression with the entire headline as input.
Fortunately, it'll converge on a good solution a LOT faster - fewer parameters to tune + simpler output = fewer examples needed to figure out what's going on - so you might be able to get something that has plausible levels of accuracy with a day or two of set labeling (estimate brought to you by my ass).
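To make the logistic regression suggestion concrete, here is a minimal sketch, assuming a binary clickbaity / non-clickbaity label and a bag-of-words view of the whole headline. The eight labeled headlines are invented for illustration, and the names `LABELED`, `features`, `predict` and `train` are hypothetical; a real labeled set would need to be far larger.

```python
import math
from collections import defaultdict

# Hypothetical hand-labeled examples (1 = clickbait, 0 = not); invented for illustration.
LABELED = [
    ("you won't believe this one weird trick", 1),
    ("10 shocking things celebrities do", 1),
    ("this weird trick will shock you", 1),
    ("what happened next will shock you", 1),
    ("senate passes annual budget bill", 0),
    ("central bank holds interest rates steady", 0),
    ("study finds link between sleep and memory", 0),
    ("city council approves new transit plan", 0),
]

def features(headline):
    """Bag of words: the entire headline becomes a set of word features."""
    return set(headline.lower().split())

def predict(weights, bias, headline):
    """Estimated probability that the headline is clickbait."""
    z = bias + sum(weights[w] for w in features(headline))
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=200, lr=0.5):
    """Plain stochastic gradient descent on the log loss."""
    weights, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        for headline, label in data:
            err = predict(weights, bias, headline) - label  # d(log loss)/dz
            bias -= lr * err
            for w in features(headline):
                weights[w] -= lr * err
    return weights, bias

weights, bias = train(LABELED)
print(predict(weights, bias, "one weird trick will shock you"))
print(predict(weights, bias, "council approves budget bill"))
```

The model has one weight per word plus a bias, which is the "fewer parameters" point: there is much less to fit than in a sequence generator.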
>Unfortunately, someone out there must really have a craving for "weird old tricks" and "shocking conclusions".
This problem seems similar to the old mystery of Viruses Spontaneously Self-Constructing On People's Computers. "How did you get all these viruses on your computer?" "I didn't do anything, it just happened." "Okay, well, be really careful what you click on." "I am careful!"
Yes, but even respectable news and information websites now include clickbait (Outbrain and other "sponsored" content). I've seen it on WSJ, NYT, and other sites even when I'm paying $10-$15 a month.
Could this RNN model perhaps be used to filter clickbait headlines from HN automatically? Perhaps one could perform some sort of backward beam search to figure out how likely it is that a particular headline would have been produced by it. If there are words in a headline that the model doesn't know, one could perhaps just let it replace them with words that it knows.
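A rough sketch of the scoring idea, with two loud assumptions: a smoothed bigram model stands in for the RNN, and unknown words are mapped to a single `<unk>` token rather than replaced by a nearby known word. The toy corpus and the name `avg_log_prob` are invented for illustration; this is forward scoring, not a backward beam search.

```python
import math
from collections import Counter

# Toy "clickbait" corpus; invented for illustration only.
CORPUS = [
    "you won't believe what happened next",
    "this one weird trick will shock you",
    "what happened next will shock you",
]
START, END, UNK = "<s>", "</s>", "<unk>"

def tokenize(headline, vocab=None):
    """Split into words; if a vocabulary is given, map unknown words to UNK."""
    words = headline.lower().split()
    if vocab is not None:
        words = [w if w in vocab else UNK for w in words]
    return [START] + words + [END]

vocab = {w for h in CORPUS for w in h.lower().split()}
unigrams, bigrams = Counter(), Counter()
for h in CORPUS:
    toks = tokenize(h)
    unigrams.update(toks[:-1])           # counts of tokens used as context
    bigrams.update(zip(toks, toks[1:]))  # counts of (previous, current) pairs

V = len(vocab) + 2  # possible next tokens: known words plus END and UNK

def avg_log_prob(headline):
    """Average per-token log probability under the add-one smoothed bigram model."""
    toks = tokenize(headline, vocab)
    total = 0.0
    for prev, cur in zip(toks, toks[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + V)
        total += math.log(p)
    return total / (len(toks) - 1)

print(avg_log_prob("what happened next will shock you"))
print(avg_log_prob("parliament debates fisheries quota"))
```

A filter would then flag headlines whose score is above some threshold, which would have to be chosen by looking at real data.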
I really find RNNs to be pretty cool. When they are combined with a natural human tendency to see patterns they are hilarious. So perhaps we need to update our million monkeys hypothesis to a million RNNs with typewriters coming up with all the works of Shakespeare.
Nice!
I've wanted to do something like this for a while, too, but haven't had the time yet.
What's interesting to me, from a research point of view, is the degree of nuance the network uncovers for the clickbait.
We all know that <person> is going to be doing <intriguing action>, but for each person these actions are slightly different.
The sentence completions for "Barack Obama Says..." are mainly politics-related, while those for "Kim Kardashian Says..." involve Kim commenting on herself.
So it might not really understand what it's saying, but it captures the fact those two people will tend to produce different headlines.
Neat Idea: what if we tried the same thing with headlines from the New York Times (or maybe a basket of newspapers)?
We would likely find that the Clickbait RNN's vision of Obama is a lot different from the Newspaper RNN's Obama.
Teasing apart the differences would likely give you a lot more insight into how the two readerships view the president than any number of polls would.
There's not an incredibly rich structure to extract, and with short outputs the weirdness doesn't compound and cycles aren't as likely. A common small dataset for playing with RNNs is all of Shakespeare which is somewhere in the region of 1M words.
> There's not an incredibly rich structure to extract, and with short outputs the weirdness doesn't compound and cycles aren't as likely. A common small dataset for playing with RNNs is all of Shakespeare which is somewhere in the region of 1M words.
He does state that the network is trained with 2M headlines, meaning ~5-20M words. That should be enough.
I would have thought that an RNN would somehow work better. It would be interesting to see a direct comparison of fake Hacker News headlines generated with Markov chains versus an RNN.
True, I had managed to miss that, although it's working on 200-dimensional vectors rather than single letters as in the small Shakespeare dataset. That feels like it might make it harder to train. I've personally found more problems dealing with GloVe vectors compared to the word2vec ones, but I don't have any hard data for that.
This was an enjoyable article. There is an obvious extension which is to mturk the results and feed the mturk data back into the net. Just give the turkers 5 headlines and ask them which they would click first, repeat a hundred times per a thousand turkers or whatever.
Years ago I considered applying for DoD grant money to implement something reminiscent of all this for military propaganda. That went approximately nowhere, not even past the first steps. Someone else should try this (insert obvious famous news network joke here, although I was serious about the proposal). To save time I'll point out why I never got beyond the earliest steps: there is a vaguely infinite pool of clickbaitable English speakers on the turk, but the pool of bilingual Arabic (or whatever) speakers with good taste in pro-USA propaganda is extremely small. So the tech side was easy to scale, but the mandatory human side simply couldn't scale enough to make the output realistically anything but a joke.
A lot of this kind of work ends up being repetitive, like multiplying together two matrices that have a few thousand entries each. These are the sorts of things that GPUs do very well, because they can do them on a massively parallel scale. GPUs also tend to have more memory bandwidth for the kinds of things that would bog a CPU down in its memory cache.
GPUs are really good at parallel tasks such as calculating the color of every pixel on the screen, or doing the same operation on a large dataset. According to Newegg, the GTX980 has 2048 CUDA cores (parallel processing cores) that run at ~1266 MHz, as opposed to a nice CPU, which might have 4 cores that run at 4 GHz. In other words, if you want to manipulate a whole bunch of things in one way in parallel, you can program it to use the GPU effectively; if you want to manipulate one thing a whole bunch of ways in series, the CPU is your best bet.
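As a back-of-the-envelope check on those figures, the naive "cores times clock" comparison can be worked out as below. This is only arithmetic on the numbers quoted above, not a benchmark: it ignores that a CPU core does far more per cycle than a CUDA core (wide vector units, large caches), so the ratio overstates any real-world speedup. The variable names are just for illustration.

```python
# Figures quoted above: GTX980 vs. a "nice" 4-core CPU.
gpu_cores, gpu_clock_mhz = 2048, 1266
cpu_cores, cpu_clock_mhz = 4, 4000

gpu_core_mhz = gpu_cores * gpu_clock_mhz   # 2,592,768 core-MHz
cpu_core_mhz = cpu_cores * cpu_clock_mhz   # 16,000 core-MHz

# Naive upper bound; not a measure of achievable throughput.
naive_ratio = gpu_core_mhz / cpu_core_mhz
print(f"naive cores x clock ratio: {naive_ratio:.0f}x")  # prints 162x
```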
Coarse rule-of-thumb: running on GeForce class GPUs you can get up to 5x, maaaybe 10x the performance per dollar as compared to a top-line CPU. That assumes your problem scales well on GPUs; many problems don't. The GTX980 is actually a great performer. For Tesla class systems like the K40 it's a lot closer to equal with the CPU on performance/$ (they're not much faster than the GTX980 but a lot more expensive). But you can get an edge with the Teslas when you start comparing multi-GPU clusters to multi-CPU clusters, since with GPUs you need less of the super-expensive interconnect hardware. (You're not going to put GTX cards in a cluster, you'd have massive reliability problems.)
IMHO, the guys showing 100x speedups on GPUs are Doing It Wrong; they use a poor implementation on the CPU, use just one CPU core, consider a very synthetic benchmark, or a bunch of other tricks.
Error: 500 Internal Server Error
Sorry, the requested URL 'http://clickotron.com/' caused an error:
Internal Server Error
Exception:
IOError(24, 'Too many open files')
Traceback:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/bottle.py", line 862, in _handle
    return route.call(**args)
  File "/usr/local/lib/python2.7/dist-packages/bottle.py", line 1732, in wrapper
    rv = callback(*a, **ka)
  File "server.py", line 69, in index
    return template('index', left_articles=left_articles, right_articles=right_articles)
  File "/usr/local/lib/python2.7/dist-packages/bottle.py", line 3595, in template
    return TEMPLATES[tplid].render(kwargs)
  File "/usr/local/lib/python2.7/dist-packages/bottle.py", line 3399, in render
    self.execute(stdout, env)
  File "/usr/local/lib/python2.7/dist-packages/bottle.py", line 3386, in execute
    eval(self.co, env)
  File "/usr/local/lib/python2.7/dist-packages/bottle.py", line 189, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/usr/local/lib/python2.7/dist-packages/bottle.py", line 3344, in co
    return compile(self.code, self.filename or '<string>', 'exec')
  File "/usr/local/lib/python2.7/dist-packages/bottle.py", line 189, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/usr/local/lib/python2.7/dist-packages/bottle.py", line 3350, in code
    with open(self.filename, 'rb') as f:
IOError: [Errno 24] Too many open files: '/home/ubuntu/clickotron/views/index.tpl'
Yep. Have a separate process cache a few up and simply cp over the active one to be served as a big bag of immutable bits. Bonus points for using a CDN.
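One way the "pre-render and swap in" idea could look, as a sketch only: a background job renders the page to a temporary file and then renames it over the live file, so the web server only ever sees a complete, immutable file. Note this uses an atomic rename (`os.replace`) rather than a literal `cp`; the function name `publish` and the file paths are hypothetical.

```python
import os
import tempfile

def publish(html, live_path):
    """Write html to a temp file next to live_path, then atomically swap it in."""
    directory = os.path.dirname(os.path.abspath(live_path))
    # Same directory as the target, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(html)
        os.replace(tmp_path, live_path)  # readers see the old or the new file, never half of one
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

if __name__ == "__main__":
    # Hypothetical usage: a cron job or worker process would call this periodically.
    with tempfile.TemporaryDirectory() as d:
        publish("<html>batch 1</html>", os.path.join(d, "index.html"))
```

The request handler then never opens a template at all; a static file server or CDN serves whatever `index.html` currently is.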
I used a simpler technique (character level language modelling) to come up with an Australian real estate listing generator: http://electronsoup.net/realtybot
This is pre-generated, not live, for performance reasons. There are a few hundred thousand items though, so the effect is similar.
The data source is several tens of thousands of real estate listings that I scraped and parsed.
I can't understand the first two-layer RNN, which according to the author optimized the word vectors.
It says:
> During training, we can follow the gradient down into these word vectors and fine-tune the vector representations specifically for the task of generating clickbait, thus further improving the generalization accuracy of the complete model.
How do you follow the gradient down into these word vectors?
If word vectors are the input of the network, don't we only train the weights of the network? How come the input vectors get optimized during the process?
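One common way to read that sentence, sketched here under assumptions: the word vectors live in a lookup table that is itself treated as a trainable parameter, so backpropagation continues one step past the first layer's weights and produces a gradient for the looked-up row. The toy model below (a single dot product with a squared-error loss) and the names `E`, `w` and `step` are made up for illustration and are not the article's network.

```python
# Tiny model: score = w . E[word]; loss = (score - target)^2.
E = {"shocking": [0.1, -0.2], "budget": [0.3, 0.4]}  # word vectors (trainable table)
w = [0.5, -0.5]                                       # ordinary network weights

def step(word, target, lr=0.1):
    """One gradient descent step that updates both the weights and the word vector."""
    x = E[word]                               # lookup: x is the list stored in E
    score = sum(wi * xi for wi, xi in zip(w, x))
    d_score = 2 * (score - target)            # dLoss/dscore
    grad_w = [d_score * xi for xi in x]       # dLoss/dw: the usual weight gradient
    grad_x = [d_score * wi for wi in w]       # dLoss/dx: gradient w.r.t. the input vector
    for i in range(len(w)):
        w[i] -= lr * grad_w[i]
        x[i] -= lr * grad_x[i]                # updates the row of E in place
    return (score - target) ** 2

before = list(E["shocking"])
losses = [step("shocking", 1.0) for _ in range(50)]
print(before, "->", E["shocking"])            # the vector for "shocking" has moved
print(E["budget"])                            # untouched: it never appeared in a step
```

Only the rows for words that actually appear in a training example receive a gradient, which is why the table can be fine-tuned for the task without touching unrelated words.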
That was exactly the problem, bottle+gevent serving static files. It's moved behind nginx now. (But you might have to wait for a DNS propagation before you get to the new server.)