In the data I've scraped in my own nsfw programming work there's a great deal of...

rightsForRobots · on Nov 15, 2016

Yes, even within a particular video there are lots of frames where the act is implied rather than directly shown, like a close-up of others' faces. Karpathy et al. showed they could still learn from the sports video database even without removing random crowd shots or announcer shots.

I think the quality of the data influences the result, and hand-crafting the dataset is what led to 95% accuracy on new instances.