I think that's a valid question and I ask it every time someone reports "this LLM said X about itself", but I think there are potential ways to verify it: for example, upthread, someone pointed out that the part about copyright materials is badly worded. It says something like "don't print song lyrics or other copyright material", thereby implying that song lyrics are copyrighted. Someone tested this and sure enough, GPT-5 refused to print the lyrics to the Star Spangled Banner, saying it was copyrighted.
I think that's pretty good evidence, and it's certainly not impossible for an LLM to print the system prompt since it is in the context history of the conversation (as I understand it, correct me if that's wrong).
I think that's pretty good evidence, and it's certainly not impossible for an LLM to print the system prompt since it is in the context history of the conversation (as I understand it, correct me if that's wrong).
https://news.ycombinator.com/item?id=44833342