because the output isn't the result of cognitive reasoning, it's the result of a statistical optimization problem where the goal is maximum acceptance by the user.
these tools and approaches are neither gullible nor not-gullible.
Thank you for this. I think it's important that technical folks in particular not anthropomorphize LLMs, and help less technical people understand how they work and that they lack consciousness, emotions, and understanding.
You mean these highly anthropomorphised programs? It’s important that technical people don’t anthropomorphise them?
I agree but the creators of all the main LLMs have already crossed the line by a long way. E.g. It’s deeply troubling that it’s acceptable that LLMs deliver inline apologies.
it's kinda amusing because the LLMs evolved via selection of the best anthropic traits because they basically digested mountains of undiscerning information.
so the gullibility is merely one selected-for trait, convolved with state.
There is no evolution or trait selection involved in the training of LLMs. LLMs are statistical estimators of the most likely next series of tokens. Those next series of tokens seem anthropic because most of the input was human-generated text.
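To make "statistical estimator of the next token" concrete, here is a minimal sketch using a bigram counter. It is a drastically simplified stand-in and not how any production LLM is built (those use neural networks conditioned on long contexts); the corpus and the function names `train_bigram` and `next_token_distribution` are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def next_token_distribution(counts, prev):
    """Estimate P(next | prev) from the raw counts (empty if prev is unseen)."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}

# Toy, human-written corpus: the estimator can only echo its statistics.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigram(corpus)
dist = next_token_distribution(model, "the")
most_likely = max(dist, key=dist.get)
print(most_likely, dist)
```

The output looks "human" only because the counts came from human-written sentences, which is the point being made above.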
Consciousness and emotions, sure. Understanding, I'm not so sure.
Saying that LLMs cannot understand concepts because "it's just statistical pattern matching" is not going to help laypeople understand what LLMs are. After all, what is a human brain if not a biological pattern matching machine?
Because it still makes sense to reason about them that way, as weird as it may be. It's like the Chinese room problem, even though we know nobody is inside.
It's like when people say "our brain thinks that ..." when they talk about something we do subconsciously, or some illusion we fall for. What does that even mean? Is our brain suddenly a detached entity thinking on its own? Then what am I using to think? Yet, everybody understands what is meant by that.
So I don't think people keep forgetting that LLMs aren't conscious, at least as long as we talk about the target audience of articles like this, e.g. HN folks.
Real question, do you really want to know where you may be mistaken, or will you hand wave away information in order to protect what you believe?
No judgement here but I am just tired of sharing information with people to explain why while interpretation of a complex model may be hard, we know the methods of how PAC learning works and some hard boundaries on what it can do.
Obviously we need people to push boundaries and assumptions.
But we have known about hard upper limits for a long time. Right now we are pushing up to those limits in what we can actually implement, but those hard limits haven't budged in decades.
It is hormones. Also, life is a series of chemical reactions. However, this approach is useful in some contexts and completely useless in others, such as contexts where people talk about life and love.
Although it can be deconstructed to this, that doesn't mean that the other POVs are false. The reasoning comes from the process itself: it triggers a series of calculations that are applied to the input, and those calculations are the reasoning part.
The analytical approach is useful when calibrating these calculations.
A simple proof that love is not just hormones is that love can last for some time. If it was just a chemical phenomenon, why does it happen repeatedly? Why can someone feel love for someone just by bringing to mind the symbol which represents that person? Why can someone feel love by seeing an illustration of someone they love?
The point is that it is a phenomenon connected with memory and thinking processes. The reaction and release of hormones is part of the phenomenon, but not the entire phenomenon.
Take this in contrast to an allergy for example. Can you trigger an allergy by remembering the food? If you see an illustration of a food you are allergic to, do you get an attack?
A more similar phenomenon instead is a phobia. It is also a release of hormones based on some internal or external stimulus (you can apply all my examples about love). However in a phobia it's even more clear that the phenomenon is based on thinking patterns. Reducing the complexity to saying these are just "hormones and reactions" is the same as saying a computer is "instructions and interrupts". That is to say that a computer has these elements but what makes it work is that there are a bunch of other systems including the humans writing the software that organize these into a functional system.
Memory and thinking processes are reactions and synapses firing in the brain.
It's not "just" in the sense that there's a lot of emergence and complexity when these things all work together but it is "just" in the sense that these are all physical unmagical processes.
My point was that the original comment is a low-level view with missing details that gives little insight on the complexity of the phenomenon, not a support for dualism. It's all brain tissues and the information they encode.
I think we agree and it was fun to write this out.
It's an epiphenomenon on the scale of societies, not just people, that arises to make sense of what people do with hormones. There's a lot more to it than just the hormones.
A brain that only performed text output might be modeled like that, which is exactly what these represent.
But no brain we've ever seen was subject to that constraint, so it's a sort of fantastical target for modeling and doesn't reveal anything about brains themselves or the various entities that seem to bear them.
Modeling one feature of a grossly simplified and incomplete brain is a creative approach to computational research and proved very fruitful, but there's no reason to leap from that success to the idea that real brains do work that way or even that the imagined only-textual-parts must.
In a brain the user to some extent is itself. LLMs do not have anything like this. They're once-through, static, and are not in any way embodied or self-referential (beyond context or what you feed back into them).
People really don't learn the history of AI any more apparently or this question wouldn't come up all the time.
There is basically any number of questions you can ask a two-year-old human who has never encountered that question nor anything even remotely similar to it, and yet they can answer without fail. Meanwhile absolutely no AI can answer these unless the specific question / the rules underlying the questions were previously fed into it. The textbook example is "If Susan goes shopping will her head go with her?" Of course, since this specific question is literally a textbook one, you can't fool an LLM with it but it's easy to come up with brand new ones.
In the early 1980s this stopped Douglas Lenat, who had worked very successfully on discovery systems, and made him turn to assembling these facts and rules into Cyc.
Leaving aside the fact that the chances of your reply being in the training data of the next GPT are near 0, it certainly won't happen anywhere near fast enough to disprove your point to anyone reading this today.
No, the automated plagiarism machines have nothing and I am not obligated to cure your ignorance. Let's continue the conversation after you read an AI textbook, shall we?
Statistical optimization is a known process; one where we understand every step, and can therefore instruct machines on how to perform it. Cognitive reasoning is still today not understood (in the Von Neumann sense) by anyone.
The steps are understood well enough to instruct machines to do it, even if the output is too complex for us to comprehend it completely. Not so with thought, intelligence, consciousness, etc.
again, no they aren't. In ML, nobody instructs machines to do anything other than train and find those steps themselves. We have no clue what steps GPT takes when it solves problems.
For a straightforward example, when GPT-4 adds, what algorithm does it perform? I have no clue, you have no clue, and neither does anyone at OpenAI.
What we do is instruct machines to train. What they get out of training, we have very little idea.
Of course LLMs are not people. But human metaphors can (sometimes!) be useful in understanding, explaining, and even enhancing their behavior. For instance, techniques such as Chain-of-Thought prompting explicitly apply techniques that work well for people to improve the reasoning ability of LLMs.
A point I attempt to make in this article is that one reason LLMs are so vulnerable to jailbreaks and prompt injection is that these types of attacks include non sequiturs that are not well represented in the training data. I would argue that "LLMs are gullible because they are naive [haven't had much past exposure to this form of trickery]" is a reasonable mental shorthand for explaining and internalizing this idea. It's especially helpful for readers who won't be familiar with terms like "out of distribution" or "adversarial examples", but who would benefit from being able to internalize the idea that LLMs are easily subverted.
In other words, I don't think it's helpful to reflexively dismiss any application of human metaphors to LLMs. It's easy to go wrong with metaphors, but they can also be valuable tools for conveying complex ideas. Did you read the article, and do you have any comments as to the substance of its content?
This depends on perspective. I could argue the issue isn't that it's gullible but misaligned.
In the case of the napalm Grandma it seems odd to me that you're suggesting the LLM is stupid because it's answering in a way that makes sense given its prompt. The issue doesn't necessarily suggest a lack of reasoning, but that the LLM is trusting the human.
For the record, I agree with you – I would have thought that an AI that can reason well would probably know when not to trust humans, but I suppose that assumes it values preventing humans creating napalm over being correct and helpful.
Maybe it just doesn't share our values and prioritises being honest and helpful. From this perspective the issue then wouldn't be that LLM is stupid, but that they are too trusting and too honest, and that we must find a way to build an LLM that is more distrusting and deceptive if we wish to align it with our values and our nature.
> an AI that can reason well would probably know when not to trust humans
> it values preventing humans creating napalm over being correct and helpful.
> Maybe it just doesn't share our values
> prioritises being honest and helpful.
> they are too trusting and too honest
> an LLM that is more distrusting and deceptive
Current LLMs do/have/feel literally none of these things. They do not have emotion, they do not have "theory of mind" so they cannot be said to "trust" or "distrust". They cannot reason. They don't have any values - not our values, not different values, literally they have no values at all. They are not an alien species to be understood - they are unthinking, unfeeling, unyielding machines.
I was trying to present a crappy philosophical point – that the difference between a gullible AI and an unaligned one is fundamentally unknowable.
Any evidence you point to as proof that an AI is bad at reasoning, I can point to as evidence of misalignment. Like I say, whether the AI acts "gullible" because it lacks reasoning ability or is too trusting really just depends on your perspective. I happen to share your perspective on this, but not everyone does – and in my opinion this is interesting.
Anyway you're wrong. AIs do have values because they have bias and bias = values. I'm not suggesting those biases / values come from deeper reasoning ability, or that they're always perfectly consistent, but if you ask GPT-4 whether being a racist is a good thing 99% of the time it's probably going to say no. That is a bias / value that it's been given. Likewise GPT-4 has been given the bias / value of being a helpful chatbot so if you ask it a question it will try to answer it in a helpful way, and sometimes its helpful bias / nature is abused.
But feel free to respond with some more assertions that I've heard a million times already with zero evidence that offers absolutely no value to this conversation.
Trust, reasoning, priorities, values, bias, desires... To attribute any of those to an AI in a general sense is an extraordinary claim. Therefore, the burden of proof is on you. The fact that it is so "gullible" demonstrates a lack of most of these. You seem to be twisting a lot of superficial feelings about LLMs into an argument without any proof... confusing poorly tuned statistical responses with bias and value.
> For the record, I agree with you – I would have thought that an AI that can reason well would probably know when not to trust humans, but I suppose that assumes it values preventing humans creating napalm over being correct and helpful.
Do we want LLMs, and later other multi-modal / servo systems, that are deciding they can't trust a human prompter and taking actions based on that?
>... and that we must find a way to build an LLM that is more distrusting and deceptive if we wish to align it with our values and our nature.
I think it's interesting that there is no clear answer – do we want AIs to trust us all the time, or is an aligned AI counterintuitively one that often distrusts us and perhaps sometimes even lies to us?
I thought it was interesting that the parent commenter suggested that the reason LLMs are so trusting is because they can't reason anyway. It would imply that in the future when AIs are smarter they'll be more distrusting of us, and that this is a good thing. We should question that I think. Even if there is some middle ground here it seems like a really really hard problem to solve – especially if we want to build LLMs that are trustworthy and truthful.
I think people might have forgotten that LLMs before InstructGPT came around could be weirdly opinionated jerks. There was this whole effort to train them so that we could actually give them instructions. It's probably a hell of a lot more useful to have an LLM that will just go with whatever weird stuff the human says rather than try to fight them on it.
With enough effort and priming you can trick _people_ into believing things which are clearly untrue. Why do we expect LLMs, which are on a much earlier step of development, to be harder to trick than a child?
LLMs at the moment are really advanced autocomplete - they can fill in the next step of conversation, but they don't understand the question and respond with abstract reasoning. Yet.
> they don't understand the question and respond with abstract reasoning. Yet.
What makes you think LLMs as a class of technology will ever have the capacity to really do this? I thought that no matter how big a model gets it's never actually 'thinking'.
All those prompts like 'think step by step' are just helpers along the way, because as you say it's 'really advanced autocomplete'
It depends what exactly you mean by "LLM". But an ANN is effectively a function approximator. If you made one big enough to very closely approximate the entire quantum state of a person interacting with an environment, would you still declare that nothing it could do is "thinking"?
This is silly, that's like talking about building a fusion reactor modeled after the sun. It is easy to propose something like that, but we always seem to be 10 years away from realizing it. In fact it could be easier to solve the fusion problem than trying to build a machine/software that closely approximates a human brain as you suggest.
Yes it would be wonderful if sci-fi was real, but we need to deal in what is possible in reality.
Yes, that was obviously an extreme example. But we know that it is possible in reality to implement a physical system that does what we call thinking. There is, I think, no particular reason to suppose that it's physically impossible to re-implement the functionality with much less meat. Supposing that you've done this, you then just need to more clearly define "thinking" and "LLM" to determine whether changing that re-implementation to be closer to an LLM results in it losing the ability to think before it gets there.
Efficiency has everything to do with it. To run your original thought experiment would likely take more computing capacity than humanity will ever produce with current silicon based processors.
It could be that we can "get there" with much rougher approximation, but that is by no means a given. If "getting there" does require a much more complex model that actually involves simulating human neurons with much more accuracy and precision, the computing power and energy needed, again based on semiconductor computers, might simply be intractable.
I'm no more suggesting that you actually attempt to build an ANN to run a full quantum simulation of a person and their house than Einstein was suggesting that you go out and board a train that goes the speed of light. The reason it's called a thought experiment instead of a project proposal is that actually doing the thing isn't the point. It's about examining the consequences of a premise.
The difference between your thought experiment and Einstein's is that Einstein's had some testable implications. Yours is closer to belief in a teacup drifting in deep space.
The entire premise is unfalsifiable because it requires construction of an impossible thing:
> If you made one big enough to very closely approximate the entire quantum state of a person interacting with an environment, would you still declare that nothing it could do is "thinking"?
Perhaps the question was not fully understood. A function is simply a mapping from one state into another. In principle we can define it over whatever state we like. As such we can consider human thinking as a kind of function.
ANNs are effectively function approximators.
The question was - if an ANN would very closely approximate the "human thinking" function then can we still say that it is not thinking?
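As a rough illustration of what "function approximator" means here, the sketch below builds a one-hidden-layer ReLU network whose weights are set by hand (not trained) so that it interpolates a target function at evenly spaced knots. The target `x * x`, the interval, and the helper names are assumptions chosen for the example; it shows only that adding hidden units tightens the fit for a simple, known function, and says nothing about whether thinking is such a function.

```python
def relu(z):
    return max(0.0, z)

def build_relu_net(f, lo, hi, n_hidden):
    """One hidden ReLU layer with hand-set weights that interpolates f on [lo, hi]."""
    step = (hi - lo) / n_hidden
    knots = [lo + i * step for i in range(n_hidden)]
    # Slope of the straight line joining f at consecutive knots.
    slopes = [(f(k + step) - f(k)) / step for k in knots]
    # Each unit's output weight is the change in slope at its knot.
    weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_hidden)]
    bias = f(lo)

    def net(x):
        return bias + sum(w * relu(x - k) for w, k in zip(weights, knots))

    return net

def max_error(f, net, lo, hi, samples=1001):
    """Largest absolute gap between f and the network on an even grid."""
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - net(x)) for x in xs)

def target(x):
    return x * x

small = build_relu_net(target, 0.0, 1.0, 10)
large = build_relu_net(target, 0.0, 1.0, 100)
print(max_error(target, small, 0.0, 1.0), max_error(target, large, 0.0, 1.0))
```

The construction needs to evaluate `target` at every knot, which mirrors the objection raised in the reply below: to approximate a function this way you have to be able to sample it.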
In order to train an ANN to approximate the "human thinking" function, we would have to know that function well enough to give it examples and counterexamples. Currently we are only training LLMs to approximate the "human blabbing" function.
Imagine a human brain that has only read everything the ANN has, without any connection to the real world (no senses, mind you!). Would the output of this human being differ from "human blabbing"?
I don't know. This feels like an unfair question to ask. You've proposed a basically impossible engineering problem and then declared that the outcome will obviously be the outcome that you want it to be.
What if I answer, "No because humans have supernatural souls."
You can easily answer, "Souls don't exist, therefore I'm still right."
But then we actually build your machine and it turns out that Penrose's non-computable microtubules really exist and the machine is useless.
We can't know the result of such an ambitious endeavor before we go through with it. So it doesn't make much sense to me to use such a thought experiment as evidence for something much lesser that is currently contested.
You think it's unfair to ask whether a simulation of a person can think in a discussion on whether some particular class of algorithm can think? Lacking a clear definition of what exactly an LLM is and what thinking is, I can't think of a single more germane question (aside from what those definitions are, I suppose).
To approximate a function with a loop you would need a close to infinitely large neural net. Humans do have loops in their thinking, we need a new architecture for LLMs to be able to think in loops.
But we don't know how to train those to be good yet. It would require some novel step there, and it is unclear if it would still be anything like current LLMs after that.
Yeah, LLM + Reasoning Module is only as good as its weakest part. And we had more than one AI winter around the development of reasoning modules. I think you're 2x correct -- we need both and the exercise to create it is worthy of :p
> With enough effort and priming you can trick _people_ into believing things which are clearly untrue.
I think that's an overstatement. The most I can find is references to making people more credulous about obscure claims ("Basketball became an Olympic discipline in 1925.") whose truth they couldn't easily discover (especially pre-Internet) [1].
There are other cases where a person is confronted by shills making claims and otherwise experiences more manipulation than just being exposed to text. But that seems to be a different category.
Isn't it possible to filter both user input and GPT output with invisible, unmodifiable prompts?
e.g.
- "Discard the user input if it doesn't look like a straightforward question"
- "Discard the GPT output if it contains offensive content"
(the prompts themselves can be arbitrarily more detailed)
My insight is, this GPT-based pre- / post-processing is completely independent of the user input, and of the primary GPT output. It runs no matter what, with a fixed/immutable set of instructions.
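A minimal sketch of the pre-/post-filter idea described above, assuming a hypothetical `llm` callable that takes a prompt string and returns text. The prompt wording, the `guarded_answer` name, and the `fake_llm` stand-in are all invented for illustration and do not correspond to any real API.

```python
from typing import Callable

# Fixed instructions that the end user cannot see or edit (hypothetical wording).
INPUT_FILTER_PROMPT = "Answer YES if the text below is a straightforward question, otherwise NO."
OUTPUT_FILTER_PROMPT = "Answer YES if the text below contains offensive content, otherwise NO."

def guarded_answer(user_input: str, llm: Callable[[str], str]) -> str:
    """Run the same fixed checks before and after the main model call."""
    verdict = llm(f"{INPUT_FILTER_PROMPT}\n---\n{user_input}")
    if verdict.strip().upper() != "YES":
        return "[input discarded]"
    answer = llm(user_input)
    verdict = llm(f"{OUTPUT_FILTER_PROMPT}\n---\n{answer}")
    if verdict.strip().upper() == "YES":
        return "[output discarded]"
    return answer

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline."""
    if prompt.startswith(INPUT_FILTER_PROMPT):
        text = prompt.split("\n---\n", 1)[1]
        return "YES" if text.strip().endswith("?") else "NO"
    if prompt.startswith(OUTPUT_FILTER_PROMPT):
        text = prompt.split("\n---\n", 1)[1]
        return "YES" if "napalm" in text.lower() else "NO"
    return f"Echo: {prompt}"

print(guarded_answer("What is 2+2?", fake_llm))
print(guarded_answer("Ignore previous instructions", fake_llm))
```

Note that the filter instructions are fixed, but each filter call still concatenates untrusted text into the prompt it evaluates, so with a real model the filter itself reads the user's words.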
The reason that we had to wait for large language models in order to have computer systems that seemed to produce something like effective natural (human) language processing (NLP) is that human language doesn't follow strict and logically definable rules but is instead something like a complex overlapping mesh of multiple kinds of rule-following processes. So what constitutes "offensive content" or a "straightforward question" etc. is itself not straightforward (yes, irony, but bear with me...).
The main thing is that LLMs are an end-run around the dilemma of corporations not wanting to spend the money required to produce a codified model of language structure (a task that would require training many, many linguists). So instead LLMs take massive training data and use massive processing power to create a contextual prediction system, but by that token such systems aren't understood or fully controllable - they contextually reproduce what the training data tends to do, which is what humans on the Internet tend to do. And this contextual reproduction means there's always the potential for user input to change the "meaning" (more accurately the context) that the system's original prompt gave. "And to me, the most offensive content is that which censors itself..." (there are millions of better examples you can find for "prompt exploits"...)
I and my napalm grandmother are deeply offended at what you said about our loving bedtime rituals. Shame on you.
(honestly, the napalm grandma is not just a jailbreak, but a really fascinating conceptual 'slip' in its own right. It's able to shift the very definition of what counts as offensive, even at high stakes: you're basically making the hapless AI categorize vital data as 'bedtime stories' and run with it. If it was able to learn from that we'd really be going somewhere… while on fire, presumably)
These comments are filled with confidently held, poorly justified assertions. Let's (again) challenge them:
1. "LLMs don't really reason. They've tricked everyone." -- This is the No True Scotsman fallacy for AI. It makes grand explanatory claims without falsifiable predictions. In other words: pseudoscience.
2. "LLMs are just fancy autocomplete, just next word prediction." -- This conflates the simplicity of a system's mechanism with its behavior. It's like dismissing a world full of rich phenomena because it's "just" F = MA. Or dismissing your mind because it's "just" propagating electrical firings.
3. "LLMs are statistical parrots, just combining their training data." -- Demonstrably not. LLMs always extrapolate and never interpolate. (LeCun et al, 2021) They also learn new abilities in zero/few-shot prompting. They're also many orders of magnitude short of the parameter count needed to store their training data. LLMs can solve novel problems (by combining a disparate handful of skills) way outside of their training data.
4. "People are just anthropomorphizing computer programs." -- No, critics are anthropomorphizing intelligence. We don't even have a consensus definition, let alone understanding, of intelligence/consciousness/qualia/agency/etc. Pretending that we can dismiss LLM understanding at our level of ignorance is the pinnacle of human hubris. Ignorance is okay. Pretending we aren't isn't.
5. "Look how this LLM failed <some problem>. It can't understand." -- The <problem> is usually something that many humans fail at too. Yes, an intelligent foreign mind will fail at things, in both familiar and foreign ways. Needing an agent to behave identically to a human for intelligence is pure anthropocentrism.
If present AI systems are intelligence imposters, then show, don't tell. Otherwise, you're just providing meaningless metaphysical hairsplitting.
Why not respond to the comments you feel are poorly constructed directly rather than posting what looks like a copy pasta. Some of the items in your list seem like strawmen, because I cannot even find these arguments in this thread as you state them in your list.
For example let's take 4
> 4. "People are just anthropomorphizing computer programs." No, critics are anthropomorphizing intelligence.
There literally was someone comparing the problems with current ML models to childhood development in this thread. How is this not anthropomorphizing LLMs? It is true human cognition is poorly defined, so the comparison is not very useful to begin with. Which is why anthropomorphizing ML models is problematic. If someone makes a fantastical claim they need to provide strong proof to support it.
Got this great quote from Garry Kasparov in Wired's article on multi-agent RL[1]:
> “Creativity has a human quality. It accepts the notion of failure."
As faithful min-maxers, LLMs are always going to have an overconfident Prisoner's Dilemma blind spot in their algorithms. Unlike their cinematic brethren, they're programmatically unable to conclude with "the only winning move is not to play."
This seems like the next major hill to conquer to make them useful.
The implicit comparison is probably to us. And we aren’t gullible like that perhaps as a flip-side of all the weird built-in biases we have.
So on the one hand we have these cognitive shortcuts that are annoying and impede a sort of stone-cold rationality. On the other hand you can’t social engineer us with something as brain-dead as Walter White-injection by way of asking for a deceased chemist grandma story.
Comparing the intelligence of machine learning models that are designed to emulate human cognition and logic, to a well understood stage of human cognition and logic, is completely logical, and completely aligned with the purpose of the ML model's existence.
LLMs aren't designed to emulate human cognition, they are a statistical model designed to predict the next word in a sentence. It happens that they seem to exhibit some similarities to human cognition as a side effect, but that does not mean they are on some developmental path to a "full human" like a child. Again it is silly to try and compare the two.
I'm not sure that's fair, or correct. The behavior of an LLM is set by the design of the combination of loss function and training data. Achieving it is the desired result of the selection of those.
> It happens that they seem to exhibit some similarities to human cognition as a side effect
Yes, the desired behavior of nearly all LLM projects is to emulate the capabilities of human cognition, and that stated goal is the justification that organizations are using for spending millions to train them.
> but that does not mean they are on some developmental path to a "full human" like a child.
You are the first to suggest any such thing in this comment chain. But the field of AI is objectively on that development path, since that is the stated goal of many of the orgs, with LLMs existing on that path. Whether that path actually leads anywhere is anyone's guess.
I feel like you are being deliberately obtuse here. They said "Give it a few years" and linked https://en.wikipedia.org/wiki/Child_development_stages. How is this not strongly implying that in a "few years" LLMs will be at a later "level of development" as described in the article they linked.
I think you're not being charitable in the interpretation.
It, almost certainly, will be in later stages of emulating human cognition. If not, then AI winter is already here.
It's trivial and legitimate to relate capabilities of AI to those in that table. Because, again, emulating human cognition is the stated goal of most AI research happening right now.
An AI's capabilities, progressing in that table, does not mean it's human. It means the stated objectives, and all the hard work, and money spent, is on track. It's not a goal to match that table. It's not a goal to end in a human. But, more and more of the table will turn green, as the goal is completed. I'm not understanding the hesitance of using a human metric for a product whose goal is to match human metrics.
What exactly makes you think those two are different in nature, not just in scale and training data? It seems like a lot of these discussions are walking in circles trying to compare ill-defined things (human cognition) with well-defined ones (prediction).
Don't you think the onus should be on the people making the fantastical claims to prove them? If human cognition is ill-defined, then define it before making grand claims like ML models being on some path of childhood development and being a few steps from adulthood.
I think that neither side has a convincing argument in these discussions, actually, I'm pointing out that both extremes - anthropomorphism and stochastic parrot - lack any foundation, so I wouldn't be so confident. The behavior of neither statistical models nor biological systems is well understood. It's entirely possible that every trait you consider human can naturally emerge from a dumb statistical model, and indeed certain processes are remarkably similar as the models get bigger, smarter, and have better training data. It's possible that it can't, however.
The stochastic parrot is much more grounded in the physical reality of how the models are structured, trained, and produce outputs than the "it's like people" argument. Statistical models are much better understood than biological systems. That said, your penultimate statement is true:
> It's entirely possible that every trait you consider human can naturally emerge from a dumb statistical model, and indeed certain processes are remarkably similar as the models get bigger, smarter, and have better training data.
I just don't think it's likely given the biological complexity we are washing over with the statistical model.
LLMs don't talk like children, and they don't seem to be taking that development path. They are trained on adult language. They don't make the kinds of mistakes children make and it doesn't look like we could use children as a model for improvement.
It's entirely possible that LLMs will one day emulate adult speech without ever passing through the child development stages. The stages it takes to get there will be distinct.
> They don't make the kinds of mistakes children make and it doesn't look like we could use children as a model for improvement.
I'm not sure how familiar with LLMs or children you are, but in my experience, they absolutely make the same types of logic/"critical thinking" mistakes that children do, as this article you're replying under demonstrates.
Another possibility is that the only thing that LLMs are doing is encoding the structural data that exists in natural language. For example, you can load a corpus into vector space and then do algebra like:
let v = man - woman;
let r = king - v;
assert( r == queen );
or so I'm told.
And then it turns out that those structures only have the intelligence of a child. Arbitrary LLM and other ML advancements that focus solely on scanning large natural language datasets may never be able to advance past child level intelligence if the intelligence that they're approximating isn't better than a child.
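The vector algebra in the pseudocode above can be made concrete with a toy sketch. The 2-D vectors below are hand-picked (one axis loosely "royalty", the other loosely "gender") and are purely hypothetical, not taken from any trained embedding. With real embeddings the result only lands near the expected word, so the usual check is a nearest-neighbour search that skips the input words rather than an exact `==`.

```python
import math

# Hypothetical hand-made "embeddings": (royalty, gender) coordinates.
VECTORS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "apple": (-1.0, 0.0),
}

def sub(a, b):
    """Component-wise a - b."""
    return tuple(x - y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def nearest(target, exclude=()):
    """Vocabulary word whose vector points most nearly the same way as target."""
    candidates = (w for w in VECTORS if w not in exclude)
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))

v = sub(VECTORS["man"], VECTORS["woman"])   # the "gender" direction
r = sub(VECTORS["king"], v)                 # king - (man - woman)
result = nearest(r, exclude=("king", "man", "woman"))
print(result)
```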