So, what are investors thinking to warrant this? If it is 'DeepSeek means you don't need the compute', that is definitely wrong. Making a more efficient x almost always leads to more of x being sold/used, not less. Does anyone really believe that in the long term we will need less compute rather than more?
I think the market believes that high-end compute is not needed anymore, so the stuff in datacenters suddenly just became 10x over-provisioned and it will take a while to fill up that capacity. Additionally, things like the Mac and AMD unified memory architectures and consumer GPUs are all now suddenly able to run SOTA models locally. So a triple whammy: the competition just caught up, demand is about to drop in the short term for any datacenter compute, and the market for exotic, high-margin GPUs might have just evaporated. At least that is what I think the market is thinking. I personally believe this is a short-term correction, since the long-term demand is still there and we will keep wanting more big compute for a long time.
But the SOTA models basically all suck today. If people don’t think they suck now, in a year they’ll definitely look back and consider those older models unusably bad.
I recently went to the LLM chat arena and tried my "test input", the one GPT-3 failed on, against the latest frontier models. This test snippet simply repeats the same four-letter word many times in a paragraph, using all of its various possible meanings. The request to the AI is to put the meaning of each usage of the word next to it in brackets.
None of the frontier models can do this perfectly. They all screw up to various degrees in various interesting ways. A schoolkid could do this flawlessly.
This is not some contrived test with bizarre picture puzzles as seen in ARC-AGI or testing obscure knowledge about bleeding-edge scientific research. It's simple English comprehension using a word my toddler knows already!
It does reveal the fundamental flaw in all transformer-based models: They're just shifting vectors around with matrices, and are unable to deal with many categories of inputs that cause overlaps or bring too many of the tokens too close to each other in some internal representation. They get muddled up and confused, resulting in errors in the output.
I see similar effects when using LLMs for programming: They get confused when there are many usages of the same identifier or keyword, but with some subtle difference such as being inside a comment, string, or in a local context where the meaning is different.
I suspect this will eventually be fixed, but I haven't seen any fundamental improvement in three years.
On the contrary, this is testing the LLMs on inputs they're supposed to be good at.
Fundamentally, this kind of problem is the same as language translation, text comprehension, or coding tasks. It just tests where the boundaries are of the LLM capabilities by pushing it to its limits.
I've noticed the LLMs bumping up against those very same limits in ordinary coding tasks. For example, if you have a prefix-suffix type naming convention for identifiers, depending on how the tokenizer splits these, the LLMs can either do very well or get muddled up. Similarly, they're not great at spotting small typos with very long identifiers because in their internal vector representations the correct and typo versions are very "close".
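To make the "close" point a bit more concrete, here is a rough sketch. It only measures character-level similarity with Python's `difflib`, which is an analogy and not what happens inside a model's vector representation. The identifiers are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical identifiers, invented for this example.
short_ok, short_typo = "usrId", "usrIb"
long_ok = "customer_billing_address_validation_result"
long_typo = "customer_billing_adress_validation_result"  # one "d" dropped

# A single wrong character is a large fraction of a short name...
print(f"short: {similarity(short_ok, short_typo):.3f}")
# ...but a tiny fraction of a long one, so the two strings look nearly identical.
print(f"long:  {similarity(long_ok, long_typo):.3f}")
```

The longer the identifier, the smaller the relative difference a single typo makes, at least by this crude measure. Whether the same thing holds in a given model's embeddings is an assumption on my part.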
That's a known thing that would be in its training set.
I just made up my own thing that no AI model would have seen anywhere before.
It's pretty easy to create your own: just pick a word that is highly overloaded. It helps if it is also used as a proper name, business name, place name, etc.
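If you want to try it yourself, a minimal sketch of a harness could look like this. The word "mark", the paragraph, and all function names are placeholders I made up, not the actual test input described above. The check only confirms that every usage got a bracketed annotation; judging whether the meanings are right is still up to you.

```python
import re

# Placeholder word and paragraph, invented for illustration.
WORD = "mark"
PARAGRAPH = "Mark told me to mark the mark on the wall, but my mark missed the mark."


def count_usages(word: str, text: str) -> int:
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    return len(re.findall(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE))


def build_prompt(word: str, text: str) -> str:
    """Build the request: annotate every usage of `word` with its meaning."""
    return (
        f'In the paragraph below, put the meaning of each usage of "{word}" '
        f"next to it in square brackets.\n\n{text}"
    )


def count_annotations(word: str, response: str) -> int:
    """Count occurrences of `word` immediately followed by a [bracketed] note."""
    pattern = rf"\b{re.escape(word)}\b\s*\[[^\]]+\]"
    return len(re.findall(pattern, response, flags=re.IGNORECASE))


def all_usages_annotated(word: str, text: str, response: str) -> bool:
    """Necessary (not sufficient) check: one annotation per usage."""
    return count_annotations(word, response) == count_usages(word, text)


print(build_prompt(WORD, PARAGRAPH))
```

Paste the printed prompt into whichever model you want to test, then pass the reply to `all_usages_annotated` as a first sanity check before reading the annotations by hand.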
Selling more does not necessarily mean you make more money. More efficiency could lead to lower margins even if volume is higher.
Moreover, even if things become incredibly efficient, the bar for sufficiently good AI in practice (e.g. applications) might be met with commodity compute, pretty much locking out Nvidia, which generally sells high-margin, high-performance chips to whales.
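A quick back-of-the-envelope version of the volume vs. margin point. All numbers here are hypothetical and chosen only to show the arithmetic; they are not anyone's actual sales figures.

```python
def gross_profit(units: float, price: float, margin: float) -> float:
    """Gross profit = units sold * price per unit * gross margin fraction."""
    return units * price * margin


# Hypothetical "before": fewer units, premium price, fat margin.
before = gross_profit(units=100, price=10.0, margin=0.70)

# Hypothetical "after": 50% more units, but commodity pricing and thinner margin.
after = gross_profit(units=150, price=6.0, margin=0.40)

print(f"before: {before:.0f}, after: {after:.0f}")
print("profit fell despite higher volume" if after < before else "profit held up")
```

With these made-up inputs, volume goes up by half while gross profit drops, which is the scenario the market might be pricing in.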