It's not an optimization targeting Nvidia chips. It's an optimization of the technique through and through, regardless of chip.
But your point is well taken, and perhaps both my metaphor and GP's break down.
Either way, we saw massive spikes in demand for Nvidia when crypto mining became huge, followed by a massive drop when we hit the crypto winter. We saw another massive spike when LLMs blew up, and this may just be the analogous drop in demand for LLMs.
You both seem to be talking past each other. There were a number of optimizations that made this possible. Some were in the model itself and are transferable; others were in the training pipeline and are specific to the Nvidia hardware they trained on.