
I really don't like that these models can be branded as Deepseek R1.


Well, Deepseek trained them?


Yes, but it would've been nice to call them D1-something, instead of constantly having to distinguish Deepseek R1 (here I mean the 604B model) from Deepseek R1 (the reasoning model and its distillates).


You can say R1-604b to disambiguate, just like we have llama 3 8b/70b etc.


These models are not of the same nature either. They were trained in different ways. A uniform naming scheme (even with an explicit parameter count) would still be misleading.


? Alexander is not Aristotle?!


you made my day!



