To find the absolute minimum, multiply the number of parameters by the bits per parameter, then divide by 8 if you want bytes. In this case, 8 billion parameters at 4 bits each means "at least 4 billion bytes". For a back-of-the-napkin estimate, add ~20% overhead to that (it really depends on your context setup and a few other things, but that's a good swag to start with) and then add whatever memory the base operating system is going to be using in the background.
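As a rough sketch of that arithmetic in Python: the function name, the `overhead_fraction` default of 0.20, and the `os_bytes` parameter are just illustrative choices here, and the 20% is the ballpark from above rather than a measured value.

```python
def estimate_model_memory_bytes(num_params, bits_per_param,
                                overhead_fraction=0.20, os_bytes=0):
    """Back-of-the-napkin memory estimate for loading a model.

    overhead_fraction: assumed extra for context etc. (~20% rule of thumb).
    os_bytes: whatever the base operating system is already using (assumed input).
    """
    # Absolute minimum: parameters * bits per parameter, divided by 8 for bytes.
    weights_bytes = num_params * bits_per_param / 8
    # Add the rough overhead, then the OS's own footprint.
    return weights_bytes * (1 + overhead_fraction) + os_bytes


# 8 billion parameters at 4 bits each: weights alone, then with ~20% overhead.
weights_only = estimate_model_memory_bytes(8e9, 4, overhead_fraction=0.0)
with_overhead = estimate_model_memory_bytes(8e9, 4)

print(f"weights only:  {weights_only / 1e9:.1f} GB")
print(f"with overhead: {with_overhead / 1e9:.1f} GB")
```

That puts the 8B 4-bit case at about 4 GB for the weights and a bit under 5 GB with the overhead, before counting the OS.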

Extra tidbits to keep in mind:

- A bits-per-parameter higher than what the model was trained at adds nothing (other than compatibility on certain accelerators), but a bits-per-parameter lower than what the model was trained at degrades the quality.

- Different models may be trained at different bits-per-parameter. E.g. the 671 billion parameter DeepSeek R1 (full) was trained at fp8, while the 405 billion parameter Llama 3.1 was trained and released at a higher parameter width (bf16), so "full quality" benchmark results for DeepSeek R1 require less memory than Llama 3.1 even though R1 has more total parameters.

- Lower quantizations will tend to run proportionally faster if you are memory bandwidth bound, and that can be a reason to lower the quality even if you can fit the larger version of a model into memory (such as in this demonstration).
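To put rough numbers on the last two points, here is a minimal Python sketch. It assumes 16 bits per parameter for Llama 3.1 405B, and for the speed part it assumes a dense model where every weight is read once per generated token, which gives an upper bound only. The 100 GB/s bandwidth figure and the helper names are made up for illustration.

```python
def weights_bytes(num_params, bits_per_param):
    # Minimum bytes needed just to hold the weights.
    return num_params * bits_per_param / 8


# "Full quality" weight sizes: R1 at 8 bits vs Llama 3.1 405B at an assumed 16 bits.
r1_fp8 = weights_bytes(671e9, 8)
llama_16 = weights_bytes(405e9, 16)
print(f"DeepSeek R1 @ 8-bit:     {r1_fp8 / 1e9:.0f} GB")
print(f"Llama 3.1 405B @ 16-bit: {llama_16 / 1e9:.0f} GB")


def max_tokens_per_second(model_bytes, bandwidth_bytes_per_s):
    # If memory bandwidth is the bottleneck and all weights are read per token
    # (dense-model assumption), this is a ceiling on generation speed.
    return bandwidth_bytes_per_s / model_bytes


ASSUMED_BANDWIDTH = 100e9  # hypothetical 100 GB/s, not a real device spec

speed_8bit = max_tokens_per_second(weights_bytes(8e9, 8), ASSUMED_BANDWIDTH)
speed_4bit = max_tokens_per_second(weights_bytes(8e9, 4), ASSUMED_BANDWIDTH)
print(f"8B @ 8-bit: up to {speed_8bit:.1f} tok/s")
print(f"8B @ 4-bit: up to {speed_4bit:.1f} tok/s")
```

Under those assumptions, halving the bits per parameter halves the bytes read per token and doubles the speed ceiling, which is the "proportionally faster" effect. Real throughput will come in lower.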