This is partly wrong. For example, it's important how the fields are organized in a structure, and only the developer knows what belongs together and only they know the access patterns across threads (or at least should know). This is far from being a micro-optimization.
If I do it wrong, are we talking about my application (say, some web app) becoming noticeably slower? Or we would need to be running several million operations per second before anything a user could notice the difference?
I suspect only very high end games, image processing apps and that kind of thing would ever need to care about the order of fields in a struct... perhaps the author of my web server as well, but even that only if I have a ridiculously high load on my server (which I don't, almost no one does).
So, no I don't think I should know that until I develop one of those highly niche applications myself.
> are we talking about my application (say, some web app) becoming noticeably slower? Or we would need to be running several million operations per second before anything a user could notice the difference?
If you have an algorithm with quadratic or higher complexity somewhere you will be in the millions even if you only have a few thousand entries to deal with. Now the worst case cache miss (down to ram) is modeled as roughly 1000 cycles. Put that right in the middle of that O(n^2) algorithm and your web app will be able to handle 1 request per second on a good day.
Maybe you shouldn't be doing O(n^2) inside a request handler in the first place. This has nothing to do with caches.
And even if you do quadratic operations in your handler, how often do you write a new such handler? Most of the time you work on stuff around that, supporting infra, testing, etc. None of those needs to be cache optimized either.
It was mostly meant as a example of millions of operations even at a small scale. Sadly I have seen way too much accidental O(n^3) code and not all of it could be "fixed" trivially.
> This has nothing to do with caches.
Would you rather handle 1000 request per second or 1? Caching has the potential to make an already bad situation so much worse.
> Would you rather handle 1000 request per second or 1?
Would you rather like to approach this problem by optimizing memory layout which, let's be honest, tends to give you more in the region of 1-10% improvements rather than 10-100x, or would you rather like to try to go from O(n^3) to O(n^2) or O(n log n) or similar?
The real meat is in the complexity class. Cache effects are where you go when there is nothing else to optimize and it actually matters. Preemptively making all your data structures 2-3 as big because of not-actually-necessary padding does not sound right to me, but YMMV.
How many of your structs does your code access million times a second? For most programs, that number is firmly 0.
While you are optimizing your struct layout and decrease you app start time from 1.122765 seconds to 1.122764 seconds, I ship production code at 2x the rate because I only optimize for performance where it matters while otherwise optimizing for maintainability, testability, extensibility, dev fun and actually shipping.