Of course the parameters are different, but even if it were GCC or clang on both sides, the compiler back-ends for arm64 and x64 are different too. If we followed your argument to its logical conclusion, no cross-platform benchmarks should ever be performed, since there will always be something different. I am approaching this from the perspective of the end user (how long does it take me to get a working p7zip executable?).
That said, I attempted to make the comparison as close as possible by timing how long it takes to cross-compile Hugo for Windows/x64 (taking care to run it once first to load the sources into the package cache, and clearing the object file cache before each run). That way every machine uses the exact same Go 1.20.1 compiler, the same code generator back-end, and the same platform-specific source code, and the results are much closer:
- Mac Studio (M1 Ultra, 128G): 9.7s
- MacBook Air (M1, 16G): 16.7s
- HP Dev One (Ryzen 5850U, 64G): 18.7s
- HP Z2 Mini G4 (Xeon E-2176G, 32G): 20.7s
- Gigabyte Aorus 17G (i7-11800H, 64G): 16.5s
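
For anyone who wants to repeat the measurement, here is a rough sketch of the procedure described above. It assumes that "object file cache" means the Go build cache (cleared with `go clean -cache`) and that "package cache" means the module cache, which that command leaves alone. The helper names `cross_env`, `timed_build` and `main`, and the output file name, are made up for illustration; this is not the exact script behind the numbers above.

```python
import os
import subprocess
import tempfile
import time


def cross_env(base=None):
    """Return a copy of the environment with the Windows/x64 target set."""
    env = dict(os.environ if base is None else base)
    env["GOOS"] = "windows"
    env["GOARCH"] = "amd64"
    return env


def timed_build(src_dir):
    """Clear the build cache, then time one cross-compile of the module in src_dir."""
    # Assumption: `go clean -cache` is what "clearing the object file cache" means.
    # It drops compiled objects but keeps downloaded module sources.
    subprocess.run(["go", "clean", "-cache"], cwd=src_dir, check=True)
    with tempfile.TemporaryDirectory() as out_dir:
        out = os.path.join(out_dir, "hugo.exe")  # hypothetical output name
        start = time.perf_counter()
        subprocess.run(["go", "build", "-o", out, "."], cwd=src_dir,
                       env=cross_env(), check=True)
        return time.perf_counter() - start


def main(src_dir):
    """Warm-up build (fetches sources), then one measured build."""
    timed_build(src_dir)  # result discarded: includes module downloads
    print(f"{timed_build(src_dir):.1f}s")
```

Calling `main("/path/to/hugo")` (path is a placeholder) on a checkout of the Hugo sources prints a single wall-clock figure comparable to the ones listed above.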
So you are right: somehow GCC's optimizer at the -O2 level is much slower on the x64 laptops than on the x64 desktop or on arm64. I'm not sure why compiling p7zip is so much faster on the Xeon than on the Ryzen or the Aorus; gcc is 12.2 on both, and the optimizer setting is -O2 with no -march=native. The only explanation I can think of is thermal throttling, since both slow machines are laptops, and GCC is somehow triggering it in a way that Go's compiler (and apparently clang) does not.
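
One cheap way to probe the throttling guess would be to run the same clean rebuild several times back-to-back and see whether the wall time creeps up as the machine heats up. The sketch below is only that, a sketch: `time_runs` and `slowdown` are hypothetical helpers, the command passed in has to do a full clean rebuild on every run (otherwise later runs just hit the cache), and a rising ratio is a hint rather than proof, since a cold disk cache can also skew the first run.

```python
import subprocess
import time


def time_runs(cmd, runs=5):
    """Run cmd back-to-back and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        timings.append(time.perf_counter() - start)
    return timings


def slowdown(timings):
    """Ratio of the last run to the first; well above 1.0 hints at throttling."""
    return timings[-1] / timings[0]
```

For example, `slowdown(time_runs(["sh", "rebuild.sh"]))` with a hypothetical `rebuild.sh` that cleans and rebuilds p7zip would give one number per machine to compare between the laptops and the desktop.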