That would certainly be doable. If the performance is better, we can switch to using that for our benchmarks. The idea behind the benchmarks is to compare against a "gold standard", which is why the best results are taken across all optimization levels. We could even take the best results across multiple C compilers to give ourselves the absolute hardest comparison :-)
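For what it's worth, picking the best result across all optimization levels and compilers could look roughly like the sketch below. The compiler names, flags, and timings are hypothetical placeholders rather than measured results, and `gold_standard` is just an illustrative helper name, not something from the actual benchmark scripts.

```python
# Hypothetical timings in seconds, keyed by (compiler, optimization level).
# These numbers are placeholders for illustration, not measured results.
timings = {
    ("gcc", "-O1"): 1.9,
    ("gcc", "-O2"): 1.4,
    ("gcc", "-O3"): 1.5,
    ("clang", "-O2"): 1.3,
    ("clang", "-O3"): 1.35,
}


def gold_standard(timings):
    """Return the (config, time) pair with the lowest time across all configs."""
    return min(timings.items(), key=lambda item: item[1])


best_config, best_time = gold_standard(timings)
print(best_config, best_time)
```

Adding another compiler just means adding more entries to the dictionary; the comparison target only ever gets harder to beat.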