> I mean the whole modern paradigm of "complex branch predictors, controlling wide front-ends, feeding schedulers with wide back-ends and hundreds or even thousands of instructions in flight".

Ah, the philosophy of having the CPU execute out of order, you mean.

> A simple in-order core simply can't extract that much parallelism

While true, it is also worth noting that it has no data hazards, because the pipeline simply doesn't exist at all, and thus there is no need for implicit pipeline bubbles or delay slots.
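
To make the hazard point concrete, here's a small hypothetical C sketch (my own example, nothing from the article): a pipelined in-order core has to stall on the load-use dependence, or on older ISAs fill a load delay slot, while an out-of-order core can keep the independent counter update in flight while the load completes.

    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical example: a load followed immediately by a dependent use. */
    long sum_and_count(const long *a, size_t n, long *count_out) {
        long sum = 0, count = 0;
        for (size_t i = 0; i < n; i++) {
            long x = a[i];   /* load from memory */
            sum += x;        /* dependent use right after the load: load-use hazard */
            count += 1;      /* independent work an OoO core can overlap with the load */
        }
        *count_out = count;
        return sum;
    }

    int main(void) {
        long a[] = {1, 2, 3, 4};
        long count = 0;
        long sum = sum_and_count(a, 4, &count);
        printf("sum=%ld count=%ld\n", sum, count);
        return 0;
    }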

> And accurate branch prediction is of limited usefulness when the pipeline is that short.

You can also use a software virtual machine to turn an out-of-order CPU into something that effectively runs in-order code, and you can see how slow that is. That's why JIT VMs such as HotSpot and GraalVM on the JVM, RyuJIT for CoreCLR, and TurboFan for V8 are so much faster: once the bytecode is compiled to native instructions, the branch predictor can finally kick in.
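
Roughly what that looks like (a hypothetical toy bytecode, not how any of those VMs actually dispatch): a switch-based interpreter funnels every opcode through one hot dispatch branch, typically an indirect jump the predictor struggles with, whereas JIT-compiled code has ordinary direct branches it can learn.

    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    /* Toy stack-machine interpreter: every opcode goes through the same
       dispatch switch, which usually compiles to one indirect jump. */
    static void run(const int *code) {
        int stack[64];
        int sp = 0;
        int pc = 0;
        for (;;) {
            switch (code[pc++]) {        /* the single hot dispatch branch */
            case OP_PUSH:  stack[sp++] = code[pc++]; break;
            case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
            case OP_PRINT: printf("%d\n", stack[sp - 1]); break;
            case OP_HALT:  return;
            }
        }
    }

    int main(void) {
        /* 2 + 3, then print */
        const int code[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
        run(code);
        return 0;
    }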

> like the Itanium

> And we all know how badly the Itanium failed.

Itanium is not exactly VLIW. It is an EPIC [1] fail though.

[1]: https://en.wikipedia.org/wiki/Explicitly_parallel_instructio...


