Umm, AKCTSHUALLY it gets more like O(n^2^) in parallel, assuming you’re using a physically achievable memory. There’s just a lot of criss-crossing the entries have to do.
Strassen’s algorithm gets O(n^2.8^) in serial by being terrible, and the weird experimental ones get O(n^2.3^), but the asymptotic benefits of Coppersmith-Winograd and friends only kick in if you’re a Kardashev III civilisation computing a single big product on a Dyson sphere.