If you don’t understand how models use host caching, start there.
Beyond that, you’re asking people to boil everything down to a quick answer, and there isn’t one.
ONNX is the “universal” standard. Make sure you didn’t accidentally convert the input model into something else, but more importantly, make sure that when you run it (including any automatic conversion) the work is actually done on the GPU. ONNX Runtime defaults to the CPU unless a GPU execution provider is installed and requested.
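If it helps, here is a rough sketch of how you could check this from Python, assuming you are running the model through ONNX Runtime (the `onnxruntime` package) and want CUDA. The helper names (`pick_providers`, `uses_gpu`, `open_session`) and the `model.onnx` path are made up for illustration; only `get_available_providers`, `InferenceSession(..., providers=...)` and `session.get_providers()` are ONNX Runtime calls.

```python
def pick_providers(available, preferred=("CUDAExecutionProvider",)):
    """Build a provider list: preferred GPU providers that exist, then CPU as fallback."""
    chosen = [p for p in preferred if p in available]
    chosen.append("CPUExecutionProvider")
    return chosen


def uses_gpu(active):
    """True if the session has at least one non-CPU provider registered."""
    return any(p != "CPUExecutionProvider" for p in active)


def open_session(model_path):
    """Open the model and report which providers the session really got.

    Hypothetical helper; needs `pip install onnxruntime-gpu` for CUDA.
    """
    import onnxruntime as ort  # imported here so the helpers above work without it

    available = ort.get_available_providers()
    session = ort.InferenceSession(model_path, providers=pick_providers(available))
    active = session.get_providers()
    if not uses_gpu(active):
        print("Warning: running on CPU only. Active providers:", active)
    return session


# Usage (assumes a file named model.onnx exists next to the script):
# session = open_session("model.onnx")
```

Treat a passing check as necessary, not sufficient: even with a GPU provider active, individual operators it does not support can still be placed on the CPU, so it is worth watching GPU utilization while the model runs.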