My man, there’s quite literally no depth information in video, and no actual motion either. You can measure how much a given block shifts from frame to frame, and that’s it. You can then try to be clever and guess whether that block is a person, a ball, or a network logo, but that’s as far as it goes.
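To be concrete about what “how much a block changes” means: the classic approach is block matching, where for each block of the current frame you search a small window in the previous frame for the best-matching patch (lowest sum of absolute differences). A rough sketch of that idea, with purely illustrative names and parameters (real TV interpolation pipelines are more elaborate, but this is the core primitive):

```python
def block_motion_vector(prev, curr, by, bx, bs=8, search=4):
    """Exhaustive block-matching motion search (illustrative sketch).

    prev, curr : 2-D lists of luma (grayscale) values, same size.
    (by, bx)   : top-left corner of an bs x bs block in `curr`.
    search     : search radius in pixels around the same position in `prev`.
    Returns (dy, dx): the offset into `prev` with the lowest SAD.
    """
    h, w = len(prev), len(prev[0])
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            # Skip candidates that fall outside the previous frame.
            if y < 0 or x < 0 or y + bs > h or x + bs > w:
                continue
            # Sum of absolute differences between the two blocks.
            sad = sum(abs(curr[by + r][bx + c] - prev[y + r][x + c])
                      for r in range(bs) for c in range(bs))
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv
```

Note that the result is just a 2-D offset that minimizes pixel difference. Nothing here knows about depth, occlusion, or objects, which is exactly the limitation being described.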
This is absolutely a universe away from DLSS (and now presumably FSR) frame generation, and to even suggest they’re the same is such a ridiculous claim that I’m not going to bother anymore.
The mere attempt to compare a feature modern GPUs are only now able to achieve with a simple algorithm running in the media decoder of a little 4-core ARM chip on a TV is laughable.