I would guess you assume the environment changes most of the time because footage where it changes gets more attention than footage where it doesn’t. There are a lot of cams where virtually nothing in the view changes between people passing.
Also, if everyone changes the environment, binary search would give lots of false detections when you don’t know exactly what to expect (as in your example of toppling a trash can): any change at all looks like a hit, not just the one you care about.
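To make the false-detection point concrete, here is a toy sketch, not a real video pipeline. It assumes each frame can be reduced to the set of persistent differences from the starting scene; the event names, frame numbers, and the `bisect_first_true` helper are all made up for illustration.

```python
def bisect_first_true(n, pred):
    """Smallest i in [0, n) with pred(i) True, or n if none.

    Only meaningful if pred is monotone: False for early frames,
    True from some frame onward.
    """
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Hypothetical footage: three passersby each leave a persistent change.
events = {3: "door_ajar", 7: "trash_can_toppled", 12: "litter"}

frames = []
state = frozenset()
for t in range(16):
    if t in events:
        state = state | {events[t]}
    frames.append(state)

# Not knowing what to expect: "does this frame differ from the first one?"
any_change = bisect_first_true(len(frames), lambda i: frames[i] != frames[0])

# Knowing what to expect: "is the trash can toppled in this frame?"
specific = bisect_first_true(
    len(frames), lambda i: "trash_can_toppled" in frames[i]
)

print(any_change, specific)
```

In this toy setup the generic "anything differs" test stops at the first passerby's change, while the specific test finds the trash can. If changes can also be undone (someone closes the door again), the generic test is no longer monotone and bisection can land on an arbitrary frame.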