I feel like Cunningham’s law can be leveraged here. If someone spots a mistake in a computer-generated caption and there is an option to correct it, someone will correct it!
Any corrections could also be used to train (hopefully open-source) speech-to-text software.