Favorites - The reward function for an LLM is about generating a next word that is...

merc , 2 months ago

The reward function for an LLM is about generating a next word that is reasonable. It’s like a road-building robot that’s rewarded for each millimeter of road built, but has no intention to connect cities or anything. It doesn’t understand what cities are. It doesn’t even understand what a road is. It just knows how to incrementally add another millimeter of gravel and asphalt that an outside observer would call a road.

If it happens to connect cities it’s because a lot of the roads it was trained on connect cities. But, if its training data also happens to contain a NASCAR oval, it might end up building a NASCAR oval instead of a road between cities.

Reply

Report

Activity

Open original URL

Copy original URL

Copy Mbin URL

Loading...

h3ndrik 2 months ago