I read the abstract, and the connection to your title is a mystery. Are you using “grock” as in “transcendental understanding” or as Musk’s branded AI?
No c, just grok, originally from Stranger in a Strange Land. But a more technical definition is provided and expanded upon in the paper. Mystery easily dispelled!
We follow the classic experimental paradigm reported in Power et al. (2022) for analyzing “grokking”, a poorly understood phenomenon in which validation accuracy dramatically improves long after the train loss saturates. Unlike the previous templates, this one is more amenable to open-ended empirical analysis (e.g. what conditions grokking occurs) rather than just trying to improve performance metrics.
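To make that definition concrete, here is a rough sketch of how one might measure the delay between train-loss saturation and the jump in validation accuracy from logged curves. The function name `grokking_gap`, the thresholds `loss_tol` and `acc_threshold`, and the toy curves are all assumptions for illustration; none of them come from the paper.

```python
def grokking_gap(train_loss, val_acc, loss_tol=1e-2, acc_threshold=0.99):
    """Return (saturation_step, generalization_step, gap), or None.

    saturation_step: first index where train loss drops to loss_tol or below.
    generalization_step: first index where validation accuracy reaches
    acc_threshold. Both thresholds are illustrative assumptions.
    """
    sat = next((i for i, loss in enumerate(train_loss) if loss <= loss_tol), None)
    gen = next((i for i, acc in enumerate(val_acc) if acc >= acc_threshold), None)
    if sat is None or gen is None:
        return None  # one of the two events never happened in the logs
    return sat, gen, gen - sat


# Toy curves (made up): train loss saturates early, validation accuracy
# only catches up several logging steps later.
train_loss = [1.0, 0.5, 0.005, 0.001, 0.001, 0.001]
val_acc = [0.1, 0.1, 0.1, 0.2, 0.5, 1.0]
result = grokking_gap(train_loss, val_acc)
print(result)
```

A large positive gap is the signature described above; a gap near zero means the model generalized about as soon as it fit the training set.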