Grokking: Learning Is Generalization and Not Memorization





This article is divided into sections that answer the following questions:

What is grokking, and why can it help us learn more about how neural networks learn?
How does this elusive phenomenon originate?

Check the list of references at the end of the article; I also provide some suggestions for deepening the topics.

One of the most repeated claims about neural networks is that once the training loss converges to a low value, the network will not learn much more. Yet a 2021 study observed a strange phenomenon, which the authors called "grokking": the model first reaches a plateau, with low and stable training loss but poor generalization, and then, with further training, becomes capable of perfect generalization.

In some situations we show that neural networks learn through a process of “grokking” a pattern in the data, improving generalization performance from random chance level to perfect generalization, and that this improvement in generalization can happen well past the point of overfitting. (source)

In other words, the study shows that even after the model appears to have overfit, validation accuracy can suddenly jump to near-perfect generalization.
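To make the phenomenon concrete, below is a minimal sketch of a grokking-style experiment in PyTorch. The original study trained small transformers on algorithmic tasks such as modular arithmetic; this sketch substitutes a small MLP, and all hyperparameters (modulus p = 97, 40% training split, strong weight decay, 50,000 full-batch steps) are illustrative assumptions rather than the paper's exact values.

```python
# Minimal sketch of a grokking-style experiment (assumptions: a small MLP
# instead of the paper's transformer; illustrative hyperparameters).
# Task: modular addition, predicting (a + b) mod p from one-hot-encoded a, b.
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 97  # modulus for the modular-addition task

# Build all (a, b) pairs and their labels (a + b) mod p.
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p

# Use only a fraction of the pairs for training, so the model can memorize
# the training set long before it generalizes to the held-out pairs.
perm = torch.randperm(len(pairs))
n_train = int(0.4 * len(pairs))
train_idx, val_idx = perm[:n_train], perm[n_train:]

def encode(idx):
    # One-hot encode both operands and concatenate them into one input vector.
    x = torch.cat([
        nn.functional.one_hot(pairs[idx, 0], p),
        nn.functional.one_hot(pairs[idx, 1], p),
    ], dim=1).float()
    return x, labels[idx]

x_train, y_train = encode(train_idx)
x_val, y_val = encode(val_idx)

model = nn.Sequential(
    nn.Linear(2 * p, 256), nn.ReLU(),
    nn.Linear(256, p),
)
# Weight decay matters here: grokking is much harder to observe without it.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(x, y):
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()

# Train far past the point where training accuracy saturates.
for step in range(1, 50_001):
    opt.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    opt.step()
    if step % 2_000 == 0:
        # Expected pattern: train accuracy hits ~1.0 early (memorization),
        # while val accuracy stays near chance and climbs much later.
        print(f"step {step:6d}  train acc {accuracy(x_train, y_train):.3f}"
              f"  val acc {accuracy(x_val, y_val):.3f}")
```

If run long enough, the printed log typically shows training accuracy saturating early while validation accuracy lingers near chance for many steps before climbing, which is the signature of grokking described above.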
