Training Language Models to Self-Correct via Reinforcement Learning - DeepMind paper
Google DeepMind's SCoRe method uses reinforcement learning to train language models to correct their own answers through iterative revision, significantly improving their accuracy and reliability.
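To make "self-correct through iterative revision" concrete, here is a minimal sketch of a revise-then-score loop. It is an illustration only, not the paper's implementation: the summary above does not describe the training objective, so `self_correct`, `shaped_reward`, the `bonus` weight, the revision prompt wording, and `toy_model` are all hypothetical names and choices made for this example.

```python
from typing import Callable, List


def self_correct(generate: Callable[[str], str], question: str, rounds: int = 1) -> List[str]:
    """Return the first attempt followed by `rounds` revised attempts."""
    attempts = [generate(question)]
    for _ in range(rounds):
        # Hypothetical revision prompt: show the model its previous answer.
        prompt = (
            f"{question}\n"
            f"Previous answer: {attempts[-1]}\n"
            "Review the previous answer and give a corrected final answer."
        )
        attempts.append(generate(prompt))
    return attempts


def shaped_reward(attempts: List[str], is_correct: Callable[[str], bool], bonus: float = 0.5) -> float:
    """Assumed reward: final-attempt correctness plus a bonus for progress.

    The bonus term is positive when a wrong first attempt becomes right and
    negative when a right first attempt becomes wrong.
    """
    first = float(is_correct(attempts[0]))
    last = float(is_correct(attempts[-1]))
    return last + bonus * (last - first)


def toy_model(prompt: str) -> str:
    """Stand-in for a language model: wrong at first, right when revising."""
    return "4" if "Previous answer" in prompt else "5"


attempts = self_correct(toy_model, "What is 2 + 2?")
reward = shaped_reward(attempts, lambda answer: answer == "4")
```

In a reinforcement learning setup, a scalar like `reward` would be the signal used to update the model; scoring progress between attempts, rather than only the final answer, is one plausible way to encourage genuine revision instead of simply repeating the first answer.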