Internal Coherence Maximization (ICM): A Label-Free, Unsupervised Training Framework for LLMs

Post-training of pre-trained language models (LLMs) typically specifies the desired behavior through demonstrations or preference feedback, both of which depend on human supervision. However, as tasks and model behaviors become more complex, this approach faces critical limitations: human supervision becomes unreliable, and LLMs learn to imitate errors in demonstrations or exploit flaws in the feedback systems. The core challenge is that the tasks we want to train LLMs on increasingly exceed humans' ability to demonstrate or evaluate them reliably. Recent research has identified various failure modes, including reward hacking of human-designed supervision signals or of real human raters themselves.
Limitations of human supervision in LLM post-training
Researchers have explored multiple ways to scale beyond human supervision. A standard approach uses high-quality verifiable rewards, such as matching model outputs against ground-truth solutions in mathematical domains. Although there is evidence that pre-trained base models already hold strong latent capabilities for downstream tasks, with post-training adding only minimal improvements on top of them, designing effective elicitation heuristics remains challenging. Contrast-Consistent Search (CCS) is an unsupervised elicitation approach that uses logical consistency to find latent knowledge without supervision. However, CCS underperforms and often fails to identify the intended knowledge, because other salient features of the inputs also satisfy its consistency properties.
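To make the consistency idea concrete, below is a minimal sketch of the CCS objective as described in the original CCS work: a probe maps the hidden states of a "statement is true" / "statement is false" contrast pair to probabilities, and the loss rewards the pair behaving like p and 1 − p while penalizing the trivial 0.5 solution. The tensor names and shapes are illustrative, not taken from this article.

```python
import torch

def ccs_loss(p_pos: torch.Tensor, p_neg: torch.Tensor) -> torch.Tensor:
    """Contrast-Consistent Search style objective (illustrative sketch).

    p_pos / p_neg are the probe's probabilities that a statement is true,
    computed from the hidden states of the "x is true" / "x is false"
    contrast pair for the same statement x.
    """
    # Consistency: the two probabilities should behave like p and 1 - p.
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: discourage the degenerate solution p_pos = p_neg = 0.5.
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()
```

The failure mode mentioned above is that a probe minimizing this loss can latch onto any salient, consistent feature of the text, not necessarily the model's knowledge of truth.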
Introducing internal coherence maximization (ICM)
Researchers from Anthropic, Schmidt Sciences, Independent, Constellation, New York University, and George Washington University have proposed Internal Coherence Maximization (ICM), which fine-tunes pre-trained models on their own generated labels without using any provided labels. ICM addresses the problem by searching for label sets that are both logically consistent and mutually predictable according to the pre-trained model. Since identifying the optimal label set is computationally infeasible, ICM uses a simulated-annealing-inspired search algorithm to approximate the maximum of its objective. Moreover, this approach matches the performance of training on gold labels for truthfulness and GSM8K, and outperforms training on crowdsourced human labels for Alpaca.
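To illustrate the kind of objective ICM searches over, the sketch below scores a candidate label set by combining mutual predictability (how well the model predicts each label when shown all the other labeled examples in context) with a penalty for logical inconsistencies. The weighting `alpha`, the helper callables, and their signatures are assumptions for illustration, not the paper's exact implementation.

```python
from typing import Callable

# Illustrative type: a labeled example is (input_text, label).
LabeledSet = list[tuple[str, str]]

def icm_score(
    labeled: LabeledSet,
    label_logprob: Callable[[str, str, LabeledSet], float],
    count_inconsistencies: Callable[[LabeledSet], int],
    alpha: float = 50.0,  # assumed trade-off weight, not from the article
) -> float:
    """Score a label set by mutual predictability minus inconsistency.

    label_logprob(x, y, context) returns the model's log-probability of
    label y for input x when all *other* labeled examples are in-context.
    count_inconsistencies returns the number of logical contradictions
    (e.g. two claims labeled in mutually impossible ways).
    """
    mutual_predictability = sum(
        label_logprob(x, y, [ex for ex in labeled if ex != (x, y)])
        for x, y in labeled
    )
    return alpha * mutual_predictability - count_inconsistencies(labeled)
```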
How the ICM algorithm works
The ICM algorithm follows an iterative three-step process: (a) the system samples a new unlabeled example from the dataset for potential inclusion, (b) it determines the best label for this example while simultaneously resolving any logical inconsistencies, and (c) it decides whether to accept or reject the newly labeled example based on its scoring function. ICM was evaluated on three datasets: truthfulness assessment, mathematical correctness verification on GSM8K, and helpfulness and harmlessness on Alpaca. The researchers used four baselines in the experiments: zero-shot, zero-shot (chat), golden label, and human label. In addition, two open-weight models, Llama 3.1 8B and 70B, and two proprietary models, Claude 3 Haiku and Claude 3.5 Haiku, were used in the experiments.
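Below is a hedged sketch of the three-step loop just described, using a simulated-annealing acceptance rule: sample an example, pick the label that scores best after repairing inconsistencies, then accept or reject the change. The cooling schedule and the helper functions (`icm_score`, `fix_inconsistencies`) are illustrative assumptions rather than the authors' code.

```python
import math
import random

def icm_search(unlabeled, labels, icm_score, fix_inconsistencies,
               n_steps=1000, t_start=10.0, t_end=0.01):
    """Simulated-annealing-style search over label assignments (sketch).

    unlabeled: list of input examples; labels: candidate label values
    (e.g. ["True", "False"]); icm_score: scores a labeled set as sketched
    earlier; fix_inconsistencies: resolves contradictions in a labeled set.
    """
    labeled: list[tuple[str, str]] = []
    for step in range(n_steps):
        # Geometric cooling schedule (an assumption, not from the article).
        temperature = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))

        # (a) Sample a new example for potential inclusion (or relabeling).
        x = random.choice(unlabeled)
        base = [ex for ex in labeled if ex[0] != x]

        # (b) Choose the label that yields the highest score after repairing
        #     any logical inconsistencies the new assignment introduces.
        candidates = [fix_inconsistencies(base + [(x, y)]) for y in labels]
        best = max(candidates, key=icm_score)

        # (c) Accept or reject the proposal with the annealing rule.
        delta = icm_score(best) - icm_score(labeled)
        if delta >= 0 or random.random() < math.exp(delta / temperature):
            labeled = best
    return labeled
```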
Benchmark performance and model comparison
On the superhuman-capability-inspired task, ICM matches the 80% accuracy of gold supervision, outperforming an estimated 60% human accuracy. Using an ICM-generated reward model, the researchers successfully trained an assistant chatbot without human supervision. The unsupervised reward model reached 75.0% accuracy on RewardBench, compared with 72.2% for a human-supervised counterpart trained on production data. Additionally, both the unsupervised and the human-supervised RM were used to train policies with RL to create helpful, harmless, and honest assistants. The policy trained with the unsupervised RM achieved a 60% win rate. However, these policies still lag behind the publicly released Claude 3.5 Haiku, which achieves a 92% win rate.
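The reward-model results above come from fitting an RM on ICM-produced labels; the sketch below shows the standard Bradley-Terry pairwise loss such a pipeline would typically use on chosen/rejected response pairs. That this exact loss matches the paper's setup is an assumption; it is simply the conventional choice for RLHF-style reward models.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss for a reward model (illustrative).

    r_chosen / r_rejected are scalar rewards the model assigns to the
    response the unsupervised labels mark as better vs. worse.
    """
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```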
Conclusion and future prospects
This article introduces Internal Coherence Maximization (ICM), an advance in unsupervised LM training that fine-tunes pre-trained models on self-generated labels. The method consistently matches gold-supervision performance and exceeds human supervision across GSM8K verification, truthfulness, and Alpaca reward-modeling tasks. However, ICM's limitations include dependence on the salience of the target concept in the pre-trained model and ineffectiveness on long inputs due to context-window constraints. As LLMs advance beyond human evaluation capabilities, ICM offers a promising alternative to traditional RLHF, helping ensure alignment with human intent without the limits of human supervision.
Check out the Paper. All credit for this research goes to the researchers of this project.
Sajjad Ansari is a final year undergraduate student from IIT Kharagpur. As a technology enthusiast, he delves into the practical application of AI, focusing on understanding AI technology and its real-world impact. He aims to express complex AI concepts in a clear and easy way.
