From Exploration Collapse to Predictable Limits: Shanghai AI Laboratory Proposes Entropy-Based Scaling Laws to Improve Reinforcement Learning for LLMs
Recent advances in reasoning-centric large language models (LLMs) have expanded the scope of reinforcement learning (RL) beyond narrow, task-specific applications, enabling broader generalization and reasoning capabilities. However, this shift introduces significant challenges, especially when...