Pull out: Improving robustness on GSM benchmarks by reinforcing abstract reasoning in LLMs
Recent research shows that LLMs, especially smaller ones, often struggle with robust reasoning. They tend to perform well on familiar problems, but they can falter when...