
EG-CFG: Enhancing code generation with real-time execution feedback

LLMs have made impressive progress in generating code for a variety of programming tasks. However, they rely primarily on recognizing patterns in static code examples rather than understanding how code behaves during execution. This often results in programs that look correct but fail at runtime. Although recent methods introduce iterative refinement and self-correction, they typically operate in discrete stages: generate, test, then revise. Unlike human programmers, who constantly run snippets and tweak their code based on real output, these models cannot continuously integrate execution feedback, which limits their effectiveness at producing truly functional code.

The role of program synthesis and prompting in code generation

Program synthesis has long been used to evaluate LLMs, with benchmarks such as MBPP, HumanEval, and CodeContests testing models on a range of coding challenges. While prompting strategies such as few-shot prompting and chain-of-thought improve performance, newer approaches incorporate feedback loops that use tools or execution results to refine the output. Some frameworks even assign tasks to multiple LLM agents, each tackling a different aspect of the problem. Still, most methods rely on simple decoding strategies. Unlike these traditional approaches, guidance techniques such as classifier-free guidance (CFG) offer a more dynamic alternative, but they have not yet been widely combined with real-time execution feedback.
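To make the idea concrete, classifier-free guidance steers decoding by interpolating between two next-token distributions: one conditioned on extra context and one without it. Below is a minimal sketch of the standard CFG interpolation over logits; the function name and the guidance strength gamma are illustrative choices, not taken from the paper.

```python
import torch

def cfg_logits(cond_logits: torch.Tensor,
               uncond_logits: torch.Tensor,
               gamma: float = 1.5) -> torch.Tensor:
    """Classifier-free guidance over next-token logits.

    gamma = 0 reproduces the unconditional distribution,
    gamma = 1 the conditional one, and gamma > 1 amplifies
    the effect of the conditioning context.
    """
    return uncond_logits + gamma * (cond_logits - uncond_logits)
```

In an execution-guided setting, the conditioning context would be the prompt augmented with runtime feedback, so a larger gamma pushes the model harder toward tokens consistent with observed execution behavior.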

Introducing EG-CFG: Tel Aviv University’s execution-guided code generation

Researchers at Tel Aviv University have introduced EG-CFG, a new code generation method that actively incorporates execution feedback during the generation process, much as human programmers do. Instead of waiting until the end, EG-CFG evaluates partial code as it is written, steering the model toward correct, executable output. It uses beam search to generate multiple candidate continuations, runs them, and integrates the runtime results to influence the next generation step. This real-time feedback loop significantly improves performance on standard benchmarks such as MBPP, HumanEval, and CodeContests, even surpassing closed-source models, while also enabling efficient parallel inference and dynamic exploration.
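The shape of that loop can be shown with a deliberately tiny, runnable toy: propose continuations, execute them against tests, and let the results decide what survives. Everything here (the fixed candidate list, the helper names, the single test) is a stand-in for illustration; the actual method guides the token distribution via CFG rather than filtering whole candidates.

```python
import random

# Toy stand-ins: a real system would query an LLM for continuations and
# run code in a sandbox. Both are assumptions for illustration only.
CANDIDATES = ["    return a - b\n", "    return a + b\n"]

def propose_candidates(partial: str, beam_width: int) -> list[str]:
    # Stand-in for beam search over the model's continuations.
    return random.sample(CANDIDATES, k=min(beam_width, len(CANDIDATES)))

def passes_tests(src: str) -> bool:
    # Stand-in for sandboxed execution of the candidate program.
    env: dict = {}
    try:
        exec(src, env)
        return env["add"](2, 3) == 5
    except Exception:
        return False

def generate(prompt: str = "def add(a, b):\n", beam_width: int = 2) -> str:
    # Propose, execute, and keep the first continuation whose runtime
    # behavior satisfies the tests: the feedback loop in miniature.
    for candidate in propose_candidates(prompt, beam_width):
        if passes_tests(prompt + candidate):
            return prompt + candidate
    return prompt  # nothing passed; a real loop would refine and retry

print(generate())
```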

How EG-CFG works: real-time feedback combined with beam search and AST analysis

The EG-CFG method improves code generation by guiding the language model with real-time execution feedback. For a given programming task, it generates partial solutions and explores multiple continuations using beam search. Each continuation is checked for syntactic validity with AST parsing, and only valid candidates are executed on the test cases to collect detailed runtime traces, including variable states and errors. This feedback is then injected into the model’s prompt to inform future predictions. The guidance mechanism interpolates between the model’s standard output and the feedback-conditioned output, helping the model progressively refine its solution until it passes all test cases.
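A minimal sketch of that validate-then-execute step might look like the following; the feedback format and function name are assumptions for illustration, not the paper’s exact implementation.

```python
import ast
import traceback

def execution_feedback(candidate_src: str, test_cases: list[str]) -> str:
    """Reject syntactically invalid code via AST parsing, then run each
    test case and record either success or the runtime error trace."""
    try:
        ast.parse(candidate_src)
    except SyntaxError as e:
        return f"SYNTAX ERROR: {e}"

    report = []
    for test in test_cases:
        env: dict = {}
        try:
            exec(candidate_src, env)  # define the candidate function(s)
            exec(test, env)           # run one assertion-style test
            report.append(f"PASS: {test}")
        except Exception:
            report.append(f"FAIL: {test}\n{traceback.format_exc(limit=1)}")
    return "\n".join(report)

# Example: a factorial with no base case; the trace exposes the bug.
buggy = "def fact(n):\n    return n * fact(n - 1)"
print(execution_feedback(buggy, ["assert fact(3) == 6"]))
```

The returned string is the kind of trace that, per the description above, would be appended to the prompt before the next guided decoding step.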

Benchmark results: EG-CFG outperforms GPT-4 and Claude 2 on HumanEval and MBPP-ET

The EG-CFG method was tested with two versions of the DeepSeek model: a locally run 1.3B-parameter model and the larger V3-0324 model accessed through an API. It was evaluated on five code benchmarks: MBPP, HumanEval, CodeContests, MBPP-ET, and HumanEval-ET. On HumanEval, EG-CFG with DeepSeek V3 correctly solved 90.1% of the tasks, outperforming GPT-4 (85.5%) and Claude 2 (83.2%). On MBPP-ET, it achieved 81.4% accuracy, setting a new benchmark. Notably, the smaller 1.3B model also showed strong gains, with its HumanEval accuracy rising from 46.3% to 61.7% when guided with EG-CFG. An ablation study confirmed the importance of components such as dynamic feedback and beam search in driving these results.

Conclusion: EG-CFG mimics human debugging to improve code generation

In summary, EG-CFG introduces a new way of generating code with language models by incorporating real-time execution feedback during generation. Unlike traditional methods that rely on static patterns, EG-CFG simulates how human programmers test and refine code. It uses beam search to explore possible code completions, tests them against real inputs, and then guides generation based on the results. This happens line by line, ensuring that the feedback is both structured and actionable. The approach also supports multiple agents working in parallel, increasing efficiency. EG-CFG achieves state-of-the-art accuracy on standard benchmarks, showing strong results even on complex coding tasks and with smaller models.


Check out the Paper and GitHub page. All credit for this research goes to the researchers of this project.


Sana Hassan, a consulting intern at Marktechpost and a dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.