
Apple introduces DiffuCoder: a 7B diffusion LLM tailored for code generation

Diffusion LLMs as a paradigm shift for code generation

Large language models (LLMs) have revolutionized natural language processing, delivering impressive results on tasks ranging from conversation to code generation. Masked diffusion models have emerged as an alternative, and scaling them up has produced diffusion-based LLMs such as LLaDA and Dream. These models iteratively refine the entire sequence in parallel, which allows global planning of the content. This paradigm is well suited to code generation, since writing code often involves non-sequential back-and-forth refinement. However, it remains unclear how well open-source diffusion LLMs perform on coding tasks, because existing post-training work either shows marginal gains or relies on semi-autoregressive decoding, which deviates from the global planning nature of diffusion.
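To make the contrast with left-to-right decoding concrete, below is a minimal, hypothetical sketch of how a masked diffusion LLM can decode: all masked positions are predicted at every step, and the most confident predictions are committed, wherever they fall in the sequence. The `model` interface and `MASK_ID` are illustrative assumptions, not DiffuCoder's actual API.

```python
import torch

# Minimal, hypothetical sketch of masked-diffusion decoding (not DiffuCoder's
# actual API): every masked position is predicted at each step, and only the
# most confident predictions are committed, wherever they fall in the sequence.
MASK_ID = 0  # assumed placeholder id for the [MASK] token

@torch.no_grad()
def diffusion_decode(model, prompt_ids, gen_len=128, steps=16):
    # Start with the prompt followed by a fully masked completion.
    x = torch.cat([prompt_ids, torch.full((gen_len,), MASK_ID, dtype=torch.long)])
    per_step = max(1, gen_len // steps)
    for _ in range(steps):
        logits = model(x.unsqueeze(0)).squeeze(0)      # [seq_len, vocab_size]
        conf, pred = logits.softmax(-1).max(-1)        # per-position confidence and argmax
        masked = x == MASK_ID
        if not masked.any():
            break
        conf = conf.masked_fill(~masked, float("-inf"))
        # Commit the most confident masked positions anywhere in the sequence,
        # rather than strictly the leftmost one.
        top = conf.topk(min(per_step, int(masked.sum()))).indices
        x[top] = pred[top]
    return x
```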

The evolution of text diffusion models and their impact on code synthesis

Early text diffusion models include masked diffusion models, and recent scaling work has produced diffusion LLMs such as DiffuLLaMA, LLaDA, and Dream. Block diffusion proposes a hybrid approach that applies diffusion within each block while decoding blocks sequentially. Multimodal models such as LaViDa, MMaDA, and Dimple combine text diffusion with vision models. In code generation, CodeFusion was the first to combine diffusion models with code generation, but only for small-scale models and simple tasks. Recent commercial-scale diffusion LLMs, such as Mercury and Gemini Diffusion, have shown performance comparable to leading autoregressive code models. However, current RL approaches for dLLMs that use GRPO (such as d1 and MMaDA) depend on block diffusion decoding during the rollout and evaluation processes.

Apple and HKU introduce DiffuCoder: a dedicated diffusion model for code

Researchers from Apple and the University of Hong Kong proposed DiffuCoder, a 7B-scale masked diffusion model specialized for code generation and trained on 130B effective tokens. This makes it a valuable testbed for exploring the behavior of diffusion-based LLMs and advancing post-training methods. The researchers introduced local and global autoregressive-ness metrics to measure how closely generation follows a left-to-right pattern. Their analysis shows that diffusion LLMs exhibit an entropy sink effect, which results in a strong causal bias during conditional generation. As the sampling temperature increases from 0.2 to 1.2, DiffuCoder becomes more flexible in its token generation order, freeing itself from strict left-to-right constraints and achieving higher pass@10 accuracy.
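As a rough illustration of what a "local autoregressive-ness" metric can capture, here is a simplified, hypothetical formulation based only on the order in which positions are unmasked during decoding; the paper's exact definition may differ.

```python
# Simplified, hypothetical "local autoregressive-ness" score: given the order
# in which sequence positions were unmasked during decoding, measure how often
# a step fills the position immediately to the right of the previously filled
# one. A score of 1.0 means strictly left-to-right generation; lower scores
# mean more out-of-order (globally planned) generation.
def local_ar_ness(unmask_order):
    """unmask_order: list of sequence positions in the order they were generated."""
    if len(unmask_order) < 2:
        return 1.0
    hits = sum(1 for prev, cur in zip(unmask_order, unmask_order[1:]) if cur == prev + 1)
    return hits / (len(unmask_order) - 1)

print(local_ar_ness([0, 1, 2, 3]))   # 1.0: purely left-to-right
print(local_ar_ness([3, 0, 2, 1]))   # 0.0: fully out of order
```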

Four-stage training pipeline using RefineCode and coupled GRPO

The researchers adapted their model from Qwen-2.5-Coder as the base model and used a 400B-token code pre-training corpus drawn from RefineCode and Stackv2 for continued pre-training. Training consists of four stages: adaptation pre-training, mid-training on annealing code data with 16B tokens, instruction tuning with 436K SFT samples, and post-training with coupled GRPO using 21K hard samples from AceCoder-87K. Stage 1 was stopped early after processing 65B tokens, and stage 2 was trained for 4 epochs for a total of 65B tokens. The evaluation setup uses three code benchmarks (HumanEval, MBPP, and EvalPlus) along with BigCodeBench, which includes full and hard subsets covering completion and instruction-based query types.
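For context, results on HumanEval/MBPP-style benchmarks are typically reported with the standard unbiased pass@k estimator (the pass@10 figure mentioned earlier is one instance). A minimal implementation is sketched below.

```python
from math import comb

# Standard unbiased pass@k estimator (Chen et al., 2021), commonly used with
# HumanEval/MBPP-style benchmarks: given n sampled completions of which c pass
# the unit tests, estimate the probability that at least one of k samples passes.
def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # fewer failures than k samples: some sample must pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 10 samples per problem, 3 of them passing:
print(pass_at_k(10, 3, 1))   # 0.3
print(pass_at_k(10, 3, 10))  # 1.0
```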

Benchmark results: performance and optimization insights from DiffuCoder

DiffuCoder, trained on 130B code tokens, achieves performance comparable to Qwen2.5-Coder and OpenCoder. However, all dLLMs show only marginal improvements over their base models after instruction tuning, whereas Qwen2.5-Coder+SFT gains significantly from instruction tuning on the same data. Furthermore, coupled GRPO training proves highly effective, while baseline variants (e.g., d1-style decoding, full-mask completion sampling, and decoupled sampling) tend to exhibit unstable reward learning. After RL fine-tuning, the optimal sampling temperature during evaluation increases from 0.2 to higher values, suggesting that training sharpens the model's token distributions. This reduces the model's dependence on strict autoregressive decoding and enhances its ability to generate tokens in parallel.

The future of coupled GRPO and diffusion-based code models

In this paper, the researchers present DiffuCoder, a 7B-scale open-source diffusion model for code with strong performance, along with its complete training recipe and a detailed analysis of dLLMs for code generation. They further introduce coupled GRPO, an RL algorithm that respects the non-autoregressive nature of dLLMs by using a coupled sampling technique for more accurate probability estimates. Coupled GRPO improves DiffuCoder's performance and demonstrates the effectiveness of RL methods aligned with diffusion principles. This work gives the community deeper insight into dLLMs and provides a solid foundation for future research on their application to complex reasoning and generation tasks.
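To illustrate the coupled-sampling idea in rough terms: if per-token log-probabilities of a completion are estimated by masking a random subset of its tokens and scoring the masked positions, pairing each random mask with its complement ensures every token is scored exactly once across the pair. The sketch below is an interpretation of that description with a hypothetical `score_masked` helper, not the paper's implementation.

```python
import torch

# Rough sketch of the coupled-sampling idea, assuming per-token log-probs of a
# completion are estimated by masking a random subset of its tokens and scoring
# the model's predictions at those positions. Pairing a random mask with its
# complement means every token is scored exactly once across the pair, which
# should lower the variance of the likelihood estimate compared to a single
# random mask. `score_masked` is a hypothetical helper, not a real API.
def coupled_logprob_estimate(score_masked, completion_ids):
    seq_len = completion_ids.shape[0]
    mask_a = torch.rand(seq_len) < 0.5     # random mask over completion tokens
    mask_b = ~mask_a                       # its complement
    logp = torch.zeros(seq_len)
    logp[mask_a] = score_masked(completion_ids, mask_a)  # log-probs at masked positions
    logp[mask_b] = score_masked(completion_ids, mask_b)
    return logp.sum()                      # estimated log p(completion | prompt)
```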


Check out the paper and code. All credit for this research goes to the researchers of this project.



Sajjad Ansari is a final year undergraduate student from IIT Kharagpur. As a technology enthusiast, he delves into the practical applications of AI, focusing on understanding AI technology and its real-world impact. He aims to explain complex AI concepts in a clear and accessible way.