
Quantization Space Utilization Rate (QSUR): A Novel Post-Training Quantization Method Designed to Enhance the Efficiency of Large Language Models (LLMs)


Post-training quantization (PTQ) focuses on reducing the size and improving the speed of large language models (LLMs) to make them more practical for real-world use. Such models require large amounts of data, but strongly skewed and highly heterogeneous data distributions arise during quantization. These distributions inevitably widen the quantization range, making the representation of most values less precise and reducing model accuracy. Although PTQ methods aim to address these problems, distributing data effectively across the entire quantization space remains a challenge, limiting the potential for optimization and hindering deployment in resource-constrained environments.
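
To make this failure mode concrete, here is a small, self-contained Python sketch (not from the paper) showing how a single outlier stretches the scale of a symmetric uniform quantizer and inflates the error on the well-behaved bulk of values. The bit-width and data are illustrative assumptions.

```python
# A minimal sketch illustrating how one outlier inflates the quantization
# range and degrades accuracy for the typical values in a tensor.
import numpy as np

def quantize_dequantize(x: np.ndarray, bits: int = 4) -> np.ndarray:
    """Symmetric uniform quantization: scale is set by the max magnitude."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 for signed 4-bit
    scale = np.abs(x).max() / qmax      # a single outlier dominates this scale
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 1024)              # well-behaved bulk of values
x_outlier = np.append(x, 100.0)         # one extreme activation outlier

err_clean = np.abs(quantize_dequantize(x) - x).mean()
err_outlier = np.abs(quantize_dequantize(x_outlier)[:-1] - x).mean()
print(f"mean error without outlier: {err_clean:.4f}")
print(f"mean error with outlier:    {err_outlier:.4f}")  # far larger
```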

Current post-training quantization (PTQ) methods for large language models (LLMs) fall into weight-only and weight-activation quantization. Weight-only methods such as GPTQ, AWQ, and OWQ try to reduce memory usage by minimizing quantization error or handling activation outliers, but they cannot fully optimize precision across all values. Techniques such as QuIP and QuIP# use random matrices and vector quantization, yet remain limited when handling extreme data distributions. Weight-activation quantization aims to accelerate inference by quantizing both weights and activations, but methods such as SmoothQuant, ZeroQuant, and QuaRot struggle to manage the dominance of activation outliers, introducing errors for the majority of values. Overall, these methods rely on heuristics and fail to optimize the data distribution across the whole quantization space, which limits performance and efficiency.
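
As an illustration of the heuristic flavor of these methods, the sketch below implements the per-channel scaling idea popularized by SmoothQuant: outlier magnitude is migrated from activations into the weights by an equivalent rescaling. The `alpha` migration-strength heuristic and all tensor shapes are assumptions for the demo, not details from the QSUR paper.

```python
# A hedged sketch of SmoothQuant-style per-channel scaling: divide activations
# by per-channel scales and fold the same scales into the weights, leaving
# the layer's output mathematically unchanged.
import numpy as np

def smooth_scales(X: np.ndarray, W: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Per-input-channel scales s_j = max|X_j|^alpha / max|W_j|^(1-alpha)."""
    act_max = np.abs(X).max(axis=0)     # per-channel activation range
    w_max = np.abs(W).max(axis=1)       # per-channel weight range
    return act_max ** alpha / w_max ** (1 - alpha)

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (64, 8)); X[:, 3] *= 50.0   # channel 3 carries outliers
W = rng.normal(0, 1, (8, 16))

s = smooth_scales(X, W)
X_s, W_s = X / s, W * s[:, None]        # (X / s) @ (s * W) == X @ W
assert np.allclose(X_s @ W_s, X @ W)    # equivalence preserved
print("activation ranges before:", np.abs(X).max(axis=0).round(1))
print("activation ranges after: ", np.abs(X_s).max(axis=0).round(1))
```

The design choice this exposes is exactly the one the paper criticizes: the scales are set by a hand-tuned heuristic rather than by optimizing how well the resulting distribution fills the quantization space.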

To address these limitations of post-training quantization (PTQ) methods, and the lack of a metric for evaluating quantization efficiency, researchers from Nanjing University, Houmo AI, and Southeast University proposed the concept of the Quantization Space Utilization Rate (QSUR). QSUR measures how effectively weight and activation distributions utilize the quantization space, providing a quantitative basis for evaluating and improving PTQ methods. The formula uses statistical properties, such as eigenvalue decomposition and confidence ellipsoids, to compute the hypervolume of weight and activation distributions. QSUR analysis shows how linear and rotation transformations affect quantization efficiency, with reduced inter-channel variance and minimized outliers improving performance.
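
The exact QSUR formula is given in the paper; the following is a simplified, hypothetical rendering of the idea in Python: estimate the data's confidence ellipsoid from the eigenvalues of its covariance and compare its hypervolume with that of the symmetric hypercube a per-tensor quantizer must span. The confidence level and the cube construction here are illustrative choices, not the paper's exact definition.

```python
# A simplified QSUR-like metric: ratio of the confidence-ellipsoid volume
# (from covariance eigenvalues) to the volume of the axis-aligned hypercube
# the quantizer must cover. Higher values mean less wasted quantization range.
import numpy as np
from math import pi, gamma
from scipy.stats import chi2

def qsur_like(X: np.ndarray, confidence: float = 0.99) -> float:
    d = X.shape[1]
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))
    r2 = chi2.ppf(confidence, df=d)               # ellipsoid scaling factor
    semi_axes = np.sqrt(r2 * eigvals)
    v_ellipsoid = (pi ** (d / 2) / gamma(d / 2 + 1)) * semi_axes.prod()
    side = 2 * np.abs(Xc).max()                   # per-tensor symmetric range
    return v_ellipsoid / side ** d

rng = np.random.default_rng(0)
well_behaved = rng.normal(0, 1, (4096, 4))
skewed = well_behaved.copy(); skewed[:, 0] *= 20  # one dominant channel
print(f"QSUR-like, isotropic: {qsur_like(well_behaved):.4f}")
print(f"QSUR-like, skewed:    {qsur_like(skewed):.6f}")   # far lower
```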

Building on QSUR, the researchers proposed OSTQuant, a framework that combines orthogonal and scaling transformations to optimize the weight and activation distributions of large language models. The method integrates learnable equivalent transformation pairs, each consisting of a diagonal scaling matrix and an orthogonal matrix, preserving functional equivalence under quantization while maintaining computational efficiency. It reduces overfitting without altering the original network's output at inference time. OSTQuant applies inter-block learning to propagate transformations globally across LLM blocks, using techniques such as Weight Outlier Minimization Initialization (WOMI) for effective initialization. The method achieves higher QSUR, reduces runtime overhead, and improves quantization performance in LLMs.
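
A minimal sketch of the equivalence-preserving idea follows, under assumptions: an orthogonal matrix and a positive diagonal scaling are folded into one weight matrix, and their inverses into the next, so the network's function is unchanged while the intermediate distribution is reshaped. Here `Q` is random and `s` arbitrary; in OSTQuant both are learned, with WOMI providing the initialization, and the pairs are placed throughout each transformer block.

```python
# A hedged sketch of an equivalent orthogonal-plus-scaling transformation pair:
# fold (Q, diag(s)) into one weight and its inverse into the next weight.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 8, 16, 8
x = rng.normal(0, 1, (4, d_in))
W1 = rng.normal(0, 1, (d_in, d_hidden))
W2 = rng.normal(0, 1, (d_hidden, d_out))

Q, _ = np.linalg.qr(rng.normal(0, 1, (d_hidden, d_hidden)))  # orthogonal
s = rng.uniform(0.5, 2.0, d_hidden)                          # positive scales

# W1 @ Q @ diag(s) @ diag(1/s) @ Q.T @ W2 == W1 @ W2, so the output is
# mathematically unchanged; only the intermediate activations are reshaped.
W1_t = W1 @ Q @ np.diag(s)
W2_t = np.diag(1.0 / s) @ Q.T @ W2

assert np.allclose(x @ W1_t @ W2_t, x @ W1 @ W2)
print("outputs match; only the intermediate distribution changed")
```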

For evaluation, the researchers applied OSTQuant to the LLaMA family (LLaMA-1, LLaMA-2, and LLaMA-3), assessing performance with perplexity on WikiText2 and nine zero-shot tasks. Compared with methods such as SmoothQuant, GPTQ, QuaRot, and SpinQuant, OSTQuant consistently outperformed them, retaining at least 99.5% of floating-point accuracy in the 4-16-16 setting and substantially narrowing performance gaps. LLaMA-3-8B incurred only a 0.29-point drop on zero-shot tasks, compared with losses of more than 1.55 points for the others. In harder scenarios, OSTQuant outperformed SpinQuant, gaining as much as 6.53 points on LLaMA-2 7B in the 4-4-16 setting. The KL-Top loss function provided a better fit to semantics with reduced noise, improving performance and shrinking the gap in the W4A4KV4 setting by 32%. These results indicate that OSTQuant handles outliers more effectively and yields more balanced distributions.
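
The paper's exact KL-Top formulation is not reproduced here; the sketch below shows one plausible reading of the idea: a KL divergence restricted to the k tokens the full-precision model ranks highest, renormalized over that subset, so the noisy long tail of the vocabulary contributes nothing. The choice of k and the renormalization are assumptions for illustration.

```python
# A hedged numpy sketch of a KL-Top-style loss: KL divergence computed only
# over the teacher's top-k tokens, reducing noise from near-zero probabilities.
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_top_loss(fp_logits: np.ndarray, q_logits: np.ndarray, k: int = 100) -> float:
    p, q = softmax(fp_logits), softmax(q_logits)
    top = np.argsort(p, axis=-1)[..., -k:]            # teacher's top-k tokens
    p_k = np.take_along_axis(p, top, axis=-1)
    q_k = np.take_along_axis(q, top, axis=-1)
    p_k /= p_k.sum(axis=-1, keepdims=True)            # renormalize over top-k
    q_k /= q_k.sum(axis=-1, keepdims=True)
    return float((p_k * np.log(p_k / q_k)).sum(axis=-1).mean())

rng = np.random.default_rng(0)
fp = rng.normal(0, 1, (2, 32000))                     # (batch, vocab) logits
quantized = fp + rng.normal(0, 0.05, fp.shape)        # quantized-model logits
print(f"KL-Top loss: {kl_top_loss(fp, quantized, k=100):.6f}")
```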

In conclusion, the proposed method optimizes the data distribution within the quantization space, guided by the QSUR metric and the KL-Top loss function, thereby improving the performance of large language models. Compared with existing quantization techniques, it uses less calibration data while reducing noise and preserving semantic richness, achieving strong results across multiple benchmarks. The framework can serve as a basis for future work, helping to refine quantization techniques and make models more efficient for applications that must run under resource constraints.


Check out the Paper. All credit for this research goes to the researchers of this project. Also, don't forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don't forget to join our 70k+ ML SubReddit.


