# Dynamic Scoring Code Token Tree: A Novel Decoding Strategy for Generating High-Performance Code

**Authors:** Muzi Qu, Jie Liu, Liangyi Kang, Shuai Wang, Dan Ye, and Tao Huang

**Abstract:**

In scientific computing, large-scale data processing, and AI-powered computation, performance disparities arising from different code implementations directly affect the practicality of the code. Although existing work has tried to exploit code knowledge to improve the execution performance of code generated by large language models, it neglects code evaluation outcomes, which directly reflect code execution details, resulting in inefficient computation. To address this issue, we propose DSCT-Decode, an adaptive decoding strategy for large language models that employs a data structure named the Code Token Tree (CTT) to guide token selection based on code evaluation outcomes. DSCT-Decode assesses generated code along three dimensions (correctness, performance, and similarity) and uses a dynamic penalty-based boundary intersection method to compute multi-objective scores, which are then used to adjust the scores of CTT nodes during backpropagation. By balancing exploration (via token selection probabilities) against exploitation (via multi-objective scoring), DSCT-Decode effectively navigates the code space to quickly identify high-performance solutions. To validate the framework, we developed a new benchmark, big-DS-1000, an extension of DS-1000 and the first benchmark designed specifically to evaluate code generation methods on execution performance. Comparative evaluations against leading large language models such as CodeLlama and GPT-4 show that our framework achieves an average performance improvement of nearly 30%; furthermore, 30% of the generated programs improve by more than 20%, underscoring the framework's effectiveness and potential for practical applications.
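The abstract describes three interlocking mechanisms: a tree over token choices, a penalty-based boundary intersection (PBI) scalarization of the three evaluation dimensions, and backpropagation of the resulting score through the tree. The paper's actual implementation is not shown here; the following is a minimal, hypothetical Python sketch of how such a structure could look. The class `CTTNode`, the constants `THETA` and `C_EXPLORE`, and the PUCT-style selection rule are illustrative assumptions, not the authors' code; the PBI formula is the standard one from multi-objective decomposition (g = d1 + θ·d2 over minimization losses).

```python
import math

THETA = 5.0      # PBI penalty factor (hypothetical value, not from the paper)
C_EXPLORE = 1.0  # exploration weight in the selection rule (hypothetical)

def pbi_score(objectives, ideal, weights, theta=THETA):
    """Penalty-based boundary intersection (PBI) scalarization.

    objectives, ideal, weights: equal-length lists. Objectives are framed as
    minimization losses, e.g. (1 - correctness, normalized runtime, 1 - similarity).
    Returns a scalar to minimize: d1 (progress along the weight direction)
    plus theta * d2 (penalty for deviating from that direction).
    """
    diff = [o - z for o, z in zip(objectives, ideal)]
    w_norm = math.sqrt(sum(w * w for w in weights))
    d1 = abs(sum(d * w for d, w in zip(diff, weights))) / w_norm
    d2 = math.sqrt(sum((d - d1 * w / w_norm) ** 2 for d, w in zip(diff, weights)))
    return d1 + theta * d2

class CTTNode:
    """One node of a Code Token Tree: a token plus aggregated statistics."""

    def __init__(self, token=None, prior=1.0, parent=None):
        self.token = token      # token this node represents
        self.prior = prior      # LLM probability of the token (drives exploration)
        self.parent = parent
        self.children = {}      # token -> CTTNode
        self.visits = 0
        self.value = 0.0        # running mean of backpropagated rewards

    def expand(self, token_probs):
        """Add one child per candidate next token, keyed by its LLM probability."""
        for tok, p in token_probs.items():
            if tok not in self.children:
                self.children[tok] = CTTNode(tok, p, self)

    def select_child(self):
        """Balance exploitation (value) against exploration (prior), PUCT-style."""
        total = sum(c.visits for c in self.children.values()) + 1
        def ucb(c):
            return c.value + C_EXPLORE * c.prior * math.sqrt(total) / (1 + c.visits)
        return max(self.children.values(), key=ucb)

    def backpropagate(self, reward):
        """Fold a reward into every node on the path from this leaf to the root."""
        node = self
        while node is not None:
            node.visits += 1
            node.value += (reward - node.value) / node.visits
            node = node.parent

# Toy usage (made-up token probabilities and evaluation results, not a real LLM):
root = CTTNode()
root.expand({"def": 0.6, "import": 0.4})
leaf = root.select_child()
losses = [0.1, 0.4, 0.3]  # (1 - correctness, normalized runtime, 1 - similarity)
leaf.backpropagate(-pbi_score(losses, ideal=[0.0, 0.0, 0.0], weights=[1/3, 1/3, 1/3]))
```

Negating the PBI value turns the minimization score into a reward, so tokens on paths that produced correct, fast code accumulate higher values and are preferred on later passes, which matches the exploration/exploitation balance the abstract describes.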

**Link:** Read Paper

**Labels:** code generation, program synthesis, code model, code model training, source code model