An Adaptive Parallel Layer-Skipping Framework for Large Language Model Inference Speedup With Speculative Decoding

ZHE WEN, LIANG XU, MEIQI WANG

Integrated Circuits and Systems ›› 2025, Vol. 2 ›› Issue (2) : 58-66. DOI: 10.23919/ICS.2025.3575371
Co-Optimization for Large Language Models: Advances in Algorithm and Hardware



Cite this article

ZHE WEN, LIANG XU, MEIQI WANG. An Adaptive Parallel Layer-Skipping Framework for Large Language Model Inference Speedup With Speculative Decoding[J]. Integrated Circuits and Systems, 2025, 2(2): 58-66. https://doi.org/10.23919/ICS.2025.3575371


