Special Issue on Selected Papers from ICTA2023

A High-Speed Hardware Architecture of an NTT Accelerator for CRYSTALS-Kyber

  • JUNYAN SUN ,
  • XUEFEI BAI
Expand
  • School of Microelectronics, University of Science and Technology of China, Hefei 230026, China

JUNYAN SUN (Student Member, IEEE) received the B.S. degree in electronics science and technology in 2021 from the University of Science and Technology of China, Hefei, China, where he is currently working toward the M.S. degree. His research interests include hardware/hardware- assisted security and IC design.

XUEFEI BAI received the B.S. degree in geochemistry, and the M.S. and Ph.D. degrees in electronic science and technology from the University of Science and Technology of China (USTC), Hefei, China, in 1998, 2001, and 2008, respectively. He is currently a Lecturer with the School of Micro-electronics, USTC. His research interests include digital VLSI design and computer arithmetic.

Received date: 2024-02-27

  Revised date: 2024-05-06

  Accepted date: 2024-05-28

  Online published: 2024-11-27

Supported by

National Key R&D Program of China under Grant(2019YFB2204800)

Abstract

CRYSTALS-Kyber has emerged as a notable lattice-based post-quantum cryptography (PQC) scheme. As one of the four finalists in NIST’s PQC standardization round three, CRYSTALS-Kyber is the only encryption algorithm demonstrating superior performance compared to other algorithms. The number theoretic transform (NTT) is employed to optimize polynomial multiplication, which constitutes the most complex operation within CRYSTALS-Kyber. This study introduces a high-speed NTT accelerator architecture, featuring a novel butterfly unit and an efficient modular polynomial multiplier. The proposed accelerator utilizes a radix-4-based configurable NTT design, which is capable of executing both forward and inverse NTT operations on a unified architecture. When implemented on the Xilinx Virtex-7 FPGA platform, the proposed architecture achieves an acceleration of 1.02-2.30 times in terms of latency, a throughput improvement of 1.02-2.30 times, and an area throughput improvement of up to 3.30 times, relative to the prior works.

Cite this article

JUNYAN SUN , XUEFEI BAI . A High-Speed Hardware Architecture of an NTT Accelerator for CRYSTALS-Kyber[J]. Integrated Circuits and Systems, 2024 , 1(2) : 92 -102 . DOI: 10.23919/ICS.2024.3419562

[1]
P. W. Shor, “Algorithms for quantum computation: Discrete logarithms and factoring,” in Proc. IEEE 35th Annu. Symp. Found. Comput. Sci., Santa Fe, NM, USA, Nov. 1994, pp. 124- 134.

[2]
R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital signatures and public-key cryptosystems,” Commun. ACM, vol. 21, no. 2, pp. 120- 126, Feb. 1978.

[3]
N. Koblitz, “Elliptic curve cryptosystems,” Math. Comp., vol. 48, no. 177, pp. 203- 209, Jan. 1987.

[4]
D. Moody, “Post quantum cryptography standardization: Announcement and outline of NIST’s call for submissions,” Feb. 2016. [Online]. Available:

[5]
G. Alagic et al. , “Status report on the second round of the NIST post-quantum cryptography standardization process,” Jul. 2020. [Online]. Available:

[6]
O. Regev, “On lattices, learning with errors, random linear codes, and cryptography,” in Proc. 37th Annu. ACM Symp. Theory Comput., Baltimore, MD, USA, May 2005, pp. 84- 93.

[7]
V. Lyubashevsky, C. Peikert, and O. Regev, “On ideal lattices and learning with errors over rings,” J. ACM, vol. 60, no. 6, Nov. 2013, Art. no. 43.

[8]
A. Langlois and D. Stehlé “Worst-case to average-case reductions for module lattices,” Des. Codes Cryptogr., vol. 75, no. 3, pp. 565- 599, Jun. 2015.

[9]
M. Andrzejczak, F. Farahmand, and K. Gaj, “Full hardware implementation of the post-quantum public-key cryptography scheme Round5,” in Proc. IEEE Int. Conf. ReConFigurable Comput., Cancun, Mexico , Dec. 2019, pp. 1- 2.

[10]
S. Kim, W. Jung, J. Park, and J. H. Ahn, “Accelerating number theoretic transformations for bootstrappable homomorphic encryption on GPUs,” in Proc. IEEE Int. Symp. Workload Characterization, Beijing, China , Oct. 2020, pp. 264- 275.

[11]
J. Xu, Y. Wang, J. Liu, and X. Wang, “A general-purpose number theoretic transform algorithm for compact RLWE cryptoprocessors,” in Proc. IEEE 14th Int. Conf. Anti-Counterfeiting Secur. Identification, Xiamen, China, Oct./Nov. 2020, pp. 159- 163.

[12]
A. C. Mert, E. Öztürk, and E. Savas¸, “FPGA implementation of a run-time configurable NTT-based polynomial multiplication hardware,” Microprocess. Microsyst., vol. 78, Oct. 2020, Art. no. 103219.

[13]
K. Derya, A. C. Mert, E. Öztürk, and E. Savas¸, “CoHA-NTT: A config- urable hardware accelerator for NTT-based polynomial multiplication,” Microprocess. Microsyst., vol. 89, Mar. 2022, Art. no. 104451.

[14]
W. Tan, B. M. Case, G. Hu, S. Gao, and Y. Lao, “An ultra-highly parallel polynomial multiplier for the bootstrapping algorithm in a fully homomorphic encryption scheme,” J. Sign. Process. Syst., vol. 93, no. 6, pp. 643- 656, Jun. 2021.

[15]
M. Bisheh-Niasar, R. Azarderakhsh, and M. Mozaffari-Kermani, “Instruction-set accelerated implementation of CRYSTALS-Kyber,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 68, no. 11, pp. 4648- 4659, Nov. 2021.

[16]
A. C. Mert, E. Karabulut, E. Öztürk, E. Savas¸, and A. Aysu, “An extensive study of flexible design methods for the number theoretic transform,” IEEE Trans. Comput., vol. 71, no. 11, pp. 2829- 2843, Nov. 2022.

[17]
J. Sun, X. Bai, and Y. Kang, “An FPGA-based efficient NTT accelerator for post-quantum cryptography CRYSTALS-Kyber,” in Proc. IEEE Int. Conf. Integr. Circuits Technol. Appl., Hefei, China , Oct. 2023, pp. 142- 143.

[18]
M. Bisheh-Niasar, R. Azarderakhsh, and M. Mozaffari-Kermani, “High-speed NTT-based polynomial multiplication accelerator for post- quantum cryptography,” in Proc. IEEE 28th Symp. Comput. Arithmetic, Lyngby, Denmark, Jun. 2021, pp. 94- 101.

[19]
Y. Itabashi, R. Ueno, and N. Homma, “Efficient modular polynomial multiplier for NTT accelerator of CRYSTALS-Kyber,” in Proc. 25th Euromicro Conf. Digit. Syst. Des., Maspalomas, Spain, Sep. 2022, pp. 528- 533.

[20]
P. Barrett, “Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor,” in Proc. Adv. Cryptol., Santa Barbara, CA, USA, Aug. 1986, pp. 311- 323.

[21]
P. L. Montgomery, “Modular multiplication without trial division,” Math. Comp., vol. 44, no. 170, pp. 519- 521, Apr. 1985.

[22]
P. Longa and M. Naehrig, “Speeding up the number theoretic transform for faster ideal lattice-based cryptography,” in Proc. 15th Int. Conf. Cryptol. Netw. Secur., Milan, Italy, Nov. 2016, pp. 124- 139.

[23]
D. T. Nguyen, V. B. Dang, and K. Gaj, “High-level synthesis in implementing and benchmarking number theoretic transform in lattice based post-quantum cryptography using software/hardware codesign,” in Proc. 16th Int. Symp. Appl. Reconfigurable Comput. Archit. Tools Appl., Toledo, Spain, Apr. 2020, pp. 247- 257.

[24]
Y. Xing and S. Li, “A compact hardware implementation of CCA- secure key exchange mechanism CRYSTALS-Kyber on FPGA,” IACR Trans. Cryptogr. Hardw. Embed. Syst., vol. 2021, no. 2, pp. 328- 356, Feb. 2021.

[25]
H. Nguyen and L. Tran, “Design of polynomial NTT and INTT accelerator for post-quantum cryptography CRYSTALS-Kyber,” Arab. J. Sci. Eng., vol. 48, no. 2, pp. 1527- 1536, Feb. 2023.

[26]
W. Guo and S. Li, “Highly-efficient hardware architecture for CRYSTALS-Kyber with a novel conflict-free memory access pattern,” IEEE Trans. Circuits Syst. I., Reg. Papers, vol. 70, no. 11, pp. 4505- 4515, Nov. 2023.

[27]
Z. Ye, R. C. C. Cheung, and K. Huang, “PipeNTT: A pipelined number theoretic transform architecture,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 69, no. 10, pp. 4068- 4072, Oct. 2022.

[28]
F. Yaman, A. C. Mert, E. Öztürk, and E. Savas¸, “A hardware accelerator for polynomial multiplication operation of CRYSTALS-Kyber PQC scheme,” in Proc. IEEE Des. Automat. Test Europe Conf. Exhib., Grenoble, France , Feb. 2021, pp. 1020- 1025.

[29]
W. Guo and S. Li, “Split-radix based compact hardware architecture for CRYSTALS-Kyber,” IEEE Trans. Comput., vol. 73, no. 1, pp. 97- 108, Jan. 2024.

[30]
C. Zhang et al. , “Towards efficient hardware implementation of NTT for Kyber on FPGAs,” in Proc. IEEE Int. Symp. Circuits Syst., Daegu, Korea , 2021, pp. 1- 5.

[31]
Z. Ni, A. Khalid, W. Liu, and M. O’Neill, “Towards a lightweight CRYSTALS-Kyber in FPGAs: An ultra-lightweight BRAM-free NTT core,” in Proc. IEEE Int. Symp. Circuits Syst., Monterey, CA, USA , May 2023, pp. 1- 5.

[32]
Xilinx, Inc., “Series FPGAs Data Sheet:Overview,” Sep. 2020. [Online]. Available:

Outlines

/