# Gadget decomposition for key switch A method[^2] for optimizing classic [[CKKS Key switching]] by decomposing the key switch polynomials on top of decomposition of the random part of the ciphertext. The primes used in this scheme are $q_0,...,q_{\ell-1}$ and special primes $q_\ell,...,q_{\tilde\ell-1}$ . They are the basis for the RNS modulo $\tilde Q$. define $\tilde D_j:=\prod_{k\in \tilde I_j}q_k$ where $\tilde I_j$ are partition of primes $q_0,...,q_{\tilde\ell-1}$ . note that here the grouping of primes is different that standard key switch and therefore the use of tilde in the notation. Let $U=(u_i)_{i=0}^{\tilde d-1}$ be switch key gadget polynomials and $a$ the random part polynomial of ciphertext to be key switched. compute $b:=([a]_{\tilde D_j})_{j=0}^{\tilde d-1}$ , $v_i:=([u_i]_{\tilde D_j})_{j=0}^{\tilde d-1}$. Its gadget vector is $\tilde g=(\tilde g_j)_{j=0}^{\tilde d-1}$ , $\tilde g\in R_{\tilde Q}^{\tilde d}$ such that $\tilde g_j=1\bmod \tilde D_j$ and $\tilde g_j=1\bmod \tilde D_{j^\prime}$ for $j\ne j^\prime$ i.e., $\langle v_i,g_i\rangle = u_i$. There are $d$ ![[Pasted image 20260216172219.png|300]] The inner product of decomposed polynomials are bounded as the gadget decomposition map is bounded by fixed $B^\prime$. Therefore, inner product can be done in the ring $R_{B^\prime}\ll R_{\tilde Q}$ . Here is the main trick of this work when computing in a smaller ring reduce the number of NTT transformation. $\tilde c_{i,j}:=\langle b,v_{i,j}\rangle$ for $i=0,1$ and $0\le j<\tilde d$ ,again- multiplication is done in NTT's of $R_{B^\prime}$. We have, $ \tilde c_i=\sum_{0\le j <\tilde d}\langle b,v_{i,j}\rangle\cdot\tilde g_j =\sum_{0\le j <\tilde d}\tilde c_{i,j}\cdot \tilde g_j \bmod \tilde Q $ observe that $\tilde c_i=\tilde c_{i,j}\bmod \tilde D_j$ fot all $0\le j <\tilde d$. Therefore, we have $\tilde c_i=\tilde c_{i,j}\bmod q_k$ for all $k\in\tilde I_j$ and the RNS representation of $\tilde c_i$ modulo $\tilde Q$ is $\big([\tilde c_{i,j}]_{q_k}\big)_{0\le j<\tilde d,k\in\tilde I_j}$ Note: no need to compute the multiplication with $\tilde g_j$ and sum instead the last step of computing the modulo $q_k$ are the RNS representation of the result. Note: consider using this method to reduce the numbers of eval keys in our lib (one per batch of qi) by storing only eval keys for Q and use the gadget decomp to efficiently compute in most of the key switching in small qi's. When it comes to GPU implementation this work, the paper [^1] criticized it :"However, its improvement comes at the cost of increased computation overhead on EBConv and IP operations." and they address this issue. consider reading this work. ## Created 2026-02-10 16:01 [^1]: Athena: Accelerating KeySwitch and Bootstrapping for Fully Homomorphic Encryption on CUDA GPU [^2]: [Accelerating HE Operations from Key Decomposition Technique](https://eprint.iacr.org/2023/413.pdf)