# CKKS Key switching
## Motivation (First try)
**First idea,** define $\text{swk}_{s_{in}\rightarrow s_{out}}:=(a^* s_{out}+s_{in}+e^*, a^*) \bmod Q$. This is an encryption of the plaintext $s_{in}$ w.r.t secret key $s_{out}$ modulo Q.
Let $ct$ be a [[R-LWE ciphertext (RLWE)]] $ct = (ct_0,ct_1)=(a\cdot s_{in}+m+e,a) \bmod Q$ a ciphertext of the plaintext $m$ w.r.t secret key $s_{in}$
The main idea is to subtract the term $a\cdot s_{in}$ from $a\cdot s_{in}+m+e$ by computing
$ct_0 \\ -a\cdot \text{swk}_0$ :
$\begin{array}{l}
a\cdot s_{in}+m+e \\
-a( a^*s_{out}+s_{in}+e^*) \\
\hline \rule{4cm}{0.4pt} \\
-aa^*s_{out} + m+e-ae^*
\end{array}$
defining a new ciphertext $(ct_0-a\cdot swk_0,-aa^*)\bmod Q$ which "encrypts" m w.r.t secret key $s_{out}$. note that as $a$ and $a^*$ are uniformly random than so is their multiplication.
#### But this will NOT work!
There are two problems:
- The switching key is not valid as the noise $e^*$ and the plaintext $s$ have both small norm and so $e*$ will "hide" the plaintext $s$.
- The output $ct^\prime$ is NOT a ciphertext of $m$ w.r.t secret key $s$. The problem here is that the error term $a e^*$ does not have a small norm and so $m +e^\prime$ may wrap around the modulo $Q$ i.e., this ciphertext is invalid and may not be decryptable.
**Second idea**, choose $P$ s.t. $|P|\approx |Q|$ and divide the noise $ae^*$ by $P$ but not divide the plaintext $m$
define $\text{swk}_{s_{in}\rightarrow s_{out}}:=(a^* s_{out}+P\cdot s_{in}+e^*, a^*) \bmod PQ$ . This corrects the first issue as now the message is $P\cdot s_{in}$
We raise $a \mod Q$ to $a \bmod QP$ (the coefficients of the latter have the same magnitude as the former - raising the modulo does not change the polynomial), compute $a\cdot \text{swk}_{s_{in}\rightarrow s_{out}}$ , divide by $P^{-1}$ and drop the modulo back to $Q$: $P^{-1}\cdot a\cdot \text{swk}_0=P^{-1}aa^*s_{out}+as_{in}+P^{-1}ae^* \bmod Q$
note that the rightmost term has the magnitude of $e^*$ as $|a|<Q$ and $|P|\approx|Q|$. This solves the second issue.
define $ct_{s_{in}\rightarrow s_{out}}:=ct_0 - \lfloor P^{-1}\cdot a\cdot \text{swk}_0\rceil\bmod Q$ :
The bx part of $ct_{s_{in}\rightarrow s_{out}}$ is:
$\begin{array}{l}
a\cdot s_{in}+m+e \bmod Q\\
- \lfloor a\cdot P^{-1} ( a^*s_{out}+P\cdot s_{in}+e^*)\rceil\bmod Q\\
\hline \rule{6cm}{0.4pt} \\
-P^{-1}aa^*s_{out} + m+ \underbrace{e-P^{-1}ae^* +e_{\text{round}}}_{\text{small error}}\bmod Q
\end{array}$
The "a" (random) part of $ct_{s_{in}\rightarrow s_{out}}$ is $-P^{-1}aa^*$
This ciphertext encrypts the original message $m$ w.r.t secret key $s_{out}$
> Notes:
> 1) division by $P$ of an element in ring $PQ$ result in an element with the same value divided by P (with possible rounding) in ring $Q$ e.g., $x=[x]_{15}+15y$ s.t. $[x]_{15}<15$ so $x/5=[x]_{15}/5+3y$ s.t. $[x]_{15}/5<3$ which means that $[x/5]_{3}= [x]_{15}/5$ .
> 2) Computing the modulo reduction to $Q_\ell$ requires converting the polynomial to coefficient representation, reduce the modulo of the coefficients and back to NTT
> 3) This idea caused a new problem - the encryption of switch key is computed modulo $PQ$ and $|P|\approx|Q|$ and the modulo size is limited for security constraints. This means we have half the levels for computing homomorphic operations.
**Third idea**. The problem with the 2nd idea is the half of the levels are taken by special primes and as security limits the number of levels then this reduce the number of effective levels for homomorphic computation. To solve this, the new idea is to reduce the size of $P$ by reducing the norm of $a$ (the uniform random element of a ciphertext $ct$). smaller $a$ means we need smaller $P$ that divides it and make its norm small when reducing the modulo to Q from modulo QP in the computation above.
1. Define switching keys for all $i$ by a set of ciphertexts. This is computed during scheme initialization. $\text{swk}^{(i)}_{s_{in}\rightarrow s_{out}}:=(a^*_i\cdot s_{out}+ \hat{q_i}\cdot P\cdot s_{in}+e^*_i, a^*_i) \bmod PQ$
2. Decompose $a$ by $\big([a\cdot\hat{q_0}^{-1}]_{q_0},\ldots, [a\cdot\hat{q_L}^{-1}]_{q_L}\big)$
Note by the CRT: $a=\sum_{j=0}^L [a]_{q_j}\cdot[\hat{q}_j^{-1}]_{q_j}\cdot\hat{q}_j=\sum_{j=0}^L [a\hat{q}_j^{-1}]_{q_j}\hat{q}_j \bmod Q = \langle ([a\hat{q}_j^{-1}]_{q_j})_j , (\hat{q}_j)_j\rangle\bmod Q$ where $\hat{q_j}:=\prod_{i\ne j}{q_i}$ and all components have small norm i.e., $\big|[a\cdot\hat{q_i}^{-1}]_{q_i}\big|< q_i$ for $i=0,\ldots,L$
3. Set $P$ to be a prime of size 64 bits i.e., $P>q_i$ for all $i$
4. mod-up of the decomposition $i$ of $a$: $[a\cdot\hat{q_i}^{-1}]_{q_i}$ to be modulo $P\cdot Q$
5. multiply the i-th switch key and the $i$-th decomposition of $a$, we have
$\begin{align}
\big[[a\cdot\hat{q_i}^{-1}]_{q_i}\big]_{PQ}\cdot \text{swk}^{(i)}_{s_{in}\rightarrow s_{out}} &=
[a\cdot\hat{q_i}^{-1}]_{q_i}\cdot \text{swk}^{(i)}_{s_{in}\rightarrow s_{out}} \bmod PQ \\
&= \big([a\cdot\hat{q_i}^{-1}]_{q_i}\cdot a^*_i\cdot s_{out} + [a\cdot \hat{q_i}^{-1}]_{q_i}\cdot \hat{q_i}\cdot
P\cdot s_{in}+ e^*_i[a\cdot\hat{q_i}^{-1}]_{q_i} , [a\cdot\hat{q_i}^{-1}]_{q_i} a^*_i\big)\bmod PQ
\end{align}$
7. mod_down to modulo $P$:
$\bigg(P^{-1}[a\cdot\hat{q_i}^{-1}]_{q_i}\cdot a^*_i\cdot s_{out} + [a\cdot \hat{q_i}^{-1}]_{q_i}\cdot \hat{q_i}\cdot
s_{in}+ P^{-1}\cdot e^*_i[a\cdot\hat{q_i}^{-1}]_{q_i} \ ,P^{-1}\ [a\cdot\hat{q_i}^{-1}]_{q_i} a^*_i\bigg)\bmod Q$
6. sum all results: $e_i^\dagger:=P^{-1}\cdot e^*_i[a\cdot\hat{q_i}^{-1}]_{q_i}$, $a_i^\dagger:=P^{-1}[a\cdot\hat{q_i}^{-1}]_{q_i}\cdot a^*_i$
> $t = \sum_{i=0}^L \big([a\cdot\hat{q_i}^{-1}]_{q_i}\cdot \text{swk}^{(i)}_{s_{in}\rightarrow s_{out}}\big)= \big(\sum_{i=0}^L a^\dagger_i\cdot s_{out}+ a\cdot s_{in}+ \sum_{i=0}^L e_i^\dagger,\sum_{i=0}^L a^\dagger_i\big)\bmod Q$
7. subtract from original ciphertext:
$ct_0- t=as_{in}+m+e-t=a^{\dagger\dagger}s_{out}+m+e+e^{\dagger\dagger}$
where $a^{\dagger\dagger}:= \sum_{i=0}^L a^{\dagger}_i$ and $e^{\dagger\dagger}:=\sum_{i=0}^L e_i^\dagger$
**Fourth idea**. The problem with the 3rd idea that you need more memory to store the extra switch keys and more computation is being done for each key switch. To balance this, the new idea is to use batches of $q_i