# PQC KV Cache Encryption

![PQC Native](https://img.shields.io/badge/PQC-Native-blue)
![ML-KEM-768](https://img.shields.io/badge/ML--KEM--768-FIPS%20203-green)
![AES-256-GCM](https://img.shields.io/badge/AES--256--GCM-NIST%20SP%20800--38D-teal)
![License](https://img.shields.io/badge/License-Apache%202.0-orange)
![Version](https://img.shields.io/badge/version-0.1.0-lightgrey)

**Per-tenant, quantum-safe encryption for the LLM KV cache.** Multi-tenant inference servers store gigabytes of KV cache in shared host/device RAM. A side-channel or a compromised co-tenant can lift another user's private conversation state directly out of that cache. This library wraps every KV cache entry in a fresh **AES-256-GCM** envelope whose key is derived per session via **ML-KEM-768**, enforces strict tenant isolation at the cryptographic boundary, rotates keys on a configurable policy, and ships with an append-only audit log for every encrypt / decrypt / rotate / isolation-violation event.

## The Problem

Long-context LLM inference keeps the key and value projections of past tokens in the **KV cache**, a per-layer, per-position tensor store that can grow to several gigabytes. On a multi-tenant inference server (vLLM, TGI, or any production stack sharing a GPU across requests), that cache sits in plaintext process memory:

- **Side-channel reads.** A malicious co-tenant with timing or page-table-based primitives can read another tenant's cache pages.
- **Cross-request leakage.** A bug in cache eviction or session routing can hand one tenant's intermediate state to another.
- **Harvest-now-decrypt-later.** Even with host-level encryption enabled, data protected by classical key exchange (ECDH) and recorded today can be decrypted once a cryptographically relevant quantum computer (CRQC) exists.
- **Regulated workloads.** Healthcare, finance, and legal inference pipelines carry multi-year retention requirements on conversation state; classical confidentiality alone no longer clears the audit bar.

## The Solution

- **ML-KEM-768** derives a fresh 32-byte symmetric key per `TenantSession`. In production the tenant presents a KEM public key and the inference server runs Encapsulate; here we delegate to [`quantumshield`](https://github.com/dyber-pqc/quantumshield).
- **AES-256-GCM** encrypts every `KVCacheEntry` under its own nonce. The AAD binds `EntryMetadata` + `sequence_number` + `key_len`, so tampering with layer, position, or sequence surfaces as a `DecryptionError`.
- **`TenantIsolationManager`** holds one session per tenant and refuses cross-tenant decrypts even when asked explicitly; a misrouted ciphertext raises `TenantIsolationError` before AES touches the bytes.
- **`KeyRotationPolicy`** rotates the per-session key after N entries or T seconds, resetting the sequence counter.
- **`KVAuditLog`** is append-only and records `encrypt`, `decrypt`, `rotate`, and `isolation-violation` events.
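
The bullets above name what the AAD binds but not its byte layout. The following is a minimal sketch, assuming a canonical-JSON AAD and a `K || V` plaintext that `key_len` splits back apart; `build_aad`, `pack_plaintext`, `unpack_plaintext`, and the local `EntryMetadata` stand-in are hypothetical and may not match the library's actual encoding.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EntryMetadata:
    """Local stand-in mirroring the documented EntryMetadata fields (kv_role omitted)."""

    tenant_id: str
    session_id: str
    layer_idx: int
    position: int
    token_id: int


def build_aad(meta: EntryMetadata, sequence_number: int, key_len: int) -> bytes:
    """Hypothetical AAD layout: canonical JSON (sorted keys, no whitespace)."""
    payload = {
        "metadata": asdict(meta),
        "sequence_number": sequence_number,
        "key_len": key_len,
    }
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def pack_plaintext(key_bytes: bytes, value_bytes: bytes) -> tuple[bytes, int]:
    """Concatenate K || V and return key_len so the decryptor can split them."""
    return key_bytes + value_bytes, len(key_bytes)


def unpack_plaintext(plaintext: bytes, key_len: int) -> tuple[bytes, bytes]:
    return plaintext[:key_len], plaintext[key_len:]


meta = EntryMetadata("tenant-alice", "sess-1", layer_idx=0, position=12, token_id=2048)
aad = build_aad(meta, sequence_number=7, key_len=64)

# Relabeling the entry as layer 1 changes the AAD, so a GCM tag computed over
# the original AAD would fail to verify.
moved = EntryMetadata("tenant-alice", "sess-1", layer_idx=1, position=12, token_id=2048)
assert build_aad(moved, sequence_number=7, key_len=64) != aad
```

Canonical encoding matters here: encryptor and decryptor must produce byte-identical AAD, so key order and whitespace are fixed rather than left to the serializer's defaults.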

## Installation

```bash
pip install pqc-kv-cache-encryption
```

Development:

```bash
pip install -e ".[dev]"
```

## Quick Start

```python
import os

from pqc_kv_cache import (
    CacheDecryptor,
    CacheEncryptor,
    EntryMetadata,
    KVCacheEntry,
    TenantIdentity,
    establish_tenant_session,
)

# 1. Establish a per-tenant session (AES-256-GCM key derived via ML-KEM-768).
tenant = TenantIdentity(tenant_id="tenant-alice", display_name="Alice Corp")
session = establish_tenant_session(tenant)

# 2. Wrap a KV cache entry in an authenticated (AES-256-GCM) envelope.
meta = EntryMetadata(
    tenant_id=tenant.tenant_id,
    session_id=session.session_id,
    layer_idx=0,
    position=12,
    token_id=2048,
)
entry = KVCacheEntry(
    metadata=meta,
    key_tensor_bytes=os.urandom(64),    # raw bytes of the K vector
    value_tensor_bytes=os.urandom(64),  # raw bytes of the V vector
)
enc = CacheEncryptor(session).encrypt_entry(entry)

# 3. Decrypt with the same session. The decryptor checks the tenant ID and
#    nonce replay; AES-GCM verifies the tag over ciphertext + AAD.
decrypted = CacheDecryptor(session).decrypt_entry(enc)
assert decrypted.key_tensor_bytes == entry.key_tensor_bytes
assert decrypted.value_tensor_bytes == entry.value_tensor_bytes
```

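Multi-tenant with strict isolation (continues the Quick Start and reuses its imports):

```python
from pqc_kv_cache import TenantIsolationError, TenantIsolationManager

mgr = TenantIsolationManager()
mgr.create_session(TenantIdentity(tenant_id="tenant-alice"))
mgr.create_session(TenantIdentity(tenant_id="tenant-bob"))

# Build an entry bound to Alice's session.
alice_meta = EntryMetadata(
    tenant_id="tenant-alice",
    session_id=mgr.get_session("tenant-alice").session_id,
    layer_idx=0, position=0, token_id=1,
)
alice_entry = KVCacheEntry(
    metadata=alice_meta,
    key_tensor_bytes=os.urandom(64),
    value_tensor_bytes=os.urandom(64),
)
alice_enc = mgr.encrypt("tenant-alice", alice_entry)

# Bob can never decrypt Alice's entry, even through his own valid session.
try:
    mgr.decrypt("tenant-bob", alice_enc)
except TenantIsolationError:
    print("blocked at the isolation boundary")
```
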
## Architecture

```
+-----------------------------+               +-----------------------------+
| Tenant Alice                |               | Tenant Bob                  |
| (client)                    |               | (client)                    |
+--------------+--------------+               +--------------+--------------+
               |                                             |
               |     ML-KEM-768 handshake (per session)      |
               v                                             v
+---------------------------------------------------------------------------+
|                      Inference Server (multi-tenant)                      |
|                                                                           |
|  TenantIsolationManager                                                   |
|  +------------------------+              +------------------------+       |
|  | TenantSession (alice)  |              | TenantSession (bob)    |       |
|  |  symmetric_key (32B)   |              |  symmetric_key (32B)   |       |
|  |  next_sequence         |              |  next_sequence         |       |
|  |  entries_encrypted     |              |  entries_encrypted     |       |
|  +----------+-------------+              +----------+-------------+       |
|             |                                       |                     |
|             v                                       v                     |
|  CacheEncryptor / CacheDecryptor         CacheEncryptor / CacheDecryptor  |
|  AES-256-GCM + AAD                       AES-256-GCM + AAD                |
|  + tenant-id enforcement                 + tenant-id enforcement          |
|             |                                       |                     |
|             v                                       v                     |
|  +---------------------+                 +---------------------+          |
|  | EncryptedEntry      |                 | EncryptedEntry      |          |
|  | (alice ciphertext)  |                 | (bob ciphertext)    |          |
|  +----------+----------+                 +----------+----------+          |
|             |                                       |                     |
|             +-------------------+-------------------+                     |
|                                 v                                         |
|                   +---------------------------+                           |
|                   | KV cache in GPU/host RAM  |                           |
|                   +---------------------------+                           |
|                   (only ciphertext lives here)                            |
|                                                                           |
|  KeyRotationPolicy -- rotates session keys on entry count / age           |
|  KVAuditLog        -- encrypt / decrypt / rotate / isolation-violation    |
+---------------------------------------------------------------------------+
```

## Cryptography

| Primitive | Purpose | Algorithm |
| --- | --- | --- |
| Per-session key | Fresh 32-byte symmetric key per tenant session | ML-KEM-768 |
| Per-entry encryption | Confidentiality + integrity of K/V tensor bytes | AES-256-GCM |
| AAD binding | `EntryMetadata` + `sequence_number` + `key_len` bound into the tag | AES-GCM tag |
| Session-key derivation | SHA3-256 over KEM keypair bytes (reference path; production uses the Encapsulate/Decapsulate shared secret) | SHA3-256 |

Signing and KEM keys are delegated to [`quantumshield`](https://github.com/dyber-pqc/quantumshield), which prefers real `liboqs` ML-KEM / ML-DSA when available and falls back to a transitional backend otherwise.
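
The table lists a reference derivation, SHA3-256 over the KEM keypair bytes, which the standard library can reproduce. A minimal sketch, assuming the public and secret key bytes are hashed in that order; random bytes stand in for a real ML-KEM-768 keypair (sizes per FIPS 203), and `derive_session_key` is a hypothetical name, not a `quantumshield` call.

```python
import hashlib
import os

# FIPS 203 key sizes for ML-KEM-768, in bytes.
ML_KEM_768_EK_LEN = 1184  # encapsulation (public) key
ML_KEM_768_DK_LEN = 2400  # decapsulation (secret) key


def derive_session_key(public_key: bytes, secret_key: bytes) -> bytes:
    """SHA3-256(public_key || secret_key) -> 32-byte AES-256 key.

    The concatenation order is an assumption. A production server would feed
    the 32-byte ML-KEM shared secret into its key schedule instead.
    """
    return hashlib.sha3_256(public_key + secret_key).digest()


# Random bytes standing in for a real ML-KEM-768 keypair.
pk = os.urandom(ML_KEM_768_EK_LEN)
sk = os.urandom(ML_KEM_768_DK_LEN)
session_key = derive_session_key(pk, sk)
```

This derivation is a local stand-in: it only yields a key the server already holds, whereas Encapsulate/Decapsulate lets the tenant and the server arrive at the same shared secret.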

## Threat Model

| Adversary capability | Coverage |
| --- | --- |
| Read KV cache pages belonging to another tenant | All entries are AES-256-GCM encrypted; the attacker sees only ciphertext. |
| Replay a previously captured `EncryptedEntry` | `CacheDecryptor` tracks seen nonces and raises `NonceReplayError`. |
| Tamper with `EntryMetadata` (`layer_idx`, `position`, `tenant_id`) | AAD binding makes the AES-GCM tag check fail, raising `DecryptionError`. |
| Submit another tenant's ciphertext through a valid session | `TenantIsolationError` is raised before AES touches the bytes. |
| Long-lived session key exposure | `KeyRotationPolicy` rotates on entry count / age; the sequence counter resets. |
| Session outlives its TTL | `SessionExpiredError` on every encrypt/decrypt after `expires_at`. |
| Harvest-now-decrypt-later on the KEM handshake | ML-KEM-768 is designed for IND-CCA2 security against quantum as well as classical adversaries. |
| Orphaned tenant state after disconnect | `close_session()` removes the session and its key from the manager. |
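
Several rows in the table depend on check ordering: expiry, tenant mismatch, and replay are all rejected before AES-GCM runs. A minimal sketch of that ordering, assuming a hypothetical `DecryptGuard` and local exception classes that reuse the documented names; the AES-GCM open step is passed in as a callback so the sketch needs only the standard library.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


class KVCacheError(Exception): ...
class SessionExpiredError(KVCacheError): ...
class TenantIsolationError(KVCacheError): ...
class NonceReplayError(KVCacheError): ...


@dataclass
class DecryptGuard:
    """Hypothetical pre-decrypt checks for a single tenant session."""

    tenant_id: str
    expires_at: float
    seen_nonces: set = field(default_factory=set)

    def decrypt(self, entry_tenant_id: str, nonce_hex: str,
                aes_gcm_open: Callable[[], bytes]) -> bytes:
        if time.time() >= self.expires_at:
            raise SessionExpiredError("session expired")
        if entry_tenant_id != self.tenant_id:
            # Rejected before any AES work touches the ciphertext.
            raise TenantIsolationError(f"{entry_tenant_id!r} != {self.tenant_id!r}")
        if nonce_hex in self.seen_nonces:
            raise NonceReplayError(f"nonce {nonce_hex} already seen")
        plaintext = aes_gcm_open()       # tag + AAD verification would happen here
        self.seen_nonces.add(nonce_hex)  # record only after a successful open
        return plaintext
```

Recording the nonce only after the tag verifies is a design choice of this sketch: a forged ciphertext cannot poison the replay set with a nonce that an honest entry still needs. Whether `CacheDecryptor` does the same is not stated here.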

## Performance Considerations

This library is written in pure Python and is intended as the **cryptographic envelope** for multi-tenant LLM inference, not as a hot-path encryption kernel. Production deployments would carry the same patterns into:

- A CUDA / ROCm kernel that operates on the K/V tensors in device memory.
- Hardware-backed encryption engines (e.g. NVIDIA H100 confidential computing, AMD SEV-SNP).
- A batched nonce / sequence allocator that amortizes session bookkeeping across a batch of requests.

The envelope formats (`EncryptedEntry`, AAD shape, `TenantSession` state machine) are deliberately portable so that a native kernel and the Python reference implementation can produce interoperable ciphertexts.
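
The batched nonce / sequence allocator in the list above can be illustrated with a lock that reserves a contiguous range of sequence numbers per batch rather than one per entry. `BatchSequenceAllocator` is a hypothetical helper, not part of this library; it shows only the amortization idea.

```python
import threading


class BatchSequenceAllocator:
    """Hands out contiguous ranges of sequence numbers under a single lock."""

    def __init__(self, start: int = 0) -> None:
        self._next = start
        self._lock = threading.Lock()

    def reserve(self, count: int) -> range:
        if count <= 0:
            raise ValueError("count must be positive")
        with self._lock:  # one acquisition per batch instead of one per entry
            first = self._next
            self._next += count
        return range(first, first + count)


allocator = BatchSequenceAllocator()
batch_a = allocator.reserve(4)  # sequence numbers 0..3 for a batch of 4 entries
batch_b = allocator.reserve(2)  # 4..5
```

Because each range is disjoint, workers can then assign sequence numbers within their batch without touching shared state again.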

## API Reference

### `TenantIdentity`
`tenant_id: str`, `display_name: str = ""`. Frozen dataclass identifying a tenant.

### `establish_tenant_session(tenant, algorithm=KEMAlgorithm.ML_KEM_768, ttl_seconds=900) -> TenantSession`
Derive a fresh 32-byte symmetric key for `tenant` via ML-KEM-768 and return a `TenantSession`.

### `TenantSession`
Holds `symmetric_key`, `next_sequence`, `entries_encrypted`, `created_at`, `expires_at`. Methods: `is_valid()`, `check_valid()`, `consume_sequence()`, `rotate_key(new_key)`, `to_public_dict()`.
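
The method names suggest a small state machine: a TTL window, a monotonically increasing sequence counter, and in-place key replacement. A hedged sketch of how those methods could behave, using a hypothetical `SessionSketch`; only the field and method names come from this README, the bodies are assumptions.

```python
import os
import time
from dataclasses import dataclass, field


class SessionExpiredError(Exception):
    """Local stand-in for pqc_kv_cache.SessionExpiredError."""


@dataclass
class SessionSketch:
    symmetric_key: bytes
    ttl_seconds: float = 900  # same default as establish_tenant_session
    next_sequence: int = 0
    entries_encrypted: int = 0
    created_at: float = field(default_factory=time.time)

    @property
    def expires_at(self) -> float:
        return self.created_at + self.ttl_seconds

    def is_valid(self) -> bool:
        return time.time() < self.expires_at

    def check_valid(self) -> None:
        if not self.is_valid():
            raise SessionExpiredError(f"session expired at {self.expires_at:.0f}")

    def consume_sequence(self) -> int:
        """Return the sequence number for the next encrypt and advance the counter."""
        self.check_valid()
        seq = self.next_sequence
        self.next_sequence += 1
        self.entries_encrypted += 1  # assumption: counted here, not in the encryptor
        return seq

    def rotate_key(self, new_key: bytes) -> None:
        if len(new_key) != 32:
            raise ValueError("AES-256 keys are 32 bytes")
        self.symmetric_key = new_key
        self.next_sequence = 0  # the README states the counter resets on rotation


session = SessionSketch(symmetric_key=os.urandom(32))
first, second = session.consume_sequence(), session.consume_sequence()
session.rotate_key(os.urandom(32))
```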

### `KVCacheEntry` / `EncryptedEntry` / `EntryMetadata`
`KVCacheEntry` holds `metadata`, `key_tensor_bytes`, `value_tensor_bytes`. `EncryptedEntry` holds `metadata`, `nonce` (hex), `ciphertext` (hex), `key_len`, `sequence_number`. `EntryMetadata` is frozen and carries `tenant_id`, `session_id`, `layer_idx`, `position`, `token_id`, `kv_role`.
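
Because `nonce` and `ciphertext` are hex strings, an `EncryptedEntry` serializes naturally to JSON for storage or transport. A minimal sketch with a hypothetical `EncryptedEntrySketch` that omits `metadata`; the 12-byte nonce and the 16-byte tag appended to the ciphertext are assumptions based on common AES-GCM conventions, not documented behavior.

```python
import json
import os
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EncryptedEntrySketch:
    """Hypothetical mirror of EncryptedEntry's scalar fields (metadata omitted)."""

    nonce: str       # hex
    ciphertext: str  # hex; assumed to carry the 16-byte GCM tag at the end
    key_len: int
    sequence_number: int

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "EncryptedEntrySketch":
        data = json.loads(raw)
        bytes.fromhex(data["nonce"])       # raises ValueError on malformed hex
        bytes.fromhex(data["ciphertext"])
        return cls(**data)


nonce = os.urandom(12)                 # 96-bit nonce (assumption)
ciphertext = os.urandom(64 + 64 + 16)  # placeholder for K + V ciphertext + tag
env = EncryptedEntrySketch(nonce.hex(), ciphertext.hex(), key_len=64, sequence_number=0)
restored = EncryptedEntrySketch.from_json(env.to_json())
```

Hex encoding doubles the stored size of every nonce and ciphertext, which adds up when a cache holds millions of entries.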

### `CacheEncryptor(session)` / `CacheDecryptor(session)`
`encrypt_entry(KVCacheEntry) -> EncryptedEntry` and `decrypt_entry(EncryptedEntry) -> KVCacheEntry`. Both enforce a tenant-id match. The decryptor also tracks seen nonces for replay protection.

### `KeyRotationPolicy(max_entries=100_000, max_age_seconds=300)`
`should_rotate(session) -> (bool, RotationTrigger | None)` and `rotate(session) -> bytes` (new 32-byte key). `RotationTrigger` is `ENTRY_COUNT`, `TIME_ELAPSED`, or `MANUAL`.
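
A sketch of the documented `should_rotate` / `rotate` contract, assuming a minimal hypothetical `FakeSession` that tracks the key's creation time and a per-key entry count (the README does not say whether age is measured from session creation or from the last rotation). `KeyRotationSketch` is not the library's class, and `os.urandom` stands in for the ML-KEM re-derivation a real rotation would perform.

```python
import os
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Tuple


class RotationTrigger(Enum):
    ENTRY_COUNT = "entry_count"
    TIME_ELAPSED = "time_elapsed"
    MANUAL = "manual"  # not produced by should_rotate() in this sketch


@dataclass
class FakeSession:
    """Hypothetical stand-in with only the fields the policy reads and writes."""

    symmetric_key: bytes = field(default_factory=lambda: os.urandom(32))
    entries_encrypted: int = 0  # assumption: counted per key, reset on rotation
    next_sequence: int = 0
    key_created_at: float = field(default_factory=time.monotonic)


@dataclass
class KeyRotationSketch:
    max_entries: int = 100_000
    max_age_seconds: float = 300

    def should_rotate(self, s: FakeSession) -> Tuple[bool, Optional[RotationTrigger]]:
        if s.entries_encrypted >= self.max_entries:
            return True, RotationTrigger.ENTRY_COUNT
        if time.monotonic() - s.key_created_at >= self.max_age_seconds:
            return True, RotationTrigger.TIME_ELAPSED
        return False, None

    def rotate(self, s: FakeSession) -> bytes:
        new_key = os.urandom(32)  # stand-in for an ML-KEM-derived key
        s.symmetric_key = new_key
        s.entries_encrypted = 0
        s.next_sequence = 0
        s.key_created_at = time.monotonic()
        return new_key


# Same setting as examples/key_rotation.py: rotate after 5 entries.
policy = KeyRotationSketch(max_entries=5)
session = FakeSession()
for _ in range(5):
    session.entries_encrypted += 1
    session.next_sequence += 1
rotate_now, trigger = policy.should_rotate(session)
if rotate_now:
    policy.rotate(session)
```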

### `TenantIsolationManager`
`create_session(tenant)`, `get_session(tenant_id)`, `encrypt(tenant_id, entry)`, `decrypt(tenant_id, enc)`, `close_session(tenant_id)`, `list_active_tenants()`.
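
At its core the manager is a registry keyed by `tenant_id`. A minimal sketch of the session lifecycle, using a hypothetical `RegistrySketch` with plain-dict sessions and assuming that `get_session` is where `UnknownTenantError` is raised (the README lists the error but not its source).

```python
import os
from typing import Dict, List


class UnknownTenantError(Exception):
    """Local stand-in for pqc_kv_cache.UnknownTenantError."""


class RegistrySketch:
    def __init__(self) -> None:
        self._sessions: Dict[str, dict] = {}

    def create_session(self, tenant_id: str) -> dict:
        # Assumption: creating a session replaces any existing one for the tenant.
        session = {"tenant_id": tenant_id, "symmetric_key": os.urandom(32)}
        self._sessions[tenant_id] = session
        return session

    def get_session(self, tenant_id: str) -> dict:
        try:
            return self._sessions[tenant_id]
        except KeyError:
            raise UnknownTenantError(tenant_id) from None

    def close_session(self, tenant_id: str) -> None:
        # pop() drops the registry's reference; CPython offers no guarantee
        # that the immutable key bytes are wiped from memory.
        self._sessions.pop(tenant_id, None)

    def list_active_tenants(self) -> List[str]:
        return sorted(self._sessions)


registry = RegistrySketch()
registry.create_session("tenant-alice")
registry.create_session("tenant-bob")
registry.close_session("tenant-alice")
```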

### `KVAuditLog` / `KVAuditEntry`
`log_encrypt(...)`, `log_decrypt(...)`, `log_rotate(...)`, `log_isolation_violation(...)`, `entries(limit, tenant_id, operation)`, `export_json()`.
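
One way to get the append-only property and the documented `entries(limit, tenant_id, operation)` filter in pure Python: records are frozen, the only mutator appends, and queries return tuples. `AuditLogSketch` and its single `append` method are hypothetical (the real log exposes `log_encrypt(...)` and friends instead), and the `limit` semantics here (most recent N) are an assumption.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Tuple

OPERATIONS = ("encrypt", "decrypt", "rotate", "isolation-violation")


@dataclass(frozen=True)
class AuditRecord:
    operation: str
    tenant_id: str
    detail: str = ""
    timestamp: float = field(default_factory=time.time)


class AuditLogSketch:
    def __init__(self) -> None:
        self._records: List[AuditRecord] = []

    def append(self, operation: str, tenant_id: str, detail: str = "") -> AuditRecord:
        if operation not in OPERATIONS:
            raise ValueError(f"unknown operation {operation!r}")
        record = AuditRecord(operation, tenant_id, detail)
        self._records.append(record)  # the only write path: no update or delete
        return record

    def entries(self, limit: Optional[int] = None, tenant_id: Optional[str] = None,
                operation: Optional[str] = None) -> Tuple[AuditRecord, ...]:
        matches = [
            r for r in self._records
            if (tenant_id is None or r.tenant_id == tenant_id)
            and (operation is None or r.operation == operation)
        ]
        if limit is not None:
            matches = matches[-limit:] if limit > 0 else []  # most recent `limit`
        return tuple(matches)  # immutable view of frozen records

    def export_json(self) -> str:
        return json.dumps([asdict(r) for r in self._records])


log = AuditLogSketch()
log.append("encrypt", "tenant-alice")
log.append("isolation-violation", "tenant-bob", "tried to decrypt tenant-alice entry")
```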

### Errors
All subclass `KVCacheError`: `TenantIsolationError`, `SessionExpiredError`, `DecryptionError`, `NonceReplayError`, `KeyRotationRequiredError`, `UnknownTenantError`.
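
Because every error subclasses `KVCacheError`, callers can catch the base class at a request boundary and still branch on specific failures. The sketch re-declares the documented hierarchy locally so it runs without the library; `handle` and its mapping to actions are hypothetical.

```python
class KVCacheError(Exception):
    """Base class: catching it covers every error below."""


class TenantIsolationError(KVCacheError): ...
class SessionExpiredError(KVCacheError): ...
class DecryptionError(KVCacheError): ...
class NonceReplayError(KVCacheError): ...
class KeyRotationRequiredError(KVCacheError): ...
class UnknownTenantError(KVCacheError): ...


def handle(exc: KVCacheError) -> str:
    """Map a cache error to a coarse action (hypothetical policy)."""
    if isinstance(exc, (SessionExpiredError, KeyRotationRequiredError)):
        return "refresh session key"
    if isinstance(exc, (TenantIsolationError, NonceReplayError, DecryptionError)):
        return "reject and audit"
    return "reject"


try:
    raise NonceReplayError("nonce seen twice")
except KVCacheError as exc:  # one clause catches every subclass
    action = handle(exc)
```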

## Why PQC Matters for the KV Cache

Inference logs and intermediate conversation state are retained for years in regulated industries:

- **Healthcare (HIPAA):** 6-year minimum retention on any PHI-bearing record, including the model context that reasoned over it.
- **Finance (SEC 17a-4, MiFID II):** 5-7 year retention on client communications, including AI-assisted drafting.
- **Legal (privilege / e-discovery):** privilege survives only if the confidentiality chain is intact.

An adversary running a harvest-now-decrypt-later campaign against your classical TLS sessions today can capture the residual state of your inference servers as well. A PQC envelope around the KV cache is designed to keep that state confidential after a cryptographically relevant quantum computer arrives.

## Examples

- `examples/basic_kv_encryption.py` - single tenant, encrypts/decrypts 3 entries and inspects the audit log.
- `examples/multi_tenant_isolation.py` - Alice and Bob co-resident; a cross-tenant decrypt is rejected.
- `examples/key_rotation.py` - `KeyRotationPolicy` with `max_entries=5`; rotation is observed mid-stream.

## License

Apache License 2.0 - see `LICENSE`.