# PQC KV Cache Encryption

**Per-tenant, quantum-safe encryption for the LLM KV cache.** Multi-tenant inference servers store gigabytes of KV cache in shared host/device RAM. A side-channel or a compromised co-tenant can lift another user's private conversation state directly out of that cache. This library wraps every KV cache entry in a fresh **AES-256-GCM** envelope whose key is derived per session via **ML-KEM-768**, enforces strict tenant isolation at the cryptographic boundary, rotates keys on a configurable policy, and ships with an append-only audit log for every encrypt / decrypt / rotate / isolation-violation event.

## The Problem

Long-context LLM inference keeps past token activations in the **KV cache** - a per-layer, per-position tensor store that can run to multiple GB. On a multi-tenant inference server (vLLM, TGI, or any production stack sharing a GPU across requests) that cache sits in plaintext process memory:

- **Side-channel reads.** A malicious co-tenant with timing or page-table-based primitives can read another tenant's cache pages.
- **Cross-request leakage.** A bug in cache eviction or session routing can hand one tenant's intermediate state to another.
- **Harvest-now-decrypt-later.** Even if host-level encryption is on, a classical key exchange (ECDH) recorded today could be broken by a future cryptographically relevant quantum computer (CRQC).
- **Regulated workloads.** Healthcare, finance, and legal inference pipelines can carry multi-year retention requirements on conversation state; classical confidentiality alone may not satisfy auditors over that horizon.

## The Solution

- **ML-KEM-768** derives a fresh 32-byte symmetric key per `TenantSession`. In production the tenant presents a KEM public key and the inference server runs Encapsulate; here key establishment is delegated to [`quantumshield`](https://github.com/dyber-pqc/quantumshield).
- **AES-256-GCM** encrypts every `KVCacheEntry` under its own nonce. The AAD binds `EntryMetadata` + `sequence_number` + `key_len`, so tampering with layer, position, or sequence surfaces as a `DecryptionError`.
- **`TenantIsolationManager`** holds one session per tenant and refuses cross-tenant decrypts even when asked explicitly; a misrouted ciphertext raises `TenantIsolationError` before AES touches the bytes.
- **`KeyRotationPolicy`** rotates the per-session key after N entries or T seconds and resets the sequence counter.
- **`KVAuditLog`** is append-only and records `encrypt`, `decrypt`, `rotate`, and `isolation-violation` events.
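
One way to picture the AAD binding is as a canonical byte string built from the metadata plus `sequence_number` and `key_len`; AES-GCM authenticates those bytes without encrypting them. This is a minimal sketch, not the library's wire format: the sorted-key JSON encoding, the `build_aad` helper, and the local `EntryMetadata` stand-in (including its placeholder `kv_role` default) are all assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EntryMetadata:
    # Hypothetical stand-in mirroring the documented fields.
    tenant_id: str
    session_id: str
    layer_idx: int
    position: int
    token_id: int
    kv_role: str = "kv"  # placeholder value; the real kv_role values are not documented here


def build_aad(meta: EntryMetadata, sequence_number: int, key_len: int) -> bytes:
    """Canonical AAD: sorted-key, whitespace-free JSON (an assumed encoding)."""
    payload = asdict(meta)
    payload["sequence_number"] = sequence_number
    payload["key_len"] = key_len
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


meta = EntryMetadata("tenant-alice", "sess-1", layer_idx=0, position=12, token_id=2048)
aad = build_aad(meta, sequence_number=0, key_len=64)

# Changing any bound field yields different AAD bytes, so a GCM tag computed
# over the original AAD would no longer verify.
tampered = build_aad(
    EntryMetadata("tenant-alice", "sess-1", layer_idx=1, position=12, token_id=2048),
    sequence_number=0,
    key_len=64,
)
assert aad != tampered
```

The encoding must be deterministic: the encryptor and decryptor rebuild the AAD independently, and any difference in key order or whitespace would make honest entries fail to decrypt.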

## Installation

```bash
pip install pqc-kv-cache-encryption
```

Development:

```bash
pip install -e ".[dev]"
```

## Quick Start

```python
import os

from pqc_kv_cache import (
    CacheDecryptor,
    CacheEncryptor,
    EntryMetadata,
    KVCacheEntry,
    TenantIdentity,
    establish_tenant_session,
)

# 1. Establish a per-tenant session (AES-256-GCM key derived via ML-KEM-768).
tenant = TenantIdentity(tenant_id="tenant-alice", display_name="Alice Corp")
session = establish_tenant_session(tenant)

# 2. Wrap a KV cache entry in an authenticated AES-256-GCM envelope.
meta = EntryMetadata(
    tenant_id=tenant.tenant_id,
    session_id=session.session_id,
    layer_idx=0,
    position=12,
    token_id=2048,
)
entry = KVCacheEntry(
    metadata=meta,
    key_tensor_bytes=os.urandom(64),    # raw bytes of the K vector
    value_tensor_bytes=os.urandom(64),  # raw bytes of the V vector
)
enc = CacheEncryptor(session).encrypt_entry(entry)

# 3. Decrypt with the same session. AES-GCM verifies the tag over the AAD;
#    the decryptor also checks the tenant id and rejects replayed nonces.
decrypted = CacheDecryptor(session).decrypt_entry(enc)
assert decrypted.key_tensor_bytes == entry.key_tensor_bytes
assert decrypted.value_tensor_bytes == entry.value_tensor_bytes
```

Multi-tenant with strict isolation:

```python
# Continues the Quick Start above: os, EntryMetadata, KVCacheEntry and
# TenantIdentity are already imported.
from pqc_kv_cache import TenantIsolationManager, TenantIsolationError

mgr = TenantIsolationManager()
mgr.create_session(TenantIdentity(tenant_id="tenant-alice"))
mgr.create_session(TenantIdentity(tenant_id="tenant-bob"))

# Build an entry bound to Alice's session.
alice_session = mgr.get_session("tenant-alice")
alice_meta = EntryMetadata(tenant_id="tenant-alice", session_id=alice_session.session_id,
                           layer_idx=0, position=0, token_id=1)
alice_entry = KVCacheEntry(metadata=alice_meta, key_tensor_bytes=os.urandom(64),
                           value_tensor_bytes=os.urandom(64))
alice_enc = mgr.encrypt("tenant-alice", alice_entry)

# Bob can NEVER decrypt Alice's entry, even when using his own valid session.
try:
    mgr.decrypt("tenant-bob", alice_enc)
except TenantIsolationError:
    print("blocked at the isolation boundary")
```

## Architecture

```
+-----------------------------+          +-----------------------------+
|        Tenant Alice         |          |         Tenant Bob          |
|          (client)           |          |          (client)           |
+--------------+--------------+          +--------------+--------------+
               |                                        |
               |   ML-KEM-768 handshake (per session)   |
               v                                        v
+---------------------------------------------------------------------------+
|                      Inference Server (multi-tenant)                      |
|                                                                           |
|  TenantIsolationManager                                                   |
|    +------------------------+            +------------------------+       |
|    | TenantSession (alice)  |            | TenantSession (bob)    |       |
|    | symmetric_key (32B)    |            | symmetric_key (32B)    |       |
|    | next_sequence          |            | next_sequence          |       |
|    | entries_encrypted      |            | entries_encrypted      |       |
|    +----------+-------------+            +----------+-------------+       |
|               |                                     |                     |
|               v                                     v                     |
|  CacheEncryptor / CacheDecryptor       CacheEncryptor / CacheDecryptor    |
|  AES-256-GCM + AAD                     AES-256-GCM + AAD                  |
|  + tenant-id enforcement               + tenant-id enforcement            |
|               |                                     |                     |
|               v                                     v                     |
|     +---------------------+               +---------------------+         |
|     | EncryptedEntry      |               | EncryptedEntry      |         |
|     | (alice ciphertext)  |               | (bob ciphertext)    |         |
|     +---------+-----------+               +---------+-----------+         |
|               |                                     |                     |
|               +------------------+------------------+                     |
|                                  v                                        |
|            +---------------------------+                                  |
|            | KV cache in GPU/host RAM  |  (only ciphertext lives here)    |
|            +---------------------------+                                  |
|                                                                           |
|  KeyRotationPolicy -- rotates session keys on entry count / age           |
|  KVAuditLog        -- encrypt / decrypt / rotate / isolation-violation    |
+---------------------------------------------------------------------------+
```

## Cryptography

| Primitive              | Purpose                                                   | Algorithm   |
| ---------------------- | --------------------------------------------------------- | ----------- |
| Per-session key        | Fresh 32-byte symmetric key per tenant session            | ML-KEM-768  |
| Per-entry encryption   | Confidentiality + integrity of K/V tensor bytes           | AES-256-GCM |
| AAD binding            | `EntryMetadata` + `sequence_number` + `key_len` -> tag    | AES-GCM tag |
| Session-key derivation | SHA3-256 over KEM keypair bytes (production: Decapsulate) | SHA3-256    |
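
As a sketch of the session-key derivation row, SHA3-256 compresses a 32-byte KEM shared secret into an AES-256 key. Here `os.urandom(32)` stands in for the ML-KEM-768 shared secret because the standard library has no KEM, and the hypothetical `derive_session_key` helper, its domain-separation label, and the session-id binding are illustrative additions, not the library's documented construction.

```python
import hashlib
import os

# Hypothetical domain-separation label; not taken from the library.
LABEL = b"pqc-kv-cache/session-key/v1"


def derive_session_key(shared_secret: bytes, session_id: str) -> bytes:
    """SHA3-256(label || session_id || shared_secret) -> 32-byte AES-256 key."""
    h = hashlib.sha3_256()
    h.update(LABEL)
    h.update(session_id.encode("utf-8"))
    h.update(shared_secret)  # fixed 32-byte suffix keeps the encoding unambiguous
    return h.digest()


# Placeholder for the 32-byte ML-KEM-768 shared secret.
shared_secret = os.urandom(32)
key = derive_session_key(shared_secret, "sess-1")
assert len(key) == 32  # AES-256 requires exactly 32 bytes
```

Binding the session id means two sessions that somehow saw the same shared secret would still get different AES keys.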

Signing and KEM keys are delegated to [`quantumshield`](https://github.com/dyber-pqc/quantumshield), which prefers real `liboqs` ML-KEM / ML-DSA when available and falls back to a transitional backend otherwise.

## Threat Model

| Adversary capability                                         | Coverage                                                                                   |
| ------------------------------------------------------------ | ------------------------------------------------------------------------------------------ |
| Read KV cache pages for another tenant                       | All entries are AES-256-GCM encrypted; the attacker sees only ciphertext.                  |
| Replay a previously captured `EncryptedEntry`                | `CacheDecryptor` tracks seen nonces and raises `NonceReplayError`.                         |
| Tamper with `EntryMetadata` (layer_idx, position, tenant_id) | AAD binding -> AES-GCM tag fails -> `DecryptionError`.                                     |
| Submit another tenant's ciphertext through a valid session   | `TenantIsolationError` raised before AES touches the bytes.                                |
| Long-lived session key exposure                              | `KeyRotationPolicy` rotates on entry count / age; the sequence counter resets.             |
| Session outlives its TTL                                     | `SessionExpiredError` on every encrypt/decrypt after `expires_at`.                         |
| Harvest-now-decrypt-later on the KEM handshake               | ML-KEM-768 is designed for IND-CCA2 security and is believed to resist quantum adversaries. |
| Orphaned tenant state after disconnect                       | `close_session()` drops the session and its key reference; pure Python cannot guarantee the key bytes are zeroized. |
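
The replay row can be illustrated with a set of seen nonces consulted before any decryption. This is a minimal sketch with a hypothetical `NonceTracker` class; `NonceReplayError` is a local stand-in for the library's exception, and clearing the set on key rotation is an assumed policy for bounding memory.

```python
class NonceReplayError(Exception):
    """Local stand-in for the library's NonceReplayError."""


class NonceTracker:
    """Remembers every nonce accepted under the current session key."""

    def __init__(self) -> None:
        self._seen: set[bytes] = set()

    def check_and_record(self, nonce: bytes) -> None:
        if nonce in self._seen:
            raise NonceReplayError(f"nonce {nonce.hex()} already used")
        self._seen.add(nonce)

    def reset(self) -> None:
        # After key rotation, entries sealed under the old key fail the GCM tag
        # check anyway, so the tracker can start a fresh set.
        self._seen.clear()


tracker = NonceTracker()
tracker.check_and_record(b"\x00" * 12)
try:
    tracker.check_and_record(b"\x00" * 12)  # same nonce again: a replay
except NonceReplayError as exc:
    print("rejected:", exc)
```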

## Performance Considerations

This library is written in pure Python and is intended as the **cryptographic envelope** for multi-tenant LLM inference, not a hot-path encryption kernel. A production deployment would typically wrap the same patterns in:

- A CUDA / ROCm kernel that operates on the K/V tensors in device memory.
- Hardware-backed memory or transfer encryption (for example NVIDIA H100 confidential computing or AMD SEV-SNP).
- A batched nonce / sequence allocator to amortize session bookkeeping across a batch of requests.

The envelope formats (`EncryptedEntry`, AAD shape, `TenantSession` state machine) are deliberately portable so that a native kernel and the Python reference implementation can produce interoperable ciphertexts.
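
A batched allocator could reserve a contiguous block of sequence numbers per batch and derive each 96-bit GCM nonce from a fixed per-key prefix plus the counter, which follows the deterministic nonce construction in NIST SP 800-38D. The `SequenceAllocator` name and the 4-byte prefix / 8-byte counter split are assumptions for illustration, not part of this library.

```python
import os
import threading


class SequenceAllocator:
    """Hypothetical allocator: hands out sequence-number blocks and 12-byte GCM nonces."""

    def __init__(self) -> None:
        self._prefix = os.urandom(4)  # fixed field, constant for one session key
        self._next = 0                # invocation counter (8 bytes in the nonce)
        self._lock = threading.Lock()

    def reserve(self, count: int) -> range:
        # One lock acquisition per batch instead of one per entry.
        with self._lock:
            if self._next + count > 2**64:
                raise OverflowError("counter space exhausted; rotate the session key")
            start = self._next
            self._next += count
        return range(start, start + count)

    def nonce(self, sequence_number: int) -> bytes:
        # 4-byte prefix || 8-byte big-endian counter = 96-bit GCM nonce.
        return self._prefix + sequence_number.to_bytes(8, "big")


alloc = SequenceAllocator()
batch = alloc.reserve(4)  # sequence numbers 0..3 for a four-request batch
nonces = [alloc.nonce(seq) for seq in batch]
assert len(set(nonces)) == 4 and all(len(n) == 12 for n in nonces)
```

Because the counter never repeats under one key, nonces stay unique without tracking randomness; a rotation would start a new allocator with a new key.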

## API Reference

### `TenantIdentity`
Frozen dataclass identifying a tenant: `tenant_id: str`, `display_name: str = ""`.

### `establish_tenant_session(tenant, algorithm=KEMAlgorithm.ML_KEM_768, ttl_seconds=900) -> TenantSession`
Derive a fresh 32-byte symmetric key for `tenant` via ML-KEM-768 and return a `TenantSession`.

### `TenantSession`
Holds `session_id`, `symmetric_key`, `next_sequence`, `entries_encrypted`, `created_at`, `expires_at`. Methods: `is_valid()`, `check_valid()`, `consume_sequence()`, `rotate_key(new_key)`, `to_public_dict()`.
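
The session lifecycle can be modelled as a small state machine: valid until `expires_at`, one sequence number per encrypt, and a fresh sequence space after rotation. The `SessionModel` class below is a hypothetical simplification that borrows the documented method names; the real `TenantSession` may track more state, and `SessionExpiredError` is a local stand-in.

```python
import os
import time
from dataclasses import dataclass, field


class SessionExpiredError(Exception):
    """Local stand-in for the library's SessionExpiredError."""


@dataclass
class SessionModel:
    symmetric_key: bytes
    ttl_seconds: float = 900.0
    next_sequence: int = 0
    entries_encrypted: int = 0  # lifetime total, not reset on rotation
    created_at: float = field(default_factory=time.time)

    @property
    def expires_at(self) -> float:
        return self.created_at + self.ttl_seconds

    def is_valid(self) -> bool:
        return time.time() < self.expires_at

    def check_valid(self) -> None:
        if not self.is_valid():
            raise SessionExpiredError("session is past expires_at")

    def consume_sequence(self) -> int:
        # Every encrypt takes the next sequence number exactly once.
        self.check_valid()
        seq = self.next_sequence
        self.next_sequence += 1
        self.entries_encrypted += 1
        return seq

    def rotate_key(self, new_key: bytes) -> None:
        # New key and a fresh sequence space.
        self.symmetric_key = new_key
        self.next_sequence = 0


s = SessionModel(symmetric_key=os.urandom(32))
print(s.consume_sequence(), s.consume_sequence())  # 0 1
```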

### `KVCacheEntry` / `EncryptedEntry` / `EntryMetadata`
`KVCacheEntry` holds `metadata`, `key_tensor_bytes`, `value_tensor_bytes`. `EncryptedEntry` holds `metadata`, `nonce` (hex), `ciphertext` (hex), `key_len`, `sequence_number`. `EntryMetadata` is frozen and carries `tenant_id`, `session_id`, `layer_idx`, `position`, `token_id`, `kv_role`.
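
`EncryptedEntry` carries a single `ciphertext` plus `key_len`, which suggests the K and V bytes are sealed as one buffer and split again after decryption. That reading is an assumption; the sketch shows the pack/split step and the hex round trip with hypothetical `pack_kv` / `split_kv` helpers, and the plaintext buffer stands in for real AES-GCM output.

```python
import os
from typing import Tuple


def pack_kv(key_bytes: bytes, value_bytes: bytes) -> Tuple[bytes, int]:
    """Concatenate K || V and return the buffer together with the K length."""
    return key_bytes + value_bytes, len(key_bytes)


def split_kv(plaintext: bytes, key_len: int) -> Tuple[bytes, bytes]:
    """Recover K and V from the decrypted buffer using key_len."""
    if not 0 <= key_len <= len(plaintext):
        raise ValueError("key_len out of range")
    return plaintext[:key_len], plaintext[key_len:]


k, v = os.urandom(64), os.urandom(48)
buf, key_len = pack_kv(k, v)
stored = {"key_len": key_len, "ciphertext": buf.hex()}  # buf stands in for ciphertext

k2, v2 = split_kv(bytes.fromhex(stored["ciphertext"]), stored["key_len"])
assert (k2, v2) == (k, v)
```

Since `key_len` is part of the AAD, an attacker cannot shift the K/V boundary without the tag check failing.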

### `CacheEncryptor(session)` / `CacheDecryptor(session)`
`encrypt_entry(KVCacheEntry) -> EncryptedEntry` and `decrypt_entry(EncryptedEntry) -> KVCacheEntry`. Both enforce a tenant-id match. The decryptor also tracks nonces for replay protection.

### `KeyRotationPolicy(max_entries=100_000, max_age_seconds=300)`
`should_rotate(session) -> (bool, RotationTrigger | None)` and `rotate(session) -> bytes` (new 32-byte key). `RotationTrigger` is `ENTRY_COUNT`, `TIME_ELAPSED`, or `MANUAL`.
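
The rotation decision reduces to two thresholds. The sketch mirrors the documented defaults and trigger names, but `RotationPolicy`, `SessionState`, the enum values, and the "entry count is checked first" ordering are assumptions; it is not the library's `KeyRotationPolicy`.

```python
import os
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Tuple


class RotationTrigger(Enum):
    ENTRY_COUNT = "entry_count"
    TIME_ELAPSED = "time_elapsed"
    MANUAL = "manual"


@dataclass
class SessionState:
    # Hypothetical minimal state: entries and key age since the last rotation.
    entries_since_rotation: int = 0
    key_created_at: float = field(default_factory=time.monotonic)


@dataclass
class RotationPolicy:
    max_entries: int = 100_000
    max_age_seconds: float = 300

    def should_rotate(self, s: SessionState) -> Tuple[bool, Optional[RotationTrigger]]:
        if s.entries_since_rotation >= self.max_entries:
            return True, RotationTrigger.ENTRY_COUNT
        if time.monotonic() - s.key_created_at >= self.max_age_seconds:
            return True, RotationTrigger.TIME_ELAPSED
        return False, None

    def rotate(self, s: SessionState) -> bytes:
        # Fresh 32-byte key; both counters restart for the new key.
        s.entries_since_rotation = 0
        s.key_created_at = time.monotonic()
        return os.urandom(32)


policy = RotationPolicy(max_entries=5)
state = SessionState(entries_since_rotation=5)
print(policy.should_rotate(state))  # (True, <RotationTrigger.ENTRY_COUNT: 'entry_count'>)
new_key = policy.rotate(state)
print(policy.should_rotate(state))  # (False, None)
```

Using `time.monotonic()` keeps the age check immune to wall-clock adjustments.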

### `TenantIsolationManager`
`create_session(tenant)`, `get_session(tenant_id)`, `encrypt(tenant_id, entry)`, `decrypt(tenant_id, enc)`, `close_session(tenant_id)`, `list_active_tenants()`.
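
The isolation rule comes down to comparing the caller's tenant with the tenant recorded in the entry before any cryptographic work happens. The following is a hypothetical minimal model (a dict-based key registry, an `Envelope` type, stand-in exceptions, and a callable in place of AES-GCM), not the library's `TenantIsolationManager`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class TenantIsolationError(Exception):
    """Local stand-in for the library's TenantIsolationError."""


class UnknownTenantError(Exception):
    """Local stand-in for the library's UnknownTenantError."""


@dataclass(frozen=True)
class Envelope:
    tenant_id: str  # copied from EntryMetadata at encrypt time
    ciphertext: bytes


class IsolationGate:
    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def register(self, tenant_id: str, key: bytes) -> None:
        self._keys[tenant_id] = key

    def decrypt(self, tenant_id: str, env: Envelope,
                aead_open: Callable[[bytes, bytes], bytes]) -> bytes:
        key = self._keys.get(tenant_id)
        if key is None:
            raise UnknownTenantError(tenant_id)
        # Reject a misrouted ciphertext before the cipher ever sees it.
        if env.tenant_id != tenant_id:
            raise TenantIsolationError(f"{tenant_id} may not open {env.tenant_id}'s entry")
        return aead_open(key, env.ciphertext)


calls: List[bytes] = []


def fake_open(key: bytes, ciphertext: bytes) -> bytes:
    # Stand-in for AES-GCM decryption; records that it was invoked.
    calls.append(key)
    return ciphertext


gate = IsolationGate()
gate.register("tenant-alice", b"a" * 32)
gate.register("tenant-bob", b"b" * 32)
alice_env = Envelope("tenant-alice", b"...")
try:
    gate.decrypt("tenant-bob", alice_env, fake_open)
except TenantIsolationError as exc:
    print("blocked:", exc)
assert calls == []  # the cipher stand-in was never called
```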

### `KVAuditLog` / `KVAuditEntry`
`log_encrypt(...)`, `log_decrypt(...)`, `log_rotate(...)`, `log_isolation_violation(...)`, `entries(limit, tenant_id, operation)`, `export_json()`.
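
An append-only log with the documented filter parameters can be sketched in a few lines. The `AuditRecord` fields, the in-memory list, and the lack of tamper evidence (such as hash chaining) are simplifications; this is not the library's `KVAuditLog`.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class AuditRecord:
    operation: str  # "encrypt" | "decrypt" | "rotate" | "isolation-violation"
    tenant_id: str
    detail: str = ""
    timestamp: float = field(default_factory=time.time)


class AuditLog:
    def __init__(self) -> None:
        self._records: List[AuditRecord] = []

    def append(self, record: AuditRecord) -> None:
        # Append only: there is no update or delete path, and records are frozen.
        self._records.append(record)

    def entries(self, limit: Optional[int] = None, tenant_id: Optional[str] = None,
                operation: Optional[str] = None) -> List[AuditRecord]:
        out = [r for r in self._records
               if (tenant_id is None or r.tenant_id == tenant_id)
               and (operation is None or r.operation == operation)]
        return out[-limit:] if limit else out  # most recent `limit` matches

    def export_json(self) -> str:
        return json.dumps([asdict(r) for r in self._records])


log = AuditLog()
log.append(AuditRecord("encrypt", "tenant-alice", "layer=0 pos=12"))
log.append(AuditRecord("isolation-violation", "tenant-bob", "tried tenant-alice entry"))
print(len(log.entries(operation="isolation-violation")))  # 1
```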

### Errors
All under `KVCacheError`: `TenantIsolationError`, `SessionExpiredError`, `DecryptionError`, `NonceReplayError`, `KeyRotationRequiredError`, `UnknownTenantError`.
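
Because every error derives from `KVCacheError`, callers can handle specific failures first and fall back to the base class. The classes below are a local re-creation for illustration (the real ones live in `pqc_kv_cache`); all subclasses are modelled as direct children, and the `run` helper and its response labels are hypothetical.

```python
from typing import Callable


class KVCacheError(Exception):
    """Base class for every KV cache encryption failure."""


class TenantIsolationError(KVCacheError): ...
class SessionExpiredError(KVCacheError): ...
class DecryptionError(KVCacheError): ...
class NonceReplayError(KVCacheError): ...
class KeyRotationRequiredError(KVCacheError): ...
class UnknownTenantError(KVCacheError): ...


def run(op: Callable[[], None]) -> str:
    # Most specific handlers first, base class as the catch-all.
    try:
        op()
    except (TenantIsolationError, NonceReplayError):
        return "security-event"  # audit and alert
    except SessionExpiredError:
        return "re-establish"    # e.g. run a fresh KEM handshake
    except KVCacheError:
        return "drop-entry"      # any other cache error: discard the entry
    return "ok"


def replayed() -> None:
    raise NonceReplayError("nonce seen before")


def corrupted() -> None:
    raise DecryptionError("tag mismatch")


print(run(replayed), run(corrupted), run(lambda: None))  # security-event drop-entry ok
```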

## Why PQC Matters for the KV Cache

Inference logs and intermediate conversation state can be retained for many years in regulated industries:

- **Healthcare (HIPAA):** 6-year retention of required HIPAA documentation, and state medical-record laws often require multi-year retention of PHI, which can extend to the model context that reasoned over it.
- **Finance (SEC 17a-4, MiFID II):** multi-year retention of client communications (MiFID II: 5 years, extendable to 7), which can include AI-assisted drafting.
- **Legal (privilege / e-discovery):** communications privilege generally survives only if the confidentiality chain is intact.

An adversary running harvest-now-decrypt-later against your classical TLS sessions today may also be capturing the residual state of your inference servers. A PQC envelope around the KV cache is designed to keep that state confidential after a cryptographically relevant quantum computer arrives.

## Examples

- `examples/basic_kv_encryption.py` - single tenant, encrypt/decrypt 3 entries, inspect audit log.
- `examples/multi_tenant_isolation.py` - Alice and Bob co-resident, cross-tenant decrypt is rejected.
- `examples/key_rotation.py` - `KeyRotationPolicy` with `max_entries=5`, observe rotation mid-stream.

## License

Apache License 2.0 - see `LICENSE`.
| 221 | |