# PQC KV Cache Encryption

![PQC Native](https://img.shields.io/badge/PQC-Native-blue)
![ML-KEM-768](https://img.shields.io/badge/ML--KEM--768-FIPS%20203-green)
![AES-256-GCM](https://img.shields.io/badge/AES--256--GCM-NIST%20SP%20800--38D-teal)
![License](https://img.shields.io/badge/License-Apache%202.0-orange)
![Version](https://img.shields.io/badge/version-0.1.0-lightgrey)

**Per-tenant, quantum-safe encryption for the LLM KV cache.** Multi-tenant inference servers store gigabytes of KV cache in shared host/device RAM. A side-channel or a compromised co-tenant can lift another user's private conversation state directly out of that cache. This library wraps every KV cache entry in a fresh **AES-256-GCM** envelope whose key is derived per session via **ML-KEM-768**, enforces strict tenant isolation at the cryptographic boundary, rotates keys on a configurable policy, and ships with an append-only audit log for every encrypt / decrypt / rotate / isolation-violation event.

## The Problem

Long-context LLM inference keeps past token activations in the **KV cache** - a per-layer, per-position tensor store that can run to multiple GB. On a multi-tenant inference server (vLLM, TGI, or any production stack sharing a GPU across requests) that cache sits in plaintext process memory:

- **Side-channel reads.** A malicious co-tenant with timing or page-table-based primitives can read another tenant's cache pages.
- **Cross-request leakage.** A bug in cache eviction or session routing can hand one tenant's intermediate state to another.
- **Harvest-now-decrypt-later.** Even with host-level encryption enabled, data protected by classical key exchange (ECDH) can be recorded today and decrypted once a cryptographically relevant quantum computer (CRQC) exists.
- **Regulated workloads.** Healthcare, finance, and legal inference pipelines carry multi-year retention requirements on conversation state. For data that must stay confidential that long, classical-only key exchange is increasingly hard to defend in an audit.

## The Solution

- **ML-KEM-768** derives a fresh 32-byte symmetric key per `TenantSession`. In production the tenant presents an ML-KEM public key and the inference server runs Encapsulate. This library delegates the KEM operations to [`quantumshield`](https://github.com/dyber-pqc/quantumshield).
- **AES-256-GCM** encrypts every `KVCacheEntry` under a fresh nonce per entry. The AAD binds `EntryMetadata`, `sequence_number`, and `key_len`, so tampering with the layer, position, or sequence surfaces as a `DecryptionError`.
- **`TenantIsolationManager`** holds one session per tenant and refuses cross-tenant decrypts even when asked explicitly. A misrouted ciphertext raises `TenantIsolationError` before AES touches the bytes.
- **`KeyRotationPolicy`** rotates the per-session key after N entries or T seconds and resets the sequence counter.
- **`KVAuditLog`** is append-only and records `encrypt`, `decrypt`, `rotate`, and `isolation-violation` events.
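
The AAD binding in the AES-256-GCM bullet can be illustrated with a canonical byte encoding of the metadata plus `sequence_number` and `key_len`. This is a minimal sketch under stated assumptions. The `Meta` stand-in, the `build_aad` helper, and the sorted-key JSON layout are hypothetical, and the library's actual AAD format may differ. The two properties that matter are that the encryptor and decryptor derive byte-identical AAD from the same fields, and that changing any bound field changes the AAD.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Meta:
    """Hypothetical stand-in for EntryMetadata, using its documented field names."""

    tenant_id: str
    session_id: str
    layer_idx: int
    position: int
    token_id: int
    kv_role: str = "kv"  # hypothetical default; the real value set is not documented


def build_aad(meta: Meta, sequence_number: int, key_len: int) -> bytes:
    """Serialize metadata + sequence_number + key_len into deterministic AAD bytes."""
    payload = asdict(meta)
    payload["sequence_number"] = sequence_number
    payload["key_len"] = key_len
    # Sorted keys and fixed separators make the encoding canonical.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


meta = Meta("tenant-alice", "sess-1", layer_idx=0, position=12, token_id=2048)
aad = build_aad(meta, sequence_number=7, key_len=64)

# Moving the entry to another layer changes the AAD, so the GCM tag would not verify.
tampered = build_aad(Meta("tenant-alice", "sess-1", 1, 12, 2048), 7, 64)
assert aad != tampered
```

AES-GCM authenticates the AAD together with the ciphertext. A decryptor that rebuilds the AAD from tampered metadata therefore computes a different tag and rejects the entry.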

## Installation

```bash
pip install pqc-kv-cache-encryption
```

Development:

```bash
pip install -e ".[dev]"
```

## Quick Start

```python
import os

from pqc_kv_cache import (
    CacheDecryptor,
    CacheEncryptor,
    EntryMetadata,
    KVCacheEntry,
    TenantIdentity,
    establish_tenant_session,
)

# 1. Establish a per-tenant session (ML-KEM-768 derived AES-256-GCM key).
tenant = TenantIdentity(tenant_id="tenant-alice", display_name="Alice Corp")
session = establish_tenant_session(tenant)

# 2. Wrap a KV cache entry in an authenticated AES-256-GCM envelope.
meta = EntryMetadata(
    tenant_id=tenant.tenant_id,
    session_id=session.session_id,
    layer_idx=0,
    position=12,
    token_id=2048,
)
entry = KVCacheEntry(
    metadata=meta,
    key_tensor_bytes=os.urandom(64),    # raw bytes of K vector
    value_tensor_bytes=os.urandom(64),  # raw bytes of V vector
)
enc = CacheEncryptor(session).encrypt_entry(entry)

# 3. Decrypt with the same session. The decryptor checks the tenant id and
#    nonce replay; AES-GCM verifies the tag over the ciphertext and AAD.
decrypted = CacheDecryptor(session).decrypt_entry(enc)
assert decrypted.key_tensor_bytes == entry.key_tensor_bytes
assert decrypted.value_tensor_bytes == entry.value_tensor_bytes
```

Multi-tenant with strict isolation:

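```python
import os

from pqc_kv_cache import (EntryMetadata, KVCacheEntry, TenantIdentity,
                          TenantIsolationError, TenantIsolationManager)

mgr = TenantIsolationManager()
mgr.create_session(TenantIdentity(tenant_id="tenant-alice"))
mgr.create_session(TenantIdentity(tenant_id="tenant-bob"))

# An entry bound to Alice's session, built the same way as in the Quick Start.
alice_meta = EntryMetadata(
    tenant_id="tenant-alice",
    session_id=mgr.get_session("tenant-alice").session_id,
    layer_idx=0, position=0, token_id=1,
)
alice_entry = KVCacheEntry(metadata=alice_meta, key_tensor_bytes=os.urandom(64),
                           value_tensor_bytes=os.urandom(64))
alice_enc = mgr.encrypt("tenant-alice", alice_entry)

# Bob can NEVER decrypt Alice's entry, even when using his own valid session.
try:
    mgr.decrypt("tenant-bob", alice_enc)
except TenantIsolationError:
    print("blocked at the isolation boundary")
```
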
## Architecture

```
+-----------------------------+        +-----------------------------+
|        Tenant Alice         |        |         Tenant Bob          |
|          (client)           |        |          (client)           |
+--------------+--------------+        +--------------+--------------+
               |                                      |
               |  ML-KEM-768 handshake (per session)  |
               v                                      v
+---------------------------------------------------------------------------+
|                      Inference Server (multi-tenant)                      |
|                                                                           |
|  TenantIsolationManager                                                   |
|  +------------------------+              +------------------------+       |
|  | TenantSession (alice)  |              | TenantSession (bob)    |       |
|  |  symmetric_key (32B)   |              |  symmetric_key (32B)   |       |
|  |  next_sequence         |              |  next_sequence         |       |
|  |  entries_encrypted     |              |  entries_encrypted     |       |
|  +----------+-------------+              +----------+-------------+       |
|             |                                       |                     |
|             v                                       v                     |
|  CacheEncryptor / CacheDecryptor         CacheEncryptor / CacheDecryptor  |
|  AES-256-GCM + AAD                       AES-256-GCM + AAD                |
|  + tenant-id enforcement                 + tenant-id enforcement          |
|             |                                       |                     |
|             v                                       v                     |
|  +---------------------+                 +---------------------+          |
|  | EncryptedEntry      |                 | EncryptedEntry      |          |
|  | (alice ciphertext)  |                 | (bob ciphertext)    |          |
|  +---------+-----------+                 +---------+-----------+          |
|            |                                       |                      |
|            +-------------+-------------------------+                      |
|                          v                                                |
|            +---------------------------+                                  |
|            | KV cache in GPU/host RAM  |  (only ciphertext lives here)    |
|            +---------------------------+                                  |
|                                                                           |
|  KeyRotationPolicy  -- rotates session keys on entry count / age          |
|  KVAuditLog         -- encrypt / decrypt / rotate / isolation-violation   |
+---------------------------------------------------------------------------+
```

## Cryptography

| Primitive | Purpose | Algorithm |
| --- | --- | --- |
| Per-session key | Fresh 32-byte symmetric key per tenant session | ML-KEM-768 |
| Per-entry encryption | Confidentiality + integrity of K/V tensor bytes | AES-256-GCM |
| AAD binding | `EntryMetadata` + `sequence_number` + `key_len` -> tag | AES-GCM tag |
| Session-key derivation | SHA3-256 over KEM keypair bytes (production: Decapsulate) | SHA3-256 |

Signing and KEM keys are delegated to [`quantumshield`](https://github.com/dyber-pqc/quantumshield), which prefers real `liboqs` ML-KEM / ML-DSA when available and falls back to a transitional backend otherwise.
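
The session-key row above can be sketched with Python's `hashlib.sha3_256`. Unlike the reference build, which hashes keypair bytes, this sketch hashes what a production handshake would yield: the 32-byte ML-KEM-768 shared secret. `os.urandom(32)` stands in for that secret, so no KEM actually runs. The `derive_session_key` helper, the domain-separation label, and the session-id binding are assumptions of this sketch. They are not documented behavior of this library or of `quantumshield`.

```python
import hashlib
import os

# Hypothetical domain-separation label; not part of the documented API.
LABEL = b"pqc-kv-cache/session-key/v1"


def derive_session_key(kem_shared_secret: bytes, session_id: str) -> bytes:
    """Derive a 32-byte AES-256-GCM key from KEM output with SHA3-256."""
    sid = session_id.encode("utf-8")
    h = hashlib.sha3_256()
    h.update(LABEL)
    h.update(len(sid).to_bytes(2, "big"))  # length prefix keeps the encoding unambiguous
    h.update(sid)
    h.update(kem_shared_secret)
    return h.digest()  # SHA3-256 digests are exactly 32 bytes


# os.urandom stands in for the 32-byte ML-KEM-768 shared secret; no KEM runs here.
shared_secret = os.urandom(32)
key_a = derive_session_key(shared_secret, "sess-1")
key_b = derive_session_key(shared_secret, "sess-2")
```

Because the session id is bound into the hash, two sessions never derive the same key, even if KEM output were accidentally reused.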

## Threat Model

| Adversary capability                                            | Coverage                                                                      |
| --------------------------------------------------------------- | ----------------------------------------------------------------------------- |
| Read KV cache pages for another tenant                          | All entries are AES-256-GCM encrypted; attacker sees only ciphertext.         |
| Replay a previously captured `EncryptedEntry`                   | `CacheDecryptor` tracks seen nonces and raises `NonceReplayError`.            |
| Tamper with `EntryMetadata` (layer_idx, position, tenant_id)    | AAD binding -> AES-GCM tag fails -> `DecryptionError`.                        |
| Submit another tenant's ciphertext through a valid session      | `TenantIsolationError` raised before AES touches bytes.                       |
| Long-lived session key exposure                                 | `KeyRotationPolicy` rotates on entry-count / age; sequence counter resets.    |
| Session outlives its TTL                                        | `SessionExpiredError` on every encrypt/decrypt after `expires_at`.            |
| Harvest-now-decrypt-later on the KEM handshake                  | ML-KEM-768 targets IND-CCA2 security against quantum adversaries.             |
| Orphaned tenant state after disconnect                          | `close_session()` removes the session and its key from the manager.           |
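
The replay row depends on the decryptor remembering which nonces it has already accepted. Below is a minimal sketch of that bookkeeping. It assumes nonces are compared as the hex strings stored in `EncryptedEntry`. `ReplayGuard` is hypothetical, and `NonceReplayError` is redefined locally so the snippet runs without the package.

```python
from typing import Set


class NonceReplayError(Exception):
    """Local stand-in for the library's NonceReplayError."""


class ReplayGuard:
    """Hypothetical per-decryptor record of nonces that have already been accepted."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def check(self, nonce_hex: str) -> None:
        """Raise if this nonce was already accepted; call before decrypting."""
        if nonce_hex in self._seen:
            raise NonceReplayError(f"nonce {nonce_hex} already seen")

    def record(self, nonce_hex: str) -> None:
        """Remember the nonce; call only after the GCM tag has verified."""
        self._seen.add(nonce_hex)


guard = ReplayGuard()
guard.check("a1b2c3")
# ... AES-GCM decryption and tag verification would happen here ...
guard.record("a1b2c3")

try:
    guard.check("a1b2c3")  # the same EncryptedEntry submitted again
    replay_blocked = False
except NonceReplayError:
    replay_blocked = True
```

The guard records a nonce only after the tag verifies. That way, a forged entry reusing a legitimate nonce cannot lock out the genuine one. The set grows with every accepted entry, so a real implementation needs some bound on it, for example scoping it to the current key.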

## Performance Considerations

This library is written in pure Python and is intended as the **cryptographic envelope** for multi-tenant LLM inference, not a hot-path encryption kernel. A production deployment would implement the same patterns with:

- A CUDA / ROCm kernel that operates on the K/V tensors in device memory.
- Hardware memory encryption from confidential-computing platforms (NVIDIA H100 confidential computing, AMD SEV-SNP).
- A batched nonce / sequence allocator that amortizes session bookkeeping across a batch of requests.

The envelope formats (`EncryptedEntry`, AAD shape, `TenantSession` state machine) are deliberately portable so that the native kernel and the Python reference implementation produce interoperable ciphertexts.
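
The batched nonce / sequence allocator from the list above can be sketched as follows. Each batch reserves a contiguous block of sequence numbers under a single lock, so per-request bookkeeping becomes local arithmetic. `SequenceAllocator` and `reserve` are hypothetical names. The documented `TenantSession.consume_sequence()` hands out one number at a time.

```python
import threading


class SequenceAllocator:
    """Hypothetical allocator that hands out contiguous sequence-number ranges."""

    def __init__(self, start: int = 0) -> None:
        self._next = start
        self._lock = threading.Lock()

    def reserve(self, count: int) -> range:
        """Reserve `count` consecutive sequence numbers with one lock acquisition."""
        if count <= 0:
            raise ValueError("count must be positive")
        with self._lock:
            first = self._next
            self._next += count
        return range(first, first + count)


alloc = SequenceAllocator()
batch_a = alloc.reserve(4)  # range(0, 4)
batch_b = alloc.reserve(3)  # range(4, 7)
```

Key rotation resets the sequence counter, so a reserved range is only meaningful for the key that was active when it was reserved.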

## API Reference

### `TenantIdentity`

Frozen dataclass identifying a tenant, with fields `tenant_id: str` and `display_name: str = ""`.

### `establish_tenant_session(tenant, algorithm=KEMAlgorithm.ML_KEM_768, ttl_seconds=900) -> TenantSession`

Derive a fresh 32-byte symmetric key for `tenant` via ML-KEM-768 and return a `TenantSession`.

### `TenantSession`

Holds `symmetric_key`, `next_sequence`, `entries_encrypted`, `created_at`, `expires_at`. Methods: `is_valid()`, `check_valid()`, `consume_sequence()`, `rotate_key(new_key)`, `to_public_dict()`.
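
The session lifecycle can be sketched as a small dataclass. Validity is a wall-clock comparison against `expires_at`, `consume_sequence()` hands out the next sequence number, and `rotate_key()` resets the counter. `SessionSketch` is hypothetical and models only a subset of the documented fields and methods. Time is passed in as an optional `now` argument so the behavior is easy to test.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


class SessionExpiredError(Exception):
    """Local stand-in for the library's SessionExpiredError."""


@dataclass
class SessionSketch:
    """Hypothetical model of a subset of TenantSession's documented state."""

    symmetric_key: bytes
    ttl_seconds: float = 900.0  # mirrors establish_tenant_session's default ttl_seconds
    created_at: float = field(default_factory=time.time)
    next_sequence: int = 0

    @property
    def expires_at(self) -> float:
        return self.created_at + self.ttl_seconds

    def is_valid(self, now: Optional[float] = None) -> bool:
        current = time.time() if now is None else now
        return current < self.expires_at

    def check_valid(self, now: Optional[float] = None) -> None:
        if not self.is_valid(now):
            raise SessionExpiredError("session expired")

    def consume_sequence(self, now: Optional[float] = None) -> int:
        """Return the next sequence number, refusing to issue one after expiry."""
        self.check_valid(now)
        seq = self.next_sequence
        self.next_sequence += 1
        return seq

    def rotate_key(self, new_key: bytes) -> None:
        """Swap in a new key and reset the sequence counter, as the README describes."""
        self.symmetric_key = new_key
        self.next_sequence = 0


session = SessionSketch(symmetric_key=bytes(32), ttl_seconds=10, created_at=100.0)
first = session.consume_sequence(now=101.0)
second = session.consume_sequence(now=102.0)
session.rotate_key(bytes(32))
```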

### `KVCacheEntry` / `EncryptedEntry` / `EntryMetadata`

`KVCacheEntry` holds `metadata`, `key_tensor_bytes`, `value_tensor_bytes`. `EncryptedEntry` holds `metadata`, `nonce` (hex), `ciphertext` (hex), `key_len`, `sequence_number`. `EntryMetadata` is frozen and carries `tenant_id`, `session_id`, `layer_idx`, `position`, `token_id`, `kv_role`.
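
`EncryptedEntry` carries a single `ciphertext` plus `key_len`. That suggests the K and V tensor bytes are sealed together, with `key_len` marking the boundary between them. The sketch below assumes that reading. The concatenation layout and the `pack_kv` / `unpack_kv` helpers are hypothetical, not the library's documented format, and only the packing step around AES-GCM is shown.

```python
from typing import Tuple


def pack_kv(key_tensor_bytes: bytes, value_tensor_bytes: bytes) -> Tuple[bytes, int]:
    """Concatenate K||V into one plaintext and return it with the length of K."""
    return key_tensor_bytes + value_tensor_bytes, len(key_tensor_bytes)


def unpack_kv(plaintext: bytes, key_len: int) -> Tuple[bytes, bytes]:
    """Split a decrypted K||V plaintext back into its K and V parts."""
    if not 0 <= key_len <= len(plaintext):
        raise ValueError("key_len out of range for this plaintext")
    return plaintext[:key_len], plaintext[key_len:]


k, v = b"K" * 4, b"V" * 6
plaintext, key_len = pack_kv(k, v)
```

`key_len` is also bound into the AAD. Shifting the K/V boundary in a stored entry would therefore break the GCM tag rather than silently mis-split the tensors.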

### `CacheEncryptor(session)` / `CacheDecryptor(session)`

`encrypt_entry(KVCacheEntry) -> EncryptedEntry` and `decrypt_entry(EncryptedEntry) -> KVCacheEntry`. Both enforce tenant-id match. Decryptor tracks nonces for replay protection.

### `KeyRotationPolicy(max_entries=100_000, max_age_seconds=300)`

`should_rotate(session) -> (bool, RotationTrigger | None)` and `rotate(session) -> bytes` (new 32-byte key). `RotationTrigger` is `ENTRY_COUNT`, `TIME_ELAPSED`, or `MANUAL`.
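
The `should_rotate` contract (a flag plus an optional `RotationTrigger`) can be sketched as two threshold checks. To keep the sketch self-contained, it makes three assumptions: `RotationPolicySketch` is a stand-in class, the enum's string values are invented, and the method takes an entry count and key-creation time instead of a session object. The defaults mirror the documented `max_entries=100_000` and `max_age_seconds=300`.

```python
import time
from enum import Enum
from typing import Optional, Tuple


class RotationTrigger(Enum):
    ENTRY_COUNT = "entry_count"
    TIME_ELAPSED = "time_elapsed"
    MANUAL = "manual"


class RotationPolicySketch:
    """Hypothetical policy: rotate after max_entries encryptions or max_age_seconds."""

    def __init__(self, max_entries: int = 100_000, max_age_seconds: float = 300) -> None:
        self.max_entries = max_entries
        self.max_age_seconds = max_age_seconds

    def should_rotate(self, entries: int, key_created_at: float,
                      now: Optional[float] = None) -> Tuple[bool, Optional[RotationTrigger]]:
        current = time.time() if now is None else now
        if entries >= self.max_entries:
            return True, RotationTrigger.ENTRY_COUNT
        if current - key_created_at >= self.max_age_seconds:
            return True, RotationTrigger.TIME_ELAPSED
        return False, None


policy = RotationPolicySketch(max_entries=5, max_age_seconds=300)
fresh = policy.should_rotate(entries=4, key_created_at=0.0, now=10.0)
full = policy.should_rotate(entries=5, key_created_at=0.0, now=10.0)
stale = policy.should_rotate(entries=0, key_created_at=0.0, now=300.0)
```

Checking the entry count first is an arbitrary choice here. When both limits are exceeded, the sketch reports `ENTRY_COUNT`.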

### `TenantIsolationManager`

`create_session(tenant)`, `get_session(tenant_id)`, `encrypt(tenant_id, entry)`, `decrypt(tenant_id, enc)`, `close_session(tenant_id)`, `list_active_tenants()`.
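
The isolation guarantee comes down to ordering. The manager compares the caller's tenant id with the tenant id carried by the entry, and it raises before the tenant's key is ever handed to AES. The sketch below shows only that ordering. `IsolationSketch`, `EncEntrySketch`, and the injected `aes_decrypt` callable are hypothetical, the two exceptions are local stand-ins, and AES-256-GCM is stubbed out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class TenantIsolationError(Exception):
    """Local stand-in for the library's TenantIsolationError."""


class UnknownTenantError(Exception):
    """Local stand-in for the library's UnknownTenantError."""


@dataclass(frozen=True)
class EncEntrySketch:
    tenant_id: str  # the real EncryptedEntry keeps this inside its metadata
    ciphertext: bytes


class IsolationSketch:
    """Hypothetical manager showing only the check-before-decrypt ordering."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def create_session(self, tenant_id: str, key: bytes) -> None:
        self._keys[tenant_id] = key

    def decrypt(self, tenant_id: str, enc: EncEntrySketch,
                aes_decrypt: Callable[[bytes, bytes], bytes]) -> bytes:
        if tenant_id not in self._keys:
            raise UnknownTenantError(tenant_id)
        # Reject a misrouted entry before the key is used or AES is called.
        if enc.tenant_id != tenant_id:
            raise TenantIsolationError(
                f"{tenant_id!r} cannot decrypt an entry owned by {enc.tenant_id!r}")
        return aes_decrypt(self._keys[tenant_id], enc.ciphertext)


aes_calls: List[bytes] = []


def fake_aes_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Stub for AES-256-GCM decryption that only records that it was called."""
    aes_calls.append(ciphertext)
    return b"plaintext"


mgr = IsolationSketch()
mgr.create_session("tenant-alice", bytes(32))
mgr.create_session("tenant-bob", bytes(32))
alice_enc = EncEntrySketch(tenant_id="tenant-alice", ciphertext=b"ct")

try:
    mgr.decrypt("tenant-bob", alice_enc, fake_aes_decrypt)
    blocked = False
except TenantIsolationError:
    blocked = True
```

`tenant_id` is part of the `EntryMetadata` bound into the AAD. Relabeling an entry to get past this check would therefore make the GCM tag fail instead.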

### `KVAuditLog` / `KVAuditEntry`

`log_encrypt(...)`, `log_decrypt(...)`, `log_rotate(...)`, `log_isolation_violation(...)`, `entries(limit, tenant_id, operation)`, `export_json()`.
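
An append-only audit log can be sketched as a list that only ever grows, with read-side filtering and JSON export. `AuditLogSketch` and the `AuditRecord` fields are hypothetical. They mirror only the method names listed above, and only two of the four `log_*` methods are shown. "Append-only" here means the API offers no update or delete. The sketch makes no claim of tamper evidence.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class AuditRecord:
    operation: str  # "encrypt", "decrypt", "rotate", or "isolation-violation"
    tenant_id: str
    detail: str = ""
    timestamp: float = field(default_factory=time.time)


class AuditLogSketch:
    """Hypothetical append-only log: records can be added and read, never changed."""

    def __init__(self) -> None:
        self._records: List[AuditRecord] = []

    def _append(self, operation: str, tenant_id: str, detail: str) -> None:
        self._records.append(AuditRecord(operation, tenant_id, detail))

    def log_encrypt(self, tenant_id: str, detail: str = "") -> None:
        self._append("encrypt", tenant_id, detail)

    def log_isolation_violation(self, tenant_id: str, detail: str = "") -> None:
        self._append("isolation-violation", tenant_id, detail)

    # log_decrypt / log_rotate would follow the same pattern.

    def entries(self, limit: Optional[int] = None, tenant_id: Optional[str] = None,
                operation: Optional[str] = None) -> List[AuditRecord]:
        """Filter records; `limit` keeps the most recent matches (an assumption)."""
        matches = [r for r in self._records
                   if (tenant_id is None or r.tenant_id == tenant_id)
                   and (operation is None or r.operation == operation)]
        return matches if limit is None else matches[max(len(matches) - limit, 0):]

    def export_json(self) -> str:
        return json.dumps([asdict(r) for r in self._records], indent=2)


log = AuditLogSketch()
log.log_encrypt("tenant-alice", "layer=0 position=12")
log.log_isolation_violation("tenant-bob", "decrypt attempted on a tenant-alice entry")
violations = log.entries(operation="isolation-violation")
```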

### Errors

All under `KVCacheError`: `TenantIsolationError`, `SessionExpiredError`, `DecryptionError`, `NonceReplayError`, `KeyRotationRequiredError`, `UnknownTenantError`.
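
Because every error derives from `KVCacheError`, callers can handle specific failures first and fall back to the base class. The sketch redefines the hierarchy locally so it runs without the package. `run_guarded`, `fail_with`, and the handling actions are hypothetical examples of what an inference server might do.

```python
from typing import Callable


class KVCacheError(Exception):
    """Base class mirroring the documented hierarchy."""


class TenantIsolationError(KVCacheError): ...
class SessionExpiredError(KVCacheError): ...
class DecryptionError(KVCacheError): ...
class NonceReplayError(KVCacheError): ...
class KeyRotationRequiredError(KVCacheError): ...
class UnknownTenantError(KVCacheError): ...


def run_guarded(operation: Callable[[], object]) -> str:
    """Run a cache operation and map failures to a (hypothetical) handling action."""
    try:
        operation()
    except SessionExpiredError:
        return "re-establish session"
    except (TenantIsolationError, NonceReplayError, DecryptionError):
        return "drop entry and log security event"
    except KVCacheError:
        return "other cache error"
    return "ok"


def fail_with(exc: KVCacheError) -> Callable[[], None]:
    """Build an operation that raises `exc`, standing in for a failing call."""
    def op() -> None:
        raise exc
    return op
```

The order of the `except` clauses matters. `except KVCacheError` must come last, or it would swallow the more specific cases.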

## Why PQC Matters for the KV Cache

In regulated industries, inference logs and intermediate conversation state can fall under multi-year retention requirements:

- **Healthcare (HIPAA):** 6-year minimum retention on any PHI-bearing record, including the model context that reasoned over it.
- **Finance (SEC 17a-4, MiFID II):** 5-7 year retention on all communications with a client, including AI-assisted drafting.
- **Legal (privilege / e-discovery):** communications privilege only survives if the confidentiality chain is intact.

An adversary running harvest-now-decrypt-later against your classical TLS sessions today can just as easily capture the residual state of your inference servers. A PQC envelope around the KV cache helps keep that state confidential after a cryptographically relevant quantum computer arrives.

## Examples

- `examples/basic_kv_encryption.py` - single tenant, encrypt/decrypt 3 entries, inspect audit log.
- `examples/multi_tenant_isolation.py` - Alice and Bob co-resident, cross-tenant decrypt is rejected.
- `examples/key_rotation.py` - `KeyRotationPolicy` with `max_entries=5`, observe rotation mid-stream.

## License

Apache License 2.0 - see `LICENSE`.
