# PQC GPU Driver

![PQC Native](https://img.shields.io/badge/PQC-Native-blue)
![ML-KEM-768](https://img.shields.io/badge/ML--KEM--768-FIPS%20203-green)
![ML-DSA-65](https://img.shields.io/badge/ML--DSA--65-FIPS%20204-green)
![AES-256-GCM](https://img.shields.io/badge/AES--256--GCM-NIST%20SP%20800--38D-teal)
![License](https://img.shields.io/badge/License-Apache%202.0-orange)
![Version](https://img.shields.io/badge/version-0.1.0-lightgrey)

**Post-quantum confidential computing for the PCIe bus.** Modern AI inference ships multi-gigabyte model weights and activations between CPU and GPU over PCIe. That bus is visible to the host OS, the hypervisor, and the kernel's GPU driver. A malicious host or curious operator can trivially snoop or rewrite the bytes in flight — and even the NVIDIA Confidential Computing story on Hopper (H100/H200) relies on classical crypto that a future CRQC (cryptographically relevant quantum computer) can retroactively forge. This library is the **post-quantum cryptographic envelope** — ML-KEM-768 channel keys, AES-256-GCM per-transfer, ML-DSA driver attestation — that a confidential-inference framework plugs its real CUDA / ROCm / vendor primitives into.

## The Problem

GPU driver stacks (the NVIDIA kernel module, AMD `amdgpu`, the vendor's userland runtime) move tensor bytes between CPU RAM and device memory. In the default configuration:

- **Plaintext traverses PCIe.** A hypervisor with DMA introspection, a VFIO passthrough attacker, or a kernel-module rootkit can read every byte.
- **Driver modules are loaded under classical signatures.** NVIDIA's kernel-module signing (inspectable via `modinfo`) and secure-boot chains rely on RSA / ECDSA, which Shor's algorithm breaks.
- **Key establishment for confidential computing uses ECDH.** A recorded session today can be passively decrypted once a CRQC exists.

A confidential AI workload needs more than "the GPU is in a TEE": it needs a post-quantum envelope around every tensor hitting the bus, and a cryptographic check on every driver that touches those tensors.

## The Solution

- **ML-KEM-768** establishes a fresh 32-byte channel key between CPU and GPU at module-load time. In production this is a full ML-KEM encapsulate/decapsulate exchange between the two sides; the library delegates to [`quantumshield`](https://github.com/dyber-pqc/quantumshield) for the keypair.
- **AES-256-GCM** encrypts every tensor transfer. The authentication tag binds the ciphertext to `TensorMetadata + sequence_number` via AAD, so tampering with either the bytes or the metadata surfaces as a `DecryptionError`.
- **ML-DSA-65** signs every GPU driver / kernel module before load. The verifier checks both the signature and an allow-list of trusted signer DIDs.
- **`ChannelSession`** enforces strictly monotonic sequence numbers and tracks recent nonces, rejecting replays.
- **Pluggable backends** — the library never touches real device memory. Backends do, using `cuMemcpy` / `cuMemAlloc` / `CUDA-IPC` or `hipMemcpy` / `hipMalloc` / `HIP-IPC`. An `InMemoryBackend` ships for tests.
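
The replay rules from the `ChannelSession` bullet can be sketched in a few lines of plain Python. This is an illustrative stand-in, not the library's implementation: `ReplayGuard` and its field names are hypothetical, and the real session also handles AES-GCM, AAD binding, and key TTL.

```python
# Illustrative sketch only: a minimal replay guard in the spirit of
# ChannelSession. Strictly monotonic sequence numbers plus a bounded
# cache of recently seen nonces.
from collections import deque


class ReplayGuard:
    """Reject out-of-order sequence numbers and recently seen nonces."""

    def __init__(self, nonce_window: int = 1024):
        self.last_recv_seq = -1
        self._recent_nonces = deque(maxlen=nonce_window)

    def check(self, seq: int, nonce: bytes) -> None:
        if seq <= self.last_recv_seq:
            raise ValueError(f"replayed or out-of-order sequence {seq}")
        if nonce in self._recent_nonces:
            raise ValueError("nonce reuse detected")
        # Only advance state after both checks pass.
        self.last_recv_seq = seq
        self._recent_nonces.append(nonce)


guard = ReplayGuard()
guard.check(0, b"n0")
guard.check(1, b"n1")
# guard.check(1, b"n2") would raise ValueError: the sequence number repeats.
# guard.check(2, b"n0") would raise ValueError: the nonce was seen before.
```

The bounded `deque` keeps memory flat on long-lived channels at the cost of forgetting very old nonces, which the monotonic sequence check covers anyway.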

## Installation

```bash
pip install pqc-gpu-driver
```

Development:

```bash
pip install -e ".[dev]"
```

## Quick Start

```python
from pqc_gpu_driver import (
    InMemoryBackend,
    TensorMetadata,
    establish_channel,
)

# 1. Bring up an encrypted CPU<->GPU channel (ML-KEM-768 -> AES-256-GCM key).
cpu, gpu = establish_channel(cpu_side_label="inference-host",
                             gpu_side_label="h100-0")

# 2. Encrypt a tensor on the CPU side.
tensor_bytes = b"\x01" * 4096
meta = TensorMetadata(
    tensor_id="layer_0.q_proj",
    name="model.layers.0.self_attn.q_proj.weight",
    dtype="float32",
    shape=(1024,),
    size_bytes=len(tensor_bytes),
)
enc = cpu.encrypt_tensor(tensor_bytes, meta)

# 3. Move the ciphertext through a GPU backend. Backends only carry bytes,
#    they never see plaintext.
backend = InMemoryBackend()
handle = backend.upload(enc)
pulled = backend.download(handle)

# 4. Decrypt on the GPU side. AES-GCM verifies AAD + ciphertext + sequence.
plaintext = gpu.decrypt_tensor(pulled)
assert plaintext == tensor_bytes
```

Driver attestation before a kernel module is allowed to load:

```python
from quantumshield.identity.agent import AgentIdentity
from pqc_gpu_driver import (
    DriverAttestationVerifier,
    DriverAttester,
    DriverModule,
)

driver_bytes = open("/lib/modules/.../nvidia.ko", "rb").read()
module = DriverModule(
    name="nvidia.ko",
    version="550.54.14",
    module_hash=DriverModule.hash_module_bytes(driver_bytes),
    module_size=len(driver_bytes),
)

vendor = AgentIdentity.create("nvidia-driver-signer")
attestation = DriverAttester(vendor).attest(module)

verifier = DriverAttestationVerifier(trusted_signers={vendor.did})
verifier.verify_or_raise(attestation, actual_module_bytes=driver_bytes)
```

## Architecture

```
+------------------------------+                +------------------------------+
| CPU (inference host)         |                | GPU (confidential device)    |
|                              |                |                              |
| ChannelSession (cpu)         |   ML-KEM-768   | ChannelSession (gpu)         |
|   symmetric_key <------------+ handshake/KDF  +------------> symmetric_key   |
|   next_send_seq              |                |   last_recv_seq              |
|        |                     |                |   _used_nonces_recent        |
|        v                     |                |        ^                     |
| AES-256-GCM encrypt          |                | AES-256-GCM decrypt +        |
|   + AAD(metadata || seq)     |                |   AAD check + replay check   |
|        |                     |                |        ^                     |
|        v                     |                |        |                     |
| EncryptedTensor              |                | EncryptedTensor              |
|        |                     |                |        ^                     |
|        v                     |                |        |                     |
| GPUBackend.upload()          |    PCIe bus    | GPUBackend.download()        |
|        |                     |  (ciphertext   |        ^                     |
|        +---------------------+---> only) -----+--------+                     |
+------------------------------+                +------------------------------+

DriverAttester --(ML-DSA-65 sign)--> DriverAttestation
                                            |
                                            v
                              DriverAttestationVerifier
                                - module hash check
                                - ML-DSA signature check
                                - trusted-signers allow-list
```

## Cryptography

| Primitive                  | Purpose                                      | Algorithm   |
| -------------------------- | -------------------------------------------- | ----------- |
| CPU/GPU channel key        | Establish fresh 32-byte symmetric key        | ML-KEM-768  |
| Per-tensor encryption      | Confidentiality + integrity of tensor bytes  | AES-256-GCM |
| AAD over metadata          | Bind `TensorMetadata` + sequence number      | AES-GCM tag |
| Driver attestation         | Bind driver bytes to a trusted signer DID    | ML-DSA-65   |
| Content / canonical digest | Module hash and attestation canonical digest | SHA3-256    |

Signing and KEM keys are delegated to [`quantumshield`](https://github.com/dyber-pqc/quantumshield), which prefers real `liboqs` ML-KEM / ML-DSA when available and falls back to a transitional implementation otherwise.
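
The SHA3-256 row above can be illustrated with the standard library. The exact canonical form that `DriverModule.hash_module_bytes` digests is defined by the library, not here; this sketch assumes a plain SHA3-256 over the raw module bytes.

```python
# Sketch of the SHA3-256 content digest (assumption: a plain SHA3-256
# over the module bytes; the library defines the real canonical form).
import hashlib


def module_digest(module_bytes: bytes) -> str:
    """Hex SHA3-256 digest of a driver module's raw bytes."""
    return hashlib.sha3_256(module_bytes).hexdigest()


digest = module_digest(b"\x7fELF" + b"\x00" * 60)
assert len(digest) == 64  # 256 bits as hex
# Flipping a single byte anywhere in the module yields an unrelated digest,
# which is what the verifier's module-hash check relies on.
```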

## Threat Model

| Adversary capability                                         | Coverage                                                                   |
| ------------------------------------------------------------ | -------------------------------------------------------------------------- |
| Sniffs PCIe DMA to read model weights in transit             | Blocked — every transfer is AES-256-GCM ciphertext.                        |
| Rewrites bytes in a DMA buffer mid-transfer                  | Detected — GCM tag binds ciphertext; decrypt raises `DecryptionError`.     |
| Tampers with tensor metadata while preserving ciphertext     | Detected — metadata is in AAD, decrypt fails.                              |
| Swaps in a backdoored `nvidia.ko` / `amdgpu.ko` at load time | Blocked — driver hash must match the ML-DSA-signed `DriverAttestation`.    |
| Ships a driver signed by an untrusted key                    | Blocked — verifier's `trusted_signers` allow-list filters signer DIDs.     |
| Replays an old `EncryptedTensor` to corrupt state            | Blocked — strictly-monotonic sequence number + nonce cache.                |
| Reuses an expired channel key                                | Blocked — `ChannelSession.is_valid()` enforces TTL.                        |
| Records traffic today, decrypts in 2035 with a CRQC          | Blocked — ML-KEM + ML-DSA are post-quantum.                                |
| Compromises the CPU-side host OS fully                       | Out of scope — mitigate with SEV-SNP / TDX around the inference workload.  |
| Extracts the session key from GPU device RAM                 | Out of scope — mitigate with H100/H200 Confidential Computing enclaves.    |

## Backend Integration Guide

The library defines a minimal interface:

```python
class GPUBackend(ABC):
    name: str
    device_type: str

    def upload(self, tensor: EncryptedTensor) -> str: ...
    def download(self, device_handle: str) -> EncryptedTensor: ...
    def free(self, device_handle: str) -> None: ...
    def device_info(self) -> dict: ...
```

Backends move opaque ciphertext. They never call `decrypt_tensor()` — that is the session's job, run inside the trusted compute boundary.

### NVIDIA CUDA (`CUDABackend`)

The shipped `CUDABackend` is a stub. A real integration:

1. Opens a CUDA context via `cuInit` / `cuCtxCreate` for the target device (H100 / H200 with Confidential Computing enabled).
2. On `upload`: `cuMemAlloc` a device buffer sized for the ciphertext, then `cuMemcpyHtoD` the hex-decoded ciphertext bytes. Register a `CUDA-IPC` handle if cross-process.
3. On `download`: `cuMemcpyDtoH` the buffer back to pinned host memory.
4. On `free`: `cuMemFree` plus drop the IPC handle.
5. Keep bytes encrypted at rest on device; plaintext lives only inside the confidential-computing enclave.

### AMD ROCm (`ROCmBackend`)

Mirror the CUDA flow with `hipInit` / `hipMalloc` / `hipMemcpy` / `hipFree` and `HIP-IPC`.

### Your own backend

Subclass `GPUBackend`, implement the four methods, and pass an instance through the upload / download path. The session handles encryption, AAD binding, and replay protection uniformly.
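
The shape of such a backend can be sketched without the library. This stand-in is hypothetical: it skips the real `GPUBackend` ABC and substitutes raw ciphertext bytes for `EncryptedTensor`, but it shows the contract the four methods must satisfy — handles are opaque, and the backend never inspects or decrypts what it stores.

```python
# Hypothetical dict-backed backend in the shape of the GPUBackend interface.
# Stand-ins: no ABC import, and the "tensor" is opaque ciphertext bytes.
import uuid


class DictBackend:
    name = "dict"
    device_type = "cpu-simulated"

    def __init__(self) -> None:
        self._buffers: dict[str, bytes] = {}

    def upload(self, ciphertext: bytes) -> str:
        handle = uuid.uuid4().hex            # opaque device handle
        self._buffers[handle] = ciphertext   # stored, never decrypted
        return handle

    def download(self, device_handle: str) -> bytes:
        return self._buffers[device_handle]

    def free(self, device_handle: str) -> None:
        self._buffers.pop(device_handle, None)

    def device_info(self) -> dict:
        return {"name": self.name,
                "device_type": self.device_type,
                "buffers": len(self._buffers)}


backend = DictBackend()
h = backend.upload(b"\xde\xad\xbe\xef")
assert backend.download(h) == b"\xde\xad\xbe\xef"
backend.free(h)
```

A real backend would replace the dict with device allocations (`cuMemAlloc` / `hipMalloc`) while keeping exactly this handle-in, ciphertext-out contract.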

## API Reference

### Data types

| Class                | Description                                                  |
| -------------------- | ------------------------------------------------------------ |
| `TensorMetadata`     | Non-secret tensor descriptor used as AAD.                    |
| `EncryptedTensor`    | Metadata + nonce + AES-GCM ciphertext + sequence number.     |
| `DriverModule`       | Driver binary summary: `(name, version, module_hash, size)`. |
| `DriverAttestation`  | ML-DSA-signed claim about a `DriverModule`.                  |
| `VerificationResult` | Pass/fail breakdown with `error` detail.                     |

### Channel

| Symbol                          | Purpose                                                  |
| ------------------------------- | -------------------------------------------------------- |
| `establish_channel(...)`        | Produce a matched `(cpu_session, gpu_session)` pair.     |
| `ChannelSession.encrypt_tensor` | Encrypt tensor bytes + bind metadata via AAD.            |
| `ChannelSession.decrypt_tensor` | Decrypt, verify AAD, enforce monotonic sequence + nonce. |
| `ChannelSession.is_valid`       | Check TTL has not elapsed.                               |

### Driver attestation

| Symbol                                      | Purpose                                       |
| ------------------------------------------- | --------------------------------------------- |
| `DriverAttester.attest`                     | Produce an ML-DSA-signed `DriverAttestation`. |
| `DriverAttestationVerifier.verify`          | Return a `VerificationResult`.                |
| `DriverAttestationVerifier.verify_or_raise` | Raise `DriverAttestationError` on failure.    |

### Backends

| Class             | Use                                                     |
| ----------------- | ------------------------------------------------------- |
| `InMemoryBackend` | Reference / tests / tutorials.                          |
| `CUDABackend`     | Stub for NVIDIA CUDA (plug into `cuMemcpy` / CUDA-IPC). |
| `ROCmBackend`     | Stub for AMD ROCm (plug into `hipMemcpy` / HIP-IPC).    |

### Exceptions

`GPUDriverError` -> `ChannelEstablishmentError`, `ChannelExpiredError`, `NonceReplayError`, `DecryptionError`, `DriverAttestationError`, `BackendError`.

## Why PQC Matters for GPU Drivers

Confidential GPU computing on H100 and H200 today relies on classical primitives: RSA / ECDSA driver signatures, ECDH for the session key between CPU and GPU enclaves, SHA-256 content hashes. Each of those is a direct target for a CRQC running Shor's algorithm. A session key negotiated in 2026 can be recovered from a recorded PCIe trace in 2035, and a driver signed today with a 256-bit EC key is forgeable by any adversary holding the public key once Shor arrives. "Model weights in flight" is exactly the kind of secret whose confidentiality must survive the cryptographic transition — you cannot ship a 70B parameter model through an EC-protected channel and expect that secret to stay safe for the decade-plus lifetime of the deployment. This library is the PQC layer that belongs above the vendor's confidential-computing story, not below it: ML-KEM for key agreement, ML-DSA for driver integrity, AES-256 for the bulk data.

## Examples

* [`examples/basic_channel.py`](examples/basic_channel.py) — establish an ML-KEM channel and round-trip a tensor.
* [`examples/driver_attestation.py`](examples/driver_attestation.py) — sign a fake `nvidia.ko`, verify via allow-list, show the untrusted-signer reject path.
* [`examples/tensor_tamper_detection.py`](examples/tensor_tamper_detection.py) — flip a byte of ciphertext and watch AES-GCM detect it.

## Development

```bash
pip install -e ".[dev]"
pytest
ruff check src/ tests/ examples/
```

## License

Apache 2.0. See [LICENSE](LICENSE).