# PQC Immutable AI Audit Log (Filesystem)

![PQC Native](https://img.shields.io/badge/PQC-Native-blue)
![ML-DSA-65](https://img.shields.io/badge/ML--DSA--65-FIPS%20204-green)
![SHA3-256](https://img.shields.io/badge/Merkle-SHA3--256-green)
![EU AI Act Ready](https://img.shields.io/badge/EU%20AI%20Act-Audit%20Trail-purple)
![License](https://img.shields.io/badge/License-Apache%202.0-orange)
![Version](https://img.shields.io/badge/version-0.1.0-lightgrey)

**Tamper-evident audit log for AI inference events, designed for legal discovery.** When an AI system denies a loan, flags a medical claim, moderates content, or calls a tool, that decision needs to survive 15+ years of potential litigation. This library writes each inference to an append-only segmented file, builds a SHA3-256 Merkle tree per segment, signs the segment header with **ML-DSA-65** (FIPS 204), and chains every segment to the previous one by root hash. A single flipped bit anywhere in a sealed segment is detected on verification.

## The Problem

Regulators and plaintiffs' lawyers are converging on the same demand: *show us the inference log*. Article 12 of the EU AI Act requires high-risk AI systems to technically allow the automatic recording of events (logs) over the lifetime of the system. US class-action litigation against AI lenders, insurers, and content platforms routinely subpoenas inference histories. Existing solutions fall short:

- **Application-DB logs** are mutable: a DBA or a compromised service can edit them without a trace.
- **Cloud log services** are opaque to the model operator; if the provider loses them, you have no recourse.
- **RSA/ECDSA-signed** archives become forgeable once a cryptographically relevant quantum computer exists, yet signatures made today must still be trustworthy in 2040.

## The Solution

A pure-Python library with an append-only on-disk layout:

- Each `InferenceEvent` stores SHA3-256 hashes of the input and output (not the raw content, which preserves privacy) plus the model DID, actor DID, decision label, and timestamp.
- `LogAppender` writes events as JSON Lines into `segment-NNNNN.log`. When the rotation policy fires (on event count, bytes, or age), it builds a `SegmentHeader` containing the Merkle root over every leaf hash plus a `previous_segment_root` that chains to the prior segment. The header is signed with ML-DSA and written to `segment-NNNNN.sig.json`.
- `LogReader.verify_chain()` walks every segment, recomputes each Merkle root, verifies each ML-DSA signature, and confirms every chain link. A single mutation anywhere in a sealed segment fails verification.
- `InclusionProver` produces `O(log n)` proofs that a specific event was in a specific segment, which is useful when you must surrender a single decision to a court without leaking the surrounding log.

## Installation

```bash
pip install pqc-audit-log-fs
```

Development:

```bash
pip install -e ".[dev]"
```

## Quick Start

```python
from quantumshield.identity.agent import AgentIdentity
from pqc_audit_log_fs import (
    InferenceEvent, LogAppender, LogReader, RotationPolicy,
)

signer = AgentIdentity.create(name="audit-signer")

# `your_decisions` stands in for your own iterable of decision records.
with LogAppender(
    "./audit-log",
    signer,
    rotation=RotationPolicy(max_events_per_segment=10_000),
) as appender:
    for decision in your_decisions:
        appender.append(InferenceEvent.create(
            model_did="did:pqaid:credit-model-v3",
            model_version="3.2.1",
            input_bytes=decision.input_blob,
            output_bytes=decision.output_blob,
            decision_type="classification",
            decision_label=decision.label,  # 'approve' | 'deny'
            actor_did=decision.user_did,
            session_id=decision.session_id,
        ))

# Leaving the `with` block calls close(), which seals the open segment.
reader = LogReader("./audit-log")
ok, errors = reader.verify_chain()
assert ok, errors
```

## Architecture

```
Inference Service                            Verifier / Court / Auditor
-----------------                            --------------------------

event -> LogAppender.append()
              |
              | jsonl line
              v
     segment-NNNNN.log (append-only)
              |
              | rotation trigger
              v
     [merkle root over leaf hashes]
              |
              | ML-DSA sign
              v
     segment-NNNNN.sig.json  <------------>  LogReader.verify_chain()
              |                              |
              | previous_segment_root        | recompute roots
              v                              | verify ML-DSA sigs
     segment-(N+1).log -> .sig.json          | check chain links
              |                              v
              |                         ok / errors
              v
     (optional) MerkleAnchor
         -> external transparency log
```
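The verifier column ends in a chain-link check. A minimal sketch of just that step follows, assuming each `segment-NNNNN.sig.json` has already been parsed into a plain dict with the field names shown under Segment File Layout, and that the genesis segment carries an empty `previous_segment_root` (as in that example). `check_chain_links` is a hypothetical helper, not part of the library's API; Merkle recomputation and ML-DSA verification are deliberately left out.

```python
def check_chain_links(headers: list[dict]) -> list[str]:
    """Check segment numbering and previous_segment_root links only.

    Merkle-root recomputation and ML-DSA verification are separate
    steps and are not shown here.
    """
    errors: list[str] = []
    ordered = sorted(headers, key=lambda h: h["segment_number"])
    for i, h in enumerate(ordered):
        n = h["segment_number"]
        # Segments must be numbered 1, 2, 3, ... with no gaps.
        expected_n = 1 if i == 0 else ordered[i - 1]["segment_number"] + 1
        if n != expected_n:
            errors.append(f"segment {n}: expected segment number {expected_n}")
        # Genesis links to "", every later segment to its predecessor's root.
        expected_prev = "" if i == 0 else ordered[i - 1]["merkle_root"]
        if h["previous_segment_root"] != expected_prev:
            errors.append(f"segment {n}: previous_segment_root does not match")
    return errors


headers = [
    {"segment_number": 1, "merkle_root": "a1b2", "previous_segment_root": ""},
    {"segment_number": 2, "merkle_root": "b2c3", "previous_segment_root": "a1b2"},
    {"segment_number": 3, "merkle_root": "c3d4", "previous_segment_root": "b2c3"},
]
print(check_chain_links(headers))  # []

# Swap segment 2 for a forged one with a different root:
forged = [headers[0], dict(headers[1], merkle_root="ffff"), headers[2]]
print(check_chain_links(forged))  # ['segment 3: previous_segment_root does not match']
```

The break surfaces at the segment *after* the forgery, which is why an attacker who rewrites one segment must rewrite every later header too.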

## Cryptography

| Primitive                   | Algorithm      | Purpose                                   |
|-----------------------------|----------------|-------------------------------------------|
| Leaf hash                   | SHA3-256       | `SHA3-256(0x00 ‖ canonical(event))`       |
| Internal Merkle node        | SHA3-256       | `SHA3-256(0x01 ‖ left ‖ right)`           |
| Segment signature           | ML-DSA-65      | over `SHA3-256(canonical(header))`        |
| Cross-segment chaining      | SHA3-256       | `header.previous_segment_root`            |

Leaves and internal nodes use distinct domain-separation prefixes (`0x00` for leaves, `0x01` for internal nodes), so an internal node can never be passed off as a leaf; this blocks second-preimage attacks on the tree. Segments chain like a blockchain: rewriting segment `k` invalidates its signed header, so an attacker must forge the ML-DSA signature on segment `k`, rewrite every subsequent `previous_segment_root`, and forge every subsequent signature as well.
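To make the hashing rules in the table concrete, here is a minimal stdlib-only sketch of a domain-separated SHA3-256 Merkle root. It assumes "canonical" means sorted-key, compact JSON in UTF-8, and that odd leaf counts are handled RFC 6962-style (split at the largest power of two below `n`, no leaf duplication). Both are assumptions: the library's actual canonicalization and tree shape may differ, so treat this as an illustration rather than a compatible reimplementation.

```python
import hashlib
import json


def canonical(event: dict) -> bytes:
    # Assumption: canonical form = sorted keys, compact separators, UTF-8.
    return json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")


def leaf_hash(event: dict) -> bytes:
    # Leaf: SHA3-256(0x00 || canonical(event))
    return hashlib.sha3_256(b"\x00" + canonical(event)).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Internal node: SHA3-256(0x01 || left || right)
    return hashlib.sha3_256(b"\x01" + left + right).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    # Assumption: RFC 6962-style split at the largest power of two < n.
    n = len(leaves)
    if n == 0:
        raise ValueError("a sealed segment must contain at least one event")
    if n == 1:
        return leaves[0]
    k = 1
    while k * 2 < n:
        k *= 2
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


events = [{"event_id": f"urn:pqc-audit-evt:{i:02x}", "decision_label": "approve"} for i in range(5)]
root = merkle_root([leaf_hash(e) for e in events])
print(root.hex())
```

Changing any field of any event changes its leaf hash and therefore the root, which is exactly the mismatch `verify_segment` reports.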

## Segment File Layout

```
audit-log/
  segment-00001.log        JSON-Lines, one InferenceEvent per line
  segment-00001.sig.json   signed SegmentHeader
  segment-00002.log
  segment-00002.sig.json
  ...
```

A `segment-NNNNN.sig.json` looks like:

```json
{
  "segment_id": "segment-00001",
  "segment_number": 1,
  "created_at": "2026-04-20T12:00:00+00:00",
  "sealed_at": "2026-04-20T13:00:00+00:00",
  "event_count": 10000,
  "merkle_root": "a1b2c3...",
  "previous_segment_root": "",
  "log_id": "urn:pqc-audit-log:...",
  "signer_did": "did:pqaid:...",
  "algorithm": "ML-DSA-65",
  "signature": "ff...",
  "public_key": "aa..."
}
```
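The Cryptography table says the signature covers `SHA3-256(canonical(header))`. A sketch of computing that digest from a header like the one above follows. Which fields are excluded is an assumption (here `signature` and `public_key`, since they can only be attached after signing), as is the canonical JSON form; `header_digest` is a hypothetical helper, and the header values are placeholders. ML-DSA-65 signing itself needs a PQC library and is not shown.

```python
import hashlib
import json

# Assumption: these fields are attached after signing and therefore
# excluded from the signed digest; the real library may choose differently.
UNSIGNED_FIELDS = {"signature", "public_key"}


def header_digest(header: dict) -> bytes:
    """Return SHA3-256(canonical(header)), the message handed to ML-DSA-65."""
    body = {k: v for k, v in header.items() if k not in UNSIGNED_FIELDS}
    # Assumption: canonical = sorted keys, compact separators, UTF-8.
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha3_256(encoded).digest()


# Placeholder values shaped like segment-00001.sig.json.
header = {
    "segment_id": "segment-00001",
    "segment_number": 1,
    "event_count": 10000,
    "merkle_root": "a1b2c3",
    "previous_segment_root": "",
    "algorithm": "ML-DSA-65",
    "signature": "ff",
    "public_key": "aa",
}
digest = header_digest(header)
# Next step (not shown): sign `digest` with the ML-DSA-65 private key.
```

Because every signed field feeds the digest, editing `event_count` or `merkle_root` after sealing invalidates the stored signature.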

## Rotation Policy

`RotationPolicy` triggers a seal when **any** threshold is crossed:

| Field                       | Default      | Meaning                                   |
|-----------------------------|--------------|-------------------------------------------|
| `max_events_per_segment`    | 10,000       | Seal after N events                       |
| `max_bytes_per_segment`     | 10 MB        | Seal after N bytes of JSONL               |
| `max_segment_age_seconds`   | 3600 (1h)    | Seal after time elapsed                   |
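The "any threshold" rule reduces to a three-way OR. The sketch below mirrors the field names and defaults from the table, but `RotationThresholds` and its `should_seal` method are hypothetical stand-ins, not the library's `RotationPolicy` API. It also assumes "crossed" means "reached" (`>=`) and reads "10 MB" as 10 MiB.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RotationThresholds:
    """Hypothetical stand-in for RotationPolicy; defaults copied from the table."""

    max_events_per_segment: int = 10_000
    max_bytes_per_segment: int = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB
    max_segment_age_seconds: float = 3600.0

    def should_seal(self, event_count: int, byte_count: int,
                    opened_at: float, now: Optional[float] = None) -> bool:
        """Return True as soon as ANY threshold is reached (assumed >=)."""
        if now is None:
            now = time.time()
        return (
            event_count >= self.max_events_per_segment
            or byte_count >= self.max_bytes_per_segment
            or now - opened_at >= self.max_segment_age_seconds
        )


policy = RotationThresholds(max_events_per_segment=10)
# 9 events, 2 KiB, 60 s old: no threshold reached yet.
print(policy.should_seal(9, 2048, opened_at=0.0, now=60.0))   # False
# The 10th event reaches the event threshold.
print(policy.should_seal(10, 2048, opened_at=0.0, now=60.0))  # True
```

Passing `now` explicitly keeps the check deterministic in tests; in production it falls back to the wall clock.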

## Threat Model

| Attack                                      | How we detect it                                                          |
|---------------------------------------------|---------------------------------------------------------------------------|
| Flip a byte in a sealed `.log` file         | Merkle root mismatch in `verify_segment`                                  |
| Delete an event line from a sealed segment  | Merkle root mismatch                                                      |
| Swap a whole segment for a forged one       | Chain break: `previous_segment_root` of next segment mismatches           |
| Forge a signature today                     | ML-DSA-65: no known practical classical or quantum attack                 |
| Re-sign after tamper using the signer's key | Requires private-key exfiltration; out of scope                           |
| Delete trailing segments                    | Detectable only if segment roots are anchored externally (`MerkleAnchor`) |

The log is designed to resist quantum attacks: ML-DSA-65 targets NIST security category 3, meaning it should be at least as hard to break as exhaustive key search on AES-192, against both classical and quantum adversaries. Signatures made today should therefore remain unforgeable, and thus trustworthy, after cryptographically relevant quantum computers arrive.

## EU AI Act Mapping

| Requirement (Article 12, "Record-keeping")        | This library                                |
|---------------------------------------------------|---------------------------------------------|
| Automatic generation of logs                      | `LogAppender.append()`                      |
| Logs appropriate to intended purpose              | `InferenceEvent.metadata` is free-form      |
| Logging over the lifetime of the system           | Append-only segments; no size cap           |
| Logs traceable to a specific system version       | `model_did` + `model_version` per event     |
| Logs enabling post-market monitoring              | `LogReader.verify_chain()` for spot audits  |
| Integrity protection                              | SHA3-256 + ML-DSA-65 + cross-segment chain  |

Combine with `MerkleAnchor` to publish segment roots to a public transparency log (blockchain, Rekor, etc.) for externally anchored non-repudiation.

## CLI Reference

```
pqc-audit verify <log_dir>
pqc-audit prove  <log_dir> <segment_number> <event_id>
pqc-audit info   <log_dir>
```

Example:

```
$ pqc-audit info ./audit-log
log_dir: ./audit-log
segments: 3
  segment 00001  events=10000  root=a1b2c3d4e5f6a7b8...  prev=<genesis>        sealed_at=2026-04-20T13:00:00+00:00
  segment 00002  events=10000  root=b2c3d4e5f6a7b8c9...  prev=a1b2c3d4e5f6...  sealed_at=2026-04-20T14:00:00+00:00
  segment 00003  events= 4231  root=c3d4e5f6a7b8c9d0...  prev=b2c3d4e5f6a7...  sealed_at=2026-04-20T15:00:00+00:00

$ pqc-audit verify ./audit-log
[OK] all 3 segments verify
```

## API Reference

### `InferenceEvent`

| Field                  | Type       | Description                                   |
|------------------------|------------|-----------------------------------------------|
| `event_id`             | str        | `urn:pqc-audit-evt:<hex>`                     |
| `timestamp`            | str (ISO)  | UTC wall-clock                                |
| `model_did`            | str        | `did:pqaid:...` identifying the model         |
| `model_version`        | str        | Semver or hash of model binary                |
| `input_hash`           | str        | SHA3-256 hex of canonical input               |
| `output_hash`          | str        | SHA3-256 hex of canonical output              |
| `reasoning_chain_hash` | str        | SHA3-256 hex over chain-of-thought            |
| `decision_type`        | str        | e.g. `classification`, `generation`           |
| `decision_label`       | str        | e.g. `approve`, `deny`                        |
| `actor_did`            | str        | DID of the user/agent that invoked the model  |
| `session_id`           | str        | Free-form session identifier                  |
| `metadata`             | dict       | Free-form metadata                            |
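The key privacy property of this schema is that only digests of the input and output are stored. A sketch of building such a record and serializing it as one JSONL line follows. `make_event` and `to_jsonl_line` are hypothetical helpers, not `InferenceEvent.create`; the sketch assumes the caller passes already-canonical input/output bytes, uses a random 128-bit hex suffix for `event_id` (the library's ID scheme may differ), and omits `reasoning_chain_hash`. The DIDs and payloads are placeholders.

```python
import hashlib
import json
import secrets
from datetime import datetime, timezone


def make_event(model_did: str, model_version: str, input_bytes: bytes,
               output_bytes: bytes, decision_type: str, decision_label: str,
               actor_did: str, session_id: str) -> dict:
    """Hypothetical helper: build an event that stores hashes, never raw content."""
    return {
        # Assumption: random 128-bit hex suffix; the real ID scheme may differ.
        "event_id": f"urn:pqc-audit-evt:{secrets.token_hex(16)}",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_did": model_did,
        "model_version": model_version,
        "input_hash": hashlib.sha3_256(input_bytes).hexdigest(),
        "output_hash": hashlib.sha3_256(output_bytes).hexdigest(),
        "decision_type": decision_type,
        "decision_label": decision_label,
        "actor_did": actor_did,
        "session_id": session_id,
        "metadata": {},
    }


def to_jsonl_line(event: dict) -> str:
    # One event per line; sorted keys keep the serialization stable.
    return json.dumps(event, sort_keys=True, separators=(",", ":")) + "\n"


evt = make_event("did:pqaid:credit-model-v3", "3.2.1",
                 b'{"income": 52000}', b'{"label": "deny"}',
                 "classification", "deny", "did:pqaid:user-42", "sess-1")
line = to_jsonl_line(evt)
```

The raw payloads never reach the log; to prove what the model saw, you later present the original bytes and show that they hash to `input_hash`.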

### `LogAppender`

- `append(event)`: append one event; may trigger a seal.
- `seal_current_segment()`: force-seal now.
- `close()`: seal and flush; also invoked by `__exit__`.

### `LogReader`

- `list_segments() -> list[int]`
- `read_header(n) -> SegmentHeader`
- `read_segment(n) -> AuditSegment`
- `verify_segment(n) -> bool`
- `verify_chain() -> (ok, errors)`

### `InclusionProver`

- `prove_event(segment_number, event_id) -> InclusionProof`
- `verify_proof(event, proof) -> bool`
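An `O(log n)` inclusion proof is the list of sibling hashes on the path from one leaf to the segment root. The sketch below shows that idea with the `0x00`/`0x01` prefixes from the Cryptography table, assuming an RFC 6962-style tree shape. The `(side, hash)` tuple format and every function name are hypothetical; this is not the library's `InclusionProof` encoding. It recomputes subtree roots naively, which is fine for illustration but not for large segments.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha3_256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha3_256(b"\x01" + left + right).digest()


def split_point(n: int) -> int:
    # Largest power of two strictly below n (assumed RFC 6962-style shape).
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def root_of(leaves: list[bytes]) -> bytes:
    if len(leaves) == 1:
        return leaves[0]
    k = split_point(len(leaves))
    return node_hash(root_of(leaves[:k]), root_of(leaves[k:]))


def audit_path(leaves: list[bytes], index: int) -> list[tuple[str, bytes]]:
    """Sibling hashes from the leaf up to the root; 'R'/'L' = sibling's side."""
    if len(leaves) == 1:
        return []
    k = split_point(len(leaves))
    if index < k:
        return audit_path(leaves[:k], index) + [("R", root_of(leaves[k:]))]
    return audit_path(leaves[k:], index - k) + [("L", root_of(leaves[:k]))]


def verify_path(leaf: bytes, path: list[tuple[str, bytes]], root: bytes) -> bool:
    # Fold the siblings back up; only the root needs to be trusted (it is signed).
    h = leaf
    for side, sibling in path:
        h = node_hash(h, sibling) if side == "R" else node_hash(sibling, h)
    return h == root


# 30 dummy events; prove that the event at index 25 is in the segment.
leaves = [leaf_hash(f"event-{i}".encode()) for i in range(30)]
root = root_of(leaves)
proof = audit_path(leaves, 25)
print(len(proof), verify_path(leaves[25], proof, root))  # 5 True
```

A court receives one event, five 32-byte hashes, and the signed segment header, and learns nothing about the other 29 events beyond their hashed subtree roots.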

### `MerkleAnchor` / `AnchorSink`

- Pluggable sink interface for publishing segment roots to an external transparency log (blockchain, Rekor-style log, internal KMS, etc.).
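Anchoring is what closes the "Delete trailing segments" gap in the Threat Model: a local chain that has been cut short still verifies internally, but it no longer matches what was published. The sketch below illustrates that comparison. `RootSink`, `InMemorySink`, `anchor_all`, and `compare_with_anchor` are all hypothetical; the real `AnchorSink` interface may look different, and an in-memory dict only stands in for a genuine external log.

```python
from typing import Protocol


class RootSink(Protocol):
    """Hypothetical sink shape; the real `AnchorSink` interface may differ."""

    def publish(self, segment_number: int, merkle_root_hex: str) -> None: ...


class InMemorySink:
    """Toy stand-in for an external transparency log: write-once entries."""

    def __init__(self) -> None:
        self.entries: dict[int, str] = {}

    def publish(self, segment_number: int, merkle_root_hex: str) -> None:
        if segment_number in self.entries:
            raise ValueError(f"segment {segment_number} is already anchored")
        self.entries[segment_number] = merkle_root_hex


def anchor_all(local_roots: dict[int, str], sink: RootSink) -> None:
    for n in sorted(local_roots):
        sink.publish(n, local_roots[n])


def compare_with_anchor(local_roots: dict[int, str], anchored: dict[int, str]) -> list[str]:
    """Flag anchored segments that vanished locally (truncation) or changed root."""
    errors = []
    for n, root in sorted(anchored.items()):
        if n not in local_roots:
            errors.append(f"segment {n}: anchored but missing locally")
        elif local_roots[n] != root:
            errors.append(f"segment {n}: root differs from anchored value")
    return errors


sink = InMemorySink()
anchor_all({1: "a1b2", 2: "b2c3", 3: "c3d4"}, sink)
# Someone deletes the trailing segment 3 from disk:
print(compare_with_anchor({1: "a1b2", 2: "b2c3"}, sink.entries))
# ['segment 3: anchored but missing locally']
```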

### `FilesystemGuard`

- Best-effort OS-level enforcement: `chmod` on all platforms; `chattr +a/+i` on Linux; `chflags uchg` on macOS.
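The portable `chmod` layer amounts to clearing the write bits on a sealed file. A sketch using only the standard library follows; `make_read_only` is a hypothetical helper, and the `chattr`/`chflags` steps are not shown because they need extra privileges. Note that the file owner or root can simply restore the write bits, so this only guards against accidental writes. The cryptographic checks remain the real tamper-evidence guarantee.

```python
import os
import stat
import tempfile


def make_read_only(path: str) -> None:
    """Best-effort: clear every write bit on a sealed segment file.

    Mirrors only the portable chmod step; `chattr +a/+i` (Linux) and
    `chflags uchg` (macOS) need extra privileges and are not shown.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


with tempfile.TemporaryDirectory() as tmp:
    seg = os.path.join(tmp, "segment-00001.log")
    with open(seg, "w", encoding="utf-8") as f:
        f.write('{"event_id": "urn:pqc-audit-evt:00"}\n')
    make_read_only(seg)
    sealed_mode = stat.S_IMODE(os.stat(seg).st_mode)
    print(oct(sealed_mode))
```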

## Why PQC for Audit Logs

AI liability litigation runs on timescales of a decade or more:

- A 2026 loan denial may surface in a 2035 class-action settlement.
- A 2027 medical model may face a 2040 product-liability suit.
- EU AI Act logging obligations are tied to the *lifetime of the system*, potentially 20+ years.

Classical signatures made in 2026 will not survive a cryptographically relevant quantum computer ("Q-day") if one arrives mid-retention-window. ML-DSA-65 is the right default: NIST-standardized, FIPS 204, security category 3.

## Examples

- `examples/basic_log.py`: write 30 events with rotation every 10, then verify the chain.
- `examples/prove_inclusion.py`: build and verify an inclusion proof for event #25.
- `examples/tamper_detection.py`: mutate a JSONL line and show `verify_chain()` flagging the specific segment.

## License

Apache 2.0