# PQC Immutable AI Audit Log (Filesystem)

**Tamper-evident audit log for AI inference events, designed for legal discovery.** When an AI system denies a loan, flags a medical claim, moderates content, or calls a tool, that decision needs to survive 15+ years of potential litigation. This library writes each inference to an append-only segmented file, builds a SHA3-256 Merkle tree per segment, signs the segment header with **ML-DSA** (FIPS 204), and chains every segment to the previous one by root hash. Any single bit flipped in a sealed segment is detected on verification.

## The Problem

Regulators and plaintiffs' lawyers are converging on the same demand: *show us the inference log*. The EU AI Act (Article 12) requires high-risk AI systems to support automatic recording of events (logs) over the lifetime of the system. Plaintiffs in US class actions against AI lenders, insurers, and content platforms routinely subpoena inference histories. Existing solutions fall short:

- **Application-DB logs** are mutable: a DBA or a compromised service can edit them without a trace.
- **Cloud log services** are opaque to the model operator: if the provider loses the logs, you have no recourse.
- **RSA/ECDSA-signed archives** lose their evidentiary value once a cryptographically relevant quantum computer exists, because their signatures become forgeable. Signatures made today must still be trustworthy in 2040.

## The Solution

A pure-Python library with an append-only on-disk layout:

- Each `InferenceEvent` stores SHA3-256 hashes of the input and output (not the raw content, which preserves privacy) plus model DID, actor DID, decision label, and timestamp.
- `LogAppender` writes events as JSON-Lines into `segment-NNNNN.log`. When the rotation policy fires (events, bytes, or age), a `SegmentHeader` is built: Merkle root over every leaf hash, plus `previous_segment_root` chaining to the prior segment. The header is signed with ML-DSA and written to `segment-NNNNN.sig.json`.
- `LogReader.verify_chain()` walks every segment, recomputes each Merkle root, verifies each ML-DSA signature, and confirms every chain link. One mutation anywhere in a sealed segment fails verification.
- `InclusionProver` produces `O(log n)` proofs that a specific event was in a specific segment. This is useful when you must surrender a single decision to a court without leaking the surrounding log.

## Installation

```bash
pip install pqc-audit-log-fs
```

Development:

```bash
pip install -e ".[dev]"
```

## Quick Start

```python
from quantumshield.identity.agent import AgentIdentity
from pqc_audit_log_fs import (
    InferenceEvent, LogAppender, LogReader, RotationPolicy,
)

signer = AgentIdentity.create(name="audit-signer")

# `your_decisions` is a placeholder for your application's own decision records.
with LogAppender(
    "./audit-log",
    signer,
    rotation=RotationPolicy(max_events_per_segment=10_000),
) as appender:
    for decision in your_decisions:
        appender.append(InferenceEvent.create(
            model_did="did:pqaid:credit-model-v3",
            model_version="3.2.1",
            input_bytes=decision.input_blob,
            output_bytes=decision.output_blob,
            decision_type="classification",
            decision_label=decision.label,  # 'approve' | 'deny'
            actor_did=decision.user_did,
            session_id=decision.session_id,
        ))

# Leaving the `with` block closes the appender, which seals the current segment.
reader = LogReader("./audit-log")
ok, errors = reader.verify_chain()
assert ok, errors
```

## Architecture

```
Inference Service                         Verifier / Court / Auditor
-----------------                         ---------------------------

event -> LogAppender.append()
            |
            | jsonl line
            v
segment-NNNNN.log  (append-only)
            |
            | rotation trigger
            v
[merkle root over leaf hashes]
            |
            | ML-DSA sign
            v
segment-NNNNN.sig.json  <-------------->  LogReader.verify_chain()
            |                                     |
            | previous_segment_root               | recompute roots
            v                                     | verify ML-DSA sigs
segment-(N+1).log -> .sig.json                    | check chain links
            |                                     v
            |                                 ok / errors
            v
(optional) MerkleAnchor
    -> external transparency log
```

## Cryptography

| Primitive              | Algorithm | Purpose                              |
|------------------------|-----------|--------------------------------------|
| Leaf hash              | SHA3-256  | `SHA3-256(0x00 ‖ canonical(event))`  |
| Internal Merkle node   | SHA3-256  | `SHA3-256(0x01 ‖ left ‖ right)`      |
| Segment signature      | ML-DSA-65 | over `SHA3-256(canonical(header))`   |
| Cross-segment chaining | SHA3-256  | `header.previous_segment_root`       |

Leaves and internal nodes use domain-separation prefixes to prevent second-preimage attacks. Segments chain like a blockchain: rewriting segment `k` forces every subsequent `previous_segment_root` to also be rewritten, and every subsequent ML-DSA signature to be forged.

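The leaf and node formulas in the table can be sketched with the standard library alone. This is a minimal illustration, not the library's code: the `canonical` encoding (sorted keys, compact JSON) and the handling of an unpaired node (promoted unchanged to the next level) are assumptions, since the README does not specify either.

```python
import hashlib
import json

LEAF_PREFIX = b"\x00"  # domain separation for leaves
NODE_PREFIX = b"\x01"  # domain separation for internal nodes


def canonical(event: dict) -> bytes:
    """Assumed canonical form: sorted keys, compact separators, UTF-8."""
    return json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")


def leaf_hash(event: dict) -> bytes:
    return hashlib.sha3_256(LEAF_PREFIX + canonical(event)).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha3_256(NODE_PREFIX + left + right).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold hashes pairwise; an unpaired last node is promoted unchanged (assumption)."""
    if not leaves:
        raise ValueError("a segment needs at least one event")
    level = list(leaves)
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


events = [{"event_id": f"urn:pqc-audit-evt:{i:02x}", "decision_label": "deny"} for i in range(5)]
root = merkle_root([leaf_hash(e) for e in events])

# Changing one field of one event changes the root.
tampered = [dict(e) for e in events]
tampered[3]["decision_label"] = "approve"
tampered_root = merkle_root([leaf_hash(e) for e in tampered])
print(root != tampered_root)  # True
```

The two prefixes matter because, without them, the concatenation of two child hashes could be presented as the content of a single leaf and still hash to the same value.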
## Segment File Layout

```
audit-log/
  segment-00001.log        JSON-Lines, one InferenceEvent per line
  segment-00001.sig.json   signed SegmentHeader
  segment-00002.log
  segment-00002.sig.json
  ...
```

A `segment-NNNNN.sig.json` looks like:

```json
{
  "segment_id": "segment-00001",
  "segment_number": 1,
  "created_at": "2026-04-20T12:00:00+00:00",
  "sealed_at": "2026-04-20T13:00:00+00:00",
  "event_count": 10000,
  "merkle_root": "a1b2c3...",
  "previous_segment_root": "",
  "log_id": "urn:pqc-audit-log:...",
  "signer_did": "did:pqaid:...",
  "algorithm": "ML-DSA-65",
  "signature": "ff...",
  "public_key": "aa..."
}
```

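Because the headers are plain JSON, the chain links can be checked independently of the library. The sketch below covers only the `previous_segment_root` check, using the field names from the example header. It does not recompute Merkle roots or verify ML-DSA signatures, and `load_headers` and `check_chain_links` are hypothetical helper names, not part of the package.

```python
import json
import tempfile
from pathlib import Path


def load_headers(log_dir: str) -> list[dict]:
    """Read every segment-NNNNN.sig.json and order the headers by segment number."""
    paths = Path(log_dir).glob("segment-*.sig.json")
    headers = [json.loads(p.read_text(encoding="utf-8")) for p in paths]
    return sorted(headers, key=lambda h: h["segment_number"])


def check_chain_links(headers: list[dict]) -> list[str]:
    """Return one message per broken link; an empty list means the links are consistent."""
    errors = []
    expected_prev = ""  # the first segment carries an empty previous_segment_root
    for h in headers:
        if h["previous_segment_root"] != expected_prev:
            errors.append(f"segment {h['segment_number']}: previous_segment_root mismatch")
        expected_prev = h["merkle_root"]
    return errors


# Demo with three hand-written headers (placeholder roots, no signatures).
with tempfile.TemporaryDirectory() as d:
    prev = ""
    for n, root in enumerate(["aa", "bb", "cc"], start=1):
        header = {"segment_number": n, "merkle_root": root, "previous_segment_root": prev}
        Path(d, f"segment-{n:05d}.sig.json").write_text(json.dumps(header), encoding="utf-8")
        prev = root
    headers = load_headers(d)

intact_errors = check_chain_links(headers)
headers[1]["merkle_root"] = "ee"  # simulate swapping segment 2 for a forged one
forged_errors = check_chain_links(headers)
print(intact_errors)
print(forged_errors)
```

A forged segment 2 shows up as a mismatch on segment 3, which is why an attacker would also have to rewrite and re-sign every later header.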
## Rotation Policy

`RotationPolicy` triggers a seal when **any** threshold is crossed:

| Field                     | Default   | Meaning                     |
|---------------------------|-----------|-----------------------------|
| `max_events_per_segment`  | 10,000    | Seal after N events         |
| `max_bytes_per_segment`   | 10 MB     | Seal after N bytes of JSONL |
| `max_segment_age_seconds` | 3600 (1h) | Seal after time elapsed     |

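The "any threshold" rule amounts to a logical OR over the three limits. The following is a hypothetical stand-in for `RotationPolicy`, written only to illustrate that rule. The class name, the `should_seal` method, the use of `>=` for "crossed", and the reading of "10 MB" as 10 MiB are all assumptions.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class RotationSketch:
    """Hypothetical stand-in for RotationPolicy, using the documented defaults."""
    max_events_per_segment: int = 10_000
    max_bytes_per_segment: int = 10 * 1024 * 1024  # "10 MB" read as 10 MiB (assumption)
    max_segment_age_seconds: float = 3600.0

    def should_seal(self, events: int, size_bytes: int, opened_at: float, now: float) -> bool:
        # Seal as soon as any one limit is reached.
        return (
            events >= self.max_events_per_segment
            or size_bytes >= self.max_bytes_per_segment
            or (now - opened_at) >= self.max_segment_age_seconds
        )


policy = RotationSketch(max_events_per_segment=10)
now = time.time()
under_limits = policy.should_seal(events=9, size_bytes=2_048, opened_at=now, now=now)
event_limit = policy.should_seal(events=10, size_bytes=2_048, opened_at=now, now=now)
age_limit = policy.should_seal(events=1, size_bytes=2_048, opened_at=now - 7200, now=now)
print(under_limits, event_limit, age_limit)  # False True True
```

The age limit keeps a quiet log from leaving events unsealed, and therefore unsigned, for long periods.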
## Threat Model

| Attack                                      | How we detect or prevent it                                          |
|---------------------------------------------|----------------------------------------------------------------------|
| Flip a byte in a sealed `.log` file         | Merkle root mismatch in `verify_segment`                             |
| Delete an event line from a sealed segment  | Merkle root mismatch                                                 |
| Swap a whole segment for a forged one       | Chain break: `previous_segment_root` of next segment mismatches      |
| Forge a signature today                     | ML-DSA-65: no known classical or quantum break                       |
| Re-sign after tamper using the signer's key | Requires private key exfiltration; out of scope                      |
| Delete trailing segments                    | Detectable if segment roots are anchored externally (`MerkleAnchor`) |

The log is designed to resist quantum attacks: ML-DSA-65 targets NIST security category 3, meaning it should be at least as hard to break as an AES-192 key search, for classical and quantum attackers alike. Signatures made today should remain unforgeable after cryptographically relevant quantum computers arrive.

## EU AI Act Mapping

| Requirement (Article 12, "Record-keeping")         | This library                               |
|----------------------------------------------------|--------------------------------------------|
| Automatic generation of logs                       | `LogAppender.append()`                     |
| Logs appropriate to intended purpose               | `InferenceEvent.metadata` is free-form     |
| Recording of events over the lifetime of the system | Append-only segments; no size cap          |
| Logs traceable to a specific system version        | `model_did` + `model_version` per event    |
| Logs enabling post-market monitoring               | `LogReader.verify_chain()` for spot audits |
| Integrity protection                               | SHA3-256 + ML-DSA-65 + cross-segment chain |

Combine with `MerkleAnchor` to publish segment roots to a public transparency log (blockchain, Rekor, etc.) for externally anchored non-repudiation.

## CLI Reference

```
pqc-audit verify <log_dir>
pqc-audit prove <log_dir> <segment_number> <event_id>
pqc-audit info <log_dir>
```

Example:

```
$ pqc-audit info ./audit-log
log_dir: ./audit-log
segments: 3
  segment 00001 events=10000 root=a1b2c3d4e5f6a7b8... prev=<genesis>       sealed_at=2026-04-20T13:00:00+00:00
  segment 00002 events=10000 root=b2c3d4e5f6a7b8c9... prev=a1b2c3d4e5f6... sealed_at=2026-04-20T14:00:00+00:00
  segment 00003 events= 4231 root=c3d4e5f6a7b8c9d0... prev=b2c3d4e5f6a7... sealed_at=2026-04-20T15:00:00+00:00

$ pqc-audit verify ./audit-log
[OK] all 3 segments verify
```

## API Reference

### `InferenceEvent`

| Field                  | Type      | Description                                  |
|------------------------|-----------|----------------------------------------------|
| `event_id`             | str       | `urn:pqc-audit-evt:<hex>`                    |
| `timestamp`            | str (ISO) | UTC wall-clock                               |
| `model_did`            | str       | `did:pqaid:...` identifying the model        |
| `model_version`        | str       | Semver or hash of model binary               |
| `input_hash`           | str       | SHA3-256 hex of canonical input              |
| `output_hash`          | str       | SHA3-256 hex of canonical output             |
| `reasoning_chain_hash` | str       | SHA3-256 hex over chain-of-thought           |
| `decision_type`        | str       | e.g. `classification`, `generation`          |
| `decision_label`       | str       | e.g. `approve`, `deny`                       |
| `actor_did`            | str       | DID of the user/agent that invoked the model |
| `session_id`           | str       | Free-form session identifier                 |
| `metadata`             | dict      | Free-form metadata                           |

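To make the hash-instead-of-content idea concrete, here is a small sketch that builds a dictionary shaped like the table above. It is not `InferenceEvent.create`: the helper name `make_event_record` and the use of a random UUID for the `<hex>` part of `event_id` are assumptions made for illustration.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def sha3_hex(data: bytes) -> str:
    return hashlib.sha3_256(data).hexdigest()


def make_event_record(input_bytes: bytes, output_bytes: bytes, **fields) -> dict:
    """Hypothetical helper: store digests of the payloads, never the payloads themselves."""
    return {
        "event_id": f"urn:pqc-audit-evt:{uuid.uuid4().hex}",  # id scheme is an assumption
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input_hash": sha3_hex(input_bytes),
        "output_hash": sha3_hex(output_bytes),
        **fields,
    }


application = b'{"income": 52000, "requested": 15000}'
record = make_event_record(
    application,
    b'{"label": "deny", "score": 0.31}',
    model_did="did:pqaid:credit-model-v3",
    model_version="3.2.1",
    decision_type="classification",
    decision_label="deny",
)

# Later, a party holding the original input can show that it matches the logged hash.
matches = record["input_hash"] == sha3_hex(application)
print(matches)  # True
```

The log can then be handed over without exposing applicant data, while anyone who holds the original input can still demonstrate that it is the one that was logged.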
### `LogAppender`

- `append(event)`: append one event; may trigger a seal.
- `seal_current_segment()`: force a seal now.
- `close()`: seal and flush; also invoked by `__exit__`.

### `LogReader`

- `list_segments() -> list[int]`
- `read_header(n) -> SegmentHeader`
- `read_segment(n) -> AuditSegment`
- `verify_segment(n) -> bool`
- `verify_chain() -> (ok, errors)`

### `InclusionProver`

- `prove_event(segment_number, event_id) -> InclusionProof`
- `verify_proof(event, proof) -> bool`

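An inclusion proof is the list of sibling hashes on the path from one leaf to the root, so a verifier needs only the event, the path, and the signed root. The sketch below shows the general technique with the prefixes from the Cryptography table. `build_path` and `check_path` are hypothetical names, the proof layout is not the library's `InclusionProof`, and promoting an unpaired node unchanged is an assumption.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha3_256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha3_256(b"\x01" + left + right).digest()


def build_path(leaves: list[bytes], index: int) -> tuple[bytes, list[tuple[bytes, bool]]]:
    """Return (root, path); each path step is (sibling_hash, sibling_is_on_the_left)."""
    path = []
    level = list(leaves)
    while len(level) > 1:
        sibling = index ^ 1  # the other node of this pair
        if sibling < len(level):
            path.append((level[sibling], sibling < index))
        # Pair up nodes; an unpaired last node is promoted unchanged (assumption).
        nxt = [node_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level, index = nxt, index // 2
    return level[0], path


def check_path(leaf: bytes, path: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one leaf and its sibling hashes."""
    acc = leaf
    for sibling, sibling_is_left in path:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root


lines = [f'{{"event_id":"urn:pqc-audit-evt:{i:04x}"}}'.encode() for i in range(1000)]
leaves = [leaf_hash(line) for line in lines]
root, path = build_path(leaves, index=25)
print(len(path), check_path(leaves[25], path, root))
```

For a 1,000-event segment the path holds at most ten hashes, which is what makes it practical to disclose a single decision without the rest of the segment.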
### `MerkleAnchor` / `AnchorSink`

- Pluggable sink interface for publishing segment roots to an external transparency log (blockchain, Rekor-style log, internal KMS, etc.).

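The README does not show the sink interface, so the following is only a guess at its general shape: a single method that receives a sealed segment's root. `AnchorSinkSketch`, `FileAnchorSink`, `publish`, and `missing_segments` are all hypothetical names, and a local file stands in for a real transparency log purely to keep the example runnable.

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol


class AnchorSinkSketch(Protocol):
    """Hypothetical sink shape: accept the root of one sealed segment."""
    def publish(self, log_id: str, segment_number: int, merkle_root: str) -> None: ...


class FileAnchorSink:
    """Appends roots as JSON Lines to a file kept outside the log directory."""
    def __init__(self, path: Path) -> None:
        self.path = path

    def publish(self, log_id: str, segment_number: int, merkle_root: str) -> None:
        record = {"log_id": log_id, "segment_number": segment_number, "merkle_root": merkle_root}
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, sort_keys=True) + "\n")


def missing_segments(anchor_path: Path, present: set[int]) -> list[int]:
    """Segments that were anchored but are no longer on disk, e.g. a truncated tail."""
    text = anchor_path.read_text(encoding="utf-8")
    anchored = [json.loads(line)["segment_number"] for line in text.splitlines()]
    return [n for n in anchored if n not in present]


with tempfile.TemporaryDirectory() as d:
    anchors = Path(d) / "anchors.jsonl"
    sink: AnchorSinkSketch = FileAnchorSink(anchors)
    for n, root in enumerate(["aa", "bb", "cc"], start=1):
        sink.publish("urn:pqc-audit-log:demo", n, root)
    # Suppose segment 3 was deleted from the log directory.
    gone = missing_segments(anchors, present={1, 2})
print(gone)  # [3]
```

This is the case the chain alone cannot catch: deleting the newest segments leaves a shorter but internally consistent chain, so the evidence has to live somewhere the attacker cannot rewrite.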
### `FilesystemGuard`

- Best-effort OS-level enforcement: `chmod` on all platforms; `chattr +a/+i` on Linux; `chflags uchg` on macOS.

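The portable `chmod` step can be sketched with `os` and `stat` from the standard library. This is an illustration rather than the `FilesystemGuard` implementation, and `make_read_only` is a hypothetical name. The `chattr` and `chflags` steps are left out because they shell out to platform tools and may need elevated privileges.

```python
import os
import stat
import tempfile


def make_read_only(path: str) -> None:
    """Clear every write bit while keeping the existing read/execute bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def is_writable_by_mode(path: str) -> bool:
    """True if any write bit is set in the file's permission bits."""
    return bool(stat.S_IMODE(os.stat(path).st_mode) & 0o222)


with tempfile.TemporaryDirectory() as d:
    sealed = os.path.join(d, "segment-00001.log")
    with open(sealed, "w", encoding="utf-8") as fh:
        fh.write('{"event_id": "urn:pqc-audit-evt:00"}\n')
    before = is_writable_by_mode(sealed)
    make_read_only(sealed)
    after = is_writable_by_mode(sealed)
    os.chmod(sealed, 0o644)  # restore so the temporary directory can be cleaned up everywhere
print(before, after)  # True False
```

Permission bits only stop accidental writes, since the file's owner or root can set them back. That is why this layer is described as best-effort and the cryptographic checks remain the actual integrity guarantee.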
## Why PQC for Audit Logs

AI liability litigation runs on timescales of a decade or more:

- A 2026 loan denial may surface in a 2035 class-action settlement.
- A 2027 medical model may face a 2040 product-liability suit.
- EU AI Act logging requirements apply over the *lifetime of the system*, potentially 20+ years.

Classical signatures made in 2026 can no longer be trusted once a cryptographically relevant quantum computer exists ("Q-day"), if one arrives mid-retention-window. ML-DSA-65 is the right default: standardized by NIST in FIPS 204, at security category 3.

## Examples

- `examples/basic_log.py`: write 30 events with rotation every 10, then verify the chain.
- `examples/prove_inclusion.py`: build and verify an inclusion proof for event #25.
- `examples/tamper_detection.py`: mutate a JSONL line and show `verify_chain()` flagging the specific segment.

## License

Apache 2.0