Browse PQC-verified AI models, datasets, and tools
15T token dataset of cleaned English web data. Deduplicated and filtered from CommonCrawl, outperforms C4 and RefinedWeb for LLM pretraining.