AutoMathText-V2: A 2.46 Trillion Token AI-Curated STEM Pretraining Dataset

AutoMathText-V2 has surpassed 1.5 million downloads! We'd love to know how you're using it. Please take 1 minute to fill out our use case survey. Your feedback will directly shape the future roadmap of this dataset.

AutoMathText-V2 consists of 2.46 trillion tokens of high-quality, deduplicated text spanning web content, mathematics, code, reasoning, and… See the full description on the dataset page: https://huggingface.co/datasets/OpenSQZ/AutoMathText-V2.