FineWeb-Edu
1.3 trillion tokens of the finest educational data the web has to offer
Paper: https://arxiv.org/abs/2406.17557
What is it?
The FineWeb-Edu dataset consists of 1.3T tokens, or 5.4T tokens in the FineWeb-Edu-score-2 variant, of educational web pages filtered from the FineWeb dataset. This is the 1.3 trillion token version. To enhance FineWeb's quality, we developed an educational quality classifier using annotations generated by Llama3-70B-Instruct. We then… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu
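The description pairs the 1.3T version with a larger score-2 variant, which implies that pages are kept or dropped according to a classifier score. The sketch below illustrates that kind of threshold filter under stated assumptions: each record is a dict with a hypothetical integer field `int_score`, and `filter_by_edu_score` is a hypothetical helper. The threshold behind the 1.3T version is not given in this excerpt, so `min_score` is left to the caller rather than presented as the authors' setting.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical record shape for illustration: {"text": str, "int_score": int}.
Record = Dict[str, Any]


def filter_by_edu_score(records: Iterable[Record], min_score: int) -> Iterator[Record]:
    """Yield only the records whose educational score is at least min_score.

    min_score=2 mirrors the naming of the FineWeb-Edu-score-2 variant; the
    threshold used for the 1.3T version is an open parameter here.
    """
    for record in records:
        if record["int_score"] >= min_score:
            yield record


pages = [
    {"text": "Intro to photosynthesis", "int_score": 4},
    {"text": "Celebrity gossip roundup", "int_score": 0},
    {"text": "Forum thread on fractions", "int_score": 2},
]

kept = list(filter_by_edu_score(pages, min_score=2))
print([p["text"] for p in kept])
# ['Intro to photosynthesis', 'Forum thread on fractions']
```

Raising `min_score` shrinks the kept set, which is consistent with a stricter filter producing the smaller 1.3T corpus.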
Use this model
Pull with QuantumShield
quantumshield pull HuggingFaceFW/fineweb-edu
Verify integrity
quantumshield verify HuggingFaceFW/fineweb-edu
pip install
pip install quantumshield && quantumshield pull HuggingFaceFW/fineweb-edu
PQC-Verified with ML-DSA-87
This dataset carries a FIPS 204 ML-DSA-87 (Dilithium5) signature issued by the platform signing authority. Its signature chain includes 2 verifications; it was last verified on 2026-05-08.
fineweb-edu
Intended Uses
This dataset is registered on the QuantaMrkt quantum-safe registry, and all of its files have been verified against post-quantum signatures.
Quick Start
# Install the CLI
pip install quantumshield
# Pull the model
quantumshield pull HuggingFaceFW/fineweb-edu
# Verify file integrity
quantumshield verify HuggingFaceFW/fineweb-edu
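How `quantumshield verify` works internally is not documented on this page. As a rough, hypothetical illustration of what a local file-integrity check involves, the sketch below hashes downloaded files with SHA-256 and compares them against an expected-digest manifest. The manifest format and the `sha256_of` and `verify_files` helpers are assumptions, not the CLI's behavior, and the sketch does not check ML-DSA-87 signatures, which would need a post-quantum signature library.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the relative paths that are missing or whose digest differs.

    `manifest` is a hypothetical mapping of relative path -> expected hex digest.
    An empty result means every listed file matched.
    """
    failures = []
    for rel_path, expected in manifest.items():
        file_path = root / rel_path
        if not file_path.is_file() or sha256_of(file_path) != expected:
            failures.append(rel_path)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "part-0.bin").write_bytes(b"example bytes")
        manifest = {
            "part-0.bin": hashlib.sha256(b"example bytes").hexdigest(),
            "part-1.bin": "0" * 64,  # listed but not present on disk
        }
        print(verify_files(root, manifest))  # ['part-1.bin']
```

Reading in fixed-size chunks keeps memory use flat even for multi-gigabyte shards.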