Zyphra / Zyda-2

Unverified HuggingFace

Zyda-2 is a 5-trillion-token language-modeling dataset, created by collecting open, high-quality datasets and combining them through cross-deduplication and model-based quality filtering. It comprises diverse sources of web data, highly educational content, math, code, and scientific papers. To construct Zyda-2, we took the best open-source datasets available: Zyda, FineWeb, DCLM, and Dolma. Models trained on Zyda-2 significantly outperform identical models trained on the… See the full description on the dataset page: https://huggingface.co/datasets/Zyphra/Zyda-2.
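The construction step described above (cross-deduplication across the source datasets, followed by model-based quality filtering) can be illustrated with a toy sketch. This is not Zyphra's pipeline. It uses exact hashing of whitespace-normalized, lowercased text as a simple stand-in for whatever near-duplicate detection the real pipeline uses, and `toy_score` is a hypothetical placeholder for a trained quality classifier. The source names, documents, and threshold are likewise illustrative assumptions.

```python
import hashlib
from typing import Callable, Iterable


def normalize(text: str) -> str:
    # Collapse whitespace and lowercase so trivially different copies hash equally.
    return " ".join(text.split()).lower()


def cross_dedup_and_filter(
    sources: dict[str, Iterable[str]],
    score: Callable[[str], float],
    threshold: float,
) -> list[tuple[str, str]]:
    """Keep the first copy of each document across all sources, then drop
    documents whose quality score falls below `threshold`."""
    seen: set[str] = set()
    kept: list[tuple[str, str]] = []
    for name, docs in sources.items():  # dict order decides which source "wins"
        for doc in docs:
            digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
            if digest in seen:
                continue  # duplicate of a document already seen in this or an earlier source
            seen.add(digest)
            if score(doc) >= threshold:
                kept.append((name, doc))
    return kept


def toy_score(doc: str) -> float:
    # Hypothetical stand-in for a learned quality classifier: longer docs score higher.
    return min(len(doc.split()) / 10, 1.0)


# Illustrative inputs: source_b repeats source_a's first document with different spacing/case.
sources = {
    "source_a": ["The quick brown fox jumps over the lazy dog today.", "spam"],
    "source_b": [
        "the quick  brown fox jumps over the lazy dog today.",
        "Photosynthesis converts light energy into chemical energy in plants.",
    ],
}
kept = cross_dedup_and_filter(sources, toy_score, threshold=0.5)
for name, doc in kept:
    print(name, "->", doc)
```

Because sources are iterated in a fixed order, when the same document appears in several datasets the copy from the earliest-listed source is the one retained; the short "spam" document is removed by the quality threshold rather than by deduplication.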


Unverified Model

This model has not been PQC-verified. File integrity cannot be guaranteed against quantum threats.

Intended Uses

This model is registered on the QuantaMrkt quantum-safe registry but has not yet been PQC-verified.

Quick Start

# Install the CLI
pip install quantumshield

# Pull the model
quantumshield pull Zyphra/Zyda-2

# Verify file integrity
quantumshield verify Zyphra/Zyda-2
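How `quantumshield verify` works internally is not described on this page. As a rough, hypothetical illustration of what a local file-integrity check involves, the sketch below recomputes SHA-256 digests of downloaded files and compares them against a manifest of expected digests. The manifest format (relative path mapped to hex digest) and the file names are assumptions. A plain hash comparison only detects corruption or tampering relative to a manifest you already trust; it does not replace the post-quantum signature check the registry refers to.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    # Hash the file in 1 MiB chunks so large dataset shards need not fit in memory.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths (in manifest order) that are missing under
    `root` or whose SHA-256 digest differs from the expected value."""
    bad: list[str] = []
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file() or sha256_file(path) != expected.lower():
            bad.append(rel_path)
    return bad
```

An empty return value means every listed file is present and matches its expected digest; any non-empty result names the files to re-download.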

About

Created 2026-06-23
Downloads 138,634
Likes 98

Signers

V1
did:quantamrkt:regis...hield-v1