mlfoundations / MINT-1T-HTML

Unverified HuggingFace

πŸƒ MINT-1T:Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens πŸƒ MINT-1T is an open-source Multimodal INTerleaved dataset with 1 trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. πŸƒ MINT-1T is designed to facilitate research in multimodal pretraining. πŸƒ MINT-1T is created by a team from the University of Washington in… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations/MINT-1T-HTML.

Unverified Model

This model has not been PQC-verified. File integrity cannot be guaranteed against quantum threats.

README.md

Intended Uses

This model is registered on the QuantaMrkt quantum-safe registry. This model has not yet been PQC-verified.

Quick Start

# Install the CLI
pip install quantumshield

# Pull the model
quantumshield pull mlfoundations/MINT-1T-HTML

# Verify file integrity
quantumshield verify mlfoundations/MINT-1T-HTML
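
The pull and verify steps above can be chained so that verification only runs after a successful pull. The following is a minimal sketch, not an official workflow: `REPO_ID`, `DRY_RUN`, `run_step`, and `pull_and_verify` are hypothetical names introduced here, and the sketch assumes the `quantumshield` CLI exits with a non-zero status on failure, which this page does not confirm.

```shell
#!/bin/sh
# Sketch: run the two Quick Start commands in order, stopping at the first failure.
# REPO_ID and DRY_RUN are hypothetical variables introduced for this example.
REPO_ID="${REPO_ID:-mlfoundations/MINT-1T-HTML}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = actually run them

# Print the command when DRY_RUN=1, otherwise execute it.
run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Pull first, then verify. The `&&` skips verification if the pull fails
# (assumption: the CLI signals failure through a non-zero exit status).
pull_and_verify() {
    run_step quantumshield pull "$1" && run_step quantumshield verify "$1"
}

pull_and_verify "$REPO_ID"
```

With the default `DRY_RUN=1` the script only prints the two commands it would run; setting `DRY_RUN=0` executes them, which requires the CLI to be installed first.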

About

πŸƒ MINT-1T:Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens πŸƒ MINT-1T is an open-source Multimodal INTerleaved dataset with 1 trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. πŸƒ MINT-1T is designed to facilitate research in multimodal pretraining. πŸƒ MINT-1T is created by a team from the University of Washington in… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations/MINT-1T-HTML.

Created 2026-05-06
Downloads 138,820
Likes 94

Get this model

Pull with QuantumShield

quantumshield pull mlfoundations/MINT-1T-HTML

Verify signatures

quantumshield verify mlfoundations/MINT-1T-HTML

Signers

V1
did:quantamrkt:regis...hield-v1