Model Hub

Browse PQC-verified AI models, datasets, and tools


Dataset Card for "super_glue". Dataset Summary: SuperGLUE (https://super.gluebenchmark.com/) is a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, improved resources, and a new public leaderboard. Supported Tasks and Leaderboards: More Information Needed. Languages: More Information Needed. Dataset Structure, Data Instances, axb: Size of downloaded dataset files: 0.03 MB. Size of… See the full description on the dataset page: https://huggingface.co/datasets/aps/super_glue.

Task_categories:text-Classification, Task_categories:token-Classification, Task_categories:question-Answering, Task_ids:natural-Language-Inference, Task_ids:word-Sense-Disambiguation, Task_ids:coreference-Resolution

157K 188

Updated 2026-06-30 Source available

TIGER-Lab/MMLU-Pro HF Unverified

MMLU-Pro Dataset. The MMLU-Pro dataset is a more robust and challenging massive multi-task understanding dataset designed to benchmark large language models' capabilities more rigorously. This dataset contains 12K complex questions across various disciplines. What's New [2026.03.11]: Added more cutting-edge frontier models to the leaderboard, including the Claude-4.6 series, Seed2.0 series, Qwen3.5 series, and Gemini-3.1-Pro, among… See the full description on the dataset page: https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro.

Benchmark:official, Task_categories:question-Answering, Language:en, Size_categories:10K<n<100K, Format:parquet, Modality:tabular

157K 490

Updated 2026-06-30 Source available

facebook/vjepa2-vitl-fpc64-256 HF Unverified

Video-Classification, Transformers, Safetensors, Vjepa2, Feature Extraction, Video HIGH

157K 202

Updated 2026-06-30

jobs-git/HPLT2.0_cleaned HF Unverified

This is a large-scale collection of web-crawled documents in 191 world languages, produced by the HPLT project. The data comes mostly from the Internet Archive, with some additions from Common Crawl. For a detailed description of the dataset, please refer to https://hplt-project.org/datasets/v2.0 The Cleaned variant of HPLT Datasets v2.0: This is the cleaned variant of the HPLT Datasets v2.0, converted to the Parquet format semi-automatically during upload. The original JSONL files… See the full description on the dataset page: https://huggingface.co/datasets/jobs-git/HPLT2.0_cleaned.

Task_categories:fill-Mask, Task_categories:text-Generation, Task_ids:language-Modeling, Multilinguality:multilingual, Language:ace, Language:af

157K 0

Updated 2026-06-30 Source available

allenai/dolma3_mix-6T-1025-7B HF Unverified

⚠️ WARNING: This dataset is intended ONLY for reproducing Olmo 3 7B ⚠️ For all other training use cases, including training from scratch, please utilize our primary dolma 3 data mix: https://huggingface.co/datasets/allenai/dolma3_mix-6T. Note: Some olmOCR science PDFs in the current dataset have been redacted following the training of Olmo 3 7B. These texts are indicated with [REMOVED] in the text field. This will affect reproducibility of Olmo 3 7B. For this reason, please use our… See the full description on the dataset page: https://huggingface.co/datasets/allenai/dolma3_mix-6T-1025-7B.

Task_categories:text-Generation, Language:en

156K 53

Updated 2026-06-30 Source available

monologg/koelectra-small-v2-distilled-korquad-384 HF Unverified

Question Answering, Transformers, PyTorch, Tflite, Safetensors, Electra MEDIUM

156K 7

Updated 2026-06-30

mlfoundations/MINT-1T-PDF-CC-2023-23 HF PQC Verified

🍃 MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens. 🍃 MINT-1T is an open-source Multimodal INTerleaved dataset with 1 trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. 🍃 MINT-1T is designed to facilitate research in multimodal pretraining. 🍃 MINT-1T is created by a team from the University of Washington in… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations/MINT-1T-PDF-CC-2023-23.

Task_categories:image-To-Text, Task_categories:text-Generation, Language:en, Size_categories:1M<n<10M, Format:webdataset, Modality:image

155K 10

Updated 2026-05-01 Source available

allenai/openbookqa HF Unverified

Dataset Card for OpenBookQA. Dataset Summary: OpenBookQA aims to promote research in advanced question-answering, probing a deeper understanding of both the topic (with salient facts summarized as an open book, also provided with the dataset) and the language it is expressed in. In particular, it contains questions that require multi-step reasoning, use of additional common and commonsense knowledge, and rich text comprehension. OpenBookQA is a new kind of… See the full description on the dataset page: https://huggingface.co/datasets/allenai/openbookqa.

Task_categories:question-Answering, Task_ids:open-Domain-Qa, Annotations_creators:crowdsourced, Annotations_creators:expert-Generated, Language_creators:expert-Generated, Multilinguality:monolingual

155K 133

Updated 2026-06-30 Source available

deepset/tinyroberta-squad2 HF Unverified

Question Answering, Transformers, PyTorch, Safetensors, Roberta, Model-Index MEDIUM

154K 114

Updated 2026-06-30

facebook/vjepa2-vith-fpc64-256 HF Unverified

Video-Classification, Transformers, Safetensors, Vjepa2, Feature Extraction, Video HIGH

150K 20

Updated 2026-06-30

depth-anything/DA3NESTED-GIANT-LARGE-1.1 HF Unverified

Depth-Estimation, Depth-Anything-3, Safetensors, Computer-Vision, Monocular-Depth, Multi-View-Geometry HIGH

148K 27

Updated 2026-06-30

nvidia/Nemotron-CC-v2 HF Unverified

Nemotron-Pre-Training-Dataset-v1 Release. Data Overview: This pretraining dataset, for generative AI model training, preserves high-value math and code while enriching it with diverse multilingual Q&A, fueling the next generation of intelligent, globally-capable models. This dataset supports NVIDIA Nemotron Nano 2, a family of large language models (LLMs) that consists of the NVIDIA-Nemotron-Nano-9B-v2, NVIDIA-Nemotron-Nano-9B-v2-Base, and NVIDIA-Nemotron-Nano-12B-v2-Base… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/Nemotron-CC-v2.

Task_categories:text-Generation, Size_categories:1B<n<10B, Format:parquet, Modality:text, Library:datasets, Library:dask

147K 116

Updated 2026-05-02 Source available

Intel/zoedepth-nyu-kitti HF PQC Verified

Depth-Estimation, Transformers, Safetensors, Zoedepth, Vision HIGH

146K 20

Updated 2026-06-30

philschmid/bart-large-cnn-samsum HF Unverified

Summarization, Transformers, PyTorch, Bart, Text2text-Generation, Sagemaker HIGH

145K 268

Updated 2026-06-30

PekingU/rtdetr_r18vd_coco_o365 HF PQC Verified

Object-Detection, Transformers, Safetensors, Rt_detr, Vision, English MEDIUM

145K 5

Updated 2026-04-26

typeform/distilbert-base-uncased-mnli HF Unverified

Zero-Shot Classification, Transformers, PyTorch, Tf, Safetensors, Distilbert MEDIUM

143K 45

Updated 2026-06-24

abhishtagatya/hubert-base-960h-itw-deepfake HF Unverified

Audio-Classification, Transformers, Tensorboard, Safetensors, Hubert, Deepfake MEDIUM

142K 1

Updated 2026-04-22

zekaiwang/trex_dataset HF Unverified

T-Rex Dataset. A large-scale, tactile-reactive bimanual manipulation dataset, collected via teleoperation on a Dexmate Vega-1 robot with two Sharpa Wave dexterous hands. Stored as a LeRobotDataset v3.0. One episode from each of 20 motor primitives (head-camera view, cropped to the workspace), each with a different object. Teleoperation setup: Manus gloves + VIVE… See the full description on the dataset page: https://huggingface.co/datasets/zekaiwang/trex_dataset.

Task_categories:robotics, Language:en, Size_categories:1M<n<10M, Format:parquet, Modality:tabular, Modality:text

141K 6

Updated 2026-06-30 Source available

Zyphra/Zyda-2 HF Unverified

Zyda-2. Zyda-2 is a 5 trillion token language modeling dataset created by collecting open, high-quality datasets and combining them through cross-deduplication and model-based quality filtering. Zyda-2 comprises diverse sources of web data, highly educational content, math, code, and scientific papers. To construct Zyda-2, we took the best open-source datasets available: Zyda, FineWeb, DCLM, and Dolma. Models trained on Zyda-2 significantly outperform identical models trained on the… See the full description on the dataset page: https://huggingface.co/datasets/Zyphra/Zyda-2.

Task_categories:text-Generation, Language:en, Size_categories:n>1T

139K 98

Updated 2026-06-30 Source available

ylecun/mnist HF Unverified

Dataset Card for MNIST. Dataset Summary: The MNIST dataset consists of 70,000 28x28 black-and-white images of handwritten digits extracted from two NIST databases. There are 60,000 images in the training dataset and 10,000 images in the validation dataset, one class per digit for a total of 10 classes, with 7,000 images (6,000 train images and 1,000 test images) per class. Half of the images were drawn by Census Bureau employees and the other half by high school students… See the full description on the dataset page: https://huggingface.co/datasets/ylecun/mnist.

Task_categories:image-Classification, Task_ids:multi-Class-Image-Classification, Annotations_creators:expert-Generated, Language_creators:found, Multilinguality:monolingual, Source_datasets:extended|other-Nist

138K 252

Updated 2026-06-30 Source available

Showing 20 of 665 items (page 24 of 34)
