Model Hub

Browse PQC-verified AI models, datasets, and tools


Dataset Card for MMLU. Dataset Summary: Measuring Massive Multitask Language Understanding by Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt (ICLR 2021). This is a massive multitask test consisting of multiple-choice questions from various branches of knowledge. The test spans subjects in the humanities, social sciences, hard sciences, and other areas that are important for some people to learn. This covers 57 tasks… See the full description on the dataset page: https://huggingface.co/datasets/cais/mmlu.

Task_categories:question-Answering, Task_ids:multiple-Choice-Qa, Annotations_creators:no-Annotation, Language_creators:expert-Generated, Multilinguality:monolingual, Source_datasets:original

429K 778

Updated 2026-06-30 Source available

CompVis/stable-diffusion-v1-4 HF PQC Verified

Text-to-Image, Diffusers, Safetensors, Stable-Diffusion, Stable-Diffusion-Diffusers, Diffusers:StableDiffusionPipeline CRITICAL

426K 7,028

Updated 2026-06-30

facebook/seamless-m4t-v2-large HF Unverified

Speech Recognition, Transformers, Safetensors, Seamless_m4t_v2, Feature Extraction, Audio-To-Audio HIGH

426K 989

Updated 2026-06-30

nyu-mll/glue HF PQC Verified

Dataset Card for GLUE. Dataset Summary: GLUE, the General Language Understanding Evaluation benchmark (https://gluebenchmark.com/), is a collection of resources for training, evaluating, and analyzing natural language understanding systems. Supported Tasks and Leaderboards: The leaderboard for the GLUE benchmark can be found at this address. It comprises the following tasks: ax: A manually-curated evaluation dataset for fine-grained analysis of system… See the full description on the dataset page: https://huggingface.co/datasets/nyu-mll/glue.

Task_categories:text-Classification, Task_ids:acceptability-Classification, Task_ids:natural-Language-Inference, Task_ids:semantic-Similarity-Scoring, Task_ids:sentiment-Classification, Task_ids:text-Scoring

421K 508

Updated 2026-06-30 Source available

allenai/ai2_arc HF Unverified

Dataset Card for "ai2_arc". Dataset Summary: A new dataset of 7,787 genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. We are also including a corpus of over 14 million science sentences relevant to… See the full description on the dataset page: https://huggingface.co/datasets/allenai/ai2_arc.

Task_categories:question-Answering, Task_ids:open-Domain-Qa, Task_ids:multiple-Choice-Qa, Annotations_creators:found, Language_creators:found, Multilinguality:monolingual

417K 359

Updated 2026-06-30 Source available
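
The Challenge/Easy split described in the ai2_arc summary can be sketched as a simple filter. This is a minimal illustration, not the dataset authors' code: the `Question` record and its two boolean fields are hypothetical stand-ins for whether each baseline solver answered a question correctly.

```python
from dataclasses import dataclass


@dataclass
class Question:
    # Hypothetical record; field names are illustrative, not the dataset schema.
    text: str
    retrieval_correct: bool     # did the retrieval-based baseline answer correctly?
    cooccurrence_correct: bool  # did the word co-occurrence baseline answer correctly?


def partition(questions):
    """Split questions into (challenge, easy).

    Following the rule in the summary, a question goes to the Challenge Set
    only if BOTH baselines answered it incorrectly; everything else is Easy.
    """
    challenge, easy = [], []
    for q in questions:
        if not q.retrieval_correct and not q.cooccurrence_correct:
            challenge.append(q)
        else:
            easy.append(q)
    return challenge, easy
```

A question that even one of the two baselines answers correctly lands in the Easy Set under this rule.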

cross-encoder/nli-deberta-v3-small HF Unverified

Zero-Shot Classification, Sentence-Transformers, PyTorch, ONNX, Safetensors, Deberta-V2 HIGH

416K 14

Updated 2026-06-30

XDOF/ABC-130k HF Unverified

ABC-130k is the largest open-source robot teleoperation dataset. It contains bimanual manipulation trajectories collected on two-arm YAM stations. Episodes are distributed as MCAP files, with subtask annotations kept as separate artifacts so they can be revised or extended independently of the underlying episode data. For details on the accompanying paper, see abc.bot. Please see the GitHub repo here for code to train and deploy with this dataset. Dataset… See the full description on the dataset page: https://huggingface.co/datasets/XDOF/ABC-130k.

Task_categories:robotics, Language:en, Size_categories:n>1T, Robotics, Manipulation, Imitation-Learning

412K 66

Updated 2026-06-30 Source available

valhalla/distilbart-mnli-12-1 HF Unverified

Zero-Shot Classification, Transformers, PyTorch, JAX, Bart, Text Classification HIGH

412K 56

Updated 2026-06-30

HuggingFaceFW/finephrase HF PQC Verified

Dataset Card for HuggingFaceFW/finephrase. Dataset Summary: Synthetic data generated by DataTrove. Model: HuggingFaceTB/SmolLM2-1.7B-Instruct (main); Source dataset: HuggingFaceFW/fineweb-edu, config sample-350BT, split train; Generation config: temperature=1.0, top_p=1.0, top_k=50, max_tokens=2048, model_max_context=8192; Speculative decoding: {"method":"suffix","num_speculative_tokens":32}; System prompt: None; Input column: text; Prompt families: faq prompt Rewrite the… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/finephrase.

Task_categories:text-Generation, Task_ids:language-Modeling, Annotations_creators:machine-Generated, Language_creators:found, Source_datasets:HuggingFaceFW/fineweb-Edu/sample-350BT, Language:en

404K 131

Updated 2026-06-30 Source available
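
To make the generation config listed for HuggingFaceFW/finephrase concrete, here is a minimal sketch of how temperature, top_k, and top_p are commonly applied when choosing a next token. It is a generic pure-Python illustration, not DataTrove's or any inference engine's implementation, and the function name `filter_probs` is hypothetical. With the listed values, temperature=1.0 and top_p=1.0 leave the distribution unchanged, so only top_k=50 truncates it.

```python
import math


def filter_probs(logits, temperature=1.0, top_k=50, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k, and top-p filtering."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep the k most probable tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    # top_p=1.0 is treated as "no nucleus filtering".
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if top_p < 1.0 and cumulative >= top_p:
            break

    # Renormalize over the surviving tokens.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}
```

A sampler would then draw one token index from the returned distribution, for example with `random.choices`.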

facebook/mms-lid-256 HF Unverified

Audio-Classification, Transformers, PyTorch, Safetensors, Wav2vec2, Mms HIGH

395K 18

Updated 2026-06-30

robbyant/mdm_depth HF Unverified

LingBot-Depth Dataset: Self-curated RGB-D dataset for training LingBot-Depth, a masked depth modeling approach (arxiv:2601.17895). Each sample contains an RGB image, raw sensor depth, and ground truth depth. Total size: 2.71 TB. Depth scale: millimeters (mm), stored as 16-bit PNG. License: CC BY-NC-SA 4.0. Sub-datasets (Name, Description, Samples): RobbyReal, real-world indoor scenes captured with multiple RGB-D cameras, 1,400,000; RobbyVla, real-world data collected… See the full description on the dataset page: https://huggingface.co/datasets/robbyant/mdm_depth.

Task_categories:depth-Estimation, Language:en, Modality:3d, 3D, 3d, Depth

394K 29

Updated 2026-05-08 Source available
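
Since the robbyant/mdm_depth card states that depth is stored in millimeters as 16-bit PNG, converting raw pixel values to meters is a division by 1000. A minimal pure-Python sketch follows. Treating a raw value of 0 as "no measurement" is a common RGB-D convention and an assumption here, not something the card states, and `depth_mm_to_m` is a hypothetical helper.

```python
def depth_mm_to_m(raw_rows, invalid=0):
    """Convert a 2D list of uint16 depth values (millimeters) to meters.

    Pixels equal to `invalid` become None (assumed "no measurement" marker).
    """
    return [
        [None if v == invalid else v / 1000.0 for v in row]
        for row in raw_rows
    ]
```

A 16-bit value tops out at 65535, so the largest depth representable at this scale is 65.535 m.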

mlfoundations/MINT-1T-PDF-CC-2023-40 HF PQC Verified

🍃 MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens. 🍃 MINT-1T is an open-source Multimodal INTerleaved dataset with 1 trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. 🍃 MINT-1T is designed to facilitate research in multimodal pretraining. 🍃 MINT-1T is created by a team from the University of Washington in… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations/MINT-1T-PDF-CC-2023-40.

Task_categories:image-To-Text, Task_categories:text-Generation, Language:en, Size_categories:100B<n<1T, Multimodal

393K 9

Updated 2026-05-01 Source available

cadene/droid HF Unverified

This dataset was created using LeRobot. DROID: A Large-Scale In-the-Wild Robot Manipulation Dataset. One of the biggest open-source datasets for robotics, with 27,044,326 frames, 92,223 episodes, and 31,308 unique task descriptions in natural language. Ported from Tensorflow Dataset format (2TB) to LeRobotDataset format (400GB) with help from IPEC-COMMUNITY. Visualization: LeRobot Homepage: Droid Paper: Arxiv License: apache-2.0 Dataset Structure meta/info.json: {… See the full description on the dataset page: https://huggingface.co/datasets/cadene/droid.

Task_categories:robotics, Language:en, Size_categories:10M<n<100M, Modality:video, LeRobot, Openx

387K 16

Updated 2026-06-30 Source available

cross-encoder/nli-deberta-v3-base HF Unverified

Zero-Shot Classification, Sentence-Transformers, PyTorch, ONNX, Safetensors, Deberta-V2 HIGH

384K 46

Updated 2026-06-30

google-t5/t5-large HF Unverified

Translation, Transformers, PyTorch, Tf, JAX, Safetensors HIGH

382K 257

Updated 2026-06-30

HuggingFaceFW/fineweb-edu HF PQC Verified

📚 FineWeb-Edu: 1.3 trillion tokens of the finest educational data the 🌐 web has to offer (Paper: https://arxiv.org/abs/2406.17557). What is it? The 📚 FineWeb-Edu dataset consists of 1.3T tokens and 5.4T tokens (FineWeb-Edu-score-2) of educational web pages filtered from the 🍷 FineWeb dataset. This is the 1.3 trillion version. To enhance FineWeb's quality, we developed an educational quality classifier using annotations generated by Llama3-70B-Instruct. We then… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu.

Task_categories:text-Generation, Language:en, Size_categories:1B<n<10B, Format:parquet, Modality:tabular, Modality:text

382K 1,167

Updated 2026-06-30 Source available

SWE-bench/SWE-bench_Multilingual HF Unverified

Language:en, Size_categories:n<1K, Format:parquet, Modality:text, Library:datasets, Library:pandas

379K 19

Updated 2026-06-30 Source available

LiheYoung/depth-anything-large-hf HF PQC Verified

Depth-Estimation, Transformers, Safetensors, Depth_anything, Vision HIGH

376K 65

Updated 2026-06-30

PekingU/rtdetr_v2_r50vd HF Unverified