Model Hub

Browse PQC-verified AI models, datasets, and tools


MultiEURLEX comprises 65k EU laws in 23 official EU languages (some of them relatively low-resource). Each EU law has been annotated with EUROVOC concepts (labels) by the Publications Office of the EU. As with the English EURLEX, the goal is to predict the relevant EUROVOC concepts (labels); this is a multi-label classification task (given the text, predict multiple labels).

Size_categories:10M<n<100M, Modality:text, Library:datasets, Library:mlcroissant

118K 6

Updated 2026-05-08 Source available

moussaKam/mbarthez HF Unverified

Fill Mask, Transformers, PyTorch, Mbart, Text2text-Generation, Summarization | MEDIUM

118K 8

Updated 2026-06-30

tau/splinter-base HF Unverified

Question Answering, Transformers, PyTorch, Splinter, SplinterModel, English | MEDIUM

117K 1

Updated 2026-06-30

black-forest-labs/FLUX.1-Kontext-dev HF PQC Verified

Image-To-Image, Diffusers, Safetensors, Image Generation, Flux, Diffusion-Single-File | HIGH

116K 2,574

Updated 2026-03-26

PekingU/rtdetr_r101vd_coco_o365 HF Unverified

Object-Detection, Transformers, Safetensors, Rt_detr, Vision, English | MEDIUM

114K 18

Updated 2026-05-06

Williamsanderson/MedQA-Darija-MultiLingual HF Unverified

MedQA-Darija-MultiLingual. The largest open trilingual medical Q&A dataset with directly playable speech audio for English, French, and Moroccan Darija. A research dataset for the BRAIN HEALTH initiative, designed for multilingual medical NLP, low-resource speech recognition, healthcare chatbots, and clinical education tools targeting Morocco and the broader Maghreb region. The dataset is currently in its scientific validation phase. After programmatic validation (Stage 1 LOF outlier… See the full description on the dataset page: https://huggingface.co/datasets/Williamsanderson/MedQA-Darija-MultiLingual.

Task_categories:question-Answering, Task_categories:automatic-Speech-Recognition, Task_categories:text-To-Speech, Language:ar, Language:fr, Language:en

112K 4

Updated 2026-06-30 Source available

MMMU/MMMU HF Unverified

MMMU (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI). 🔔 News: 🛠️ [2026-04-21]: Fixed option issue in test_Psychology_15. ‼️ [2026-02-12]: We have released the answers for the test set! You can now evaluate your models on the test set locally! 🎉 🛠️ [2024-05-30]: Fixed duplicate option issues in Materials dataset items (validation_Materials_25;… See the full description on the dataset page: https://huggingface.co/datasets/MMMU/MMMU.

Task_categories:question-Answering, Task_categories:visual-Question-Answering, Task_categories:multiple-Choice, Language:en, Size_categories:10K<n<100K, Format:parquet

112K 325

Updated 2026-05-08 Source available

nguyenvulebinh/vi-mrc-large HF Unverified

Question Answering, Transformers, PyTorch, Roberta, Vn, Vi | HIGH

110K 6

Updated 2026-04-26

Wan-AI/Wan2.2-TI2V-5B-Diffusers HF Unverified

Text-To-Video, Diffusers, Safetensors, Diffusers:WanPipeline, English, Chinese | HIGH

109K 144

Updated 2026-06-27

Wan-AI/Wan2.2-T2V-A14B-Diffusers HF Unverified

Text-To-Video, Diffusers, Safetensors, Diffusers:WanPipeline | HIGH

107K 140

Updated 2026-06-30

ILSVRC/imagenet-1k HF Unverified

Dataset Card for ImageNet. Dataset Summary: ILSVRC 2012, commonly known as 'ImageNet', is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a "synonym set" or "synset". There are more than 100,000 synsets in WordNet, the majority of them nouns (80,000+). ImageNet aims to provide on average 1000 images to illustrate each synset. Images of each concept are… See the full description on the dataset page: https://huggingface.co/datasets/ILSVRC/imagenet-1k.

Task_categories:image-Classification, Task_ids:multi-Class-Image-Classification, Annotations_creators:crowdsourced, Language_creators:crowdsourced, Multilinguality:monolingual, Source_datasets:original

107K 844

Updated 2026-06-30 Source available

black-forest-labs/FLUX.2-klein-9B HF PQC Verified

Image-To-Image, Diffusers, Safetensors, Image Generation, Image-Editing, Flux | HIGH

107K 598

Updated 2026-04-05

ali-vilab/text-to-video-ms-1.7b HF Unverified

Text-To-Video, Diffusers, Safetensors, Diffusers:TextToVideoSDPipeline | HIGH

106K 666

Updated 2026-06-30

HuggingFaceM4/FineVision HF Unverified

Fine Vision. FineVision is a massive collection of datasets with 17.3M images, 24.3M samples, 88.9M turns, and 9.5B answer tokens, designed for training state-of-the-art open Vision-Language-Models. More detail can be found in the blog post: https://huggingface.co/spaces/HuggingFaceM4/FineVision. Load the data: from datasets import load_dataset, get_dataset_config_names # Get all subset names and load the first one available_subsets =… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceM4/FineVision.

Size_categories:10M<n<100M, Format:parquet, Modality:image, Modality:text, Library:datasets, Library:dask

106K 498

Updated 2026-06-30 Source available

sebastiandizon/genius-song-lyrics HF Unverified

Size_categories:1M<n<10M, Format:csv, Modality:tabular, Modality:text, Library:datasets, Library:pandas

106K 33

Updated 2026-06-27 Source available

ServiceNow/GroundCUA HF Unverified

GroundCUA: Grounding Computer Use Agents on Human Demonstrations. GroundCUA is a large and diverse dataset of real UI screenshots paired with structured annotations for building multimodal computer use agents. It covers 87 software platforms across productivity tools, browsers, creative tools, communication apps, development environments, and system utilities. GroundCUA is designed for research on GUI… See the full description on the dataset page: https://huggingface.co/datasets/ServiceNow/GroundCUA.

Task_categories:image-To-Text, Language:en, Size_categories:1M<n<10M, Modality:image, Computer_use, Agents

106K 34

Updated 2026-05-08 Source available

google-research-datasets/paws HF Unverified

Dataset Card for PAWS: Paraphrase Adversaries from Word Scrambling. Dataset Summary: This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that highlight the importance of modeling structure, context, and word order information for the problem of paraphrase identification. The dataset has two subsets, one based on Wikipedia and the other based on the Quora Question Pairs (QQP) dataset. For further… See the full description on the dataset page: https://huggingface.co/datasets/google-research-datasets/paws.

Task_categories:text-Classification, Task_ids:semantic-Similarity-Classification, Task_ids:semantic-Similarity-Scoring, Task_ids:text-Scoring, Task_ids:multi-Input-Text-Classification, Annotations_creators:expert-Generated

100K 40

Updated 2026-06-30 Source available

monologg/koelectra-base-v3-finetuned-korquad HF Unverified

Question Answering, Transformers, PyTorch, Safetensors, Electra | MEDIUM

98K 6

Updated 2026-06-30

mlfoundations/MINT-1T-PDF-CC-2023-50 HF PQC Verified

🍃 MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens. 🍃 MINT-1T is an open-source Multimodal INTerleaved dataset with 1 trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. 🍃 MINT-1T is designed to facilitate research in multimodal pretraining. 🍃 MINT-1T is created by a team from the University of Washington in… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations/MINT-1T-PDF-CC-2023-50.

Task_categories:image-To-Text, Task_categories:text-Generation, Language:en, Size_categories:1M<n<10M, Format:webdataset, Modality:image

98K 13

Updated 2026-05-03 Source available

fixie-ai/common_voice_17_0 HF Unverified

Size_categories:10M<n<100M, Format:parquet, Modality:audio, Modality:text, Library:datasets, Library:dask

97K 17

Updated 2026-06-30 Source available

Showing 20 of 665 items (page 26 of 34)
