Model Hub

Browse PQC-verified AI models, datasets, and tools

chenxran/uspto_full HF Unverified

Dataset Card for "uspto_full" More Information needed

Size_categories:1M<n<10MFormat:parquetModality:textLibrary:datasetsLibrary:pandasLibrary:mlcroissant
HKUSTAudio/Audio-FLAN-Dataset HF Unverified

Audio-FLAN Dataset (Paper) (the FULL audio files and jsonl files are still updating) An Instruction-Tuning Dataset for Unified Audio Understanding and Generation Across Speech, Music, and Sound. 1. Dataset Structure The Audio-FLAN-Dataset has the following directory structure: Audio-FLAN-Dataset/ ├── audio_files/ │ ├── audio/ │ │ └── 177_TAU_Urban_Acoustic_Scenes_2022/ │ │ └── 179_Audioset_for_Audio_Inpainting/ │ │ └── ... │ ├── music/ │ │ └──… See the full description on the dataset page: https://huggingface.co/datasets/HKUSTAudio/Audio-FLAN-Dataset.

Task_categories:text-To-SpeechTask_categories:text-To-AudioTask_categories:automatic-Speech-RecognitionLanguage:enLanguage:zhSize_categories:10M<n<100M
P
PekingU/rtdetr_v2_r18vd HF Unverified

Object-DetectionTransformersSafetensorsRt_detr_v2VisionEnglish MEDIUM
H
human-centered-summarization/financial-summarization-pegasus HF Unverified

SummarizationTransformersPyTorchTfSafetensorsPegasus HIGH
HuggingFaceFW/fineweb-2 HF Unverified

🥂 FineWeb2 A sparkling update with 1000s of languages What is it? This is the second iteration of the popular 🍷 FineWeb dataset, bringing high quality pretraining data to over 1000 🗣️ languages. The 🥂 FineWeb2 dataset is fully reproducible, available under the permissive ODC-By 1.0 license and extensively validated through hundreds of ablation experiments. In particular, on the set of 9 diverse languages we used to guide our processing decisions, 🥂 FineWeb2… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/fineweb-2.

Task_categories:text-GenerationLanguage:aaiLanguage:aakLanguage:aauLanguage:aazLanguage:aba
Beijing-AISI/panda-bench HF Unverified

PandaBench PandaBench is a comprehensive benchmark for evaluating Large Language Model (LLM) safety, focusing on jailbreak attacks, defense mechanisms, and evaluation methodologies. The PandaGuard framework architecture illustrating the end-to-end pipeline for LLM safety evaluation. The system connects three key components: Attackers, Defenders, and Judges. Dataset Description This repository contains the benchmark results from extensive evaluations of various LLMs… See the full description on the dataset page: https://huggingface.co/datasets/Beijing-AISI/panda-bench.

Task_categories:text-GenerationLanguage:enSize_categories:100K<n<1MFormat:csvModality:tabularModality:text
google/IFEval HF Unverified

Dataset Card for IFEval Dataset Summary This dataset contains the prompts used in the Instruction-Following Eval (IFEval) benchmark for large language models. It contains around 500 "verifiable instructions" such as "write in more than 400 words" and "mention the keyword of AI at least 3 times" which can be verified by heuristics. To load the dataset, run: from datasets import load_dataset ifeval = load_dataset("google/IFEval") Supported Tasks and… See the full description on the dataset page: https://huggingface.co/datasets/google/IFEval.

Task_categories:text-GenerationLanguage:enSize_categories:n<1KFormat:jsonModality:textLibrary:datasets
M
microsoft/xclip-base-patch32 HF Unverified

Video-ClassificationTransformersPyTorchSafetensorsXclipVision HIGH
ibrahimhamamci/CT-RATE HF Unverified

The CT-RATE Team organizes the VLM3D Challenge VLM3D 2026 (2nd Edition) → Challenge Finals at MICCAI 2026 VLM3D 2025 (1st Edition) → Challenge Finals at MICCAI 2025 • Workshop at ICCV 2025 The CT-RATE Team is developing the MR-RATE Dataset A large-scale brain MRI dataset with paired radiology reports for training 3D vision-language models. GitHub   |   Dataset   |   Metadata Dashboard Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography… See the full description on the dataset page: https://huggingface.co/datasets/ibrahimhamamci/CT-RATE.

Task_categories:image-To-TextTask_categories:text-To-ImageTask_categories:image-ClassificationTask_categories:question-AnsweringTask_categories:visual-Question-AnsweringTask_categories:zero-Shot-Classification
D
distilbert/distilbert-base-uncased-distilled-squad HF Unverified

Question AnsweringTransformersPyTorchTfTfliteCoreml HIGH
HuggingFaceFW/finetranslations HF Unverified

💬 FineTranslations The world's knowledge in 1+1T tokens of parallel text What is it? This dataset contains over 1 trillion tokens of parallel text in English and 500+ languages. It was obtained by translating data from 🥂 FineWeb2 into English using Gemma3 27B. We relied on datatrove's inference runner to deploy a synthetic data pipeline at scale. Its checkpointing and VLLM lifecycle management features allowed us to use leftover compute from the HF cluster… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/finetranslations.

Task_categories:text-GenerationTask_categories:translationLanguage:abkLanguage:abqLanguage:absLanguage:acm
jobs-git/Zyda-2 HF Unverified

Zyda-2 Zyda-2 is a 5 trillion token language modeling dataset created by collecting open and high quality datasets and combining them and cross-deduplication and model-based quality filtering. Zyda-2 comprises diverse sources of web data, highly educational content, math, code, and scientific papers. To construct Zyda-2, we took the best open-source datasets available: Zyda, FineWeb, DCLM, and Dolma. Models trained on Zyda-2 significantly outperform identical models trained on the… See the full description on the dataset page: https://huggingface.co/datasets/jobs-git/Zyda-2.

Task_categories:text-GenerationLanguage:enSize_categories:n>1T
Muennighoff/multi_eurlex HF Unverified

MultiEURLEX comprises 65k EU laws in 23 official EU languages (some low-ish resource). Each EU law has been annotated with EUROVOC concepts (labels) by the Publication Office of EU. As with the English EURLEX, the goal is to predict the relevant EUROVOC concepts (labels); this is multi-label classification task (given the text, predict multiple labels).

Size_categories:10M<n<100MModality:textLibrary:datasetsLibrary:mlcroissant
B
black-forest-labs/FLUX.1-Kontext-dev HF PQC Verified

Image-To-ImageDiffusersSafetensorsImage GenerationFluxDiffusion-Single-File HIGH
P
PekingU/rtdetr_r101vd_coco_o365 HF Unverified

Object-DetectionTransformersSafetensorsRt_detrVisionEnglish MEDIUM
fancyzhx/ag_news HF Unverified

Dataset Card for "ag_news" Dataset Summary AG is a collection of more than 1 million news articles. News articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July, 2004. The dataset is provided by the academic comunity for research purposes in data mining (clustering, classification, etc), information retrieval (ranking, search, etc), xml… See the full description on the dataset page: https://huggingface.co/datasets/fancyzhx/ag_news.

Task_categories:text-ClassificationTask_ids:topic-ClassificationAnnotations_creators:foundLanguage_creators:foundMultilinguality:monolingualSource_datasets:original
MMMU/MMMU HF Unverified

MMMU (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI) 🌐 Homepage | 🏆 Leaderboard | 🤗 Dataset | 🤗 Paper | 📖 arXiv | GitHub 🔔News 🛠️[2026-04-21]: Fixed option issue in test_Psychology_15. ‼️[2026-02-12]: We have released the answers for the test set! You can now evaluate your models on the test set locally! 🎉 🛠️[2024-05-30]: Fixed duplicate option issues in Materials dataset items (validation_Materials_25;… See the full description on the dataset page: https://huggingface.co/datasets/MMMU/MMMU.

Task_categories:question-AnsweringTask_categories:visual-Question-AnsweringTask_categories:multiple-ChoiceLanguage:enSize_categories:10K<n<100KFormat:parquet
N
nguyenvulebinh/vi-mrc-large HF Unverified

Question AnsweringTransformersPyTorchRobertaVnVi HIGH
B
black-forest-labs/FLUX.2-klein-9B HF PQC Verified

Image-To-ImageDiffusersSafetensorsImage GenerationImage-EditingFlux HIGH
F
facebook/vjepa2-vitl-fpc64-256 HF Unverified

Video-ClassificationTransformersSafetensorsVjepa2Feature ExtractionVideo HIGH
Showing 20 of 531 items (page 21 of 27)