Model Hub

Browse PQC-verified AI models, datasets, and tools

CompVis/stable-diffusion-v1-4 (HF) · PQC Verified
Text-to-Image, Diffusers, Safetensors, Stable-Diffusion, Stable-Diffusion-Diffusers, Diffusers:StableDiffusionPipeline · CRITICAL

facebook/vjepa2-vitg-fpc64-256 (HF) · Unverified
Video-Classification, Transformers, Safetensors, Vjepa2, Feature Extraction, Video · HIGH

SWivid/F5-TTS (HF) · PQC Verified
Text-To-Speech, F5-Tts · HIGH

nvidia/segformer-b0-finetuned-ade-512-512 (HF) · Unverified
Image-Segmentation, Transformers, PyTorch, Tf, Safetensors, Segformer · MEDIUM

John6666/diving-illustrious-real-asian-v50-sdxl (HF) · PQC Verified
Text-to-Image, Diffusers, Safetensors, Stable-Diffusion, Stable-Diffusion-Xl, Realistic · HIGH

microsoft/VibeVoice-1.5B (HF) · Unverified
Text-To-Speech, Transformers, Safetensors, Vibevoice, Text Generation, Podcast · HIGH

nvidia/SAGE-10k (HF) · Unverified
SAGE-10k is a large-scale interactive indoor scene dataset featuring realistic layouts, generated by the agentic pipeline introduced in "SAGE: Scalable Agentic 3D Scene Generation for Embodied AI". The dataset contains 10,000 diverse scenes spanning 50 room types and styles, along with 565K uniquely generated 3D objects. Key features: SAGE-10k integrates a wide variety of scenes and, in particular, preserves small items for… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/SAGE-10k.
Task_categories: text-to-3d, Language: en, Size_categories: 10K<n<100K, Scene-Generation, Interactive-Scenes, Embodied-AI

nvidia/segformer-b1-finetuned-ade-512-512 (HF) · Unverified
Image-Segmentation, Transformers, PyTorch, Tf, Segformer, Vision · MEDIUM

anon8231489123/ShareGPT_Vicuna_unfiltered (HF) · Unverified
Further cleaning done. Please look through the dataset and confirm that nothing was missed. Update: confirmed working method for training the model: https://huggingface.co/AlekseyKorshuk/vicuna-7b/discussions/4#64346c08ef6d5abefe42c12c. Two choices: a split that removes instances of "I'm sorry, but" (https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/blob/main/ShareGPT_V3_unfiltered_cleaned_split_no_imsorry.json) and a split that keeps instances of "I'm sorry, but":… See the full description on the dataset page: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered.
Language: en

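The two linked splits above differ only in whether conversations containing the refusal phrase are kept. A minimal sketch of that cleaning step, assuming the standard ShareGPT JSON layout (a list of records, each with a "conversations" list of turns whose text sits under "value"); the function name is illustrative, not the dataset author's actual script:

```python
# Illustrative sketch of the "no_imsorry" cleaning step described on the card.
# Assumes the standard ShareGPT layout: records with a "conversations" list,
# each turn holding its text under "value". Not the author's actual script.
REFUSAL = "I'm sorry, but"

def drop_refusal_conversations(records):
    """Keep only records in which no turn contains the refusal phrase."""
    return [
        rec for rec in records
        if not any(REFUSAL in turn.get("value", "")
                   for turn in rec.get("conversations", []))
    ]
```

Applied to the full JSON split, a filter like this would reproduce the difference between the two files: the filtered variant simply omits every conversation with at least one refusal turn.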
nvidia/Nemotron-CC-v2 (HF) · Unverified
Nemotron-Pre-Training-Dataset-v1 release. This pretraining dataset for generative AI model training preserves high-value math and code while enriching it with diverse multilingual Q&A, fueling the next generation of intelligent, globally capable models. It supports NVIDIA Nemotron Nano 2, a family of large language models (LLMs) consisting of NVIDIA-Nemotron-Nano-9B-v2, NVIDIA-Nemotron-Nano-9B-v2-Base, and NVIDIA-Nemotron-Nano-12B-v2-Base… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/Nemotron-CC-v2.
Task_categories: text-generation, Size_categories: 1B<n<10B, Format: parquet, Modality: text, Library: datasets, Library: dask

mvp-lab/LLaVA-OneVision-1.5-Mid-Training-85M (HF) · Unverified
The LLaVA-One-Vision-1.5-Mid-Training-85M dataset is being uploaded. Upload status: all completed (ImageNet-21k, LAIONCN, DataComp-1B, Zero250M, COYO700M, SA-1B, MINT, Obelics). If you find LLaVA-One-Vision-1.5-Mid-Training-85M useful in your research, please consider citing the related papers: @misc{an2025llavaonevision15fullyopenframework, title={LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training}… See the full description on the dataset page: https://huggingface.co/datasets/mvp-lab/LLaVA-OneVision-1.5-Mid-Training-85M.
Size_categories: 10M<n<100M, Format: parquet, Modality: image, Modality: text, Library: datasets, Library: dask

Idavidrein/gpqa (HF) · Unverified
GPQA is a multiple-choice Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions outside their own domain (e.g., a physicist answering a chemistry question), these experts reach only 34% accuracy, despite spending more than 30 minutes per question with full access to Google. The authors request that you do not reveal examples from this dataset in plain text or images online, to reduce the risk of leakage into foundation model… See the full description on the dataset page: https://huggingface.co/datasets/Idavidrein/gpqa.
Benchmark: official, Benchmark: eval-yaml, Task_categories: question-answering, Task_categories: text-generation, Language: en, Size_categories: 1K<n<10K

nguyenvulebinh/vi-mrc-large (HF) · Unverified
Question Answering, Transformers, PyTorch, Roberta, Vn, Vi · HIGH

facebook/vjepa2-vitl-fpc64-256 (HF) · Unverified
Video-Classification, Transformers, Safetensors, Vjepa2, Feature Extraction, Video · HIGH

ServiceNow/GroundCUA (HF) · Unverified
GroundCUA: Grounding Computer Use Agents on Human Demonstrations (Website | Paper | Dataset | Models). GroundCUA is a large and diverse dataset of real UI screenshots paired with structured annotations for building multimodal computer-use agents. It covers 87 software platforms across productivity tools, browsers, creative tools, communication apps, development environments, and system utilities, and is designed for research on GUI… See the full description on the dataset page: https://huggingface.co/datasets/ServiceNow/GroundCUA.
Task_categories: image-to-text, Language: en, Size_categories: 1M<n<10M, Modality: image, Computer_use, Agents

HuggingFaceM4/FineVision (HF) · Unverified
FineVision is a massive collection of datasets with 17.3M images, 24.3M samples, 88.9M turns, and 9.5B answer tokens, designed for training state-of-the-art open vision-language models. More detail can be found in the blog post: https://huggingface.co/spaces/HuggingFaceM4/FineVision. Loading the data: from datasets import load_dataset, get_dataset_config_names  # Get all subset names and load the first one: available_subsets =… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceM4/FineVision.
Size_categories: 10M<n<100M, Format: parquet, Modality: image, Modality: text, Library: datasets, Library: dask

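The FineVision card's loading snippet is cut off in this listing. A hedged sketch of the same pattern (list all subset names, then load the first), written with the loader and config-lister passed in as parameters so it can be exercised without downloading anything; with the Hugging Face `datasets` library installed, you would pass in `load_dataset` and `get_dataset_config_names`:

```python
# Sketch of the card's loading pattern: list all subset (config) names for a
# repo, then load the first one. The two callables are injected so the sketch
# runs offline; with `datasets` installed you would pass load_dataset and
# get_dataset_config_names from that library.
def load_first_subset(repo_id, load_fn, list_configs_fn):
    """Return (subset_name, dataset) for the first available subset."""
    subsets = list_configs_fn(repo_id)
    if not subsets:
        raise ValueError(f"no subsets found for {repo_id}")
    first = subsets[0]
    return first, load_fn(repo_id, first, split="train")
```

With the real library: from datasets import load_dataset, get_dataset_config_names, then load_first_subset("HuggingFaceM4/FineVision", load_dataset, get_dataset_config_names). For a collection in the 10M-100M sample range, streaming mode (streaming=True on load_dataset) is worth considering to avoid a full download.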
mvp-lab/LLaVA-OneVision-1.5-Instruct-Data (HF) · Unverified
LLaVA-OneVision-1.5 instruction data (Paper | Code). This dataset, LLaVA-OneVision-1.5-Instruct, was collected and integrated during the development of LLaVA-OneVision-1.5, a novel family of Large Multimodal Models (LMMs) that achieves state-of-the-art performance with significantly reduced computational and financial costs. This meticulously curated 22M-sample instruction dataset is part of a comprehensive and… See the full description on the dataset page: https://huggingface.co/datasets/mvp-lab/LLaVA-OneVision-1.5-Instruct-Data.
Task_categories: image-text-to-text, Language: en, Size_categories: 10M<n<100M, Modality: image, Modality: text, Multimodal

nvidia/PhysicalAI-Robotics-GR00T-Teleop-GR1 (HF) · Unverified
TL;DR: DreamDojo is a generalist robot world model pretrained on 44k hours of human egocentric data, showing unprecedented generalization to diverse objects and environments. Project page: https://dreamdojo-world.github.io/. Paper: https://arxiv.org/abs/2602.06949. Code and usage: https://github.com/NVIDIA/DreamDojo. Citation: @article{gao2026dreamdojo, title={DreamDojo: A Generalist Robot… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-GR00T-Teleop-GR1.
Size_categories: 1M<n<10M, Format: parquet, Modality: tabular, Modality: video, Library: datasets, Library: dask

ali-vilab/text-to-video-ms-1.7b (HF) · Unverified
Text-To-Video, Diffusers, Safetensors, Diffusers:TextToVideoSDPipeline · HIGH

nvidia/Alpamayo-1.5-10B (HF) · Unverified
Robotics, Safetensors, Alpamayo1_5, Base_model: nvidia/Cosmos-Reason2-8B, Base_model:finetune: nvidia/Cosmos-Reason2-8B, English · HIGH

Showing 20 of 53 items (page 2 of 3)