Model Hub
Browse PQC-verified AI models, datasets, and tools
GroundCUA: Grounding Computer Use Agents on Human Demonstrations. Website | Paper | Dataset | Models. GroundCUA Dataset: GroundCUA is a large and diverse dataset of real UI screenshots paired with structured annotations for building multimodal computer use agents. It covers 87 software platforms across productivity tools, browsers, creative tools, communication apps, development environments, and system utilities. GroundCUA is designed for research on GUI… See the full description on the dataset page: https://huggingface.co/datasets/ServiceNow/GroundCUA.
Dataset Card for GPQA. GPQA is a multiple-choice Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions outside their own domain (e.g., a physicist answering a chemistry question), these experts reach only 34% accuracy, despite spending more than 30 minutes with full access to Google. We request that you do not reveal examples from this dataset in plain text or images online, to reduce the risk of leakage into foundation model… See the full description on the dataset page: https://huggingface.co/datasets/Idavidrein/gpqa.
LLaVA-OneVision-1.5 Instruction Data. Paper | Code. Introduction: This dataset, LLaVA-OneVision-1.5-Instruct, was collected and integrated during the development of LLaVA-OneVision-1.5. LLaVA-OneVision-1.5 is a novel family of Large Multimodal Models (LMMs) that achieve state-of-the-art performance with significantly reduced computational and financial costs. This meticulously curated 22M instruction dataset (LLaVA-OneVision-1.5-Instruct) is part of a comprehensive and… See the full description on the dataset page: https://huggingface.co/datasets/mvp-lab/LLaVA-OneVision-1.5-Instruct-Data.
Introduction. TL;DR: DreamDojo is a generalist robot world model pretrained on 44k hours of human egocentric data, showing unprecedented generalization to diverse objects and environments. Project page: https://dreamdojo-world.github.io/ Paper: https://arxiv.org/abs/2602.06949 Code: https://github.com/NVIDIA/DreamDojo How to Use: Check out https://github.com/NVIDIA/DreamDojo Citation: @article{gao2026dreamdojo, title={DreamDojo: A Generalist Robot… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-GR00T-Teleop-GR1.
OpenMathInstruct-2. OpenMathInstruct-2 is a math instruction tuning dataset with 14M problem-solution pairs generated using the Llama3.1-405B-Instruct model. The training set problems of GSM8K and MATH are used to construct the dataset in the following ways: Solution augmentation: generating chain-of-thought solutions for training set problems in GSM8K and MATH. Problem-solution augmentation: generating new problems, followed by solutions for these new problems. … See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-2.
Dataset Card for "rotten_tomatoes". Dataset Summary: Movie Review Dataset. This is a dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. This data was first used in Bo Pang and Lillian Lee, "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales", Proceedings of the ACL, 2005. Supported Tasks and Leaderboards: More Information Needed. Languages… See the full description on the dataset page: https://huggingface.co/datasets/cornell-movie-review-data/rotten_tomatoes.