Model Hub

Browse PQC-verified AI models, datasets, and tools

JAT Dataset. Dataset description: The Jack of All Trades (JAT) dataset combines a wide range of individual datasets. It includes demonstrations by expert RL agents, image and caption pairs, textual data and more. The JAT dataset is part of the JAT project, which aims to build a multimodal generalist agent. Paper: https://huggingface.co/papers/2402.09844 Usage: >>> from datasets import load_dataset >>> dataset = load_dataset("jat-project/jat-dataset"… See the full description on the dataset page: https://huggingface.co/datasets/jat-project/jat-dataset.

Task_categories:reinforcement-Learning, Task_categories:text-Generation, Task_categories:question-Answering, Annotations_creators:found, Annotations_creators:machine-Generated, Source_datasets:conceptual-Captions

548K 63

Updated 2026-06-29 Source available

jat-project/jat-dataset-tokenized HF Unverified

Dataset Card for "jat-dataset-tokenized". More information needed.

Size_categories:10M<n<100M, Format:parquet, Modality:timeseries, Library:datasets, Library:dask, Library:mlcroissant

333K 15

Updated 2026-06-29 Source available

google-research-datasets/mbpp HF Unverified

Dataset Card for Mostly Basic Python Problems (mbpp). Dataset summary: The benchmark consists of around 1,000 crowd-sourced Python programming problems, designed to be solvable by entry-level programmers, covering programming fundamentals, standard library functionality, and so on. Each problem consists of a task description, a code solution, and 3 automated test cases. As described in the paper, a subset of the data has been hand-verified by us. Released here as part of… See the full description on the dataset page: https://huggingface.co/datasets/google-research-datasets/mbpp.

Annotations_creators:crowdsourced, Annotations_creators:expert-Generated, Language_creators:crowdsourced, Language_creators:expert-Generated, Multilinguality:monolingual, Source_datasets:original

186K 230

Updated 2026-05-08 Source available
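
The record layout described in the mbpp entry above (a task description, a code solution, and three automated test cases) can be exercised with a few lines of Python. This is a minimal sketch, not the benchmark's official harness: the field names `text`, `code`, and `test_list` and the sample problem are assumptions made for illustration, and `exec` should only be run on code you trust.

```python
# Hypothetical record shaped like the entry describes:
# a task description, a reference solution, and three test cases.
sample_problem = {
    "text": "Write a function to return the larger of two numbers.",
    "code": "def max_of_two(a, b):\n    return a if a >= b else b\n",
    "test_list": [
        "assert max_of_two(1, 2) == 2",
        "assert max_of_two(5, 3) == 5",
        "assert max_of_two(-1, -1) == -1",
    ],
}


def passes_all_tests(problem: dict) -> bool:
    """Run the solution, then each assert, in one fresh namespace."""
    namespace: dict = {}
    try:
        exec(problem["code"], namespace)  # defines the solution function
        for test in problem["test_list"]:
            exec(test, namespace)  # raises AssertionError on a failed case
    except Exception:
        return False
    return True


print(passes_all_tests(sample_problem))
```

Running every assert in the same namespace as the solution keeps the sketch short; a real evaluation would normally add a timeout and process isolation.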

zekaiwang/trex_dataset HF Unverified

T-Rex Dataset: a large-scale, tactile-reactive bimanual manipulation dataset, collected via teleoperation on a Dexmate Vega-1 robot with two Sharpa Wave dexterous hands. Stored as a LeRobotDataset v3.0. [Figure: one episode from each of 20 motor primitives (head-camera view, cropped to the workspace), each with a different object.] Teleoperation setup: Manus gloves + VIVE… See the full description on the dataset page: https://huggingface.co/datasets/zekaiwang/trex_dataset.

Task_categories:robotics, Language:en, Size_categories:1M<n<10M, Format:parquet, Modality:tabular, Modality:text

140K 6

Updated 2026-06-29 Source available

HKUSTAudio/Audio-FLAN-Dataset HF Unverified

Audio-FLAN Dataset (the full audio files and jsonl files are still being updated). An Instruction-Tuning Dataset for Unified Audio Understanding and Generation Across Speech, Music, and Sound. 1. Dataset Structure. The Audio-FLAN-Dataset has the following directory structure:

Audio-FLAN-Dataset/
├── audio_files/
│   ├── audio/
│   │   └── 177_TAU_Urban_Acoustic_Scenes_2022/
│   │   └── 179_Audioset_for_Audio_Inpainting/
│   │   └── ...
│   ├── music/
│   │   └──…

See the full description on the dataset page: https://huggingface.co/datasets/HKUSTAudio/Audio-FLAN-Dataset.

Task_categories:text-To-Speech, Task_categories:text-To-Audio, Task_categories:automatic-Speech-Recognition, Language:en, Language:zh, Size_categories:10M<n<100M

123K 42

Updated 2026-04-23 Source available

legacy-datasets/wikipedia HF Unverified

Wikipedia dataset containing cleaned articles of all languages. The datasets are built from the Wikipedia dump (https://dumps.wikimedia.org/) with one split per language. Each example contains the content of one full Wikipedia article with cleaning to strip markdown and unwanted sections (references, etc.).

Task_categories:text-Generation, Task_categories:fill-Mask, Task_ids:language-Modeling, Task_ids:masked-Language-Modeling, Annotations_creators:no-Annotation, Language_creators:crowdsourced

120K 645

Updated 2026-06-28 Source available

google-research-datasets/paws HF Unverified

Dataset Card for PAWS: Paraphrase Adversaries from Word Scrambling. Dataset summary: This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, and word order information for the problem of paraphrase identification. The dataset has two subsets, one based on Wikipedia and the other one based on the Quora Question Pairs (QQP) dataset. For further… See the full description on the dataset page: https://huggingface.co/datasets/google-research-datasets/paws.

Task_categories:text-Classification, Task_ids:semantic-Similarity-Classification, Task_ids:semantic-Similarity-Scoring, Task_ids:text-Scoring, Task_ids:multi-Input-Text-Classification, Annotations_creators:expert-Generated

99K 40

Updated 2026-06-29 Source available

labofsahil/pypi-packages-metadata-dataset HF Unverified

Size_categories:10M<n<100M, Modality:text

87K 0

Updated 2026-06-29 Source available

JosephusCheung/GuanacoDataset HF Unverified

Sorry, it's no longer available on Hugging Face. Please reach out to those who have already downloaded it. If you have a copy, please refrain from re-uploading it to Hugging Face. The people here don't deserve it. See also: https://twitter.com/RealJosephus/status/1779913520529707387 GuanacoDataset News: We're heading towards multimodal VQA, with blip2-flan-t5-xxl Alignment to Guannaco 7B LLM. Still under construction: GuanacoVQA weight & GuanacoVQA Dataset Notice: Effective… See the full description on the dataset page: https://huggingface.co/datasets/JosephusCheung/GuanacoDataset.

Task_categories:text-Generation, Task_categories:question-Answering, Language:zh, Language:en, Language:ja, Language:de

81K 516

Updated 2026-05-08 Source available

amphion/Emilia-Dataset HF Unverified

Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation This is the official repository 👑 for the Emilia dataset and the source code for the Emilia-Pipe speech data preprocessing pipeline. News 🔥 2025/02/26: The Emilia-Large dataset, featuring over 200,000 hours of data, is now available!!! Emilia-Large combines the original 101k-hour Emilia dataset (licensed under CC BY-NC 4.0) with the brand-new 114k-hour Emilia-YODAS… See the full description on the dataset page: https://huggingface.co/datasets/amphion/Emilia-Dataset.

Task_categories:text-To-Speech, Task_categories:automatic-Speech-Recognition, Language:zh, Language:en, Language:ja, Language:fr

80K 460

Updated 2026-06-29 Source available

muset-ai/DeepResearch-Bench-II-Dataset HF Unverified

Task_categories:text-Generation, Task_categories:text-Classification, Language:zh, Language:en, Size_categories:n<1K, Modality:document

60K 2

Updated 2026-06-29 Source available

a2015003713/military-aircraft-detection-dataset HF Unverified

Military Aircraft Detection Dataset. Military aircraft detection dataset in COCO and YOLO format, synchronized from the original Kaggle dataset: https://www.kaggle.com/datasets/a2015003713/militaryaircraftdetectiondataset

Task_categories:object-Detection, Task_categories:image-Classification, Task_categories:image-Feature-Extraction, Size_categories:10K<n<100K, Format:text, Modality:image

51K 1

Updated 2026-05-08 Source available

codraja2006/tomato-leaves-dataset HF Unverified

Tomato Leaves Dataset. Overview: This dataset contains images of tomato leaves categorized into different classes based on the type of disease or health condition. The dataset is divided into training, validation, and test sets, with a ratio of 8:1:1. The classes include various diseases as well as healthy leaves. The dataset includes both augmented and non-augmented images. Dataset structure: The dataset is organized into three main splits: train, validation, test… See the full description on the dataset page: https://huggingface.co/datasets/codraja2006/tomato-leaves-dataset.

Task_categories:feature-Extraction, Task_categories:image-Classification, Language:en, Size_categories:n<1K, Modality:image, Tomato

48K 0

Updated 2026-04-30 Source available
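
The 8:1:1 training/validation/test ratio mentioned in the tomato-leaves entry above can be illustrated with a small, seeded split. This is only a sketch under stated assumptions: the listing does not say how the authors produced their splits (for example, whether they stratified by class), and the function name `split_8_1_1` and the file names are hypothetical.

```python
import random


def split_8_1_1(items, seed=0):
    """Shuffle items and cut them into train/validation/test at 80%/10%/10%."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # remainder goes to test
    }


# Hypothetical file names standing in for real images.
files = [f"leaf_{i:03d}.jpg" for i in range(100)]
splits = split_8_1_1(files)
print({name: len(part) for name, part in splits.items()})
# → {'train': 80, 'validation': 10, 'test': 10}
```

For a multi-class image dataset, applying the same cut per class would keep the class balance similar across splits.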

ChengyouJia/agentic-critic-dataset HF Unverified

Agentic Critic Dataset. High-quality AIGC images with rich metadata for aesthetic evaluation. Metadata fields: each entry in metadata.jsonl contains prompt (positive prompt), negative_prompt (negative prompt), model (model name and hash), sampler (sampling method), steps (generation steps), cfg_scale (CFG scale), seed (random seed), stats (engagement metrics), and image_path (relative path to image). Usage: from datasets import load_dataset dataset =… See the full description on the dataset page: https://huggingface.co/datasets/ChengyouJia/agentic-critic-dataset.

Task_categories:image-Classification, Task_categories:text-To-Image, Size_categories:n<1K, Aigc, Civitai, Aesthetic

48K 0

Updated 2026-04-25 Source available
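
The metadata.jsonl fields listed in the agentic-critic entry above map naturally onto a small record type. The following is a hedged sketch: the field names come from the listing, but the Python types, the `MetadataEntry` and `parse_metadata` names, and every sample value are assumptions for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class MetadataEntry:
    # Field names follow the listing; the types are assumed.
    prompt: str
    negative_prompt: str
    model: str
    sampler: str
    steps: int
    cfg_scale: float
    seed: int
    stats: dict
    image_path: str


def parse_metadata(lines):
    """Parse JSON Lines text: one JSON object per non-empty line."""
    return [MetadataEntry(**json.loads(line)) for line in lines if line.strip()]


# Hypothetical line; the values are invented for illustration only.
sample_line = json.dumps({
    "prompt": "a lighthouse at dusk",
    "negative_prompt": "blurry",
    "model": "example-model [abc123]",
    "sampler": "Euler a",
    "steps": 30,
    "cfg_scale": 7.0,
    "seed": 42,
    "stats": {"likes": 3},
    "image_path": "images/0001.png",
})

entries = parse_metadata([sample_line, ""])
print(entries[0].sampler, entries[0].steps)
```

Because the dataclass is built with keyword arguments, a line that lacks a field or carries an unexpected one raises a `TypeError`, which surfaces schema drift early.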

Ahnuf/Military_Aircraft_Detection_Classification_Image_Dataset HF Unverified

Military Aircraft Detection & Classification Dataset: 88 Classes with Advanced Background Suppression. Overview: This dataset is a professionally curated resource for training high-performance object detection and image classification models such as YOLOv11. It contains 88 distinct military aircraft classes and is explicitly designed for real-world deployment, where false positives from civilian aircraft, birds, and small drones are common. To address this, the… See the full description on the dataset page: https://huggingface.co/datasets/Ahnuf/Military_Aircraft_Detection_Classification_Image_Dataset.

Task_categories:object-Detection, Task_categories:image-Classification, Military, Aircraft, Aerospace, Yolo

48K 1

Updated 2026-05-08 Source available

zalando-datasets/fashion_mnist HF Unverified

Dataset Card for FashionMNIST. Dataset summary: Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing… See the full description on the dataset page: https://huggingface.co/datasets/zalando-datasets/fashion_mnist.

Task_categories:image-Classification, Task_ids:multi-Class-Image-Classification, Annotations_creators:expert-Generated, Language_creators:found, Multilinguality:monolingual, Source_datasets:original

43K 66

Updated 2026-05-08 Source available

anilbhujel/Gilt_posture_dataset HF Unverified

Gilt Posture Recognition Dataset. Each RGB image has a matching depth image (same filename, .png extension). YOLO-format label files correspond to each image. Annotated postures: five postures are labeled using YOLO bounding boxes (class name: class ID): feeding: 0, lateral_lying: 1, sitting: 2, standing: 3, sternal_lying: 4. Class distribution: Below is a histogram showing the distribution of posture classes across the dataset:… See the full description on the dataset page: https://huggingface.co/datasets/anilbhujel/Gilt_posture_dataset.

Task_categories:object-Detection, Task_categories:image-Classification, Task_ids:multi-Class-Image-Classification, Annotations_creators:expert-Annotated, Multilinguality:monolingual, Language:en

41K 1

Updated 2026-06-29 Source available
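
The class IDs in the gilt-posture entry above can be combined with a YOLO label line to recover a named, pixel-space bounding box. This sketch assumes the common YOLO detection layout (`class_id x_center y_center width height`, all normalized to the image size); the listing only says the labels are "YOLO-format", and the sample label line, image size, and function names are hypothetical.

```python
# Class IDs as listed in the dataset description.
CLASS_NAMES = {
    0: "feeding",
    1: "lateral_lying",
    2: "sitting",
    3: "standing",
    4: "sternal_lying",
}


def parse_yolo_line(line: str):
    """Parse 'class_id x_center y_center width height' (values in 0..1)."""
    class_id, x_c, y_c, w, h = line.split()
    return CLASS_NAMES[int(class_id)], (float(x_c), float(y_c), float(w), float(h))


def to_pixel_box(box, img_w: int, img_h: int):
    """Convert a normalized center box to (left, top, right, bottom) in pixels."""
    x_c, y_c, w, h = box
    return (
        (x_c - w / 2) * img_w,
        (y_c - h / 2) * img_h,
        (x_c + w / 2) * img_w,
        (y_c + h / 2) * img_h,
    )


# Hypothetical label line and image size.
name, box = parse_yolo_line("3 0.5 0.5 0.25 0.5")
print(name, to_pixel_box(box, 640, 480))
# → standing (240.0, 120.0, 400.0, 360.0)
```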

krithik274/NOAA-PIFSC-ESD-CORAL-Bleaching-Dataset HF Unverified

Dataset Card for NOAA-ESD-CORAL-Bleaching Classification Dataset v1. Overview: Intended for the development of machine learning models to classify coral health, specifically identifying healthy hard coral (CORAL) and bleached hard coral (CORAL_BL). This dataset contains underwater imagery collected by NOAA's Ecosystem Sciences Division (ESD) and other benthic surveys. Labels (label, name, functional group): CORAL, Healthy Hard Coral, Hard Coral; CORAL_BL, Bleached… See the full description on the dataset page: https://huggingface.co/datasets/krithik274/NOAA-PIFSC-ESD-CORAL-Bleaching-Dataset.

Task_categories:image-Classification, Language:en, Modality:image, Coral, Bleaching, Coral-Reef

41K 0

Updated 2026-06-29 Source available

Showing 18 of 18 items (page 1 of 1)