Datasets

Training datasets with quantum-safe provenance

FLARE 2026: Multimodal Model for 3D Medical Image Parsing. The task is to train one multimodal model for both report generation and visual question answering (VQA). Data description: the dataset contains two subsets, covering abdomen and lung CT report generation and VQA, organized as follows:
FLARE-Task5-MLLM-3D/
├── README.md
├── train  # training set
│   ├── CT-AMOS-1290  # source: https://era-ai-biomed.github.io/amos/
│   ├── CT-AMOS-Tr.json
│   ├── CT-RATE-2000  # source:…
See the full description on the dataset page: https://huggingface.co/datasets/FLARE-MedFM/FLARE26-MLLM-3D.

Task_categories: image-classification · Language: en · Medical

49K 1

Updated 2026-06-29 Source available

codraja2006/tomato-leaves-dataset HF Unverified

Tomato Leaves Dataset. Overview: this dataset contains images of tomato leaves categorized by disease or health condition; the classes cover various diseases as well as healthy leaves. It is divided into training, validation, and test sets in an 8:1:1 ratio and includes both augmented and non-augmented images. Dataset structure: the dataset is organized into three main splits: train, validation, test… See the full description on the dataset page: https://huggingface.co/datasets/codraja2006/tomato-leaves-dataset.

Task_categories: feature-extraction, image-classification · Language: en · Size_categories: n<1K · Modality: image · Tomato

48K 0

Updated 2026-04-30 Source available
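
To make the tomato dataset's 8:1:1 ratio concrete, here is a minimal sketch of partitioning N items into train, validation, and test portions in that ratio. It is not the dataset author's splitting procedure (the card excerpt does not describe one); the helper name `split_8_1_1`, the fixed seed, and the rule that the test split absorbs any rounding remainder are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_8_1_1(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle items and split them 80% / 10% / 10% (hypothetical helper)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for reproducibility
    n = len(shuffled)
    n_train = (n * 8) // 10  # 8 parts out of 10
    n_val = n // 10          # 1 part out of 10
    # The test split takes whatever remains, so rounding never drops items.
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_8_1_1(range(1000))
    print(len(train), len(val), len(test))  # 800 100 100
```

One caveat when splitting data like this yourself: because the dataset mixes augmented and non-augmented images, a file-level random split could place augmented variants of the same original leaf in different splits, so grouping by source image before splitting is the safer choice.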

ChengyouJia/agentic-critic-dataset HF Unverified

Agentic Critic Dataset: high-quality AIGC (AI-generated content) images with rich metadata for aesthetic evaluation. Metadata fields: each entry in metadata.jsonl contains prompt (positive prompt), negative_prompt (negative prompt), model (model name and hash), sampler (sampling method), steps (generation steps), cfg_scale (CFG scale), seed (random seed), stats (engagement metrics), and image_path (relative path to the image). Usage:
from datasets import load_dataset
dataset =…
See the full description on the dataset page: https://huggingface.co/datasets/ChengyouJia/agentic-critic-dataset.

Task_categories: image-classification, text-to-image · Size_categories: n<1K · Aigc · Civitai · Aesthetic

48K 0

Updated 2026-04-25 Source available
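
Because the Agentic Critic dataset's metadata.jsonl stores one JSON object per line, it can also be read with the standard library alone. The sketch below assumes a local copy of the file and relies only on the field names listed on the card; the `load_metadata` helper, the required-field check, and resolving `image_path` against the metadata file's folder are illustrative assumptions, not part of the dataset's own tooling.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator

# Field names as listed on the dataset card.
EXPECTED_FIELDS = {
    "prompt", "negative_prompt", "model", "sampler", "steps",
    "cfg_scale", "seed", "stats", "image_path",
}


def load_metadata(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one record per non-empty line of a JSON Lines file (hypothetical helper)."""
    root = Path(path).parent
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = EXPECTED_FIELDS - set(record)
            if missing:
                raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
            # image_path is relative; resolving it against the metadata file's
            # folder is an assumption about the repository layout.
            record["image_path"] = str(root / record["image_path"])
            yield record
```

Validating the field set up front makes a schema change surface as a clear error on a specific line rather than as a `KeyError` later in a training loop.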

Ahnuf/Military_Aircraft_Detection_Classification_Image_Dataset HF Unverified

Military Aircraft Detection & Classification Dataset: 88 Classes with Advanced Background Suppression. Overview: this dataset is a professionally curated resource for training high-performance object detection and image classification models such as YOLOv11. It contains 88 distinct military aircraft classes and is explicitly designed for real-world deployment, where false positives from civilian aircraft, birds, and small drones are common. To address this, the… See the full description on the dataset page: https://huggingface.co/datasets/Ahnuf/Military_Aircraft_Detection_Classification_Image_Dataset.

Task_categories: object-detection, image-classification · Military · Aircraft · Aerospace · Yolo

48K 1

Updated 2026-05-08 Source available

zai-org/LongBench-v2 HF Unverified

LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks
🌐 Project Page: https://longbench2.github.io
💻 GitHub Repo: https://github.com/THUDM/LongBench
📚 arXiv Paper: https://arxiv.org/abs/2412.15204
LongBench v2 is designed to assess the ability of LLMs to handle long-context problems requiring deep understanding and reasoning across real-world multitasks. It has the following features: (1) Length: context length ranging from 8k to… See the full description on the dataset page: https://huggingface.co/datasets/zai-org/LongBench-v2.

Task_categories: multiple-choice, question-answering, text-classification, table-question-answering · Language: en · Size_categories: n<1K

47K 48

Updated 2026-06-29 Source available

zalando-datasets/fashion_mnist HF Unverified

Dataset Card for FashionMNIST. Dataset summary: Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing… See the full description on the dataset page: https://huggingface.co/datasets/zalando-datasets/fashion_mnist.

Task_categories: image-classification · Task_ids: multi-class-image-classification · Annotations_creators: expert-generated · Language_creators: found · Multilinguality: monolingual · Source_datasets: original

43K 66

Updated 2026-05-08 Source available

deepguess/tornet-temporal HF Unverified

TorNet-Temporal: Temporal Dual-Pol NEXRAD Radar for Tornado Detection. A large-scale dataset of storm-centered NEXRAD WSR-88D radar sequences for tornado detection and prediction, featuring 24-channel dual-polarimetric data across variable-length temporal sequences. Dataset summary: 24,862 storm events from NEXRAD Level-II radar archives (2013-2022); 8-22 consecutive radar scans per event (~4-5 min cadence, ~45-90 min total; median 13 frames); 24 channels: 6 dual-pol radar… See the full description on the dataset page: https://huggingface.co/datasets/deepguess/tornet-temporal.

Task_categories: image-classification, video-classification · Size_categories: 10K<n<100K · Weather · Radar · Tornado

42K 0

Updated 2026-05-05 Source available
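
Since TorNet-Temporal events contain anywhere from 8 to 22 scans, batching them typically means padding to a common length and keeping a mask of which frames are real. The sketch below right-pads each event's frame sequence to the longest one in the batch; frames are treated as opaque objects, and the helper name `pad_sequences` and the `pad_value` placeholder are assumptions, since the card excerpt does not specify an array layout.

```python
from typing import Any, List, Sequence, Tuple


def pad_sequences(
    events: Sequence[Sequence[Any]], pad_value: Any = None
) -> Tuple[List[List[Any]], List[List[bool]]]:
    """Right-pad variable-length frame sequences and return a validity mask."""
    max_len = max(len(frames) for frames in events)
    padded, mask = [], []
    for frames in events:
        n_pad = max_len - len(frames)
        padded.append(list(frames) + [pad_value] * n_pad)
        # True marks a real radar scan, False marks padding.
        mask.append([True] * len(frames) + [False] * n_pad)
    return padded, mask
```

Padding to the batch maximum rather than always to 22 keeps batches of short events cheap; a model would then use the mask to ignore padded frames, for example when pooling over time.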

anilbhujel/Gilt_posture_dataset HF Unverified

Gilt Posture Recognition Dataset. Each RGB image has a matching depth image (same filename, .png extension), and a YOLO-format label file corresponds to each image.
🐷 Annotated Postures: five postures are labeled using YOLO bounding boxes:
Class Name      Class ID
feeding         0
lateral_lying   1
sitting         2
standing        3
sternal_lying   4
📊 Class Distribution: below is a histogram showing the distribution of posture classes across the dataset:… See the full description on the dataset page: https://huggingface.co/datasets/anilbhujel/Gilt_posture_dataset.

Task_categories: object-detection, image-classification · Task_ids: multi-class-image-classification · Annotations_creators: expert-annotated · Multilinguality: monolingual · Language: en

41K 1

Updated 2026-06-29 Source available
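
In the standard YOLO label format, each line of a label file holds a class ID followed by the box centre x, centre y, width, and height, all normalized to [0, 1] by the image size. A minimal parser for the Gilt posture labels, mapping IDs to the posture names listed on the card, might look like this; `parse_yolo_labels` and `Box` are hypothetical names, and the sketch assumes the files follow that standard five-column layout.

```python
from dataclasses import dataclass
from typing import List

# Class IDs as listed on the dataset card.
POSTURES = {
    0: "feeding",
    1: "lateral_lying",
    2: "sitting",
    3: "standing",
    4: "sternal_lying",
}


@dataclass
class Box:
    posture: str
    x_min: float  # pixel coordinates
    y_min: float
    x_max: float
    y_max: float


def parse_yolo_labels(text: str, img_w: int, img_h: int) -> List[Box]:
    """Convert normalized YOLO rows (class xc yc w h) to pixel corner boxes."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        class_id = int(parts[0])
        xc, yc, w, h = (float(v) for v in parts[1:5])
        boxes.append(Box(
            posture=POSTURES[class_id],
            x_min=(xc - w / 2) * img_w,
            y_min=(yc - h / 2) * img_h,
            x_max=(xc + w / 2) * img_w,
            y_max=(yc + h / 2) * img_h,
        ))
    return boxes
```

Since the depth image shares the RGB image's filename, the same parsed boxes can be reused for both modalities, assuming the two images are pixel-aligned.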

krithik274/NOAA-PIFSC-ESD-CORAL-Bleaching-Dataset HF Unverified

Dataset Card for NOAA-ESD-CORAL-Bleaching Classification Dataset v1. Overview: this dataset supports the development of machine learning models that classify coral health, specifically distinguishing healthy hard coral (CORAL) from bleached hard coral (CORAL_BL). It contains underwater imagery collected by NOAA's Ecosystem Sciences Division (ESD) and other benthic surveys. Labels (Label / Name / Functional Group): CORAL / Healthy Hard Coral / Hard Coral; CORAL_BL / Bleached… See the full description on the dataset page: https://huggingface.co/datasets/krithik274/NOAA-PIFSC-ESD-CORAL-Bleaching-Dataset.

Task_categories: image-classification · Language: en · Modality: image · Coral · Bleaching · Coral-Reef

41K 0

Updated 2026-06-29 Source available

nguha/legalbench HF Unverified

Dataset Card for Dataset Name
Homepage: https://hazyresearch.stanford.edu/legalbench/
Repository: https://github.com/HazyResearch/legalbench/
Paper: https://arxiv.org/abs/2308.11462
Dataset summary: the LegalBench project is an ongoing open-science effort to collaboratively curate tasks for evaluating legal reasoning in English large language models (LLMs). The benchmark currently consists of 162 tasks gathered from 40… See the full description on the dataset page: https://huggingface.co/datasets/nguha/legalbench.

Task_categories: text-classification, question-answering · Language: en · Size_categories: 10K<n<100K · Format: csv · Modality: tabular

41K 177

Updated 2026-04-21 Source available

Forithmus/MR-RATE-atlas HF Unverified

MR-RATE: A Vision-Language Foundation Model and Dataset for Magnetic Resonance Imaging. This is the MR-RATE-atlas repository, part of the MR-RATE dataset release. It contains atlas-registered MRI volumes in which all imaging sequences within each study have been spatially normalized to a standard atlas space. For full dataset details, native-space MRI volumes, radiology reports, metadata, and data splits, please refer to… See the full description on the dataset page: https://huggingface.co/datasets/Forithmus/MR-RATE-atlas.

Task_categories: image-to-text, text-to-image, image-classification, question-answering, visual-question-answering, zero-shot-classification

41K 4

Updated 2026-06-29 Source available

MrigLabIITRopar/GroMo25 HF Unverified

GroMo25: Multiview Time-Series Plant Image Dataset for Age Estimation and Leaf Counting. Dataset summary: GroMo25 is a multiview, time-series plant image dataset designed for plant age estimation (in days) and leaf counting tasks in precision agriculture. It contains high-quality images of four crop species (Wheat, Okra, Radish, and Mustard) captured over multiple days under controlled conditions. Each plant is photographed from 24 angles across 5 vertical levels per day… See the full description on the dataset page: https://huggingface.co/datasets/MrigLabIITRopar/GroMo25.

Task_categories: image-classification, text-to-image, image-to-text · Language: en · Size_categories: 100K<n<1M · Format: csv

40K 0

Updated 2026-05-07 Source available
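
GroMo25's capture protocol implies 5 × 24 = 120 views per plant per day. The sketch below enumerates those views as (level, azimuth) pairs; `daily_views` is a hypothetical helper, and the 15-degree spacing rests on an assumption that the 24 angles are evenly spaced over a full turn, which the card excerpt does not state.

```python
from itertools import product
from typing import List, Tuple

N_LEVELS = 5   # vertical camera levels, from the card
N_ANGLES = 24  # viewing angles per level, from the card


def daily_views(n_levels: int = N_LEVELS, n_angles: int = N_ANGLES) -> List[Tuple[int, float]]:
    """List (level, azimuth_degrees) pairs for one plant on one day.

    Assumes the angles are evenly spaced over a full 360-degree turn.
    """
    step = 360.0 / n_angles
    return [
        (level, angle_idx * step)
        for level, angle_idx in product(range(n_levels), range(n_angles))
    ]


views = daily_views()
print(len(views))  # 120
```

Multiplying this per-day count by the number of capture days and plants gives a quick sanity check against the dataset's stated 100K<n<1M size category.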

ComplexDataLab/OpenFake HF Unverified

Dataset Card for OpenFake. OpenFake is a dataset and benchmark for detecting AI-generated images, with a focus on politically and socially salient content where misinformation risk is highest. It pairs real photographs with synthetic counterparts produced by a wide range of frontier proprietary generators, open-source diffusion models, and community fine-tunes. A separate in-the-wild test set is sourced from Reddit to evaluate detector performance on naturally circulated synthetic… See the full description on the dataset page: https://huggingface.co/datasets/ComplexDataLab/OpenFake.

Task_categories: image-classification · Language: en · Size_categories: 1M<n<10M · Format: parquet · Modality: image, text

40K 29

Updated 2026-06-28 Source available

MahmoodLab/hest HF Unverified

Model Card for HEST-1k. What is HEST-1k? A collection of 1,276 spatial transcriptomic profiles, each linked and aligned to a whole-slide image (with pixel size < 1.15 µm/px) and metadata. HEST-1k was assembled from 180 public and internal cohorts encompassing 26 organs, 2 species (Homo sapiens and Mus musculus), and 398 cancer samples from 25 cancer types. HEST-1k processing enabled the identification of >1.5 million expression/morphology pairs and >76 million nuclei… See the full description on the dataset page: https://huggingface.co/datasets/MahmoodLab/hest.

Task_categories: image-classification, feature-extraction, image-segmentation · Language: en · Size_categories: 100B<n<1T · Spatial-Transcriptomics

40K 86

Updated 2026-06-29 Source available

Voxel51/mvtec-ad HF Unverified

Dataset Card for MVTec AD. This dataset originates from MVTec but is provided in a different format that can be loaded with FiftyOne. The total number of samples remains the same as in the original: 5,354.
Installation: if you haven't already, install FiftyOne:
pip install -U fiftyone
Usage:
import fiftyone as fo
import fiftyone.utils.huggingface as fouh

# Load the dataset
# Note: other available arguments include 'max_samples', etc
dataset =…
See the full description on the dataset page: https://huggingface.co/datasets/Voxel51/mvtec-ad.

Task_categories: image-classification, image-segmentation · Language: en · Size_categories: 1K<n<10K · Format: imagefolder · Modality: image

40K 8

Updated 2026-06-29 Source available

uoft-cs/cifar100 HF Unverified

Dataset Card for CIFAR-100. Dataset summary: the CIFAR-100 dataset consists of 60,000 32x32 colour images in 100 classes, with 600 images per class: 500 training images and 100 test images per class, for 50,000 training images and 10,000 test images in total. The 100 classes are grouped into 20 superclasses, so each image carries two labels: a fine label (the actual class) and a coarse label (its superclass). Supported tasks and leaderboards: image-classification: The… See the full description on the dataset page: https://huggingface.co/datasets/uoft-cs/cifar100.

Task_categories: image-classification · Annotations_creators: crowdsourced · Language_creators: found · Multilinguality: monolingual · Source_datasets: extended|other-80-million-tiny-images · Language: en

38K 64

Updated 2026-06-26 Source available
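
The CIFAR-100 counts are mutually consistent, as the short check below shows; it also makes explicit that 100 classes over 20 superclasses means 5 fine classes per superclass, which matches CIFAR-100's equal-sized superclasses. The function name and its parameters are illustrative only.

```python
from typing import Dict


def cifar100_counts(
    n_classes: int = 100,
    train_per_class: int = 500,
    test_per_class: int = 100,
    n_superclasses: int = 20,
) -> Dict[str, int]:
    """Derive split sizes and the fine-to-coarse ratio from per-class counts."""
    if n_classes % n_superclasses:
        raise ValueError("superclasses are assumed to be equal in size")
    return {
        "train": n_classes * train_per_class,
        "test": n_classes * test_per_class,
        "per_class": train_per_class + test_per_class,
        "total": n_classes * (train_per_class + test_per_class),
        "fine_per_coarse": n_classes // n_superclasses,
    }


counts = cifar100_counts()
print(counts)
# {'train': 50000, 'test': 10000, 'per_class': 600, 'total': 60000, 'fine_per_coarse': 5}
```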

RichardErkhov/DASP HF Unverified

Dataset Card for DASP. Dataset description: the DASP (Distributed Analysis of Sentinel-2 Pixels) dataset consists of cloud-free satellite images captured by Sentinel-2 satellites. Each image represents the most recent, non-partial, and cloudless capture from over 30 million Sentinel-2 images, in every band. The dataset provides a near-complete cloudless view of Earth's surface, ideal for various geospatial applications. Images were converted from JPEG2000 to JPEG-XL to… See the full description on the dataset page: https://huggingface.co/datasets/RichardErkhov/DASP.

Task_categories: image-segmentation, image-classification, object-detection, other · Modality: geospatial · Satellite-Imagery

37K 5

Updated 2026-05-08 Source available

tanganke/stanford_cars HF Unverified

Stanford Cars Dataset. Dataset overview. Splits:
Training: 8144 images used for model training.
Test: 8041 images used for evaluation.
Contrast: 8041 images with high contrast for robustness testing.
Gaussian Noise: 8041 images corrupted by Gaussian noise for robustness testing.
Impulse Noise: 8041 images corrupted by impulse noise for robustness testing.
JPEG Compression: 8041 compressed images for robustness testing.
Motion Blur: 8041 images with motion blur for… See the full description on the dataset page: https://huggingface.co/datasets/tanganke/stanford_cars.

Task_categories: image-classification · Language: en · Size_categories: 10K<n<100K · Format: parquet · Modality: image · Library: datasets

31K 27

Updated 2026-05-08 Source available

Showing 18 of 178 datasets (page 9 of 9)
