Model Hub

Browse PQC-verified AI models, datasets, and tools

cais/mmlu HF Unverified

Dataset Card for MMLU Dataset Summary Measuring Massive Multitask Language Understanding by Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt (ICLR 2021). This is a massive multitask test consisting of multiple-choice questions from various branches of knowledge. The test spans subjects in the humanities, social sciences, hard sciences, and other areas that are important for some people to learn. This covers 57 tasks… See the full description on the dataset page: https://huggingface.co/datasets/cais/mmlu.

Task_categories:question-Answering, Task_ids:multiple-Choice-Qa, Annotations_creators:no-Annotation, Language_creators:expert-Generated, Multilinguality:monolingual, Source_datasets:original
CompVis/stable-diffusion-v1-4 HF PQC Verified

Text-to-Image, Diffusers, Safetensors, Stable-Diffusion, Stable-Diffusion-Diffusers, Diffusers:StableDiffusionPipeline CRITICAL
facebook/seamless-m4t-v2-large HF Unverified

Speech Recognition, Transformers, Safetensors, Seamless_m4t_v2, Feature Extraction, Audio-To-Audio HIGH
nyu-mll/glue HF PQC Verified

Dataset Card for GLUE Dataset Summary GLUE, the General Language Understanding Evaluation benchmark (https://gluebenchmark.com/) is a collection of resources for training, evaluating, and analyzing natural language understanding systems. Supported Tasks and Leaderboards The leaderboard for the GLUE benchmark can be found at this address. It comprises the following tasks: ax A manually-curated evaluation dataset for fine-grained analysis of system… See the full description on the dataset page: https://huggingface.co/datasets/nyu-mll/glue.

Task_categories:text-Classification, Task_ids:acceptability-Classification, Task_ids:natural-Language-Inference, Task_ids:semantic-Similarity-Scoring, Task_ids:sentiment-Classification, Task_ids:text-Scoring
allenai/ai2_arc HF Unverified

Dataset Card for "ai2_arc" Dataset Summary A new dataset of 7,787 genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. We are also including a corpus of over 14 million science sentences relevant to… See the full description on the dataset page: https://huggingface.co/datasets/allenai/ai2_arc.

Task_categories:question-Answering, Task_ids:open-Domain-Qa, Task_ids:multiple-Choice-Qa, Annotations_creators:found, Language_creators:found, Multilinguality:monolingual
cross-encoder/nli-deberta-v3-small HF Unverified

Zero-Shot Classification, Sentence-Transformers, PyTorch, ONNX, Safetensors, Deberta-V2 HIGH
XDOF/ABC-130k HF Unverified

ABC-130k ABC-130k is the largest open-source robot teleoperation dataset. It contains bimanual manipulation trajectories collected on two-arm YAM stations. Episodes are distributed as MCAP files, with subtask annotations kept as separate artifacts so they can be revised or extended independently of the underlying episode data. For details on the accompanying paper, see abc.bot. Please see the GitHub repo here for code to train and deploy with this dataset. Dataset… See the full description on the dataset page: https://huggingface.co/datasets/XDOF/ABC-130k.

Task_categories:robotics, Language:en, Size_categories:n>1T, Robotics, Manipulation, Imitation-Learning
valhalla/distilbart-mnli-12-1 HF Unverified

Zero-Shot Classification, Transformers, PyTorch, JAX, Bart, Text Classification HIGH
HuggingFaceFW/finephrase HF PQC Verified

Dataset Card for HuggingFaceFW/finephrase Dataset Summary Synthetic data generated by DataTrove: Model: HuggingFaceTB/SmolLM2-1.7B-Instruct (main) Source dataset: HuggingFaceFW/fineweb-edu, config sample-350BT, split train Generation config: temperature=1.0, top_p=1.0, top_k=50, max_tokens=2048, model_max_context=8192 Speculative decoding: {"method":"suffix","num_speculative_tokens":32} System prompt: None Input column: text Prompt families: faq prompt Rewrite the… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/finephrase.

Task_categories:text-Generation, Task_ids:language-Modeling, Annotations_creators:machine-Generated, Language_creators:found, Source_datasets:HuggingFaceFW/fineweb-Edu/sample-350BT, Language:en
facebook/mms-lid-256 HF Unverified

Audio-Classification, Transformers, PyTorch, Safetensors, Wav2vec2, Mms HIGH
robbyant/mdm_depth HF Unverified

LingBot-Depth Dataset Self-curated RGB-D dataset for training LingBot-Depth, a masked depth modeling approach (arxiv:2601.17895). Each sample contains an RGB image, raw sensor depth, and ground truth depth. Total size: 2.71 TB. Depth scale: millimeters (mm), stored as 16-bit PNG. License: CC BY-NC-SA 4.0. Sub-datasets (Name, Description, Samples): RobbyReal, Real-world indoor scenes captured with multiple RGB-D cameras, 1,400,000; RobbyVla, Real-world data collected… See the full description on the dataset page: https://huggingface.co/datasets/robbyant/mdm_depth.

Task_categories:depth-Estimation, Language:en, Modality:3d3D3dDepth
mlfoundations/MINT-1T-PDF-CC-2023-40 HF PQC Verified

🍃 MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens 🍃 MINT-1T is an open-source Multimodal INTerleaved dataset with 1 trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. 🍃 MINT-1T is designed to facilitate research in multimodal pretraining. 🍃 MINT-1T is created by a team from the University of Washington in… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations/MINT-1T-PDF-CC-2023-40.

Task_categories:image-To-Text, Task_categories:text-Generation, Language:en, Size_categories:100B<n<1T, Multimodal
cadene/droid HF Unverified

This dataset was created using LeRobot. DROID: A Large-Scale In-the-Wild Robot Manipulation Dataset One of the biggest open-source datasets for robotics, with 27,044,326 frames, 92,223 episodes, and 31,308 unique task descriptions in natural language. Ported from TensorFlow Dataset format (2TB) to LeRobotDataset format (400GB) with help from IPEC-COMMUNITY. Visualization: LeRobot Homepage: Droid Paper: Arxiv License: apache-2.0 Dataset Structure meta/info.json: {… See the full description on the dataset page: https://huggingface.co/datasets/cadene/droid.

Task_categories:robotics, Language:en, Size_categories:10M<n<100M, Modality:video, LeRobot, Openx
cross-encoder/nli-deberta-v3-base HF Unverified

Zero-Shot Classification, Sentence-Transformers, PyTorch, ONNX, Safetensors, Deberta-V2 HIGH
google-t5/t5-large HF Unverified

Translation, Transformers, PyTorch, Tf, JAX, Safetensors HIGH
HuggingFaceFW/fineweb-edu HF PQC Verified

📚 FineWeb-Edu 1.3 trillion tokens of the finest educational data the 🌐 web has to offer Paper: https://arxiv.org/abs/2406.17557 What is it? 📚 FineWeb-Edu dataset consists of 1.3T tokens and 5.4T tokens (FineWeb-Edu-score-2) of educational web pages filtered from 🍷 FineWeb dataset. This is the 1.3 trillion version. To enhance FineWeb's quality, we developed an educational quality classifier using annotations generated by LLama3-70B-Instruct. We then… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu.

Task_categories:text-Generation, Language:en, Size_categories:1B<n<10B, Format:parquet, Modality:tabular, Modality:text
SWE-bench/SWE-bench_Multilingual HF Unverified

Language:en, Size_categories:n<1K, Format:parquet, Modality:text, Library:datasets, Library:pandas
LiheYoung/depth-anything-large-hf HF PQC Verified

Depth-Estimation, Transformers, Safetensors, Depth_anything, Vision HIGH
PekingU/rtdetr_v2_r50vd HF Unverified

Object-Detection, Transformers, Safetensors, Rt_detr_v2, Vision, English MEDIUM
playgroundai/playground-v2.5-1024px-aesthetic HF PQC Verified

Text-to-Image, Diffusers, Safetensors, Playground, Diffusers:StableDiffusionXLPipeline CRITICAL
Showing 20 of 664 items (page 17 of 34)