Model Hub
Browse PQC-verified AI models, datasets, and tools
Dataset Card for "jat-dataset-tokenized" More Information needed
Dataset Card for GLUE Dataset Summary GLUE, the General Language Understanding Evaluation benchmark (https://gluebenchmark.com/) is a collection of resources for training, evaluating, and analyzing natural language understanding systems. Supported Tasks and Leaderboards The leaderboard for the GLUE benchmark can be found at this address. It comprises the following tasks: ax A manually-curated evaluation dataset for fine-grained analysis of system… See the full description on the dataset page: https://huggingface.co/datasets/nyu-mll/glue.
Dataset Card for "ai2_arc" Dataset Summary A new dataset of 7,787 genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. We are also including a corpus of over 14 million science sentences relevant to… See the full description on the dataset page: https://huggingface.co/datasets/allenai/ai2_arc.
LingBot-Depth Dataset Self-curated RGB-D dataset for training LingBot-Depth, a masked depth modeling approach (arxiv:2601.17895). Each sample contains an RGB image, raw sensor depth, and ground truth depth. Total size: 2.71 TBDepth scale: millimeters (mm), stored as 16-bit PNGLicense: CC BY-NC-SA 4.0 Sub-datasets Name Description Samples RobbyReal Real-world indoor scenes captured with multiple RGB-D cameras 1,400,000 RobbyVla Real-world data collected… See the full description on the dataset page: https://huggingface.co/datasets/robbyant/mdm_depth.
🍃 MINT-1T:Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens 🍃 MINT-1T is an open-source Multimodal INTerleaved dataset with 1 trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. 🍃 MINT-1T is designed to facilitate research in multimodal pretraining. 🍃 MINT-1T is created by a team from the University of Washington in… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations/MINT-1T-PDF-CC-2023-40.
This dataset was created using LeRobot. DROID: A Large-Scale In-the-Wild Robot Manipulation Dataset One of the biggest open-source dataset for robotics with 27.044,326 frames, 92,223 episodes, 31,308 unique task description in natural language. Ported from Tensorflow Dataset format (2TB) to LeRobotDataset format (400GB) with the help from IPEC-COMMUNITY. Visualization: LeRobot Homepage: Droid Paper: Arxiv License: apache-2.0 Dataset Structure meta/info.json: {… See the full description on the dataset page: https://huggingface.co/datasets/cadene/droid.