Model Hub
Browse PQC-verified AI models, datasets, and tools
Dataset Card for "hellaswag" Dataset Summary HellaSwag: Can a Machine Really Finish Your Sentence? is a new dataset for commonsense NLI. A paper was published at ACL2019. Supported Tasks and Leaderboards More Information Needed Languages More Information Needed Dataset Structure Data Instances default Size of downloaded dataset files: 71.49 MB Size of the generated dataset: 65.32 MB Total amount of disk used: 136.81… See the full description on the dataset page: https://huggingface.co/datasets/Rowan/hellaswag.
arXiv Papers by Subject A reorganised version of the nick007x/arxiv-papers dataset, partitioned by subject code, year, and month for efficient selective access. Dataset Description This dataset contains metadata for over 2.5 million arXiv papers, organised into a hierarchical directory structure that allows users to download only the specific subjects and time periods they need, rather than the entire dataset. Motivation The original nick007x/arxiv-papers… See the full description on the dataset page: https://huggingface.co/datasets/permutans/arxiv-papers-by-subject.
Helsinki-NLP/fineweb-edu-translated fineweb-edu-tanslated is a collection of automatically translated documents from fineweb-edu. Translations are based on OPUS-MT and HPLT-MT models. The data covers 36,704,000 documents with over 28 billion space-searated tokens of English data translated into 36 languages. The total data set is incudes of over 960 billion tokens and the translated documents are aligned across all languages. More information about how the data has been produced can… See the full description on the dataset page: https://huggingface.co/datasets/Helsinki-NLP/fineweb-edu-translated.
Dataset Card for OpenAI HumanEval Dataset Summary The HumanEval dataset released by OpenAI includes 164 programming problems with a function sig- nature, docstring, body, and several unit tests. They were handwritten to ensure not to be included in the training set of code generation models. Supported Tasks and Leaderboards Languages The programming problems are written in Python and contain English natural text in comments and docstrings.… See the full description on the dataset page: https://huggingface.co/datasets/openai/openai_humaneval.
Boasting over 13,000 hours of cumulative data and 5 million+ clips, it ranks as the largest open-source embodied intelligence dataset in the industry. Update Notes:Stage 3 data upload completed. 13,000+ hours of pure dual-hand data with frame-level alignment latency < 1ms Full high-precision trajectory reconstruction, breaking the limit of superficial open source, fully ready-to-use 3,000+ contributors and 10,000+ real household scenarios with exceptional diversity Comprehensive… See the full description on the dataset page: https://huggingface.co/datasets/genrobot2025/10Kh-RealOmin-OpenData.
GIFT-Eval Pre-training Datasets Pretraining dataset aligned with GIFT-Eval that has 71 univariate and 17 multivariate datasets, spanning seven domains and 13 frequencies, totaling 4.5 million time series and 230 billion data points. Notably this collection of data has no leakage issue with the train/test split and can be used to pretrain foundation models that can be fairly evaluated on GIFT-Eval. 📄 Paper 🖥️ Code 📔 Blog Post 🏎️ Leader Board Ethical Considerations… See the full description on the dataset page: https://huggingface.co/datasets/Salesforce/GiftEvalPretrain.