Model Hub

Browse PQC-verified AI models, datasets, and tools

LongBench is a comprehensive multilingual, multi-task benchmark designed to measure how well pre-trained language models understand long text. The dataset consists of twenty different tasks covering key long-text application scenarios: multi-document QA, single-document QA, summarization, few-shot learning, synthetic tasks, and code completion.

Task categories: question-answering, text-generation, summarization, text-classification
Languages: en, zh

60K 184

Updated 2026-06-29 Source available

zai-org/LongBench-v2 HF Unverified

LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks

🌐 Project Page: https://longbench2.github.io
💻 GitHub Repo: https://github.com/THUDM/LongBench
📚 arXiv Paper: https://arxiv.org/abs/2412.15204

LongBench v2 is designed to assess the ability of LLMs to handle long-context problems requiring deep understanding and reasoning across real-world multitasks. LongBench v2 has the following features: (1) Length: Context length ranging from 8k to… See the full description on the dataset page: https://huggingface.co/datasets/zai-org/LongBench-v2.

Task categories: multiple-choice, question-answering, text-classification, table-question-answering
Language: en
Size category: n<1K

47K 48

Updated 2026-06-29 Source available

