Model Hub
Browse PQC-verified AI models, datasets, and tools
LongBench is a comprehensive multilingual, multi-task benchmark designed to measure how well pre-trained language models understand long text. The dataset consists of twenty tasks covering key long-text application scenarios: multi-document QA, single-document QA, summarization, few-shot learning, synthetic tasks, and code completion.
LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks. Project Page: https://longbench2.github.io | GitHub Repo: https://github.com/THUDM/LongBench | arXiv Paper: https://arxiv.org/abs/2412.15204. LongBench v2 is designed to assess the ability of LLMs to handle long-context problems requiring deep understanding and reasoning across real-world multitasks. LongBench v2 has the following features: (1) Length: Context length ranging from 8k to… See the full description on the dataset page: https://huggingface.co/datasets/zai-org/LongBench-v2.