Model Hub

Browse PQC-verified AI models, datasets, and tools

Dataset Card for "wikitext". Dataset Summary: The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License. Compared to the preprocessed version of Penn Treebank (PTB), WikiText-2 is over 2 times larger and WikiText-103 is over 110 times larger. The WikiText dataset also features a far larger… See the full description on the dataset page: https://huggingface.co/datasets/Salesforce/wikitext.

Task_categories:text-Generation, Task_categories:fill-Mask, Task_ids:language-Modeling, Task_ids:masked-Language-Modeling, Annotations_creators:no-Annotation, Language_creators:crowdsourced

1.3M 727

Updated 2026-06-29 Source available

segment-any-text/sat-3l-sm HF Unverified

Token Classification, Transformers, ONNX, Safetensors, Xlm-Token, Multilingual HIGH

476K 12

Updated 2026-06-29

Kazimir-ai/text-to-image-prompts HF Unverified

The dataset of the most popular text-to-image prompts. Dataset Details. Dataset Description: Curated by: kazimir.ai; Funded by [optional]: [More Information Needed]; Shared by [optional]: https://kazimir.ai; License: apache-2.0. Dataset Sources [optional]: Repository: [More Information Needed]; Paper [optional]: [More Information Needed]; Demo [optional]: [More Information Needed]. Uses: Free to use. Dataset Structure: CSV file… See the full description on the dataset page: https://huggingface.co/datasets/Kazimir-ai/text-to-image-prompts.

Language:en, Size_categories:10K<n<100K, Format:csv, Modality:text, Library:datasets, Library:pandas

228K 9

Updated 2026-06-29 Source available

OpenSQZ/AutoMathText-V2 HF Unverified

🚀 AutoMathText-V2: A 2.46 Trillion Token AI-Curated STEM Pretraining Dataset. 🎉 AutoMathText-v2 has surpassed 1.5 million downloads! We'd love to know how you're using it. Please take 1 minute to fill out our use case survey. Your feedback will directly shape the future roadmap of this dataset. 👉 Share your use case here. 📊 AutoMathText-V2 consists of 2.46 trillion tokens of high-quality, deduplicated text spanning web content, mathematics, code, reasoning, and… See the full description on the dataset page: https://huggingface.co/datasets/OpenSQZ/AutoMathText-V2.

Task_categories:text-Generation, Task_categories:question-Answering, Language:en, Language:zh, Size_categories:100M<n<1B, Modality:tabular

125K 78

Updated 2026-06-29 Source available

Falconsai/text_summarization HF Unverified

Summarization, Transformers, PyTorch, Coreml, ONNX, Safetensors HIGH

119K 297

Updated 2026-06-29

black-forest-labs/FLUX.1-Kontext-dev HF PQC Verified

Image-To-Image, Diffusers, Safetensors, Image Generation, Flux, Diffusion-Single-File HIGH

116K 2,574

Updated 2026-03-26

ali-vilab/text-to-video-ms-1.7b HF Unverified

Text-To-Video, Diffusers, Safetensors, Diffusers:TextToVideoSDPipeline HIGH

107K 666

Updated 2026-06-29

Skylion007/openwebtext HF Unverified

Dataset Card for "openwebtext". Dataset Summary: An open-source replication of the WebText dataset from OpenAI that was used to train GPT-2. This distribution was created by Aaron Gokaslan and Vanya Cohen of Brown University. Supported Tasks and Leaderboards: More Information Needed. Languages: More Information Needed. Dataset Structure: Data Instances: plain_text. Size of downloaded dataset files: 13.51 GB. Size of the… See the full description on the dataset page: https://huggingface.co/datasets/Skylion007/openwebtext.

Task_categories:text-Generation, Task_categories:fill-Mask, Task_ids:language-Modeling, Task_ids:masked-Language-Modeling, Annotations_creators:no-Annotation, Language_creators:found

73K 505

Updated 2026-04-26 Source available

Showing 11 of 11 items (page 1 of 1)