Model Hub

Browse PQC-verified AI models, datasets, and tools

nomic-ai/nomic-embed-text-v1.5 HF PQC Verified

Sentence Similarity, Sentence-Transformers, ONNX, Safetensors, Nomic_bert, Feature Extraction | HIGH
nomic-ai/nomic-embed-text-v1 HF PQC Verified

Sentence Similarity, Sentence-Transformers, PyTorch, ONNX, Safetensors, Nomic_bert | HIGH
facebook/fasttext-language-identification HF PQC Verified

Text Classification, Fasttext, Language-Identification | HIGH
Salesforce/wikitext HF PQC Verified

The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License. Compared to the preprocessed version of Penn Treebank (PTB), WikiText-2 is over 2 times larger and WikiText-103 is over 110 times larger. The WikiText dataset also features a far larger… See the full description on the dataset page: https://huggingface.co/datasets/Salesforce/wikitext.

Task_categories:text-Generation, Task_categories:fill-Mask, Task_ids:language-Modeling, Task_ids:masked-Language-Modeling, Annotations_creators:no-Annotation, Language_creators:crowdsourced
segment-any-text/sat-3l-sm HF Unverified

Token Classification, Transformers, ONNX, Safetensors, Xlm-Token, Multilingual | HIGH
Kazimir-ai/text-to-image-prompts HF Unverified

A dataset of the most popular text-to-image prompts. Curated by: kazimir.ai. Shared by: https://kazimir.ai. License: apache-2.0. Uses: free to use. Dataset structure: CSV file… See the full description on the dataset page: https://huggingface.co/datasets/Kazimir-ai/text-to-image-prompts.

Language:en, Size_categories:10K<n<100K, Format:csv, Modality:text, Library:datasets, Library:pandas
OpenSQZ/AutoMathText-V2 HF Unverified

AutoMathText-V2: A 2.46 Trillion Token AI-Curated STEM Pretraining Dataset. AutoMathText-V2 has surpassed 1.5 million downloads. It consists of 2.46 trillion tokens of high-quality, deduplicated text spanning web content, mathematics, code, reasoning, and… See the full description on the dataset page: https://huggingface.co/datasets/OpenSQZ/AutoMathText-V2.

Task_categories:text-Generation, Task_categories:question-Answering, Language:en, Language:zh, Size_categories:100M<n<1B, Modality:tabular
Falconsai/text_summarization HF Unverified

Summarization, Transformers, PyTorch, Coreml, ONNX, Safetensors | HIGH
black-forest-labs/FLUX.1-Kontext-dev HF PQC Verified

Image-To-Image, Diffusers, Safetensors, Image Generation, Flux, Diffusion-Single-File | HIGH
ali-vilab/text-to-video-ms-1.7b HF Unverified

Text-To-Video, Diffusers, Safetensors, Diffusers:TextToVideoSDPipeline | HIGH
Skylion007/openwebtext HF Unverified

An open-source replication of the WebText dataset from OpenAI, which was used to train GPT-2. This distribution was created by Aaron Gokaslan and Vanya Cohen of Brown University. Data instances (plain_text): size of downloaded dataset files: 13.51 GB. Size of the… See the full description on the dataset page: https://huggingface.co/datasets/Skylion007/openwebtext.

Task_categories:text-Generation, Task_categories:fill-Mask, Task_ids:language-Modeling, Task_ids:masked-Language-Modeling, Annotations_creators:no-Annotation, Language_creators:found
Showing 11 of 11 items (page 1 of 1)