Model Hub

Browse PQC-verified AI models, datasets, and tools

cagliostrolab/animagine-xl-4.0 HF PQC Verified

Text-to-Image, Diffusers, Safetensors, Stable-Diffusion, Stable-Diffusion-Xl, Base_model:stabilityai/stable-Diffusion-Xl-Base-1.0 | CRITICAL
John6666/one-obsession-17-red-sdxl HF PQC Verified

Text-to-Image, Diffusers, Safetensors, Stable-Diffusion, Stable-Diffusion-Xl, Not-For-All-Audiences | HIGH
HuggingFaceM4/the_cauldron HF Unverified

Dataset Card for The Cauldron. Dataset description: The Cauldron is part of the Idefics2 release. It is a massive collection of 50 vision-language datasets (training sets only) that were used for the fine-tuning of the vision-language model Idefics2. Load the dataset: To load the dataset, install the library datasets with pip install datasets. Then, run from datasets import load_dataset; ds = load_dataset("HuggingFaceM4/the_cauldron", "ai2d") to download and load the… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceM4/the_cauldron.

Size_categories:1M<n<10M, Format:parquet, Modality:image, Modality:text, Library:datasets, Library:dask
speechbrain/emotion-recognition-wav2vec2-IEMOCAP HF Unverified

Audio-Classification, Speechbrain, Emotion, Recognition, Wav2vec2, PyTorch | MEDIUM
google-t5/t5-3b HF PQC Verified

Translation, Transformers, PyTorch, Tf, Safetensors, T5 | HIGH
cross-encoder/nli-MiniLM2-L6-H768 HF Unverified

Zero-Shot Classification, Sentence-Transformers, PyTorch, ONNX, Safetensors, Openvino | HIGH
John6666/diving-illustrious-real-asian-v50-sdxl HF PQC Verified

Text-to-Image, Diffusers, Safetensors, Stable-Diffusion, Stable-Diffusion-Xl, Realistic | HIGH
MCG-NJU/videomae-base HF Unverified

Video-Classification, Transformers, PyTorch, Safetensors, Videomae, Pretraining | MEDIUM
abisee/cnn_dailymail HF Unverified

Dataset Card for CNN Dailymail Dataset. Dataset Summary: The CNN / DailyMail Dataset is an English-language dataset containing just over 300k unique news articles as written by journalists at CNN and the Daily Mail. The current version supports both extractive and abstractive summarization, though the original version was created for machine reading and comprehension and abstractive question answering. Supported Tasks and Leaderboards: 'summarization': Versions… See the full description on the dataset page: https://huggingface.co/datasets/abisee/cnn_dailymail.

Task_categories:summarization, Task_ids:news-Articles-Summarization, Annotations_creators:no-Annotation, Language_creators:found, Multilinguality:monolingual, Source_datasets:original
PekingU/rtdetr_r50vd_coco_o365 HF Unverified

Object-Detection, Transformers, Safetensors, Rt_detr, Vision, English | MEDIUM
google/pegasus-xsum HF Unverified

Summarization, Transformers, PyTorch, Tf, JAX, Pegasus | HIGH
CohereLabs/xP3x HF Unverified

Dataset Card for xP3x. Dataset Summary: xP3x (Crosslingual Public Pool of Prompts eXtended) is a collection of prompts & datasets across 277 languages & 16 NLP tasks. It contains all of xP3 + much more! It is used for training future contenders of mT0 & BLOOMZ at project Aya @Cohere Labs 🧡 Creation: The dataset can be recreated using instructions available here together with the file in this repository named xp3x_create.py. We provide this version to save processing… See the full description on the dataset page: https://huggingface.co/datasets/CohereLabs/xP3x.

Task_categories:other, Annotations_creators:expert-Generated, Annotations_creators:crowdsourced, Multilinguality:multilingual, Language:af, Language:ar
microsoft/VibeVoice-1.5B HF Unverified

Text-To-Speech, Transformers, Safetensors, Vibevoice, Text Generation, Podcast | HIGH
HuggingFaceFW/FineWeb HF PQC Verified

15T token dataset of cleaned English web data. Deduplicated and filtered from CommonCrawl, outperforms C4 and RefinedWeb for LLM pretraining.

Dataset, Pretraining, English, 15T tokens | CRITICAL
anisoleai/fineweb-tokenized HF Unverified

FineWeb Tokenized: > 4 trillion tokens of the pre-tokenized data the 🌐 web has to offer. What is it? This is a pre-tokenized version of the HuggingFaceFW/fineweb dataset (currently in progress; tokenization of the ~15 trillion token corpus is ongoing). The data is being pre-processed and tokenized using the AnisoleAI BPE tokenizer (52,022 vocabulary size) and packed into compact uint16 Parquet shards. By distributing the pre-tokenized corpus, we eliminate… See the full description on the dataset page: https://huggingface.co/datasets/anisoleai/fineweb-tokenized.

Task_categories:text-Generation, Language:en, Size_categories:n>1T, Modality:tabular, Modality:text, Tabular
autogluon/mitra-regressor HF Unverified

Tabular-Regression, Safetensors | MEDIUM
allenai/objaverse HF Unverified

Objaverse. Objaverse is a Massive Dataset with 800K+ Annotated 3D Objects. More documentation is coming soon. In the meantime, please see our paper and website for additional details. License: The use of the dataset as a whole is licensed under the ODC-By v1.0 license. Individual objects in Objaverse are all licensed as creative commons distributable objects, and may be under the following licenses: CC-BY 4.0 - 721K objects; CC-BY-NC 4.0 - 25K objects; CC-BY-NC-SA 4.0 - 52K… See the full description on the dataset page: https://huggingface.co/datasets/allenai/objaverse.

Language:en
timpal0l/mdeberta-v3-base-squad2 HF Unverified

Question Answering, Transformers, PyTorch, Safetensors, Deberta-V2, Deberta | HIGH
Intel/dpt-hybrid-midas HF PQC Verified

Depth-Estimation, Transformers, PyTorch, Dpt, Vision, Model-Index | MEDIUM
mlfoundations/MINT-1T-PDF-CC-2023-06 HF PQC Verified

🍃 MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens. 🍃 MINT-1T is an open-source Multimodal INTerleaved dataset with 1 trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. 🍃 MINT-1T is designed to facilitate research in multimodal pretraining. 🍃 MINT-1T is created by a team from the University of Washington in… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations/MINT-1T-PDF-CC-2023-06.

Task_categories:image-To-Text, Task_categories:text-Generation, Language:en, Size_categories:100B<n<1T, Multimodal
Showing 20 of 665 items (page 19 of 34)