Model Hub

Browse PQC-verified AI models, datasets, and tools


Dataset Card for "wikitext". Dataset Summary: The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License. Compared to the preprocessed version of Penn Treebank (PTB), WikiText-2 is over 2 times larger and WikiText-103 is over 110 times larger. The WikiText dataset also features a far larger… See the full description on the dataset page: https://huggingface.co/datasets/Salesforce/wikitext.

task_categories:text-generation, task_categories:fill-mask, task_ids:language-modeling, task_ids:masked-language-modeling, annotations_creators:no-annotation, language_creators:crowdsourced

1.3M 727

Updated 2026-06-29 Source available

Salesforce/SFR-Embedding-2_R HF PQC Verified

State-of-the-art text embedding model. Top of MTEB leaderboard with strong retrieval and clustering.

Transformer, Embeddings, 7B, Retrieval HIGH

1.2M 890

Updated 2026-03-26

Salesforce/GiftEvalPretrain HF Unverified

GIFT-Eval Pre-training Datasets: a pretraining dataset aligned with GIFT-Eval, comprising 71 univariate and 17 multivariate datasets that span seven domains and 13 frequencies, for a total of 4.5 million time series and 230 billion data points. Notably, this collection has no leakage issue with the train/test split, so it can be used to pretrain foundation models that can then be fairly evaluated on GIFT-Eval. Ethical Considerations… See the full description on the dataset page: https://huggingface.co/datasets/Salesforce/GiftEvalPretrain.

task_categories:time-series-forecasting, size_categories:1M<n<10M, modality:timeseries, Timeseries, Forecasting, Benchmark

206K 39

Updated 2026-06-29 Source available

