Model Hub
Browse PQC-verified AI models, datasets, and tools
NVIDIA's optimized Llama 3.1 70B, with custom alignment for helpfulness and strong benchmark performance.
✨ Note: For all FineInstructions resources, please visit: https://huggingface.co/fineinstructions. This dataset contains roughly 1B+ synthetic instruction-answer pairs (~300B tokens) created with the FineInstructions pipeline, which was run over the raw pre-training documents in the Nemotron-CC pre-training corpus (a subset of high-quality documents from CommonCrawl). See the paper for more details. Each .parquet file in the data folder has a corresponding judge-*.json file that… See the full description on the dataset page: https://huggingface.co/datasets/fineinstructions/fineinstructions_nemotron.
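The card above notes that each .parquet file in the data folder has a corresponding judge-*.json companion. A minimal stdlib sketch of pairing data shards with their judge files on disk follows; the shard and judge file names used here are hypothetical (the card's description of the naming scheme is truncated), so treat this as an illustration of the layout, not the dataset's actual file names.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical layout: the real shard/judge naming scheme is truncated in the
# card above. This only illustrates pairing each .parquet with a judge-*.json.
data_dir = Path(tempfile.mkdtemp())
for i in range(3):
    (data_dir / f"shard-{i:05d}.parquet").write_bytes(b"")  # placeholder shard
    (data_dir / f"judge-shard-{i:05d}.json").write_text(json.dumps({"shard": i}))

# Pair every parquet shard with its judge metadata file.
pairs = {
    p.name: data_dir / f"judge-{p.stem}.json"
    for p in sorted(data_dir.glob("*.parquet"))
}
for parquet_name, judge_path in pairs.items():
    assert judge_path.exists(), f"missing judge file for {parquet_name}"
print(len(pairs))
```

Keeping judge metadata in a sidecar JSON per shard means consumers can filter or audit shards without parsing the (much larger) parquet files themselves.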
Leopard-Instruct Paper | GitHub | Models-LLaVA | Models-Idefics2 Summary Leopard-Instruct is a large instruction-tuning dataset comprising 925K instances, 739K of which are specifically designed for text-rich, multi-image scenarios. It has been used to train Leopard-LLaVA [checkpoint] and Leopard-Idefics2 [checkpoint]. Loading: to load the dataset without automatically downloading and processing the images, please run the following code with datasets==2.18.0… See the full description on the dataset page: https://huggingface.co/datasets/wyu1/Leopard-Instruct.
LLaVA-OneVision-1.5 Instruction Data Paper | Code 📌 Introduction This dataset, LLaVA-OneVision-1.5-Instruct, was collected and integrated during the development of LLaVA-OneVision-1.5. LLaVA-OneVision-1.5 is a novel family of Large Multimodal Models (LMMs) that achieve state-of-the-art performance with significantly reduced computational and financial costs. This meticulously curated 22M instruction dataset (LLaVA-OneVision-1.5-Instruct) is part of a comprehensive and… See the full description on the dataset page: https://huggingface.co/datasets/mvp-lab/LLaVA-OneVision-1.5-Instruct-Data.