Model Hub

Browse PQC-verified AI models, datasets, and tools

mlfoundations/MINT-1T-PDF-CC-2023-40 HF PQC Verified

πŸƒ MINT-1T:Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens πŸƒ MINT-1T is an open-source Multimodal INTerleaved dataset with 1 trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. πŸƒ MINT-1T is designed to facilitate research in multimodal pretraining. πŸƒ MINT-1T is created by a team from the University of Washington in… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations/MINT-1T-PDF-CC-2023-40.

Task categories: image-to-text, text-generation · Language: en · Size: 100B<n<1T · Multimodal
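The PDF subsets listed here are split into Common Crawl snapshot shards (e.g. CC-2023-40). A minimal sketch for addressing a shard programmatically, assuming the repo IDs follow the `mlfoundations/MINT-1T-PDF-CC-<year>-<week>` naming pattern visible in this listing (the helper name is hypothetical):

```python
def mint1t_pdf_repo(year: int, week: int) -> str:
    """Build the Hugging Face repo ID for a MINT-1T PDF Common Crawl shard.

    Assumes the naming convention shown in this listing:
    mlfoundations/MINT-1T-PDF-CC-<year>-<zero-padded week>.
    """
    return f"mlfoundations/MINT-1T-PDF-CC-{year}-{week:02d}"

# Usage with the `datasets` library (not executed here; requires network access
# and accepting the dataset's terms on the Hub):
#   from datasets import load_dataset
#   ds = load_dataset(mint1t_pdf_repo(2023, 40), split="train", streaming=True)
#   sample = next(iter(ds))
```

Streaming mode avoids downloading an entire trillion-token-scale shard before inspecting samples.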
mlfoundations/MINT-1T-PDF-CC-2023-06 HF PQC Verified

πŸƒ MINT-1T:Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens πŸƒ MINT-1T is an open-source Multimodal INTerleaved dataset with 1 trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. πŸƒ MINT-1T is designed to facilitate research in multimodal pretraining. πŸƒ MINT-1T is created by a team from the University of Washington in… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations/MINT-1T-PDF-CC-2023-06.

Task categories: image-to-text, text-generation · Language: en · Size: 100B<n<1T · Multimodal
mlfoundations/MINT-1T-PDF-CC-2024-18 HF PQC Verified

πŸƒ MINT-1T:Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens πŸƒ MINT-1T is an open-source Multimodal INTerleaved dataset with 1 trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. πŸƒ MINT-1T is designed to facilitate research in multimodal pretraining. πŸƒ MINT-1T is created by a team from the University of Washington in… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations/MINT-1T-PDF-CC-2024-18.

Task categories: image-to-text, text-generation · Language: en · Size: 100B<n<1T · Multimodal
mlfoundations/MINT-1T-PDF-CC-2023-23 HF PQC Verified

πŸƒ MINT-1T:Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens πŸƒ MINT-1T is an open-source Multimodal INTerleaved dataset with 1 trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. πŸƒ MINT-1T is designed to facilitate research in multimodal pretraining. πŸƒ MINT-1T is created by a team from the University of Washington in… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations/MINT-1T-PDF-CC-2023-23.

Task categories: image-to-text, text-generation · Language: en · Size: 1M<n<10M · Format: webdataset · Modality: image
mlfoundations/MINT-1T-HTML HF Unverified

πŸƒ MINT-1T:Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens πŸƒ MINT-1T is an open-source Multimodal INTerleaved dataset with 1 trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. πŸƒ MINT-1T is designed to facilitate research in multimodal pretraining. πŸƒ MINT-1T is created by a team from the University of Washington in… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations/MINT-1T-HTML.

Task categories: image-to-text, text-generation · Language: en · Size: 100M<n<1B · Format: parquet · Modality: text
mlfoundations/MINT-1T-PDF-CC-2023-14 HF PQC Verified

πŸƒ MINT-1T:Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens πŸƒ MINT-1T is an open-source Multimodal INTerleaved dataset with 1 trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. πŸƒ MINT-1T is designed to facilitate research in multimodal pretraining. πŸƒ MINT-1T is created by a team from the University of Washington in… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations/MINT-1T-PDF-CC-2023-14.

Task categories: image-to-text, text-generation · Language: en · Size: 1M<n<10M · Format: webdataset · Modality: image
mlfoundations/MINT-1T-PDF-CC-2023-50 HF PQC Verified

πŸƒ MINT-1T:Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens πŸƒ MINT-1T is an open-source Multimodal INTerleaved dataset with 1 trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. πŸƒ MINT-1T is designed to facilitate research in multimodal pretraining. πŸƒ MINT-1T is created by a team from the University of Washington in… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations/MINT-1T-PDF-CC-2023-50.

Task categories: image-to-text, text-generation · Language: en · Size: 1M<n<10M · Format: webdataset · Modality: image
Showing 7 of 7 items (page 1 of 1)