{"id":528,"slug":"huggingfacem4--the_cauldron","name":"the_cauldron","author":"HuggingFaceM4","description":"Dataset Card for The Cauldron\n\nDataset description\n\nThe Cauldron is part of the Idefics2 release.\nIt is a massive collection of 50 vision-language datasets (training sets only) that were used for the fine-tuning of the vision-language model Idefics2.\n\nLoad the dataset\n\nTo load the dataset, install the library datasets with pip install datasets. Then,\nfrom datasets import load_dataset\nds = load_dataset(\"HuggingFaceM4/the_cauldron\", \"ai2d\")\n\nto download and load the… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceM4/the_cauldron.","tags":"[\"Size_categories:1M<n<10M\",\"Format:parquet\",\"Modality:image\",\"Modality:text\",\"Library:datasets\",\"Library:dask\"]","license":null,"framework":null,"parameters":null,"downloads":205019,"likes":527,"verified":0,"created_at":"2026-05-05 09:21:20","updated_at":"2026-05-08 16:45:15","source_url":"https://huggingface.co/datasets/HuggingFaceM4/the_cauldron","source_platform":"huggingface","hf_repo_id":"HuggingFaceM4/the_cauldron","ollama_name":"","category":"dataset","latest_version":"v1.0.0","version_count":1,"signature_count":1,"risk_level":null,"risk_score":null,"versions":[{"id":527,"model_id":528,"version":"v1.0.0","manifest_hash":"0e9db0ea8317079e20f4ae9eca635303e32db2766e7e404fe4c7e51c59e92a31","file_count":0,"total_size":0,"r2_manifest_key":"manifests/datasets/huggingfacem4--the_cauldron/v1.0.0.json","created_at":"2026-05-05 09:21:20"}],"files":[],"signatures":[{"id":1052,"version_id":527,"signer_did":"did:quantamrkt:registry:shield-v1","algorithm":"ML-DSA-65","signature_hex":"d7fc816d17eb6aa8b7003d27d5d1a314abf94815ca3d40a35fd0dd15f724036e","attestation_type":"registry","signed_at":"2026-05-05 09:21:20"}],"hndl":null}