{"id":468,"slug":"amphion--emilia-dataset","name":"Emilia-Dataset","author":"amphion","description":"Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation\n\nThis is the official repository 👑 for the Emilia dataset and the source code for the Emilia-Pipe speech data preprocessing pipeline.\n\nNews 🔥\n\n2025/02/26: The Emilia-Large dataset, featuring over 200,000 hours of data, is now available!!! Emilia-Large combines the original 101k-hour Emilia dataset (licensed under CC BY-NC 4.0) with the brand-new 114k-hour Emilia-YODAS… See the full description on the dataset page: https://huggingface.co/datasets/amphion/Emilia-Dataset.","tags":"[\"Task_categories:text-To-Speech\",\"Task_categories:automatic-Speech-Recognition\",\"Language:zh\",\"Language:en\",\"Language:ja\",\"Language:fr\"]","license":null,"framework":null,"parameters":null,"downloads":79667,"likes":460,"verified":0,"created_at":"2026-04-23 10:48:19","updated_at":"2026-06-29 13:23:35","source_url":"https://huggingface.co/datasets/amphion/Emilia-Dataset","source_platform":"huggingface","hf_repo_id":"amphion/Emilia-Dataset","ollama_name":"","category":"dataset","latest_version":"v1.0.0","version_count":1,"signature_count":1,"risk_level":null,"risk_score":null,"versions":[{"id":467,"model_id":468,"version":"v1.0.0","manifest_hash":"8ca8117ffc12301dc498d1587955d87cb80c98086d794ffa942f64879bd9e20b","file_count":0,"total_size":0,"r2_manifest_key":"manifests/datasets/amphion--emilia-dataset/v1.0.0.json","created_at":"2026-04-23 10:48:19"}],"files":[],"signatures":[{"id":989,"version_id":467,"signer_did":"did:quantamrkt:registry:shield-v1","algorithm":"ML-DSA-65","signature_hex":"e5997f3c6922de24d9dfc64c2b61f23b0e9ab81cace9e8222679e7f796d471e1","attestation_type":"registry","signed_at":"2026-04-23 10:48:20"}],"hndl":null}