Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

This is the official repository 👑 for the Emilia dataset and the source code for the Emilia-Pipe speech data preprocessing pipeline.

News 🔥 2025/02/26: The Emilia-Large dataset, featuring over 200,000 hours of data, is now available! Emilia-Large combines the original 101k-hour Emilia dataset (licensed under CC BY-NC 4.0) with the brand-new 114k-hour Emilia-YODAS… See the full description on the dataset page: https://huggingface.co/datasets/amphion/Emilia-Dataset.
Use this model

Pull with QuantumShield:
    quantumshield pull amphion/Emilia-Dataset

Verify integrity:
    quantumshield verify amphion/Emilia-Dataset

Install with pip and pull in one step:
    pip install quantumshield && quantumshield pull amphion/Emilia-Dataset

Unverified Model
This model has not been PQC-verified. File integrity cannot be guaranteed against quantum threats.
Emilia-Dataset
Intended Uses
This dataset is registered on the QuantaMrkt quantum-safe registry but has not yet been PQC-verified.
Quick Start

    # Install the CLI
    pip install quantumshield
    # Pull the model
    quantumshield pull amphion/Emilia-Dataset
    # Verify file integrity
    quantumshield verify amphion/Emilia-Dataset