Model Hub

Multilingual Speech Commands Dataset (15 Languages, Augmented)

This dataset contains augmented speech command samples in 15 languages, derived from multiple public datasets. Only commands that overlap with the Google Speech Commands (GSC) vocabulary are included, making the dataset suitable for multilingual keyword spotting tasks aligned with GSC-style classification. Audio samples have been augmented using standard audio techniques to improve model robustness (e.g., time-shifting… See the full description on the dataset page: https://huggingface.co/datasets/artur-muratov/multilingual-speech-commands-15lang.

Language: en, ru, kk, tt, ar, tr

196K 2

Updated 2026-06-29 Source available

Williamsanderson/MedQA-Darija-MultiLingual HF Unverified

MedQA-Darija-MultiLingual

The largest open trilingual medical Q&A dataset with directly playable speech audio for English, French, and Moroccan Darija. A research dataset for the BRAIN HEALTH initiative, designed for multilingual medical NLP, low-resource speech recognition, healthcare chatbots, and clinical education tools targeting Morocco and the broader Maghreb region. The dataset is currently in the scientific validation phase. After programmatic validation (Stage 1 LOF outlier… See the full description on the dataset page: https://huggingface.co/datasets/Williamsanderson/MedQA-Darija-MultiLingual.

Task categories: question-answering, automatic-speech-recognition, text-to-speech
Language: ar, fr, en

112K 4

Updated 2026-06-29 Source available
