Model Hub
Browse PQC-verified AI models, datasets, and tools

Multilingual Speech Commands Dataset (15 Languages, Augmented)
This dataset contains augmented speech command samples in 15 languages, derived from multiple public datasets. Only commands that overlap with the Google Speech Commands (GSC) vocabulary are included, making the dataset suitable for multilingual keyword spotting tasks aligned with GSC-style classification. Audio samples have been augmented using standard audio techniques to improve model robustness (e.g., time-shifting… See the full description on the dataset page: https://huggingface.co/datasets/artur-muratov/multilingual-speech-commands-15lang.

MedQA-Darija-MultiLingual
The largest open trilingual medical Q&A dataset with directly playable speech audio for English, French, and Moroccan Darija. A research dataset for the BRAIN HEALTH initiative, designed for multilingual medical NLP, low-resource speech recognition, healthcare chatbots, and clinical education tools targeting Morocco and the broader Maghreb region. The dataset is currently in the scientific validation phase. After programmatic validation (Stage 1 LOF outlier… See the full description on the dataset page: https://huggingface.co/datasets/Williamsanderson/MedQA-Darija-MultiLingual.