---
language:
- en
- hi
- or
- bn
- ta
- te
- kn
- ml
- mr
- gu
license: cc-by-nc-4.0
pipeline_tag: audio-classification
library_name: transformers
tags:
- language-identification
- indian-languages
- multilingual
- speech
- asr-preprocessing
- callcenter-ai
- speech-analytics
- audio-classification
- wav2vec2
- transformers
- pytorch
- huggingface
---
**Model Name:** open-vakgyata

**Model Overview:**
open-vakgyata is an open-source language identification model that detects which Indian language is spoken in a speech input and classifies it as one of the ten supported languages.

**Supported Languages:**
| Language        | Code  |
|-----------------|-------|
| English (India) | en-IN |
| Hindi           | hi-IN |
| Odia            | or-IN |
| Bengali         | bn-IN |
| Tamil           | ta-IN |
| Telugu          | te-IN |
| Kannada         | kn-IN |
| Malayalam       | ml-IN |
| Marathi         | mr-IN |
| Gujarati        | gu-IN |

**Specification:**
- Supported sampling rate: 16,000 Hz (16 kHz)
- Recommended audio format: 16 kHz, 16-bit PCM

**Usage:**

```py
from transformers import Wav2Vec2ForSequenceClassification, AutoFeatureExtractor
import torch

device = "cpu"  # or "cuda" if a GPU is available

model_id = "onecxi/open-vakgyata"

# Load the feature extractor and the classification model from the Hub
processor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id).to(device)
```

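Before running inference, it can help to confirm that a file already matches the recommended 16 kHz, 16-bit PCM format. The following is a minimal sketch using only Python's standard `wave` module. The helper name `check_wav_format` is hypothetical and is not part of this model or of `transformers`, and `wave` reads only uncompressed PCM WAV files, so other formats would need a different check.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_bits=16):
    """Return (ok, info) describing whether a PCM WAV file matches the expected format.

    Hypothetical helper for illustration; the defaults mirror the
    recommended 16 kHz, 16-bit PCM format from the specification.
    """
    # wave.open raises wave.Error for files that are not uncompressed PCM WAV
    with wave.open(path, "rb") as wav_file:
        info = {
            "sample_rate": wav_file.getframerate(),
            "bits_per_sample": wav_file.getsampwidth() * 8,
            "channels": wav_file.getnchannels(),
        }
    ok = (
        info["sample_rate"] == expected_rate
        and info["bits_per_sample"] == expected_bits
    )
    return ok, info


# Usage (the path is a placeholder):
# ok, info = check_wav_format("path/to/audio.wav")
# if not ok:
#     print("Audio should be converted before inference:", info)
```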
**Inference:**

```py
import torchaudio

audio, sr = torchaudio.load("path/to/audio.wav")

# Downmix to mono and resample to the 16 kHz rate the model supports
audio = audio.mean(dim=0)
if sr != 16000:
    audio = torchaudio.functional.resample(audio, sr, 16000)

# Process the waveform and move to the appropriate device
inputs = processor(audio.numpy(), sampling_rate=16000, return_tensors="pt").to(device)

# Perform inference
with torch.no_grad():
    logits = model(**inputs).logits

# Get language probabilities and look up the most likely label
probs = logits.softmax(dim=-1)
language = model.config.id2label[probs.argmax(dim=-1).item()]

print(language)
```

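When the top prediction alone is not enough, for example to inspect how confident the model is between two similar languages, the logits can be turned into a ranked list. The sketch below is written in plain Python so it runs without a GPU or the model itself. The function name `top_k_languages` is hypothetical, and the example logits and the three-entry label mapping are made up for illustration; the actual label strings stored in `model.config.id2label` are not stated in this card.

```python
import math


def top_k_languages(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs, best first.

    `logits` is a flat list of floats and `id2label` maps class index to label.
    Hypothetical helper for illustration only.
    """
    # Numerically stable softmax: subtract the largest logit before exponentiating
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Sort class indices by descending probability and keep the first k
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Made-up logits and labels, purely to show the call shape.
# With the loaded model, the call would look like:
#   top_k_languages(logits[0].tolist(), model.config.id2label)
example = top_k_languages([2.0, 0.5, -1.0], {0: "hi-IN", 1: "en-IN", 2: "ta-IN"}, k=2)
for label, prob in example:
    print(f"{label}: {prob:.3f}")
```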
---

## **Citation**

If you use this model in your research or application, please consider citing the model and its base source:

```
@misc{vakgyata2024,
  title={vakgyata: Language Identification for Indian Speech},
  author={OneCXI},
  year={2024},
  url={https://huggingface.co/onecxi/open-vakgyata}
}
```

---