
NVIDIA AI Just Released the Largest Open-Source Speech AI Dataset and State-of-the-Art Models for European Languages

NVIDIA has taken a major step forward in multilingual speech AI, unveiling Granary, the largest open-source speech dataset for European languages, and two state-of-the-art models: Canary-1b-v2 and Parakeet-tdt-0.6b-v3. The release sets a new standard for accessible, high-quality resources in automatic speech recognition (ASR) and automatic speech translation (AST), especially for underrepresented European languages.

Granary: The Foundation of Multilingual Speech AI

Granary is a massive multilingual corpus developed in collaboration with Carnegie Mellon University and Fondazione Bruno Kessler. It provides one million hours of audio: 650,000 hours for speech recognition and 350,000 hours for speech translation. The dataset covers 25 European languages, representing nearly all official EU languages plus Russian and Ukrainian, and pays particular attention to languages with limited annotated data, such as Croatian, Estonian, and Maltese.

Key Features:

  • Scale: The largest open-source speech dataset covering 25 European languages.
  • Pseudo-labeling pipeline: Unlabeled public audio is processed with NVIDIA NeMo's Speech Data Processor, which adds structure and improves quality, reducing the need for resource-intensive manual annotation.
  • Supports both ASR and AST: Designed for transcription and translation tasks.
  • Open access: Available to the global developer community for flexible, production-scale model training.
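
To make the pseudo-labeling idea concrete, here is a minimal Python sketch of one common pattern: keep a machine-generated transcript only when the labeling model is confident and the text is plausible for the clip length. This is not the NeMo Speech Data Processor API or the actual Granary pipeline. The `PseudoLabel` record, the `keep_pseudo_labels` helper, the file names, and both thresholds are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PseudoLabel:
    """Hypothetical record: one audio clip with a machine-generated transcript."""
    audio_path: str
    text: str
    confidence: float  # assumed to lie in [0, 1], as reported by the labeling model
    duration_s: float


def keep_pseudo_labels(items, min_confidence=0.9, max_chars_per_second=30.0):
    """Keep only pseudo-labels that pass simple quality checks.

    Both thresholds are illustrative assumptions, not values from Granary.
    """
    kept = []
    for item in items:
        text = item.text.strip()
        if not text or item.duration_s <= 0:
            continue  # empty transcript or invalid clip length
        if item.confidence < min_confidence:
            continue  # the labeling model was unsure
        if len(text) / item.duration_s > max_chars_per_second:
            continue  # implausibly dense text for the clip length
        kept.append(item)
    return kept


candidates = [
    PseudoLabel("clip_001.wav", "dobar dan svima", 0.97, 2.1),
    PseudoLabel("clip_002.wav", "", 0.99, 3.0),
    PseudoLabel("clip_003.wav", "tere hommikust", 0.55, 1.8),
]
clean = keep_pseudo_labels(candidates)
print([c.audio_path for c in clean])  # only clip_001.wav passes all checks
```

Filtering of this kind trades dataset size for label quality, which is the same trade-off the article describes when it credits clean data for faster convergence.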

Because the data is clean and high quality, models trained on Granary can converge significantly faster. Research shows that developers need about half as much Granary data to reach a target accuracy as they would with competing datasets, making it especially valuable for resource-constrained languages and rapid prototyping.

Canary-1b-v2: Multilingual ASR + Translation (EN↔24 Languages)

Canary-1b-v2 is a one-billion-parameter encoder-decoder model trained on Granary that delivers high-quality transcription and translation between English and 24 supported European languages.

Its accuracy and multitasking capabilities come from the following features and architecture:

  • Supported languages: 25 European languages, up from the 4 languages covered by the earlier Canary generation.
  • State-of-the-art performance: Accuracy comparable to models three times larger, with up to 10× faster inference.
  • Multitask capability: Strong performance on both ASR and AST tasks.
  • Features: Automatic punctuation, capitalization, and word- and segment-level timestamps, including timestamps on translated output.
  • Architecture: FastConformer encoder with a Transformer decoder; a unified vocabulary across all languages built with SentencePiece tokenizers.
  • Robustness: Maintains strong performance in noisy conditions and resists hallucinated output.

Evaluation Highlights:

  • ASR word error rate (WER): 7.15% (AMI dataset), 10.82% (LibriSpeech Clean).
  • AST COMET score: 79.3 (X→English), 84.56 (English→X).
  • Deployment: Available under a CC BY 4.0 license; optimized for NVIDIA GPU-accelerated systems, enabling fast training and inference for scalable production use.
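
For readers unfamiliar with the WER metric quoted above, the following self-contained Python sketch computes it as the word-level edit distance between a reference transcript and a model hypothesis, divided by the number of reference words. The function name and the example sentences are made up for illustration. Published evaluations usually normalize case and punctuation before scoring, which this sketch does not do, so it will not reproduce the figures above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.2%}")  # 33.33%
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.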

Parakeet-tdt-0.6b-v3: Real-Time Multilingual ASR

Parakeet-tdt-0.6b-v3 is a 600-million-parameter multilingual ASR model designed for high-throughput, large-volume transcription across all 25 supported languages. It extends the Parakeet family (previously English-centric) to full European coverage.

  • Automatic language detection: Transcribes input audio without additional prompting.
  • Real-time capability: Efficiently transcribes audio segments up to 24 minutes long in a single inference pass.
  • Fast, scalable, and commercial-ready: Prioritizes low latency, batch processing, and accurate output, with word-level timestamps, punctuation, and capitalization.
  • Robustness: Reliable even on complex content (numbers, lyrics) and in challenging audio conditions.
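
For recordings longer than the 24-minute single-pass figure mentioned above, one simple approach is to split the audio into consecutive windows before transcription. The Python sketch below only computes the window boundaries. The `plan_segments` helper and the fixed, non-overlapping split are assumptions made for illustration, not something the Parakeet release prescribes.

```python
def plan_segments(total_seconds: float, max_seconds: float = 24 * 60):
    """Return (start, end) pairs in seconds that cover the recording,
    each at most max_seconds long."""
    if total_seconds < 0 or max_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and max_seconds must be > 0")
    segments = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_seconds, total_seconds)
        segments.append((start, end))
        start = end
    return segments


# A 60-minute recording needs three passes: 24 + 24 + 12 minutes.
for start, end in plan_segments(60 * 60):
    print(f"{start / 60:.0f}-{end / 60:.0f} min")
```

A production pipeline would more likely cut at silences or use overlapping windows so that words are not split at a boundary; the fixed split here keeps the arithmetic easy to follow.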

Impact on Speech AI Development

NVIDIA's Granary dataset and model suite accelerate the democratization of speech AI for European languages, enabling scalable development of:

  • Multilingual chatbots
  • Customer service voice agents
  • Near-real-time translation services

Developers, researchers, and businesses can now build inclusive, high-quality applications that support linguistic diversity, thanks to open access to these models and the dataset.


Check out Granary, NVIDIA Canary-1b-v2, and NVIDIA Parakeet-tdt-0.6b-v3.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform that provides in-depth coverage of machine learning and deep learning news that is both technically sound and easily understood by a wide audience. The platform has over 2 million views per month, demonstrating its popularity among its audience.
