How to Use NVIDIA Canary-1B-v2 for ASR, Translation, and Automatic SRT Subtitle Export in Python
In this tutorial, we build a speech recognition and translation workflow using NVIDIA Canary-1B-v2. We begin by setting up the required audio, NeMo, NumPy, and SciPy dependencies, then load the Canary model on a GPU-enabled runtime for efficient inference. From there, we prepare audio into a clean 16 kHz mono format, perform English ASR, translate speech into multiple languages, generate word and segment timestamps, export translated subtitles as an SRT file, test long-form transcription, run batch processing, and benchmark inference speed. At the end, we have a complete multilingual ASR and speech translation pipeline that we can adapt for real audio files, subtitle generation, and large-scale transcription experiments.

Installing NeMo, Audio Libraries, NumPy, and SciPy Dependencies

    import os, subprocess, sys
    SENTINEL = "/content/.canary_setup_done"
    if not os.path.exists(SENTINEL):
        def sh(c): print("$",...
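The introduction mentions preparing audio into a clean 16 kHz mono format before inference. As a rough sketch of what that step could look like with only NumPy and SciPy, the helper below downmixes and resamples an in-memory array. The name `to_mono_16k`, the `(samples, channels)` input layout, and the peak normalization are assumptions made for illustration; they are not taken from the tutorial's own code.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # sample rate named in the introduction


def to_mono_16k(audio, sr):
    """Hypothetical helper: return (float32 mono samples, 16000)."""
    x = np.asarray(audio, dtype=np.float32)

    # Assumption: multi-channel input is shaped (samples, channels).
    if x.ndim == 2:
        x = x.mean(axis=1)

    # Polyphase resampling with the smallest integer up/down ratio.
    if sr != TARGET_SR:
        g = gcd(TARGET_SR, sr)
        x = resample_poly(x, TARGET_SR // g, sr // g)

    # Scale down only if the signal would clip outside [-1, 1].
    peak = float(np.max(np.abs(x))) if x.size else 0.0
    if peak > 1.0:
        x = x / peak

    return x.astype(np.float32), TARGET_SR
```

One second of 44.1 kHz stereo input should come back as 16,000 mono samples, which can then be written to a WAV file with any audio library before being passed to the model.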

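The SRT export step described in the introduction comes down to formatting segment timestamps as `HH:MM:SS,mmm` and numbering the cues. Here is a minimal sketch, assuming each segment is a dict with `start` and `end` in seconds and the text under `segment`. That key layout, and the names `srt_time` and `segments_to_srt`, are assumptions for illustration and may differ from what the tutorial's pipeline produces.

```python
def srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = max(0, int(round(seconds * 1000)))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Hypothetical helper: build SRT text from timestamped segments.

    Assumed input: dicts with 'start', 'end' (seconds) and 'segment' (text).
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n"
            f"{seg['segment'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

The returned string can be written to a `.srt` file with UTF-8 encoding, so that translated text in non-Latin scripts survives intact.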