Trying out Whisper

2023/01/01

Notes on how I tried Whisper, the speech recognition model released by OpenAI, on Windows 10 with Anaconda.

OpenAI page: https://openai.com/blog/whisper/
GitHub: https://github.com/openai/whisper

Setting up the environment

First, open Anaconda Prompt, create a new environment, and install Python into it.
Here the environment is named "whisper" and the Python version is 3.10.

conda create -n whisper python=3.10

Next, activate the environment (conda activate whisper), then install whisper and ffmpeg, which whisper needs in order to decode audio files.

pip install git+https://github.com/openai/whisper.git
conda install ffmpeg -c conda-forge

This completes the environment setup.
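As an optional sanity check, the sketch below looks for the two pieces installed above: the ffmpeg executable on PATH and an importable whisper package. The function name check_whisper_environment is hypothetical, and the check only confirms that things can be found, not that they work.

```python
import importlib.util
import shutil


def check_whisper_environment() -> dict:
    """Hypothetical helper: report whether ffmpeg and whisper can be found.

    Returns a dict mapping each requirement to True (found) or False (missing).
    """
    return {
        # whisper calls the ffmpeg executable to decode audio, so it must be on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_spec returns None (instead of raising) when the package is absent.
        "whisper": importlib.util.find_spec("whisper") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_whisper_environment().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```

Run it inside the activated "whisper" environment. If either entry prints "missing", repeat the corresponding install step.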

Usage (command line)

Suppose the audio file to transcribe is hoge.mp3.
Running the following command transcribes it:

whisper hoge.mp3 --language Japanese --model small

The --language option specifies the language of the audio. If it is omitted, the language is detected automatically.
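When the same command has to be run for many files, it can be convenient to build it from Python. The sketch below is a hypothetical helper (build_whisper_command is not part of whisper) that reproduces the command above and drops --language when no language is given, so whisper falls back to automatic detection.

```python
from typing import List, Optional


def build_whisper_command(audio_path: str, model: str = "small",
                          language: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build the argument list for the whisper CLI."""
    cmd = ["whisper", audio_path]
    # Omitting --language lets whisper detect the language automatically.
    if language is not None:
        cmd += ["--language", language]
    cmd += ["--model", model]
    return cmd
```

The list can be passed straight to subprocess.run, e.g. subprocess.run(build_whisper_command("hoge.mp3", language="Japanese"), check=True), which runs the same command as above.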

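The --model option specifies the model size.
From smallest to largest, the choices are tiny, base, small, medium, and large. The default (at the time of writing) is small.
For English audio only, English-only .en models such as medium.en are available for every size except large.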
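The naming rule above can be captured in a few lines. This is a sketch with a hypothetical function name (model_name); it assumes that requesting an English-only large model should simply fall back to the multilingual large model, since no large.en exists.

```python
SIZES = ("tiny", "base", "small", "medium", "large")


def model_name(size: str, english_only: bool = False) -> str:
    """Hypothetical helper: return the name to pass to --model."""
    if size not in SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    # English-only ".en" variants exist for every size except large,
    # so large falls back to the multilingual model.
    if english_only and size != "large":
        return f"{size}.en"
    return size
```

For example, model_name("medium", english_only=True) gives "medium.en", while model_name("large", english_only=True) gives "large".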

In my environment, the warning "UserWarning: FP16 is not supported on CPU; using FP32 instead" appeared, because inference ran on the CPU.
To avoid it, pass the option --fp16 False.
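Instead of hard-coding fp16=False, the flag can be chosen from the available hardware. The sketch below (use_fp16 is a hypothetical name) enables FP16 only when PyTorch, which whisper depends on, reports a CUDA GPU; on a CPU-only machine it returns False, which avoids the warning above.

```python
def use_fp16() -> bool:
    """Hypothetical helper: True only when PyTorch can see a CUDA GPU.

    On CPU, whisper falls back to FP32 and warns, so False is returned there.
    """
    try:
        import torch  # installed as a dependency of whisper
    except ImportError:
        return False
    return bool(torch.cuda.is_available())
```

In Python this becomes model.transcribe("hoge.mp3", fp16=use_fp16()); on the command line the equivalent is to pass --fp16 False only on CPU-only machines.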

Usage (Python)

To run whisper from Python, do the following:

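import whisper

# Load the large model (downloaded automatically on first use)
model = whisper.load_model("large")
# fp16=False avoids the FP16 warning on CPU; language="Japanese" skips auto-detection
result = model.transcribe("hoge.mp3", fp16=False, language="Japanese")
print(result["text"])  # the whole transcription as a single string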
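Besides "text", the dict returned by transcribe also has a "segments" list whose entries carry "start" and "end" times in seconds and the segment "text". The sketch below turns those into timestamped lines; both function names are hypothetical, and whisper is imported inside transcribe_file only so that format_segments can be tried without whisper installed.

```python
from typing import Any, Dict, List


def format_segments(result: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: turn a transcribe() result into timestamped lines."""
    lines = []
    for seg in result["segments"]:
        # Each segment has start/end times in seconds and its own text.
        lines.append(f"[{seg['start']:7.2f} --> {seg['end']:7.2f}] {seg['text'].strip()}")
    return lines


def transcribe_file(path: str, model_size: str = "small") -> List[str]:
    """Hypothetical helper: transcribe Japanese audio and return timestamped lines."""
    import whisper  # imported lazily so format_segments works without whisper

    model = whisper.load_model(model_size)
    result = model.transcribe(path, fp16=False, language="Japanese")
    return format_segments(result)
```

For example, print("\n".join(transcribe_file("hoge.mp3"))) prints one line per recognized segment instead of a single block of text.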
