How to Use the OpenAI Whisper Model to Convert a Batch of Audio Files Into a Text File via Python (Google Colab)
A few days ago, a friend shared hundreds of audio files (.mp3) with me and asked how to get transcripts for them, either manually or via online tools. I figured we could leverage OpenAI’s Whisper model for this task, so here I demonstrate how I converted them into text saved in a single document.
Step 1: Store audio files in Google Drive.
Step 2: Create your own API key from OpenAI.
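Pasting the key directly into the notebook (as in Step 3 below) works, but it is risky if you ever share the Colab notebook. One safer pattern is to read it from an environment variable; `load_api_key` here is my own helper name, not part of the OpenAI SDK:

```python
import os

def load_api_key(env_var='OPENAI_API_KEY'):
    # Read the key from an environment variable instead of hardcoding
    # it in the notebook, and fail fast if it is missing so you don't
    # get a confusing authentication error mid-batch.
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f'Set the {env_var} environment variable first.')
    return key
```

In Colab you can set the variable once per session with `os.environ['OPENAI_API_KEY'] = '...'` in a cell you delete afterwards, or use Colab's Secrets panel.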
Step 3: Import packages and mount your Google Drive. Use os.listdir to get the file names in the directory path (folder).
import os
from openai import OpenAI
from google.colab import drive
drive.mount('/content/drive')
client = OpenAI(api_key='Your API Key')
directory_path = 'Where you save your audio files'
directory_files = os.listdir(directory_path)
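Note that os.listdir returns every entry in the folder, not just audio, and in arbitrary order. If your folder might contain stray files, one option is to filter and sort first (this `list_mp3_files` helper is my own addition, not something the steps above require):

```python
import os

def list_mp3_files(directory_path):
    # Keep only .mp3 entries so stray files (e.g. .DS_Store) never
    # reach the API, and sort for a predictable transcript order.
    return sorted(
        name for name in os.listdir(directory_path)
        if name.lower().endswith('.mp3')
    )
```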
Step 4: Loop through the files, send each one to the transcription API, and append (mode 'a') the resulting transcription.text to Downloaded.txt.
for file in directory_files:
    audio_file_path = '{}/{}'.format(directory_path, file)
    print(audio_file_path)
    # Open in binary mode; the context manager closes the handle
    # after each request instead of leaking it.
    with open(audio_file_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file
        )
    # Append mode ('a') adds each transcript to the same document.
    with open('/content/drive/MyDrive/Downloaded.txt', 'a') as writefile:
        writefile.write("\n")
        writefile.write(file)
        writefile.write("\n")
        writefile.write(transcription.text)
        writefile.write("\n")
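With a couple of hundred files, a single corrupt download or transient API error will abort the whole loop. A more defensive variant is sketched below; `transcribe_batch` and its `transcribe` parameter are names I made up for this sketch, not part of the OpenAI SDK:

```python
import os

def transcribe_batch(file_paths, transcribe, output_path):
    # transcribe is any callable that takes an open binary file handle
    # and returns the transcript text; a failed file is recorded and
    # skipped so one bad mp3 does not abort the whole batch.
    failed = []
    with open(output_path, 'a') as writefile:
        for path in file_paths:
            try:
                with open(path, 'rb') as audio_file:
                    text = transcribe(audio_file)
            except Exception as exc:
                failed.append((path, str(exc)))
                continue
            writefile.write('\n{}\n{}\n'.format(os.path.basename(path), text))
    return failed
```

With the client from Step 3 you could pass something like `lambda f: client.audio.transcriptions.create(model="whisper-1", file=f).text` as `transcribe`, then inspect the returned list to retry only the failures.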
Notes
I converted around 200 audio files (1–2 minutes each) yesterday, and here is the usage and charge for your reference.