How to Use the OpenAI Whisper Model to Convert a Batch of Audio Files Into a Text File via Python (Google Colab)
A few days ago, a friend shared hundreds of audio files (.mp3) with me and asked how to get transcripts for them, either manually or via online tools. I figured we could leverage OpenAI’s Whisper model for this task, so here I demonstrate how I converted them into text saved in a single document.
Step 1: Store audio files in Google Drive.
Step 2: Create your own API key from OpenAI.
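Pasting the key directly into the notebook (as in Step 3 below) works, but it is risky if you ever share the Colab notebook. One safer pattern is to read it from an environment variable; `load_api_key` here is my own helper name, not part of the OpenAI SDK:

```python
import os

def load_api_key(env_var='OPENAI_API_KEY'):
    # Read the key from an environment variable instead of hardcoding
    # it in the notebook, and fail fast if it is missing so you don't
    # get a confusing authentication error mid-batch.
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f'Set the {env_var} environment variable first.')
    return key
```

In Colab you can set the variable once per session with `os.environ['OPENAI_API_KEY'] = '...'` in a cell you delete afterwards, or use Colab's Secrets panel.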
Step 3: Import packages and mount your Google Drive. Use os.listdir to get the file names in the directory path (folder).
import os
from openai import OpenAI
from google.colab import drive
drive.mount('/content/drive')
client = OpenAI(api_key='Your API Key')
directory_path = 'Where you save your audio files'
directory_files = os.listdir(directory_path)
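Note that os.listdir returns every entry in the folder, not just audio, and in arbitrary order. If your folder might contain stray files, one option is to filter and sort first (this `list_mp3_files` helper is my own addition, not something the steps above require):

```python
import os

def list_mp3_files(directory_path):
    # Keep only .mp3 entries so stray files (e.g. .DS_Store) never
    # reach the API, and sort for a predictable transcript order.
    return sorted(
        name for name in os.listdir(directory_path)
        if name.lower().endswith('.mp3')
    )
```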
Step 4: Loop through the files, send each one to the transcription API, and append (mode 'a') the resulting transcription.text to Downloaded.txt.
for file in directory_files:
    audio_file_path = '{}/{}'.format(directory_path, file)
    print(audio_file_path)
    # Open in binary mode; the context manager closes the handle
    # after each request instead of leaking it.
    with open(audio_file_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file
        )
    # Append mode ('a') adds each transcript to the same document.
    with open('/content/drive/MyDrive/Downloaded.txt', 'a') as writefile:
        writefile.write("\n")
        writefile.write(file)
        writefile.write("\n")
        writefile.write(transcription.text)
        writefile.write("\n")
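With a couple of hundred files, a single corrupt download or transient API error will abort the whole loop. A more defensive variant is sketched below; `transcribe_batch` and its `transcribe` parameter are names I made up for this sketch, not part of the OpenAI SDK:

```python
import os

def transcribe_batch(file_paths, transcribe, output_path):
    # transcribe is any callable that takes an open binary file handle
    # and returns the transcript text; a failed file is recorded and
    # skipped so one bad mp3 does not abort the whole batch.
    failed = []
    with open(output_path, 'a') as writefile:
        for path in file_paths:
            try:
                with open(path, 'rb') as audio_file:
                    text = transcribe(audio_file)
            except Exception as exc:
                failed.append((path, str(exc)))
                continue
            writefile.write('\n{}\n{}\n'.format(os.path.basename(path), text))
    return failed
```

With the client from Step 3 you could pass something like `lambda f: client.audio.transcriptions.create(model="whisper-1", file=f).text` as `transcribe`, then inspect the returned list to retry only the failures.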
Notes
I converted around 200 audio files (1–2 minutes each) yesterday, and here is the usage and charge for your reference.