Telegram bot for converting voice messages to text

Tr0jan_Horse · May 4, 2025

Hello!

In this article I will tell you how I, not being a programmer, wrote a bot for Telegram. First, a little background; it is quite short.

My position is this: in messengers, you should write text!

Personally, I really don't like voice messages and people who constantly use them. For me they are simply inconvenient: I don't always have headphones at hand (and playing a private message through the speakers is generally unacceptable), and sometimes it is impossible to hear anything at all, for example on public transport or on the street. Listening also takes a long time, after all. It is much easier and faster to read text than to listen to all these *eeeee*, *mmm*, *smacking* and background noise.

Every voice message sent to me annoyed me more and more. Finally, I couldn't stand it any longer and decided to write myself a bot that would transcribe all this unpleasantness into text.

Python was chosen as the programming language, because I can at least write something in it, and Telegram was chosen as the platform for the bot, because the Telegram API for bots has a fairly low entry threshold.

Well, first you need to decide on the libraries to use and import them:

Python:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import telebot
import requests
import speech_recognition as sr
import subprocess
import datetime

With the standard "os", "subprocess" and "datetime" everything is clear. "requests" is not part of the standard library, but it is a well-known package for making HTTP requests.
telebot (installed as the pyTelegramBotAPI package) provides a pure Python interface to the Telegram Bot API.
The SpeechRecognition library is a wrapper around speech recognition engines and APIs from various companies (Google, Microsoft, etc.), and some of its engines can work offline. It is SpeechRecognition that will be used for speech recognition. Note that the recognize_google function used below sends the audio to Google's web service, so it needs an Internet connection.

ffmpeg will also be used. As the description from Wikipedia says, this is a set of free, open-source libraries that allow you to record, convert and stream digital audio and video in various formats. On Debian-based systems, this miracle is installed with a simple sudo apt-get install ffmpeg
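
Since the bot calls ffmpeg as an external program, it may be worth checking at startup that it is actually installed. Below is a minimal sketch of such a check using only the standard library; the function names find_ffmpeg and require_ffmpeg are my own and are not part of any library.

```python
import shutil


def find_ffmpeg(executable="ffmpeg"):
    """Return the full path to the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which(executable)


def require_ffmpeg(executable="ffmpeg"):
    """Return the path to ffmpeg or raise a clear error if it is missing."""
    path = find_ffmpeg(executable)
    if path is None:
        raise RuntimeError(
            executable + " was not found on PATH; install it first "
            "(for example: sudo apt-get install ffmpeg)"
        )
    return path
```

Calling require_ffmpeg() once before starting the bot would turn a confusing error inside the handler into an understandable message at launch.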

Now let's create a couple of necessary variables:

Python:

logfile = str(datetime.date.today()) + '.log' # log file name based on today's date
token = 'YOUR_TOKEN' # ATTENTION! Do not save your tokens in main code files, use configuration files for this!
bot = telebot.TeleBot(token)
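
One simple way to keep the token out of the code is to read it from an environment variable. This is only a sketch: the variable name TELEGRAM_BOT_TOKEN and the function load_token are assumptions of mine; choose whatever names suit you.

```python
import os


def load_token(env=None, var_name="TELEGRAM_BOT_TOKEN"):
    """Read the bot token from an environment variable.

    var_name is a hypothetical name; env defaults to os.environ
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError("Set the " + var_name + " environment variable to your bot token")
    return token


# Usage in the bot (instead of the hard-coded string):
# token = load_token()
```

With this approach the token is set in the shell before launch and never ends up in the script itself.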

But before converting something, you need to receive it. Let's sketch out a handler function for voice messages. It will only react to voice messages and ignore everything else.

I tried to comment everything I could in this function, instead of constantly interrupting the code to analyze each line.

Python:

@bot.message_handler(content_types=['voice'])
def get_audio_messages(message):
    # Handle an incoming voice message
    fname = None # set before try so that the finally block can always check it
    try:
        print("Started recognition...")
        file_info = bot.get_file(message.voice.file_id)
        path = file_info.file_path # Full path to file (for example: voice/file_2.oga)
        fname = os.path.splitext(os.path.basename(path))[0] # File name without extension (for example: file_2)
        doc = requests.get('https://api.telegram.org/file/bot{0}/{1}'.format(token, path))
        doc.raise_for_status() # stop here if the download failed
        with open(fname+'.oga', 'wb') as f:
            f.write(doc.content) # save the audio message here
        subprocess.run(['ffmpeg', '-y', '-i', fname+'.oga', fname+'.wav'], check=True) # use ffmpeg to convert from .oga to .wav
        result = audio_to_text(fname+'.wav') # Call the function that converts audio to text
        bot.send_message(message.from_user.id, result) # Send the text to the user
    except sr.UnknownValueError:
        bot.send_message(message.from_user.id,  "Sorry, I couldn't recognize this message")
        with open(logfile, 'a', encoding='utf-8') as f:
            f.write(str(datetime.datetime.today().strftime("%H:%M:%S")) + ':' + str(message.from_user.id) + ':' + str(message.from_user.first_name) + '_' + str(message.from_user.last_name) + ':' + str(message.from_user.username) +':'+ str(message.from_user.language_code) + ':Message is empty.\n')
    except Exception as e:
        bot.send_message(message.from_user.id,  "Something went wrong, the developers are already looking into it.")
        with open(logfile, 'a', encoding='utf-8') as f:
            f.write(str(datetime.datetime.today().strftime("%H:%M:%S")) + ':' + str(message.from_user.id) + ':' + str(message.from_user.first_name) + '_' + str(message.from_user.last_name) + ':' + str(message.from_user.username) +':'+ str(message.from_user.language_code) +':' + str(e) + '\n')
    finally:
        # Remove temporary files; they may not exist if an earlier step failed
        if fname:
            for ext in ('.wav', '.oga'):
                if os.path.exists(fname+ext):
                    os.remove(fname+ext)

# Start polling; in the final file this call must come last, after audio_to_text is defined
bot.polling(none_stop=True, interval=0)
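
The two log-writing lines in the handler are almost identical, so they could be moved into small helpers. Here is a sketch of how that might look; format_log_line and append_log are hypothetical names of mine, and the user argument is expected to have the same attributes as message.from_user.

```python
import datetime


def format_log_line(user, text, now=None):
    """Build one log line in the same colon-separated format the handler uses."""
    now = now or datetime.datetime.now()
    fields = [
        now.strftime("%H:%M:%S"),
        str(user.id),
        str(user.first_name) + "_" + str(user.last_name),
        str(user.username),
        str(user.language_code),
        text,
    ]
    return ":".join(fields) + "\n"


def append_log(logfile, line):
    """Append a prepared line to the log file."""
    with open(logfile, "a", encoding="utf-8") as f:
        f.write(line)
```

Inside the handler, each except branch would then shrink to a single call such as append_log(logfile, format_log_line(message.from_user, str(e))).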

And here is the function that converts audio to text:

Python:

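def audio_to_text(dest_name: str):
    r = sr.Recognizer()
    message = sr.AudioFile(dest_name) # open the .wav file
    with message as source:
        audio = r.record(source) # read the whole file into memory
    result = r.recognize_google(audio, language="ru_RU") # recognize Russian speech via Google's web service
    return result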
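
The recognition language above is hard-coded to Russian. If the bot is used by people who speak other languages, one possible heuristic is to pick the language from message.from_user.language_code. The sketch below is only an assumption about how that could be done: the mapping table and the function pick_recognition_language are mine, the tags are written in the usual "en-US" style, and the interface language of the sender does not always match the language actually spoken in the recording.

```python
# Hypothetical mapping from Telegram's short language codes to recognition tags
LANGUAGE_TAGS = {
    "ru": "ru-RU",
    "en": "en-US",
    "de": "de-DE",
}


def pick_recognition_language(language_code, default="ru-RU"):
    """Choose a recognition language tag from a user's language code.

    Unknown or missing codes fall back to the default.
    """
    if not language_code:
        return default
    # Keep only the primary part: "en-GB" and "en_gb" both become "en"
    short = language_code.lower().replace("_", "-").split("-")[0]
    return LANGUAGE_TAGS.get(short, default)
```

The result could then be passed as the language argument of recognize_google instead of the fixed string.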

The complete code of the bot:

Python:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import telebot
import requests
import speech_recognition as sr
import subprocess
import datetime


logfile = str(datetime.date.today()) + '.log'
token = 'YOUR_TOKEN'
bot = telebot.TeleBot(token)

def audio_to_text(dest_name: str):

    r = sr.Recognizer()
    message = sr.AudioFile(dest_name)
    with message as source:
        audio = r.record(source)
    result = r.recognize_google(audio, language="ru_RU")
    return result


@bot.message_handler(content_types=['voice'])
def get_audio_messages(message):
    fname = None # set before try so that the finally block can always check it
    try:
        print("Started recognition...")
        file_info = bot.get_file(message.voice.file_id)
        path = file_info.file_path
        fname = os.path.splitext(os.path.basename(path))[0] # file name without extension
        doc = requests.get('https://api.telegram.org/file/bot{0}/{1}'.format(token, path))
        doc.raise_for_status() # stop here if the download failed
        with open(fname+'.oga', 'wb') as f:
            f.write(doc.content)
        subprocess.run(['ffmpeg', '-y', '-i', fname+'.oga', fname+'.wav'], check=True)
        result = audio_to_text(fname+'.wav')
        bot.send_message(message.from_user.id, result)
    except sr.UnknownValueError:
        bot.send_message(message.from_user.id,  "I beg your pardon, but I couldn't make out the message, or it is empty...")
        with open(logfile, 'a', encoding='utf-8') as f:
            f.write(str(datetime.datetime.today().strftime("%H:%M:%S")) + ':' + str(message.from_user.id) + ':' + str(message.from_user.first_name) + '_' + str(message.from_user.last_name) + ':' + str(message.from_user.username) +':'+ str(message.from_user.language_code) + ':Message is empty.\n')
    except Exception as e:
        bot.send_message(message.from_user.id,  "Error")
        with open(logfile, 'a', encoding='utf-8') as f:
            f.write(str(datetime.datetime.today().strftime("%H:%M:%S")) + ':' + str(message.from_user.id) + ':' + str(message.from_user.first_name) + '_' + str(message.from_user.last_name) + ':' + str(message.from_user.username) +':'+ str(message.from_user.language_code) +':' + str(e) + '\n')
    finally:
        # Remove temporary files; they may not exist if an earlier step failed
        if fname:
            for ext in ('.wav', '.oga'):
                if os.path.exists(fname+ext):
                    os.remove(fname+ext)

bot.polling(none_stop=True, interval=0)

This whole hellish machine works like this:
- The user sends/forwards a voice message to the bot
- The bot works its magic
- The bot sends the user the transcribed text
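
The three steps above can also be sketched as one small function that keeps its temporary files in a private directory, so that files from different messages never collide and are always cleaned up. This is just an illustration under my own assumptions: transcribe_voice and its three callable parameters are hypothetical, not part of the bot's code or of any library.

```python
import os
import tempfile


def transcribe_voice(download, convert, recognize):
    """Run download -> convert -> recognize inside a temporary directory.

    download(path)    must save the voice message to path
    convert(src, dst) must turn the .oga file into a .wav file
    recognize(path)   must return the recognized text
    """
    with tempfile.TemporaryDirectory() as workdir:
        oga_path = os.path.join(workdir, "voice.oga")
        wav_path = os.path.join(workdir, "voice.wav")
        download(oga_path)
        convert(oga_path, wav_path)
        # The directory and both files are removed when the with block ends
        return recognize(wav_path)
```

In the bot, download could wrap the requests.get call, convert could wrap the ffmpeg call, and recognize could simply be audio_to_text.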

P.S. The bot works with quite high accuracy, transcribing even long messages and censoring indecent words. The bot also works on both Linux and Windows.

Actually, that's it... Personally, I am quite satisfied with the result. What do you think? Write what can be improved or changed; I will gladly listen to everything and take it into account. And once again, I remind you that my Python is at the level of "scribble a shitty script, as long as it works", so don't throw too many stones if something is wrong)

And of course, thank you for reading this article to the end.

mahdipayping · Nov 23, 2025

respect
