Telegram bot for converting voice messages to text

Tr0jan_Horse

Hello!

In this article I will tell you how I, not being a coder, wrote a bot for Telegram. First, a little background. It is quite short.

My position is simple: in messengers, you should type your messages!

Personally, I really dislike voice messages and people who use them all the time. For me they are simply inconvenient. I don't always have headphones with me (and playing something sent in a private chat through the speakers is out of the question), and sometimes it is impossible to hear anything at all, for example on public transport or in the street. It also takes a long time. Reading text is much easier and faster than listening to all the *eeeee*, *mmm*, *smack-smack* and background noise.

Every voice message sent to me annoyed me more than the last. Finally, I couldn't stand it any longer and decided to write myself a bot that would turn all this unpleasantness into text.

Python was chosen as the programming language, because I can at least write something in it, and Telegram was chosen as the platform for the bot, because the Telegram API for bots has a fairly low entry threshold.

Well, first you need to decide on the libraries to use and import them:
Python:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import telebot
import requests
import speech_recognition as sr
import subprocess
import datetime

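With the standard "os", "subprocess" and "datetime" modules everything is clear. "requests" is not part of the standard library, but it is a well-known third-party package for HTTP requests.
telebot is the module provided by the pyTelegramBotAPI package, a pure Python interface to the Telegram Bot API.
The SpeechRecognition library is a wrapper around speech recognition engines and APIs from several companies (Google, Microsoft, etc.), and some of the engines it supports can work offline. It is SpeechRecognition that will do the recognition here. Note that the recognize_google call used below sends the audio to Google's web API, so it needs an internet connection.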

ffmpeg will also be used. As Wikipedia describes it, this is a set of free, open-source libraries for recording, converting and streaming digital audio and video in various formats. On Debian or Ubuntu this miracle is installed with a simple sudo apt-get install ffmpeg
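Since the bot shells out to ffmpeg, it may be worth checking at startup that the program is actually installed. Below is a minimal sketch using only the standard library; the helper name find_tool is my own invention, not something from the bot's code.

```python
import shutil


def find_tool(name: str) -> str:
    """Return the full path of an executable found on PATH, or raise if it is missing."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"{name} not found on PATH; install it first")
    return path


if __name__ == "__main__":
    try:
        print("ffmpeg found at", find_tool("ffmpeg"))
    except RuntimeError as err:
        print(err)
```

Failing early with a clear message is usually easier to debug than a cryptic error on the first voice message.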

Now let's create a couple of necessary variables:
Python:
logfile = str(datetime.date.today()) + '.log' # name of the log file, one per day
token = 'YOUR_TOKEN' # ATTENTION! Do not save your tokens in the main code files, use configuration files for this!
bot = telebot.TeleBot(token)
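One possible way to keep the token out of the source, sketched under my own assumptions: read it from an environment variable. The variable name TELEGRAM_BOT_TOKEN and the helper load_token are hypothetical, pick whatever suits your setup.

```python
import os


def load_token(env_var: str = "TELEGRAM_BOT_TOKEN") -> str:
    """Read the bot token from an environment variable (the name is an assumption)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your bot token")
    return token
```

With this in place, the line token = 'YOUR_TOKEN' could become token = load_token(), and the token itself would live in the shell environment or in the service configuration.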
But before converting anything, you need to receive it. So we need a function for receiving voice messages. It will accept only voice messages and will not respond to anything else.

Instead of constantly stopping to analyze each line of code, I tried to comment everything I could in this function.
Python:
@bot.message_handler(content_types=['voice'])
def get_audio_messages(message):
    # Handle an incoming voice message
    fname = None # stays None if we fail before the file name is known
    try:
        print("Started recognition...")
        file_info = bot.get_file(message.voice.file_id)
        path = file_info.file_path # Full path to the file on the Telegram server (for example: voice/file_2.oga)
        fname = os.path.basename(path) # (for example: file_2.oga)
        doc = requests.get('https://api.telegram.org/file/bot{0}/{1}'.format(token, path))
        with open(fname+'.oga', 'wb') as f:
            f.write(doc.content) # save the voice message to disk
        # Use ffmpeg to convert .oga to .wav; -y overwrites a stale file, check=True raises if ffmpeg fails
        subprocess.run(['ffmpeg', '-y', '-i', fname+'.oga', fname+'.wav'], check=True)
        result = audio_to_text(fname+'.wav') # Call the function that converts audio to text
        bot.send_message(message.from_user.id, result) # Send the text to the user
    except sr.UnknownValueError:
        # The recognizer could not make out any speech
        bot.send_message(message.from_user.id, "Sorry, I couldn't make out this message.")
        with open(logfile, 'a', encoding='utf-8') as f:
            f.write(str(datetime.datetime.today().strftime("%H:%M:%S")) + ':' + str(message.from_user.id) + ':' + str(message.from_user.first_name) + '_' + str(message.from_user.last_name) + ':' + str(message.from_user.username) +':'+ str(message.from_user.language_code) + ':Message is empty.\n')
    except Exception as e:
        # Anything else: download, conversion or network errors
        bot.send_message(message.from_user.id, "Something went wrong, the developers are looking into it.")
        with open(logfile, 'a', encoding='utf-8') as f:
            f.write(str(datetime.datetime.today().strftime("%H:%M:%S")) + ':' + str(message.from_user.id) + ':' + str(message.from_user.first_name) + '_' + str(message.from_user.last_name) + ':' + str(message.from_user.username) +':'+ str(message.from_user.language_code) +':' + str(e) + '\n')
    finally:
        # Remove temporary files, but only those that were actually created
        if fname:
            for tmp in (fname+'.wav', fname+'.oga'):
                if os.path.exists(tmp):
                    os.remove(tmp)

# Start polling. This call blocks, so in the final script it must come last, after audio_to_text is defined.
bot.polling(none_stop=True, interval=0)
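The handler above cleans up its temporary files by hand. An alternative, shown here only as a sketch and not as what the bot does, is to let the standard tempfile module own the files so they disappear even when something fails. The names process_in_temp_dir and convert are hypothetical; convert stands in for "run ffmpeg, then recognize".

```python
import os
import tempfile


def process_in_temp_dir(data: bytes, convert):
    """Save data as voice.oga in a temporary directory, call convert(src, dst), return its result.

    The directory and everything in it is deleted when the with-block exits,
    whether convert succeeds or raises.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "voice.oga")
        dst = os.path.join(tmp, "voice.wav")
        with open(src, "wb") as f:
            f.write(data)
        return convert(src, dst)
```

A side benefit is that two voice messages that happen to share a file name can no longer overwrite each other's files.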
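And here is the function that converts audio to text:
Python:
def audio_to_text(dest_name: str):
    # Read the whole .wav file and send it to Google's speech recognition
    r = sr.Recognizer()
    message = sr.AudioFile(dest_name)
    with message as source:
        audio = r.record(source) # load the entire file into memory
    result = r.recognize_google(audio, language="ru_RU") # recognition language, change it to match your speakers
    return result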
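Both except branches in the handler build the same long log line by hand. A small helper could do it once. This is a sketch of my own, with the hypothetical name format_log_line, that reproduces the same colon-separated layout; user is anything with the same attributes as message.from_user.

```python
import datetime
from types import SimpleNamespace


def format_log_line(user, text, now=None):
    """Build one log line: time:id:first_last:username:language:text."""
    now = now or datetime.datetime.today()
    fields = [
        now.strftime("%H:%M:%S"),
        str(user.id),
        f"{user.first_name}_{user.last_name}",
        str(user.username),
        str(user.language_code),
        text,
    ]
    return ":".join(fields) + "\n"


if __name__ == "__main__":
    # A stand-in for message.from_user, used only for this demonstration
    demo_user = SimpleNamespace(id=42, first_name="Ann", last_name=None,
                                username="ann", language_code="en")
    print(format_log_line(demo_user, "Message is empty."), end="")
```

In the handler, each f.write(...) call would then shrink to f.write(format_log_line(message.from_user, str(e))).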


The complete script:
Python:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import telebot
import requests
import speech_recognition as sr
import subprocess
import datetime


logfile = str(datetime.date.today()) + '.log'
token = 'your_token'
bot = telebot.TeleBot(token)

def audio_to_text(dest_name: str):
    # Read the whole .wav file and send it to Google's speech recognition
    r = sr.Recognizer()
    message = sr.AudioFile(dest_name)
    with message as source:
        audio = r.record(source)
    result = r.recognize_google(audio, language="ru_RU")
    return result


@bot.message_handler(content_types=['voice'])
def get_audio_messages(message):
    fname = None # stays None if we fail before the file name is known
    try:
        print("Started recognition...")
        file_info = bot.get_file(message.voice.file_id)
        path = file_info.file_path
        fname = os.path.basename(path)
        doc = requests.get('https://api.telegram.org/file/bot{0}/{1}'.format(token, path))
        with open(fname+'.oga', 'wb') as f:
            f.write(doc.content)
        # -y overwrites a stale file, check=True raises if ffmpeg fails
        subprocess.run(['ffmpeg', '-y', '-i', fname+'.oga', fname+'.wav'], check=True)
        result = audio_to_text(fname+'.wav')
        bot.send_message(message.from_user.id, result)
    except sr.UnknownValueError:
        bot.send_message(message.from_user.id, "I beg your pardon, but I couldn't make out the message, or it is empty...")
        with open(logfile, 'a', encoding='utf-8') as f:
            f.write(str(datetime.datetime.today().strftime("%H:%M:%S")) + ':' + str(message.from_user.id) + ':' + str(message.from_user.first_name) + '_' + str(message.from_user.last_name) + ':' + str(message.from_user.username) +':'+ str(message.from_user.language_code) + ':Message is empty.\n')
    except Exception as e:
        bot.send_message(message.from_user.id, "Error")
        with open(logfile, 'a', encoding='utf-8') as f:
            f.write(str(datetime.datetime.today().strftime("%H:%M:%S")) + ':' + str(message.from_user.id) + ':' + str(message.from_user.first_name) + '_' + str(message.from_user.last_name) + ':' + str(message.from_user.username) +':'+ str(message.from_user.language_code) +':' + str(e) + '\n')
    finally:
        # Remove temporary files, but only those that were actually created
        if fname:
            for tmp in (fname+'.wav', fname+'.oga'):
                if os.path.exists(tmp):
                    os.remove(tmp)

bot.polling(none_stop=True, interval=0)
This whole infernal machine works like this:
- The user sends/forwards a voice message to the bot
- The bot works its magic
- The bot sends the user the transcribed message
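The "magic" in the middle step comes down to two things: a download URL and an ffmpeg command. The sketch below spells both out as small pure functions. The function names are my own; the URL pattern and the ffmpeg arguments are the ones used in the handler, plus the -y flag to overwrite an existing output file.

```python
from typing import List


def build_file_url(token: str, file_path: str) -> str:
    """URL from which the Telegram Bot API serves a file returned by get_file."""
    return "https://api.telegram.org/file/bot{0}/{1}".format(token, file_path)


def build_ffmpeg_command(src: str, dst: str) -> List[str]:
    """Argument list for converting src to dst; ffmpeg picks the format from the extension."""
    return ["ffmpeg", "-y", "-i", src, dst]


if __name__ == "__main__":
    # Placeholder token and path, for illustration only
    print(build_file_url("YOUR_TOKEN", "voice/file_2.oga"))
    print(" ".join(build_ffmpeg_command("file_2.oga", "file_2.wav")))
```

Keeping these as separate functions makes it possible to check them without talking to Telegram or running ffmpeg.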
P.S. The bot works with quite high accuracy, transcribing even long messages and censoring obscene words. It also runs on both Linux and Windows.

Actually, that's it... Personally, I am quite satisfied with the result. What do you think? Write what can be improved or changed, I will gladly listen to everything and take it into account. And once again, I remind you that my Python is at the level of "scribbling a shitty script, as long as it works", so don't throw too many stones if something is wrong)

And of course, thank you for reading this article to the end.
 