
Hello!
In this article I'll tell you how I, not being a programmer, wrote a bot for Telegram. First, a little background. It is quite short.
My position is this: in messengers, you should type your messages!
Personally, I really dislike voice messages, and people who use them constantly. For me they are simply inconvenient: I don't always have headphones with me (and playing something sent to you privately through the speakers is out of the question), and sometimes I can't make the message out at all, for example on public transport or in the street. And it takes a long time, after all. Reading text is much easier and faster than listening to all those *eeeee*, *mmm*, *smack* and background noise.
Every voice message sent to me annoyed me more than the last. Finally I couldn't stand it any longer and decided to write myself a bot that would turn all this unpleasantness into text.
I chose Python as the programming language because I can at least write something in it, and Telegram as the platform because its Bot API has a fairly low barrier to entry.
Well, first you need to decide on the libraries to use and import them:
Python:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import telebot
import requests
import speech_recognition as sr
import subprocess
import datetime
"os", "subprocess" and "datetime" come from the standard library, and "requests" is the familiar third-party HTTP library, so everything is clear with them.
telebot is the import name of the pyTelegramBotAPI package, which provides a pure Python interface to the Telegram Bot API.
The SpeechRecognition library is a wrapper around speech recognition APIs from several companies (Google, Microsoft, etc.) which, unlike many others, can also work with offline engines. It is SpeechRecognition that will do the speech recognition here. Note that the code below calls recognize_google, which sends the audio to Google's web service, so this particular bot needs an internet connection.
ffmpeg will also be used. As the description on Wikipedia says, it is a set of free, open-source libraries that let you record, convert and stream digital audio and video in various formats. And this miracle is installed with a simple sudo apt-get install ffmpeg
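Since everything downstream depends on ffmpeg being installed, it may be worth checking for it before trying to convert anything. Below is a minimal sketch that uses only the standard library. The helper names ffmpeg_available, build_ffmpeg_command and convert_to_wav are hypothetical and are not part of the bot's code. The -y flag tells ffmpeg to overwrite an existing output file instead of asking.

```python
import shutil
import subprocess


def ffmpeg_available() -> bool:
    # shutil.which returns the full path of the executable, or None if it is not on PATH
    return shutil.which('ffmpeg') is not None


def build_ffmpeg_command(src: str, dst: str) -> list:
    # -y: overwrite dst without prompting; -i: the input file
    return ['ffmpeg', '-y', '-i', src, dst]


def convert_to_wav(src: str, dst: str) -> None:
    # Hypothetical helper: fail early with a readable message if ffmpeg is missing
    if not ffmpeg_available():
        raise RuntimeError('ffmpeg not found, install it first (e.g. sudo apt-get install ffmpeg)')
    # check=True raises CalledProcessError if ffmpeg exits with a non-zero status
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

A call would look like convert_to_wav('file_2.oga', 'file_2.wav').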
Now let's create a couple of necessary variables:
Python:
logfile = str(datetime.date.today()) + '.log'  # name of the log file (one per day)
token = 'YOUR_TOKEN'  # ATTENTION! Do not store tokens in the main code files, use configuration files for this!
bot = telebot.TeleBot(token)
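One way to follow the advice in the comment above is to read the token from an environment variable. This is only a minimal sketch and not the setup used in this article: the variable name TELEGRAM_BOT_TOKEN and the helper load_token are assumptions made for illustration.

```python
import os


def load_token(env_var: str = 'TELEGRAM_BOT_TOKEN') -> str:
    # Read the token from the environment so it never ends up in the source file.
    # The variable name is an assumption; pick whatever fits your deployment.
    token = os.environ.get(env_var, '').strip()
    if not token:
        raise RuntimeError('Set the {0} environment variable to your bot token'.format(env_var))
    return token
```

With this in place, token = load_token() could replace the hard-coded string, and the variable is set in the shell before the bot is started.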
Next comes the main handler function. Instead of constantly interrupting the code to analyze each line, I tried to comment everything I could right inside it.
Python:
@bot.message_handler(content_types=['voice'])
def get_audio_messages(message):
    # Handle a voice message: download it, convert it, recognize it, reply with text
    oga_name = wav_name = None  # defined up front so that cleanup in `finally` never fails with NameError
    try:
        print("Started recognition...")
        file_info = bot.get_file(message.voice.file_id)
        path = file_info.file_path  # Path of the file on the Telegram server (for example: voice/file_2.oga)
        fname = os.path.splitext(os.path.basename(path))[0]  # Base name without extension (for example: file_2)
        oga_name, wav_name = fname + '.oga', fname + '.wav'
        doc = requests.get('https://api.telegram.org/file/bot{0}/{1}'.format(token, path))
        with open(oga_name, 'wb') as f:
            f.write(doc.content)  # Save the voice message to disk
        # Use ffmpeg to convert .oga to .wav; check=True raises an error if the conversion fails
        subprocess.run(['ffmpeg', '-i', oga_name, wav_name], check=True)
        result = audio_to_text(wav_name)  # Call the function that turns audio into text
        bot.send_message(message.from_user.id, result)  # Send the text to the user
    except sr.UnknownValueError:
        # The recognizer could not make out any speech
        bot.send_message(message.from_user.id, "Sorry, I couldn't make out this message, or it is empty...")
        with open(logfile, 'a', encoding='utf-8') as f:
            f.write(str(datetime.datetime.today().strftime("%H:%M:%S")) + ':' + str(message.from_user.id) + ':' + str(message.from_user.first_name) + '_' + str(message.from_user.last_name) + ':' + str(message.from_user.username) +':'+ str(message.from_user.language_code) + ':Message is empty.\n')
    except Exception as e:
        # Any other failure: tell the user and write the error to the log
        bot.send_message(message.from_user.id, "Something went wrong, the developers are already looking into it...")
        with open(logfile, 'a', encoding='utf-8') as f:
            f.write(str(datetime.datetime.today().strftime("%H:%M:%S")) + ':' + str(message.from_user.id) + ':' + str(message.from_user.first_name) + '_' + str(message.from_user.last_name) + ':' + str(message.from_user.username) +':'+ str(message.from_user.language_code) +':' + str(e) + '\n')
    finally:
        # Remove the temporary files, but only those that were actually created
        for name in (oga_name, wav_name):
            if name and os.path.exists(name):
                os.remove(name)

# Start polling; this call blocks, so it must be the last line of the script
bot.polling(none_stop=True, interval=0)
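The two log-writing statements in the handler are almost identical, so they could be folded into one small helper. The sketch below keeps the same colon-separated format as the handler. The names format_log_line and write_log are hypothetical, and demo_user is only a stand-in for message.from_user so that the example runs without Telegram.

```python
import datetime
from types import SimpleNamespace


def format_log_line(user, text, now=None):
    # Build one log line: time:id:first_last:username:language:text
    now = now or datetime.datetime.today()
    fields = [
        now.strftime('%H:%M:%S'),
        str(user.id),
        '{0}_{1}'.format(user.first_name, user.last_name),
        str(user.username),
        str(user.language_code),
        text,
    ]
    return ':'.join(fields) + '\n'


def write_log(logfile, user, text):
    # Append the line to the log file, creating the file if it does not exist
    with open(logfile, 'a', encoding='utf-8') as f:
        f.write(format_log_line(user, text))


# Stand-in for message.from_user, used here only to show the output format
demo_user = SimpleNamespace(id=42, first_name='Ivan', last_name='Petrov',
                            username='ivan', language_code='ru')
print(format_log_line(demo_user, 'Message is empty.',
                      datetime.datetime(2020, 1, 1, 12, 30, 0)), end='')
# prints: 12:30:00:42:Ivan_Petrov:ivan:ru:Message is empty.
```

Inside the handler, each except branch would then shrink to a single call such as write_log(logfile, message.from_user, str(e)).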
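The recognition itself is done by a separate function, which has to be defined above the bot.polling call:
Python:
def audio_to_text(dest_name: str):
    # Read the .wav file and send it to Google's speech recognition service
    r = sr.Recognizer()
    message = sr.AudioFile(dest_name)
    with message as source:
        audio = r.record(source)  # Load the whole file into memory
    result = r.recognize_google(audio, language="ru_RU")  # Recognize Russian speech
    return result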
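The function above always recognizes Russian. If the bot should serve people who speak other languages, one option is to derive the recognition language from the language_code that Telegram reports for the sender (the same field the handler writes to the log). The sketch below is only an illustration: the LOCALES table and the locale_for helper are hypothetical, and the tags use the common "en-US" style. Keep in mind that language_code reflects the language of the user's Telegram client, which is not necessarily the language they speak in a voice message.

```python
# Hypothetical mapping from a primary language subtag to a recognition locale
LOCALES = {'ru': 'ru-RU', 'en': 'en-US', 'de': 'de-DE'}


def locale_for(language_code, default='ru-RU'):
    # language_code may be missing (None) or look like 'en', 'en-US' or 'pt-br'
    if not language_code:
        return default
    primary = language_code.replace('_', '-').split('-')[0].lower()
    # Fall back to the default for languages that are not in the table
    return LOCALES.get(primary, default)
```

The result could then be passed on as the language argument of recognize_google, which would mean giving audio_to_text an extra parameter for it.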
The complete code:
Python:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import telebot
import requests
import speech_recognition as sr
import subprocess
import datetime

logfile = str(datetime.date.today()) + '.log'
token = 'YOUR_TOKEN'
bot = telebot.TeleBot(token)


def audio_to_text(dest_name: str):
    r = sr.Recognizer()
    message = sr.AudioFile(dest_name)
    with message as source:
        audio = r.record(source)
    result = r.recognize_google(audio, language="ru_RU")
    return result


@bot.message_handler(content_types=['voice'])
def get_audio_messages(message):
    oga_name = wav_name = None
    try:
        print("Started recognition...")
        file_info = bot.get_file(message.voice.file_id)
        path = file_info.file_path
        fname = os.path.splitext(os.path.basename(path))[0]
        oga_name, wav_name = fname + '.oga', fname + '.wav'
        doc = requests.get('https://api.telegram.org/file/bot{0}/{1}'.format(token, path))
        with open(oga_name, 'wb') as f:
            f.write(doc.content)
        subprocess.run(['ffmpeg', '-i', oga_name, wav_name], check=True)
        result = audio_to_text(wav_name)
        bot.send_message(message.from_user.id, result)
    except sr.UnknownValueError:
        bot.send_message(message.from_user.id, "Sorry, I couldn't make out this message, or it is empty...")
        with open(logfile, 'a', encoding='utf-8') as f:
            f.write(str(datetime.datetime.today().strftime("%H:%M:%S")) + ':' + str(message.from_user.id) + ':' + str(message.from_user.first_name) + '_' + str(message.from_user.last_name) + ':' + str(message.from_user.username) +':'+ str(message.from_user.language_code) + ':Message is empty.\n')
    except Exception as e:
        bot.send_message(message.from_user.id, "Error")
        with open(logfile, 'a', encoding='utf-8') as f:
            f.write(str(datetime.datetime.today().strftime("%H:%M:%S")) + ':' + str(message.from_user.id) + ':' + str(message.from_user.first_name) + '_' + str(message.from_user.last_name) + ':' + str(message.from_user.username) +':'+ str(message.from_user.language_code) +':' + str(e) + '\n')
    finally:
        for name in (oga_name, wav_name):
            if name and os.path.exists(name):
                os.remove(name)


bot.polling(none_stop=True, interval=0)
How it works:
- The user sends or forwards a voice message to the bot
- The bot works its magic
- The bot sends the user the message as text


P.S. The bot recognizes speech quite accurately, copes even with long messages, and censors obscene words. It also runs on both Linux and Windows.
Actually, that's it... Personally, I am quite satisfied with the result. What do you think? Write what could be improved or changed; I will gladly listen to everything and take it into account. And once again, I remind you that my Python is at the level of "scribbling a shitty script, as long as it works", so don't throw too many stones if something is wrong)
And of course, thank you for reading this article to the end.