“To have another language is to possess a second soul.”
— Charlemagne
Imagine you are traveling to a new country and are able to hold a conversation seamlessly in the local language. That is what we will try to achieve in this article by building a simple text-to-audio converter app in Python, using the googletrans API for translation and gTTS for text-to-speech conversion. We will go over the complete code, how the different components work, and how to use these APIs to translate text from English into any supported language and then convert it to audio in that language.
The different components
There are three components:
- Translation: googletrans, the Python library that uses Google Translate to handle language translation
- Text-to-speech: gTTS (Google Text-to-Speech), which converts text to audio in the language of our choice
- Audio playback: pygame, which is primarily used for developing games, but which we will use here to play back the audio generated by gTTS
Prerequisites
We can use the pip command in a terminal to install the required libraries:
pip install gTTS googletrans==4.0.0-rc1 pygame
Note: You might sometimes encounter the following error when running the Python code:
AttributeError: 'coroutine' object has no attribute 'text'
sys:1: RuntimeWarning: coroutine 'Translator.translate' was never awaited
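Fix: Make sure you have the correct version of googletrans installed. The error indicates that the installed version exposes Translator.translate as a coroutine, so calling it without await returns a coroutine object instead of a result. Version 4.0.0-rc1 is known to work well for synchronous operations.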
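To confirm which googletrans version is actually installed before running the script, a small check along these lines can help. This is a minimal sketch using the standard library's importlib.metadata; the helper name installed_version is hypothetical. Note that the reported string may appear in normalized form (for example 4.0.0rc1 rather than 4.0.0-rc1), so it is printed for inspection rather than compared exactly.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical usage: show what pip actually installed for each dependency.
for name in ("googletrans", "gTTS", "pygame"):
    print(name, "->", installed_version(name))
```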
Implementation
translate_text
The translate_text function uses googletrans for text translation. It takes two parameters: text, the string to be translated, and dest_language, the target language code (e.g., 'es' for Spanish). Inside the function, we create a Translator object and call its translate method, which returns a result object. The function then returns that object's text attribute, which holds the translated string.
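The object returned by translate carries more than the translated string. The sketch below assumes the result exposes src, dest, origin, and text attributes, as googletrans result objects are understood to do in the pinned version. The helper summarize_translation and the FakeTranslator stand-in are hypothetical; the translator is passed in as an argument so the function can be tried offline.

```python
from types import SimpleNamespace


def summarize_translation(translator, text, dest_language):
    """Collect the main fields of a translation result into a dictionary."""
    result = translator.translate(text, dest=dest_language)
    return {
        "source_language": result.src,   # detected or given source language code
        "target_language": result.dest,  # language code the text was translated into
        "original": result.origin,       # the input text
        "translated": result.text,       # the translated text
    }


class FakeTranslator:
    """Offline stand-in that only mimics the shape of a translation result."""

    def translate(self, text, dest):
        return SimpleNamespace(src="en", dest=dest, origin=text, text=f"[{dest}] {text}")


summary = summarize_translation(FakeTranslator(), "Hello", "es")
print(summary)
```

With the real library, the same function would be called as summarize_translation(Translator(), "Hello", "es").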
text_to_audio
The text_to_audio function converts text to audio using gTTS and pygame. It takes two parameters: text and language. The language value is the same as the dest_language input, since we want the audio to be in the language the text was translated into. The function creates the audio with gTTS and saves it as an MP3 file. Then we initialize pygame.mixer to handle audio playback, load the MP3, and play it. A loop waits until the audio has finished playing, after which the audio file is deleted if should_clean_up_file is set to True.
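Because the same language code is passed to both libraries, it can be worth checking that each one supports it before making any network calls. The following is a minimal sketch: resolve_language is a hypothetical helper, and it assumes the supported codes are supplied by the caller, for example from googletrans.LANGUAGES and from the dictionary returned by gtts.lang.tts_langs(). Matching is case-insensitive because the two libraries may spell the same code differently.

```python
def resolve_language(code, translate_codes, speech_codes):
    """Return (translate_code, speech_code) for `code`, or raise ValueError.

    `translate_codes` and `speech_codes` are any iterables of language codes
    (iterating a dict yields its keys, so dicts of code -> name work too).
    """
    wanted = code.strip().lower()
    translate_map = {c.lower(): c for c in translate_codes}
    speech_map = {c.lower(): c for c in speech_codes}
    if wanted not in translate_map:
        raise ValueError(f"{code!r} is not supported for translation")
    if wanted not in speech_map:
        raise ValueError(f"{code!r} is not supported for text-to-speech")
    return translate_map[wanted], speech_map[wanted]


# Offline example with small, made-up code tables.
print(resolve_language("ES", {"es": "spanish", "ja": "japanese"}, {"es": "Spanish"}))

# Assumed usage with the real libraries (requires both to be installed):
# from googletrans import LANGUAGES
# from gtts.lang import tts_langs
# translate_code, speech_code = resolve_language("es", LANGUAGES, tts_langs())
```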
Below is the complete code:
from gtts import gTTS
from googletrans import Translator
import pygame
import os


def translate_text(text, dest_language):
    # Translate text into the language given by dest_language (e.g. 'es')
    translator = Translator()
    translation = translator.translate(text, dest=dest_language)
    return translation.text


def text_to_audio(text, language):
    mp3_file = f'{language}_output.mp3'
    should_clean_up_file = True
    try:
        # Generate the speech and save it as an MP3 file
        tts_file = gTTS(text=text, lang=language, slow=False)
        tts_file.save(mp3_file)
        # Load and play the MP3
        pygame.mixer.init()
        pygame.mixer.music.load(mp3_file)
        pygame.mixer.music.play()
        # Wait until playback has finished, polling about 15 times per second
        clock = pygame.time.Clock()
        while pygame.mixer.music.get_busy():
            clock.tick(15)
    finally:
        # Shut down the mixer so the MP3 is no longer in use before deleting it
        pygame.mixer.quit()
        if os.path.exists(mp3_file) and should_clean_up_file:
            os.remove(mp3_file)


def main(english_text, target_language='en'):
    translated_text = translate_text(english_text, target_language)
    print(f"English Text: {english_text}")
    print(f"Translated Text: {translated_text}")
    text_to_audio(translated_text, target_language)


if __name__ == "__main__":
    english_text = "Hello, welcome to the world of text-to-speech conversion using Python."
    target_language = 'es'  # Spanish
    main(english_text, target_language)
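The try/finally cleanup in text_to_audio can also be packaged as a reusable context manager. The sketch below is one possible variant, not the code above: temporary_mp3 is a hypothetical helper that reserves a uniquely named MP3 path in the system temporary directory and deletes it on exit unless keep is True.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_mp3(prefix="tts_", keep=False):
    """Yield a path to a fresh .mp3 file and remove it afterwards unless keep=True."""
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".mp3")
    os.close(fd)  # only the path is needed; the caller writes the audio itself
    try:
        yield path
    finally:
        # Runs even if saving or playback raised an exception
        if not keep and os.path.exists(path):
            os.remove(path)


# Offline example: the file exists inside the block and is gone afterwards.
with temporary_mp3(prefix="es_") as mp3_path:
    print("inside:", os.path.exists(mp3_path))
print("after:", os.path.exists(mp3_path))
```

Inside the with block, the gTTS object would be saved to the yielded path and played back; playback should finish and the mixer should release the file before the block exits, otherwise the deletion may fail on some platforms.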
Input1 – English to Spanish:
english_text = "Hello, welcome to the world of text-to-speech conversion using Python."
target_language = 'es' # Spanish
main(english_text, target_language)
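Output: the script prints the English text and its Spanish translation, then plays the Spanish audio.
This creates es_output.mp3 in your current folder, which pygame plays back. Because should_clean_up_file is True, the file is deleted once playback finishes.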
Input2 – English to Japanese:
english_text = "Hello, welcome to the world of text-to-speech conversion using Python."
target_language = 'ja' # Japanese
main(english_text, target_language)
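Output: the script prints the English text and its Japanese translation, then plays the Japanese audio.
This creates ja_output.mp3 in your current folder, which pygame plays back. Because should_clean_up_file is True, the file is deleted once playback finishes.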
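The two runs above differ only in target_language, so the text and the language code are natural candidates for command-line arguments. A minimal sketch with argparse follows; the argument names text and --lang, and the default of 'es', are illustrative choices and not part of the original script.

```python
import argparse


def build_parser():
    """Build a parser for the text to translate and the target language code."""
    parser = argparse.ArgumentParser(
        description="Translate English text and speak it aloud."
    )
    parser.add_argument("text", help="English text to translate")
    parser.add_argument(
        "--lang", default="es", help="target language code, e.g. 'es' or 'ja'"
    )
    return parser


# Example: parse an explicit argument list instead of the real command line.
args = build_parser().parse_args(["Good morning", "--lang", "ja"])
print(args.text, args.lang)
# In the full script, these values would be passed on as main(args.text, args.lang).
```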
Applications and Use Cases
- Accessibility: This can be integrated into a tourism app or website to help people travel with confidence in a foreign country where they don’t speak the local language
- Language Learning: Someone learning a new language can use this tool for self-study. We input the text we want translated and get the translated text along with audio, which also helps with pronunciation
- Content Consumption: For people who want to multitask, say by listening to an audiobook while driving, this tool is handy because it can read the content aloud at a normal or slower pace (via the slow option of gTTS)
- Multilingual Communication: In today’s world, where multinational deals are common, being able to articulate your thoughts and business proposals in another language is a powerful asset that can make or break deals
Conclusion
Many fields can benefit from an application like this. It’s simple to build, but its benefits are broad. By developing this tool, we have not only addressed a real-world problem that many people face but have also learnt how to use Python to make API calls, initialize objects, invoke methods, define functions, and use try/finally to clean up files after use. Once you have mastered these and want a challenge, you can try building an interactive GUI and hosting it on a web server to make it more user-friendly, and adding features such as options to change pronunciation, pace, etc. The possibilities are endless, and we hope you keep pushing the boundaries of how we can use technology and coding to advance humankind.