Speech Recognition is the ability of a machine or program to identify words and phrases in spoken language and convert them to textual information.
You have probably seen it in sci-fi films and in personal assistants such as Siri, Cortana, and Google Assistant, which interact with you through voice.
To understand your voice, these virtual assistants need to perform speech recognition. Speech recognition is a complex process, so I'm not going to teach you how to train a machine learning or deep learning model to do it. Instead, I will show you how to do it using the Google Speech Recognition API. As long as you know the basics of Python, you can complete this tutorial and build your own fully functioning speech recognition programs.
Installation
pip install PyAudio
pip install SpeechRecognition
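Note that the pip package names differ from the names you import: SpeechRecognition is imported as speech_recognition and PyAudio as pyaudio. If you want to confirm that both are visible to your interpreter, something like the sketch below may help. It uses only the standard library; the helper name check_installed is made up for illustration.

```python
import importlib.util

# pip package name -> module name used in `import` statements
REQUIRED = {
    "SpeechRecognition": "speech_recognition",
    "PyAudio": "pyaudio",
}


def check_installed(packages):
    """Map each pip package name to True if its module can be found, else False."""
    return {
        pip_name: importlib.util.find_spec(module_name) is not None
        for pip_name, module_name in packages.items()
    }


if __name__ == "__main__":
    for pip_name, found in check_installed(REQUIRED).items():
        status = "installed" if found else "missing (pip install {})".format(pip_name)
        print("{:<18} {}".format(pip_name, status))
```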
Below are some of the supported Engines
Steps involved
Below is a sample app.py; it is pretty straightforward.
app.py
import speech_recognition as sr

recognizer = sr.Recognizer()

# Record the sound from the default microphone
with sr.Microphone() as source:
    print("Adjusting for ambient noise")
    recognizer.adjust_for_ambient_noise(source, duration=1)
    print("Recording for up to 4 seconds")
    # timeout: how long to wait for speech to start
    # phrase_time_limit: maximum length of the recorded phrase
    recorded_audio = recognizer.listen(source, timeout=4, phrase_time_limit=4)
    print("Done recording")

# Recognize the audio
try:
    print("Recognizing the text")
    text = recognizer.recognize_google(
        recorded_audio,
        language="en-US"
    )
    print("Decoded Text : {}".format(text))
except Exception as ex:
    print(ex)
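Catching a bare Exception hides why recognition failed. The library raises sr.UnknownValueError when the speech is unintelligible and sr.RequestError when the recognition service cannot be reached, so one option is to handle the two cases separately. The following is only a sketch and not part of the original app.py; the function name transcribe and its messages are made up for illustration.

```python
def transcribe(recognizer, audio, language="en-US"):
    """Return the recognized text, or None if recognition failed."""
    # Imported inside the function so this helper can be defined
    # even before the library is installed.
    import speech_recognition as sr

    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The service answered but could not make out any words
        print("Could not understand the audio")
    except sr.RequestError as ex:
        # No network connection, or the service rejected the request
        print("Could not reach the recognition service: {}".format(ex))
    return None
```

With the recognizer and recorded_audio from app.py, transcribe(recognizer, recorded_audio) would return the text or None. Separately, recognizer.listen raises sr.WaitTimeoutError when a timeout is given and no speech starts in time, so that call may deserve its own try block.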
app_audio.py
import speech_recognition as sr

recognizer = sr.Recognizer()

# Load the sound from an audio file
with sr.AudioFile("./sample_audio/speech.wav") as source:
    # record() reads the whole file; listen() would stop after the first phrase
    recorded_audio = recognizer.record(source)
    print("Done recording")

# Recognize the audio
try:
    print("Recognizing the text")
    text = recognizer.recognize_google(
        recorded_audio,
        language="en-US"
    )
    print("Decoded Text : {}".format(text))
except Exception as ex:
    print(ex)
Output
kalebu@kalebu-PC:~$ python3 app_audio.py
Done recording
Recognizing the text
Decoded Text: python programming is the best of all by Jordan
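sr.AudioFile reads WAV (PCM), AIFF, and FLAC files. If a file refuses to load or the transcript looks wrong, it can help to check the file's basic properties first. The sketch below uses only the standard-library wave module; describe_wav is a hypothetical helper, not part of SpeechRecognition.

```python
import wave


def describe_wav(path):
    """Return channel count, sample rate, sample width and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        frame_count = wav_file.getnframes()
        sample_rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate_hz": sample_rate,
            "sample_width_bytes": wav_file.getsampwidth(),
            "duration_seconds": frame_count / float(sample_rate),
        }
```

For the file used above, you would call describe_wav("./sample_audio/speech.wav"). wave.open raises wave.Error for files that are not PCM WAV, which is itself a useful hint that the file needs converting.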
$ pip install pydub
long_audio.py
from pydub import AudioSegment
from pydub.silence import split_on_silence
import speech_recognition as sr

recognizer = sr.Recognizer()


def load_chunks(filename):
    # Decoding MP3 with pydub requires ffmpeg (or libav) to be installed
    long_audio = AudioSegment.from_mp3(filename)
    # Split wherever there is at least 1800 ms quieter than -17 dBFS
    audio_chunks = split_on_silence(
        long_audio, min_silence_len=1800,
        silence_thresh=-17
    )
    return audio_chunks


for audio_chunk in load_chunks('./sample_audio/long_audio.mp3'):
    # Export each chunk to a temporary WAV file that AudioFile can read
    audio_chunk.export("temp", format="wav")
    with sr.AudioFile("temp") as source:
        # record() reads the whole chunk
        audio = recognizer.record(source)
    try:
        text = recognizer.recognize_google(audio)
        print("Chunk : {}".format(text))
    except Exception as ex:
        print("Error occurred")
        print(ex)
print("++++++")
Output
$ python long_audio.py
Chunk : by the time you finish reading this tutorial you have already covered several techniques and natural then
Chunk : learn more
Chunk : forgetting to subscribe to be updated on upcoming tutorials
++++++
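split_on_silence only cuts where it finds at least min_silence_len milliseconds quieter than silence_thresh (in dBFS), so a recording without long pauses can come back as one large chunk. One possible fallback, which is not what the code above does, is to cut the audio into fixed-length windows. A pydub AudioSegment can be sliced by milliseconds, so all that is needed is a list of boundaries. The helper name fixed_windows and the 30-second default are assumptions for illustration.

```python
def fixed_windows(total_ms, window_ms=30000):
    """Return (start, end) millisecond pairs that cover 0..total_ms in order."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return [
        (start, min(start + window_ms, total_ms))
        for start in range(0, total_ms, window_ms)
    ]


# A 70-second recording cut into 30-second windows
print(fixed_windows(70000))
# → [(0, 30000), (30000, 60000), (60000, 70000)]
```

With pydub, each pair could then be used as long_audio[start:end], where len(long_audio) gives the total length in milliseconds, and exported exactly like the chunks above. Keep in mind that a fixed cut can split a word in half, so silence-based splitting is preferable when it works.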