Today, we have prepared a tutorial that will help you get to know AI voice technologies better! Let's dive in!
It is one of the most exciting times for software development, with various "generative AI" tools emerging on the market. Name it: cover letter generation? Check! E-mail generation? Check! Automatic code comment generation? Check! Even outside coding and software development, the possible use cases are enormous.
It is only natural that the software we build will incorporate voice as one of its features. That is why, in this tutorial, we will showcase the "Speech Synthesis" feature offered by ElevenLabs in a simple app that generates random words and has an AI voice spell them. To build the UI for this Python-based app, we will use Streamlit, a UI library for building and sharing data science projects.
ElevenLabs is a voice technology research company that offers a speech synthesis solution. Its easy-to-use API lets developers generate high-quality speech using AI. This is made possible by an AI model trained on a vast amount of audiobooks and podcasts. That training allows the model to deliver predictable, high-quality results in speech generation.
ElevenLabs offers two main features. The first is VoiceLab, where users can clone voices from recorded audio, start from existing pre-made voices, or "design" new voices based on gender, age, and ethnicity. Once users have voices to work with, they can move on to the second feature, Speech Synthesis, where they can generate speech using their designed voices or the pre-made ones.
Claude is the latest AI model developed by Anthropic, an AI research company that focuses on improving the interpretability, robustness, and safety of artificial intelligence systems.
The Claude model is designed to generate human-like responses. That makes it a powerful tool for a wide range of applications, from content creation and legal work to customer service. Like other AI models on the market, Claude is trained on a diverse range of internet text. Unlike most AI models, however, it has a strong focus on "safety", which means it can refuse to produce outputs it considers "harmful" or "untruthful" for its users.
Streamlit is an open-source Python library that makes it easy for developers and data scientists to create and share visually appealing, customized web apps. Developers can use Streamlit to build and deploy fully featured data science apps in minutes. Its simple, intuitive API turns data scripts into UI components.
We'll build the app step by step. After setting up the project, we'll add features, beginning with prompt engineering to get the Claude model to give us a random word that is commonly misspelled. After that, we'll add the text-to-speech generation provided by ElevenLabs to demonstrate how the multilingual model spells the words. Finally, we'll test the simple app. First, let's create a directory for the project:
```shell
mkdir randomwords
cd randomwords
```
Next, we're going to use this directory as the basis of our Streamlit project. Because a Streamlit project is essentially a Python project, we need to take a few steps to initialize it, such as creating and activating a virtual environment.
```shell
# Creating the virtual environment
python -m venv env

# Activate the virtual environment
# On Linux/Mac
source env/bin/activate
# On Windows:
.\env\Scripts\activate
```
Once the environment is activated, our terminal prompt should show its name, (env), at the start of the line.
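If the prompt prefix is hard to spot, you can ask Python itself. The sketch below relies on the standard behavior of the `venv` module. When Python runs from a virtual environment, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base installation. The helper name `in_virtualenv` is our own, not part of any library.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when this interpreter runs from a venv-created environment."""
    # venv sets sys.prefix to the environment directory, while
    # sys.base_prefix keeps pointing at the original installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

Save it as a throwaway script and run it with `python` while `(env)` is active; it should report `True`.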
Next, it's time to install the libraries we need for this project! Let's use the `pip` command to install the `streamlit`, `anthropic`, and `elevenlabs` libraries. Note that we also install a version-locked `pydantic` library to prevent a Pydantic-related error in one of the `elevenlabs` functions.
```shell
pip install streamlit anthropic elevenlabs "pydantic==1.*"
```
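To confirm that pip honored the `pydantic` pin, a short script using the standard library's `importlib.metadata` can print the major version of each package. This is a sketch: `installed_major_version` is a hypothetical helper, and it assumes each version string starts with a numeric major component.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_major_version(package: str) -> Optional[int]:
    """Return the major version of an installed distribution, or None if absent."""
    try:
        full_version = version(package)  # e.g. "1.10.13"
    except PackageNotFoundError:
        return None
    # The major version is the leading dot-separated component.
    return int(full_version.split(".")[0])


if __name__ == "__main__":
    for pkg in ("streamlit", "anthropic", "elevenlabs", "pydantic"):
        print(pkg, "->", installed_major_version(pkg))
    # This tutorial pins pydantic to 1.x for the elevenlabs client it uses.
    if installed_major_version("pydantic") != 1:
        print("Warning: pydantic 1.x was expected by this setup")
```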
With all the project's requirements out of the way, let's dive into the coding part! Create a new file called `randomwords_app.py` inside our project directory.
```shell
touch randomwords_app.py
```

Then add a title and a line of text to the file:

```python
import streamlit as st

st.title("Random Words Generator")
st.text("Hello, this is a random words generator app")
```
To wrap up the project initialization, let's test-run the app. Make sure our current working directory is still the project directory and that the virtual environment is activated. When everything is ready, run the app with the `streamlit run <app-name>` command.
```shell
streamlit run randomwords_app.py
```
The app should open automatically in our default browser! For now, it only shows the title and text. Next, we're going to add the random word generation feature using Anthropic's Claude model.
One last thing though, we'll have to provide our app with the API keys for the services that we're going to use, namely Anthropic's Claude model and ElevenLabs' Speech Synthesis feature. As these keys are considered sensitive, we should keep them in a safe and isolated place.
However, this time we won't store them in a `.env` file, because Streamlit handles secrets differently: it reads them from a `secrets.toml` file inside a `.streamlit` directory. We can create the directory inside our project and then create the file.
```shell
mkdir .streamlit
touch .streamlit/secrets.toml
```
Let's edit the TOML file we created. Note that TOML uses a different format from the usual `.env` file; for example, string values must be wrapped in quotes:
```toml
xi_api_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
claude_key = "sk-ant-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```
First of all, let's add the import statements. We import the `anthropic` library to generate our random words. Then we define a `generate_word()` function that asks Claude for a word and its meaning.
```python
import streamlit as st
import anthropic


def generate_word():
    # One-shot prompt: an instruction, a sample answer in the required format,
    # and a reminder to stick to that format.
    prompt = (f"{anthropic.HUMAN_PROMPT} Give me one non-English word that's commonly misspelled and the meaning. Please strictly follow the format! example: Word: Schadenfreude; Meaning: joy at other's expenses."
              f"{anthropic.AI_PROMPT} Word: Karaoke; Meaning: a form of entertainment where people sing popular songs over pre-recorded backing tracks."
              f"{anthropic.HUMAN_PROMPT} Great! just like that. Remember, only respond following the pattern.")

    # The API key is read from .streamlit/secrets.toml
    c = anthropic.Anthropic(api_key=st.secrets["claude_key"])
    resp = c.completions.create(
        prompt=f"{prompt} {anthropic.AI_PROMPT}",
        stop_sequences=[anthropic.HUMAN_PROMPT],  # stop if the model starts a new Human turn
        model="claude-v1.3-100k",
        max_tokens_to_sample=900,
    )
    print(resp.completion)  # log the raw reply to the terminal
    return resp.completion
```
In this function, most of the heavy lifting is done by **Anthropic's Claude model** (Thanks, Claude! 😉). Our part is to make Claude return the exact format consistently. So we both instruct Claude to "strictly follow the format" and give it an example response right after our initial prompt.
Finally, we make sure Claude sticks to the agreed format by reminding it: "Remember, only respond following the pattern." The function ends by returning Claude's response.
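To see what Claude actually receives, the sketch below rebuilds essentially the same one-shot prompt as a plain string. It assumes the values the anthropic SDK uses for its legacy Text Completions constants, `HUMAN_PROMPT = "\n\nHuman:"` and `AI_PROMPT = "\n\nAssistant:"`. It defines them locally so it runs without the SDK installed. `build_prompt` is a hypothetical helper, not part of the SDK.

```python
# Local copies of the legacy turn markers; assumed to match
# anthropic.HUMAN_PROMPT and anthropic.AI_PROMPT from the SDK.
HUMAN_PROMPT = "\n\nHuman:"
AI_PROMPT = "\n\nAssistant:"


def build_prompt(instruction: str, example_answer: str, reminder: str) -> str:
    """Assemble a one-shot prompt: instruction, sample answer, reminder, open turn."""
    return (
        f"{HUMAN_PROMPT} {instruction}"
        f"{AI_PROMPT} {example_answer}"  # the sample answer shows the exact format
        f"{HUMAN_PROMPT} {reminder}"
        f"{AI_PROMPT}"  # left open so the model writes the next answer
    )


if __name__ == "__main__":
    prompt = build_prompt(
        "Give me one non-English word that's commonly misspelled and the meaning. "
        "Please strictly follow the format! example: Word: Schadenfreude; Meaning: joy at other's expenses.",
        "Word: Karaoke; Meaning: a form of entertainment where people sing popular songs "
        "over pre-recorded backing tracks.",
        "Great! just like that. Remember, only respond following the pattern.",
    )
    print(repr(prompt))
```

Because the prompt ends with an open Assistant turn, the model continues from there. The `stop_sequences=[anthropic.HUMAN_PROMPT]` argument in the original call cuts generation off if the model starts writing a new Human turn.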
Next, let's get back to editing the UI! Replace the earlier `st.title` and `st.text` lines with the following:

```python
st.title("Random Words Generator")

with st.container():
    st.header("Random Word")
    random_word = st.subheader("-")
    word_meaning = st.text("Meaning: -")

st.write("Click the `Generate` button to generate new word")

if st.button("Generate"):
    result = generate_word()
    # Split the string on the semicolon
    split_string = result.split(";")
    # Split the first part on ": " to get the word
    word = split_string[0].split(": ")[1]
    # Split the second part on ": " to get the meaning
    meaning = split_string[1].split(": ")[1]
    print(f"word result: {word}")
    random_word.subheader(word)
    word_meaning.text(f"Meaning: {meaning}")
```
This time, we added a container with some elements inside it: a header, a subheader for displaying the random word, and a text element for the word's meaning. We also have a hint text explaining how to use the app, with a button beneath it.
In Streamlit, we can declare a click event handler with a conditional statement, because `st.button()` returns `True` on the script rerun triggered by a click. In this code, we invoke the `generate_word()` function, which returns the generated word and its meaning. We then split the result into separate variables for the word and the meaning. Finally, we update the subheader and the text element to display them.
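The split-based parsing raises an `IndexError` whenever Claude's reply drifts from the format, for example when the `;` is missing. A more forgiving option, sketched below, uses a regular expression to search for the `Word: ...; Meaning: ...` pattern anywhere in the reply and returns `None` when it is absent. `parse_word_and_meaning` and `PATTERN` are hypothetical names. The sketch also assumes the meaning fits on one line.

```python
import re
from typing import Optional, Tuple

# "Word: <word>; Meaning: <meaning>", tolerating extra spaces and surrounding text;
# the meaning runs to the end of its line.
PATTERN = re.compile(r"Word:\s*(?P<word>[^;]+?)\s*;\s*Meaning:\s*(?P<meaning>[^\n]+)")


def parse_word_and_meaning(text: str) -> Optional[Tuple[str, str]]:
    """Return (word, meaning) from Claude's reply, or None if the format is off."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group("word").strip(), match.group("meaning").strip()
```

In the button handler, you could check `if parsed is None:` and call `st.warning(...)` instead of letting the error surface. Unlike the `split(": ")` approach, this also keeps meanings that themselves contain a colon or semicolon intact.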
Let's double-check our code once again! It should contain the import statements, the function for generating the random word, and the updated UI. The UI has the subheader and text elements, plus a button that generates the word by invoking the `generate_word()` function.
```python
import streamlit as st
import anthropic


def generate_word():
    prompt = (f"{anthropic.HUMAN_PROMPT} Give me one non-English word that's commonly misspelled and the meaning. Please strictly follow the format! example: Word: Schadenfreude; Meaning: joy at other's expenses."
              f"{anthropic.AI_PROMPT} Word: Karaoke; Meaning: a form of entertainment where people sing popular songs over pre-recorded backing tracks."
              f"{anthropic.HUMAN_PROMPT} Great! just like that. Remember, only respond following the pattern.")
    c = anthropic.Anthropic(api_key=st.secrets["claude_key"])
    resp = c.completions.create(
        prompt=f"{prompt} {anthropic.AI_PROMPT}",
        stop_sequences=[anthropic.HUMAN_PROMPT],
        model="claude-v1.3-100k",
        max_tokens_to_sample=900,
    )
    print(resp.completion)
    return resp.completion


st.title("Random Words Generator")

with st.container():
    st.header("Random Word")
    random_word = st.subheader("-")
    word_meaning = st.text("Meaning: -")

st.write("Click the `Generate` button to generate new word")

if st.button("Generate"):
    result = generate_word()
    # Split the string on the semicolon
    split_string = result.split(";")
    # Split the first part on ": " to get the word
    word = split_string[0].split(": ")[1]
    # Split the second part on ": " to get the meaning
    meaning = split_string[1].split(": ")[1]
    print(f"word result: {word}")
    random_word.subheader(word)
    word_meaning.text(f"Meaning: {meaning}")
```
Let's run the app once again with the same command. If the app is already running, we can also just click "Rerun" in the menu at the upper right.
It should now show the updated user interface.
Now, let's try clicking the Generate button!
One of the sweet things about Streamlit is that it handles loading out of the box and provides a loading indicator. We should see the indicator in the upper-right corner, along with the option to "stop" the operation. Neat, huh?
After a few seconds, the result should be shown in the UI.
Perfect! Notice that the app correctly splits the text generated by the Claude model into the word and its meaning. If the result doesn't follow the expected format, we can always click the Generate button again.
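Instead of clicking Generate by hand, the app could retry automatically when a reply doesn't parse. The sketch below is a generic, hypothetical helper. `generate` would be `generate_word`, and `parse` would be any function that returns `None` for a badly formatted reply. Keep `attempts` small, since each attempt is a separate, billed API call.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until_parsed(
    generate: Callable[[], str],
    parse: Callable[[str], Optional[T]],
    attempts: int = 3,
) -> Optional[T]:
    """Call `generate` up to `attempts` times and return the first successful parse."""
    for _ in range(attempts):
        parsed = parse(generate())
        if parsed is not None:
            return parsed
    return None  # every attempt came back in the wrong format
```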
As you've probably guessed, the first step of this section is to add more import statements! Let's import the functions from `elevenlabs` that we'll use for the speech generation feature, and then add a `generate_speech()` function:
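```python
import streamlit as st
import anthropic
from elevenlabs import generate, set_api_key  # new import


def generate_speech(word):
    # Authenticate with the ElevenLabs key from .streamlit/secrets.toml
    set_api_key(st.secrets['xi_api_key'])
    audio = generate(
        text=word,
        voice="Bella",  # one of ElevenLabs' pre-made voices
        model='eleven_multilingual_v1'  # multilingual model for non-English words
    )
    return audio  # the generated audio, played later with st.audio
```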
Thanks to the simplicity and readability of Python and ElevenLabs' easy-to-use API, this code alone is enough to generate the speech! The function accepts the random word, which becomes the text of the speech. We specifically use the "eleven_multilingual_v1" model, a multilingual model that suits our goal of demonstrating the spelling and pronunciation of foreign, commonly misspelled words. Finally, we use the "Bella" voice for this tutorial, one of the pre-made voices provided by ElevenLabs.
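As written, `generate_speech` sends only the bare word, so the voice pronounces it once. To also have it spelled out letter by letter, one option is to build a longer text before calling the function. The helper below is hypothetical. Whether the model reads comma-separated capital letters individually is an assumption worth checking by ear.

```python
def spelling_text(word: str) -> str:
    """Build TTS input that says the word, then spells it letter by letter."""
    # Keep only letters, so spaces, hyphens and apostrophes are not read aloud.
    letters = [ch.upper() for ch in word if ch.isalpha()]
    return f"{word}. " + ", ".join(letters) + "."
```

Used as `generate_speech(spelling_text(word))`, this would pass `"Karaoke. K, A, R, A, O, K, E."` to the model for the word "Karaoke".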
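```python
    # ...inside the if st.button("Generate"): block
    print(f"word result: {word}")
    random_word.subheader(word)
    word_meaning.text(f"Meaning: {meaning}")
    # New: generate the speech and play it in an audio player
    speech = generate_speech(word)
    st.audio(speech, format='audio/mpeg')
```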
Just below our latest code from earlier, we add a variable to store the generated speech. We then play it with the audio player provided by Streamlit's `st.audio` function. At this point, the app should work as expected, showing the audio player only when there is a random word available to "read".
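The `st.audio(speech, format='audio/mpeg')` call assumes the ElevenLabs client returns MP3 bytes. That is our understanding of its default output, not something this tutorial verifies. If you change the output format, checking the leading bytes can catch a mismatch. The sketch below recognizes MP3 (an `ID3` tag, or an MPEG audio frame-sync header with a valid layer) and WAV (`RIFF`...`WAVE`). `guess_audio_mime` is a hypothetical helper.

```python
from typing import Optional


def guess_audio_mime(data: bytes) -> Optional[str]:
    """Guess a MIME type for st.audio from the first bytes of an audio payload."""
    if data[:3] == b"ID3":
        return "audio/mpeg"  # MP3 that starts with an ID3v2 tag
    if (
        len(data) >= 2
        and data[0] == 0xFF
        and (data[1] & 0xE0) == 0xE0  # 11 frame-sync bits set
        and (data[1] & 0x06) != 0  # layer bits 00 are reserved (used by AAC ADTS)
    ):
        return "audio/mpeg"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "audio/wav"
    return None  # unknown format: inspect it rather than guess
```

In the app, this could be used as `st.audio(speech, format=guess_audio_mime(speech) or "audio/mpeg")`.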
Let's run the app again using `streamlit run`, or just rerun it if it's already running. It should look exactly the same as when we last left it. This time, though, let's click the "Generate" button!
In this short tutorial, we've explored the speech generation capabilities offered by ElevenLabs. With the multilingual model, it's easy to generate speech intended for non-English listeners. In our use case, we needed the multilingual model to help us find the correct way to pronounce and spell non-English words that are commonly misspelled.
And big thanks to Septian Adi Nugraha, the author of this article. 💚