One of our missions at HackerNoon is to make our platform available to all readers and writers across the globe. That means optimizing for niche use cases and making our platform as accessible as possible. As such, we're excited to launch a brand new AI-powered feature that will make our stories more accessible to the visually impaired, and make it easier to enjoy our content while you're on the go. We've added an audio player that uses a synthetic voice to read aloud a full transcription of the story. Now, you can listen to HackerNoon stories while you're commuting, exercising, folding laundry, or whatever else life has you doing.
What Does the Synthetic Audio Player Mean for Writers?
While the main goal was to improve reader accessibility, this feature should do wonders for our writers too. First, the audio player gives your readers the option to listen to your story instead of reading it, turning an active reading experience into an active or passive listening one.
Sometimes readers get fatigued and stop reading your story halfway through, or even sooner. Listening, on the other hand, demands less attention and could increase a reader's time on site.
Basically, a longer time on site = a more valuable story in the eyes of search engines. This audio player should keep readers on your stories longer, improving time on site and, in turn, your rankings on Google and other search engines.
Learn more about the feature from the lead developer himself, Marcos Fabian, below...
This thread by Limarc Ambalina and Marcos Fabian occurred in Slogging's official #software-development channel, and has been edited for readability.
Maybe some of you have noticed and maybe some of you haven't, but there is a little green icon on some HackerNoon stories. Ever wonder what that icon means?
Well, if you haven't noticed by now, HackerNoon has added full synthetic audio transcriptions to our stories!!!! Every story published from now on can be read aloud to you by one of our synthetic voices.
Today I'm joined by the sole developer on this project, Marcos Fabian, to talk a bit about how he made it.
Marcos, tell us a bit about this awesome feature! How did you build it? How many voices are there? How does it work?
Hey Limarc Ambalina, this feature is awesome. I personally use it a lot and I am glad it is now live. It simply adds audio readability to our stories. It comes with 4 different voices: 2 in US English (one male, one female), 1 with a British (GB) accent (female), and 1 with an Indian (IN) accent (male). It is really amazing how natural these voices sound.
If an article has the audio icon, it means audio has already been produced for that article. You simply click on the play button and it will read the story to you. Users can switch to the voice of their choice. As you scroll down an article, there is also a play button on the nav bar, which lets you pause and play the article as you go.
Thanks for the intro Marcos Fabian. Having personally worked with you on this project, I know the ride we went through to get where we are now 🤣. Without naming names, we looked at companies that provide this software on a SaaS basis, and we looked into a few of these providers quite thoroughly.
Can you tell people a bit about why we felt it'd be best to build this in-house instead of paying for a pre-built solution on the market, and what APIs we used to make this happen?
Yessss, it was a long road but we got here. To be honest, when I took this project, I always felt like there was a gap in our articles. Just like myself, many people are active listeners and not so much active readers, so I knew I wasn't the only one. Working on this was sort of personal, as I understood the importance of this feature for our users and for myself.
HackerNoon is unique: its colors, its people, its users. We are not like everyone else, and so we explored multiple options when it came to developing this feature. We initiated conversations with a few companies that we believed would build it almost 0% their way and almost 100% HackerNoon's way 😬. Unfortunately, it didn't work out, as we are really picky ✌ I mean we were looking for more customization and for things to look a bit more like the HackerNoon style. So partnerships did not go as expected, and I sort of took over this project and found Google's Text-to-Speech API, which was cheaper and provides a good way of converting text into audio. So I used this API to bring audio to our articles.
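For readers curious what that API call looks like, here is a minimal sketch using Google's official Node.js client. The voice name, output path, and function name are illustrative assumptions, not HackerNoon's actual configuration.

```typescript
// Minimal sketch: synthesize a story's text with Google Cloud Text-to-Speech.
// The voice choice and output path below are placeholders.
import * as textToSpeech from '@google-cloud/text-to-speech';
import { writeFile } from 'fs/promises';

async function synthesizeStory(text: string, outputPath: string): Promise<void> {
  const client = new textToSpeech.TextToSpeechClient();

  const [response] = await client.synthesizeSpeech({
    input: { text },
    // One of several available voices; the exact voices used in production may differ.
    voice: { languageCode: 'en-US', name: 'en-US-Wavenet-D' },
    audioConfig: { audioEncoding: 'MP3' },
  });

  // audioContent holds the MP3 bytes, which can then be stored and served with the story.
  await writeFile(outputPath, response.audioContent as Uint8Array);
}
```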
The main challenge was literally converting the actual stories into readable text. When we go to one of our stories, we see the story, but in reality all there is is a bunch of DOM elements. Things got trickier because we have two editors, so I had to convert one format into the other, then extract the text, pass it to the API, and finally get the audio file back. Saving and hosting the audio was a challenge too, but the actual conversion was really intense in terms of work.
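The extraction step could look something like the sketch below: walking the article markup and keeping only the elements that should be spoken. This is just an illustration with assumed selectors and jsdom as an example parser, not HackerNoon's actual pipeline.

```typescript
// Illustrative sketch: reduce an article's HTML to plain, speakable text
// before handing it to the text-to-speech API.
import { JSDOM } from 'jsdom';

function extractReadableText(articleHtml: string): string {
  const { document } = new JSDOM(articleHtml).window;

  // Keep elements that carry spoken content; skip code blocks, figures, and embeds.
  const readable = document.querySelectorAll('h1, h2, h3, p, li, blockquote');

  return Array.from(readable)
    .map((el) => el.textContent?.trim() ?? '')
    .filter((text) => text.length > 0)
    .join('\n');
}
```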
The coolest part was designing our own audio player, which is built with plain CSS, no extra library. The audio player is not 100% there yet, as I still want to fix a few styling issues, but overall it's looking good and doing the job.
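As a rough illustration of the "no extra library" approach, the player logic can be little more than the browser's native Audio API wired to a styled button. The element id and audio URL here are hypothetical placeholders.

```typescript
// Rough sketch of a dependency-free play/pause control; the visuals live in plain CSS.
const audio = new Audio('/audio/story-en-us.mp3');
const playButton = document.getElementById('hn-audio-play') as HTMLButtonElement;

playButton.addEventListener('click', () => {
  if (audio.paused) {
    void audio.play();
    playButton.classList.add('playing'); // CSS swaps the play icon for a pause icon
  } else {
    audio.pause();
    playButton.classList.remove('playing');
  }
});
```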
And I'm loving your design! Especially the use of HackerNoon's robot characters.
So would you say, Marcos Fabian, that if you're not looking for customization and just want audio on your site ASAP, a 3rd-party provider could be a good option, but if you want customization for the player and its features, it's probably faster and cheaper to build it yourself?
And in your research, was the Google Text-to-Speech API the cheapest option you could find?
Yes, there is a little more to it. If we were to have this feature from a partner:
- we would not be 100% owners of it
- we would have to accept their designs, layout, colors, conditions, etc.
- it is just not the way we do stuff at HN
We love our partners, but when it comes to dropping a feature, we want to be as unique as possible so that the feature can be best used by our users. It shows dedication and that someone cares. Us developers also read and navigate the web looking for information, so we understand what a good experience is, and I wanted to provide that in a personalized way. 3rd parties are good, depending on your interests. We are grateful to have a great team that guides us to build features like this, including you Limarc Ambalina. 👍
The Google Text-to-Speech API offers great synthetic voices, with different accents, genders, and languages. It was cheaper compared to our previous partner. I also love the challenge that comes with building this feature from scratch 😁 without involving a 3rd party.
Awesome Marcos Fabian! Final question: what iterations do you think our readers can expect in the future? What do we have in the works? I'm assuming we'll be working on adding voices with other accents and things like that?
Yes, we definitely want to start by providing audio to every single story that we have; right now we are only providing audio to recent stories. Once that is done, we will personalize and optimize the audio player a little more so that it looks smoother. Lastly, providing more languages is almost a done deal because of the impact this feature has had, but it is still up for discussion. We have different stages that we want to complete before getting there, but certainly, we would love users to be able to listen to stories in multiple accents. Maybe it will be done via requests, or integrated into users' settings; we will find a way to expand this feature to make sure users get the best out of it.