Automating DECtalk

A project log for 80speak - Online DECTalk Speech Synthesis

I've always been a fan of Stephen Hawking's signature speech synthesizer, so I made a publicly available version of it!

Lixie Labs, 03/24/2017 at 23:36

The next step was to automate 80speak.

Before I continue, the Python/Flask API is located on port 5000 of 80speak.com, and the endpoint http://80speak.com:5000/send_text accepts a POST request containing the following JSON:

{"message":"Your text to speak!"}

The response is a link to an MP3 containing your synthesized speech! Feel free to use this to make embedded systems like the Raspberry Pi speak!
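As a rough illustration of calling the endpoint from a script, here is a minimal Python 3 sketch using only the standard library. The helper names build_request and request_speech are made up for this example, the JSON content-type header is an assumption, and since the exact format of the response isn't spelled out here, the body is returned as-is.

```python
import json
import urllib.request

API_URL = "http://80speak.com:5000/send_text"  # endpoint named in this log


def build_request(message, url=API_URL):
    """Build a POST request carrying {"message": ...} as a JSON body."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: the API expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_speech(message, timeout=30):
    """Send the text and return the raw response body as a string.

    Assumption: the body contains the MP3 link; whether it arrives as
    plain text or wrapped in JSON is not specified in this log.
    """
    with urllib.request.urlopen(build_request(message), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example call (needs network access):
# print(request_speech("Your text to speak!"))
```

Splitting the request construction from the network call keeps the payload easy to inspect before anything is sent.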

Cheating Mobile Devices

To make this process happen behind the scenes in a smooth way, I first wanted the user to be able to stay on a single page. This meant using jQuery and some clever HTML5 tricks to return an auto-playing result on the same page you requested it from.

Doing this was pretty simple: make a POST request, wait for the MP3 link to come back, and spawn a hidden auto-playing HTML5 Audio player on the page. However, mobile devices don't allow auto-playing by default; they require the user to interact directly with an Audio object before it can be controlled with code. This is to prevent malicious websites from spawning a hundred auto-playing, silent MP3s off-screen just to eat your data/bandwidth in a small-scale DDoS attack. Or something like that.

I found out that HTML5 audio CAN be auto-played on mobile by JS AFTER the user has already manually started a previous audio object playing. To cheat this system and allow mobile devices to participate in the same way, a split-second silent MP3 is played when the user presses the "SAY IT!" button. That button is designated as the play/pause control for the silent audio. The silent file plays quickly in the background, and the mobile browser now allows us to play the speech audio automatically after it returns! Aha! There's a good Hack of the Day here.

Server Side

The Flask API parses the text POSTed to it by the website, and passes it to SAY.EXE for synthesis. Because SAY.EXE is a Windows binary, it has to be run under WINE. Easy enough.

$ wine say.exe
Application tried to create a window, but no driver could be loaded.
Make sure that your X server is running and that $DISPLAY is set correctly.

Ah, okay. Needs something for a display to run. Xvfb to the rescue! We create a fake display at 1024x768 for WINE to use.

$ Xvfb :0 -screen 0 1024x768x16 &

This was added as an "@reboot" to the crontab to make sure the display runs when we start the machine. Now our WINE command looks like this:

$ DISPLAY=:0.0 wine say.exe -w WAVE_FILE.wav "Our message goes here!"

It works! WAVE_FILE.wav now contains a recording of DECtalk saying "Our message goes here!". Time to automate with Python's os.system() function:

import os
import uuid

from pydub import AudioSegment  # third-party audio library

# try_rm() and shellquote() are helpers defined elsewhere in the API (not shown in this log)

def convert_to_speech(message):
    mid = uuid.uuid4().hex  # unique message ID: a UUID without dashes

    print("----------------------------------------")
    print("SPEECH CONVERSION\n")
    print("MID: " + mid)
    print("MESSAGE: " + message)
    print("Converting to speech...")

    wav_file = "/wav/" + mid + ".wav"
    out_file = "/mp3/" + mid + ".mp3"        # web path, DOES NOT EXIST YET
    mp3_file = "/var/www/html" + out_file    # path on disk, DOES NOT EXIST YET

    try_rm(wav_file)  # Deletes if exists
    try_rm(mp3_file)

    # Quote the message so the shell passes it to SAY.EXE as a single argument
    command = "DISPLAY=:0.0 wine say.exe -w " + wav_file + " " + shellquote(message)

    print(command)
    os.system(command)

    print("Converting to mp3...")
    sound = AudioSegment.from_file(wav_file, format="wav")
    loud = sound + 3  # boost the volume by 3 dB
    loud.export(mp3_file, bitrate='64k', format="mp3")

    print("DONE!")
    print("----------------------------------------")

    return out_file  # path of the MP3 relative to the web root

This function is called by the Flask API's "send_text" endpoint and returns the path to an MP3 version of the speech. The MP3 is loaded into a hidden Audio object on the user's page and plays automatically on both desktop and mobile, thanks to the audio button cheat!
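The function above calls two helpers, try_rm and shellquote, that this log doesn't show. The bodies below are only a guess at what they might look like, based on how they are used: one deletes a file if it exists, the other makes arbitrary user text safe to paste into a shell command.

```python
import errno
import os


def try_rm(path):
    """Delete path if it exists; a missing file is not an error."""
    try:
        os.remove(path)
    except OSError as err:
        # Only ignore "no such file"; re-raise anything else
        # (for example, a permissions problem).
        if err.errno != errno.ENOENT:
            raise


def shellquote(text):
    """Wrap text in single quotes so a POSIX shell sees one literal argument."""
    # Inside single quotes nothing is special except the quote itself,
    # so each ' becomes: close quote, backslash-escaped quote, reopen quote.
    return "'" + text.replace("'", "'\\''") + "'"
```

The quoting step matters because the message comes straight from a public web form and ends up in a command line. Python 3's standard library offers shlex.quote for the same job.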
