Cross-platform speech synthesis

tperry2x
Posts: 3534
Joined: Tue Dec 21, 2021 9:10 pm
Location: webtalk.tsites.co.uk

Cross-platform speech synthesis

Post by tperry2x »

revSpeak mainly works on MacOS and Windows, so on those platforms this stack simply uses it. As far as MacOS and Windows are concerned, it's little more than a demo of what's already included in the IDE.

Where it's useful though, is if you want to do speech synthesis on Linux.
(download link 9.8MB appimage)
(attachment: screenshot.png)
Why bother? I'm just trying to give a comparable set of features across MacOS, Windows and Linux.

So far we have:
Video working on all 3,
Sound working on all 3,
and now Speech synthesis working on all 3.

(* when I mention 'all 3', I mean Linux, Windows and MacOS.)
The point of this is that no one platform is heavily disadvantaged compared with the others.
The only thing I'm really missing now is browser-widget-support for Linux.
OpenXTalkPaul
Posts: 2849
Joined: Sat Sep 11, 2021 4:19 pm

Re: Cross-platform speech synthesis

Post by OpenXTalkPaul »

tperry2x wrote: Sun Nov 24, 2024 10:02 pm Where it's useful though, is if you want to do speech synthesis on Linux.
(download link 9.8MB appimage)

screenshot.png

Why bother? I'm just trying to give a comparable set of features across MacOS, Windows and Linux.
I agree with this so much that two years ago I built an extension that uses the same library that your .appImage contains, eSpeak: https://github.com/OpenXTalk-org/OpenXTalk-eSpeak That should be usable on Linux (and Windows) if the library is included or installed, but I built and tested it on macOS (using Homebrew to build eSpeak for macOS).
I like the idea of using a Linux .appImage running in a separate process for this, mostly because eSpeak also installs some language files that it looks for in the user's home directory. One problem is that the .appImage has to be marked executable before this will work.
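
For anyone scripting around that executable-bit problem outside the IDE, here is a minimal Python sketch of the idea: set the execute bits (the equivalent of `chmod +x`) and then launch the .appImage as a child process. The function names are made up for illustration, and passing the text as a single command-line argument is only an assumption about how the bundled eSpeak .appImage is invoked.

```python
import os
import stat
import subprocess


def make_executable(path: str) -> None:
    """Add the execute bits for user, group and other (like `chmod +x`)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def speak_with_appimage(appimage_path: str, text: str) -> int:
    """Mark the AppImage executable, then run it in a separate process.

    Assumption: the AppImage accepts the text to speak as its only
    argument. Adjust the argument list to match the real command line.
    """
    make_executable(appimage_path)
    completed = subprocess.run([appimage_path, text], check=False)
    return completed.returncode
```

Calling `make_executable` every time is harmless, since it only adds bits that may already be set, so a stack could do this on each launch rather than tracking whether it has been done.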

Also, for Mac I made this extension, which uses the same older Apple speech API that revSpeak uses on Mac, but with one extra capability: it can generate speech to a sound file instead of sending it to an audio output.
https://github.com/PaulMcClernan/OpenXT ... SSpeechLib

There is also a community-built AVSpeech Extension (can't find that online at the moment). AVSpeech is Apple's newer text-to-speech API on both macOS (since 10.14 Mojave) and iOS.

The Emscripten Engine (and HyperSim) can use the HTML5 Web Speech API to do TTS; you can try that out in the OXT WebPlayground.

I would think it would be fairly easy to build an Android (Java-FFI) Extension that does TTS.

The commercial version from LC has a 'unified speech library' that I assume collects various TTS methods into a single extension library. I think we should have 'unified libraries' for things like that as well.

One thing about eSpeak is it sounds like 1970s text-to-speech, like a 'speak-n-spell' toy voice! There must be better sounding options for TTS on Linux, no?

Re: Cross-platform speech synthesis

Post by tperry2x »

OpenXTalkPaul wrote: Tue Dec 10, 2024 11:32 pm One thing about eSpeak is it sounds like 1970s text-to-speech, like a 'speak-n-spell' toy voice! There must be better sounding options for TTS on Linux, no?
There were, but they are now just dead links.
I'd be happy to make an appimage of a better sounding one, if one exists.

Re: Cross-platform speech synthesis

Post by OpenXTalkPaul »

tperry2x wrote: Sat Dec 14, 2024 10:25 am
OpenXTalkPaul wrote: Tue Dec 10, 2024 11:32 pm One thing about eSpeak is it sounds like 1970s text-to-speech, like a 'speak-n-spell' toy voice! There must be better sounding options for TTS on Linux, no?
There was, but they are now just dead links.
I'd be happy to make an appimage of a better sounding one, if one exists.
There may be a TTS voice built into the CEF engine (but I could be wrong).
I believe Java has its own speech synthesis as well (TTS works in OpenXION).
TerryL
Posts: 144
Joined: Sat Oct 16, 2021 5:05 pm

Re: Cross-platform speech synthesis

Post by TerryL »

Paul, I stumbled on this today. Any use for us?
Open Source SAPI5 Text-To-Speech Voices: https://zero2000.com/free-text-to-speec ... oices.html

Re: Cross-platform speech synthesis

Post by TerryL »

I also noticed revSpeechVoices() is only returning two of four voices in Win11. Is that something in the source code that can't be fixed?

Re: Cross-platform speech synthesis

Post by OpenXTalkPaul »

TerryL wrote: Fri May 02, 2025 8:32 pm I also noticed revSpeechVoices() is only returning two of four voices in win11. Would that be in the source code and can't be fixed?
Oh, thanks for letting us know, I've yet to run tests on Win11. Maybe there's some workaround? Does passing the names that revSpeechVoices doesn't list work, so that the voices are only missing from the returned list? Is it only listing voices of a certain gender? I'm sure this could be fixed. revSpeech is an external (.dll) on Win, so it should be that only that external needs to be patched and recompiled. Alternatively, perhaps an Extension for Win could replace it. On macOS there are TWO Extension alternatives to using revSpeech.

Re: Cross-platform speech synthesis

Post by OpenXTalkPaul »

TerryL wrote: Thu May 01, 2025 5:32 pm Paul, I stumbled on this today. Any use for us?
Open Source SAPI5 Text-To-Speech Voices: https://zero2000.com/free-text-to-speec ... oices.html
FOSS voices that sound better than a 1970s toy are a good thing for sure, but a small FOSS cross-platform library that can use them would be even better. It looks like some builds of the FOSS eSpeak-ng can support the MS format voices, but I'm not certain.

Re: Cross-platform speech synthesis

Post by TerryL »

revSpeechVoices() returns only 2 of 3 voices, David and Zira.
Win11 comes with three voices (Settings > Time & language > Speech) in this order:
Microsoft David Desktop - English (United States)
Microsoft Zira Desktop - English (United States)
Microsoft Mark Desktop - English (United States)
I tried revSetSpeechVoice followed by revSpeak (the full name must be used). David and Zira work fine, but Mark uses the Zira voice.
You thought the problem might be a 'dll'. I know nothing of this stuff. Can you come up with a viable solution?
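
To make the mismatch easy to reproduce on other machines, the comparison can be sketched in a few lines of Python. This is only an illustration: `missing_voices` is a made-up helper, and it assumes the voice list comes back as full names, one per line, which may not match what revSpeechVoices() actually returns on every system.

```python
def missing_voices(installed: list[str], reported: list[str]) -> list[str]:
    """Return installed voice names that the speech API did not report."""
    # Compare case-insensitively and ignore stray whitespace around names.
    reported_set = {name.strip().lower() for name in reported}
    return [name for name in installed if name.strip().lower() not in reported_set]


# Voices listed in the Win11 speech settings, in the order shown there.
installed = [
    "Microsoft David Desktop - English (United States)",
    "Microsoft Zira Desktop - English (United States)",
    "Microsoft Mark Desktop - English (United States)",
]

# Assumption: the function result pasted in as text, one full name per line.
reported = (
    "Microsoft David Desktop - English (United States)\n"
    "Microsoft Zira Desktop - English (United States)"
).splitlines()

print(missing_voices(installed, reported))
```

With the two lists above, the only name left over is the Mark voice, which matches what I see when it falls back to Zira.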

Re: Cross-platform speech synthesis

Post by tperry2x »

Please see this other post.
You can see how the available speech voices not only vary by system version, but also the UK-EN version of Windows 10 has voices that the US-EN version of Windows 10 does not. The same for Windows 11.
Also, there's a bewildering array on Ubuntu Linux, none on other distros, and only a handful that sound 'OK' on MacOS.

Re: Cross-platform speech synthesis

Post by OpenXTalkPaul »

On macOS, iOS and Android, users can download additional voices for the system to use; for example, I have Siri set to an Australian female voice. However, as far as I can tell, the list of voices returned by WebKit (Safari) via JavaScript does NOT match, neither by name nor by sound, the Siri voices or the traditional (MacInTalk descendant) macOS voices.
I suppose it's ultimately up to a web browser's implementation of the Web Speech API to decide what voices are returned as the voice list via JavaScript, or even whether or not to implement speech at all.

Think of the visually impaired: it is better to have some Text-To-Speech than none at all, even if it sounds like a 1970s 'Speak-N-Spell' toy. But yeah, the underlying speech synth or speech API can be very different from platform to platform. This is why I liked the idea of 'unifying' libraries, like LC's 'Unified Speech Library', that abstract the platform and implementation away from the script. In theory, scripts written against such a library won't ever need to worry about underlying implementation differences, and at the same time new implementations can be added so that it 'just works' in as many environments as possible.
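
As a rough illustration of what 'unifying' means, here is a Python sketch (not how LC's library works, which I haven't seen) where the caller only ever uses `speak()` and a small dispatcher picks a backend per platform. The backends are assumptions chosen because they are commonly available: the `say` command on macOS, an `espeak` binary on the PATH on Linux, and the .NET System.Speech synthesizer driven through PowerShell on Windows. The function names are hypothetical.

```python
import platform
import subprocess
from typing import Optional


def build_speak_command(text: str, voice: Optional[str] = None,
                        system: Optional[str] = None) -> list[str]:
    """Return the command line that would speak `text` on the given platform."""
    system = system or platform.system()
    if system == "Darwin":
        # macOS ships the `say` command; -v selects a voice by name.
        cmd = ["say"]
        if voice:
            cmd += ["-v", voice]
        return cmd + [text]
    if system == "Linux":
        # Assumes an `espeak` binary is installed and on the PATH.
        cmd = ["espeak"]
        if voice:
            cmd += ["-v", voice]
        return cmd + [text]
    if system == "Windows":
        # Single quotes are doubled to escape them inside PowerShell strings.
        script = ("Add-Type -AssemblyName System.Speech; "
                  "$s = New-Object System.Speech.Synthesis.SpeechSynthesizer; ")
        if voice:
            script += "$s.SelectVoice('{}'); ".format(voice.replace("'", "''"))
        script += "$s.Speak('{}')".format(text.replace("'", "''"))
        return ["powershell", "-NoProfile", "-Command", script]
    raise NotImplementedError("no speech backend for " + system)


def speak(text: str, voice: Optional[str] = None) -> int:
    """Speak `text` with whichever backend fits the current platform."""
    return subprocess.run(build_speak_command(text, voice), check=False).returncode
```

Adding another environment then means adding one more branch (or a lookup table of backends) without touching any script that calls `speak()`, which is the whole appeal.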

Re: Cross-platform speech synthesis

Post by tperry2x »

OpenXTalkPaul wrote: Sat May 10, 2025 3:44 am On macOS, iOS and Android users can download additional voices to use by the system, like I have Siri as an Australian female voice. However as far as I can tell the list of voices that are returned by Webkit (Safari) via JavaScript does NOT match, neither by name nor by sound, the Siri voices or the traditional (MacInTalk descendant) macOS voices.
I suppose it's ultimately up to a web browser's implementation of WebSpeech API to decide what voices are returned as the voice list via Javascript, or even whether or not to implement speech at all.

Think of the visually impaired, it is better to have some Text-To-Speech than none at all, even if it sounds like a 1970s 'Speak-N-Spell' toy. But yeah, the underlying Speech-synth or Speech API can be very different from platform to platform. This is why I liked the idea of 'unifying' libraries, like LC's 'Unified Speech Library, that abstract the platform and implementation away from the script. In theory Scripts written against such a library won't ever need to worry about underlying implementation differences, and at the same time new implementations can be added so that it 'just works' as in as may possible environments.
That was also why I liked the idea of an appimage, but I'd like the equivalent self-contained exe version for Windows (which is not at the mercy of other system changes) and the same for MacOS (which doesn't break with an Apple update).

It seems like a lot of the text-to-speech projects (never mind speech-to-text, which I'd also love to get sorted for speech recognition) have folded, at least on Linux and a few on Windows. I'm not sure why.
richmond62
Posts: 5315
Joined: Sun Sep 12, 2021 11:03 am
Location: Bulgaria

Re: Cross-platform speech synthesis

Post by richmond62 »

Surely the bottom line is that an operating system contains at least one speech-for-synthesis voice (however nauseating it sounds) so that programmers know that when they set their stack to say, "Hello, Fishface." when it is opened the end-user WILL hear that instead of a dull thud or nothing.

Back in 2003/4, when I was building my decision tree stack for my Master's degree, revSpeak relied on QuickTime (meaning the voices worked superbly on MacOS 10.1). But as the schools where I was doing my fieldwork ran Millennium or Vista (Windows), and there was NO guarantee of their machines having QuickTime (QT) installed, I had to make my Poser-rendered 'bloke' silent.
https://richmondmathewson.owlstown.net/

Re: Cross-platform speech synthesis

Post by OpenXTalkPaul »

2003/4 building my decision tree stack for my Master's degree revSpeak relied on QuickTime (Meaning the voices worked superbly on MacOS 10.1)
Mac TTS never relied on QuickTime. In 2003/04 it relied on either the classic TTS or the NSSpeech API, which is still present today, though there's now a newer API (AVSpeech Synth) in place to replace it. I have, in my collection of LCB Extensions, modules for using either NSSpeech or AVSpeech. The cool thing about my NSSpeech wrapper extension (vs. revSpeak) is that it can synthesize the speech to a sound file, which would then be a cross-platform media file, not requiring any speech API thereafter.

I know this doesn't help with Linux or Windows, but a Speech Recognition API is built into recent versions of macOS. It's available in the system-wide dictation input service, and it works fine in OXT.
Screen Shot 2025-05-11 at 11.29.25 PM.png