HTML5 - Text to Speech


What is the basic usage?

var msg = new SpeechSynthesisUtterance('Hello World');
window.speechSynthesis.speak(msg); // queue the utterance so it is actually spoken

How can we alter parameters to affect the volume, speech rate, pitch, voice, and language?

var msg = new SpeechSynthesisUtterance();
var voices = window.speechSynthesis.getVoices();
msg.voice = voices[10]; // Note: some voices don't support altering params
msg.voiceURI = 'native';
msg.volume = 1; // 0 to 1
msg.rate = 1; // 0.1 to 10
msg.pitch = 2; // 0 to 2
msg.text = 'Hello World';
msg.lang = 'en-US';

msg.onend = function(e) {
  console.log('Finished in ' + e.elapsedTime + ' seconds.');
};

window.speechSynthesis.speak(msg);

What properties and events does a SpeechSynthesisUtterance have?

  • text – A string that specifies the utterance (text) to be synthesized.
  • lang – A string representing the language of the speech synthesis for the utterance (for example “en-GB” or “it-IT”).
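  • voiceURI – A string that specifies the speech synthesis voice and the location of the speech synthesis service that the web application wishes to use. Current browsers expose a voice property instead, which takes one of the voice objects returned by getVoices().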
  • volume – A number representing the volume for the text. It ranges from 0 (minimum) to 1 (maximum) inclusive, and the default value is 1.
  • rate – A number representing the speaking rate for the utterance. It is relative to the default rate for the voice. The default value is 1. A value of 2 means that the utterance will be spoken at twice the default speed. Values below 0.1 or above 10 are disallowed.
  • pitch – A number representing the speaking pitch for the utterance. It ranges from 0 (minimum) to 2 (maximum) inclusive. The default value is 1.
  • onstart – Sets a callback that is fired when the synthesis starts.
  • onpause – Sets a callback that is fired when the speech synthesis is paused.
  • onresume – Sets a callback that is fired when the synthesis is resumed.
  • onend – Sets a callback that is fired when the synthesis is concluded.
  • onerror – The error event is fired if an error occurs that prevents the utterance from being spoken.
  • onboundary – The boundary event is fired whenever a word or sentence boundary is reached while the utterance is being spoken.
  • onmark – The mark event is fired when a ‘mark’ tag is reached in a Speech Synthesis Markup Language (SSML) document. SSML is an XML-based format for passing speech data to an utterance. Its main advantage is that it makes speech content easier to manage in applications that have large amounts of text to synthesize.
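
The ranges listed above can be enforced before values are handed to an utterance. The following is a minimal sketch, not a definitive implementation: clamp and applySpeechOptions are hypothetical helper names that are not part of the Web Speech API. The target only needs to be an assignable object, so a real SpeechSynthesisUtterance works in the browser.

```javascript
// Hypothetical helper: restrict a number to the inclusive range [min, max].
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: copy options onto an utterance, keeping
// volume, rate, and pitch inside the ranges listed above.
function applySpeechOptions(utterance, options) {
  if (options.text !== undefined) utterance.text = String(options.text);
  if (options.lang !== undefined) utterance.lang = options.lang;
  if (options.volume !== undefined) utterance.volume = clamp(options.volume, 0, 1);
  if (options.rate !== undefined) utterance.rate = clamp(options.rate, 0.1, 10);
  if (options.pitch !== undefined) utterance.pitch = clamp(options.pitch, 0, 2);
  return utterance;
}

// Browser usage, guarded so the snippet does nothing where the API is missing.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  var msg = applySpeechOptions(new SpeechSynthesisUtterance(), {
    text: 'Hello World', lang: 'en-US', volume: 0.8, rate: 1.2, pitch: 1
  });
  window.speechSynthesis.speak(msg);
}
```

Clamping silently corrects out-of-range input; an application that prefers to surface mistakes could throw instead.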

You can listen for these events on an instance of SpeechSynthesisUtterance by assigning a function to the matching on… property or by using the addEventListener() method.

var utterance = new SpeechSynthesisUtterance('Hello Treehouse');

utterance.onstart = function(event) {
    console.log('The utterance started to be spoken.');
};

window.speechSynthesis.speak(utterance);
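
The addEventListener() route mentioned above might look like the following sketch. logUtteranceEvents is a hypothetical helper name. It accepts any EventTarget, which a SpeechSynthesisUtterance is, plus a logging function.

```javascript
// Hypothetical helper: subscribe to the main utterance lifecycle events.
// `target` is any EventTarget (a SpeechSynthesisUtterance in the browser).
function logUtteranceEvents(target, log) {
  ['start', 'pause', 'resume', 'end', 'error'].forEach(function (type) {
    target.addEventListener(type, function (event) {
      log('utterance event: ' + event.type);
    });
  });
  return target;
}

// Browser usage, guarded so the snippet does nothing where the API is missing.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  var spoken = new SpeechSynthesisUtterance('Hello Treehouse');
  logUtteranceEvents(spoken, function (message) { console.log(message); });
  window.speechSynthesis.speak(spoken);
}
```

Unlike assigning to onstart, addEventListener() lets several independent handlers observe the same event.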


How can we pause speaking?

The SpeechSynthesis object doesn’t need to be instantiated. It belongs to the window object (as window.speechSynthesis) and can be used directly. It exposes the following methods:

  • speak() – Accepts a SpeechSynthesisUtterance object as its only parameter and adds it to the queue of utterances to be spoken. The utterance is synthesized once the utterances queued before it have finished.
  • cancel() – This method will remove all utterances from the queue. If an utterance is currently being spoken, it will be stopped. The specification defines no separate stop() method, so cancel() is the way to terminate synthesis immediately.
  • pause() – Pauses the synthesis process.
  • resume() – Resumes the synthesis process.
  • getVoices() – This method returns a list of all the voices that are supported by the browser.

getVoices() doesn’t accept any arguments and returns the list (an array) of voices available in the current browser. Each entry provides information such as name, a mnemonic name that gives developers a hint of the voice (for example “Google US English”); lang, the language of the voice (for example it-IT); and voiceURI, the location of the speech synthesis service for this voice. In Chrome and Safari, the utterance’s voiceURI property is named voice instead, so code targeting those browsers should set voice rather than voiceURI.
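
Some browsers populate the voice list asynchronously, so getVoices() can return an empty array on the first call, and the voiceschanged event signals when the list is ready. The sketch below works under that assumption; loadVoices and pickVoiceByLang are hypothetical helper names, not part of the API.

```javascript
// Hypothetical helper: resolve with the voice list, waiting for the
// voiceschanged event if the list is still empty on the first call.
function loadVoices(synth) {
  return new Promise(function (resolve) {
    var voices = synth.getVoices();
    if (voices.length > 0) {
      resolve(voices);
      return;
    }
    synth.addEventListener('voiceschanged', function onChange() {
      synth.removeEventListener('voiceschanged', onChange);
      resolve(synth.getVoices());
    });
  });
}

// Hypothetical helper: first voice whose lang matches exactly, else the
// first one sharing the primary language subtag, else undefined.
function pickVoiceByLang(voices, lang) {
  var wanted = lang.toLowerCase();
  var primary = wanted.split('-')[0];
  var exact = voices.find(function (v) { return v.lang.toLowerCase() === wanted; });
  if (exact) return exact;
  return voices.find(function (v) {
    return v.lang.toLowerCase().split(/[-_]/)[0] === primary;
  });
}

// Browser usage, guarded so the snippet does nothing where the API is missing.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  loadVoices(window.speechSynthesis).then(function (voices) {
    var msg = new SpeechSynthesisUtterance('Ciao mondo');
    var voice = pickVoiceByLang(voices, 'it-IT');
    if (voice) msg.voice = voice; // otherwise the browser default voice is used
    msg.lang = 'it-IT';
    window.speechSynthesis.speak(msg);
  });
}
```

If a browser has no voices at all and never fires voiceschanged, the promise stays pending, so a production version would probably want a timeout.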

As well as these methods, the speechSynthesis interface also includes a number of attributes that can be useful for checking the current state of speech synthesis in the browser.

  • pending – This attribute will be set to true if there are utterances in the queue that have not yet started speaking.
  • speaking – This attribute will be true if an utterance is currently being spoken.
  • paused – This attribute will be true if an utterance is currently paused.
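
Combining pause() and resume() with the state attributes above gives a simple toggle, which is one way to pause and continue speech. This is a minimal sketch: togglePause is a hypothetical helper name, the element id "pause-toggle" is an assumption, and synth is expected to be window.speechSynthesis or an object with the same members.

```javascript
// Hypothetical helper: resume if paused, pause if speaking.
// `paused` is checked first because `speaking` can stay true while paused.
// Returns a string describing the action taken.
function togglePause(synth) {
  if (synth.paused) {
    synth.resume();
    return 'resumed';
  }
  if (synth.speaking) {
    synth.pause();
    return 'paused';
  }
  return 'idle'; // nothing is being spoken, so there is nothing to pause
}

// Browser usage: assumes the page has a button with id "pause-toggle".
if (typeof window !== 'undefined' && typeof document !== 'undefined' &&
    'speechSynthesis' in window) {
  var button = document.getElementById('pause-toggle');
  if (button) {
    button.addEventListener('click', function () {
      togglePause(window.speechSynthesis);
    });
  }
}
```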

How can we detect speech synthesis capability?

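if ('speechSynthesis' in window) {
  // Synthesis support. Make your web apps talk!
}

// Chrome-based browsers expose recognition under a webkit prefix.
if ('SpeechRecognition' in window || 'webkitSpeechRecognition' in window) {
  // Speech recognition support. Talk to your apps!
}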
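
The same checks can be wrapped in a small function so the result can be reused across an application. This is only a sketch: detectSpeechSupport is a hypothetical helper name, and it takes the global object as a parameter so it can be exercised outside a browser.

```javascript
// Hypothetical helper: report which parts of the Web Speech API exist
// on the given global object (window in a browser).
function detectSpeechSupport(globalObject) {
  var g = globalObject || {};
  return {
    synthesis: 'speechSynthesis' in g && 'SpeechSynthesisUtterance' in g,
    // Chrome-based browsers have shipped recognition with a webkit prefix.
    recognition: 'SpeechRecognition' in g || 'webkitSpeechRecognition' in g
  };
}

var support = detectSpeechSupport(typeof window !== 'undefined' ? window : undefined);
if (!support.synthesis) {
  console.log('Speech synthesis is not available in this environment.');
}
```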

What are the attributes of each voice?

  • name – A human-readable name that describes the voice.
  • voiceURI – A URI specifying the location of the speech synthesis service for this voice.
  • lang – The language code for this voice.
  • default – Set to true if this is the default voice used by the browser.
  • localService – The API can use both local and remote services to handle speech synthesis. If this attribute is set to true, the speech synthesis for this voice is handled by a local service. If it’s false, a remote service is being used. This attribute can be useful if you’re building an app that needs to work offline. You could use a remote service when an internet connection is present, and fall back to a local service if a connection is not available.
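
The offline scenario described for localService could be sketched as follows. chooseVoiceForConnection is a hypothetical helper name, and using navigator.onLine as the connectivity signal is an assumption of this sketch; it only reports whether the browser believes it has a network connection.

```javascript
// Hypothetical helper: prefer a remote voice when online and restrict
// the choice to local voices when offline. Returns null if nothing fits.
function chooseVoiceForConnection(voices, lang, online) {
  var matching = voices.filter(function (v) { return v.lang === lang; });
  if (!online) {
    return matching.find(function (v) { return v.localService; }) || null;
  }
  return matching.find(function (v) { return !v.localService; }) || matching[0] || null;
}

// Browser usage, guarded so the snippet does nothing where the API is missing.
// Note that getVoices() may still be empty on the first call in some browsers.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  var voice = chooseVoiceForConnection(
    window.speechSynthesis.getVoices(), 'en-US', navigator.onLine);
  var msg = new SpeechSynthesisUtterance('Hello World');
  if (voice) msg.voice = voice; // otherwise the browser default voice is used
  window.speechSynthesis.speak(msg);
}
```
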
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License