AI is now used for almost everything, and it often saves time and delivers better results. But using AI also comes at a cost. Imagine a basic task such as converting audio to text costing you thousands of dollars simply because you used a paid AI service.
Did you know you can get comparable results without a paid AI service or spending a dime? Even better, you won't need to run an extra server of your own. Most modern browsers already have a built-in audio-to-text feature, available through the Web Speech API. Keep in mind that some browsers, such as Chrome, send the audio to a server-based recognition engine behind the scenes, so recognition may not work offline.
The World Wide Web Consortium (W3C) put forward the Web Speech API as a draft in 2012. It has since become widely supported in modern browsers, making voice capabilities more accessible to web developers.
The Web Speech API is a powerful tool built right into modern browsers, allowing web apps to use voice interactions. It works by tapping into the hardware and software of the user’s device to process and understand spoken words.
This API has two main parts: Speech Recognition and Speech Synthesis. Speech Recognition lets websites listen to what you say and turn your words into text, making hands-free, voice-controlled interactions possible. This opens up new ways to create user-friendly interfaces and makes websites more accessible.
The Speech Synthesis component, in contrast, provides the capability to generate synthetic speech from written text. This text-to-speech functionality allows web applications to audibly convey information to users, further enhancing the multimodal experience and accessibility of web-based experiences.
Together, the Speech Recognition and Speech Synthesis capabilities of the Web Speech API offer developers a robust set of tools to incorporate voice-driven features and functionality into their web applications, catering to a wide range of user needs and preferences.
Here's a closer look at how it works:
Now let’s guide you through setting up and using the SpeechRecognition interface.
Key Interfaces and Methods:
// Create a new SpeechRecognition instance (some browsers expose it with a webkit prefix)
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
// Set properties (optional)
recognition.lang = 'en-US';
recognition.interimResults = true;
// Event listeners
recognition.addEventListener('result', (event) => {
  // Join the top alternative of every result into one transcript
  const transcript = Array.from(event.results)
    .map((result) => result[0].transcript)
    .join('');
  console.log('Transcript:', transcript);
  // Do something with the transcript
});
recognition.addEventListener('error', (event) => {
  console.error('Speech recognition error:', event.error);
});
// Start the speech recognition
recognition.start();
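Because interimResults is set to true above, a single result event can mix finalized text with text the recognizer may still revise. The following is a minimal sketch of separating the two using the isFinal flag on each result. The helper name splitTranscript is hypothetical and not part of the API.

```javascript
// Hypothetical helper (not part of the Web Speech API): splits a result list
// into finalized text and interim text using each result's isFinal flag.
function splitTranscript(results) {
  let finalText = '';
  let interimText = '';
  for (const result of Array.from(results)) {
    const text = result[0].transcript; // top alternative for this result
    if (result.isFinal) {
      finalText += text;
    } else {
      interimText += text;
    }
  }
  return { finalText, interimText };
}

// Browser usage (assumes a `recognition` instance like the one created above):
// recognition.addEventListener('result', (event) => {
//   const { finalText, interimText } = splitTranscript(event.results);
//   console.log('Final:', finalText, '| Interim:', interimText);
// });
```

Keeping the helper free of browser globals makes it easy to try out with plain arrays before wiring it to a microphone.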
You can also supply grammars and configure recognition properties to guide what the recognizer listens for.
Working with Grammars:
// Create a new SpeechGrammarList (prefixed in some browsers)
const SpeechGrammarList = window.SpeechGrammarList || window.webkitSpeechGrammarList;
const grammar = new SpeechGrammarList();
// Grammar in JSGF format
const phrase = '#JSGF V1.0; grammar phrase; public <phrase> = hello | goodbye;';
// addFromString takes the grammar string itself and an optional weight between 0 and 1
grammar.addFromString(phrase, 1);
// Configure the SpeechRecognition to use the grammar
recognition.grammars = grammar;
recognition.start();
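Writing JSGF strings by hand gets error-prone once the word list grows. As a rough sketch, a small helper can assemble a string in the same shape as the one above from an array of words. The name buildJsgfGrammar is hypothetical, and the helper only covers this simple one-rule form of JSGF.

```javascript
// Hypothetical helper: builds a single-rule JSGF grammar string in the same
// shape as the hand-written example above.
function buildJsgfGrammar(name, words) {
  if (words.length === 0) {
    throw new Error('A grammar needs at least one word or phrase');
  }
  // Alternatives in a JSGF rule are separated by "|"
  return `#JSGF V1.0; grammar ${name}; public <${name}> = ${words.join(' | ')};`;
}

// Browser usage (assumes a SpeechGrammarList named `grammar` as created above):
// grammar.addFromString(buildJsgfGrammar('colors', ['red', 'green', 'blue']), 1);
```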
Speech synthesis, also known as text-to-speech (TTS), is a powerful feature of the Web Speech API that allows web applications to convert text into spoken words. This capability opens up a range of possibilities for enhancing user experience and accessibility.
Basic Usage: To use speech synthesis, you'll work with the SpeechSynthesis interface and SpeechSynthesisUtterance object. Here's a basic example:
const synth = window.speechSynthesis;
const utterance = new SpeechSynthesisUtterance("Hello, world!");
synth.speak(utterance);
Customizing Voice Properties: You can customize various aspects of the synthesized speech:
utterance.volume = 0.8; // 0 to 1
utterance.rate = 1.2; // 0.1 to 10
utterance.pitch = 1.1; // 0 to 2
utterance.lang = 'en-US';
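If these values come from user input such as sliders or saved preferences, it can help to clamp them to the ranges noted in the comments above before assigning them. This is a small sketch under that assumption. The helper name clampSpeechSettings and the SPEECH_LIMITS table are illustrative, not part of the API.

```javascript
// Ranges taken from the comments above; both names here are hypothetical.
const SPEECH_LIMITS = { volume: [0, 1], rate: [0.1, 10], pitch: [0, 2] };

function clampSpeechSettings(settings) {
  const clamped = {};
  for (const [key, value] of Object.entries(settings)) {
    const limits = SPEECH_LIMITS[key];
    // Anything that is not a known numeric setting (e.g. lang) passes through unchanged
    clamped[key] = limits && typeof value === 'number'
      ? Math.min(limits[1], Math.max(limits[0], value))
      : value;
  }
  return clamped;
}

// Browser usage (assumes an `utterance` as created above):
// Object.assign(utterance, clampSpeechSettings({ volume: 1.4, rate: 0, pitch: 1.1 }));
```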
Choosing Voices: Modern browsers often provide multiple voices to choose from:
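// getVoices() can return an empty list until the browser has loaded its voices
const voices = synth.getVoices();
if (voices.length > 0) utterance.voice = voices[0]; // Choose the first available voice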
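Picking the first voice in the list may not match the language of your text. As a hedged sketch, the helper below looks for a voice by language tag, first by exact match and then by primary language. The name pickVoice is hypothetical, and normalizing underscores to hyphens is an assumption meant to tolerate tags such as en_US.

```javascript
// Hypothetical helper: finds a voice for a language tag, preferring an exact
// match and falling back to any voice with the same primary language.
function pickVoice(voices, lang) {
  // Assumption: some platforms report tags like "en_US", so normalize them
  const normalize = (tag) => tag.toLowerCase().replace('_', '-');
  const wanted = normalize(lang);
  const primary = wanted.split('-')[0];
  const exact = voices.find((voice) => normalize(voice.lang) === wanted);
  if (exact) return exact;
  const partial = voices.find((voice) => normalize(voice.lang).split('-')[0] === primary);
  return partial || null;
}

// Browser usage (assumes `synth` and `utterance` as created above):
// const applyVoice = () => {
//   const voice = pickVoice(synth.getVoices(), 'en-US');
//   if (voice) utterance.voice = voice;
// };
// applyVoice();
// synth.onvoiceschanged = applyVoice; // voices may load asynchronously
```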
Handling Events: The SpeechSynthesisUtterance object emits various events you can listen for:
utterance.onstart = () => console.log('Speech started');
utterance.onend = () => console.log('Speech ended');
utterance.onerror = (event) => console.error('Speech error:', event.error);
utterance.onpause = () => console.log('Speech paused');
utterance.onresume = () => console.log('Speech resumed');
Managing Speech Queue: The speech synthesis interface allows you to manage multiple utterances:
synth.cancel(); // Stop current speech and clear queue
synth.pause(); // Pause speaking
synth.resume(); // Resume speaking
Handling Long Text: For longer text, you might want to break it into smaller chunks:
function speakLongText(text) {
  const maxLength = 200;
  // Split into chunks of up to maxLength characters, breaking at whitespace
  const chunksArr = text.match(new RegExp(`.{1,${maxLength}}(\\s|$)`, 'g')) || [];
  chunksArr.forEach((chunk, index) => {
    const utterance = new SpeechSynthesisUtterance(chunk);
    utterance.onend = () => {
      if (index === chunksArr.length - 1) {
        console.log('Finished speaking all text');
      }
    };
    synth.speak(utterance);
  });
}
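Fixed-length chunks can cut a sentence in the middle, which may produce an audible pause in an odd place. One possible alternative, sketched here with a hypothetical splitIntoChunks helper, is to group whole sentences into chunks that stay under a length limit. The sentence detection is deliberately naive and only looks for ".", "!" and "?".

```javascript
// Hypothetical helper: groups whole sentences into chunks of at most
// maxLength characters. A single sentence longer than maxLength is kept whole.
function splitIntoChunks(text, maxLength = 200) {
  // Naive sentence split: a run of text followed by optional ., ! or ?
  const sentences = (text.match(/[^.!?]+[.!?]*/g) || [])
    .map((sentence) => sentence.trim())
    .filter((sentence) => sentence.length > 0);
  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length <= maxLength) {
      current = candidate;
    } else {
      if (current) chunks.push(current);
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Browser usage (assumes `synth` as created above):
// splitIntoChunks(longText).forEach((chunk) => {
//   synth.speak(new SpeechSynthesisUtterance(chunk));
// });
```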
Accessibility Considerations: When using speech synthesis, consider the following:
Browser Support and Fallbacks: While speech synthesis is widely supported, it's good practice to check for support and provide fallbacks:
if ('speechSynthesis' in window) {
  // Speech synthesis is supported
} else {
  console.log('Speech synthesis not supported');
  // Provide alternative feedback method
}
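The same kind of check applies to speech recognition, which some browsers expose only under a webkit prefix. Below is a minimal sketch that reports support for both halves of the API in one call. The function name detectSpeechSupport is hypothetical, and the global object is passed in as a parameter so the logic does not depend on a real browser.

```javascript
// Hypothetical helper: reports which parts of the Web Speech API are
// available on the given global object (normally `window`).
function detectSpeechSupport(globalObject) {
  const Recognition =
    globalObject.SpeechRecognition || globalObject.webkitSpeechRecognition || null;
  return {
    synthesis: 'speechSynthesis' in globalObject,
    recognition: Recognition !== null,
    Recognition, // constructor to instantiate, or null when unsupported
  };
}

// Browser usage:
// const support = detectSpeechSupport(window);
// if (support.recognition) {
//   const recognition = new support.Recognition();
// } else {
//   // Fall back to a plain text input
// }
```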
Combining with Speech Recognition: You can create a conversational interface by combining speech synthesis with speech recognition:
recognition.onresult = (event) => {
  const text = event.results[0][0].transcript;
  console.log('You said:', text);
  const response = generateResponse(text); // Main logic here
  const utterance = new SpeechSynthesisUtterance(response);
  synth.speak(utterance);
};
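The generateResponse function is left undefined in the snippet above. Purely as an illustration, it could be as simple as keyword matching. The rules and reply strings below are made up for this sketch, and a real application would put its own logic here.

```javascript
// Hypothetical implementation of the generateResponse placeholder used above.
// It matches a few keywords and returns a canned reply.
function generateResponse(text) {
  const input = text.trim().toLowerCase();
  if (/\b(hello|hi|hey)\b/.test(input)) {
    return 'Hello! How can I help you?';
  }
  if (/\btime\b/.test(input)) {
    return `It is ${new Date().toLocaleTimeString()}.`;
  }
  if (/\b(bye|goodbye)\b/.test(input)) {
    return 'Goodbye!';
  }
  return "Sorry, I didn't catch that.";
}
```

The word-boundary checks keep short keywords from matching inside longer words, for example "hi" inside "this".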
To demonstrate the integration of the Web Speech API in a complete web application, let's walk through a sample React-based implementation.
First, let's create a React component VoiceEnabledApp:
import React, { useState, useEffect } from 'react';
function VoiceEnabledApp() {
  const [transcript, setTranscript] = useState('');
  useEffect(() => {
    // Initialize the SpeechRecognition instance (prefixed in some browsers)
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    if (!SpeechRecognition) {
      return undefined; // Speech recognition is not supported in this browser
    }
    const recognition = new SpeechRecognition();
    recognition.lang = 'en-US';
    recognition.interimResults = true;
    // Named handlers so the same references can be removed during cleanup
    const handleResult = (event) => {
      const spoken = Array.from(event.results)
        .map((result) => result[0].transcript)
        .join('');
      setTranscript(spoken);
    };
    // Restart when the service stops so listening stays continuous
    const handleEnd = () => recognition.start();
    recognition.addEventListener('result', handleResult);
    recognition.addEventListener('end', handleEnd);
    // Start the speech recognition service
    recognition.start();
    return () => {
      // Remove the listeners first so stop() does not trigger a restart
      recognition.removeEventListener('result', handleResult);
      recognition.removeEventListener('end', handleEnd);
      recognition.stop();
    };
  }, []);
  return (
    <div>
      <h1>Voice-Enabled Application</h1>
      <p>Transcript: {transcript}</p>
      {/* Add other UI components and functionality */}
    </div>
  );
}
export default VoiceEnabledApp;
To use this VoiceEnabledApp component in the application, we can import and render it:
import React from 'react';
import VoiceEnabledApp from './VoiceEnabledApp';
function App() {
  return (
    <div>
      <VoiceEnabledApp />
    </div>
  );
}
export default App;
You can find a more detailed React app on GitHub.
The Web Speech API gives developers a strong, built-in way to add voice features to websites. It can make sites easier to use, keep users engaged, and boost productivity, and it sparks new ideas for web projects. As the technology keeps improving, it opens up more opportunities for developers to build accessible, interactive websites. With the example code and tips shared in this post, you can begin to explore what the Web Speech API can do and add voice features to your own websites.
The Web Speech API is widely supported in modern browsers, but it's best to check compatibility and provide fallbacks for unsupported browsers.
You can use JavaScript to access the API's features, like SpeechRecognition for speech-to-text and SpeechSynthesis for text-to-speech, in your web application's code.