Voice Control APIs

With smartphones and microphone-equipped desktop computers now commonplace, platform companies are adding Voice Control APIs to their technology offerings, and developers have multiple choices for adding speech control to their applications. Consumers have already experienced speech recognition systems when calling companies (airlines, department stores, etc.) and using voice commands instead of hitting keys on their phones.

Major platform vendors, online services and others have opened their APIs so that developers can add voice control and conversational user experiences to their applications. Developer program SDKs and APIs are available from Microsoft, Apple, Google, Amazon, SoundHound, and others. In the background, powerful AI, machine learning and natural language processing systems are doing the “heavy lifting” of voice control and recognition.

A challenge for developers is choosing which APIs to support (probably all of them). How can these voice platform vendors quickly help developers integrate Voice Control APIs into their applications? These developer innovations also allow other developer programs to integrate with these APIs and provide added value on top of them for their own platforms, products, and services. This blog post lists a few of the many Voice Control APIs that developers can use.


Microsoft – Cortana / Skills – Microsoft leverages the Bing Speech API and Microsoft Cognitive Services to power Windows and Android applications like Cortana, Skype Translator and Bing Torque. According to Microsoft, “Cortana connects users to your services, across platforms and devices”. The Cortana Developer Center provides the skills kit, documentation, and samples. You can sign up now for the preview, which is expected to arrive in early 2017. The Cortana developer page also provides guidance for programmers with existing code: “Re-use your custom skill code built for Amazon Alexa”, “Using the Microsoft Bot Framework? Cortana brokers connections between users and bots using the skills kit and the Cortana channel”, and “Import Cortana voice commands from Windows 10 apps”.

Google – Assistant / Actions – Google recently announced the opening of Google Home and Google Assistant to developers. For years we’ve been saying “OK Google” into our smartphones. The Google Assistant APIs allow developers to create Actions. The Google Actions site says “Actions on Google let you build for the Google Assistant. Your integrations can help you engage users through Google Home today, and in the future, through Pixel, Allo, and many other experiences where the Google Assistant will be available.” Developers can learn how to quickly integrate voice control into their apps using the Conversation API and Actions SDK. Developer guides, samples, reference documentation and a Web Simulator are available on the Actions on Google developer site.

Amazon – Alexa / Skills – Amazon opened up their Alexa voice service to developers. Alexa is also supported on Amazon’s devices including Echo, Tap and Dot. The Amazon Alexa developer page answers the question “Why Alexa?” with: “Alexa, the voice service that powers Echo, provides capabilities, or skills, that enable customers to interact with devices in a more intuitive way using voice. Examples of these skills include the ability to play music, answer general questions, set an alarm or timer and more. Alexa is built in the cloud, so it is always getting smarter. The more customers use Alexa, the more she adapts to speech patterns, vocabulary, and personal preferences.” With Alexa, developers use APIs to create skills for application voice recognition and operations. Developers can find additional development information on the Alexa Skills Kit site.
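To give a feel for what “using APIs to create skills” looks like, here is a minimal sketch of a custom skill handler in Python, written the way it might run on AWS Lambda. It assumes the JSON request/response shape described in the Alexa Skills Kit documentation (a `request` object with a `type`, and a response containing `outputSpeech`). The intent name `HelloIntent` and the slot name `FirstName` are hypothetical and made up for this example; a real skill would define its own in its interaction model.

```python
def build_speech_response(text, end_session=True):
    """Wrap plain text in the response envelope an Alexa custom skill returns."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context=None):
    """Route an incoming Alexa request to a spoken reply."""
    request = event.get("request", {})
    request_type = request.get("type")

    if request_type == "LaunchRequest":
        # The user opened the skill without asking for anything specific.
        return build_speech_response(
            "Welcome. Ask me to say hello.", end_session=False
        )

    if request_type == "IntentRequest":
        intent = request.get("intent", {})
        # "HelloIntent" and "FirstName" are hypothetical names for this sketch.
        if intent.get("name") == "HelloIntent":
            slots = intent.get("slots") or {}
            name = (slots.get("FirstName") or {}).get("value") or "there"
            return build_speech_response("Hello, {}!".format(name))
        return build_speech_response("Sorry, I don't know that one.")

    # Any other request type (for example SessionEndedRequest) gets no speech.
    return {"version": "1.0", "response": {}}
```

Because the handler is just a function that takes a dictionary and returns a dictionary, you can try it locally by passing in a hand-built event before deploying anything.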

Apple – Siri / Domains and Intents – With the release of iOS 10, Apple opened up Siri to iOS application developers with the introduction of SiriKit. According to Apple’s SiriKit site: “SiriKit enables your iOS 10 apps to work with Siri, so users can get things done with your content and services using just their voice. In addition to extending Siri’s support for messaging, photo search and phone calls to more apps, SiriKit also adds support for new services, including ride booking and personal payments.” Developers can learn how to integrate voice control using the SiriKit Programming Guide.

API.AI – Agents / Entities / Intents / Actions / Contexts – API.AI is a natural language platform that developers can use to create conversational UIs for apps, web applications, devices and bots. SDKs and libraries are available for Android, iOS, watchOS, macOS, Ruby, JavaScript, Node.js, HTML5, Python, C++, C#, Java, PHP and more. According to API.AI, “Our goal is to make the process of creating and integrating sophisticated conversational user interfaces as simple as possible.” You can find the SDKs, APIs, documentation, etc. on the API.AI developer site.
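As a rough illustration of how intents, actions and parameters reach your own code, here is a small Python sketch of a fulfillment webhook handler. It assumes the webhook JSON format as I understand it from the API.AI documentation: the request carries a `result` object with an `action` and `parameters`, and the reply carries `speech`, `displayText` and `source`. Treat those field names as assumptions to check against the current docs. The action name `lights.switch` and the parameters `room` and `state` are hypothetical.

```python
def handle_webhook(payload):
    """Turn a parsed webhook request (a dict) into a reply dict."""
    result = payload.get("result") or {}
    action = result.get("action")
    params = result.get("parameters") or {}

    # "lights.switch", "room" and "state" are hypothetical names for this sketch.
    if action == "lights.switch":
        room = params.get("room") or "house"
        state = params.get("state") or "on"
        text = "Turning {} the {} lights.".format(state, room)
    else:
        text = "Sorry, I can't help with that yet."

    return {
        "speech": text,        # what the agent should say
        "displayText": text,   # what the agent should show
        "source": "example-webhook",
    }
```

In a real service this function would sit behind an HTTPS endpoint in whatever web framework you prefer; keeping the logic in a plain function makes it easy to test without a server.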

Houndify – Domains – “Houndify is a platform that allows anyone to add smart, voice enabled, conversational interfaces to anything with an internet connection. Once you integrate with Houndify, your product will instantly understand a wide variety of questions and commands.” Houndify, by SoundHound, has developer SDKs for iOS, Android, C++, C#, Java, JavaScript, Python, and other platforms via HTTP/REST/JSON. You can join the Houndify developer program for free (there is also a paid level for higher API call volumes) and gain access to the APIs, SDKs, documentation, tutorials, etc.

Facebook Jarvis – Voice Control, the “Voice of God” and someday an API

This week Mark Zuckerberg introduced the world to his year-long AI development project called Jarvis – “It uses several artificial intelligence techniques, including natural language processing, speech recognition, face recognition, and reinforcement learning, written in Python, PHP and Objective C.” According to news reports, Zuckerberg personally contacted Morgan Freeman so that Jarvis would have the “voice of God”. You can read about how Zuckerberg built Jarvis and watch the introduction on Facebook. Zuckerberg also built a Facebook Messenger bot for Jarvis. You can learn about the bot framework at messenger.com/platform.

Uber and Voice Control Integration

The Uber app for iOS now allows you to use your voice and Siri to launch the ride app. First, go to your iPhone’s Settings and tap Siri. On Siri’s settings page, choose “App Support” and turn on Siri support for Uber (and any other listed apps you want). I looked in the Uber Developers Ride Requests documentation to see if there was an API related to Siri, but have not found anything yet. Wouldn’t it be cool if the Uber API had extensions for several Voice Control APIs for custom application development on iOS and Android? I did find an article on the Uber blog titled “Hound and Uber – The voice interface future is here” that talks about the SoundHound Hound app’s integration with Uber. The blog post starts with “We’re on the brink of a voice interface revolution. In an increasingly connected world, we will speak to the products and services around us.” The post goes on to talk about “Hound, a consumer voice search and assistant app, and Houndify, a developer platform that enables any developer to add a natural, conversational voice interface to their products.”

Do you build apps that use APIs on top of other Voice Control APIs?

Post a comment if you use voice control API extensions for one or more of the above platform vendor APIs. If you use other Voice Control APIs, post a comment with the name and URL to share with other developers.

David Intersimone “David I”
Vice President of Developer Communities
Evans Data Corporation
davidi@evansdata.com
Blog: https://devnet.evansdata.org/
Skype: davidi99
Twitter: @davidi99
