Spoken Word: Bringing Read-Along Speech Synthesis to the Web

Back in December 2009 I did a hackathon to create an HTML5 Audio Read-Along (demo) which highlighted the text of words spoken in the corresponding audio being played. To introduce the project I wrote:

When I was in college, my most valuable tool for writing papers was a text-to-speech (TTS) program [ReadPlease 2003]. I could paste in a draft of my paper and it would highlight each word as it was spoken, so I could give my proof-reading eyes a break and do proof-listening while I read along; I caught many mistakes I would have missed. Likewise, for powering through course readings I would copy the material into the TTS program whenever possible and speed up the reading rate; because the words are highlighted, it’s easy to re-find your place if you look away and just listen for awhile. (I constantly use OS X’s selected-text speech feature, but unfortunately it does not highlight words). A decade after my college days, I would have hoped that such TTS read-alongs would have become common on the Web (though there is work-in-progress Chrome API and a W3C draft spec now under development), even as read-along apps are prolific in places like the Apple App Store for kids books.

As I further note in the project’s readme, the process I used to create this read-along demo was extremely tedious. It took me four hours to manually find the indices for a couple minutes of speech. I painstakingly obtained time indices for each word in a segment of speech audio to align with its corresponding text so that the text could be highlighted. Naturally my project was just intended as a demo and it is unreasonable to expect anyone else to go through the same process. Nevertheless, I think my proof of concept is compelling. I won second place in the HTML5 audio Dev Derby by Mozilla back in 2012.

Several years later I made Listenability which was an open source DIY clone of the now-defunct “SoundGecko” service. It allowed for you to create a podcast of articles that you sent to your blog and leveraged your system’s own speech synthesis to generate the podcast audio enclosure asynchronously. Daniel Bachhuber created SimpleTTS which integrates WordPress with the Amazon Polly text-to-speech to create the MP3 files and attached them to posts. His work was then followed-up with another Polly solution, this time being developed directly by AWS in partnership with WP Engine. These Polly integrations provide great ways to integrate speech synthesis into the publishing workflow.

Publishing text content in audio form provides key value for users because it introduces another mode for reading the content, but instead of reading with your eyes, you can read with your ears, such as while you are doing dishes or riding a bike. Speech synthesis makes audio scalable by automating the audio creation; it introduces your content into domains normally dominated by music, audiobooks, podcasts, and (oh yeah) radio.

The Amazon Polly solutions are great for when you want to publish audio as an alternative to the text. What they aren’t as great for is publishing audio alongside the text as I set out to demonstrate in the read-along experience in December 2009. (It is possible to implement a read-long with Polly using Speech Marks, but the aforementioned integrations don’t yet do so.) If there is an audio player sitting at the top of an article any you hit play, you can quickly lose your place in the text if you’re trying to read along since the currently-spoken words are not highlighted. Additionally, if you are reading the article with your eyes and then decide you want to switch to audio while you do the dishes, it is difficult to seek the audio content to the place where you last read in the text content. What I want to see is a multi-modal reading experience.

So in December 2017 I worked on another Christmas vacation project. Since Chrome, Firefox, and Safari now support an (experimental) Web Speech API with speech synthesis, you can now do text-to-speech in browsers using just the operating system’s own installed TTS voices (which are now excellent). With this it is possible to automate the read-along interface that I had created manually before. I call this new project Spoken Word. Here’s a video showing an example:

Here’s a full rundown of the features:

  • Uses local text-to-speech engine in user’s browser. Directly interfaces with the speechSynthesis browser API. Zero external requests or dependencies, so it works offline and there is no network latency.
  • Words are selected/highlighted as they are being spoken to allow you to read along.
  • Skips speaking elements that should not be read, including footnote superscripts (the sup element). These elements are configurable.
  • Pauses of different length added are between headings versus paragraphs.
  • Controls remain in view during playback, with each the current text being spoken persistently being scrolled into view. (Requires browser support for position:sticky.)
  • Back/forward controls allow you to skip to the next paragraph; when not speaking, the next paragraph to read will be selected entirely.
  • Select text to read from that point; click on text during speech to immediately change position.
  • Multi-lingual support, allowing embedded text with [lang] attribute to be spoken by the appropriate voice (assuming the user has it installed), switching to language voices in the middle of a sentence.
  • Settings for changing the default voice (for each language), along with settings for the rate of speech and its pitch. (Not supported by all engines.) Changes can be made while speaking.
  • Hit escape to pause during playback.
  • Speech preferences are persistently stored in localStorage, with changes synced across windows (of a given site).
  • Ability to use JS in standalone manner (such as in bookmarklet). Published on npm. Otherwise, it is primarily packaged as a WordPress plugin.
  • Known to work in the latest desktop versions of Chrome, Firefox, and Safari. (Tested on OSX.) It does not work reliably in mobile/touch browsers on Android or iOS, apparently due both to the (still experimental) speechSynthesis API not being implemented well enough on those systems and/or programmatic range selection does not work the same way as on desktop. For these reasons, the functionality is disabled by default on mobile operating systems.

Screenshots of the WordPress plugin with the Twenty Seventeen theme active:

You can try it out on a standalone example with some test content, or install the WordPress plugin on your own site (as it is installed here on my blog for this very article, but you need a desktop browser currently to see it).

For more details, see the GitHub project. Pull requests are welcome and the code is MIT licensed. I hope that this project inspires multi-modal read-along experiences to become common on the Web.

ECMAScript Proposal: Named Function Parameters

I recently ran across the ES wiki which is documenting proposals and features for new versions of ECMAScript (JavaScript). I was excited to see the spread operator...” which basically brings Perl-style lists to JavaScript. I was also excited to see the spread-related rest parameters which basically implement Python’s positional parameter glob *args; however, I did not see something equivalent to Python’s named parameter glob **kwargs (see on Python Docs).

I’ve been giving thought to passing in named arguments to function calls in JavaScript, eliminating the need for the current pattern of wrapping named arguments in an object, like:

function foo(kwargs){
    if(kwargs.z === undefined)
        kwargs.z = 3; //default value
    return [kwargs.x, kwargs.y, kwargs.z];
}
foo({z:3, y:2, x:1}); //=> [1, 2, 3]

Instead of this, I’d love to be able discretely define what arguments I’m expecting in a positional list, but then also allow them to be passed in as named arguments:

function foo(x, y, z){
    return [x, y, z];
}
foo(1, 2, 3) === foo(z:3, x:1, y:2); //=> [1, 2, 3]

This could also work with the ES proposal for parameter default values:

function foo(x, y, z = 3){
    return [x, y, z];
}
foo(1, 2) === foo(y:2, x:1); //=> [1, 2, 3]

Although the current proposal has a requirement that only trailing formal parameters may have default values specified, this shouldn’t be necessary when named parameters are used:

function foo(x = 1, y, z){
    return [x, y, z];
}
foo(z:3, y:2); //=> [1, 2, 3]

Furthermore, although the current spread proposal only works with arrays, it could also work with objects for named parameters:

var kwargs = {y:2, x:1, z:3};
foo(...kwargs); //=> [1,2,3]

Finally, the rest parameters proposal uses spread to implement the rough equivalent of Python’s positional parameter glob (*args), but I’m not sure how it could also be applied to named parameters to support Python’s **kwargs at the same time—I’m not sure how named rest parameters would work. For example in Python:

def foo(x, *args, **kwargs):
    return [x, args[0], kwargs['z']]
foo(1, 2, z=3); //=> [1, 2, 3]

Python has the * to prefix positional parameter globs and ** to prefix named parameter globs, but so far ECMAScript only has one prefix the “...” spread operator.

These are just some rough ideas about how JavaScript could support named parameters. I’m sure there things I’ve missed and implications I haven’t thought of, but I wanted to get my thoughts out there. What do you think?

Programming Languages I’ve Learned In Order

Update: See also list on MY TECHNE.

What follows are the programming languages I’ve learned in the order of learning them; their relative importance is marked up with big, and small indicates I didn’t fully learn or actually use the language.

  1. Perl 5
  2. JavaScript / ECMAScript
  3. PHP 4 & 5
  4. SQL
  5. Visual Basic 6
  6. Java
  7. Classic ASP: VBScript & JScript
  8. (Visual) C/C++
  9. XSLT
  10. Ruby
  11. Python (I expect/hope that this will supplant PHP in the next couple years, getting an extra big or two.)

While Python is now my server-side language of choice, I would be much happier to use JavaScript end-to-end. Thanks to the CommonJS initiative, this is becoming a reality.

Not included in the list above are markup languages and other related technologies: (X)HTML 4 & 5, XML, CSS, DOM, RSS, Atom, RDF, XML Schema, XPath, JSON, JSON/XML-RPC, SVG, VML, OSIS, iCal, Microformats, MathML, etc.

Idea via and prompted by James Tauber, via Dougal Matthews.

Proposal for Customizing Google’s Crawlable Ajax URLs

On the Shepherd Interactive site, we have a dynamic navigation menu in Flash. In order to prevent it from having to reload each time a page is changed, I implemented Ajax loading so that the SWF only has to load once. This is similar to what Lala and Facebook do. So if your browser is Ajax-enabled, upon visiting:

http://shepherdinteractive.com/portfolio/interactive/

You will get redirected to site root / with the old path supplied as the URL hash fragment which is then loaded in via JavaScript as the page content:

http://shepherdinteractive.com/#portfolio/interactive/

However, according to Google’s Making AJAX Applications Crawlable specification, Ajax pretty URLs are any containing a hash fragment beginning with !, for example:

http://shepherdinteractive.com/#!portfolio/interactive/

The purpose of the ! is merely to inform Googlebot that such a URL is for an Ajax page whose content can be fetched via:

http://shepherdinteractive.com/?_escaped_fragment_=portfolio/interactive/

The problem I have with Google’s specification is that the pretty URL Ajax fragment prefix (!) is mandated; it is not customizable. I should be able to tell Googlebot which fragment identifiers are for Ajax content and which are not. Therefore, instead of requiring authors to to conform to Google’s Ajax specification, I propose that Google adopt an extension to robots.txt which allows site owners to let Googlebot know what to look for. The current specification’s mandate for ! could be indicated via:

Ajax-Fragment-Prefix: !

Or it could be changed to anything else, such as “ajax:“. If the Ajax fragment doesn’t have a prefix at all (as in the case of Shepherd Interactive’s website above), a regular expression pattern match could be specified in robots.txt, for example:

Ajax-Fragment-Pattern: .+/

This would tell Googlebot that a URL with a fragment containing a slash should be fetched via the _escaped_fragment_ query parameter, and that the Ajax URL itself (including the fragment identifier) should be indexed and returned verbatim in the search results.

It’s true that the Shepherd Interactive site implements Hijax (progressive enhancement with Ajax) techniques, so every Ajax URL has a corresponding non-Ajax URL; so in this sense, Google can still access all of the content. The problem, however, is with links to Ajax URLs from around the Web. I assume for Googlebot, every link to an Ajax URL without the obligatory prefix ! is interpreted as referring to the home page (site root /):

  • http://shepherdinteractive.com/#portfolio/interactive/harmer-steel/
  • http://shepherdinteractive.com/#about-us/our-company/
  • http://shepherdinteractive.com/#services/web-development/

So Google then assigns no additional PageRank to our Ajax URLs since it gets assigned to the home page. If, however, Googlebot could be told that those are actually Ajax URLs, then the PageRank could be properly assigned. (My assumptions here could be incorrect.)

Thoughts?

Update 2010-04-22: “Cowboy” Ben Alman brought up the excellent point that customizable Ajax-crawlable URLs need to work on a per-path basis, as different single page web apps on the same server (or even in the same folder) might have different URI requirements. I wonder if in addition to (or instead of) using robots.txt to tell Googlebot the format of the Ajax-crawlable URLs, why not allow that information to be placed in meta tags? Their specification already includes:

<meta name="fragment" content="!">

Why not make Googlebot support additional meta tags for recognizing the prefix (if any) or pattern that characterizes an Ajax-crawlable URL on a given page? For example:

Current default behavior:
<meta name="crawlable-fragment-prefix" content="!">
Starting with “/” like …about/#/us/:
<meta name="crawlable-fragment-prefix" content="/">
All fragments are crawlable:
<meta name="crawlable-fragment-prefix" content="">
Any fragment ending with a slash:
<meta name="crawlable-fragment-pattern" content=".+/">

With these meta tags, authors would be able to have complete page-level control over the structure of Ajax-crawlable URLs (hash fragments).

Browser Detection Fail

Being prompted on the Google home page to “Install Google Chrome” for “A faster way to browse the web” while I am already using Google Chrome for Mac: epic browser detection fail.

Anyway, Chrome definitely is a faster way to browse the Web. It’s amazing!

Multiple Borders via CSS box-shadow

Update : Multiple box-shadows now work in Chrome 5 as well as in Firefox 3.5+, although in the examples with border radius, each successive shadow becomes more and more squared off; see Chrome screenshots 1 & 2 below.

To make multiple borders appear around an element, the traditional approach has been to nest multiple elements and apply a different border to each. For example, to put a rainbow border on an element:

I have rainbow borders!

This approach uses seven div elements. However, there is an alternative that avoids the need for so many superfluous elements: the new CSS3 box-shadow property. A key feature of this property is that it allows for multiple shadows to be supplied at once:

box-shadow: none | <shadow> [ , <shadow> ]*

When multiple shadows are supplied, they can be positioned on top of each other to create a multiple-border effect. Using the box-shadow property in this way requires support for the "spread radius" value, and this is not currently supported by WebKit (includes Safari and Chrome). However, Firefox 3.5 (now available as a preview release) does support the box-shadow spread radius, and if using that browser, the following example should look identical to the example above:

I have rainbow borders!

Using box-shadow for multiple borders not only eliminates the need for extra markup, it also allows the multiple borders to be easily rounded on corners simply by adding the border-radius property (see screenshot from Firefox 3.5, and another from Chrome 5):

I have rainbow borders with rounded corners!

To accomplish the same above with multiple nested elements, you would have to tediously increase the border-radius for each wrapped element in order to get the corner arcs to fit together properly.

I suppose using box-shadow to implement borders is a kind of hack (the technique requires setting margins because the shadows don't effect page flow), but we're celebrating the compeltion of Firefox 3.5 by showcasing hacks like these! It would be great if multiple borders could simply be specified on a border property just as is possible with box-shadow. Each subsequent border defined could wrap around the previously defined borders. The property could be defined:

border: none | <border> [ , <border> ]*

In any case, the box-shadow property still has a benefit of allowing the borders to blend together by specifying the box-shadow's blur radius (see screenshot from Firefox 3.5 and another from Chrome 5):

I have blended rainbow borders!

It's exciting to see how Mozilla is implementing bleeding-edge CSS features in Firefox as we have become accustomed to from the WebKit team. Great work everyone!

Detecting Support for data: URIs

Updates: Added note at end to respond to V1’s comment, and fixed the “awkward CSS syntax” which was actually a big typo (thanks Harmen).

The data: URI scheme is now supported by the most current version of every major browser, including Internet Explorer. Because of this I wanted to use CSS background images encoded with data: URIs in a current project at Shepherd Interactive. Why? The first rule of High Performance Websites is to Minimize HTTP Requests. By storing background images directly in the stylesheet only one HTTP request is then necessary to fetch the stylesheet and the images all at once. Furthermore, by giving that stylesheet a far-future cache expiration date the browser will never need to request it again.

However, while every browser vendor’s current version supports data: URIs, older browsers like MSIE 7 do not. Therefore it is necessary to also have fallback CSS rules to serve externally-referenced background images. I came up a way of doing this by utilizing the following script which I place as an inline script in the document head:

var data = new Image();
data.onload = data.onerror = function(){
	if(this.width != 1 || this.height != 1){
		document.documentElement.className += " no-data-uri";
	}
}
data.src = "";

This script attempts to load a 1×1 pixel via a data: URI. If the onload callback fires and it detects that the image’s width and height are each a single pixel, then the browser supports data: URIs. Otherwise the browser does not support such URIs and a class name is added to the document root so that fallback CSS rules can be written, for example:

#foo {
    background-image:url("data:image/png;base64,<...>");
}
html.no-data-uri #foo {
    background-image:url("images/foo.png"); /* fallback */
}

In this way, CSS rules can define background images that utilize data: URIs only for browsers that support them, while at the same time older browsers still render the content properly due to the fallback rules. Note that for browsers that support data: URIs, the fallback CSS rule above will never be applied and thus its externally-referenced background image will never be downloaded. However, if the browser does not support data: URIs, then the resources will essentially be downloaded twice, but that’s the price you pay for using an old browser. In any case, when the images are tiny (<1KB) then the overhead is minimal for non-supporting browsers, and your development process is greatly eased since you don’t have to hassle with creating image sprites (which are sometimes impossible anyway); those who use current browser versions will be rewarded as will the generations of Internet users to come.

“Old” Mainly Linguistics Stuff from College Days

This morning I got inspired to go over some relatively old stuff that I worked on in college. There are several linguistics projects that I think are pretty interesting (the first three especially):

And then I found an old chart I made of my racial makup.

And then there is a little web app for SPU’s Nickerson Signal which attempts to model a traffic signal so that upcoming crossing times can be predicted. (Didn’t end up accurately modeling reality.)

And finally there is the Semantic Linking tool (mentioned in my last post) which is a prototype for an upcoming tool which enables contributors to link the semantic units between manuscripts and translations.