<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: HTML5 Audio Read-Along</title>
	<atom:link href="http://weston.ruter.net/projects/html5-audio-read-along/feed/" rel="self" type="application/rss+xml" />
	<link>http://weston.ruter.net</link>
	<description>Web application developer in Portland, Oregon</description>
	<lastBuildDate>Fri, 09 Dec 2011 18:07:07 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
	<item>
		<title>By: Harry Pannu</title>
		<link>http://weston.ruter.net/projects/html5-audio-read-along/comment-page-1/#comment-42991</link>
		<dc:creator>Harry Pannu</dc:creator>
		<pubDate>Fri, 09 Dec 2011 18:07:07 +0000</pubDate>
		<guid isPermaLink="false">http://weston.ruter.net/?page_id=236#comment-42991</guid>
		<description>Weston,

I have already skimmed through most of your posts as I was looking for a solution for myself. I just added a message of my own about how I got MP3 working with Sphinx for everyone&#039;s good.

See https://sourceforge.net/projects/cmusphinx/forums/forum/382337/topic/4787012?message=10898990</description>
		<content:encoded><![CDATA[<p>Weston,</p>
<p>I have already skimmed through most of your posts as I was looking for a solution for myself. I just added a message of my own about how I got MP3 working with Sphinx for everyone&#8217;s good.</p>
<p>See <a href="https://sourceforge.net/projects/cmusphinx/forums/forum/382337/topic/4787012?message=10898990" rel="nofollow">https://sourceforge.net/projects/cmusphinx/forums/forum/382337/topic/4787012?message=10898990</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Weston Ruter</title>
		<link>http://weston.ruter.net/projects/html5-audio-read-along/comment-page-1/#comment-42987</link>
		<dc:creator>Weston Ruter</dc:creator>
		<pubDate>Fri, 09 Dec 2011 17:57:41 +0000</pubDate>
		<guid isPermaLink="false">http://weston.ruter.net/?page_id=236#comment-42987</guid>
		<description>@Harry:
The cue points are not reliable enough yet for me. Maybe things have improved since the last time I tried, but compare the results of aligning John 1 and John 3:
https://github.com/westonruter/esv-text-audio-aligner/blob/master/reports/John.1
https://github.com/westonruter/esv-text-audio-aligner/blob/master/reports/John.3

John 3 gets extremely misaligned. See more information in the last post on the Sphinx forums: http://sourceforge.net/projects/cmusphinx/forums/forum/382337/topic/4503550</description>
		<content:encoded><![CDATA[<p>@Harry:<br />
The cue points are not reliable enough yet for me. Maybe things have improved since the last time I tried, but compare the results of aligning John 1 and John 3:<br />
<a href="https://github.com/westonruter/esv-text-audio-aligner/blob/master/reports/John.1" rel="nofollow">https://github.com/westonruter/esv-text-audio-aligner/blob/master/reports/John.1</a><br />
<a href="https://github.com/westonruter/esv-text-audio-aligner/blob/master/reports/John.3" rel="nofollow">https://github.com/westonruter/esv-text-audio-aligner/blob/master/reports/John.3</a></p>
<p>John 3 gets extremely misaligned. See more information in the last post on the Sphinx forums: <a href="http://sourceforge.net/projects/cmusphinx/forums/forum/382337/topic/4503550" rel="nofollow">http://sourceforge.net/projects/cmusphinx/forums/forum/382337/topic/4503550</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Harry Pannu</title>
		<link>http://weston.ruter.net/projects/html5-audio-read-along/comment-page-1/#comment-42985</link>
		<dc:creator>Harry Pannu</dc:creator>
		<pubDate>Fri, 09 Dec 2011 17:41:27 +0000</pubDate>
		<guid isPermaLink="false">http://weston.ruter.net/?page_id=236#comment-42985</guid>
		<description>Did you ever need to tweak the cue points generated by Sphinx? I am working on a web app that is supposed to auto-generate cue points for a given script and audio file at the click of a button in the web interface. On top of that, it should have an &quot;easy-to-use&quot; interface for editors to manually correct any errors.

After struggling for several days, it turned out in the end to be pretty simple to get MP3 decoding working with Sphinx. I would be happy to provide you with the specifics if you are interested in trying it. It would be comforting to know that it works for you as well.</description>
		<content:encoded><![CDATA[<p>Did you ever need to tweak the cue points generated by Sphinx? I am working on a web app that is supposed to auto-generate cue points for a given script and audio file at the click of a button in the web interface. On top of that, it should have an &#8220;easy-to-use&#8221; interface for editors to manually correct any errors.</p>
<p>After struggling for several days, it turned out in the end to be pretty simple to get MP3 decoding working with Sphinx. I would be happy to provide you with the specifics if you are interested in trying it. It would be comforting to know that it works for you as well.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Weston Ruter</title>
		<link>http://weston.ruter.net/projects/html5-audio-read-along/comment-page-1/#comment-42974</link>
		<dc:creator>Weston Ruter</dc:creator>
		<pubDate>Fri, 09 Dec 2011 16:13:29 +0000</pubDate>
		<guid isPermaLink="false">http://weston.ruter.net/?page_id=236#comment-42974</guid>
		<description>@Harry:
In my Sphinx4 attempt, I did indeed convert from MP3 to WAV first. I wasn&#039;t able to get Sphinx to work with MP3. However, my build script automatically converts the MP3s to WAV format before passing them to Sphinx, so it&#039;s no extra labor. See https://github.com/westonruter/esv-text-audio-aligner/blob/master/align.py#L120</description>
		<content:encoded><![CDATA[<p>@Harry:<br />
In my Sphinx4 attempt, I did indeed convert from MP3 to WAV first. I wasn&#8217;t able to get Sphinx to work with MP3. However, my build script automatically converts the MP3s to WAV format before passing them to Sphinx, so it&#8217;s no extra labor. See <a href="https://github.com/westonruter/esv-text-audio-aligner/blob/master/align.py#L120" rel="nofollow">https://github.com/westonruter/esv-text-audio-aligner/blob/master/align.py#L120</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Harry Pannu</title>
		<link>http://weston.ruter.net/projects/html5-audio-read-along/comment-page-1/#comment-42973</link>
		<dc:creator>Harry Pannu</dc:creator>
		<pubDate>Fri, 09 Dec 2011 16:09:43 +0000</pubDate>
		<guid isPermaLink="false">http://weston.ruter.net/?page_id=236#comment-42973</guid>
		<description>Weston,

In your attempt to use Sphinx4, were you able to get word cue points directly from MP3, or did you convert MP3 to WAV format first? I am trying to make Sphinx4 use MP3 directly to save the labor of WAV conversion. I am getting good results. I would like to exchange information with you, hoping it will benefit both of us. Looking forward to hearing from you.

Thanks,
Harry Pannu</description>
		<content:encoded><![CDATA[<p>Weston,</p>
<p>In your attempt to use Sphinx4, were you able to get word cue points directly from MP3, or did you convert MP3 to WAV format first? I am trying to make Sphinx4 use MP3 directly to save the labor of WAV conversion. I am getting good results. I would like to exchange information with you, hoping it will benefit both of us. Looking forward to hearing from you.</p>
<p>Thanks,<br />
Harry Pannu</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Weston Ruter</title>
		<link>http://weston.ruter.net/projects/html5-audio-read-along/comment-page-1/#comment-34832</link>
		<dc:creator>Weston Ruter</dc:creator>
		<pubDate>Thu, 01 Sep 2011 04:28:52 +0000</pubDate>
		<guid isPermaLink="false">http://weston.ruter.net/?page_id=236#comment-34832</guid>
		<description>The CMU Sphinx project has improved their code to work with aligning long audio with text. I&#039;ve created a new project which uses it to align the ESV text with the audio: https://github.com/westonruter/esv-text-audio-aligner</description>
		<content:encoded><![CDATA[<p>The CMU Sphinx project has improved their code to work with aligning long audio with text. I&#8217;ve created a new project which uses it to align the ESV text with the audio: <a href="https://github.com/westonruter/esv-text-audio-aligner" rel="nofollow">https://github.com/westonruter/esv-text-audio-aligner</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Weston Ruter</title>
		<link>http://weston.ruter.net/projects/html5-audio-read-along/comment-page-1/#comment-29111</link>
		<dc:creator>Weston Ruter</dc:creator>
		<pubDate>Fri, 20 May 2011 19:21:02 +0000</pubDate>
		<guid isPermaLink="false">http://weston.ruter.net/?page_id=236#comment-29111</guid>
		<description>I filed a feature request for Google Chrome to extend their experimental TTS API to facilitate read-along apps: http://code.google.com/p/chromium/issues/detail?id=83404

&lt;blockquote&gt;I was excited to learn about Chrome&#039;s experimental TTS API. An application that I am very keen to develop is a read-along, where the page highlights the text corresponding with the words as they are being spoken. To do this, an API would need to be exposed for determining when each word is spoken. Currently, the Chrome TTS API only has events for onSpeak and onStop. To do a read-along, however, something like an &quot;onSayWord&quot; or &quot;onUtter&quot; event would be needed, where the event handler would be passed an Event object indicating the actual word being spoken and maybe a word index (the original text passed in would need to get broken up into individual utterances). It would also be useful to be able to seek to a specific time position within TTS audio given a word index (or time index); this would allow you to navigate the audio via selecting words in the text. See the URL example provided for the kind of application I&#039;d love to build utilizing such extensions to the TTS API.&lt;/blockquote&gt;</description>
		<content:encoded><![CDATA[<p>I filed a feature request for Google Chrome to extend their experimental TTS API to facilitate read-along apps: <a href="http://code.google.com/p/chromium/issues/detail?id=83404" rel="nofollow">http://code.google.com/p/chromium/issues/detail?id=83404</a></p>
<blockquote><p>I was excited to learn about Chrome&#8217;s experimental TTS API. An application that I am very keen to develop is a read-along, where the page highlights the text corresponding with the words as they are being spoken. To do this, an API would need to be exposed for determining when each word is spoken. Currently, the Chrome TTS API only has events for onSpeak and onStop. To do a read-along, however, something like an &#8220;onSayWord&#8221; or &#8220;onUtter&#8221; event would be needed, where the event handler would be passed an Event object indicating the actual word being spoken and maybe a word index (the original text passed in would need to get broken up into individual utterances). It would also be useful to be able to seek to a specific time position within TTS audio given a word index (or time index); this would allow you to navigate the audio via selecting words in the text. See the URL example provided for the kind of application I&#8217;d love to build utilizing such extensions to the TTS API.</p></blockquote>
]]></content:encoded>
	</item>
	<item>
		<title>By: Weston Ruter</title>
		<link>http://weston.ruter.net/projects/html5-audio-read-along/comment-page-1/#comment-28875</link>
		<dc:creator>Weston Ruter</dc:creator>
		<pubDate>Sat, 14 May 2011 22:47:59 +0000</pubDate>
		<guid isPermaLink="false">http://weston.ruter.net/?page_id=236#comment-28875</guid>
		<description>@Dmitry:
I think I used &lt;a href=&quot;http://audacity.sourceforge.net/&quot; rel=&quot;nofollow&quot;&gt;Audacity&lt;/a&gt; and a Google Spreadsheet to manually obtain the timings. I literally stepped through the audio second by second, finding the start and end time for each word, and then added them to the spreadsheet. It took a few hours just for this passage.

I know it is possible to automatically obtain the time indices for the words in audio given a transcript. The closest I got to doing this myself was utilizing the &lt;a href=&quot;http://cmusphinx.sourceforge.net/&quot; rel=&quot;nofollow&quot;&gt;CMU Sphinx&lt;/a&gt; project which includes the ability to align text and audio. I had some success, but hit a roadblock. You can read all about my efforts and see the code on a thread on the Sphinx forum: http://sourceforge.net/projects/cmusphinx/forums/forum/382337/topic/4503550

If you&#039;re able to tweak &lt;a href=&quot;https://gist.github.com/940801#file_align_esv.py&quot; rel=&quot;nofollow&quot;&gt;my code&lt;/a&gt; to get the desired results, please let me know!</description>
		<content:encoded><![CDATA[<p>@Dmitry:<br />
I think I used <a href="http://audacity.sourceforge.net/" rel="nofollow">Audacity</a> and a Google Spreadsheet to manually obtain the timings. I literally stepped through the audio second by second, finding the start and end time for each word, and then added them to the spreadsheet. It took a few hours just for this passage.</p>
<p>I know it is possible to automatically obtain the time indices for the words in audio given a transcript. The closest I got to doing this myself was utilizing the <a href="http://cmusphinx.sourceforge.net/" rel="nofollow">CMU Sphinx</a> project which includes the ability to align text and audio. I had some success, but hit a roadblock. You can read all about my efforts and see the code on a thread on the Sphinx forum: <a href="http://sourceforge.net/projects/cmusphinx/forums/forum/382337/topic/4503550" rel="nofollow">http://sourceforge.net/projects/cmusphinx/forums/forum/382337/topic/4503550</a></p>
<p>If you&#8217;re able to tweak <a href="https://gist.github.com/940801#file_align_esv.py" rel="nofollow">my code</a> to get the desired results, please let me know!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dmitry</title>
		<link>http://weston.ruter.net/projects/html5-audio-read-along/comment-page-1/#comment-28847</link>
		<dc:creator>Dmitry</dc:creator>
		<pubDate>Sat, 14 May 2011 08:03:37 +0000</pubDate>
		<guid isPermaLink="false">http://weston.ruter.net/?page_id=236#comment-28847</guid>
		<description>Hi, Weston,
You&#039;ve mentioned that you manually traversed the audio and set all the timings. How exactly did you do that? What software did you use?

So, we have both a sound file read by a human (not TTS) and the transcript. We want to do the slicing (the process of recovering word timings) automatically. Do you think that is possible?
I was trying to do something similar and tried to employ Free-TTS... But I didn&#039;t manage to do anything useful.
I&#039;m building some English-study software, and that exciting read-along feature is what I&#039;m really looking for. Can you advise?</description>
		<content:encoded><![CDATA[<p>Hi, Weston,<br />
You&#8217;ve mentioned that you manually traversed the audio and set all the timings. How exactly did you do that? What software did you use?</p>
<p>So, we have both a sound file read by a human (not TTS) and the transcript. We want to do the slicing (the process of recovering word timings) automatically. Do you think that is possible?<br />
I was trying to do something similar and tried to employ Free-TTS&#8230; But I didn&#8217;t manage to do anything useful.<br />
I&#8217;m building some English-study software, and that exciting read-along feature is what I&#8217;m really looking for. Can you advise?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gerardo Capiel</title>
		<link>http://weston.ruter.net/projects/html5-audio-read-along/comment-page-1/#comment-17535</link>
		<dc:creator>Gerardo Capiel</dc:creator>
		<pubDate>Fri, 28 May 2010 22:32:07 +0000</pubDate>
		<guid isPermaLink="false">http://weston.ruter.net/?page_id=236#comment-17535</guid>
		<description>You should check out a JavaScript-based TTS implementation at http://scotland.proximity.on.ca/dxr/tmp/audio/tts/ . It would be interesting to link that work with yours. Keep me posted if you do anything with this.</description>
		<content:encoded><![CDATA[<p>You should check out a JavaScript-based TTS implementation at <a href="http://scotland.proximity.on.ca/dxr/tmp/audio/tts/" rel="nofollow">http://scotland.proximity.on.ca/dxr/tmp/audio/tts/</a>. It would be interesting to link that work with yours. Keep me posted if you do anything with this.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

