Text2Go 3.0 Released – Pronunciation correction done right!
August 7, 2008 at 9:19 pm | Posted in text to speech, Text2Go | Leave a comment
Finally! It’s taken a lot longer than I expected. Software estimation proves once again to be an elusive art. The major new feature can be summed up as ‘Pronunciation correction done right’. Ever since I discovered text to speech technology I’ve been bugged by mispronunciations. Although quite rare, they tend to stand out in a document that’s being narrated. They’re especially grating if they occur multiple times in the same document. For this reason, most text to speech applications provide a way to enter corrections. The previous release of Text2Go provided this ability but it required the user to edit XML files and restart Text2Go each time. Not very user-friendly! It was a stop-gap solution until I could find the time to implement a proper one.
That time has come. When I first designed Text2Go I had a lot of ideas on how to efficiently identify and correct mispronunciations. With this release I’ve been able to put these ideas into practice. This has been very satisfying.
One of the first challenges is finding a way of efficiently identifying mispronunciations. Pronunciation errors are actually quite rare. The naive approach is to listen to a document from start to finish, noting down any mispronunciations as you go. You can then come back and enter corrections for the next time the offending words are encountered. There are a couple of major problems with this approach.
The first is that you end up listening to the entire document, complete with mispronunciations. You’ll only get the benefit of the corrections you’ve entered the next time these words occur.
The second problem is the approach is incredibly inefficient. All documents are filled with high frequency words such as ‘a, is, the, and, in’ etc. These are never mispronounced but you have to listen to them over and over.
I wanted an approach that could identify and correct mispronunciations before listening to a document and was quick and efficient. So I came up with the following.
First, extract a list of words from the document and remove all duplicates. This single step means you only have to listen to a word at most once, no matter how many times it appears in the document.
Taking this one step further, once you’ve listened to a word and verified it to be correctly pronounced, it would be nice to be able to remember this so that you never have to check it again. This is particularly useful for eliminating the high frequency words mentioned above. Therefore Text2Go maintains a ‘white-list’ of correctly pronounced words. These are filtered from the document being checked, again significantly reducing the number of words requiring checking.
Of the remaining words, it would be nice to be able to identify the most likely to be mispronounced. The approach I’ve chosen is to spell-check the remaining words. Misspelt (or unrecognized) words are then placed on the top of the list. The reason is that brand names, jargon and slang that haven’t made it into the dictionary are more likely to be mispronounced. Of course correctly spelt words can also be mispronounced and unrecognized words correctly pronounced. It’s just a way of increasing the likelihood of identifying mispronunciations.
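The steps so far (remove duplicates, filter the white-list, put unrecognized words first) can be sketched in a few lines of Python. This is only an illustration and not the actual Text2Go code: the function name, the word-matching pattern and the use of a plain set as the spelling dictionary are all assumptions.

```python
import re

def words_to_check(text, white_list, dictionary):
    """Return the words worth listening to, most suspicious first.

    white_list and dictionary are plain sets of lower-case words
    (stand-ins for whatever the real application uses).
    """
    # Extract words and drop duplicates, keeping first-seen order.
    unique = dict.fromkeys(w.lower() for w in re.findall(r"[A-Za-z']+", text))
    # Skip words already verified as correctly pronounced.
    candidates = [w for w in unique if w not in white_list]
    # Stable sort: words the dictionary does not recognize come first.
    return sorted(candidates, key=lambda w: w in dictionary)
```

For example, with a white-list of common words and a small dictionary, a sentence mentioning "Wimax" and "xbox" would list those two ahead of the ordinary words that still need checking.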
Another strategy is to identify compound words (i.e. two words run together) as I’ve discovered these are more likely to be mispronounced. The way I identify compound words is to find all words that are made up of exactly two correctly spelt words. Unfortunately this generates a number of false positives (e.g. ration = rat + ion). It’s still a useful strategy but I could make it more effective if I could find a better way of identifying compound words.
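A minimal sketch of the compound-word test, again in Python and again with a plain set standing in for the spelling dictionary (the function name is made up for illustration). It also shows why false positives such as ‘ration’ slip through.

```python
def split_compound(word, dictionary):
    """Return every (left, right) split where both halves are dictionary words."""
    return [
        (word[:i], word[i:])
        for i in range(1, len(word))
        if word[:i] in dictionary and word[i:] in dictionary
    ]
```

A word is flagged as a possible compound whenever the returned list is non-empty, so ‘myspace’ (my + space) is caught, but so is ‘ration’ (rat + ion).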
Once you have a list of words you wish to check, Text2Go will speak each word in turn. If you do nothing, the word will be marked as correct. These words can then be added to the ‘white-list’ so they need never be checked again.
If you hear a word that is mispronounced, you can mark it as such with a click of the mouse. Once all words have been spoken, each will be either marked as correct or incorrect. Now all you need to do is enter corrections for each of the mispronounced words. These will then be added to the pronunciation dictionary.
This approach makes it very easy to check just a few words or a large list. You can watch a video of this in action here.
Once you’ve gone to the effort of identifying and correcting the pronunciation of a set of words or even if you’ve just verified a list of words, it would be nice if you could share this information with other Text2Go users. Others will gain the benefits of your corrections and you will gain the benefit of theirs. A win-win situation. This will result in a much larger pronunciation dictionary and in turn lead to more accurate text to speech.
To achieve this I wanted the sharing to require no extra effort on the part of the user. Therefore I’ve created an automatic-update-like service that runs every couple of days. It runs completely in the background, requiring no interaction from the user. In fact you can continue to use Text2Go while it runs. First it downloads new pronunciation entries and white-listed words from the Text2Go web server. Then it uploads any corrections and white-listed words you entered locally. These are then merged and made ready for distribution in the next update.
The other major area of functionality I’ve enhanced for this release is Text Cleanup Rules. A Text Cleanup Rule is a powerful search and replace operation (using regular expressions) that gets applied to a document before it’s converted to speech.
One example where Text Cleanup Rules can be useful is in identifying breaks in a document and inserting a pause. For example, a row of ******** or -------- is often used to denote a break in a document. By default these breaks would be pronounced as asterisk, asterisk, asterisk… and minus, minus, minus… This very quickly becomes tiresome.
Text2Go includes a rule to identify these breaks and replace them with a pause. A single rule can handle both forms of break and will match two or more *’s or -’s, with or without spaces in between.
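To give a feel for what such a rule looks like, here is a rough equivalent written with Python’s re module. The pattern is my reconstruction from the description above rather than the rule that ships with Text2Go, and the pause marker is a hypothetical placeholder (the real marker depends on the speech engine).

```python
import re

# Hypothetical pause marker; substitute whatever the speech engine understands.
PAUSE = "<pause/>"

# Two or more asterisks, or two or more hyphens, with optional spaces between them.
BREAK_RULE = re.compile(r"(?:\*[ \t]*){2,}|(?:-[ \t]*){2,}")

def replace_breaks(text):
    """Replace rows of * or - with a single pause marker."""
    return BREAK_RULE.sub(PAUSE, text)
```

Single hyphens and asterisks are left alone, so a word like ‘well-known’ passes through unchanged.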
In the previous version of Text2Go you could only create these by editing XML files. For this release I’ve added a built-in editor. The editor allows you to test your rule on a sample block of text as you edit it. Text Cleanup rules are also shared in the same way as pronunciation corrections. You can watch a video of the new editor in action here.
Finally I’ve added a few minor enhancements.
Clipboard Monitor. When you turn on the Clipboard Monitor, Text2Go will automatically add any text copied to the clipboard to the current document. Very convenient when converting text from PDFs, Word documents, email, etc.
Motor-Mouth. Works the same way as the Clipboard Monitor, except that instead of adding text to the current document, it speaks it aloud.
Status Display in the System Tray. In addition to displaying the current Text2Go status on the toolbar in Internet Explorer, it’s also displayed in the icon in the system tray (icon in the bottom right of the screen near the time).
Option to control Whether Text2Go is Started at PC Startup Time. By default Text2Go is started when you boot your PC, but for those who only use Text2Go occasionally, you may prefer not to have it started every time.
This release has been very satisfying to me personally. However I’m afraid that it may have been a little self-indulgent. To ensure this is not the case for the next release, I’m running a 10 Second Poll so you can vote on the next major feature you’d like to see added to Text2Go. Please take the time to vote.
You can download Text2Go 3.0 here.
Important Tip for RealSpeak Samantha, Serena and Tom Voice Users
July 30, 2008 at 9:48 pm | Posted in RealSpeak, text to speech, Text2Go, Uncategorized | Leave a comment
The other day I needed to splice some voice samples together for my post on RealSpeak Voice Pronunciation. I was using the free audio editing tool Audacity and happened to notice something disturbing about the waveform that had been generated. I was using the RealSpeak Samantha voice and it was quite clear that a certain amount of audio clipping had occurred.
You can see this in the regions I’ve highlighted in red, where the natural shape of the waveform looks to be cut off or clipped.
If we zoom right in so the individual waves are visible, you can clearly see that each peak has been chopped off.
Does this matter?
Yes. I’m no audio expert but we’re actually throwing away part of the signal and this will produce some audio distortion.
Can it be fixed?
Yes. The fix is as simple as adjusting the volume of the voice (don’t confuse this with the volume on your PC). You can adjust the volume of an individual voice using the Text2Go Options page. By default, the volume of all voices is set to Normal. By lowering this a couple of notches, the output for Samantha will no longer be clipped.
Converting the same text to speech produced the following waveform.
You can see that the waveform is no longer clipped at the top or the bottom.
Similarly, when we zoom in, each peak is nicely rounded and no longer chopped off.
Do other RealSpeak Voices suffer the same problem?
Serena and Tom also suffer some clipping, so if you use these voices make sure you adjust the volume setting down one or two notches. The other RealSpeak voices are not clipped at the Normal volume setting and don’t need to be adjusted.
Vista Woes – Again!
June 20, 2008 at 10:00 pm | Posted in Microsoft, Text2Go, Why was it so hard | 6 Comments
I’ve started into my Vista testing in preparation for the next release of Text2Go. Once again, UAC (User Account Control) issues have reared their ugly head. Although not insurmountable, they have required a certain number of contortions.
Problem 1
The first issue discovered involved a new option I’ve added to Text2Go. When you install Text2Go, it places a shortcut in the Windows Startup menu so that it’s launched every time you boot your PC. This is great for those who use Text2Go regularly. It means it’s already running when you go to use it. However for those who only use Text2Go occasionally, it’s not worth launching it every time you start your PC.
If you don’t want Text2Go to run at startup, you can easily remove the shortcut from the Startup menu. However this is not very intuitive. It assumes:
1. you know Text2Go is launched using a shortcut in the Startup menu.
2. you know how to remove shortcuts from the Windows menu.
Therefore I’ve added a simple checkbox in the Text2Go options that lets you choose whether Text2Go is launched at startup or not (thanks to Stephane Grenier founder of LandLordMax for this suggestion). Behind the scenes all it does is add or remove a shortcut from the Windows Startup menu. This is very simple, except on Vista when you’ve installed Text2Go for all users. Given that the default is to install for all users, I suspect that 99% of people leave it this way. It means Text2Go will be available for all users of the PC. It also means that the current user doesn’t have permission to add/remove this shortcut. In order to successfully perform this operation, the current user must be briefly elevated to Administrator. Fortunately Windows provides a simple way of achieving this. By using the ShellExecute command and passing ‘runas’ as the verb parameter, Vista will run the command elevated, prompting the user for permission as necessary. We use the DOS ‘del’ command to remove the shortcut and ‘copy’ to reinstall the shortcut. Problem 1 solved.
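For the curious, the elevated call looks roughly like this when made from Python through ctypes. It is a sketch only and not the code in Text2Go: the helper names are made up, and any shortcut path you pass in is up to you. The one real API used is ShellExecuteW from shell32, called with the ‘runas’ verb.

```python
import ctypes
import sys

def elevated_args(command_line):
    """Arguments for ShellExecuteW that run a cmd.exe built-in elevated."""
    # hwnd, verb, file, parameters, working directory, show command (0 = hidden)
    return (None, "runas", "cmd.exe", "/c " + command_line, None, 0)

def remove_shortcut_cmd(shortcut):
    """Command line that deletes the startup shortcut."""
    return f'del "{shortcut}"'

def restore_shortcut_cmd(backup, shortcut):
    """Command line that copies a saved shortcut back into the Startup folder."""
    return f'copy "{backup}" "{shortcut}"'

def run_elevated(command_line):
    """Run the command elevated; Vista shows the UAC prompt if needed."""
    if sys.platform != "win32":
        raise OSError("ShellExecuteW is only available on Windows")
    # ShellExecuteW returns a value greater than 32 on success.
    return ctypes.windll.shell32.ShellExecuteW(*elevated_args(command_line)) > 32
```

Because ‘del’ and ‘copy’ are built into the command interpreter rather than being programs in their own right, they have to be run through cmd.exe with the /c switch.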
Problem 2
The second problem was a lot subtler and required greater contortions. During every install, the installation process needs to run elevated. If it didn’t it couldn’t even write files to standard locations such as the Program Files folder. At the end of the Text2Go installation process, Internet Explorer is used to display some quickstart tutorials. This serves two purposes. Firstly it ensures that the Text2Go toolbar is made visible in IE. Secondly it makes it very easy for new users to complete the quickstart tutorials and gain an understanding of Text2Go.
One consequence of launching IE during installation is IE also ends up running elevated. This is not ideal from a security perspective. IE will not be running in protected mode and if the user happens to browse to a dodgy website after completing the tutorials, they could potentially be vulnerable to a security exploit. It’s important to note that this only occurs for the one instance of IE that is started during installation. In all other cases Text2Go runs with standard privileges and works quite happily with IE in protected mode.
Security issues aside, this behaviour introduces one subtle problem. As you will recall, Text2Go is also running elevated. The problem occurs when it writes out any of its settings files. Writing the settings files works fine. It’s the security permissions that the file is created with that’s the problem. The current user is given read/write access and all other users are given read only access. This is still correct, except that the current user is actually the administrator, due to the fact that Text2Go is running elevated. This means that the next time Text2Go is started as a standard user, it will not be able to write to its own settings file, effectively freezing all options to their values at install time. They cease to become options at all. Definitely not what was intended.
The solution? IE must be launched as a non-elevated process at install time.
This will ensure that Text2Go is in turn launched as non-elevated and the settings file will be created with read/write access for the correct user. The side benefit is that best practice security-wise will also be observed.
The problem is that Microsoft don’t provide an API call to launch a non-elevated process from an elevated process. Thankfully, Andrei Belogortseff founder of WinAbility Software has come up with a neat solution which he has demonstrated in his VistaElevator 2.0 application. I’ve modified this application so that it can be used to launch any application non-elevated. At the end of the Text2Go installation I use this to launch IE non-elevated.
Problem 3
The third problem is once again related to file permissions. The major new feature in the next release of Text2Go is the ability to edit pronunciation dictionaries from within Text2Go. Not only that, you can share your corrections with other Text2Go users. An automatic update like service is used to upload your contributions to the Text2Go server and download contributions from the rest of the Text2Go community. This means that your local dictionary files will be updated on a regular basis. Unfortunately a standard user doesn’t have permission to write to these files (due to the default permissions set at install time). Now it would be possible to run an elevated instance of Text2Go to perform the update. However this would require the user to confirm this for every update. Not something you want to do on a daily basis. The update service has been designed to run in the background, requiring no user interaction.
The approach I’ve taken is to modify the permissions on the dictionary files at the end of the installation. There doesn’t seem to be a way of doing this in MSI, so I run the CACLS.exe utility at the end of the installation to grant all users read/write access to these files.
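As an illustration, the CACLS call could be built like this in Python. The switches are my assumption of a reasonable invocation (/E edits the existing permissions instead of replacing them, /G grants access, and C means Change, i.e. read/write); the exact command run by the Text2Go installer isn’t shown here, and the ‘Users’ group name differs on non-English versions of Windows.

```python
import subprocess
import sys

def grant_write_command(path, group="Users"):
    """Build a cacls command granting the group Change (read/write) access."""
    # /E edits the existing ACL rather than replacing it.
    return ["cacls", path, "/E", "/G", f"{group}:C"]

def grant_write(path):
    """Run the command; must itself be run elevated, e.g. from the installer."""
    if sys.platform != "win32":
        raise OSError("cacls is a Windows utility")
    subprocess.run(grant_write_command(path), check=True)
```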
Problem 4
The final problem turned out not to be specifically Vista related but because I found it during Vista testing, I’m going to blame Vista anyway:) Normally when testing, I run Vista in a Virtual Machine and launch my setup from a shared drive on the ‘virtual’ network. This works well. However I was finding that if I first copied the setup file to Vista’s local hard drive and then launched the setup, the MSI installer would abort immediately and report an error about an invalid MSI package. Very strange. I’d made minimal changes to the installation script, none of which could cause such a catastrophic failure. When I tried a setup from a build a few weeks earlier, it installed without problem.
The only major change I could think of during this period was an upgrade from VS2005 to VS2008. This had been a very straightforward upgrade, with no problems encountered. Digging into the problem a little further, I was able to find the msiexec installation log. The last line contained an entry stating that the MSI package couldn’t be found. This would be the problem.
The Text2Go installation is packaged as a single Winzip self-extracting archive. This contains two files, a setup.exe and Text2GoSetup.msi. Setup.exe is a special bootstrap application provided by Microsoft that first checks for the necessary installation prerequisites (e.g. the appropriate MSI installer and .NET framework versions) and then launches the MSI. When I created the Winzip self-extracting archive, I told it to automatically run setup.exe. It then waits for setup.exe to complete before deleting setup.exe and Text2GoSetup.msi. This ensures no temporary files are left around after the installation and has worked nicely up until now.
It seems that the behaviour of setup.exe has changed subtly in VS2008. It now exits as soon as it’s issued the command to run the MSI. The Winzip self-extracting archive was detecting this and immediately cleaning up the archive contents, deleting the MSI before it had a chance to run.
The solution – have the Winzip self-extractor wait for the completion of Text2GoSetup.msi rather than setup.exe. You can tell wzse to watch a different application using the -wait argument. Unfortunately it only accepts a filename up to 8 characters in length. Will Windows ever throw off its DOS legacy? The good news is this change fixed the problem. Text2GoSetup.exe once again works from any location.
I suspect that it was a timing issue and just happened to work when the setup was on a shared network drive. Perhaps the network delays allow msiexec just enough time to get a lock on the file before wzse could delete it.
Update: This solution only fixed it for me. The first beta tester I gave the setup to had it fail in exactly the way I described. The real solution – revert back to the VS2005 version of setup.exe.
This is nicely described here:
http://forums.microsoft.com/msdn/ShowPost.aspx?siteid=1&postid=3528121
Finally, Text2Go is once again in harmony with Windows Vista. Whew!
I hate to admit it, but watching my child’s swimming lesson is becoming tedious.
March 20, 2008 at 6:55 pm | Posted in iPod, MP3 Player, text to speech, Text2Go | Leave a comment
My daughter currently has a swimming lesson once a week on a Tuesday evening. At first I was very excited, as I was able to knock off work a little early and be home in time to take her to her lesson. A few weeks in and I’m ashamed to admit it but it’s becoming a little tedious.
The problem is that it’s a group lesson with 3 other kids, so my daughter is only in action 25% of the time. It’s only a ½ hour lesson but then I let her have another ½ hour after the lesson to muck around in the pool.
Some of the other parents must also be finding it tough going. They come armed with an array of reading material. Books, magazines and newspapers are commonplace.
The thing is I really like to show my daughter support. If my head is buried in a newspaper when she looks up to see if I’m watching (which she does regularly – perhaps too often if truth be told), she’s going to be very disappointed. I’m also genuinely interested in watching her progress. I find it one of the most satisfying things as a parent, watching my kids develop over time.
Then it hit me. Text2Go is just perfect for this situation. I can listen to an eBook, article or collection of blog posts on my iPod while I keep my eyes on my daughter at all times. During the lesson I can offer encouragement and after the lesson I can just keep an eye on her so that she’s safe while playing in the pool.
This is definitely going to be the plan for next week’s lesson. The biggest problem I foresee is going to be the background noise level. The noise generated by 100 kids in an enclosed pool with concrete walls that bounce the sound back and forth is extreme. The standard iPod earbuds don’t actually block out a lot of outside noise. I know that Sennheiser make some iPod earbuds that include a set of earfit rings of varying sizes. These allow you to find the size that best fits your ear, creating a tighter seal between ear and earbud, and hence blocking out more background noise.
The next level up is to buy some earbuds or headphones that have active noise cancellation. These may be necessary in this situation. I’ll find out next week.
Are you Frustrated When you Don’t get a Reply from Technical Support?
February 22, 2008 at 10:27 am | Posted in Text2Go | Leave a comment
We pride ourselves on providing great support for our Text2Go users. However sometimes things go wrong. For instance:
1. A technical support request is received via email.
2. We send a timely reply.
3. We receive an automated response that states the customer’s email Inbox is full and our reply couldn’t be delivered. Horror of horrors – we can’t contact them by email.
4. We start to receive further requests from the same customer, who is becoming increasingly frustrated.
What to do?
Try to contact them using old-fashioned alternative means. Unfortunately the customer didn’t supply a phone number but they did supply a mail address.
Therefore we’ve sent them a reply via snail mail with an answer to their problem and a plea to clear out their Inbox.
Unfortunately this is going to take a few days to arrive, all the while their frustration levels are going to continue to rise. I can feel my ears burning right now.
Therefore I’m really hoping that the said customer will read this and clear out their Inbox.
Update
It seems that snail mail is still more reliable than email. The customer received the letter we sent, cleared out their inbox and we were able to resolve their support issue.
4 Quick Tips When Converting eBooks from Text to Speech
December 18, 2007 at 9:47 pm | Posted in eBook, text to speech, Text2Go, Uncategorized | 2 Comments
Today I purchased a new eBook ‘As the Mirror Cracks’ by Steve Jordan and I thought I’d share a few tips on converting eBooks from text to speech.
1. Check the DRM permissions. In a perfect world people would trust each other and all eBooks would be DRM free. Thankfully Steve Jordan publishes all his books in multiple formats, none of which have any DRM protection. However the majority of eBooks available for sale are DRM-protected and they will cause you a world of pain. DRM-protected works place all sorts of restrictions on how and where you can view your eBook. When converting an eBook to speech, the DRM protection must allow the text to speech operation. Check very carefully before purchasing the eBook that you are granted this right. If it’s not explicitly stated, assume text to speech has been disabled. Even if the eBook allows text to speech, it will only allow it to be performed from within the authorized eBook reader. If this runs on your PC, then you will only be able to listen to the eBook while sitting at your computer. To use a product such as Text2Go to convert an eBook to an MP3 file that you can listen to on the go, the eBook will need to grant you ‘Copy and Paste’ rights. Most don’t, so it’s best just to say no to DRM-protected works.
2. Don’t convert an eBook in one single chunk or you’ll end up with one enormous track. If you lose your place during playback, it will be very hard to find it again as you will need to seek through an enormous file. Instead I create a playlist for the eBook and then split it up chapter by chapter and store each chapter as a track within the playlist. If I lose my place during playback, it’s easy to find the chapter I was up to and then do a quick seek within the corresponding track.
3. Don’t convert an entire eBook upfront. Instead I convert and listen to the first couple of chapters. This allows me to quickly identify any problem areas during the text to speech process. These may be mispronounced words (most common when the eBook contains a lot of jargon, slang or terminology specific to a particular field), or formatting specific to the eBook (e.g. special characters used to denote pauses, or dividers between sections, chapters, etc). I can then add corrections for the mispronounced words to the pronunciation dictionaries and create text cleanup rules to handle the eBook’s specific formatting. With these in place I will convert the remaining chapters of the eBook.
4. Don’t use the free Microsoft voices. Listening to an entire eBook with one of these voices will not be a particularly pleasant experience. Instead purchase a high quality, natural-sounding voice.
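If your eBook is available as plain text, the chapter-by-chapter split from tip 2 is easy to automate. Here is a small Python sketch; it assumes each chapter starts with a line beginning ‘Chapter’ followed by a number, which won’t be true of every book, so adjust the pattern to suit.

```python
import re

# Assumed heading style: a line starting with "Chapter" and a number.
CHAPTER_HEADING = re.compile(r"^Chapter \d+.*$", re.MULTILINE)

def split_chapters(book_text):
    """Return one string per chapter, each starting at its heading.

    Anything before the first heading (title page, etc.) is dropped.
    """
    starts = [m.start() for m in CHAPTER_HEADING.finditer(book_text)]
    bounds = starts + [len(book_text)]
    return [book_text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
```

Each returned chunk can then be converted separately and stored as its own track in the playlist.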
That’s it. Do you have any tips of your own? Stay tuned for a review of ‘As the Mirror Cracks’.
Text2Go 1.5 Released
December 2, 2007 at 8:37 pm | Posted in iPhone, iPod, MP3 Player, text to speech, Text2Go | 8 Comments
Today Tumbywood Software is pleased to announce a significant upgrade to Text2Go, the Windows text to speech program that lets you listen to text from the web on your iPod. This is a free upgrade for all existing Text2Go owners as per our Lifetime License. The major new features are:
Support for all MP3 Players, not just the iPod. Text2Go will now create MP3 files that can be used on any MP3-capable device, such as mobile phones, PDAs and of course dedicated MP3 players, such as the Zune, iRiver, Creative Zen, SanDisk, etc. Details of the article source will be embedded in the generated MP3 file, including the URL, domain, and a screenshot for use as album art.
Dictionaries to correct mispronunciations. Commonly mispronounced words, such as brand names, acronyms and industry-specific jargon can now be corrected. Text2Go ships with a Technology dictionary containing corrections for over 200 mispronunciations. Further dictionaries will be rolled out over time and users can compile their own dictionaries. Compare the text to speech results of this (highly contrived) passage of text with and without the Technology dictionary enabled.
A recent Wikipedia entry lists a number of influential technologies including itunes, myspace, ebooks, Wimax, xbox, facebook and antispyware.
Sample without the dictionary Sample with the dictionary
Text Cleanup Rules. A text cleanup rule will automatically remove or replace specific text from a document prior to the text to speech operation. Complex pattern matching criteria are specified using regular expressions so that only the intended text is removed or replaced. Text cleanup rules are best illustrated with a couple of examples.
A recent study showed that between 60%-65% of people preferred the colour green over blue.
Without a text cleanup rule, percentage ranges will not be spoken as the author intended.
Sample without text cleanup rule Sample with text cleanup rule
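For those who like to see the pattern, a rule of this kind might look as follows in Python’s regular expression syntax. Both the pattern and the replacement wording are illustrative guesses, not the rule that ships with Text2Go.

```python
import re

# A percentage range such as "60%-65%".
PERCENT_RANGE = re.compile(r"(\d+)%\s*-\s*(\d+)%")

def speak_percent_ranges(text):
    """Rewrite percentage ranges so they are read as '60 to 65 percent'."""
    return PERCENT_RANGE.sub(r"\1 to \2 percent", text)
```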
Removing references from research papers is another common use of text cleanup rules. For example.
A recent survey of blog topics indicate that the most popular are ‘Blogging for profit’,15 28 72 followed by ‘Blogging about blogging’,20 38 ‘Blogging about other bloggers’,109 127 ‘Blogging how to’,69 ‘Full time blogging’, and ‘Blog review’.19 24 115
Sample without text cleanup rule Sample with text cleanup rule
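A reference-stripping rule for the passage above could be sketched like this in Python. It assumes the reference numbers always sit directly after a closing quote followed by a comma or full stop, as they do in this sample; other papers will need a different pattern.

```python
import re

# Reference numbers attached to a closing quote plus comma or full stop,
# e.g. "',15 28 72". The quote and punctuation are kept, the numbers dropped.
CITATIONS = re.compile(r"(['’][,.])\d+(?:\s\d+)*")

def strip_citations(text):
    """Remove runs of reference numbers that follow a quoted phrase."""
    return CITATIONS.sub(r"\1", text)
```

Numbers that are part of the running text are untouched, because they are never glued to the punctuation in this way.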
Like the pronunciation dictionaries, Text2Go ships with a number of Text Cleanup Rules and users can also create their own. As the creation of a text cleanup rule requires an understanding of regular expressions, we are providing a service to create text cleanup rules for users on request.
Finally, the following minor features have also been added
- Create playlists from within Text2Go.
- Display the current voice and playlist in the tooltip of the Text2Go command.
- ‘Speak from cursor’ in the View and Edit document window.
- Minor bug fixes.
Download a free 30-day trial of Text2Go 1.5 today