Pronunciation Correction Competition Results

August 30, 2008 at 4:20 pm | Posted in text to speech, Text2Go | 2 Comments

During the recent beta for Text2Go 3.0 we ran a competition to see who could submit the most corrections. The winner was Brad Isaac, author of the popular goal-setting blog Persistence Unlimited. Brad has been sending in a steady stream of corrections since the beta went live. Thanks very much for your contributions, Brad! He received a set of Sennheiser headphones for his efforts.

Today I thought I’d take a look at the pronunciation correction dictionaries to see how many new entries have been added since the official launch of Text2Go 3.0.

Here are the statistics.

108 new corrections

4 new text cleanup rules

1180 new words identified as correctly pronounced

Not bad, considering Text2Go 3.0 was released only three weeks ago. Thanks once again to all those who have contributed corrections, especially Brad!

Text2Go 3.0 Released – Pronunciation correction done right!

August 7, 2008 at 9:19 pm | Posted in text to speech, Text2Go | Leave a comment

Finally! It’s taken a lot longer than I expected; software estimation proves once again to be an elusive art. The major new feature can be summed up as ‘Pronunciation correction done right’. Ever since I discovered text to speech technology I’ve been bugged by mispronunciations. Although quite rare, they tend to stand out in a document that’s being narrated, and they’re especially grating if they occur multiple times in the same document. For this reason, most text to speech applications provide a way to enter corrections. The previous release of Text2Go provided this ability, but it required the user to edit XML files and restart Text2Go each time. Not very user-friendly! It was a stop-gap until I could find the time to implement a proper solution.

That time has come. When I first designed Text2Go I had a lot of ideas on how to efficiently identify and correct mispronunciations. With this release I’ve been able to put these ideas into practice. This has been very satisfying.

One of the first challenges is finding a way of efficiently identifying mispronunciations. Pronunciation errors are actually quite rare. The naive approach is to listen to a document from start to finish, noting down any mispronunciations as you go. You can then come back and enter corrections for the next time the offending words are encountered. There are a couple of major problems with this approach.

The first is that you end up listening to the entire document, complete with mispronunciations. You’ll only get the benefit of the corrections you’ve entered the next time these words occur.

The second problem is that the approach is incredibly inefficient. All documents are filled with high-frequency words such as ‘a’, ‘is’, ‘the’, ‘and’ and ‘in’. These are never mispronounced, but you have to listen to them over and over.

I wanted a quick and efficient approach that could identify and correct mispronunciations before I listened to a document. So I came up with the following.

First, extract a list of words from the document and remove all duplicates. This single step means you only have to listen to a word at most once, no matter how many times it appears in the document.
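The de-duplication step can be sketched in a few lines of Python. This is a minimal illustration, not Text2Go's actual code; in particular, the rule for what counts as a "word" (letters with optional internal apostrophes or hyphens, compared case-insensitively) is an assumption.

```python
import re

# Assumption: a "word" is a run of letters, optionally joined by internal
# apostrophes or hyphens (e.g. "don't", "re-release"); digits are ignored.
WORD = re.compile(r"[A-Za-z]+(?:['’-][A-Za-z]+)*")

def unique_words(document: str) -> list[str]:
    """Return each distinct word once, in order of first appearance.

    Comparison is case-insensitive, so "The" and "the" count as one word,
    which means each word only ever needs to be listened to once."""
    seen: set[str] = set()
    result: list[str] = []
    for match in WORD.finditer(document):
        key = match.group(0).lower()
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result
```

For example, `unique_words("The cat and the hat. The end.")` yields five words to check instead of seven.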

Taking this one step further, once you’ve listened to a word and verified it to be correctly pronounced, it would be nice to be able to remember this so that you never have to check it again. This is particularly useful for eliminating the high frequency words mentioned above. Therefore Text2Go maintains a ‘white-list’ of correctly pronounced words. These are filtered from the document being checked, again significantly reducing the number of words requiring checking.
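A white-list filter of this kind might look like the sketch below. The function names and the use of a plain in-memory set are assumptions for illustration; how Text2Go actually stores its white-list isn't described here.

```python
def words_to_check(words: list[str], white_list: set[str]) -> list[str]:
    """Drop any word already verified as correctly pronounced."""
    return [w for w in words if w.lower() not in white_list]

def mark_verified(white_list: set[str], verified: list[str]) -> None:
    """Remember words the user has listened to and accepted,
    so they never have to be checked again."""
    white_list.update(w.lower() for w in verified)
```

High-frequency words such as "the" and "is" end up on the list after the first check, so later documents skip them automatically.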

Of the remaining words, it would be nice to identify those most likely to be mispronounced. The approach I’ve chosen is to spell-check the remaining words and place misspelt (or unrecognized) words at the top of the list. The reasoning is that brand names, jargon and slang that haven’t made it into the dictionary are more likely to be mispronounced. Of course, correctly spelt words can also be mispronounced, and unrecognized words can be correctly pronounced. It’s just a way of increasing the likelihood of identifying mispronunciations.
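Here is a sketch of the ordering step. A plain set of known words stands in for a real spell checker, which is an assumption made to keep the example self-contained.

```python
def prioritise(words: list[str], dictionary: set[str]) -> list[str]:
    """Put unrecognised (possibly misspelt) words at the top of the list.

    The sort key is False for unknown words and True for known ones, and
    Python's sort is stable, so each group keeps its original order."""
    return sorted(words, key=lambda w: w.lower() in dictionary)
```

Because the sort is stable, words within each group still appear in the order they were found in the document.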

Another strategy is to identify compound words (i.e. two words run together), as I’ve discovered these are more likely to be mispronounced. I identify compound words by finding all words that are made up of exactly two correctly spelt words. Unfortunately this generates a number of false positives (e.g. ration = rat + ion). It’s still a useful strategy, but it would be more effective if I could find a better way of identifying compound words.
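The "exactly two correctly spelt words" test can be sketched as a brute-force search over every split point. The `min_part` parameter (ignoring one-letter pieces such as "a") and the set-based dictionary are assumptions, not details from the post. Note that the sketch reproduces the false positive described above.

```python
def compound_splits(word: str, dictionary: set[str], min_part: int = 2) -> list[tuple[str, str]]:
    """Return every way `word` splits into exactly two dictionary words.

    min_part is a hypothetical minimum length for each half, used here
    to avoid trivial splits on one-letter words."""
    w = word.lower()
    return [
        (w[:i], w[i:])
        for i in range(min_part, len(w) - min_part + 1)
        if w[:i] in dictionary and w[i:] in dictionary
    ]

def is_compound(word: str, dictionary: set[str]) -> bool:
    return bool(compound_splits(word, dictionary))
```

With a dictionary containing "screen", "shot", "rat" and "ion", both "screenshot" and "ration" are flagged. The split alone can't tell a genuine compound from a coincidence.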

Once you have a list of words you wish to check, Text2Go will speak each word in turn. If you do nothing, the word will be marked as correct. These words can then be added to the ‘white-list’ so they need never be checked again.

If you hear a word that is mispronounced, you can mark it as such with a click of the mouse. Once all words have been spoken, each will be either marked as correct or incorrect. Now all you need to do is enter corrections for each of the mispronounced words. These will then be added to the pronunciation dictionary.
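The check-and-correct workflow can be modelled roughly as follows. This is a hypothetical sketch of the bookkeeping only (speech playback and the user interface are left out), and the respelling "bio-pic" is an illustrative correction, not an entry from Text2Go's dictionary.

```python
from dataclasses import dataclass, field

@dataclass
class CheckSession:
    """One pass over a list of words. Every word starts out as 'correct';
    the user flags any word that sounds wrong while it is being spoken."""
    words: list[str]
    flagged: set[str] = field(default_factory=set)

    def flag(self, word: str) -> None:
        """Called when the user clicks while `word` is being spoken."""
        self.flagged.add(word)

    def results(self) -> tuple[list[str], list[str]]:
        correct = [w for w in self.words if w not in self.flagged]
        incorrect = [w for w in self.words if w in self.flagged]
        return correct, incorrect

def finish(session: CheckSession, white_list: set[str],
           pronunciations: dict[str, str], corrections: dict[str, str]) -> list[str]:
    """White-list the correct words, store the corrections the user typed for
    the flagged ones, and return any flagged words still awaiting a correction."""
    correct, incorrect = session.results()
    white_list.update(correct)
    missing = []
    for word in incorrect:
        if word in corrections:
            pronunciations[word] = corrections[word]
        else:
            missing.append(word)
    return missing
```

Doing nothing is the cheap path: only the rare mispronounced word costs the user a click and a typed correction.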

This approach makes it very easy to check just a few words or a large list. You can watch a video of this in action here.

Watch Text2Go pronunciation correction in action

 

Once you’ve gone to the effort of identifying and correcting the pronunciation of a set of words, or even just verifying a list of words, it would be nice to share this information with other Text2Go users. Others will gain the benefit of your corrections and you will gain the benefit of theirs. A win-win situation. This will result in a much larger pronunciation dictionary and in turn lead to more accurate text to speech.

To achieve this, I wanted sharing to require no extra effort on the part of the user, so I’ve created an automatic-update-like service that runs every couple of days. It runs completely in the background, requiring no interaction from the user; in fact, you can continue to use Text2Go while it runs. First it downloads new pronunciation entries and white-listed words from the Text2Go web server. Then it uploads any corrections and white-listed words you entered locally. These are then merged and made ready for distribution in the next update.
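The client side of one update cycle could be modelled like this. It's a sketch under stated assumptions: the network transfer is omitted, `Store` and `sync` are hypothetical names, and the conflict policy (a correction the user made locally wins over a downloaded one for the same word) is a guess rather than documented Text2Go behaviour.

```python
from dataclasses import dataclass, field

@dataclass
class Store:
    corrections: dict[str, str] = field(default_factory=dict)  # word -> respelling
    white_list: set[str] = field(default_factory=set)

def sync(local: Store, pending: Store, downloaded: Store) -> Store:
    """Merge a downloaded update into the local store and return what to upload.

    `pending` holds entries the user created since the last sync.
    Assumption: when both sides correct the same word, the local correction wins."""
    for word, respelling in downloaded.corrections.items():
        local.corrections.setdefault(word, respelling)
    local.white_list |= downloaded.white_list
    upload = Store(dict(pending.corrections), set(pending.white_list))
    pending.corrections.clear()
    pending.white_list.clear()
    return upload
```

Merging the user's uploads into the shared dictionary happens on the server, which this sketch does not attempt to model.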

The other major area of functionality I’ve enhanced for this release is Text Cleanup Rules. A Text Cleanup Rule is a powerful search-and-replace operation (using regular expressions) that gets applied to a document before it’s converted to speech.

Text Cleanup Rules are useful, for instance, for identifying breaks in a document and inserting a pause. A row of ******** or ------------- is often used to denote a break in a document. By default these breaks would be pronounced as ‘asterisk, asterisk, asterisk…’ and ‘minus, minus, minus…’, which very quickly becomes tiresome.

Text2Go includes a rule to identify these breaks and replace them with a pause. A single rule can handle both forms of break and will match two or more *’s or -’s, with or without spaces in between.
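A regular expression with that behaviour might look like the sketch below. It is not the rule Text2Go ships with, and the `[PAUSE]` replacement is a hypothetical placeholder, since the token Text2Go inserts for a pause isn't given here.

```python
import re

# Hypothetical pause marker; the real replacement Text2Go uses is not shown here.
PAUSE = "[PAUSE]"

# One rule for both break styles: a '*' or '-' followed by one or more
# repeats of the SAME character (\1), optionally separated by spaces or tabs.
BREAK_RULE = re.compile(r"([*-])(?:[ \t]*\1)+")

def apply_break_rule(text: str) -> str:
    """Replace runs of two or more '*' or '-' characters with a pause marker."""
    return BREAK_RULE.sub(PAUSE, text)
```

Because at least two characters are required, single hyphens in words like "well-known" are left alone, while "* * * *" and "----------" both become a pause.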

In the previous version of Text2Go you could only create these by editing XML files. For this release I’ve added a built-in editor. The editor allows you to test your rule on a sample block of text as you edit it. Text Cleanup Rules are also shared in the same way as pronunciation corrections. You can watch a video of the new editor in action here.

Watch the Text Cleanup Rule editor in action

 

Finally I’ve added a few minor enhancements.

Clipboard Monitor. When you turn on the Clipboard Monitor, Text2Go will automatically add any text copied to the clipboard to the current document. Very convenient when converting text from PDFs, Word documents, email, etc.

Motor-Mouth. Works the same way as the Clipboard Monitor, except that instead of adding text to the current document, it speaks it aloud.

Status Display in the System Tray. In addition to displaying the current Text2Go status on the toolbar in Internet Explorer, Text2Go now also displays it on its icon in the system tray (the area at the bottom right of the screen, near the clock).

Option to Control Whether Text2Go Starts at PC Startup. By default Text2Go starts when you boot your PC, but if you only use Text2Go occasionally, you may prefer not to have it start every time.

This release has been very satisfying to me personally. However I’m afraid that it may have been a little self-indulgent. To ensure this is not the case for the next release, I’m running a 10 Second Poll so you can vote on the next major feature you’d like to see added to Text2Go. Please take the time to vote.

You can download Text2Go 3.0 here.

Important Tip for RealSpeak Samantha, Serena and Tom Voice Users

July 30, 2008 at 9:48 pm | Posted in RealSpeak, text to speech, Text2Go, Uncategorized | Leave a comment

The other day I needed to splice some voice samples together for my post on RealSpeak Voice Pronunciation. I was using the free audio editing tool Audacity and happened to notice something disturbing about the waveform that had been generated. I was using the RealSpeak Samantha voice and it was quite clear that a certain amount of audio clipping had occurred.

You can see this in the regions I’ve highlighted in red, where the natural shape of the waveform appears to be cut off, or clipped.

If we zoom right in so the individual waves are visible, you can clearly see that each peak has been chopped off.

Does this matter?

Yes. I’m no audio expert but we’re actually throwing away part of the signal and this will produce some audio distortion.

Can it be fixed?

Yes. The fix is as simple as adjusting the volume of the voice (don’t confuse this with the volume on your PC). You can adjust the volume of an individual voice using the Text2Go Options page. By default, the volume of all voices is set to Normal. By lowering this a couple of notches, the output for Samantha will no longer be clipped.

Ideal Volume Setting for RealSpeak Samantha

Converting the same text to speech produced the following waveform.

Samantha Waveform with Correct Volume Setting

You can see that the waveform is no longer clipped at the top or the bottom.

Samantha Waveform Closeup with Correct Volume Setting

Similarly, when we zoom in, each peak is nicely rounded and no longer chopped off.
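To see why lowering the voice volume fixes the problem, here is a small simulation. It synthesises a sine wave, stores it as 16-bit samples (clamping anything out of range, which is exactly what clipping is), and counts how many samples end up stuck at the limits. The 1.2× overshoot and the 80% volume are illustrative numbers only; they are not measurements of Samantha, and how much one Text2Go volume "notch" changes the level isn't stated in the post.

```python
import math

FULL_SCALE = 32767  # largest positive value a 16-bit PCM sample can hold

def render_sine(peak: float, n: int = 1000, cycles: int = 5) -> list[int]:
    """Synthesise a sine wave whose peak is `peak` times full scale and store
    it as 16-bit samples. Values outside the 16-bit range are clamped, so the
    top and bottom of each wave are flattened."""
    samples = []
    for i in range(n):
        value = peak * FULL_SCALE * math.sin(2 * math.pi * cycles * i / n)
        samples.append(max(-FULL_SCALE - 1, min(FULL_SCALE, round(value))))
    return samples

def clipped_fraction(samples: list[int]) -> float:
    """Fraction of samples stuck at the 16-bit limits (flattened peaks)."""
    limits = (FULL_SCALE, -FULL_SCALE - 1)
    return sum(1 for s in samples if s in limits) / len(samples)

# Illustrative only: a voice that overshoots full scale by 20% at "Normal",
# then the same voice with its volume reduced to 80%.
too_loud = clipped_fraction(render_sine(1.2))
turned_down = clipped_fraction(render_sine(1.2 * 0.8))
print(f"clipped at Normal: {too_loud:.0%}, after lowering volume: {turned_down:.0%}")
```

In this simulation roughly a third of the loud signal's samples sit flat against the limits, while the turned-down version has none: the same effect as the rounded peaks in the corrected waveform.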

Do other RealSpeak Voices suffer the same problem?

Serena and Tom also suffer some clipping, so if you use these voices make sure you adjust the volume setting down one or two notches. The other RealSpeak voices are not clipped at the Normal volume setting and don’t need to be adjusted.

Evaluating RealSpeak Voice Pronunciation

July 29, 2008 at 10:30 pm | Posted in RealSpeak, text to speech | 2 Comments

During the course of adding a pronunciation editor to Text2Go, I’ve discovered some of the strengths and weaknesses of the RealSpeak voices when it comes to pronunciation. Pronunciation errors are quite rare, making it hard to build up a large collection of mispronounced words. Text2Go’s new pronunciation editor makes this very easy.

Now that I’ve identified an extensive list of mispronounced words, it’s possible to spot some trends and discover which voice is the most accurate.

Firstly, I’ve found that compound words can cause problems (e.g. afterword, longterm, screenshot). Most common compound words are fine, but brand names made up of two words run together are often mispronounced. It’s very easy to correct these mispronunciations: you just separate the two words with a hyphen or space (e.g. after-word, long term, screen-shot). This occurs often enough that I’ve added a way to identify compound words in the pronunciation editor. I’ve found that Samantha is significantly better at pronouncing compound words than all the other RealSpeak voices.

A similar problem occurs with words having the prefix re-, for example reprogram, repurposed and rereleased. In these cases the re- is not identified as a prefix. Again the solution is simple: just add a hyphen after the re (e.g. re-program, re-purposed, re-released). Once again, Samantha does a better job of pronouncing re- prefixed words.
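Applying corrections like these before the text reaches the voice amounts to a whole-word, case-insensitive search and replace. The sketch below uses a hypothetical in-memory dictionary built from the examples above; Text2Go's real dictionary format and matching rules may differ.

```python
import re

# Hypothetical correction entries in the style described above (keys in
# lower case): split compound words and hyphenate the re- prefix.
CORRECTIONS = {
    "longterm": "long term",
    "screenshot": "screen-shot",
    "reprogram": "re-program",
    "rereleased": "re-released",
}

def apply_corrections(text: str, corrections: dict[str, str]) -> str:
    """Replace whole-word, case-insensitive matches with their respellings."""
    if not corrections:
        return text
    # Longest entries first so a shorter entry never pre-empts a longer one.
    alternatives = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b",
                         re.IGNORECASE)
    return pattern.sub(lambda m: corrections[m.group(1).lower()], text)
```

The `\b` word boundaries mean "screenshots" is not touched by the "screenshot" entry, and since the replacement comes from the dictionary, capitalisation is not preserved (harmless for speech).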

In order to hear the differences for yourself, I’ve chosen 10 mispronounced words and 4 voices. The table below contains each voice’s attempt to speak the word without any correction applied. Note – I’ve used a dash to indicate a passable but not perfect pronunciation.

[Table: Samantha, Karen, Daniel and Tom speaking adobe, anticlimactic, biopic, signup, martian, packrat, resell, salesforce, spokesperson and wildlife, with an Uncorrected row and a Corrected row]

The Uncorrected row contains each voice’s uncorrected pronunciation attempt and the Corrected row contains the pronunciation after corrections have been applied. Notice that once corrections have been applied, all voices pronounce all words correctly.

The results for Tom surprised me. When I started writing up this post I was sure that Samantha was way ahead of the other voices. However, these results show that Tom is also a worthy contender. I’m still sure that Samantha has the most accurate pronunciation, but the margin is not as great as I imagined.

My hunch is that Samantha is based on slightly newer technology, and that this is why the Samantha voice file is around 110MB in size whereas the others are around 70-90MB.

So does this mean that Samantha is the best voice and the one I should always use? What about regional differences?

Do voices from different regions pronounce words differently? Most definitely! Take the Australian voices Karen and Lee as examples. Not only do they have Australian accents, they correctly pronounce local Australian place names, whereas the other English voices can be way off. Listen to the following Australian place names (of Aboriginal origin) spoken first by Samantha (US English) and then by Karen (Australian English).

Pronunciation is only one criterion on which to choose a voice. I believe it’s more important to choose a voice you enjoy listening to. If you like the sound of Samantha then definitely choose her, but if you prefer the sound of one of the other voices, go with them. Remember that pronunciation errors in normal day-to-day text are quite rare for any of the RealSpeak voices.

Graphic designer listens to scripture readings and Minnesota Twins commentary during daily commute.

April 2, 2008 at 9:01 pm | Posted in text to speech, Text2Go, Text2Go User Stories | Leave a comment

I have recently started a campaign to find out a bit about my Text2Go users and how they are making use of Text2Go. This third story is from Matthew, who was brave enough to be one of my very first beta testers. Matthew has provided a wealth of excellent feedback and support since then, so it’s with great pleasure that I publish his story here.

Text2Go has become part of my daily life. I use it on my morning commute to listen to blogs and articles that I otherwise would have spent time reading later in the day. I found Text2Go while searching the web for just such a program, though at the time I didn’t even know if one existed. My wife and I work in cities about 75 miles apart and had decided to move to her town, since she typically puts in more hours than me. I’m a graphic designer for a large international technology company which allows me to work from home a couple of days a week, so the move was kind of a no-brainer. However, this meant that I would now be spending just under two and a half hours in my car on the days I do have to commute. I knew there must be a way to make better use of my time and started to brainstorm ideas for things I could accomplish on my drive. One thought I had was that I could convert some of the reading I do every day to spoken audio files on my iPod, and that’s when I found Text2Go. It was still in its beta stage, so I tried it out and found that the program was easy to use and the voices were realistic and easy to listen to.

Now my routine involves converting daily scripture readings and reflections to my iPod in the morning while I get ready for work and then listening to them on my drive in. For my drive home I’ll often convert blog articles and comments on my favorite sports team, the Minnesota Twins, or other articles I find that look interesting but are too long for a quick read. The end result is that I actually look forward to my drive and am able to spend more time away from my computer when at home. My commute doesn’t seem that long anymore and quite frankly I think I’d miss it if I changed jobs to one closer to home.

Even on the days I’m at home I’ll use Text2Go to have the computer read web pages to me while I’m busy doing other things. I also use it to proofread important emails, which really helps me avoid leaving words out of sentences or typing a word twice. All in all, I find that Text2Go is one of the most used and useful programs on my computer and I’m very glad that I found it. Give it a try and you won’t be disappointed.

Matthew
http://armadillo44.blogspot.com/
http://armadillo44photos.blogspot.com/
http://emotiondesigntips.blogspot.com/

Thank you very much Matthew for taking the time to share your story. If you would like to share your experience, don’t hesitate to drop me a line.

Corporate Trainer makes productive use of travel time with Text2Go.

March 29, 2008 at 7:58 pm | Posted in text to speech, Text2Go, Text2Go User Stories | Leave a comment

I have recently started a campaign to find out a bit about my Text2Go users and how they are making use of Text2Go. This second story is from Rob Graham.

I’ll be happy to share a little info about myself and how I came to buy Text2Go.

I’m a corporate trainer in the areas of online marketing and advertising. As such, I’m often travelling or presenting. My field is constantly in flux, so making sure that my training offers the most relevant and up-to-date material means keeping up with all the news and changes. The reality, however, is that I just don’t have the time to comb through the dozens of online newsletters I get daily in search of the useful tidbits.

Last week I was preparing for a 5-hour drive between cities. I had with me a folder of articles that I had printed and taken along in the hope that I would find time to read through them. This hadn’t happened.

I said to myself, “I wish I could just have somebody record these onto tape or CD so I could listen while I was traveling.” I then started looking around the web and found Text2Go, and it seemed like the perfect solution. I downloaded the trial version, converted a couple of dozen articles straight from the web and within a few minutes I had over 5 hours of MP3s sitting in iTunes.

During my long drive I listened to the articles, and was able to get this useful information from the page and into my head using travel time that often is non-productive. I found the pronunciation of the avatar to be remarkably accurate and easy to listen to. The technology has come a long way in the past few years.

Now I have the ability to load my laptop or MP3 player up with the information I need to review and can do it at my leisure using time that would normally be less productive (wandering through airports, in the car, subways, etc). As far as an investment goes, Text2Go has already paid for itself.

Rob Graham
VP of Creative & Technical Training
The Laredo Group

Thank you very much Rob for taking the time to share your story. If you would like to share your experience, don’t hesitate to drop me a line.

 

Husband uses Text2Go to convert eBooks for his blind wife

March 28, 2008 at 8:55 pm | Posted in text to speech, Text2Go, Text2Go User Stories | Leave a comment

I have recently started a campaign to find out a bit about my Text2Go users and how they are making use of Text2Go. One of the aims is to allow me to better shape future versions of Text2Go. However, as this first story proves, it’s a very refreshing change to hear about someone else’s experiences. 

I put the following questions to Simon a few days after he purchased Text2Go. He immediately responded with the following story.

Mark: Could you provide some background information on who you are and what you do?

Simon: My name is Simon and I am an analyst programmer currently working for a tool hire company in the UK.

Mark: What sort of uses do you put Text2Go to?

Simon:

1. For my own use, I have ebooks converted for my iPod, and client specifications, etc. converted to listen to en route to different jobs. And as my family send me emails in the form of essays of no less than 4 pages, I listen to them while doing other things.

2. My wife is partially blind, so I convert ebooks for her to listen to while I am working. I have also wired the house up to my home network so that when certain sensors are triggered, vocal announcements are made to assist her. Once a week I go online, download the local paper and make the stories into audio files for her to listen to. We have pets who come and go, and they took her some time to get used to, so now as they near her, the pet sensor within her radius picks them up and they are announced.

Mark: What sort of information do you typically convert to speech?

Simon:

1. Ebooks
2. Email
3. Client Specifications
4. Short audio soundbites
5. News stories
6. Audio Calendar events for wife

Mark: When do you listen to the information you’ve converted?

Simon:

1. My wife listens all day long…

2. Me, while driving, commuting, cooking and general housework.

Mark: Are there any other comments you may have?

Simon:

1. Would love similar controls for Outlook like the ones you built into Internet Explorer.
2. The software has made my wife’s life a lot better… she can now keep up with current affairs and enjoy books again, and she wants me to convert more books that she and her friends, who are also blind, can listen to when they meet up every week at our house.

Mark:

Thank you so much for sharing your story Simon. I had never thought that Text2Go would be useful for the visually impaired as I believed you would need to be able to see the screen in order to perform the text to speech conversion in the first place. Doing the conversion for your wife (and her friends) is a great solution. It’s really nice to hear how much effort you’ve put in to make your home a safe and comfortable environment for your wife. I bet she really appreciates it.

Finally, adding a toolbar to Outlook is a great idea. I’m nearing completion of the next version of Text2Go and am starting to gather ideas for the following release, so I’ll certainly add your vote for an Outlook toolbar.

For more stories from Text2Go users, check back in the next couple of days. If you would like to share your experience, don’t hesitate to drop me a line.

I hate to admit it, but watching my child’s swimming lesson is becoming tedious.

March 20, 2008 at 6:55 pm | Posted in iPod, MP3 Player, text to speech, Text2Go | Leave a comment

My daughter currently has a swimming lesson once a week, on Tuesday evenings. At first I was very excited, as I was able to knock off work a little early and be home in time to take her to her lesson. A few weeks in, I’m ashamed to admit that it’s becoming a little tedious.

The problem is that it’s a group lesson with 3 other kids, so my daughter is only in action 25% of the time. It’s only a ½ hour lesson but then I let her have another ½ hour after the lesson to muck around in the pool.

Some of the other parents must also be finding it tough going. They come armed with an array of reading material. Books, magazines and newspapers are commonplace.

The thing is I really like to show my daughter support. If my head is buried in a newspaper when she looks up to see if I’m watching (which she does regularly – perhaps too often if truth be told), she’s going to be very disappointed. I’m also genuinely interested in watching her progress. I find it one of the most satisfying things as a parent, watching my kids develop over time.

Then it hit me. Text2Go is just perfect for this situation. I can listen to an eBook, article or collection of blog posts on my iPod while I keep my eyes on my daughter at all times. During the lesson I can offer encouragement and after the lesson I can just keep an eye on her so that she’s safe while playing in the pool.

This is definitely going to be the plan for next week’s lesson. The biggest problem I foresee is the background noise level. The noise generated by 100 kids in an enclosed pool with concrete walls that bounce the sound back and forth is extreme, and the standard iPod earbuds don’t block out much outside noise. I know that Sennheiser make some iPod earbuds that include a set of earfit rings of varying sizes. These allow you to find the size that best fits your ear, creating a tighter seal between ear and earbud and hence blocking out more background noise.

The next level up is to buy some earbuds or headphones that have active noise cancellation. These may be necessary in this situation. I’ll find out next week.

85% of Men Prefer Listening to a Female Voice

January 23, 2008 at 6:20 pm | Posted in RealSpeak, text to speech, Uncategorized | 5 Comments

I had a gut feeling, based on my own preferences, that men prefer listening to female voices. I also assumed the opposite was true – women would prefer listening to male voices. Wrong!

To test these assumptions, I dug up some of my recent sales data for computerized voices. When you purchase Text2Go you can also choose to purchase one or more high quality RealSpeak voices. I looked at all purchases that included a single voice. I disregarded any purchase that included both male and female voices and any subsequent voice purchases. I also disregarded all sales of the Indian English female voice Sangeeta as there is no corresponding male Indian English voice.

It’s clear that men have a strong preference for the female voice, which is what I expected. What is surprising is that women also prefer the female voice, albeit to a much lesser extent. In fact, the statistics show women don’t have a strong preference one way or the other. My wife certainly prefers the male computerized voices, so I’d expected this to be the case for the general population.

Not wanting to let hard data get in the way of assumptions and gut feelings, here are a couple of reasons that may explain the unexpected results for the women.

1. There are characteristics of the female voice that make it easier to produce a more natural sounding computerized voice.

2. We currently sell 7 female voices but only 3 male voices. This extra choice may give some bias to female voices. It may also be an indication that the developers of computerized voices have recognised the popularity of the female voice. Note that the US, UK and Australian accents have both male and female voices.

What’s your preference? Have a listen to the computerized voices on the Text2Go website and let me know.

4 Quick Tips When Converting eBooks from Text to Speech

December 18, 2007 at 9:47 pm | Posted in eBook, text to speech, Text2Go, Uncategorized | 2 Comments

Today I purchased a new eBook, ‘As the Mirror Cracks’ by Steve Jordan, and I thought I’d share a few tips on converting eBooks from text to speech.

1. Check the DRM permissions. In a perfect world people would trust each other and all eBooks would be DRM-free. Thankfully Steve Jordan publishes all his books in multiple formats, none of which have any DRM protection. However, the majority of eBooks available for sale are DRM-protected, and they will cause you a world of pain. DRM-protected works place all sorts of restrictions on how and where you can view your eBook. When converting an eBook to speech, the DRM protection must allow the text to speech operation. Before purchasing the eBook, check very carefully that you are granted this right. If it’s not explicitly stated, assume text to speech has been disabled. Even if the eBook allows text to speech, it will only allow it to be performed from within the authorized eBook reader. If this runs on your PC, then you will only be able to listen to the eBook while sitting at your computer. To use a product such as Text2Go to convert an eBook to an MP3 file that you can listen to on the go, the eBook will need to grant you ‘Copy and Paste’ rights. Most don’t, so it’s best just to say no to DRM-protected works.

2. Don’t convert an eBook in one single chunk or you’ll end up with one enormous track. If you lose your place during playback, it will be very hard to find it again as you will need to seek through an enormous file. Instead I create a playlist for the eBook and then split it up chapter by chapter and store each chapter as a track within the playlist. If I lose my place during playback, it’s easy to find the chapter I was up to and then do a quick seek within the corresponding track.
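If the eBook is available as plain text, the chapter-by-chapter split can be automated. The sketch below assumes every chapter starts on its own line beginning with the word "Chapter"; real eBooks vary, so the heading pattern is the part most likely to need adjusting. The track-naming scheme is also just one possible convention.

```python
import re

# Assumption: every chapter begins on its own line starting with the word
# "Chapter" (any case), e.g. "Chapter 1" or "CHAPTER TWO: The Mirror".
CHAPTER_HEADING = re.compile(r"^chapter\b.*$", re.IGNORECASE | re.MULTILINE)

def split_into_chapters(text: str) -> list[tuple[str, str]]:
    """Split an eBook's plain text into (heading, body) pairs, one per chapter.

    Text before the first heading (title page, dedication) becomes a
    "Front matter" entry; a book with no headings comes back as one entry."""
    headings = list(CHAPTER_HEADING.finditer(text))
    if not headings:
        return [("Full text", text.strip())]
    chapters = []
    front = text[: headings[0].start()].strip()
    if front:
        chapters.append(("Front matter", front))
    for i, heading in enumerate(headings):
        end = headings[i + 1].start() if i + 1 < len(headings) else len(text)
        chapters.append((heading.group(0).strip(), text[heading.end():end].strip()))
    return chapters

def track_names(book_title: str, chapters: list[tuple[str, str]]) -> list[str]:
    """Zero-padded track numbers keep the playlist in reading order."""
    return [f"{book_title} - {n:02d} - {heading}"
            for n, (heading, _) in enumerate(chapters, start=1)]
```

Each (heading, body) pair can then be converted as its own track, so losing your place means finding a chapter rather than seeking through one enormous file.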

3. Don’t convert an entire eBook upfront. Instead I convert and listen to the first couple of chapters. This allows me to quickly identify any problem areas during the text to speech process. These may be mispronounced words (most common when the eBook contains a lot of jargon, slang or terminology specific to a particular field), or formatting specific to the eBook (e.g. special characters used to denote pauses, or dividers between sections, chapters, etc.). I can then add corrections for the mispronounced words to the pronunciation dictionaries and create text cleanup rules to handle the eBook’s specific formatting. With these in place I convert the remaining chapters of the eBook.

4. Don’t use the free Microsoft voices. Listening to an entire eBook with one of these voices will not be a particularly pleasant experience. Instead purchase a high quality, natural-sounding voice.

That’s it. Do you have any tips of your own? Stay tuned for a review of ‘As the Mirror Cracks’.


Blog at WordPress.com. | Theme: Pool by Borja Fernandez.