Supplement to digital speech within 100 Hz bandwidth

Mon Aug 18 11:56:00 -0700 2008

Why not use voice to text software and text to speech software that is available now?

1. Timing     Let's assume the above is working perfectly. Let's also assume one speaker of a stereo plays the original voice and the other plays the sound from the speaker of the receiver's computer. Let's also assume the original voice is delayed by the same amount as all the processes combined. Over a three-minute speech, the sound from the two stereo speakers will not stay in step. The best example of this would be the varying sound of a record played with its hole drilled off center. The timing of the digital speech within 100 Hz bandwidth software is accurate to within plus or minus 5 ms at any time, no matter how long the original live speech lasts.

2. The alphabet versus the phoneme     This is just as true today as it was over 100 years ago.

A Plan for the Improvement of English Spelling

For example, in Year 1 that useless letter c would be dropped to be replased either by k or s, and likewise x would no longer be part of the alphabet. The only kase in which c would be retained would be the ch formation, which will be dealt with later.

Year 2 might reform w spelling, so that which and one would take the same konsonant, wile Year 3 might well abolish y replasing it with i and Iear 4 might fiks the g/j anomali wonse and for all.

Jenerally, then, the improvement would kontinue iear bai iear with Iear 5 doing awai with useless double konsonants, and Iears 6-12 or so modifaiing vowlz and the rimeining voist and unvoist konsonants.

Bai Iear 15 or sou, it wud fainali bi posibl tu meik ius ov thi ridandant letez c, y and x -- bai now jast a memori in the maindz ov ould doderez -- tu riplais ch, sh, and th rispektivli.

Mark Twain

3. Non-text words that are commonly spoken.   Al Capp made an art form of spoken words that are not easily written or read in his comic strip "Li'l Abner". How many times have you used the word waja? When you can't understand someone, you say "Waja say?" (What did you say?) Voice to text software and text to speech software can't handle this, but the digital speech within 100 Hz bandwidth software would not recognize anything unusual.

Shannon-Hartley capacity theorem

Shannon's Law (actually the Shannon-Hartley capacity theorem), relating noise, bandwidth, signal power, and capacity, can be found in any good communication text. In common terms, the bandwidth of voice cannot be compressed. I believe that voice is made of speech and pitch and volume and timing. Digital voice is high fidelity. Digital speech is understanding the meaning of the words without recognizing that person's voice. When you read something, does your mind recognize the voice of the writer? Speech is the part of voice that contains meaning. It is the part that gets ham radio operators DX contacts. Speech can be compressed in bandwidth because it is made of phonemes that are quantized into a fixed number of parts.
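For reference, the theorem gives the capacity of a channel as C = B * log2(1 + S/N). The short Python sketch below only evaluates that formula for a 100 Hz channel. The signal-to-noise ratios are example values chosen for illustration, not measurements from this project, and the function name is made up for this sketch.

```python
import math

def shannon_capacity(bandwidth_hz: float, snr_db: float) -> float:
    """Shannon-Hartley capacity C = B * log2(1 + S/N), in bits per second."""
    snr_linear = 10 ** (snr_db / 10)  # convert dB to a plain power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)

# Example SNR values are assumptions, used only to show the trend.
for snr_db in (0, 10, 20):
    capacity = shannon_capacity(100, snr_db)
    print(f"{snr_db:>2} dB SNR in 100 Hz: {capacity:.0f} bit/s")
```

The capacity grows with the signal-to-noise ratio, but in a 100 Hz channel it stays in the range of hundreds of bits per second, which is why only the speech part of voice, and not the whole voice, can fit.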

Since the phoneme comparator in the transmit section uses the voice of the sender, it will also use the pitch of the sender. Since the phoneme length is variable in steps of 10 ms, this software will take care of timing without exceeding the 100 Hz bandwidth. Our radios have automatic volume circuits (speech processors) to give them a constant volume, which means volume is not as important as speech. Again, in common terms, we can understand the words of an old man or a little girl even though the voices are very different.
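One way to picture the 10 ms step size is sketched below in Python. This is not the project's code. It assumes that each phoneme boundary is kept as an absolute time from the start of the speech and snapped to the nearest 10 ms step, so the timing error at any point stays within plus or minus 5 ms and does not build up over a long speech. The function name and the sample boundary times are made up for illustration.

```python
import math

STEP_MS = 10  # phoneme length step size stated in the text

def quantize_boundaries(boundaries_ms):
    """Snap each absolute phoneme boundary (in ms) to the nearest 10 ms step."""
    return [STEP_MS * math.floor(t / STEP_MS + 0.5) for t in boundaries_ms]

# Hypothetical boundary times measured from the start of the speech.
measured = [0, 83, 197, 254, 412]
snapped = quantize_boundaries(measured)

# Phoneme lengths are the differences between neighbouring snapped boundaries.
lengths = [b - a for a, b in zip(snapped, snapped[1:])]
worst_error = max(abs(q - t) for q, t in zip(snapped, measured))

print(snapped, lengths, worst_error)
```

Snapping the boundaries rather than each phoneme length separately is a design choice in this sketch: rounding every length on its own could let small errors add up over a three-minute speech.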

The software actually uses the pitch of the sender's voice when the transmit section of the software learns the sender's phonemes. When the receive section plays one of the twelve sets of phonemes, the sound will carry the pitch of whichever of the twelve people made the audio clips for that set in the phoneme library.

Channel spacing

As ham radio operators, we have clear channels now that the sunspot cycle is at its minimum. When the sunspot cycle is at its maximum, there won't be many clear channels. With a 24-fold reduction in bandwidth compared with SSB voice, there will be many more clear channels.
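The arithmetic behind the factor of 24 can be sketched in a few lines of Python. The 2400 Hz figure for SSB voice is an assumption implied by 24 times 100 Hz, the 100 kHz band segment is a made-up example, and guard bands between channels are ignored.

```python
SSB_BANDWIDTH_HZ = 2400     # assumed SSB voice bandwidth (24 x 100 Hz)
SPEECH_BANDWIDTH_HZ = 100   # digital speech bandwidth from the text

def channels_in(segment_hz: int, channel_hz: int) -> int:
    """Count whole channels that fit side by side, ignoring guard bands."""
    return segment_hz // channel_hz

segment_hz = 100_000  # hypothetical 100 kHz band segment

ssb_channels = channels_in(segment_hz, SSB_BANDWIDTH_HZ)
speech_channels = channels_in(segment_hz, SPEECH_BANDWIDTH_HZ)

print(f"SSB voice channels:      {ssb_channels}")
print(f"100 Hz speech channels:  {speech_channels}")
print(f"Bandwidth reduction:     {SSB_BANDWIDTH_HZ // SPEECH_BANDWIDTH_HZ} times")
```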

Why has this not been done before?

Unfortunately this project was not invented by one of the "big guns" in the digital voice and speech recognition software group. "Not invented here" applies.

Timing is critical for this to work, and that part of software is not usually taught in schools.

The old computers are no match for the speed and amount of memory of today's computers. But people only remember the sound of the "Speak & Spell".

In order for this project to work throughout the world, the software must be free. There is no hardware required beyond that used for PSK-31. The software cannot be patented because my paper has been published on the internet. There is no way to make money from doing this. People who work in the digital voice and speech recognition business could lose their livelihoods if this project were to succeed.

What would it sound like if the voice of the person doing the sending was also used to make up the phoneme library for receiving?

I was hoping to be one of those twelve people. It makes me sad, disappointed and angry to know there was nothing else I could do to answer this question.

Conclusion

Without someone to do it, this project will end with my paper http://docs.google.com/Doc?docid=dggwnj3m_28hcx4xkhg&hl=en and this supplement.

Supplement to digital speech within 100 Hz bandwidth
Mon Aug 18 18:11:49 -0700 2008

I've been looking into text-to-speech a bit since your last post and have concluded that what you propose is non-trivial in the highest order. Plenty of people are earning their PhDs in this field.

It's a lot more complicated than taking a bunch of phones and pasting them together to make anything that sounds like the speaker. They have whole databases full of diphones, triphones and more since they are modified in speech by the surrounding phones in word formation.

Google gnuspeak and take a gander at the manual for Monet that is linked from the project page to get an idea of the complexity.

Crazy complicated stuff and I haven't gotten past reading the manual to see if I can get an understanding of the underlying code. I'm no programmer though so we'll see.

Supplement to digital speech within 100 Hz bandwidth
Tue Aug 19 08:42:19 -0700 2008

I first started out my project by looking at speech to text (speech recognition) software on the internet. The common underlying problem is that they do not care how long it takes to get the text. I had to invent a totally new way of doing this.

Since my project does not use text, I can’t help you with text-to-speech.

Mike