Products

1. Delta Speech Recognizer: Delta Recognizer V5.0
Overview:
Delta Speech Recognizer (Server) is a continuous speech recognition engine. It enables developers to create a wide variety of speech-enabled applications, ranging from small-vocabulary applications, such as command-and-control and auto-attendant systems, to large-vocabulary applications, such as address search and Internet voice search.

Features:
- Large-vocabulary, speaker-independent, and enrollment-free
- High accuracy and robust to noise
- Support for multiple languages
- Grammar-based language model
- Support for just-in-time (JIT) grammars
- Support for tone recognition
- Easy integration with a lightweight design
- Flexible client-server architecture for large-scale deployment
- Support for confidence scores
- Thread-safe
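To illustrate the grammar-based language model and just-in-time grammars listed above, here is a minimal Python sketch under stated assumptions: a grammar is modeled as a sequence of slots, each a list of alternatives, and a just-in-time grammar is simply one built at run time from data known only then, such as a contact list. `expand_grammar` and `best_in_grammar` are hypothetical helpers, not the Delta Recognizer API. A real recognizer applies the grammar during decoding; filtering an N-best list afterwards, as done here, is a simplification that only shows how a grammar and a confidence threshold constrain the result.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple


def expand_grammar(slots: Sequence[Sequence[str]]) -> List[str]:
    """Expand a grammar given as a sequence of slots (each a list of
    alternatives; "" marks an optional slot) into every accepted phrase."""
    phrases = []
    for words in product(*slots):
        phrases.append(" ".join(w for w in words if w))
    return phrases


def best_in_grammar(nbest: Sequence[Tuple[str, float]],
                    grammar: Sequence[str],
                    min_confidence: float = 0.5) -> Optional[str]:
    """Return the highest-confidence hypothesis the grammar accepts,
    or None if no in-grammar hypothesis is confident enough."""
    allowed = set(grammar)
    for text, confidence in sorted(nbest, key=lambda h: h[1], reverse=True):
        if text in allowed and confidence >= min_confidence:
            return text
    return None


# Hypothetical just-in-time grammar, built at run time from the current contacts.
contacts = ["Alice", "Bob"]
grammar = expand_grammar([["call", "dial"], contacts, ["", "at home", "at the office"]])
```

Here an out-of-grammar hypothesis such as "call Alison" is skipped in favor of the in-grammar "call Alice", even though the former scored higher.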

Specifications:
- Supported Platforms: PC Windows, Linux
- Supported Languages: Traditional Chinese, Simplified Chinese, English, Taiwanese

2. Delta Compact Speech Recognizer
Overview:
Word Recognition is the basic mode of speech recognition. The system holds a set of predefined words. When the user speaks, the voice signal is processed, features are extracted, and the similarity to each predefined word is calculated; the word with the highest score is returned as the recognition result. Word Recognition uses a full-match method: the whole utterance must match one of the predefined words.
In Keyword Spotting, the user can speak any combination of words; the predefined keywords are extracted and recognized from the speech, while filler words are filtered out.
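The word-recognition steps above (extract features, score each predefined word, return the best full match) can be sketched in Python as follows. The product's actual matching method is not stated; dynamic time warping (DTW) over feature frames is swapped in here as a simple, well-known template-matching technique, and the feature vectors (for example MFCC frames) are assumed to be computed elsewhere. The names `dtw_distance` and `recognize_word` are illustrative.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector, e.g. an MFCC frame


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a: List[Frame], seq_b: List[Frame]) -> float:
    """Minimum cumulative frame distance over all monotonic alignments."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize_word(utterance: List[Frame], templates: Dict[str, List[Frame]]) -> str:
    """Full-match recognition: the whole utterance is compared with each
    template, and the word with the highest similarity (lowest cost) wins."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))
```

DTW lets a word spoken faster or slower than its template still align frame by frame, which is why a stretched utterance can match its template at zero cost.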

Features:
- Speaker-independent and speaker-dependent modes
- High accuracy and robust to noise
- Support for multiple languages
- Small footprint

Specifications:
- Supported Platforms: Philips TriMedia (PNX1500) / pSOS, Motorola ColdFire (MCF5249) / μClinux, Intel ARM (StrongARM, XScale) / WinCE or Linux, TI DSP5402, Winbond W55VA75 / GeneralPlus GPCE061A
- An example specification for speech recognition:

| Platform | 32-bit RISC | 16-bit RISC |
|---|---|---|
| Processor load | 81 MHz (72 MIPS) | 49 MHz (10 MIPS) |
| Response time | 100 ms | 100 ms |
| Memory requirements | 300 KB | 4 KB |
| Vocabulary size | 33 items | 20 items |
| Accuracy | 90% (word) / 80% (keyword) | 90% (word) |

- Supported Languages

3. Delta Verbal Information Verifier
Overview:
Delta Verbal Information Verification (DVIV) is a speaker authentication system that verifies speakers by the verbal content of their registered profiles. In banking applications, for example, callers must pass an authentication process before they are permitted to receive personal information. This authentication has traditionally been performed by human staff; automating it with computers saves considerable time and human resources.
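As a minimal Python sketch of verification by verbal content, assume the recognizer returns a text answer and a confidence score for each profile question. The product lists confidence-score support, but this data layout, the 0.7 threshold, and the helper names are assumptions for illustration. The caller is accepted only if every registered item is answered correctly and confidently.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RecognizedAnswer:
    text: str          # recognizer output for one question (hypothetical layout)
    confidence: float  # recognizer confidence score in [0, 1]


def verify_caller(profile: Dict[str, str],
                  answers: Dict[str, RecognizedAnswer],
                  min_confidence: float = 0.7) -> bool:
    """Accept only if every profile item was answered, the answer matches the
    registered value (case-insensitively), and the recognizer was confident."""
    for field, expected in profile.items():
        answer = answers.get(field)
        if answer is None:
            return False  # question not answered
        if answer.confidence < min_confidence:
            return False  # too uncertain to trust the recognized text
        if answer.text.strip().lower() != expected.strip().lower():
            return False  # content does not match the registered profile
    return True
```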

Features:
- Fast
- High accuracy (above 95%)
- Easy integration with speaker verification
- Support for confidence scores
- Thread-safe

4. Delta Gender Recognizer
Overview:
Gender Recognition captures a speaker's voice characteristics to determine whether the speaker is male or female, as shown in the flow chart below.

Features:
- No language barrier
- Speaker-independent for any user
- High recognition rate: up to 95%, even in noisy environments
- Provides C/C++ API functions and example code for application development

Specifications:
- Supported Platforms: PC Windows, Linux
- Memory: 90 KB
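The overview does not say which voice characteristics are used. As an illustration only, the Python sketch below uses one common cue, pitch (fundamental frequency, F0), estimated by autocorrelation on a voiced frame and compared with a threshold. The 165 Hz threshold, the 60 to 400 Hz search range, and the helper names are assumptions; adult male voices typically have a lower F0 than female voices, but a production classifier would combine more features than pitch alone.

```python
import math
from typing import Sequence


def estimate_f0(frame: Sequence[float], sample_rate: int,
                f0_min: float = 60.0, f0_max: float = 400.0) -> float:
    """Estimate F0 of a voiced frame as sample_rate / lag, where lag is the
    shift with the highest autocorrelation inside the plausible pitch range."""
    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_corr = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag


def classify_gender(frame: Sequence[float], sample_rate: int,
                    threshold_hz: float = 165.0) -> str:
    """Assumed rule of thumb: pitch at or above the threshold -> female."""
    return "female" if estimate_f0(frame, sample_rate) >= threshold_hz else "male"
```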

5. Delta Signal Detector
Overview:
Auto-dialing systems are widely used in call centers all over the world. Most dialing systems transfer calls to agents once the calls are connected; however, calls picked up by an answering machine or voice mail do not need to be transferred to agents. Besides call transfer, redialing is a convenient way to reach numbers that were not connected, but it is wasted effort if the system keeps redialing numbers that are invalid, suspended, or nonexistent. With Delta Signal Detector, call responses can be easily distinguished and categorized into the 14 groups listed below. Based on these groups, the dialing system can be programmed to route each call appropriately, making the call center more efficient.
14 Groups:
- Voice mailbox
- Phone turned off
- No answer
- Cannot be connected
- Suspended
- Nonexistent phone number
- Suspended or no response
- No answer, forwarded to voice mailbox
- Busy
- Busy with call waiting
- Busy, forwarded to voice mailbox
- Phone number is not registered
- Phone number is not acceptable
- Do not disturb
Delta Signal Detector can also automatically delete a number from the database once the system has detected it as invalid, or leave a short message in the mailbox when a call is answered by voice mail.
The signal detection product provides C/C++ API functions for application developers. An application can record the call as a stream and feed the streaming data to the signal detection API to obtain the call result in real time.
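A minimal Python sketch of the streaming flow and call routing described above. The detector callback type, the `CallResult` identifiers, and the mapping from groups to actions (which groups count as invalid numbers, which as voice mail) are assumptions for illustration; they are not the product's C/C++ API.

```python
from enum import Enum, auto
from typing import Callable, Iterable, Optional


class CallResult(Enum):
    """The 14 call-result groups (identifier names are illustrative)."""
    VOICE_MAILBOX = auto()
    TURNED_OFF = auto()
    NO_ANSWER = auto()
    CANNOT_CONNECT = auto()
    SUSPENDED = auto()
    NONEXISTENT_NUMBER = auto()
    SUSPENDED_OR_NO_RESPONSE = auto()
    NO_ANSWER_TO_VOICE_MAILBOX = auto()
    BUSY = auto()
    BUSY_CALL_WAITING = auto()
    BUSY_TO_VOICE_MAILBOX = auto()
    NOT_REGISTERED = auto()
    NOT_ACCEPTABLE = auto()
    DO_NOT_DISTURB = auto()


# Assumed grouping of results into "invalid number" and "voice mail".
INVALID_NUMBER = {CallResult.NONEXISTENT_NUMBER, CallResult.NOT_REGISTERED,
                  CallResult.NOT_ACCEPTABLE}
VOICE_MAIL = {CallResult.VOICE_MAILBOX, CallResult.NO_ANSWER_TO_VOICE_MAILBOX,
              CallResult.BUSY_TO_VOICE_MAILBOX}

# Hypothetical detector callback: consumes one chunk of recorded audio and
# returns a CallResult once it has reached a decision, otherwise None.
Detector = Callable[[bytes], Optional[CallResult]]


def detect_streaming(chunks: Iterable[bytes], detect: Detector) -> Optional[CallResult]:
    """Feed audio chunks as they are recorded; stop at the first decision."""
    for chunk in chunks:
        result = detect(chunk)
        if result is not None:
            return result
    return None  # stream ended without a decision


def route(result: CallResult) -> str:
    """Map a call result to a dialer action (assumed policy)."""
    if result in INVALID_NUMBER:
        return "delete_number"   # remove from the calling database
    if result in VOICE_MAIL:
        return "leave_message"   # leave a short message in the mailbox
    return "redial_later"        # everything else: try again later
```

Stopping at the first decision is what allows a real-time response: the application does not need to wait for the whole recording before acting.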

Features:
- Excellent recognition rate: up to 99%
- Automatically detects and filters out (color) ring tones and DTMF signals
- No language barrier
- Signal-independent: detection works regardless of background noise, and low computing power is required
- Fast response time: a voice greeting can be detected in as little as 2.5 seconds

Specifications:
- Supported Platforms: PC Windows, Linux, Embedded OS

6. Delta Speaker Verifier
Overview:
The speaker verification technology shown below, also known as speaker authentication or speaker detection, verifies the identity of a speaker. The characteristics of the captured test voiceprint are matched against the claimed speaker's voiceprint model in the database, and the system decides whether the user is who they claim to be. Two kinds of misjudgment are possible: wrongly accepting an impostor (false acceptance) and wrongly rejecting the genuine user (false rejection).
Delta Speaker Verification covers two types of technology. In text-dependent verification, the spoken content of the enrolled voiceprint and of the test utterance must be the same. In text-independent verification, the spoken content of the enrolled voiceprint and of the test utterance may differ. We provide both types to support a wider range of applications.
Speaker verification is widely used. For personal use, it can control access to equipment, such as user permissions on personal computers. In enterprise applications, it can be used for fraud detection for credit card customers. It can also extract the call logs of specific speakers from a large number of customer service dialogues, which reduces the workload of customer service agents and improves customer service quality (Quality Management).
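The accept/reject decision and its two error types described above can be made concrete with a score threshold. The Python sketch below assumes the matcher produces one similarity score per trial (higher means a closer voiceprint match); the threshold values and helper names are hypothetical. Given scores for genuine-user and impostor trials, it reports the false acceptance rate (FAR) and false rejection rate (FRR).

```python
from typing import Sequence, Tuple


def accept(score: float, threshold: float) -> bool:
    """Accept the identity claim if the voiceprint match score reaches the threshold."""
    return score >= threshold


def error_rates(genuine: Sequence[float], impostor: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false acceptance rate, false rejection rate) at a threshold.

    genuine:  scores from trials where the speaker really is the claimed user
    impostor: scores from trials where someone else claims that identity
    """
    false_accepts = sum(1 for s in impostor if accept(s, threshold))
    false_rejects = sum(1 for s in genuine if not accept(s, threshold))
    return false_accepts / len(impostor), false_rejects / len(genuine)
```

Raising the threshold lowers FAR and raises FRR, so the threshold is tuned to how costly each error is in the application (for example, stricter for credit card fraud detection than for unlocking a personal computer).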

Features:
- No language barrier
- Speaker-independent for any user
- Recognition rate of up to 94% for text-dependent speaker verification
- Provides C/C++ API functions and example code for application development

Specifications:
- Supported Platforms: PC Windows, Linux
- Memory: 76 KB

7. Delta Speech Synthesizer
Overview:
Text to Speech (TTS), also called speech synthesis, is the technology that converts text into speech. The input text is processed by the following modules to generate the output speech:
- Text Processing Module: extracts synthesis information from the text content
- Prosody Generation Module: estimates prosody from the text information
- Signal Processing Module: adjusts the speech data according to the prosody
Words separated by spaces form small phrase units, commas delimit medium units, and periods delimit large units. The corresponding syllables are then retrieved from the speech corpus and adjusted with the appropriate prosody to generate the output speech.
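The three-level segmentation described above (spaces, commas, periods) can be sketched in Python as follows. This is a naive illustration, not Delta's text-processing module: it treats only the ASCII and full-width period and comma as break marks, and it will, for example, also split on the decimal point in "3.5".

```python
import re
from typing import List

LARGE_BREAK = r"[.。]"   # period (ASCII or full-width) ends a large unit
MEDIUM_BREAK = r"[,，]"  # comma (ASCII or full-width) ends a medium unit


def prosodic_units(text: str) -> List[List[List[str]]]:
    """Split text into large units, each a list of medium units, each a
    list of small (space-separated) units; empty pieces are dropped."""
    large_units = []
    for sentence in re.split(LARGE_BREAK, text):
        medium_units = []
        for clause in re.split(MEDIUM_BREAK, sentence):
            words = clause.split()
            if words:
                medium_units.append(words)
        if medium_units:
            large_units.append(medium_units)
    return large_units
```

The nesting depth mirrors the three pause levels, so a prosody module could assign, say, a longer pause and pitch reset at each large-unit boundary than at a medium one.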
Delta focuses on syllable-based Text to Speech technology, which is suitable for various embedded platforms. Delta can provide speech corpora of different sizes and different synthesis speeds to match the available system resources: with more resources, synthesis quality is better; with fewer resources, a smaller speech corpus and faster synthesis are still workable. Besides Mandarin Text to Speech, Delta also provides a mixed Mandarin/English version.

Features:
- Syllable-based Mandarin Text to Speech
- Speech corpus sizes ranging from 10 MB down to 1 or 2 MB
- Suitable for embedded devices

Specifications:
- Supported Platforms: Philips TriMedia (PNX1500) / pSOS, Intel ARM (StrongARM, XScale) / WinCE or Linux
- Evaluation on an iPAQ h3600 (StrongARM 206 MHz, 16 MB RAM, CompactFlash):

| Quality | Library size (.so) | Run-time memory | Real-time ratio |
|---|---|---|---|
| Best | 595 KB | 2M/4.7M/5.5M | 0.51 to 0.65 |
| Intermediate | 595 KB | 2M/4.7M/5.5M | 0.24 to 0.35 |
| Normal | 537 KB | 2M/4.7M/5.4M | 0.11 to 0.21 |

| Corpus | 16 kHz 16-bit tonal | 16 kHz 16-bit | 8 kHz 16-bit | 8 kHz 16-bit μ-law |
|---|---|---|---|---|
| Male | N/A | 3.9 MB | 2 MB | 1.1 MB |
| Female | 16.6 MB | 3.63 MB | 1.9 MB | 1 MB |
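Because a larger corpus gives better quality but needs more storage, choosing a corpus is a budget problem. The Python sketch below picks the largest corpus from the corpus table that fits a given storage budget. The selection rule (bigger is better) and the function name are assumptions; the unavailable male tonal corpus is modeled as `None`.

```python
from typing import Dict, Optional

# Corpus sizes in MB per voice, from the corpus table; None = not offered.
CORPUS_SIZES_MB: Dict[str, Dict[str, Optional[float]]] = {
    "male": {"16 kHz 16-bit tonal": None, "16 kHz 16-bit": 3.9,
             "8 kHz 16-bit": 2.0, "8 kHz 16-bit mu-law": 1.1},
    "female": {"16 kHz 16-bit tonal": 16.6, "16 kHz 16-bit": 3.63,
               "8 kHz 16-bit": 1.9, "8 kHz 16-bit mu-law": 1.0},
}


def pick_corpus(voice: str, budget_mb: float) -> Optional[str]:
    """Return the largest corpus for `voice` that fits in `budget_mb`, or None."""
    fitting = [(size, name) for name, size in CORPUS_SIZES_MB[voice].items()
               if size is not None and size <= budget_mb]
    return max(fitting)[1] if fitting else None
```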

- The website also provides a real-time Text to Speech demo at http://60.250.37.144/mmi_tts.html. Enter text in the text box and press the Text-to-Speech button to synthesize it; then click the text hyperlink below the Text-to-Speech button to play the synthesized speech.

8. Entertainment Search
Overview:
Listening to music and browsing photos are common applications on PCs and MIDs (mobile Internet devices), but finding the desired songs or photos through a GUI is not easy. By voice, the user can simply say the singer and/or song title to get the desired songs, or say a photo tag and/or the date taken to view the desired photos.
Scenario 1: Music Search
Say “Celine Dion” to listen to the album collection by Celine Dion.
Say “Celine Dion, My Heart Will Go On” to listen to that song.
Scenario 2: Photo Search
Say “Pleo” to view pictures of Pleo the dinosaur.
Say “2009 March” to view the photos taken in March 2009.
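The two scenarios above can be sketched as simple lookups once the recognizer has returned text. In the Python sketch below, the `Song` and `Photo` records, the comma separating singer from title, and the "year month" date form are assumptions taken from the example utterances, not part of the product's interface.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import List


@dataclass
class Song:
    singer: str
    title: str


@dataclass
class Photo:
    tags: List[str]
    taken: date


def search_music(query: str, library: List[Song]) -> List[Song]:
    """'Singer' returns all of that singer's songs; 'Singer, Title' returns that song."""
    parts = [p.strip().lower() for p in query.split(",", 1)]
    singer = parts[0]
    title = parts[1] if len(parts) > 1 else None
    return [s for s in library
            if s.singer.lower() == singer and (title is None or s.title.lower() == title)]


def search_photos(query: str, photos: List[Photo]) -> List[Photo]:
    """A 'YEAR Month' query (English month names, e.g. '2009 March') filters
    by date taken; anything else is treated as a tag."""
    try:
        when = datetime.strptime(query.strip(), "%Y %B")
    except ValueError:
        tag = query.strip().lower()
        return [p for p in photos if tag in (t.lower() for t in p.tags)]
    return [p for p in photos if (p.taken.year, p.taken.month) == (when.year, when.month)]
```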

Features:
- Listen to music by saying the singer and/or title
- Browse photos by saying a tag and/or date
- Easy integration with speaker verification
- Speaker-independent, enrollment-free
- Robust to noise

Specifications:
- Supported Platforms: PC Windows, Linux
- Supported Languages: Mandarin, English, Japanese, Italian, French, German, Spanish, Portuguese

9. Mobile Internet Search
Overview:
The Internet-browsing functions of mobile devices such as cell phones, PDAs, and MIDs are becoming more capable. Internet Search is a typical application that lets users browse the Internet by voice. For example, you can hold down your device's "talk" button and speak a keyword such as "Barack Obama". The device sends your speech to an ASR server, which converts it into text, and an Internet browser is then launched with the recognized keyword as the search query. Recognition does not have to run on a remote server: we also provide a stand-alone version.
Scenario 1
Press button "3" and say "dao4 Google zhao3 tai2-da2-dian4" ("go to Google, find tai2-da2-dian4")
Press button "3" and say "tai2-da2-dian4"
Scenario 2
Press button "ASR" and say "dao4 Google zhao3 san1 tai2-da2-dian4"
Press button "ASR" and say "san1 tai2-da2-dian4"
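Once the ASR step has returned text, launching the search is string handling. The Python sketch below assumes the recognizer output uses the pinyin form shown in the scenarios, where "dao4 <site> zhao3 <keyword>" means "go to <site> and find <keyword>", and that a bare keyword defaults to Google. The `SEARCH_URLS` table and the helper names are hypothetical; the Google URL follows the common `search?q=` form.

```python
from typing import Tuple
from urllib.parse import quote_plus

GO_TO = "dao4"  # pinyin for "go to", as in the scenarios
FIND = "zhao3"  # pinyin for "find"

# Hypothetical site table; only Google appears in the scenarios.
SEARCH_URLS = {"Google": "https://www.google.com/search?q="}


def parse_command(text: str, default_site: str = "Google") -> Tuple[str, str]:
    """Split 'dao4 <site> zhao3 <keyword...>' into (site, keyword).
    A bare keyword is searched on the default site."""
    words = text.split()
    if len(words) >= 4 and words[0] == GO_TO and words[2] == FIND:
        return words[1], " ".join(words[3:])
    return default_site, " ".join(words)


def search_url(text: str) -> str:
    """Build the URL the browser is launched with."""
    site, keyword = parse_command(text)
    return SEARCH_URLS[site] + quote_plus(keyword)
```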

Features:
- Large-vocabulary recognition: millions of words on the server; hundreds of thousands of words on the client
- Higher accuracy by constraining recognition with word (phrase) length (patent pending)
- Word-based or phrase-based recognition
- Speaker-independent, enrollment-free
- Robust to noise

Specifications:
- Supported Platforms: PC/NB/Netbook: Windows, Linux; MID: Embedded Linux; Cell phone: Windows Mobile, Java ME, Android
- Supported Languages: Mandarin, English

10. MAP Search
Overview:
Entering an address into a GPS device by hand is inconvenient, especially while driving. Voice input provides a hands-free, eyes-free way to enter the destination address.
Scenario 1: Three Steps
Say “Taipei Neihu”, then “Ruey Kuang Road”, then “186”.
Scenario 2: One Shot
Say “Taipei Neihu, Ruey Kuang Road, 186”.
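The two input modes above produce the same structured address. In the Python sketch below, the `Address` fields, the comma-separated one-shot form, and the small gazetteer used to narrow the road vocabulary once the district is known are all hypothetical, for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Address:
    district: str  # e.g. "Taipei Neihu"
    road: str      # e.g. "Ruey Kuang Road"
    number: str    # e.g. "186"


# Hypothetical gazetteer: roads known for each district. In three-step mode,
# the district recognized in step 1 narrows the road vocabulary for step 2.
GAZETTEER: Dict[str, Set[str]] = {
    "Taipei Neihu": {"Ruey Kuang Road", "Neihu Road"},
}


def road_vocabulary(district: str) -> Set[str]:
    return GAZETTEER.get(district, set())


def from_three_steps(steps: List[str]) -> Address:
    """Combine the three separately spoken parts into one address."""
    if len(steps) != 3:
        raise ValueError(f"expected district, road and number, got {steps!r}")
    district, road, number = (s.strip() for s in steps)
    return Address(district, road, number)


def from_one_shot(utterance: str) -> Address:
    """Split a one-shot utterance on commas, then reuse the three-step path."""
    return from_three_steps(utterance.split(","))


def validate(address: Address) -> bool:
    """Check the road against the district's vocabulary and the house number format."""
    return address.road in road_vocabulary(address.district) and address.number.isdigit()
```

Narrowing the road list by district is one way a three-step dialog keeps each recognition step small, which tends to help accuracy on constrained devices.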

Features:
- Enter addresses by voice
- Three-step or one-shot input
- Speaker-independent, enrollment-free
- Robust to noise

Specifications:
- Supported Platforms: GPS devices: Windows Mobile/CE; Smart phones: Windows Mobile
- Supported Languages: Mandarin