Speech recognition (SR)
Fonix provides tools for speech recognition. The VoiceIn SDKs are built on Fonix's proprietary neural network technology, which provides accurate speech recognition even in noisy environments.
Fonix VoiceIn process
The speaker-independent SR engine performs three main activities: audio collection, speech detection and analysis, and phoneme and word recognition.
Raw data is collected from an input source, such as a microphone, and sent to audio processing, which produces "framed" output containing the audio data necessary for speech recognition. In the next step, frequency components are extracted from the framed audio and sent to the neural networks, where phoneme probability estimates are calculated.
Neural networks are a key component of Fonix speech recognition technology.
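The framing and frequency-extraction steps can be illustrated with a minimal sketch. This is not Fonix code: the function names, the 64-sample frame length, the 32-sample hop, and the Hann window are all assumptions chosen for illustration, and a plain DFT stands in for whatever front end the engine actually uses.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping fixed-length frames."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


def magnitude_spectrum(frame):
    """Return DFT magnitudes for bins 0..N/2 of a Hann-windowed frame."""
    n = len(frame)
    # Hann window (an assumed choice) reduces leakage at the frame edges.
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        mags.append(abs(acc))
    return mags


# Synthetic input: a 1 kHz tone sampled at 8 kHz (illustrative values only).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]

frames = frame_signal(tone, frame_len=64, hop=32)
spectra = [magnitude_spectrum(f) for f in frames]

# Each bin spans sample_rate / frame_len = 125 Hz, so 1 kHz falls in bin 8.
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
print(len(frames), peak_bin)
```

In a real recognizer, the per-frame spectra (or features derived from them) would be the input to the neural networks that estimate phoneme probabilities.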
The continuous word decoder compares the phoneme probabilities against a language dictionary, then returns the words with the highest probabilities. These words are the speech recognition result.
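As a rough illustration of matching phoneme probabilities against a dictionary, the sketch below scores each dictionary word by its best left-to-right alignment to the frames (a small Viterbi pass) and ranks the words. It is a simplified isolated-word example, not the Fonix continuous decoder; the lexicon, phoneme labels, probability values, and function names are all hypothetical.

```python
import math


def align_score(frames, phonemes):
    """Best log-probability of aligning frames to phonemes in order,
    with each phoneme covering at least one frame."""
    neg_inf = float("-inf")
    floor = 1e-6  # assumed probability for phonemes missing from a frame
    # best[j] = best score so far with the current frame assigned to phonemes[j]
    best = [neg_inf] * len(phonemes)
    best[0] = math.log(frames[0].get(phonemes[0], floor))
    for frame in frames[1:]:
        new = [neg_inf] * len(phonemes)
        for j, ph in enumerate(phonemes):
            stay = best[j]                                # remain in the same phoneme
            advance = best[j - 1] if j > 0 else neg_inf   # move on to the next phoneme
            prev = max(stay, advance)
            if prev > neg_inf:
                new[j] = prev + math.log(frame.get(ph, floor))
        best = new
    return best[-1]


def decode(frames, lexicon):
    """Rank dictionary words by alignment score, best first."""
    scores = {word: align_score(frames, phs) for word, phs in lexicon.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical per-frame phoneme probability estimates.
frames = [
    {"y": 0.8, "n": 0.1},
    {"eh": 0.7, "ow": 0.2},
    {"eh": 0.6, "ow": 0.3},
    {"s": 0.9},
]
# Hypothetical two-word dictionary.
lexicon = {"yes": ["y", "eh", "s"], "no": ["n", "ow"]}

ranked = decode(frames, lexicon)
print(ranked[0][0])
```

A continuous decoder would additionally search over word sequences and word boundaries; this sketch only shows the dictionary comparison for a single word.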
Fonix VoiceIn SDKs
- Standard Edition 4.1 (SE)
- Game Edition 4.1 (GE)
- Karaoke Edition 4.1 (KE)
- Game Edition 5.0 (GE)
For more information about Fonix SDKs, click on the links above.
VoiceIn 4.1 enhancements
Phonetic scoring for language learning tools
- Provides the boundaries for each recognized phoneme.
- Provides the score for each recognized phoneme.
- Provides the International Phonetic Alphabet (IPA) symbol for each recognized phoneme.
- The new scoring capabilities are available for known and unknown target words.
- The new scoring capabilities are available for all Fonix supported languages.
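To show how a language learning tool might consume per-phoneme results like those listed above, here is a minimal sketch. The `PhonemeResult` record, its field names, the 0.0 to 1.0 score range, and the helper functions are assumptions made for illustration; they are not the VoiceIn API.

```python
from dataclasses import dataclass


@dataclass
class PhonemeResult:
    ipa: str        # IPA symbol of the recognized phoneme
    start_ms: int   # boundary: start of the phoneme in the audio
    end_ms: int     # boundary: end of the phoneme in the audio
    score: float    # assumed range 0.0 (poor) to 1.0 (good)


def weak_phonemes(results, threshold=0.5):
    """Return the phonemes whose score falls below the threshold."""
    return [r for r in results if r.score < threshold]


def word_score(results):
    """Duration-weighted mean of the phoneme scores."""
    total = sum(r.end_ms - r.start_ms for r in results)
    return sum(r.score * (r.end_ms - r.start_ms) for r in results) / total


# Hypothetical results for the target word "think".
results = [
    PhonemeResult("θ", 0, 120, 0.3),
    PhonemeResult("ɪ", 120, 200, 0.9),
    PhonemeResult("ŋ", 200, 300, 0.8),
    PhonemeResult("k", 300, 360, 0.7),
]

for r in weak_phonemes(results):
    print(f"Practice /{r.ipa}/ ({r.start_ms}-{r.end_ms} ms)")
print(round(word_score(results), 3))
```

The boundaries let a tool highlight or replay the exact stretch of audio where a learner's pronunciation scored poorly.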
Other enhancements
- Improved rejection of non-speech sounds.
- Improved speech framing.
- Improved speech detection.
- Improved out-of-vocabulary rejection.
