Microsoft has announced that its speech recognition system has reached an important milestone. The company's researchers say the system can now transcribe conversational speech as accurately as humans. In a statement, Microsoft's chief speech scientist Xuedong Huang called this a historic achievement.
The system achieved a word error rate of 5.9 percent, improving on the 6.3 percent record Microsoft set last month. According to Microsoft, the new rate is "about equal" to that of professional transcriptionists. The system uses neural language models that group similar words together, which helps them generalize better. It was built with the Computational Network Toolkit, Microsoft's in-house deep learning framework, which the researchers used to develop the underlying algorithms.
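Word error rate is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the system's transcript into a reference transcript, divided by the number of words in the reference. The sketch below computes it with a word-level edit distance. It is a minimal illustration under that standard definition, not the scoring tool used for Microsoft's result; official evaluations typically also normalize the text (spelling variants, filler words) before scoring. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Minimal WER sketch: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum number of edits turning ref[:i-1] into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One dropped word out of six reference words gives a WER of 1/6.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

On this definition, a 5.9 percent word error rate corresponds to roughly 59 errors for every 1,000 words in the reference transcript.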
The system, which was evaluated on the NIST 2000 conversational speech benchmark, is likely to be used in several ways. Microsoft plans to use the technology to improve its accessibility tools, and it may also appear on Xbox and in digital assistants such as Cortana.
Microsoft CEO Satya Nadella said the technology is expected to have a far-reaching influence on computing, comparing its impact to that of the graphical user interface.
Harry Shum, executive vice president of Microsoft's Artificial Intelligence and Research group, said this feat would have been inconceivable even five years ago. He added that the development will make Cortana more powerful and may lead to the creation of "a truly intelligent assistant."
Microsoft is not the only company making advances in this area. Baidu Research has developed its own system, Deep Speech 2, and a study it conducted with Stanford University and the University of Washington found that dictating text with it on a smartphone was about three times faster than typing.