First-pass decoding with n-gram approximation of RNNLM: The problem of rare words

Character-based Neural Network Language Models (NNLM) have the advantage of smaller vocabulary and thus faster training times in comparison to NNLMs based on multi-character units. However, in low-resource scenarios, both the character and …

The Aalto system based on fine-tuned AudioSet features for DCASE2018 task2 - general purpose audio tagging

In this paper, we present a neural network system for DCASE 2018 task 2, general purpose audio tagging. We fine-tuned the Google AudioSet feature generation model with different settings for the given 41 classes on top of a fully connected layer …

First-pass decoding with n-gram approximation of RNNLM: The problem of rare words

Recurrent Neural Network Language Models (RNNLMs) can be utilized in first-pass decoding by approximating them with n-gram models. Although these approximated RNNLMs have been shown to improve the Word Error Rate (WER), our experiments show that the …

New Baseline in Automatic Speech Recognition for Northern Sámi

Automatic speech recognition has gone through many changes in recent years. Advances both in computer hardware and machine learning have made it possible to develop systems far more capable and complex than the previous state-of-the-art. However, …

Aalto system for the 2017 Arabic multi-genre broadcast challenge

We describe the speech recognition systems we have created for MGB-3, the 3rd Multi Genre Broadcast challenge, which this year consisted of building a system for transcribing Egyptian Dialect Arabic speech, using a big audio corpus of …

Character-based units for Unlimited Vocabulary Continuous Speech Recognition

We study character-based language models in the state-of-the-art speech recognition framework. This approach has advantages over both word-based systems and so-called end-to-end ASR systems that do not have separate acoustic and language models. We …

Improved Subword Modeling for WFST-Based Speech Recognition

Because in agglutinative languages the number of observed word forms is very high, subword units are often utilized in speech recognition. However, the proper use of subword units requires careful consideration of details such as silence modeling, …

Automatic Construction of the Finnish Parliament Speech Corpus

Automatic speech recognition (ASR) systems require large amounts of transcribed speech data for training state-of-the-art deep neural network (DNN) acoustic models. Transcribed speech is a scarce and expensive resource, and ASR systems are prone to …

Reading validation for pronunciation evaluation in the Digitala project

We describe a recognition, validation and segmentation system as an intelligent preprocessor for automatic pronunciation evaluation. The system is developed for large-scale, high-stakes foreign language tests, where it is necessary to reduce human …

Towards SamiTalk: A Sami-Speaking Robot Linked to Sami Wikipedia

We describe our work towards developing SamiTalk, a robot application for the North Sami language. With SamiTalk, users will hold spoken dialogues with a humanoid robot that speaks and recognizes North Sami. The robot will access information from the …