In this case the model does not try to understand the speech. It only focuses on predicting the frequency shift, and the prediction range is -300 to +300 Hz. So far I have trained it on English voices, and one of the planned tests is to try it on other languages. Also, the time window given to the model is currently 0.1 s, which is too short to understand speech anyway. Generally speaking, a model learns to predict what it is trained to predict, and in my case that is the frequency shift.
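
To make the task concrete, here is a rough sketch of how one training pair for this kind of model could be built: take a 0.1 s window, shift all of its frequencies by a random amount between -300 and +300 Hz, and use that amount as the label. This is not my actual training code. The 8000 Hz sample rate, the analytic-signal (Hilbert-style) shifting method, and all function names are assumptions made for illustration, and the naive DFT is only there to keep the example dependency-free.

```python
import cmath
import math
import random

SAMPLE_RATE = 8000     # Hz; assumed, the post does not state a sample rate
WINDOW_SEC = 0.1       # window length mentioned in the post
MAX_SHIFT_HZ = 300.0   # shift range mentioned in the post: -300 to +300 Hz


def dft(x, inverse=False):
    """Naive O(N^2) DFT, dependency-free; an FFT library is the practical choice."""
    n = len(x)
    sign = 1 if inverse else -1
    twiddle = [cmath.exp(sign * 2j * math.pi * k / n) for k in range(n)]
    out = [sum(x[t] * twiddle[(k * t) % n] for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def frequency_shift(window, shift_hz, sample_rate=SAMPLE_RATE):
    """Shift every frequency component of a real-valued window by shift_hz."""
    n = len(window)
    spectrum = dft(window)
    # Build the analytic signal: keep DC and Nyquist, double the positive
    # frequencies, and zero the negative ones.
    for k in range(n):
        if 0 < k < n / 2:
            spectrum[k] *= 2
        elif k > n / 2:
            spectrum[k] = 0
    analytic = dft(spectrum, inverse=True)
    # Multiplying by a complex exponential moves the whole spectrum by shift_hz.
    return [
        (a * cmath.exp(2j * math.pi * shift_hz * t / sample_rate)).real
        for t, a in enumerate(analytic)
    ]


def make_training_pair(window, rng=random):
    """Return (shifted_window, label); the label is the random shift in Hz."""
    shift_hz = rng.uniform(-MAX_SHIFT_HZ, MAX_SHIFT_HZ)
    return frequency_shift(window, shift_hz), shift_hz


def peak_frequency(window, sample_rate=SAMPLE_RATE):
    """Frequency in Hz of the strongest bin in the positive half of the spectrum."""
    n = len(window)
    mags = [abs(v) for v in dft(window)[: n // 2]]
    return mags.index(max(mags)) * sample_rate / n
```

For example, passing a 0.1 s window of a 1000 Hz cosine through `frequency_shift(window, 200.0)` moves its spectral peak to 1200 Hz. Nothing in the pair depends on what is being said, only on where the frequencies sit, which is why the model has no reason to learn the speech content.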