Wolter, Moritz: Frequency Domain Methods in Recurrent Neural Networks for Sequential Data Processing. - Bonn, 2021. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online-Ausgabe in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-63361
@phdthesis{handle:20.500.11811/9245,
urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5-63361},
author = {Wolter, Moritz},
title = {Frequency Domain Methods in Recurrent Neural Networks for Sequential Data Processing},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2021,
month = jul,

note = {Machine learning algorithms now make it possible for computers to solve problems that were long thought impossible to automate. Neural speech processing, convolutional neural networks, and other recent advances are powered by frequency-domain methods like the fast Fourier transform (FFT).
This cumulative thesis presents applications of frequency-domain methods in recurrent machine learning. It starts by exploring the combination of the short-time Fourier transform (STFT) and recurrent neural networks. This combination allows faster training through windowing and end-to-end optimization of the window function, while low-pass filtering the Fourier coefficients can reduce the model size. Fourier coefficients are complex numbers and are therefore best processed in $\mathbb{C}$. The development of a complex recurrent memory cell is an additional contribution of this text. Moving a modern RNN cell into the complex domain requires various design choices regarding the gating mechanism, the state transition matrix, and the activation functions. The design process introduces a new complex gate activation function, the modSigmoid. Afterwards, we explore the interplay of state transition matrices and cell activation functions and confirm that unbounded non-linearities require unitary or orthogonal state transition matrices to remain stable.
General-purpose machine learning models often produce blurry video predictions. Working with the phase of frames in their frequency-domain representation makes it possible to do better. Image registration methods allow the extraction of transformation parameters between frames. For single pre-segmented objects in input video frames, modifying the phase accordingly can help to predict future frames.
The FFT represents all inputs in the same fixed Fourier basis. The fast wavelet transform (FWT), in contrast, works with infinitely many wavelets, any of which can serve as a basis. This text proposes a loss function that allows wavelet optimization and integrates the FWT into convolutional and recurrent neural networks. Replacing dense linear weight matrices with sparse diagonal matrices and fast wavelet transforms allows substantial parameter reductions, in some cases without loss of performance. Finally, the last chapter finds that wavelet quantization can reduce the memory required to store and transmit a convolutional neural network.},

url = {https://hdl.handle.net/20.500.11811/9245}
}
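
The STFT-plus-RNN combination described in the abstract can be illustrated with a short preprocessing sketch. The following Python code is a minimal, hypothetical example (the function name, window size, hop, and number of retained bins are illustrative choices, not the thesis' settings): it windows a signal with a Hann window, transforms each window, and keeps only the lowest-frequency coefficients, which shortens the sequence an RNN must unroll over and shrinks its per-step input.

import numpy as np

def stft_lowpass_features(signal, window_size=64, hop=32, keep=16):
    # Hann-windowed short-time Fourier transform; windowing trades
    # sequence length for feature dimension, so the RNN takes fewer,
    # larger steps over the signal.
    window = np.hanning(window_size)
    n_frames = 1 + (len(signal) - window_size) // hop
    frames = np.stack([signal[i * hop : i * hop + window_size] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=-1)  # complex Fourier coefficients
    return spectra[:, :keep]                # low-pass: drop high-frequency bins

x = np.sin(np.linspace(0, 200 * np.pi, 4096))
feats = stft_lowpass_features(x)
print(feats.shape)  # (127, 16): a short complex-valued sequence for an RNN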
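
The complex gate activation mentioned in the abstract can be sketched as follows. This is one plausible form, assuming the modSigmoid squashes a weighted combination of the real and imaginary parts of a complex pre-activation through a real sigmoid; the weights alpha and beta stand in for trainable parameters, and their values here are illustrative.

import numpy as np

def sigmoid(x):
    # Real-valued logistic sigmoid.
    return 1.0 / (1.0 + np.exp(-x))

def mod_sigmoid(z, alpha=0.5, beta=0.5):
    # Map a complex pre-activation z to a real gate value in (0, 1).
    # alpha and beta are assumed trainable weights (illustrative values).
    return sigmoid(alpha * z.real + beta * z.imag)

z = np.array([1 + 2j, -0.5 - 1j, 0.3 + 0j])  # gate pre-activations
h = np.array([2 - 1j, 1 + 1j, -1 + 0.5j])    # complex hidden state
print(mod_sigmoid(z) * h)  # real gate scales the complex state element-wise

A real-valued gate sidesteps the question of what a "complex gate" between zero and one would mean, while still letting both components of the pre-activation influence the gating decision.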
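
For the video-prediction paragraph, classic phase correlation is one image registration method that extracts a translation from the phase of two frames. The NumPy sketch below (with a hypothetical function name) recovers an integer shift via the Fourier shift theorem; the thesis' phase-based prediction is more general, so this is only a minimal illustration.

import numpy as np

def phase_correlation_shift(frame_a, frame_b):
    # The normalized cross-power spectrum of a shifted image pair is a
    # pure phase ramp; its inverse FFT peaks at the translation.
    cross = np.fft.fft2(frame_b) * np.conj(np.fft.fft2(frame_a))
    cross /= np.abs(cross) + 1e-12  # discard magnitude, keep phase only
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = frame_a.shape
    if dy > h // 2:  # unwrap to signed shifts
        dy -= h
    if dx > w // 2:
        dx -= w
    return dy, dx

rng = np.random.default_rng(0)
a = rng.random((32, 32))
b = np.roll(a, shift=(3, 5), axis=(0, 1))  # frame moved by (3, 5) pixels
print(phase_correlation_shift(a, b))       # -> (3, 5)

Applying the recovered shift to the last observed frame, for example by multiplying its spectrum with the corresponding phase ramp, extrapolates the motion without the blur that pixel-space regression tends to produce.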
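
Finally, the parameter-reduction idea, replacing a dense weight matrix with diagonal matrices and fast wavelet transforms, can be made concrete with a toy example. The sketch below fixes the Haar wavelet for simplicity, whereas the thesis optimizes the wavelet itself via the proposed loss; the layer structure used here (a single diagonal between a forward and an inverse transform) is an illustrative assumption, not the thesis' exact architecture.

import numpy as np

def haar_fwt(x):
    # One level of the fast wavelet transform: O(n) work instead of the
    # O(n^2) of a dense matrix-vector product.
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_ifwt(approx, detail):
    # Inverse single-level Haar transform (perfect reconstruction).
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def structured_layer(x, d):
    # Stand-in for a dense layer: a trainable diagonal d applied in the
    # wavelet domain, i.e. n parameters instead of n^2.
    approx, detail = haar_fwt(x)
    coeffs = np.concatenate([approx, detail]) * d
    return haar_ifwt(coeffs[:approx.size], coeffs[approx.size:])

x = np.arange(8, dtype=float)
print(np.allclose(structured_layer(x, np.ones(8)), x))  # True: identity for d = 1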

The following license files are associated with this item:

Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)