Machine Perception and Cognition Group | ZHAW School of Engineering

“AI is THE key technology of the digital transformation, across sectors and industries, with major effects on our societies. Our research thus makes major contributions to the development of robust and trustworthy AI methods, and we enthusiastically teach their safe implementation and application.”

Professor Dr Thilo Stadelmann

Fields of expertise

Pattern recognition with deep learning
Machine perception, computer vision and speaker recognition
Neural system development

The MPC group conducts pattern recognition research, working on a wide variety of tasks relating to image, audio, and signal data per se. We focus on deep neural network and reinforcement learning methodology, inspired by biological learning. Each task we study has its own learning target (e.g., detection, classification, clustering, segmentation, novelty detection, control) and corresponding use case (e.g., predictive maintenance, speaker recognition for multimedia indexing, document analysis, optical music recognition, computer vision for industrial quality control, automated machine learning, deep reinforcement learning for automated game play or building control), which in turn sheds light on different aspects of the learning process. We use this experience to create increasingly general AI systems built on neural architectures.

Services

Insight: keynotes, trainings
AI consultancy: workshops, expert support, advise, technology assessment
Research and development: small to large-scale collaborative projects, third party-funded research, student projects, commercially applicable prototypes

Team

Head of Research Group

Prof. Thilo Stadelmann Dr. rer. nat.

+41 (0) 58 934 72 08
thilo.stadelmann@zhaw.ch

Projects

Unfortunately, no list of projects can be displayed here at the moment. Until the list is available again, the project search on the ZHAW homepage can be used.

Projects

Oops, an error occurred! Code: 20250422093119b5451b69

Publications

Meier, Benjamin Bruno; Elezi, Ismail; Amirian, Mohammadreza; Dürr, Oliver; Stadelmann, Thilo,

2018.

Learning neural models for end-to-end clustering [paper].

In:

Artificial Neural Networks in Pattern Recognition.

8th IAPR TC3 Workshop on Artificial Neural Networks in Pattern Recognition (ANNPR), Siena, Italy, 19-21 September 2018.

Springer.

pp. 126-138.

Lecture Notes in Computer Science ; 11081.

Available from: https://doi.org/10.1007/978-3-319-99978-4_10
Hibraj, Feliks; Vascon, Sebastiano; Stadelmann, Thilo; Pelillo, Marcello,

2018.

Speaker clustering using dominant sets [paper].

In:

2018 24th International Conference on Pattern Recognition (ICPR).

24th International Conference on Pattern Recognition (ICPR 2018), Beijing, China, 20-28 August 2018.

IEEE.

pp. 3549-3554.

Available from: https://doi.org/10.1109/ICPR.2018.8546067
Amirian, Mohammadreza; Schwenker, Friedhelm; Stadelmann, Thilo,

2018.

Trace and detect adversarial attacks on CNNs using feature response maps [paper].

In:

Artificial Neural Networks in Pattern Recognition.

8th IAPR TC3 Workshop on Artificial Neural Networks in Pattern Recognition (ANNPR), Siena, Italy, 19-21 September 2018.

Springer.

pp. 346-358.

Lecture Notes in Computer Science ; 11081.

Available from: https://doi.org/10.1007/978-3-319-99978-4_27
Meier, Benjamin; Stadelmann, Thilo; Stampfli, Jan; Arnold, Marek; Cieliebak, Mark,

2017.

Fully convolutional neural networks for newspaper article segmentation [paper].

In:

Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

14th IAPR International Conference on Document Analysis and Recognition (ICDAR 2017), Kyoto Japan, 13-15 November 2017.

Kyoto:

CPS.

Available from: https://doi.org/10.21256/zhaw-1533
Lukic, Yanick X.; Vogt, Carlo; Dürr, Oliver; Stadelmann, Thilo,

2017.

Learning embeddings for speaker clustering based on voice equality [paper].

In:

2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP).

27th IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2017), Tokyo, 25-28 September 2017.

IEEE.

Available from: https://doi.org/10.1109/MLSP.2017.8168166

Other releases

When	Type	Content
2023	Extended Abstract	Thilo Stadelmann. KI als Chance für die angewandten Wissenschaften im Wettbewerb der Hochschulen. Workshop (“Atelier”) at the Bürgenstock-Konferenz der Schweizer Fachhochschulen und Pädagogischen Hochschulen 2023, Luzern, Schweiz, 20. Januar 2023
2022	Extended Abstract	Christoph von der Malsburg, Benjamin F. Grewe, and Thilo Stadelmann. Making Sense of the Natural Environment. Proceedings of the KogWis 2022 - Understanding Minds Biannual Conference of the German Cognitive Science Society, Freiburg, Germany, September 5-7, 2022.
2022	Open Reserach Data	Felix M. Schmitt-Koopmann, Elaine M. Huang, Hans-Peter Hutter, Thilo Stadelmann, and Alireza Darvishy. FormulaNet: A Benchmark Dataset for Mathematical Formula Detection. One unsolved sub-task of document analysis is mathematical formula detection (MFD). Research by ourselves and others has shown that existing MFD datasets with inline and display formula labels are small and have insufficient labeling quality. There is therefore an urgent need for datasets with better quality labeling for future research in the MFD field, as they have a high impact on the performance of the models trained on them. We present an advanced labeling pipeline and a new dataset called FormulaNet. At over 45k pages, we believe that FormulaNet is the largest MFD dataset with inline formula labels. Our dataset is intended to help address the MFD task and may enable the development of new applications, such as making mathematical formulae accessible in PDFs for visually impaired screen reader users.
2020	Open Research Data	Lukas Tuggener, Yvan Putra Satyawan, Alexander Pacha, Jürgen Schmidhuber, and Thilo Stadelmann, DeepScoresV2. The DeepScoresV2 Dataset for Music Object Detection contains digitally rendered images of written sheet music, together with the corresponding ground truth to fit various types of machine learning models. A total of 151 Million different instances of music symbols, belonging to 135 different classes are annotated. The total Dataset contains 255,385 Images. For most researches, the dense version, containing 1714 of the most diverse and interesting images, is a good starting point.