Experience

Research Positions

Senior Research Scientist

Google DeepMind

Jan 2025 – Present Mountain View (USA, California)

Senior Research Scientist at Google DeepMind working on Gemini. I focus on large-scale pre-training. More to come soon.

Senior Research Scientist

NXAI

Sep 2024 – Dec 2024 Linz (Austria) | Amsterdam (The Netherlands)

Senior Research Scientist at NXAI in the LLM Team (joined temporarily to bridge the Visa application time to join Google DeepMind in the US). I focused on large-scale pretraining, designing a JAX-based code base which can scale to training 70B models. I conducted research on a 7B model scale, distributed over 256 H100 GPUs, and focused on architecture research for advancing the xLSTM as a Transformer alternative. Tasks included:

Designing a scalable research code-base for training large-scale models across 256 GPUs.
Conducting research on the xLSTM architecture and its potential as a Transformer alternative.
Writing and optimizing Pallas and Triton Kernels on H100 GPUs.

Student Researcher

Google DeepMind

Aug 2023 – Nov 2023 Amsterdam (The Netherlands)

Student Research Intern at Google DeepMind. I was supervised by Mostafa Dehghani and focused on generative multi-modal pretraining. This included training models at scales of up to 4 billion parameters, distributed across 512 devices. The research has been continued in the Gemini project. Tasks included:

Designing a generative pretraining objective for datasets with interleaved text-image sequences (e.g. blog posts).
Parallelizing models over up to 512 devices with tensor and data parallelism.
Training Generative Multi-Modal Large Language Models (VLM) with up to 4 billion parameters.
Setting up evaluations and demo of developed generative model.
Investigating scaling laws for VLMs.

Research Intern

Microsoft Research

Mar 2023 – May 2023 Amsterdam (The Netherlands)

Research Intern in the AI4Science lab at Microsoft Research. I was supervised by Johannes Brandstetter and focused on improving scientific simulations with AI and Deep Learning. We developed a Diffusion-like Neural PDE Solver which provides significantly longer accurate rollouts in complex, chaotic PDEs like Navier Stokes and weather dynamics. Our research was published at NeurIPS 2023, see our paper for more details. Tasks included:

Analyzing limitations of current methods and challenges of long stable predictions of Neural PDE solvers.
Developing PDE-Refiner, a novel method which, based on noise-based refinement, provides significantly longer accurate predictions.
Implementing and Training Video Diffusion Models on 1D and 2D PDEs.
Running large hyperparameter searches and model comparisons.

Student Research Assistant

University of Amsterdam (AIRLab), Ahold Delhaize

Jul 2019 – Dec 2019 Amsterdam, Zaandam (The Netherlands)

Research Assistant in the AIRLab (ILPS) at the University of Amsterdam in cooperation with Ahold Delhaize (full-time till August, part-time till December). My research, supervised by Pengjie Ren, focused on advancing task-oriented dialogue systems to conduct human-like conversations with diverse responses. The results are summarized in our paper Simultaneously Improving Utility and User Experience in Task-oriented Dialogue Systems.

Student Research Assistant

Vrije Universiteit Amsterdam

Nov 2018 – Jun 2019 Amsterdam (The Netherlands)

Research Assistant (part-time) in the field of high-order automated theorem proving developing a hammer for Lean. A hammer reduces high-order formulas into first-order clauses to efficiently check for possible proofs and makes manual formalizations much easier. The research is part of the Lean Forward project supervised by Jasmin Christian Blanchette and Robert Y. Lewis.

Student Intern

Mercedes Benz Research and Development North America (MBRDNA)

May 2017 – Aug 2017 Sunnyvale (USA, California)

Student intern in the team Trajectory Planning (Research and Development - Autonomous Driving) working on predicting agent intentions and future traffic scenarios. The research focused on using Generative Adversarial Networks (GANs) in the context of video generation. I have been supervised by Yan Meng.

Cooperative Student / Student Intern

Daimler AG, Mercedes-Benz

Dec 2016 – Sep 2018 Stuttgart (Germany)

Student intern in the team Image Understanding (Research and Development - Autonomous Driving) working on hierarchical multi-label classification with a special focus on rare classes. The research was conducted both on bounding-box detection and semantic segmentation for complex, urban traffic scences. I have been supervised by Björn Fröhlich (Dec 2016 - Mar 2017) and Jonas Uhrig (Apr 2018 - Sep 2018). During the last internship period, I wrote my bachelor thesis on Hierarchical multi-label object detection of rare classes for autonomous driving (PDF).

Teaching

Courses and student supervision

Courses

I have been a teaching assistant and lecturer for a couple of courses of the Master program “Artificial Intelligence” at the University of Amsterdam. A short description of each can be found below.

Deep Learning (Lecturer & TA: 19/20, 20/21, 21/22, 22/23)

This course teaches the fundamentals of deep learning including backpropagation, initialization and regularization techniques, and optimizing deep neural networks. Furthermore, we discuss common neural network architectures (CNNs, RNNs, GNNs), and generative models (Energy-based, VAE, GANs, Normalizing Flows). I have been responsible for creating and teaching a series of Jupyter notebooks to more than 150 students showing the implementation of the most important models, including their benefits and drawbacks. The notebooks are uploaded on this website, and more details on the course can be found here.

Foundation Models (Lecturer & TA: 23/24)

This course reviews recent advances in Foundation Models across multiple modalities. I am giving a lecture on “Training Models at Scale”, discussing various parallelism strategies for distributed training of large models. I also supervise a group of students in their research project.

Advanced Topics in Computational Semantics (Lecturer & TA: 19/20, 20/21, 21/22, 23/24)

In this research-focused course, we discuss recent advances in the field of Natural Language Processing and Computational Semantics. The topics include multi-task learning and meta-learning, as well as transfer learning from large language models (BERT-family) and multi-lingual tasks. The students work on research projects in groups, of which some can become published as conference papers. I have given a lecture on Transformer models, and supervised a group of students in the course years 2020 and 2021.

Natural Language Processing 1 (TA: 19/20)

This course teaches the fundamentals of Natural Language Processing, with the second half of the course focusing on RNNs, Attention and applications including machine translation and summarization.

Fairness, Accountability, Confidentiality and Transparency in AI (TA: 19/20)

This course focuses on four, often underrated topics in Artificial Intelligence: Fairness, Accountability, Confidentiality and Transparency. The goal of the course is to gain a general understanding of those four topics, and reproduce a published paper/method in one of those four fields.

Information Retrieval 1 (TA: 19/20)

The course discusses state-of-the-art techniques that constitute the core of information retrieval systems, such as search engines, recommender systems, and conversational agents. It focuses on evaluation, document representation and matching, learning to rank, and user interaction.

PhD-Level courses

ASCI Computer Vision by Learning (Head TA)

For the ASCI PhD course “Computer Vision by Learning, 2022” (website), I have been the Head Teaching Assistant for designing and teaching the practical sessions. The 1-week course covers fundamentals of Computer Vision by using Machine/Deep Learning, but also the recent trends and research in this domain, including Vision Transformer, Group-Equivariant CNNs, and Self-Supervised Learning. For the course, I have created a series of 5 practicals where students get to implement and experiment with these recent trends in Computer Vision, and they are publicly available on this website.

ELLIS European Summer School on Information Retrieval (Lecturer)

I joined the 15th European Summer School on Information Retrieval (ESSIR 2024) as lecturer. I will talk about Transformers, LLMs, and how to train these models at larger scale.

Student supervision

Kaleigh Douglas (MSc AI thesis, Jan 2021 - Jan 2022)
Nadja Rutsch (MSc QUVA Intern, Jun 2021 - Oct 2021; MSc AI thesis, Apr 2022 - Jan 2023)
Frank Brongers (MSc AI thesis, Nov 2021 - Aug 2022)
Mátyás Schubert (MSc AI thesis, Nov 2021 - Nov 2022)
Danilo de Goede (MSc AI thesis, Nov 2022 - Jun 2023)
Angelos Nalmpantis (MSc AI thesis, Nov 2022 - Jun 2023)

If you are a student looking for a thesis supervisor and working on a similar topic to my research interests, feel free to send me a mail.

Conferences

Reviewing and presentation

I have been involved in the following conferences:

Organized: CORR-2024, UAI-2025
Reviewer for: ECCV-2020, ICCV-2021, CausalUAI-2021, ICLR-2022, CVPR-2022, CLeaR-2022, NeurIPS-2022, CRL-2022, CML4Impact-2022, CDS-2022, nCSI-2022, ICLR-2023, CLeaR-2023, TSRL4H-2023, Physics4ML-2023, UAI-2023 (Top-Reviewer), NeurIPS-2023, ICML-2024, TMLR
Volunteer for: ICLR-2020, ICML-2020
Presented work at: NeurIPS-2020, ICLR-2021, ICLR-2022, ICML-2022, UAI-2022, ICLR-2023, ICML-2023, UAI-2023, NeurIPS-2023