Research Positions
Senior Research Scientist at NXAI in the LLM team (a temporary position bridging the visa application period before joining Google DeepMind in the US). I focused on large-scale pretraining, designing a JAX-based codebase that scales to training 70B-parameter models. I conducted research at the 7B scale, distributed over 256 H100 GPUs, with an emphasis on architecture research advancing the xLSTM as a Transformer alternative.
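As an illustration of the kind of distributed-training setup such a codebase builds on, here is a minimal, hypothetical sketch of data-parallel training with JAX's sharding API; the toy model, identifiers, and sizes are illustrative only and not taken from the NXAI codebase.

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One mesh axis "data": parameters are replicated, the batch is sharded across devices.
mesh = Mesh(mesh_utils.create_device_mesh((jax.device_count(),)), axis_names=("data",))
replicated = NamedSharding(mesh, P())
batch_sharded = NamedSharding(mesh, P("data"))

def loss_fn(params, batch):
    # Toy linear model with a mean-squared-error loss (stands in for the real model).
    preds = batch["x"] @ params["w"]
    return jnp.mean((preds - batch["y"]) ** 2)

params = jax.device_put({"w": jnp.ones((128, 1))}, replicated)
batch = jax.device_put(
    {"x": jnp.ones((8 * jax.device_count(), 128)),
     "y": jnp.zeros((8 * jax.device_count(), 1))},
    batch_sharded,
)

@jax.jit
def train_step(params, batch, lr=1e-3):
    # Per-device gradients are averaged across the "data" axis by the compiler.
    grads = jax.grad(loss_fn)(params, batch)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

params = train_step(params, batch)
```

The same sharding annotations extend to model- and pipeline-parallel layouts by adding further mesh axes, which is what makes this style of codebase scale beyond a single data-parallel replica.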
Student Research Intern at Google DeepMind. I was supervised by Mostafa Dehghani and focused on generative multi-modal pretraining. This included training models of up to 4 billion parameters, distributed across 512 devices. The research has been continued in the Gemini project.
Research Intern in the AI4Science lab at Microsoft Research. I was supervised by Johannes Brandstetter and focused on improving scientific simulations with AI and Deep Learning. We developed a Diffusion-like Neural PDE Solver that provides significantly longer accurate rollouts for complex, chaotic PDEs such as Navier-Stokes and weather dynamics. Our research was published at NeurIPS 2023; see our paper for more details.
Courses and student supervision
I have been a teaching assistant and lecturer for several courses in the Master's program “Artificial Intelligence” at the University of Amsterdam. A short description of each can be found below.
This course teaches the fundamentals of deep learning, including backpropagation, initialization and regularization techniques, and optimizing deep neural networks. Furthermore, we discuss common neural network architectures (CNNs, RNNs, GNNs) and generative models (energy-based models, VAEs, GANs, normalizing flows). I have been responsible for creating and teaching a series of Jupyter notebooks for more than 150 students, showing the implementation of the most important models along with their benefits and drawbacks. The notebooks are available on this website, and more details on the course can be found here.
This course reviews recent advances in Foundation Models across multiple modalities. I am giving a lecture on “Training Models at Scale”, discussing various parallelism strategies for distributed training of large models. I also supervise a group of students in their research project.
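As a pointer to what the lecture covers, below is a minimal, hypothetical sketch of one such parallelism strategy, tensor (model) parallelism, expressed with JAX's sharding API; the layer, sizes, and names are illustrative and not taken from the lecture materials.

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One mesh axis "model": the weight matrix is split column-wise across devices,
# so each device stores and computes only a slice of the layer.
mesh = Mesh(mesh_utils.create_device_mesh((jax.device_count(),)), axis_names=("model",))

w = jax.device_put(jnp.ones((512, 2048)), NamedSharding(mesh, P(None, "model")))
x = jax.device_put(jnp.ones((4, 512)), NamedSharding(mesh, P()))  # activations replicated

@jax.jit
def layer(x, w):
    # Each device multiplies against its weight slice; XLA inserts the needed collectives.
    return jax.nn.relu(x @ w)

y = layer(x, w)
print(y.sharding)  # the output stays sharded along the "model" axis
```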
In this research-focused course, we discuss recent advances in Natural Language Processing and Computational Semantics. Topics include multi-task learning and meta-learning, as well as transfer learning from large language models (the BERT family) and multilingual tasks. The students work on research projects in groups, some of which lead to published conference papers. I have given a lecture on Transformer models and supervised a group of students in the 2020 and 2021 editions of the course.
This course teaches the fundamentals of Natural Language Processing, with the second half focusing on RNNs, attention, and applications such as machine translation and summarization.
This course focuses on four often-underrated topics in Artificial Intelligence: Fairness, Accountability, Confidentiality, and Transparency. The goal of the course is to gain a general understanding of these topics and to reproduce a published paper or method in one of the four fields.
The course discusses state-of-the-art techniques that constitute the core of information retrieval systems, such as search engines, recommender systems, and conversational agents. It focuses on evaluation, document representation and matching, learning to rank, and user interaction.
For the ASCI PhD course “Computer Vision by Learning, 2022” (website), I was the Head Teaching Assistant, responsible for designing and teaching the practical sessions. The one-week course covers the fundamentals of Computer Vision with Machine/Deep Learning, as well as recent trends and research in this domain, including Vision Transformers, Group-Equivariant CNNs, and Self-Supervised Learning. For the course, I created a series of five practicals in which students implement and experiment with these recent methods; the practicals are publicly available on this website.
I joined the 15th European Summer School on Information Retrieval (ESSIR 2024) as a lecturer, speaking about Transformers, LLMs, and how to train these models at larger scale.
If you are a student looking for a thesis supervisor and interested in a topic close to my research interests, feel free to send me an email.
Reviewing and presentation
I have been involved in the following conferences and journals:
Reviewer for: ECCV-2020, ICCV-2021, CausalUAI-2021, ICLR-2022, CVPR-2022, CLeaR-2022, NeurIPS-2022, CRL-2022, CML4Impact-2022, CDS-2022, nCSI-2022, ICLR-2023, CLeaR-2023, TSRL4H-2023, Physics4ML-2023, UAI-2023 (Top-Reviewer), NeurIPS-2023, ICML-2024, TMLR
Presented work at: NeurIPS-2020, ICLR-2021, ICLR-2022, ICML-2022, UAI-2022, ICLR-2023, ICML-2023, UAI-2023, NeurIPS-2023