I am a Post-Doctoral Research Associate co-affiliated with the Department of Computing and the Department of Earth Science, Imperial College London. My research focuses on HPC and compiler intermediate representations (IRs), mainly on automating cache-level optimizations for stencil computations.
I received my PhD from the Department of Computing, Imperial College London, in 2023. I hold an MSc in Advanced Computer and Communication Systems, with a specialization in Intelligent Systems - Methods of Computational Intelligence and Applications, and a Diploma in Electrical and Computer Engineering, both from the Aristotle University of Thessaloniki.
Most of my time is allocated to the following projects:
Previously, I worked as a researcher on the DigiPro project at the Faculty of Engineering, Aristotle University of Thessaloniki.
Please reach out if you are interested in cache optimizations, compiler IRs, loop-level transformations, parallel programming, shared- and distributed-memory parallelism, benchmarking, or reproducibility.
PhD in High Performance, Embedded and Distributed Systems, 2023
Imperial College London
MSc in Intelligent Systems - Methods of Computational Intelligence and Applications, 2019
Aristotle University of Thessaloniki (AUTH)
Dipl. Eng. in Electrical and Computer Engineering, 2017
Aristotle University of Thessaloniki (AUTH)
Responsibilities include:
Stencil kernels dominate a range of scientific applications, including seismic and medical imaging, image processing, and neural networks. Temporal blocking is a performance optimization that aims to reduce the required memory bandwidth of stencil computations by re-using data from the cache for multiple time steps. It has already been shown to be beneficial for this class of algorithms. However, applying temporal blocking to practical applications’ stencils remains challenging. These computations often consist of sparsely located operators not aligned with the computational grid (“off-the-grid”). Our work is motivated by modelling problems in which source injections result in wavefields that must then be measured at receivers by interpolation from the gridded wavefield. The resulting data dependencies make the adoption of temporal blocking much more challenging. We propose a methodology to inspect these data dependencies and reorder the computation, leading to performance gains in stencil codes where temporal blocking has not been applicable. We implement this novel scheme in the Devito domain-specific compiler toolchain. Devito implements a domain-specific language embedded in Python to generate optimized partial differential equation solvers using the finite-difference method from high-level symbolic problem definitions. We evaluate our scheme using isotropic acoustic, anisotropic acoustic, and isotropic elastic wave propagators of industrial significance. After auto-tuning, our evaluation shows that temporal blocking improves performance by up to 1.6x over highly optimized, vectorized, spatially blocked code.
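As a rough illustration of the general idea of temporal blocking mentioned in the abstract above, the sketch below advances each spatial tile of a 1D three-point stencil through several time steps before moving on to the next tile. It is a toy under stated assumptions and is not the scheme implemented in Devito: it uses overlapped tiling with redundant halo computation, fixed boundary values, and no off-the-grid sources or receivers. The names `step`, `naive`, `time_tiled`, `tile`, and `tb` are hypothetical, and in pure Python the code only demonstrates the schedule and its correctness, not any cache benefit.

```python
import numpy as np


def step(u):
    """One Jacobi sweep of a 3-point averaging stencil; both endpoints stay fixed."""
    v = u.copy()
    v[1:-1] = (u[:-2] + u[1:-1] + u[2:]) / 3.0
    return v


def naive(u, nt):
    """Reference schedule: sweep the whole grid once per time step."""
    for _ in range(nt):
        u = step(u)
    return u


def time_tiled(u, nt, tile=8, tb=4):
    """Overlapped temporal blocking: each tile advances up to `tb` steps at once."""
    n = u.size
    u = u.copy()
    for t0 in range(0, nt, tb):
        steps = min(tb, nt - t0)          # time steps fused in this block
        out = np.empty_like(u)
        for lo in range(0, n, tile):
            hi = min(lo + tile, n)
            # Load the tile plus a halo of `steps` cells per side (clipped at the domain edges).
            a = max(lo - steps, 0)
            b = min(hi + steps, n)
            local = u[a:b].copy()
            # Stale values creep inward one cell per step from each artificial edge,
            # so after `steps` sweeps only the halo is contaminated and [lo, hi) is exact.
            for _ in range(steps):
                local = step(local)
            out[lo:hi] = local[lo - a:hi - a]
        u = out
    return u


# Usage: both schedules should agree on the final field.
u0 = np.sin(np.linspace(0.0, 3.0, 37))
assert np.allclose(naive(u0, 10), time_tiled(u0, 10, tile=8, tb=4))
```

The halo width equals the number of fused time steps because a three-point stencil propagates information by one cell per step. Wider stencils would need a proportionally wider halo, and schemes that avoid the redundant halo work (for example wavefront or skewed tiling) trade this simplicity for more intricate loop bounds.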
Feel free to contact me via the following form: