Markus Heinonen, PhD

I am an Academy Research Fellow at Aalto University and the Finnish Center for Artificial Intelligence (FCAI), associated with research groups:

Office: room B360, CS-building, Aalto, Konemiehentie 2, 02150 Espoo, Finland
mobile: +358 44 294 2600

My guidelines for PhD students on doing machine learning research.

Research topics
  • Deep Bayesian learning
  • Non-stationary Gaussian processes
  • Spectral kernels
  • Nonparametric dynamics of ODEs, SDEs, PDEs
  • Differential flows
  • Bioinformatics: metabolites, proteins, genomes


48. Uncertainty-Aware Natural Language Inference with Stochastic Weight Averaging
Aarne Talman, Hande Celikkanat, Sami Virpioja, Markus Heinonen, Jörg Tiedemann
NoDaLiDa 2023
[ arXiv | pdf ]

47. AbODE: Ab initio antibody design using conjoined ODEs
Yogesh Verma, Markus Heinonen, Vikas Garg
ICML 2023

46. Learning Energy Conserving Dynamics Efficiently with Hamiltonian Gaussian Processes
Magnus Ross, Markus Heinonen
TMLR 2023
[ arxiv | website | code ]

45. Incorporating functional summary information in Bayesian neural networks using a Dirichlet process likelihood approach
Vishnu Raj, Tianyu Cui, Markus Heinonen, Pekka Marttinen
[ arXiv ]

44. Latent Neural ODEs with Sparse Bayesian Multiple Shooting
Valerii Iakovlev, Cagatay Yildiz, Markus Heinonen, Harri Lähdesmäki
ICLR, 2023
[ arXiv ]

43. Generative Modelling With Inverse Heat Dissipation
Severi Rissanen, Markus Heinonen, Arno Solin
ICLR, 2023
[ arXiv | website ]

42. Human-in-the-loop assisted de novo molecular design
Iiris Sundin, Alexey Voronov, Haoping Xiao, Kostas Papadopoulos, Esben Bjerrum, Markus Heinonen, Atanas Patronov, Samuel Kaski, Ola Engkvist
Journal of Cheminformatics, 2022
[ ChemRxiv | paper ]

41. TCRconv: predicting recognition between T cell receptors and epitopes using contextualized motifs
Emmi Jokinen, Alexandru Dumitrescu, Jani Huuhtanen, Vladimir Gligorijevic, Satu Mustjoki, Richard Bonneau, Markus Heinonen, Harri Lähdesmäki
Bioinformatics 2022
[ bioRxiv | paper ]

40. Modular Flows: Differential Molecular Generation
Yogesh Verma, Samuel Kaski, Markus Heinonen, Vikas Garg
NeurIPS 2022
[ arXiv | pdf | website | github ]

39. Tackling covariate shift with node-based Bayesian neural networks
Trung Trinh, Markus Heinonen, Luigi Acerbi, Samuel Kaski
ICML 2022 (oral presentation)
[ arXiv | website ]

38. Variational multiple shooting for Bayesian ODEs with Gaussian processes
Pashupati Hegde, Cagatay Yildiz, Harri Lähdesmäki, Samuel Kaski, Markus Heinonen
UAI 2022
[ arXiv | pdf ]

37. Likelihood-Free Inference with Deep Gaussian Processes
Alexander Aushev, Henri Pesonen, Markus Heinonen, Jukka Corander, Samuel Kaski
Computational Statistics and Data Analysis 2022, 174:107529
[ arXiv | pdf | paper | doi ]

36. Modeling binding specificities of transcription factor pairs with random forests
Anni Antikainen, Markus Heinonen, Harri Lähdesmäki
BMC Bioinformatics 2022, 23:212
[ doi | paper ]

35. Prediction and impact of personalized donation intervals
Jarkko Toivonen, Yrjö Koski, Esa Turkulainen, Femmeke Prinsze, Pietro della Briotta Parolo, Markus Heinonen, Mikko Arvas
Vox Sanguinis 2021
[ doi:10.1111/vox.13223 | paper ]

34. De-randomizing MCMC dynamics with the diffusion Stein operator
Zheyang Shen, Markus Heinonen, Samuel Kaski
NeurIPS 2021
[ arXiv | pdf ]

33. Bayesian Inference for Optimal Transport with Stochastic Cost
Anton Mallasto, Markus Heinonen, Samuel Kaski
ACML 2021
[ arXiv | pdf ]

32. Continuous-Time Model-Based Reinforcement Learning
Çağatay Yıldız, Markus Heinonen, Harri Lähdesmäki
ICML 2021
[ arXiv | pdf ]

We propose a continuous-time model-based reinforcement learning setting and derive a continuous-time actor-critic algorithm.

31. Predicting recognition between T cell receptors and epitopes with TCRGP
Emmi Jokinen, Markus Heinonen, Jani Huuhtanen, Satu Mustjoki, Harri Lähdesmäki
PLOS Computational Biology, 2021
[ bioRxiv | article | github ]

We propose GP classifiers to determine T cell receptor epitope specificity from human and mouse subjects. We employ multiple kernel learning to determine the relevance of sequence positions. The classifier achieves better accuracy than earlier kernel-based and random-forest-based methods.

30. Learning continuous-time PDEs from sparse data with graph neural networks
Valerii Iakovlev, Markus Heinonen, Harri Lähdesmäki
ICLR 2021
[ arXiv | pdf ]

29. Sparse Gaussian Processes Revisited: Bayesian Approaches to Inducing-Variable Approximations
Simone Rossi, Markus Heinonen, Edwin Bonilla, Zheyang Shen, Maurizio Filippone
[ arXiv | pdf ]

28. Substrate specificity of 2-Deoxy-D-ribose 5-phosphate aldolase (DERA) assessed by different protein engineering and machine learning methods
Sanni Voutilainen, Markus Heinonen, Martina Andberg, Emmi Jokinen, Hannu Maaheimo, Johan Pääkkönen, Nina Hakulinen, Juha Rouvinen, Harri Lähdesmäki, Samuel Kaski, Juho Rousu, Merja Penttilä, Anu Koivula
Applied Microbiology and Biotechnology, 2020
[ paper ]

Sample-efficient reinforcement learning using deep Gaussian processes
Charles Gadd, Markus Heinonen, Harri Lähdesmäki, Samuel Kaski
Technical report, 2020
[ arXiv | pdf ]

Scalable Bayesian neural networks by layer-wise input augmentation
Trung Trinh, Samuel Kaski, Markus Heinonen
Technical report, 2020
[ arXiv | pdf ]

27. Learning spectrograms with convolutional spectral kernels
Zheyang Shen, Markus Heinonen, Samuel Kaski
[ pdf | arXiv ]

We propose convolutional kernel structures with well-defined spectrograms to shed light on kernel learning and deep Gaussian processes.

26. ODE2VAE: Deep generative second order ODEs with Bayesian neural networks
Cagatay Yildiz, Markus Heinonen, Harri Lähdesmäki
NeurIPS 2019
[ pdf | arXiv ]

We propose VAEs for dynamical systems in which the latent dynamics follow a second-order ODE with Bayesian neural networks.

25. Deep Convolutional Gaussian Processes
Kenneth Blomqvist, Samuel Kaski, Markus Heinonen
[ pdf | arXiv | github ]

We propose deep convolutional GPs that repeatedly convolve the image signal with Gaussian processes. We achieve state-of-the-art accuracy on CIFAR-10 and MNIST, improving on deep GPs and convolutional GPs by over 10 percentage points on CIFAR-10.

24. Bayesian Metabolic Flux Analysis reveals intracellular flux couplings
Markus Heinonen, Maria Osmala, Henrik Mannerström, Janne Wallenius, Samuel Kaski, Juho Rousu, Harri Lähdesmäki
[ pdf | arXiv | github ]

We propose Bayesian flux analysis, where full flux distributions are modeled and sampled, in contrast to point-wise FBA estimates.

23. Deep learning with differential Gaussian process flows
Pashupati Hegde, Markus Heinonen, Harri Lähdesmäki, Samuel Kaski
AISTATS 2019, PMLR 89:1812-1821, notable paper award (top 1%)
[ pdf | arXiv | github | poster ]

We propose a novel machine learning paradigm based on SDE-GP flows that warp the inputs before a final classification or regression function. We achieve better performance than deep GPs.

22. Harmonizable mixture kernels with variational Fourier features
Zheyang Shen, Markus Heinonen, Samuel Kaski
AISTATS 2019, PMLR 89:3273-3282
[ pdf | arXiv ]

We propose theoretical foundations of non-stationary kernels through harmonizable covariances, and present a practical harmonizable mixture kernel that admits variational Fourier features. We propose Wigner distributions to visualise and interpret spectral kernels.

21. Learning Stochastic Differential Equations With Gaussian Processes Without Gradient Matching
Cagatay Yildiz, Markus Heinonen, Jukka Intosalmi, Henrik Mannerström, Harri Lähdesmäki
Machine Learning in Signal Processing, MLSP 2018
[ pdf | arXiv | github ]

We learn SDEs by fitting drift and diffusion functions, modelled as arbitrary sparse Gaussian processes, to match observed trajectories. We extend the sensitivity equations to the Euler-Maruyama approximation.

20. Learning unknown ODE models with Gaussian processes
Markus Heinonen, Cagatay Yildiz, Henrik Mannerström, Jukka Intosalmi, Harri Lähdesmäki
International Conference of Machine Learning, PMLR 80:1959-1968, ICML 2018
[ pdf | arXiv | github ]

We propose to fit black-box ODE models to arbitrary data. We represent ODE differentials as Gaussian processes, and propose efficient sensitivity equations to optimize the models.

19. Variational zero-inflated Gaussian processes with sparse kernels
Pashupati Hegde, Markus Heinonen, Samuel Kaski
Uncertainty in Artificial Intelligence, UAI 2018
[ pdf | arXiv | github ZIGP | github GPRN ]

We propose a sparse kernel that can learn zero covariances and predict exact zeros. We also derive sparse GPRN latent mixing models and their SVI bounds.

Neural Non-Stationary Spectral Kernel
Sami Remes, Markus Heinonen, Samuel Kaski
[ arXiv | pdf ]

We propose a neural network parameterisation of the Generalised Spectral Mixture (GSM) kernel. The DNN parameterisation improves the scalability and efficiency of the spectral kernel.

18. Temporal clustering analysis of endothelial cell gene expression following exposure to a conventional radiotherapy dose fraction using Gaussian process clustering
Markus Heinonen, Fabien Milliat, Mohamed Benadjaoud, Agnès François, Valérie Buard, Georges Tarlet, Florence d’Alché-Buc, Olivier Guipaud
PLOS ONE, 0204960, 2018
[ paper ]

We propose temporally clustering endothelial cell gene expression profiles, which reveals clusters of similarly expressed genes. We propose a new temporal clustering technique over Bayesian expression models.

17. Learning with multiple pairwise kernels for drug bioactivity prediction
Anna Cichonska, Tapio Pahikkala, Sandor Szedmak, Heli Julkunen, Antti Airola, Markus Heinonen, Tero Aittokallio, Juho Rousu
Bioinformatics, 34(13):i509–i518, ISMB 2018
[ paper | code ]

16. mGPfusion: Predicting protein stability changes with Gaussian process kernel learning and data fusion
Emmi Jokinen, Markus Heinonen, Harri Lähdesmäki
Bioinformatics, 34(13):i274-i283, ISMB 2018
[ paper | arXiv | pdf | github ]

We combine experimental and simulated data to learn a Gaussian process based protein stability predictor. We propose a Bayesian data transformation that calibrates the simulated data against the experimental data. Our method requires fewer experimental measurements because it also uses simulated data.

15. Flex ddG: Rosetta Ensemble-Based Estimation of Changes in Protein-Protein Binding Affinity Upon Mutation
Kyle Barlow, Shane O Conchuir, Samuel Thompson, Pooja Suresh, James Lucas, Markus Heinonen, Tanja Kortemme
Journal of Physical Chemistry B, 122(21):5389-5399, 2018
[ paper ]

14. A Mutually-Dependent Hadamard Kernel for Modelling Latent Variable Couplings
Sami Remes, Markus Heinonen, Samuel Kaski
Asian Conference on Machine Learning, PMLR 77:455-470, ACML 2017
[ paper | arXiv | pdf | github ]

We introduce a new non-stationary kernel between inputs and signals, which allows non-stationary couplings between latent variables. The new kernel is based on the Gibbs kernel and the generalised Wishart process.

13. Non-Stationary Spectral Kernels
Sami Remes, Markus Heinonen, Samuel Kaski
Neural Information Processing Systems, NIPS 2017
[ paper | arXiv | pdf | github ]

We introduce non-stationary spectral kernels, which can learn covariances based on input-dependent frequencies (e.g. wavelets). We model the input-dependent frequencies as Gaussian process mixtures, and can learn signals with varying frequencies.

12. Random Fourier Features for operator-valued kernels
Romain Brault, Markus Heinonen, Florence d'Alche-Buc
Asian Conference on Machine Learning, PMLR 63:110-125, ACML 2016
[ abstract | PDF ]

We introduce random Fourier features for vector-valued function learning, i.e. RFFs for operator-valued kernels.

11. Non-Stationary Gaussian Process Regression with Hamiltonian Monte Carlo
Markus Heinonen, Henrik Mannerström, Juho Rousu, Samuel Kaski, Harri Lähdesmäki
Artificial Intelligence and Statistics, JMLR 51:732-740, AISTATS 2016
[ abstract | PDF | supplements | code ]

We model all kernel parameters and the noise as separate Gaussian processes which are smoothly input-dependent. HMC sampling reveals the full parameter function posteriors.

10. Genome wide analysis of protein production load in Trichoderma reesei
Tiina Pakula, Heli Nygren, Dorothee Barth, Markus Heinonen, Sandra Castillo, Merja Penttilä, Mikko Arvas
Biotechnology for Biofuels, 9:132, 2016
[ abstract ]

Transcriptomics and metabolic analysis of Trichoderma reesei protein production.

9. Detecting time periods of differential gene expression using Gaussian processes: An application to endothelial cells exposed to radiotherapy dose fraction
Markus Heinonen, Olivier Guipaud, Fabien Milliat, Valerie Buard, Beatrice Micheau, Georges Tarlet, Marc Benderitter, Farida Zehraoui, Florence d'Alche-Buc
Bioinformatics, 31(5):728-735, 2015
[ abstract | nsgp R package ]

We propose a two-sample differential testing model on Gaussian processes. We introduce a new two-sample test that is continuous along time and results in differential confidences along time.

Learning nonparametric differential equations with operator-valued kernels and gradient matching
Markus Heinonen, Florence d'Alche-Buc
arXiv, 2014
[ arXiv ]

We propose learning ODEs as operator-valued kernel functions.

Time-dependent Gaussian process regression and significance analysis for sparse time-series
Markus Heinonen, Olivier Guipaud, Fabien Milliat, Valerie Buard, Beatrice Micheau, Florence d'Alche-Buc
Machine Learning in Systems Biology, MLSB 2013

We propose non-stationary kernels for Gaussian processes and a new Gaussian process optimization criterion suitable for sparse data. We propose new likelihood ratio tests for significance analysis using GPs.

8. Metabolite Identification through Machine Learning -- Tackling CASMI Challenge using FingerID
Huibin Shen, Nicola Zamboni, Markus Heinonen, Juho Rousu
Metabolites, 3(2):484-505, 2013
[ abstract ]

Our experiences in the CASMI metabolite identification challenge.

Full waveform forward seismic modeling of geologically complex environment: Comparison of simulated and field seismic data
Suvi Heinonen, Markus Heinonen and Emilia Koivisto
European Geosciences Union (EGU) General Assembly, 2012
[ abstract ]

7. Metabolite identification and fingerprint prediction via machine learning
Markus Heinonen, Huibin Shen, Nicola Zamboni, Juho Rousu
Bioinformatics, 28(18):2333-41, 2012
[ abstract | preprint PDF ]

First application of machine learning to identify metabolites based on MS/MS data. We use a probability product kernel over mass spectral features to learn a mapping between the mass spectrum and binary structural properties of the unknown metabolite. We show that the properties can be used to query the unknown structure from e.g. PubChem.

6. Efficient path kernels for reaction function prediction
Markus Heinonen, Niko Välimäki, Veli Mäkinen, Juho Rousu
International Conference on Bioinformatics Models, Methods and Algorithms [BIOINFORMATICS], pages 202-207, 2012
[ abstract | preprint PDF ]

We introduce the first feasible path-based graph kernel. The main contribution is to apply a compressed string index to store millions of paths efficiently. We utilize the path kernel to predict chemical reaction function (EC class) over reaction graphs.

5. Computing atom mappings for biochemical reactions without subgraph isomorphism
Markus Heinonen, Sampsa Lappalainen, Taneli Mielikäinen, Juho Rousu
Journal of Computational Biology, 18(1):43-58, 2011
[ abstract | preprint PDF | KEGG 01/2009 atommappings | bin + src ]

We study the problem of mapping the atoms between reactants and products in a chemical reaction. We introduce the first definition of optimality of such mappings through graph edit distance. An A* algorithm is applied to compute the optimal mappings of KEGG reactions. We also introduce atom level descriptors through a message passing algorithm.

4. Structured output prediction of anti-cancer drug activity
Hongyu Su, Markus Heinonen, Juho Rousu
Pattern Recognition in Bioinformatics, PRIB 2010
[ abstract | PDF ]

We utilize MMCRF for structured output prediction of small molecules' effectiveness against 59 cancer cell lines. Structured prediction clearly outperforms individual SVMs. However, the structure of the outputs seems to have little effect on performance.

3. Multilabel Classification of Drug-like Molecules via Max-margin Conditional Random Fields
Hongyu Su, Markus Heinonen, Juho Rousu
Probabilistic Graphical Models, PGM 2010
[ PDF ]

2. FiD: a software for ab initio structural identification of product ions from tandem mass spectrometric data
Markus Heinonen, Ari Rantanen, Taneli Mielikäinen, Juha Kokkonen, Jari Kiuru, Raimo Ketola, Juho Rousu
Rapid Communications in Mass Spectrometry, 22(19):3043-3052, 2008
[ abstract | PDF ]

We introduce software for identifying product ions from MS/MS data. The method outperforms rule-based methods on our dataset of amino acids and sugar phosphates.

1. Ab initio prediction of molecular fragments from tandem mass spectrometry data
Markus Heinonen, Ari Rantanen, Taneli Mielikäinen, Esa Pitkänen, Juha Kokkonen, Juho Rousu
German Conference on Bioinformatics, 83:40-53, GCB 2006
[ PDF ]

We present a combinatorial algorithm for searching plausible fragment structures for product ion peaks, based on a bond energy scoring function. We also introduce a mixed integer linear programming algorithm for choosing an optimal fragmentation tree.