Advice for machine learning PhD students

Markus Heinonen, Aalto University, 2023

The accompanying slide set is at users.aalto.fi/~heinom10/research-tips.pptx

Assorted wisdom

The four pillars. Computational science consists of (1) reading, (2) writing, (3) communicating, and (4) coding, in this order of importance. Maximize your time spent on (1,2,3) to minimize time spent on (4). Your equations will live, your code will be forgotten. Machine learning is a science, not a benchmark competition.
Yet, become a coding wizard. Write clean code and package it with good tutorials for the community on GitHub. Use software to automate experiments, runs and results. Generate figures and results automatically. Ask your group members for their best coding practices. Learn to distribute your code to a GPU cluster, so that you can start re-runs in minutes.
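One possible way to automate runs is a small script that enumerates a configuration grid, skips runs whose results are already on disk, and writes one JSON file per run. The sketch below is only an illustration under those assumptions: run_experiment, the grid keys and the results directory are hypothetical placeholders, not part of any particular library.

```python
import hashlib
import itertools
import json
from pathlib import Path


def run_experiment(config):
    """Hypothetical placeholder: replace with your real training and evaluation."""
    # A toy "score" so that the script runs end to end.
    return {"score": config["lr"] * config["width"]}


def config_id(config):
    """Stable short identifier derived from the sorted configuration."""
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha1(blob).hexdigest()[:10]


def run_grid(grid, out_dir):
    """Run every combination in `grid` once; finished runs are skipped on re-run."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    keys = sorted(grid)
    n_new = 0
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        path = out_dir / f"{config_id(config)}.json"
        if path.exists():  # result already on disk, nothing to do
            continue
        result = run_experiment(config)
        path.write_text(json.dumps({"config": config, "result": result}, indent=2))
        n_new += 1
    return n_new


def load_results(out_dir):
    """Collect all saved runs, e.g. to build tables and figures automatically."""
    return [json.loads(p.read_text()) for p in sorted(Path(out_dir).glob("*.json"))]


if __name__ == "__main__":
    grid = {"lr": [0.1, 0.01], "width": [32, 64], "seed": [0, 1, 2]}
    print("new runs:", run_grid(grid, "results"))
    print("total saved:", len(load_results("results")))
```

Because each result file is keyed by its configuration, running the script a second time starts only the missing runs, which is what makes re-runs cheap.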
Read. Topic papers, off-topic papers, landmark papers, textbooks. Learn by reading bad papers (ask your supervisor for review work). You need to become the world’s top expert in your PhD topic during the four years; plan accordingly and read continuously. Do not stop reading when you hit problems with your experiments; instead, start reading even more: all ML problems have already been solved by someone in some paper (almost surely).
Write. Formalize your ideas and models with precise math in LaTeX. Writing forces you to conceptualise and clarify your thoughts into hypotheses and claims. Aim at publication style. If you hit a problem in your experiments, spend more time concretizing your assumptions, hypotheses, models, background, context, motivation, related works, etc.
Communicate. Prepare presentation slides for every meeting, no matter how casual: your colleagues and supervisors will appreciate this. Aim at conference presentation quality. Practising presentations from the very beginning is helpful, since it forces you to conceptualise your work. Remember that your supervisors work on 10 other projects and need a context switch for every meeting. Include a setting slide, and include a slide on why we are meeting. Minimize text in slides. What are the main points you need to convey? Great slides usually have 1 picture and around 10 words per slide (not 100).
Conceptualise. Your math and illustrations reflect your conceptualisation of the model. If you have messy equations or figures, your thinking is still messy.
Visualise to understand. Plot or draw everything about your model: the loss, the optimisation, the network, the activations, the weights, the data, the likelihood, the gradients, the layers, etc. You need to understand your model inside-out.
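As a minimal sketch of what "plot everything" can mean in practice, the example below trains a tiny model and logs the loss, the gradient norm and the weights at every step, so that each quantity can get its own panel. The synthetic data, the learning rate and all function names are assumptions made for illustration; plot_history additionally assumes that matplotlib is installed.

```python
import math
import random


def make_data(n=100, seed=0):
    """Synthetic 1-D data y = 2x - 1 + noise (an assumption for illustration)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x - 1.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def train(xs, ys, lr=0.1, steps=200):
    """Gradient descent on mean squared error, logging everything worth plotting."""
    w, b = 0.0, 0.0
    history = {"loss": [], "grad_norm": [], "w": [], "b": []}
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        loss = sum(r * r for r in residuals) / n
        grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        # Log the state before the update, so step 0 shows the initial model.
        history["loss"].append(loss)
        history["grad_norm"].append(math.hypot(grad_w, grad_b))
        history["w"].append(w)
        history["b"].append(b)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b), history


def plot_history(history, path="training.png"):
    """One panel per logged quantity; requires matplotlib (imported lazily)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, len(history), figsize=(4 * len(history), 3))
    for ax, (name, values) in zip(axes, history.items()):
        ax.plot(values)
        ax.set_xlabel("step")
        ax.set_title(name)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


if __name__ == "__main__":
    xs, ys = make_data()
    (w, b), history = train(xs, ys)
    print(f"w={w:.2f}, b={b:.2f}, final loss={history['loss'][-1]:.4f}")
```

Calling plot_history(history) after training writes the panels to training.png. The same pattern (log first, plot later) carries over to activations, per-layer gradients and so on in a larger model.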
Run experiments slowly. Do not rush into training the big network right away; instead, take things slowly. You first need to study and understand the data, and then the problem (spend time on this). Identify the problem you want to solve in the baselines, and make sure it exists. Formulate hypotheses on how to improve. Run a sequence of more and more complex models. Start from linear regression, and build your way up. Try to change one thing at a time, and make sure you can quantify whether you improve or not. If you get stuck, avoid the temptation to spend more time running experiments. Instead, start reading, writing and discussing more to clarify that the problem is true and the solution is correct. Follow Karpathy’s neural network recipe http://karpathy.github.io/2019/04/25/recipe/.
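The idea of building up from simple models, changing one thing at a time and quantifying the change could look roughly like the sketch below. It compares a constant predictor with a closed-form linear regression on held-out data over several seeds. The synthetic data and all function names are assumptions for illustration; in a real project the next, more complex model would be added as one more entry in the models dictionary.

```python
import random
import statistics


def make_split(seed, n=200):
    """Synthetic data y = 3x + 0.5 + noise (assumed for illustration), split in half."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + 0.5 + rng.gauss(0.0, 0.3) for x in xs]
    half = n // 2
    return (xs[:half], ys[:half]), (xs[half:], ys[half:])


def fit_constant(xs, ys):
    """Simplest baseline: always predict the training mean."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Ordinary least squares for y = w*x + b in closed form."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return lambda x: w * x + b


def mse(model, xs, ys):
    """Mean squared error of `model` on the given points."""
    return statistics.fmean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


def compare(models, seeds):
    """Held-out MSE per model and seed, so differences can be judged against seed noise."""
    scores = {name: [] for name in models}
    for seed in seeds:
        (train_x, train_y), (test_x, test_y) = make_split(seed)
        for name, fit in models.items():
            scores[name].append(mse(fit(train_x, train_y), test_x, test_y))
    return scores


if __name__ == "__main__":
    models = {"constant": fit_constant, "linear": fit_linear}  # add the next model here
    scores = compare(models, seeds=range(5))
    for name, values in scores.items():
        print(f"{name:>8}: {statistics.fmean(values):.3f} +/- {statistics.stdev(values):.3f}")
```

Reporting the spread over seeds next to the mean is one simple way to check that an improvement is larger than the run-to-run noise.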
Don’t chase SOTA. Benchmark tables are not scientifically interesting: every year new methods crop up and errors go down brrrr. Instead aim at understanding the insight behind the contribution, or finding qualitative improvements, gaps in the literature, or problems behind SOTA models. These often come from understanding related works and your model in more depth.
Break your model. Stress-test your model until it breaks. What are its limits? This gives you a direct avenue to a second paper. Look at the “XAI question bank” https://arxiv.org/abs/2001.02478 (Fig. 1).
Solutions are cheap, problems are gold. Instead of finding solutions, focus on finding problems that are true. Find open problems by looking at what the state of the art can’t do, does poorly, or ignores. Focus your time on understanding the problem, and the solution will emerge.
Don’t hide from your supervisors. We love to talk about science, we love to be challenged and proven wrong, we love to hear about your ideas and progress. If you spend a week reading, don’t say that “I have no new results”; you have made lots of progress by learning new things. Actively ask for advice and feedback from your supervisor: meetings where only you talk benefit no one.
Be honest. Tell your supervisor when you don’t understand something or when you are struggling. Don’t nod if you didn’t understand; ask for clarification. Implying otherwise makes it difficult to work with you. You don’t need to know everything. Don’t imply that you are doing fine when you aren’t. Project meetings are not an exam: you don’t need to pass. Instead, you should give a transparent situation report so that people around you can help you. Don’t cancel meetings. Take notes during the meeting, and send them to everyone immediately afterwards.
Network and socialise. Attend a top conference in your field (NeurIPS/ICML/ICLR/etc) every year, even if you have no paper. The first time you will know no one, but the next time you will. Workshop papers and posters are a great way to get your foot in the door of the community. Attend journal clubs: establish one if none exists in your school. Make sure you have an online presence (website, blog or GitHub) so that colleagues and bigshots can find you. If you have no papers yet, having a technical blog is a good way to show your expertise.
Follow the domain. Be aware of all highlight papers in top conferences each year to see where the field is moving. Follow what your competitors are publishing. Use Google Scholar to follow seminal papers and their forward citations. Follow machine learning social media.
Attend a summer school. Go to one in your first year.
Love what you do. Move towards projects that interest you. Finish your current project regardless. Students who deliver are more likely to get the cool projects in the future. If your project is not progressing, take initiative. This is your PhD thesis and career: you need to drive it forward.
If you are stuck. Slow down, rethink what you are doing, and discuss with your colleagues (you will notice that people love to give advice!): what problem are you solving and is it the right problem? What is your goal again? If the problem is right, is the solution?
Organize your time. Make sure to spend ~20% of your time reading and another ~20% writing your ideas down. Do not slip from this. Keep (i) a research diary, (ii) a literature review and (iii) a technical report on your models. Share these with your supervisors through a permanent, single-click URL.
Queue your work. Research is a sequence of small tasks. Treat it as a FIFO queue: have a single active task at a time, and close and conclude all previous tasks. Do not multitask. Do not leave unfinished tasks. If your backlog is growing, stop and clear it first.
Study math. You want to understand linear algebra and vector calculus, probability and statistics and Bayes, differential and integral calculus. You will also benefit from measure theory, functional analysis, topology, differential geometry and complex analysis.
The story of a research project. Science is built on top of earlier work, and all science is incremental, even the work that claims it is not. The earlier state of the art is the bedrock, and your contribution is the house you build on top of it. The steps are:
1. Understand the research domain at a high level (don’t get scooped)
2. Be able to reproduce earlier, state-of-the-art results (find the bedrock)
3. Demonstrate a significant shortcoming in the state of the art (find a crack)
4. Find or adapt a known solution to the problem type (fill the crack)
5. Verify and illustrate the solution
Understand the foundations. Courses, Wikipedia or blogs do not give deep understanding. Read textbooks cover-to-cover: your PhD will become easier. You want to understand mathematical foundations (Deisenroth), statistical learning (Gelman, Tibshirani), classic machine learning (Bishop, Murphy) and deep learning (Goodfellow). Delving deeper into advanced topics is even better. If you only read one book, choose Murphy. Great books are:
- On math: Deisenroth et al “Mathematics for Machine Learning” and Petersen and Pedersen “The Matrix Cookbook”
- On probabilistic learning: Murphy “Probabilistic Machine Learning” series, Bishop “Pattern Recognition and Machine Learning”
- On learning theory: Mohri et al “Foundations of Machine Learning”, Shalev-Shwartz and Ben-David “Understanding Machine Learning”
- On statistical learning: Hastie, Tibshirani and Friedman “The Elements of Statistical Learning”
- On Bayesian modelling: Gelman et al “Bayesian Data Analysis”
- On deep learning: Goodfellow et al “Deep Learning”
- On generative models: Tomczak “Deep Generative Modeling”
- On information theory: MacKay “Information Theory, Inference, and Learning Algorithms”
Listen to big-shots
- Andrej Karpathy “A Survival Guide to a PhD” karpathy.github.io/2016/09/07/phd/
- Bill Freeman “How to do research” people.csail.mit.edu/billf/publications/How_To_Do_Research.pdf
- Eamonn Keogh “How to do good research” www.cs.ucr.edu/~eamonn/Keogh_SIGKDD09_tutorial.pdf
- https://agents.inf.ed.ac.uk/phd-handbook/