MANTIS-4D and Physical AI hackathon
===================================

:Date: 2026-03-01
:Tags: hackathon, machine learning, so-101

:download:`MANTIS-4D.pdf <../_static/MANTIS-4Dv1.2.1.pdf>`

We developed a depth-based policy model for the LeRobot SO-101 arm.

Key points
----------

- We challenged ourselves with a hardware hackathon: something completely new and unfamiliar, for me at least!
- I wanted to develop a pose estimation model similar to the one presented in https://arxiv.org/html/2501.08329v1.
- In the end, we decided to add the depth estimates from the raw feed as features to the network, which substantially reduced the training loss and the train/val divergence issues of our first setup.
- With the help of AI (*sonnet-4.6+GPT-5.3-codex*), we collected our findings into a pre-print.
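
To make the depth-as-features idea concrete, here is a minimal sketch of one way a depth estimate could be fed to a network: stacking it onto the RGB frame as a fourth input channel. This is an illustration only and not necessarily how MANTIS-4D wires it in. The function name ``add_depth_channel``, the frame size and the min-max normalisation are all assumptions made for the example.

.. code-block:: python

    import numpy as np

    def add_depth_channel(rgb, depth):
        """Stack a per-pixel depth map onto an RGB frame as a fourth channel."""
        rgb = np.asarray(rgb, dtype=np.float32) / 255.0  # scale colours to [0, 1]
        depth = np.asarray(depth, dtype=np.float32)
        if depth.shape != rgb.shape[:2]:
            raise ValueError("depth map must match the frame's height and width")
        # Min-max normalise depth so it shares the [0, 1] range of the colours
        # (an assumed choice; other scalings are equally plausible).
        span = depth.max() - depth.min()
        depth = (depth - depth.min()) / span if span > 0 else np.zeros_like(depth)
        return np.concatenate([rgb, depth[..., None]], axis=-1)  # (H, W, 4)

    # Synthetic stand-ins for a camera frame and its depth estimate.
    frame = np.random.default_rng(0).integers(0, 256, size=(48, 64, 3), dtype=np.uint8)
    depth_map = np.linspace(0.2, 1.5, 48 * 64).reshape(48, 64)
    features = add_depth_channel(frame, depth_map)
    print(features.shape)  # (48, 64, 4)

The resulting array can be passed to any image encoder whose first layer accepts four input channels instead of three.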

Future
------

It was really cool to play around with the SO-101 arm and the leader-follower setup! LeRobot has matured as an ecosystem, and many teams were able to learn highly complex tasks through imitation learning from just under 20 hand-recorded episodes.

While our work focused on the computer-vision side of things, I believe that trying out new ways to combine predictions will lead to future improvements! I would love to continue this little detour to validate the approach on different setups and test its robustness.

Links
-----

- The paper that led to this idea: https://arxiv.org/html/2501.08329v1
- Our pre-print: :download:`MANTIS-4D.pdf <../_static/MANTIS-4Dv1.2.1.pdf>`
- Repo: https://github.com/PlayerPlanet/MANTIS-4D
- Datasets: https://huggingface.co/datasets/k-kiirikki/top_wrist_so101_with_depth
- Models: https://github.com/PlayerPlanet/MANTIS-4D/releases