FGCI Tech: Software installations with a CI build system

Contents

  • I’ll try to explain our build system from the perspective of the problems it tries to solve
  • Goal: automate boring work as much as possible without compromising quality
  • The current system is the ~5th iteration and has been in use for about 2 years.

Step 1: Building software

  • We use Spack for compiled software (https://spack.io)
  • We use miniconda + mamba to create Anaconda environments for Python use (see the environment file sketch below)
  • We use Singularity to build containers for specialized software
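
For the conda environments, a minimal sketch of what such an environment file could look like (the packages are only illustrative, not one of our actual environments):

    # environment.yml -- illustrative example only
    name: example-python-env
    channels:
      - conda-forge
    dependencies:
      - python=3.9
      - numpy
      - scipy
    # solved quickly with mamba, e.g.: mamba env create -f environment.yml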

Task: Build software with Spack

  • Install OpenMPI: spack install openmpi@4.0.5

Problem:

  • Spack resolves dependencies dynamically = the same install command can lead to different dependency resolutions at different times
  • Packages have multiple variants = different build routes lead to lots of different installations

Solution:

  • Use Spack’s site configs to specify default versions / variants
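
A minimal sketch of what such a site-level config could look like (the versions and variants below are illustrative, not our actual defaults):

    # $SPACK_ROOT/etc/spack/packages.yaml -- illustrative site config
    packages:
      openmpi:
        version: [4.0.5]            # default version chosen by the concretizer
        variants: +pmi fabrics=ucx  # default variants for openmpi builds
      all:
        compiler: [gcc@8.4.0]       # preferred compiler for all packages
        target: [haswell]           # preferred target architecture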

Task: Build for specific architecture with specific default compilers

  • spack install openmpi@4.0.5 %gcc@8.4.0 arch=linux-centos7-haswell

Problem:

  • Sometimes Spack does not propagate architecture optimization flags to builds

Solution:

  • Write default compiler options to ~/.spack/linux/compilers.yaml
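
A sketch of what such an entry could look like (the paths and flags below are illustrative):

    # ~/.spack/linux/compilers.yaml -- illustrative compiler entry
    compilers:
    - compiler:
        spec: gcc@8.4.0
        operating_system: centos7
        target: x86_64
        modules: []
        paths:
          cc: /usr/bin/gcc
          cxx: /usr/bin/g++
          f77: /usr/bin/gfortran
          fc: /usr/bin/gfortran
        flags:                      # force the architecture flags into every build
          cflags: -O2 -march=haswell
          cxxflags: -O2 -march=haswell
          fflags: -O2 -march=haswell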

Task: Repeat 100x

Problem:

  • Lots of commands and configuration options to write / remember
  • Configuration is not versioned

Solution:

  • Python code that creates the commands from a minimal yaml file, runs them, and does the deployment with rsync (https://github.com/AaltoSciComp/science-build-rules); see the sketch after this list
  • Put the Spack site configs and build configs in a git repository (https://github.com/AaltoSciComp/science-build-configs)
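
To give an idea of the approach, a hypothetical minimal build config could look roughly like this (the actual format used by science-build-configs differs in its details):

    # hypothetical minimal build config -- illustrative only
    spack:
      compilers:
        - gcc@8.4.0
      packages:
        - openmpi@4.0.5 %gcc@8.4.0 arch=linux-centos7-haswell
        - fftw@3.3.8 +mpi %gcc@8.4.0
    deployment:
      method: rsync
      target: /appl/spack           # illustrative deployment path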

Step 2: Do it automatically

  • Consistent builds are a step in the right direction, but not enough
  • Move towards CI

Task: Ownership troubles

  • If admins run the commands, there will be problems

Problem:

  • Files owned by different users = builds often fail
  • Settings that admins have set might affect the build

Solution:

  • Create a user that runs the buildrules (for us, triton-ci)

Task: Builds can be heavy

Problem:

  • Builds can create huge amounts of temporary files and take a long time

Solution:

  • Run builds on a separate machine (for us, a Dell workstation with a few SSDs + an HDD for the end products)

Task: Build with desired OS

Problem:

  • Build environment should be similar to the system where software will be run
  • System libraries + mountpoints should be similar

Solution:

  • Run builds in Docker containers (we build for our workstations, Ubuntu 20.04, and for our cluster, CentOS 7.9)

Task: Do builds automatically

Problem:

  • Builds should happen automatically after configurations have been updated in git

Solution:

  • Use buildbot (https://buildbot.net) to run the builds in containers

Task: Set up everything together

Problem:

  • The CI system + builders should be set up from configurations

Solution:

  • science-build-rules has a ci-builder that sets up the builder
  • The end product is a folder with a docker-compose setup that one can use to bring the whole system up (sketched below)
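
As an illustration, such a docker-compose setup could look roughly like this (the service names, images, and mount paths are made up, not the generated files):

    # docker-compose.yml -- illustrative sketch, not the generated file
    version: "3"
    services:
      buildbot-master:
        image: buildbot/buildbot-master
        ports:
          - "8010:8010"                            # buildbot web UI
      builder-centos7:
        image: science-build-environment:centos7   # illustrative image name
        volumes:
          - /scratch/build:/build                  # fast SSD scratch for builds
          - /appl/software:/appl/software          # deployment target
      builder-ubuntu2004:
        image: science-build-environment:ubuntu20.04
        volumes:
          - /scratch/build:/build
          - /appl/software:/appl/software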

Demo

  • Quick demo on how we install packages

Current problems

Problem:

  • Deployment of the CI system itself is not good.
  • Updating spack or building from upstream can result in “build avalanches” when some package gets a new version / variant.

Solution:

  • We’re switching towards Ansible so that we can also set up all of the requirements (a sketch follows this list).
  • We’re moving to a two-step strategy:
    • one slower-moving build that builds the base compilers etc.
    • one faster build that builds the end products using the software from the first step
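
For example, host-level requirements such as the build user and directories could then be described as ordinary Ansible tasks; a minimal sketch with illustrative names and paths:

    # illustrative Ansible tasks for setting up build requirements
    - name: Create the CI build user
      ansible.builtin.user:
        name: triton-ci
        state: present

    - name: Create the build scratch directory
      ansible.builtin.file:
        path: /scratch/build        # illustrative path
        state: directory
        owner: triton-ci
        mode: "0755"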

Current problems

Problem:

  • science-build-rules is pretty scripty at times, and the easier configs can actually make adopting it more complicated.
  • The build system sometimes moves so fast that the documentation is not kept up to date.

Solution:

  • The code should be streamlined so that it is easier to read + adopt.
  • We’re increasing the number of people involved in the project, which forces us to document it better.