Final Report of Google Summer Of Code 2019 with CROSS

Jayjeet Chakraborty
Published in getpopper
Aug 19, 2019


In this article, I describe my contributions to the Popper 2.x project during the Google Summer of Code 2019 program. Popper is an experimentation protocol for organizing an academic article's artifacts following a DevOps approach. The Popper 2.0 project began when the Popper 1.x CLI tool was migrated to specify pipelines as workflows and actions. My work consisted of extending the Popper 2.x CLI tool and adding new features to it. I have documented my work as a list of merged pull requests in chronological order, from the proposal period to the end of the coding period.

Proposal and community bonding period

  • March 5, 2019 & March 7, 2019— #519

While trying to build the Popper codebase, I found a small bug: a missing comma in the setup.py file. I fixed it and sent a pull request, and it got merged. Hooray! I had made my first contribution to the Popper codebase. I also updated the repository's .gitignore to include the build/ and dist/ folders and got that merged too. By now, I was starting to get familiar with the codebase.

  • March 15, 2019 — #529

This PR implemented validation rules for the HCL-based workflow syntax, following the workflow configuration options defined by GitHub. By now, I was pretty comfortable with the project's codebase.

I implemented the popper scaffold subcommand, which helps new Popper users get started with a basic workflow. I also fixed a bug in the workflow validation feature that had been merged the previous day. Along the way, I learned about click, a great library for building CLI tools.

  • March 23, 2019 — #548

Up to this point, all git-related operations were implemented by executing raw git commands, which was not a good practice. So, through this PR, I migrated the codebase to use the GitPython library for all git-related operations.

In a GitHub workflow, there are always some actions that can run in parallel. We call such a group of actions a stage. Through this PR, I implemented the --parallel flag for popper run which, when passed, runs the actions in a stage in parallel.
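As a rough illustration of the idea of stages, here is a minimal sketch and not Popper's actual implementation. It assumes a workflow is given as a plain dictionary mapping each action name to the actions it needs. The names build_stages, run_workflow, and run_action are hypothetical and exist only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def build_stages(needs):
    """Group actions into stages: an action joins a stage once all of
    its dependencies have been placed in earlier stages."""
    remaining = {name: set(deps) for name, deps in needs.items()}
    done, stages = set(), []
    while remaining:
        ready = sorted(n for n, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cyclic or unknown dependency between actions")
        stages.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return stages


def run_workflow(needs, run_action, parallel=False):
    """Run stages in order; actions inside a stage run concurrently
    only when parallel=True (a stand-in for the --parallel flag)."""
    results = {}
    for stage in build_stages(needs):
        if parallel:
            with ThreadPoolExecutor(max_workers=len(stage)) as pool:
                outcomes = list(pool.map(run_action, stage))
        else:
            outcomes = [run_action(name) for name in stage]
        results.update(zip(stage, outcomes))
    return results


# Hypothetical workflow: "test" needs "build"; "deploy" needs "test" and "lint".
needs = {"build": [], "lint": [], "test": ["build"], "deploy": ["test", "lint"]}
stages = build_stages(needs)
```

Here build and lint have no dependencies, so they land in the same stage and are the only actions that would actually run concurrently.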

Another small change separated cloned action repositories by the git hosting service they belong to, such as github.com, gitlab.com, or bitbucket.org.

  • March 27, 2019 — #552; March 28, 2019 — #569

Until now, Popper supported only the Docker runtime for executing GitHub actions. We soon realized that it would be great to execute workflows in HPC environments as well. Since Docker isn't supported in HPC environments, we decided to add Singularity runtime support alongside Docker, as Singularity can run completely rootless.

These two PRs provide an initial implementation of the Singularity runtime, which lets the actions in a workflow run in Singularity containers. Later, I also added a --runtime flag to popper run to specify which container runtime to use in different environments.

Through these three PRs, I implemented a new popper add command that let users fetch remote workflows easily from within the Popper tool. The popper ci command was also upgraded to install Popper in CI environments through pip; earlier, this was done by adding the binary to $PATH.

I worked on a number of small bugs and modifications, and revised some of the features I had previously implemented in Popper. I also worked on keeping the codebase healthy, since we were moving at a very fast pace.

  • May 11, 2019 — #639, May 14, 2019 — #645

I added a special value for the uses attribute, named sh, which simplified the very frequent task of executing shell commands on the host machine. We also started to support installing the Singularity runtime through the popper ci command by implementing an --install option that takes runtime names as values.

The first month

  • May 28, 2019 — #656

By now, Docker and Singularity were the supported runtimes in Popper. However, when somebody ran a workflow with a runtime that Popper supported but that was not installed on the host machine, they got an ugly stack trace. I fixed this through this PR by checking for the required software dependencies on the host machine during the initialization phase.
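A check of this kind could look roughly like the sketch below. It is only an illustration under my own assumptions: the names RUNTIME_BINARIES, MissingRuntimeError, and check_runtime are hypothetical and not taken from the Popper codebase. It relies on the standard library's shutil.which to look for the runtime's executable on the PATH.

```python
import shutil

# Assumed mapping from runtime name to the executable it needs on the host.
RUNTIME_BINARIES = {"docker": "docker", "singularity": "singularity"}


class MissingRuntimeError(RuntimeError):
    """Raised when a supported runtime is not installed on the host."""


def check_runtime(runtime, which=shutil.which):
    """Return the binary name for `runtime`, or fail early with a
    readable message instead of a stack trace deep inside a run."""
    binary = RUNTIME_BINARIES.get(runtime.lower())
    if binary is None:
        raise ValueError(f"unsupported runtime: {runtime}")
    if which(binary) is None:
        raise MissingRuntimeError(
            f"'{binary}' was not found on PATH; install it or pick another runtime"
        )
    return binary
```

The which parameter is injectable only so that the check can be exercised without having either runtime installed.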

I implemented the --skip option for popper run. This option lets users pass a list of actions to skip while executing a workflow. I took this opportunity to refactor all the parsing and validation logic into a separate module and to make parsed workflows immutable. As a result, any modification to a workflow, such as skipping actions or filtering out a specific action, operates on a copy of the original workflow and returns the modified copy.
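The copy-on-modify idea can be sketched as follows. This is a simplified illustration, not the real parser module: the workflow is assumed to be a mapping of action names to attributes, freeze and skip_actions are hypothetical helpers, and dropping skipped names from the needs of the remaining actions is my own assumption about how skipping should behave.

```python
from types import MappingProxyType


def freeze(actions):
    """Wrap a workflow in read-only views so it cannot be mutated in place."""
    return MappingProxyType(
        {name: MappingProxyType(dict(attrs)) for name, attrs in actions.items()}
    )


def skip_actions(workflow, skip_list):
    """Return a new frozen workflow without the skipped actions.
    The original workflow is left untouched."""
    skipped = set(skip_list)
    unknown = skipped - set(workflow)
    if unknown:
        raise ValueError(f"cannot skip unknown actions: {sorted(unknown)}")
    kept = {}
    for name, attrs in workflow.items():
        if name in skipped:
            continue
        new_attrs = dict(attrs)
        # Assumption: dependencies on skipped actions are dropped as well.
        new_attrs["needs"] = tuple(
            dep for dep in attrs.get("needs", ()) if dep not in skipped
        )
        kept[name] = new_attrs
    return freeze(kept)


original = freeze({
    "a": {"uses": "sh", "needs": ()},
    "b": {"uses": "sh", "needs": ("a",)},
})
modified = skip_actions(original, ["a"])
```

After the call, original still contains both actions, while modified contains only b.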

  • May 30, 2019 — #650

I implemented a --with-dependencies option for popper run. It lets users who run only a single action from a workflow specify whether to run that action together with all of its dependencies.
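To make the notion of "an action with all its dependencies" concrete, here is a small sketch that collects the transitive dependencies of one action through a depth-first walk. The needs dictionary and the with_dependencies function are hypothetical and used only for illustration.

```python
def with_dependencies(needs, target):
    """Return `target` preceded by everything it transitively needs,
    in an order where each action comes after its dependencies."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in needs.get(name, ()):
            visit(dep)
        order.append(name)

    visit(target)
    return order


# Hypothetical workflow: "test" needs "build"; "deploy" needs "test" and "lint".
needs = {"build": [], "lint": [], "test": ["build"], "deploy": ["test", "lint"]}
selected = with_dependencies(needs, "test")
```

In this example, running test with its dependencies selects build and test, and leaves lint and deploy out.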

  • May 30, 2019 — #659

On the very same day, another feature got merged. It added the ability to blacklist or whitelist workflows for execution based on keywords found in a Git commit message. Since the popper run flags are not available in a CI environment, Popper searches for special keywords in the head commit message of the repository. If it finds any, it executes workflows according to the instructions given there.

For example, if the head commit message of a repository contains the keyword popper:whitelist[a.workflow, b.workflow], Popper executes only a.workflow and b.workflow out of the workflows present in the repository. Similarly, we implemented a popper:skip[...] keyword that does the opposite.
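The keyword handling described above can be approximated with a regular expression. The sketch below is an assumption about how such parsing might be done, not the code that was merged; KEYWORD_RE and select_workflows are hypothetical names.

```python
import re

# Matches popper:whitelist[...] or popper:skip[...] anywhere in the message.
KEYWORD_RE = re.compile(r"popper:(whitelist|skip)\[([^\]]*)\]")


def select_workflows(commit_message, available):
    """Filter `available` workflows according to the first keyword found
    in the commit message; with no keyword, run everything."""
    match = KEYWORD_RE.search(commit_message)
    if match is None:
        return list(available)
    keyword, body = match.groups()
    named = {item.strip() for item in body.split(",") if item.strip()}
    if keyword == "whitelist":
        return [w for w in available if w in named]
    return [w for w in available if w not in named]


available = ["a.workflow", "b.workflow", "c.workflow"]
chosen = select_workflows(
    "Fix typo popper:whitelist[a.workflow, b.workflow]", available
)
```

With the message above, only a.workflow and b.workflow are selected, while a message containing popper:skip[a.workflow] would select the other two.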

  • June 6, 2019 — #664

I added the ability to run Popper even in projects that are not tracked by Git. This has a few minor downsides, as the tool is ideally meant to work on Git-tracked repositories or workspaces.

  • June 11, 2019 — #667

With this, I fixed two issues, #623 and #658. The first involved rewriting the scm.parse function completely using re, so that the logic for parsing reference URLs is clear. The latter fixed a bug that caused failures when referencing deeply nested action repositories.

I brought back the search and info commands in Popper 2.x, which help users developing with Popper search for actions and get useful information about an action from within the tool. We also started to maintain a catalog that serves as a database for the search command, and it is getting richer by the day.

With #680, I also added the ability to generate a Brigade pipeline with the popper ci command.

I added a new --on-failure option to the popper run command, which lets users specify an action or a workflow to execute when a failure occurs in a workflow. Through the next PR, I added more search sources to our actions catalog, and we keep adding more.

I made Popper capable of running workflows without cloning any action repositories from GitHub or pulling images from Docker Hub, as long as the required action repositories or images are present locally or in the cache. This, in turn, allows us to run workflows in offline mode. It was implemented by introducing two new flags to popper run, i.e., --skip-clone and --skip-pull.

Also, some bugs in the popper:whitelist and popper:skip keyword features were fixed.

  • June 25, 2019 — #695

This PR fixed a bug that led Popper to unnecessarily download the dependencies of unreachable actions in a workflow, which made resource utilization more efficient.

The second month

  • July 4, 2019 — #699

Until now, Popper decided which runtime to use based on the composition of the actions or the uses attribute. We felt it would be better if, instead of using shub:// URLs or Singularity files directly, we could run actions made for Docker in Singularity containers. So I introduced a new flag, --runtime, which accepts a value such as Docker or Singularity (or any runtime added in the future) and runs actions in the specified containers. For now, we support running actions in the Docker and Singularity runtimes. Our integration tests were also made capable of running in both Docker and Singularity containers.

With this PR, I also fixed a couple of other bugs, including #670 and #678.

Until now, we supported only the popper:whitelist[...] and popper:skip[...] keywords in commit messages for whitelisting or skipping workflows when running Popper in CI mode. We soon felt that we should extend this so that any popper run flag can be passed in a popper:run[...] keyword, letting users utilize all of the core CLI features even in CI mode. For example, the command popper run --runtime singularity now has the CI equivalent keyword popper:run[--runtime singularity].

Along with this, another PR was merged the same day, which enabled popper ci to generate CI config files that run workflows in the Singularity runtime.

This was when we decided to unit test our codebase. I started off by writing unit tests for the scm.py module. As soon as I started writing tests, a lot of small bugs popped up, and I fixed them. We also removed the --recursive flag from popper run, as we now supported recursive mode by default in the CI environment.
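One plausible way to turn a popper:run[...] keyword into command-line arguments is to extract the bracketed text and tokenize it the way a shell would. The sketch below does this with re and shlex; it is an assumption for illustration, and RUN_RE and run_args_from_commit are hypothetical names.

```python
import re
import shlex

# Captures everything between the brackets of popper:run[...].
RUN_RE = re.compile(r"popper:run\[([^\]]*)\]")


def run_args_from_commit(commit_message):
    """Return the flags found in a popper:run[...] keyword as an argv-style
    list, or an empty list when the keyword is absent."""
    match = RUN_RE.search(commit_message)
    if match is None:
        return []
    return shlex.split(match.group(1))


args = run_args_from_commit("Update docs popper:run[--runtime singularity]")
```

The resulting list can then be handed to the same argument parser that the interactive popper run command uses, so both paths share one set of flags.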

  • July 18, 2019 — #718

I fixed the incorrect setting of environment variables in the host runner implementation and refactored some duplicated code along the way.

  • July 23, 2019 — #719

I added unit tests for the utils.py module, along with other small fixes and refactoring. By this time, I had got the hang of writing unit tests, and it turned out to be quite fun.

The third month

  • July 27, 2019 — #724

We deprecated the popper add command based on feedback from our users, as we felt that simply cloning workflow repositories served the purpose better.

I added unit tests for the parser.py module and fixed a bug where the wrong working directory was used in the Singularity runtime. While writing tests for parser.py, I also refactored the code heavily and significantly reduced its cyclomatic complexity.

I added tests for our final vital module, gha.py, with some of the usual refactoring. I also fixed some bugs in the Docker runtime that caused actions to fail when image references derived from repository names were not in lowercase.

  • August 9, 2019 — #733

In the meantime, we identified a bug that prevented caching of already built Singularity container images. I researched it a bit, and it turned out to be caused by a feature of Singularity itself, which did not fully serve our purpose of caching container images. So, I added code to put built Singularity container images in a special folder, /tmp/singularity, and reuse them whenever the same container is requested again. This also brought down workflow execution time in the Singularity runtime significantly.
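The build-once, reuse-afterwards behaviour can be sketched as below. This is a simplified illustration and not the merged code: cached_image is a hypothetical helper, the build step is passed in as a callable instead of invoking Singularity, and the .sif file name in the example is an assumption.

```python
import os
import tempfile


def cached_image(image_name, build, cache_dir="/tmp/singularity"):
    """Return the path of a built image, calling `build(path)` only
    when the image is not already present in the cache directory."""
    os.makedirs(cache_dir, exist_ok=True)
    image_path = os.path.join(cache_dir, image_name)
    if not os.path.exists(image_path):
        build(image_path)
    return image_path


# Demonstration with a fake build step and a throwaway cache directory.
build_calls = []


def fake_build(path):
    build_calls.append(path)
    with open(path, "w") as handle:
        handle.write("image")


demo_dir = tempfile.mkdtemp()
first = cached_image("action.sif", fake_build, cache_dir=demo_dir)
second = cached_image("action.sif", fake_build, cache_dir=demo_dir)
```

The second request finds the file already in place, so the build step runs only once.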

  • August 16, 2019 — #734

Through this PR, I made Popper capable of handling simultaneous popper run instances on the same workflow, which was not supported earlier. Popper now uniquely identifies workflows by a workflow ID (an MD5 checksum) generated from the uid of the user and the workflow path, and allocates separate resources for each running instance. We also changed the base cache path to $HOME/.cache/.popper, following the XDG specification.
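A workflow ID of this kind could be derived roughly as in the sketch below. The exact string that Popper hashes is not stated here, so joining the uid and the absolute workflow path with an underscore is purely my assumption, and workflow_id and workflow_cache_dir are hypothetical names. Note that os.getuid is available only on Unix-like systems.

```python
import hashlib
import os


def workflow_id(workflow_path, uid=None):
    """Derive an MD5 hex digest from the user's uid and the workflow path."""
    if uid is None:
        uid = os.getuid()  # Unix only
    key = f"{uid}_{os.path.abspath(workflow_path)}"  # assumed key format
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def workflow_cache_dir(workflow_path, uid=None):
    """Place per-workflow data under the base cache path named in the article."""
    base = os.path.join(os.path.expanduser("~"), ".cache", ".popper")
    return os.path.join(base, workflow_id(workflow_path, uid))


wid = workflow_id("ci/main.workflow", uid=1000)
```

Because the ID depends on both the user and the path, two users running the same workflow file, or one user running two different workflow files, end up with separate directories.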

  • August 20, 2019 — #735

Until now, we supported exit code 0 and other non-zero exit codes from actions, which meant success and failure of the action respectively. We were missing exit code 78, which, when returned by an action, means that the action wants the workflow execution to be terminated but not marked as failed. In this case, Popper terminates all concurrently running actions and skips all the actions lined up next. I added support for this through this PR.

If you want to use or build upon my code, you can install it with pip install popper==2.4.0. You can also build the application from source by following the instructions given here.

Besides writing code, I also got the chance to write several articles related to my work on Popper, which was quite fun for me. I also got to interact and work with amazing people and communities around the globe, and that helped me grow a lot as a developer. I learned new and exciting things almost every single day. A huge thanks to my mentor Ivo for being a great support. Thanks to CROSS and GSoC for making my summer so productive and memorable; I will cherish it for life.
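The three kinds of exit code can be modelled as in the sketch below. It is a sequential simplification under my own assumptions (the termination of concurrently running actions is not modelled), and Outcome, classify, and run_actions are hypothetical names rather than Popper's API.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    NEUTRAL = "neutral"   # exit code 78: stop the workflow without failing it
    FAILURE = "failure"


def classify(exit_code):
    """Map an action's exit code to an outcome."""
    if exit_code == 0:
        return Outcome.SUCCESS
    if exit_code == 78:
        return Outcome.NEUTRAL
    return Outcome.FAILURE


def run_actions(actions, execute):
    """Run actions in order; stop at the first non-success outcome and
    report which actions ran, which were skipped, and the final outcome."""
    executed = []
    for index, name in enumerate(actions):
        executed.append(name)
        outcome = classify(execute(name))
        if outcome is not Outcome.SUCCESS:
            return executed, list(actions[index + 1:]), outcome
    return executed, [], Outcome.SUCCESS


# Hypothetical exit codes: "check" asks for a neutral stop.
codes = {"build": 0, "check": 78, "deploy": 0}
ran, skipped, final = run_actions(["build", "check", "deploy"], codes.get)
```

In this example the run stops after check with a neutral outcome, and deploy is skipped without the workflow being reported as failed.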
