Common wisdom dictates not to track generated files under source control, and for good reason: they clutter diffs and pull requests, and one needs to ensure they’re regenerated whenever their inputs change. Some platforms or build systems require generated files to be part of distributed packages though, and keeping a record of what gets delivered to users is of utmost importance. I recently adopted a release/branching scheme to bridge this gap.

Platforms which require machine-generated files to be part of release packages include the JavaScript ecosystem, where packed and/or minified versions of a set of source files (generated by tools like Gulp or Webpack) are often bundled in a distribution. Another example is build systems like the Autotools toolchain (which brings us the familiar ./configure, make, make install procedure), where files like configure and Makefile.in are generated from configure.ac and Makefile.am (and other auxiliary files).

The traditional approach to creating release (source) packages for such projects is to generate the files on a developer machine or in some CI environment, create a tarball, and store this artifact in a safe location. If one loses this file, it may be impossible to ever recreate an exact copy of the original release, e.g. because later versions of the build tools generate slightly different results for the same input, or because these tools use ‘external’ values (such as time or randomness) during generation.

First Attempt

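If the content of these release files is so precious, it makes sense to keep them alongside something which is at least as valuable: the source repository of the codebase, which contains all the code and its history. Given this, one can come up with a scheme like the following: commit the generated files on the development branch, tag the resulting commit as the release, then remove the generated files again in a follow-up commit.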

Whilst this repository now contains a tag pointing to a release tree which contains all generated files, this has a major drawback: it also introduces a subsequent commit (12c6685) which explicitly removes all these generated files again, immediately following the commit in which they were added. Furthermore, yet another commit is required to bump the version number. Together, these clutter the repository history.
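A minimal sketch of this first scheme in a throwaway repository may make the clutter easier to see. The file contents, the identity settings, the "Initial commit", "Set version number to 1.0.0" and "Remove generated files" messages, and the stand-in for the generated configure script are all assumptions made for illustration; only the tag name and the other commit messages come from the text above.

```shell
set -eu

# Throwaway repository (all paths and contents here are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Developer"
git config user.email "dev@example.com"

echo 'AC_INIT([package], [1.0.0-pre])' > configure.ac
git add configure.ac
git commit -q -m "Initial commit"

# Release: set the version, import the generated files, tag the result.
echo 'AC_INIT([package], [1.0.0])' > configure.ac
git commit -q -a -m "Set version number to 1.0.0"
echo '# stand-in for the output of autoreconf' > configure
git add configure
git commit -q -m "Import generated files"
git tag -a -m "package 1.0.0" package-1.0.0

# Back to development: two more commits just to undo the release state.
git rm -q configure
git commit -q -m "Remove generated files"
echo 'AC_INIT([package], [1.0.1-pre])' > configure.ac
git commit -q -a -m "Set version number to 1.0.1-pre"

# Number of commits on master after the tag, and the resulting history.
commits_after_tag=$(git rev-list --count package-1.0.0..HEAD)
git log --oneline
```

In this sketch, every release leaves an import commit, a removal commit and a version bump on the development branch.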

Second Attempt

To avoid the sequence of an Import generated files commit followed by a revert commit, we can try an alternative approach: stashing the imports away in a dedicated release branch, which gets tagged but is not merged back.

Whilst this approach doesn’t require any revert of Import generated files commits, it has another drawback: the relation between tags and branches is lost. As an example, a command like git describe --tags run on the master branch will never include any release tag in its output.
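The lost relation can be reproduced with a small sketch, under the same illustrative assumptions as before (throwaway repository, made-up file contents, a stand-in for the generated configure script, and the assumption that the version number is set on the release branch). The branch name release-1.0.0 and tag name package-1.0.0 follow the names used later in this post.

```shell
set -eu

# Throwaway repository (all paths and contents here are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Developer"
git config user.email "dev@example.com"

echo 'AC_INIT([package], [1.0.0-pre])' > configure.ac
git add configure.ac
git commit -q -m "Initial commit"

# Release work happens on a separate branch which is never merged back.
git checkout -q -b release-1.0.0
echo 'AC_INIT([package], [1.0.0])' > configure.ac
git commit -q -a -m "Set version number to 1.0.0"
echo '# stand-in for the output of autoreconf' > configure
git add configure
git commit -q -m "Import generated files"
git tag -a -m "package 1.0.0" package-1.0.0

# Development continues on master, which never sees the generated files.
git checkout -q master
echo 'AC_INIT([package], [1.0.1-pre])' > configure.ac
git commit -q -a -m "Set version number to 1.0.1-pre"

# The tag is not reachable from master, so git describe cannot use it.
if git describe --tags >/dev/null 2>&1; then
    tag_visible_from_master=yes
else
    tag_visible_from_master=no
fi
echo "release tag visible from master: $tag_visible_from_master"
```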

Final Approach

Fixing the limitations of the previous approach is trivial: we simply need to reinstate a connection between the release tag and the development branch (master in this example). How does one create a link between two branches? By merging, of course! Here’s how this works: once the release is tagged on its branch, the tag is merged back into master.

Done naively, this wouldn’t yield the desired result: the merge commit (49e527e) would result in a tree which contains the generated files (in this specific case the tree at 49e527e would actually be equal to the package-1.0.0 tree), which is clearly not what we aimed for. Instead, we should run git merge --no-commit when merging package-1.0.0 into master, undo all the changes made in the release-1.0.0 branch (i.e. remove the generated files and reset the version number), and only then git commit the resulting merge. Also note that the tag (package-1.0.0) gets merged into master, not the release-1.0.0 branch (which only makes a difference when the tag is annotated, of course).

Aside: you may have noticed the scheme above no longer contains a Set version number to 1.0.1-pre commit: this is no longer required, because one may opt to bump the version number as part of the merge commit (49e527e). Some may object to this approach and still keep a separate commit to increase the value. Others may use some specific number for master versions. All of these have pros and cons; pick one and be consistent.

Whilst the approach described above may seem laborious, note it’s fairly easy to automate the workflow. Also, when using Autotools, it’s now possible to release the output of git archive of a release tag, instead of relying on make dist. Validating whether make distcheck passes, and asserting the content of a resulting distribution package resembles an archive generated by Git, e.g. as part of a CI pipeline, is of course good practice!
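As a rough sketch of the git archive side of this, the following creates a tarball from a release tag and lists its contents. The repository contents, the archive prefix and the output paths are illustrative assumptions. Comparing the listing against a make dist tarball is only hinted at in a comment, since that step depends on the project’s Autotools setup.

```shell
set -eu

# Throwaway work area (all paths and contents here are illustrative).
work=$(mktemp -d)
mkdir "$work/repo"
cd "$work/repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Developer"
git config user.email "dev@example.com"

# A release tree which already includes a (stand-in) generated file.
echo 'AC_INIT([package], [1.0.0])' > configure.ac
echo '# stand-in for the output of autoreconf' > configure
git add configure.ac configure
git commit -q -m "Release tree including generated files"
git tag -a -m "package 1.0.0" package-1.0.0

# Create the release tarball straight from the tag.
git archive --format=tar --prefix=package-1.0.0/ \
    -o "$work/package-1.0.0.tar" package-1.0.0

# List its contents; a CI job could diff this listing against the
# listing of the tarball produced by make dist.
tar -tf "$work/package-1.0.0.tar" | sort > "$work/git-archive.list"
cat "$work/git-archive.list"
```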

Finally, this approach interacts nicely with the GitWaterFlow branching model we presented at RELENG’16, but more about that later!