Common wisdom says not to track generated files under source control, and that is good practice: they clutter diffs and pull requests, and someone needs to ensure they’re regenerated whenever their inputs change. Some platforms and build systems require generated files to be part of distributed packages though, and keeping a record of exactly what gets delivered to users is of utmost importance. I recently adopted a release/branching scheme to bridge this gap.
Platforms which require machine-generated files to be part of release packages
include the JavaScript ecosystem, where packed and/or minified versions of a
set of source files (generated by tools like Gulp or Webpack) are often bundled
in a distribution, and build systems like the Autotools toolchain (which brings
us the familiar ./configure, make, make install procedure), where files like
configure and Makefile.in are generated from configure.ac and Makefile.am (and
other auxiliary files).
The traditional approach to producing release (source) packages for such projects is to generate the files on a developer machine or in some CI environment, create a tarball, and store this artifact in a safe location: if one loses this file, it may be impossible to ever recreate an exact copy of the original release. This can happen, for example, because later versions of the build tools generate slightly different results for the same input, or because these tools use ‘external’ values (such as the time or randomness) during generation.
First Attempt
If the content of these release files is so precious, it makes sense to keep them alongside something which is at least as valuable: the source repository of the codebase, which contains all the code and its history. Given this, one can come up with a scheme in which the generated files are committed on the development branch, that commit is tagged as the release, and the files are removed again right after.
Whilst this repository now contains a tag pointing at a released tree which
contains all generated files, this has a major drawback: it also introduces a
subsequent commit (12c6685) which explicitly removes all these generated files
again, immediately following the commit in which they were added. Furthermore,
this requires one more commit to bump the version number. This clutters the
repository history.
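As a rough illustration, the first attempt could look like the command sequence below. This is a minimal sketch rather than the exact history discussed above: the throwaway repository, the demo project name, the placeholder identity, the stand-in content of configure and the "Remove generated files" commit message are all assumptions made for the example.

```shell
#!/bin/sh
# Sketch of the first attempt: generated files are committed on master,
# tagged, and removed again. Names and contents are illustrative only.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Example Dev"        # placeholder identity for this demo
git config user.email "dev@example.com"

# Ordinary development: only hand-written sources are tracked.
echo "AC_INIT([demo], [1.0.0-pre])" > configure.ac
git add configure.ac
git commit -q -m "Initial commit"

# Release preparation happens directly on master.
echo "AC_INIT([demo], [1.0.0])" > configure.ac
git commit -q -am "Set version number to 1.0.0"

echo "# stand-in for the real autoconf output" > configure
git add configure
git commit -q -m "Import generated files"
git tag package-1.0.0

# Undo the import so development continues without generated files.
git rm -q configure
git commit -q -m "Remove generated files"

echo "AC_INIT([demo], [1.0.1-pre])" > configure.ac
git commit -q -am "Set version number to 1.0.1-pre"

# Two of the five commits exist only to clean up after the release.
git --no-pager log --oneline
```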
Second Attempt
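To overcome the sequence of Import generated files and subsequent revert
commits, we can try an alternative approach, stashing the imports away in a
branch: the generated files are committed and tagged on a dedicated release
branch, so the development branch never contains them.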
Whilst this approach doesn’t require reverting any Import generated files
commits, it has another drawback: the relation between tags and branches is
lost. As an example, running a command like git describe --tags on the master
branch will never include a release tag in its output.
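The lost relation can be reproduced in a throwaway repository. The sketch below assumes the release branch is named release-1.0.0 and the tag package-1.0.0 (the names used later in this post). The demo project, placeholder identity and file contents are made up for the example.

```shell
#!/bin/sh
# Sketch of the second attempt: the import lives on a release branch only.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Example Dev"        # placeholder identity for this demo
git config user.email "dev@example.com"

echo "AC_INIT([demo], [1.0.0-pre])" > configure.ac
git add configure.ac
git commit -q -m "Initial commit"

# All release work happens on a side branch.
git checkout -q -b release-1.0.0
echo "AC_INIT([demo], [1.0.0])" > configure.ac
git commit -q -am "Set version number to 1.0.0"
echo "# stand-in for the real autoconf output" > configure
git add configure
git commit -q -m "Import generated files"
git tag package-1.0.0

# Development continues on master, which never saw the generated files.
git checkout -q master
echo "AC_INIT([demo], [1.0.1-pre])" > configure.ac
git commit -q -am "Set version number to 1.0.1-pre"

# The tag is not an ancestor of master, so describe has nothing to report.
if described=$(git describe --tags master 2>/dev/null); then
    echo "master is described as: $described"
else
    echo "no release tag is reachable from master"
fi
```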
Final Approach
Fixing the limitations of the previous approach is trivial: we simply need to
reinstate a connection between the release tag and the development branch
(master in this example). How does one create a link between two branches? By
merging, of course! Once the release is tagged, it is merged back into master.
As-is, this wouldn’t yield the desired result: the merge commit 49e527e would
have a tree which contains the generated files (in this specific case the tree
at 49e527e would actually be equal to the package-1.0.0 tree), which is clearly
not what we aimed for. Instead, we should run git merge --no-commit when
merging package-1.0.0 into master, undo all the changes made in the
release-1.0.0 branch (i.e. remove the generated files and reset the version
number), and only then git commit the resulting merge. Also note that the tag
(package-1.0.0) gets merged into master, not the release-1.0.0 branch (which
only makes a difference when the tag is annotated, of course).
Aside: you may have noticed the scheme above no longer contains a Set version
number to 1.0.1-pre commit: this is no longer required, because one may opt to
bump the version number as part of the merge commit (49e527e). Some may object
to this approach and still keep a separate commit to increase the value. Others
may use some specific number for master versions. All of these have pros and
cons; pick one and be consistent.
Whilst the approach described above may seem laborious, note it’s fairly easy
to automate the workflow. Also, when using Autotools, it’s now possible to
release the output of git archive of a release tag, instead of relying on
make dist. Validating whether make distcheck passes, and asserting the content
of a resulting distribution package resembles an archive generated by Git,
e.g. as part of a CI pipeline, is of course good practice!
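For illustration, producing a release tarball straight from the tag might look like the sketch below. The demo-1.0.0 prefix and file names are assumptions, and the repository is a stand-in that contains no real Autotools project, so the comparison against make dist is only indicated in a comment.

```shell
#!/bin/sh
# Sketch: build a release tarball from the release tag with git archive.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity for this demo
git config user.email "dev@example.com"

# Stand-in for a tagged release tree that already contains generated files.
echo "AC_INIT([demo], [1.0.0])" > configure.ac
echo "# stand-in for the real autoconf output" > configure
git add configure.ac configure
git commit -q -m "Import generated files"
git tag package-1.0.0

# Everything in the tagged tree ends up under demo-1.0.0/ in the tarball.
git archive --format=tar.gz --prefix=demo-1.0.0/ \
    -o demo-1.0.0.tar.gz package-1.0.0

# Record the file list. In a real project, a CI job could list the tarball
# produced by `make dist` the same way and compare the two lists.
tar -tzf demo-1.0.0.tar.gz | sort > git-archive.list
cat git-archive.list
```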
Finally, this approach interacts nicely with the GitWaterFlow branching model we presented at RELENG’16, but more about that later!