The Why and How of Contributing to Open Source
Why you should contribute to open source and the basics of contributing.
Contributing to open source is about much more than a warm feeling. It can have major economic benefits for companies, which avoid having to maintain special in-house versions of software and being locked out of the latest and greatest features in public releases.
Open source provides many benefits beyond not having to pay software licensing fees and a great deal of value comes from the fact that you have the freedom to study the source code and modify and distribute software. For example, if something does not quite meet your needs or there is a bug, you have the freedom to do something about this — which at the very least might be to contract someone other than the original developers, if you’re not equipped to carry out the work yourself.
However, you might reasonably ask, “but why should I contribute back improvements that I have made?” In a business environment, it may appear beneficial not to, since this might seem to give you a competitive edge. Well, if everyone took the same view, open source probably wouldn’t work. Withholding changes can also be counterproductive even from a purely selfish perspective: once you are using your own special fork of the software, you no longer automatically benefit from improvements that are made to the main branch of development. Of course, you could decide to backport these updates, but this means ongoing engineering work. So in short, you either take on maintenance of the software and all that entails, or you seek to offload this liability by getting your changes merged upstream.
As we can see, aside from the warm feeling you might get from contributing to open source, there is also a real economic motivation: would you like to maintain someone else’s code indefinitely, or would you rather be free to focus on the primary task at hand? In discussions with management and when building the case for contributing, making these implications clear can really help.
Finally, in some cases, there may also be a legal obligation to make available changes that you made to open source software. Where the software employs copyleft licensing, the obligation is usually triggered at the point of providing copies to third parties, but this can also be triggered when using the software to provide a service, as with the AGPL. This doesn’t mean that you would be forced to contribute your changes to the upstream project and you could just host a ZIP download on your own website, but if you have to publish your modifications, why not go the extra mile?
Due care must be taken whenever there is any uncertainty concerning license obligations. Corporate legal counsel are usually happy to advise on such matters and company policy may even be that their approval must be sought. Meanwhile, individuals and smaller organisations can take advantage of the various resources freely provided in support of license compliance — and as part of wider open source governance efforts — such as those from the OpenChain Project.
It’s also important to note that employment contracts may have restrictions with implications for staff contributing to open source projects. Typically this might be that permission must first be sought, a process followed and the appropriate sign-off provided, so as to ensure that issues do not arise from company proprietary intellectual property (IP) inadvertently “leaking” out.
These things may sound scary, but in reality, they are usually far from it and just a question of process, where you describe precisely what it is that you want to do, with the appropriate due diligence being carried out and then approval provided following this.
So now that we know some of the motivations for contributing to open source and appreciate that management sign-off may be required, how does contribution actually work in practice?
Each open source project or community is likely to have its own guidelines and/or process. This could include things such as coding standards, signing-off contributions (to indicate that you have the right to make them) and the review process which will subsequently take place. Contributions will also generally have to use the same licence as the project itself; a mixture of licences quickly gets messy, and some licences are incompatible with others.
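As an illustration of the sign-off convention, many projects expect each commit message to end with a `Signed-off-by:` trailer, which `git commit -s` adds automatically. The sketch below shows roughly how a project might check for it. The function name, the regular expression and the sample name and e-mail address are assumptions made for this example, not taken from any particular project's tooling.

```python
import re

# Trailer in the form written by `git commit -s`:
#   Signed-off-by: Full Name <address@example.com>
# The pattern is a deliberately loose, illustrative check.
SIGNOFF_RE = re.compile(
    r"^Signed-off-by: .+ <[^<>@\s]+@[^<>\s]+>$",
    re.MULTILINE,
)


def has_signoff(commit_message: str) -> bool:
    """Return True if the message contains a Signed-off-by trailer line."""
    return SIGNOFF_RE.search(commit_message) is not None


if __name__ == "__main__":
    signed = (
        "Add driver for a new Pmod\n"
        "\n"
        "Signed-off-by: Jane Doe <jane@example.com>\n"
    )
    unsigned = "Add driver for a new Pmod\n"
    print(has_signoff(signed))
    print(has_signoff(unsigned))
```

A real project would normally run a check like this in CI rather than by hand, and its exact rules may well differ, so the contribution guidelines remain the authority.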
Some projects require developers to sign a contributor licensing agreement (CLA) before any contributions will be accepted. CLAs offer certain legal benefits to the project. This is also something that would very likely need to be run by an employer before signing.
Assuming we have the requisite permission, our contribution adheres to project standards and any necessary agreements are in place, what next? Well, again this will depend upon the project and some will take contributions in the form of patches sent via a mailing list. Others might want a pull request (PR) to be submitted against a GitHub repository; this is where we fork the main project repository, make changes and then submit a request for our modifications to be merged.
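To make the fork-and-pull-request route a little more concrete, here is a minimal sketch of the git commands a contributor might typically run once a fork exists. The helper name, the URLs, the `project` directory and the branch name are all hypothetical placeholders. The pull request itself is then opened from the hosting site's web interface after the final push.

```python
import shlex
from typing import List


def fork_workflow_commands(
    fork_url: str, upstream_url: str, branch: str, message: str
) -> List[List[str]]:
    """Return the git commands for a typical fork-based contribution.

    File edits happen between the `checkout` and `add` steps.
    """
    repo = "project"  # hypothetical local directory name
    return [
        # Clone our own fork, which becomes the "origin" remote.
        ["git", "clone", fork_url, repo],
        # Track the original project so later updates can be fetched.
        ["git", "-C", repo, "remote", "add", "upstream", upstream_url],
        # Do the work on a dedicated branch rather than on the default one.
        ["git", "-C", repo, "checkout", "-b", branch],
        # Stage and commit the changes made in the working tree.
        ["git", "-C", repo, "add", "--all"],
        ["git", "-C", repo, "commit", "-m", message],
        # Publish the branch to the fork, ready for a pull request.
        ["git", "-C", repo, "push", "--set-upstream", "origin", branch],
    ]


if __name__ == "__main__":
    commands = fork_workflow_commands(
        "https://example.com/our-account/project.git",  # placeholder
        "https://example.com/upstream/project.git",     # placeholder
        "add-new-pmod",
        "Add support for a new Pmod",
    )
    for command in commands:
        print(shlex.join(command))
```

The function only builds and prints the commands rather than running them, so it is safe to try out; each list could be passed to `subprocess.run` if you did want to execute the steps.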
Changes may not immediately be merged and often there will be some initial feedback or discussion, where the main project developers might request further changes before merging. It’s also increasingly common for continuous integration (CI) tools to be set up, whereby some checks are automatically run and notification provided if the build fails or there is some other quality issue.
Let’s take a look at a simple example, where DesignSpark interns added support for new Pmods to the DesignSpark.Pmod Python library. We can find the original GitHub repository for this at:
At this point, it should be noted that it’s pretty common for git branches to be used in development, where the default master branch might be reasonably stable, but then you have a develop branch and/or feature branches, which are in a greater state of flux and may or may not have a working build at any point in time. There are other strategies also. However, the project in question had just one person committing updates and only a single (master) branch.
Near the top-right of the GitHub page, we see a button that allows us to fork a copy of the repository to under our own account.
Above we can see the fork created by the interns, which is located underneath the account DSInterns. Note also how the page shows that this was forked from the DesignSparkrs-owned repository.
If we click on the Branch button we can see that it has many branches. The work has been split into lots of smaller changes, each of which adds support for a new Pmod, an example or a documentation update, and each lives on its own branch, labelled patch-1, patch-2 and so on.
We can see on the upstream repository owned by DesignSparkrs that there were 15 pull requests outstanding. Clicking on the appropriate tab will provide us with a list. We can then select an individual pull request to get a few more details. We could also select to merge the changes into master now, or perhaps comment and discuss the updates first.
Drilling further down we can click on the Commits tab and then select a commit — here we just had one — and view the changes made. In this case, a single new file of 209 lines was added.
At this point, a project maintainer might review each PR in turn, provide feedback and possibly request that further changes are made before merging. However, in this case, a quick review was carried out, nothing immediately stood out and it was therefore decided to:
- Create a feature branch called new-pmods
- Select to merge all the PRs onto this new branch instead of master
- Test the updates as a whole and once this is completed, merge new-pmods → master
If, when it comes to testing, further changes are required, these will either be made directly or a request will be sent to the interns to make the changes, following which a new PR would be made against the new-pmods branch. Once we’re happy with this branch, it can then finally be merged onto master.
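The three steps listed above could be approximated on the command line roughly as follows. This is only a sketch: it assumes the contributors' patch branches have already been fetched into the maintainer's local clone, and the merges in this example may equally have been performed through GitHub's web interface. The function name and the use of `--no-ff` are choices made for this illustration.

```python
from typing import List


def integration_plan(
    pr_branches: List[str],
    feature: str = "new-pmods",
    base: str = "master",
) -> List[str]:
    """Return the steps for collecting several PR branches on a feature branch."""
    # 1. Create the feature branch from the current base branch.
    steps = [f"git checkout -b {feature} {base}"]
    # 2. Merge each contributed branch onto the feature branch, not the base.
    steps += [f"git merge --no-ff {branch}" for branch in pr_branches]
    # 3. Test everything together, then merge the feature branch into the base.
    steps.append("# test the combined changes here")
    steps.append(f"git checkout {base}")
    steps.append(f"git merge --no-ff {feature}")
    return steps


if __name__ == "__main__":
    for step in integration_plan(["patch-1", "patch-2"]):
        print(step)
```

Using `--no-ff` keeps a merge commit for every contribution, which makes it easier to see later which changes arrived together. Whether that is desirable depends on the project's history conventions.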
It should be noted that the approach outlined above is somewhat atypical. A contribution would not usually be accepted and merged until the maintainers are happy with it, since at that point a contributor would, quite reasonably, assume that their part is done. However, this situation is a little different, as there is a separate channel of communication between the maintainer and contributors.
Although a very simple example, this does serve to give an idea of the sort of workflow involved.
Maintaining your own internal forks of open source software is inefficient and generally a bad idea. Where changes implement a bug fix or enhancement that others will typically benefit from, it usually makes sense to try and get these merged upstream. Of course, there will be times where changes are needed to support unusual circumstances or some highly niche use case, which the upstream project maintainers may, quite rightly, not be willing to accept contributions for.
Approval may be required from an employer before you can contribute to an open source project, while projects will have their own governance processes that determine how contributions are handled. However, neither of these need be particularly onerous and in both cases, there are usually people happy to help.