Measuring Community Participation in Open Source Projects: Data from Drupal.org

A defining aspect of open source software such as Drupal is the collaboration among multiple individuals and organizations on the same software.

So how do you measure the activity of contributors who are working on that software?

For Drupal-based projects, the issue queue and project usage statistics are two good sources of data for measuring participation.

This blog post discusses some of the data we consider within the Acquia Distributions team when tracking the community health of our Drupal Commons and Conference Organizing (COD) distributions, and expands on the conversation from the DrupalCon Denver panel discussion about Drupal distributions.

What's a distribution?
If you're new to Drupal, you might be wondering what a distribution is.
A distribution of Drupal is a piece of software that’s like a website in a box: as soon as you install it, it’s ready to fulfill a particular purpose, such as organizing a conference or providing a community collaboration platform.

Let’s start by looking at the COD project usage data, as reported by the Drupal.org usage tracker.

Examining Project Usage Statistics

COD Project usage

COD usage increased steadily from the first Drupal 6-compatible release in 2010 all the way through early 2012, a full year after the release of Drupal 7 core. This suggests that site builders still recognize COD as the go-to solution for event sites. However, the leveling off in usage in early 2012 suggests that demand for a Drupal 7 version of COD is increasing. Luckily, thanks to the help of COD sprinters, efforts to update COD to Drupal 7 are well under way.

One thing to note about COD’s usage statistics is that it’s common for event sites to be turned into static HTML archives after the event has passed. As a result, more COD-built sites may still be online than the usage data shows, because archived sites no longer report into the Drupal.org project usage system.

Next, let’s look at the usage for Drupal Commons.

Drupal Commons Usage statistics

It’s clear from the chart above that Commons has a greater number of site installs than COD, but that its usage has stayed fairly flat. This relatively static usage reflects saturation in the demand for Drupal 6 websites and a high level of interest in building new websites on Drupal 7, particularly for sites without a specialized use case such as event management.

There are now more sites built on Drupal 7 core than on Drupal 6, and this leveling off in Drupal 6 demand is visible across Drupal core and the contributed module space. It affects even the most popular projects: for Views, Drupal 6 adoption is no longer increasing and was recently surpassed by Drupal 7 usage, and for Panels, Drupal 7 usage surpassed Drupal 6 usage in March.

Efforts are well under way in the Commons issue queue and the Commons group on Groups.Drupal.org to design and develop the Drupal 7 version of Commons, and I expect that overall Commons usage will increase once a Drupal 7 version is available.

Note on Commons usage statistics:
You might be wondering about the graph above: why all the different data points about Commons project usage? The answer is that up until August 2011, Commons was developed on GitHub and reported its project usage against the now-deprecated “Commons release” project on Drupal.org. By comparing the usage statistics from Commons release with the data from the official Drupal.org Commons project, we can get a sense of the total number of Commons sites that are reporting usage, and also see that the number of Drupal.org-developed 2.x Commons sites has exceeded the number of 2.x sites from the GitHub repository. Note that the Commons project started reporting usage several months after it was first released. To avoid portraying that change as a skyrocketing increase in Commons usage, we’ve displayed data for Commons_release starting around the same time that Commons 2.x was released.
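To make the combination concrete, here is a minimal Python sketch of how two weekly usage series could be merged into one total. It is an illustration only: the function name, the dictionary layout (week start date mapped to reported site count), and the sample numbers are all assumptions made up for this example, not real Drupal.org data or the method used for the chart.

```python
def combined_usage(official, legacy, start_week):
    """Sum weekly site counts from two projects.

    `official` and `legacy` map an ISO week-start date (e.g. "2011-08-07")
    to a reported site count. Legacy data before `start_week` is ignored,
    mirroring the choice to show the older project only from a cutoff date.
    ISO date strings sort chronologically, so plain string comparison works.
    """
    weeks = sorted(set(official) | {w for w in legacy if w >= start_week})
    return {
        w: official.get(w, 0) + (legacy.get(w, 0) if w >= start_week else 0)
        for w in weeks
    }


# Hypothetical numbers, for illustration only.
official_series = {"2011-08-07": 10, "2011-08-14": 40}
legacy_series = {"2010-12-05": 5, "2011-08-07": 200, "2011-08-14": 180}

totals = combined_usage(official_series, legacy_series, start_week="2011-01-01")
for week, sites in totals.items():
    print(week, sites)
```

Keeping the cutoff as an explicit parameter makes the charting decision visible, rather than hiding it in pre-filtered data.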

Moving Commons Development to Drupal.org

(Spoiler: Participation increased significantly)

Drupal Commons issue queue participation

Last fall I wrote about the plan to move Commons development from GitHub to Drupal.org in order to increase transparency and community participation in Commons.

Following the move to the Drupal.org issue tracker and Git repository, Commons saw a significant increase in community participation by several metrics. The graph above shows three key metrics about participation in the Commons issue queue:

- # of people commenting
- # of people filing new issues
- # of non-Acquia attachments

Certainly all three of these are important indicators of the community health of an open source project: It’s great to have a large number of people participating overall with a lot of discussion (new issues and issue comments). Particularly interesting is the # of non-Acquia attachments in the issue queue. We’ve subtracted the patches filed by users whose work was directly sponsored by Acquia, so that this number generally corresponds to the number of patches from folks outside of Acquia.
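For readers who want to compute similar numbers for their own project, the three metrics could be derived from a list of issue queue events roughly as follows. This is a sketch under stated assumptions: the `IssueEvent` record, its field names, and the `sponsored` flag are hypothetical and do not reflect the Drupal.org database schema or the queries behind the graph.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IssueEvent:
    """Hypothetical record for one action in an issue queue."""
    month: str               # e.g. "2012-01"
    user: str
    kind: str                # "comment", "new_issue", or "attachment"
    sponsored: bool = False  # True if the work was sponsored by the maintaining company


def monthly_participation(events):
    """Return per-month counts for the three metrics listed above."""
    commenters = defaultdict(set)           # unique users who commented
    filers = defaultdict(set)               # unique users who filed new issues
    outside_attachments = defaultdict(int)  # attachments from non-sponsored users

    for e in events:
        if e.kind == "comment":
            commenters[e.month].add(e.user)
        elif e.kind == "new_issue":
            filers[e.month].add(e.user)
        elif e.kind == "attachment" and not e.sponsored:
            outside_attachments[e.month] += 1

    months = sorted(set(commenters) | set(filers) | set(outside_attachments))
    return {
        m: {
            "commenters": len(commenters[m]),
            "issue_filers": len(filers[m]),
            "outside_attachments": outside_attachments[m],
        }
        for m in months
    }


# Made-up sample events, for illustration only.
sample_events = [
    IssueEvent("2012-01", "alice", "comment"),
    IssueEvent("2012-01", "alice", "comment"),
    IssueEvent("2012-01", "bob", "comment"),
    IssueEvent("2012-01", "bob", "new_issue"),
    IssueEvent("2012-01", "carol", "attachment", sponsored=True),
    IssueEvent("2012-01", "dave", "attachment"),
    IssueEvent("2012-02", "dave", "attachment"),
]
stats = monthly_participation(sample_events)
```

Note that commenters and filers are counted as unique people per month, while attachments are counted individually; a person who comments twice is still one commenter.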

As this graph clearly shows, contribution of code to the Commons project from outside of Acquia increased dramatically after moving development to Drupal.org.

Following the first quarter 2012 releases of Commons, we can see a drop-off in the number of patches being filed and a trend towards decreasing comment activity. A possible interpretation for that decrease in activity is that the Commons 2.4 and later releases reduced the number of bugs, resulting in a decreased need for bug fix patches. Still, the number of comments and unique commenters during this time period remains at a reasonably high level.

COD Issue activity

Spoiler: Hooray, sprinters!

COD issue queue statistics

Looking at the historical development activity for COD, the largest spike in activity is around early 2011, which was when COD 6.x-Alpha3 was released, and then again during July/August of 2011, when COD 6.x-Beta2 was released.

More recently, there are two clear spikes in issue queue activity in January and March of 2012. The increases in commenters, people filing new issues, and patches reflect the SandCamp and DrupalCon Denver sprints.

Thank you, sprinters, for your participation there!

Sidenote: The COD presence at DrupalCon Denver was truly amazing, with around 15 participants in the code sprint and about 30 participants at the COD BoF. Those are particularly impressive numbers when you consider that 30 participants is 10% of the total reported COD user base. That’s a really active user base!

My hope is that by continuing on a trend towards increased transparency and community outreach, we can foster a similar level of participation and enthusiasm for Drupal Commons. Already as part of the Drupal 7 planning efforts we’ve seen a lot of new folks getting involved in the Commons queue and IRC channel, which is great! We’ll be able to measure that growth in participation when we re-run these queries in a few months.

Apps and Modularity: Increasing the base of potential contributors for a project

UC_Signup and COD project usage comparison
As distributions become more popular and developers start building apps and Feature modules for their distributions, interoperability of those components becomes a topic of greater interest. One conclusion we can draw from COD is that it’s helpful for a distribution to have as many of the components living as standalone projects as possible.

For example, in COD, the UC_Signup module, which powers the paid event registration workflow, lives as a separate project. The idea of building discrete components that are responsible for specific tasks makes sense for technical reasons, and is certainly not new to software development. However, with open source software, there’s an additional benefit to following this pattern: by building the components of your project in functionally discrete ways and splitting them into their own Drupal.org projects, each component becomes potentially applicable to a wider variety of websites and, as a result, reaches a greater pool of potential contributors than if it were specific to only one distribution.

The chart above shows usage statistics for COD and the UC_Signup module. UC_Signup has approximately twice the number of reported installs as there are COD sites. UC_Signup was released before COD, and that is certainly one explanation for why it has a greater number of site installs. More importantly, because UC_Signup was architected to function outside of the Conference Organizing Distribution and lives as its own Drupal.org project, it can be installed on a wider variety of sites. An example of this would be a site that needs a paid registration workflow but doesn’t need the range of other features that COD provides.

Similarly, Ubercart has over 10,000 site installs -- and therefore a much larger base of contributors -- and patches to Ubercart generally benefit both UC_Signup and COD.

This allows COD development to be focused on features that make COD unique, such as the BoF scheduler, schedule grid with personalized schedule, session submission workflow, and moderation and scheduling workflow, while still gaining the best-of-breed signup and commerce features provided by other modules. In the future, we may see even more of these components living as discrete projects.

One potential challenge to this approach is that it can increase the complexity of issue queue use for a distribution. If I have a bug in COD, should I search the COD issue queue or the UC_Signup queue? This can be mitigated somewhat with issue queue tags, and potentially with easily searchable issues in the main distribution’s queue that refer to the corresponding issues in the relevant project. COD and Drupal Commons currently use the codlove and gdolove tags.

One idea is to make it possible to do a single issue search filtered down to the contributed components in a given distribution.
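As a rough illustration of that idea, a search across several queues could be as simple as filtering issues by the list of projects a distribution ships. The sketch below is hypothetical: the function name, the issue dictionaries, and the project names are assumptions for this example, not an existing Drupal.org feature or API.

```python
def search_distribution_issues(issues, component_projects, keyword):
    """Return issues whose project belongs to the distribution and whose
    title contains `keyword` (case-insensitive)."""
    projects = set(component_projects)
    needle = keyword.lower()
    return [
        issue
        for issue in issues
        if issue["project"] in projects and needle in issue["title"].lower()
    ]


# Made-up issues, for illustration only.
all_issues = [
    {"project": "cod", "title": "Schedule grid shows wrong room"},
    {"project": "uc_signup", "title": "Signup fails for anonymous checkout"},
    {"project": "views", "title": "Signup page filter is broken"},
]

matches = search_distribution_issues(
    all_issues, component_projects=["cod", "uc_signup"], keyword="signup"
)
```

With this data, the Views issue is excluded because Views is not in the component list passed in, even though its title matches.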

On a related note, a feature proposed to the Bot module would make it possible to record statistics about participation in IRC channels.

How do you measure?

Of course, we’re interested in learning about how other open source projects measure community participation.

What statistics do you look at when measuring participation in your projects? Do you see ways we can improve this data?