Distributed bug tracking is a topic which had a burst of interest in 2008 and then again in
2010. Unfortunately, since then not that much has come of it and there is a lot of
misunderstanding. Distributed bug tracking is also a relatively new concept so there are
many facets which haven't been fully thought through or which are being re-invented
repeatedly. This is an attempt to collect and explain all the major issues and approaches to
distributed bug tracking seen in software to the current date. The intention is to serve
both as a starting point for those looking to use a distributed bug tracker and a summary of
the major issues for those considering writing, or just trying to understand, distributed bug
trackers. There is also a comparison of existing software and some possible use cases for the
reader interested in using a distributed bug tracker as part of a project.
Before diving in it's important to note that there are actually two definitions of
distributed bug tracking competing for the term. The older, which I'll be discussing in
detail below, is tracking or distributing bug information in a distributed manner much like
you can track and distribute source code using a distributed version control system such as
Git or Mercurial. The second definition is distributing bugs between many more traditional
centralized bug trackers such as Bugzilla or Jira. I won't cover this latter definition
here, but perhaps in a later post as a rise of DVCS-like distributed bug tracking will
drastically increase the need for inter-tracker bug synchronization.
Software
Over the past five or six years there have been several distributed bug trackers written
which have explored various different aspects of the domain. Most of these have issues
ranging from minor through major. Here I've listed all the distributed bug trackers I was
able to find in the course of my research into the topic. In a later section I'll go over a
matrix of their capabilities and designs.
As you can see there is no lack of early projects exploring distributed bug tracking. Later
I'll compare them to each other but first I will discuss the various dimensions and design
decisions which go into a distributed bug tracker and are expressed in the above software.
Design Considerations
There are several aspects of distributed bug tracking which have parallels with traditional
centralized bug tracking, such as which fields a bug should have, and several which are
distinct, such as how bugs are stored relative to branches. This section will discuss only
those shared issues which have direct relevance to distributed bug tracking. Issues such as
bug priority policy will not be discussed as those don't differ between centralized and
distributed bug tracking. Issues unique to distributed bug tracking will also be discussed.
On-Branch, Off-Branch or Out-of-tree
The first issue which comes up when people first ponder distributed bug tracking is where,
with respect to the code, the bug database should be stored. There are three common options.
The first and most popular is to store the bugs next to the source code in a separate
directory in the source VCS. This is attractive because the developers already have that
source available and it's easy for the tracker developer because if there is any VCS support
required it is limited to basic content tracking commands such as add and commit. Using a
VCS also lets the distributed bug tracker developer leverage the existing VCS synchronizing
and merging capabilities. Further it allows bug information to follow the code across
branches.
This latter ability is one of the great possibilities that distributed bug tracking brings
to the table. Large complex projects which have several development and maintenance branches
often have difficult or complex ways in which they track whether a particular branch has the
fixes for a particular bug or not. The best track which commits fix a particular bug and
then leverage the VCS to determine if a particular branch has that change or not. Other
systems use multiple bugs or other manually maintained fields to store such information for
release and maintenance branches; development branches are usually too much work to cover
using these manual systems. In the worst case the source of information is the original
developer being asked to examine the branch to see if a particular fix exists there.
Obviously all the lesser traditional approaches have their issues. However, even the best
traditional method depends heavily on the VCS being able to effectively determine if a change
exists on a particular branch across a wide array of obstacles including complex merges,
rebases, double commits and changes passed around as patches, which may be manually
reapplied. This is a difficult proposition and inevitably the coverage of supported cases
will have holes.
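To make the "does this branch have the fix" question concrete, here is a minimal sketch of the reachability check a VCS performs. The commit names and the PARENTS graph are made up for illustration and do not come from any real project. Note that the fix reapplied as a patch gets a new identity, so the check misses it, which is exactly the kind of hole described above.

```python
from collections import deque

# Hypothetical toy history: each commit maps to its list of parent commits.
PARENTS = {
    "c1": [],
    "c2": ["c1"],
    "fix": ["c2"],             # the commit that fixes the bug
    "c4": ["fix"],             # trunk continues after the fix
    "rel1": ["c2"],            # release branch forked before the fix
    "rel2": ["rel1", "fix"],   # release branch after merging the fix
    "rel1_patched": ["rel1"],  # same change reapplied as a patch, new identity
}

def branch_contains(parents, head, commit):
    """Return True if `commit` is reachable from `head` by following parents."""
    seen = set()
    queue = deque([head])
    while queue:
        current = queue.popleft()
        if current == commit:
            return True
        if current in seen:
            continue
        seen.add(current)
        queue.extend(parents.get(current, []))
    return False
```

Here branch_contains(PARENTS, "rel2", "fix") is true, while the same query against "rel1_patched" is false even though the change itself is present on that branch.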
Another advantage of keeping the bug database with the code is that
the bug database can follow the code through source tarballs and packages as they are
distributed and incorporated into distributions. It is also possible, with greater or lesser
merge troubles, to have the bug data follow fixes along in the patches.
The on-branch strategy is not without disadvantages, most of which are trade-offs for the
advantages gained. The aforementioned bug data in patches is one such disadvantage. Since
the bugs are stored beside the code any diffs or patches will, by default, contain change
information related to the bugs as well. This is not always desirable and results in extra
work to clean up patches or ignore bug changes. Similarly having the bug status track the
code through various branches is a useful feature, but brings about the challenge of
producing a summary view across the various release and trunk branches. It is also not
immediately obvious where bugs against a particular release version should be filed or how
to determine which branches have a fix if any have it at all.
Another alternative is to store the bugs inside the VCS in a separate branch. This
approach results in a system which is more similar to the traditional centralized bug
tracking paradigm. Designed this way there is only one source of bugs of which any
particular copy of the repository will have a more or less up to date version.
Off-branch bug storage solves some of the issues related to on-branch storage, namely issues
related to where a bug should be entered, keeping bug data out of patches or diffs and, as
will be discussed later, how to get descriptions of bugs onto the branches where the bugs
are. Similarly off-branch storage has as its disadvantages many of the advantages of
on-branch storage. In particular off-branch storage does nothing to help track the state of
a bug on any particular code branch.
Off-branch storage also suffers a few disadvantages of its own. By storing the bugs away
from the code in a separate branch extra care must be taken to ensure that the bug branch is
propagated. For example, systems such as git don't automatically push and pull branches other
than the current one. This can lead to a project being pushed, to GitHub say, without the bug
database being included. As we'll see later this is one aspect which may have contributed to
low recommendation scores of some of the existing distributed bug tracking software since it
may be the reason several appear to not dogfood themselves.
Off-branch storage will also have difficulty transferring between different version control
systems. Though it may feel like everybody uses git all the time, it just isn't true.
Unfortunately how branches work differs across VCSes both semantically and with respect to
the interface. This can cause limitations with entities, such as Linux distributions,
integrating the upstream bug repository.
The least favoured storage method is neither on-branch nor off-branch, but out-of-tree.
With out-of-tree the bug database is stored in some other fashion either inside the VCS or
using some other external database. One example of this is Fossil which stores the bugs as
part of its distributed database, but not really in a separate branch at all. Other
examples are systems which take advantage of the git-note capabilities. These systems have
the advantage of being clean since they don't have the clutter of bug directories or bug
branches. Unfortunately that is really the only advantage they have. Storage of this form
tends to be tightly integrated with a single VCS and usually even more care must be taken to
ensure that the bug databases are propagated and merged correctly than in the off-branch
case.
One advantage shared between off-branch and out-of-tree is that they hold the possibility of
using custom merge algorithms. If bugs are stored on-branch then they must be merged
alongside the source code and thus, for the most part, must use the standard source control
merging algorithms. This will constrain the file formats of the bug database to forms which
are feasible for basic textual merges to be successful and relatively easy for humans to
merge manually when conflicts arise. Off-branch and out-of-tree, in contrast, hold the
promise of using custom merging algorithms. This is theoretically possible with off-branch
storage, depending on VCS support, and the norm with out-of-tree storage.
File Formats AKA Ease of Merging
Traditional centralized bug tracking has a great freedom in how its data is structured and
represented on disk. It is perfectly acceptable to require specialized tools to read the
data and the data is optimized to be processed by trusted and properly configured server
software. Any exceptions to this will be handled by prepared system administrators who take
the utmost care. Distributed bug tracking has none of these freedoms.
Distributed bug tracking must operate in a world where the tracker doesn't have full control
over what happens to its data or who has permission to change it. As we'll come back to
later, distributed bug tracking cannot rely on authorization to ensure that only permissible
states are entered. Instead the best it can do is verify changes before they are
integrated into the local bug database. As such one important aspect in the chosen file
formats of distributed bug trackers is that they must be difficult to corrupt.
A minority of existing distributed bug trackers have the ability to rely on specialized
merging algorithms. Mostly these are out-of-tree based or based upon specialized databases.
The rest must at least perform acceptably without the benefit of custom merging code. This
is very true of on-branch trackers where the bug changes will pass through the standard code
merging algorithms and mostly true for off-branch trackers where the bug branch will likely
have at least a few hops where the specialized merge tool is not installed.
The two important aspects of distributed bug database file formats are how well they merge
automatically using the standard textual merge tools and, since conflicts are sometimes
unavoidable, how easily they can be resolved by humans. Conflicts are unavoidable in all
cases because some data about a bug, such as whether it is resolved or not, is semantic and
singular. A bug is either declared fixed or not. Consider the case of bug A and a tracking
policy which has three possible bug states: New, Diagnosed and Fixed. Suppose Alice, in her
branch or repository clone, fixes bug A and marks it as fixed in her copy. Suppose
concurrently Bob, in his branch or repository clone, looks at the bug and figures out what's
wrong so he marks it as Diagnosed. If Bob later pulls in Alice's changes he will receive a
textual conflict related to the bug state. If a custom merge algorithm could be used this
wouldn't be an issue since Fixed obviously overrides Diagnosed.
Though the above case could be solved by a custom merge algorithm there are cases where it
is not clear that any algorithm can always make the correct merge. Consider the case of the
customer severity of a bug. Alice may mark a bug as Minor because it only affects two or
three customers. Bob might, however, mark it as Critical because one of those few customers
is the biggest customer the company has. No mere computer could ever have all the relevant
information to always make the correct choice.
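The two cases above can be sketched as follows. This is only an illustration under assumptions of my own: the STATE_RANK ordering, the function names and the MergeConflict exception are hypothetical and not taken from any of the trackers surveyed. The state merge picks whichever state is further along, while the severity merge refuses to guess when both sides changed the value.

```python
# Hypothetical ordering of the example policy: New -> Diagnosed -> Fixed.
STATE_RANK = {"New": 0, "Diagnosed": 1, "Fixed": 2}

class MergeConflict(Exception):
    """Raised when a field cannot be merged without human judgment."""

def merge_state(ours, theirs):
    """Pick whichever state is further along the life cycle."""
    return ours if STATE_RANK[ours] >= STATE_RANK[theirs] else theirs

def merge_severity(base, ours, theirs):
    """Three-way merge of severity; refuse to guess if both sides changed it."""
    if ours == theirs:
        return ours
    if ours == base:
        return theirs
    if theirs == base:
        return ours
    raise MergeConflict(f"severity changed to both {ours!r} and {theirs!r}")
```

With this sketch Bob's Diagnosed and Alice's Fixed merge silently to Fixed, whereas Minor versus Critical is handed back to a human.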
With these two aspects in mind there are several different file formats which have seen use
in the software I've found. These can be divided to cover the span of two dimensions. The
first dimension is the format of each file and the second dimension is what is contained
within these files. Out-of-tree storage designs won't be covered here since they tend to
demand custom merging utilities anyways and be based upon more complex databases.
The most common file format appears to be a simple markup. Simple markups rate highly for
ease of human resolution since there isn't a finicky file format to worry about. They tend
to be rather inflexible and difficult to code for however. Most of the formats in this class
are usually too simple to have a name or look much like the INI format.
The second most popular format seems to be a hierarchical markup akin to YAML. This differs from a simple markup in that the format
is more complicated, but also more flexible. While these formats don't rate badly in terms
of human conflict resolution there is a risk of a missing significant character causing
issues.
The least popular appears to be full serialization formats such as JSON or XML. Unless
pretty printed these are nearly impossible to manually resolve. With pretty printing these
serialization formats tend to be merely error prone and tedious. One technique I have seen
is to use JSON with each data element separated from any other via five or six newlines. The
intent here is to reduce the possibility of a merge conflict by removing the other data in
the JSON file from the context of the merge.
The file format chosen is perhaps the greatest determiner of how often automatic merging
will be successful and how much pain the human will have to suffer when automatic merging
fails. From this perspective alone the simple markup seems to be the best option. Since
these tend to be one-statement-per-line formats with minimal grammar requirements,
automatic merges tend to corrupt them the least. They are also the easiest for a human to
merge manually, especially when the lines in the file are in a fixed order and produce nice
diffs.
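As a rough illustration of what such a simple markup might look like in code, here is a sketch of a one-field-per-line format written in a fixed order. The field names and the FIELD_ORDER list are assumptions made for the example, and multi-line values are deliberately not handled.

```python
# Hypothetical field order; a fixed order keeps diffs stable between writers.
FIELD_ORDER = ["title", "state", "severity", "owner"]

def dump_bug(fields):
    """Serialize a bug as one 'key: value' line per field, in a fixed order."""
    known = [key for key in FIELD_ORDER if key in fields]
    extra = sorted(key for key in fields if key not in FIELD_ORDER)
    return "".join(f"{key}: {fields[key]}\n" for key in known + extra)

def load_bug(text):
    """Parse the format written by dump_bug, ignoring blank lines."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields
```

Because each field occupies exactly one line, two developers changing different fields of the same bug produce changes on different lines, which gives a textual merge its best chance.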
There are also three major ways to arrange the storage of bugs among a number of files. The
simplest from a file layout point of view is to store the entire bug database in a single
file. This has advantages of efficiency, speed and ease of coding. As a disadvantage every
change to any bug will modify this file thus ensuring that it will have to be merged
constantly. This option is not used in many of the existing distributed bug trackers.
The most popular file layout appears to be one file per bug. This has the advantage of
reducing conflicts since it is less likely that two developers will modify the same bug than
two bugs in the same database. If the tracker restricts itself to singular semantic data
only, such as bug state, then this can work well since any concurrent changes to the data would
have to be manually merged in any case. If the tracker supports things like bug comments
then this format is still open to frequent file merges as different people comment on the
same bug at different times. Unfortunately bug comments in a single file will cause frequent
merge conflicts until the number of existing comments becomes sufficiently large. At that
point it is possible to place new comments into the file randomly to give the automatic
merge the best possibility of success. Most bugs do not accumulate more than a dozen
comments however.
The final common layout is to use (almost) immutable objects. In this scenario each issue
has a number of files. All or most files will be immutable. One way to accomplish this is to
put each comment into a separate, immutable file and give each bug one small mutable file
which contains the singular semantic data. Since concurrent comments are common and, in
principle, easily merged automatically the comments would be trouble free. Since singular
data is impossible to automatically merge in all cases the file being mutable gives the
human the full power of their VCS to help them determine the correct semantic resolution. An
alternative is to use fully immutable objects and a log-like structure where newer objects
override older objects. Such a system is capable of always merging automatically, but when
the merges are incorrect, as in the Fixed/Diagnosed example above, the human is left with
minimal tools to determine the correct resolution or even receive any indication that a
conflict occurred which requires their attention.
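A minimal sketch of the first variant, immutable comment files plus one small mutable file, might look like the following. The directory layout, the file names and the use of SHA-1 for naming are all assumptions made for this example rather than the layout of any particular tracker.

```python
import hashlib
from pathlib import Path

def add_comment(bug_dir, author, body):
    """Write a comment to its own file, named by a hash of its content."""
    text = f"author: {author}\n\n{body}\n"
    name = hashlib.sha1(text.encode("utf-8")).hexdigest()
    comments = Path(bug_dir) / "comments"
    comments.mkdir(parents=True, exist_ok=True)
    path = comments / name
    if not path.exists():  # an existing comment is never rewritten
        path.write_text(text, encoding="utf-8")
    return name

def set_state(bug_dir, state):
    """Overwrite the single mutable file holding the singular data."""
    Path(bug_dir).mkdir(parents=True, exist_ok=True)
    (Path(bug_dir) / "state").write_text(f"state: {state}\n", encoding="utf-8")
```

Concurrent comments from Alice and Bob land in two different files, so the VCS merges them by simply keeping both. Only the small state file can conflict, and when it does the conflict is surfaced through the VCS's normal tools.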
In allowing the maximum number of successful automatic merges and immediately bringing
semantic conflicts to the user's attention the mostly immutable object method appears to be
the superior method. Successful automatic merges have much less friction than the
alternative which is important to support the adoption of distributed bug tracking.
In summary it would seem, at this time, that a series of mostly immutable objects in a simple
markup format would be the best available choice for the backend bug storage format.
Process Automation
Centralized bug trackers tend to support process automation. Process automation is the
ability of the bug tracker to ensure that a bug goes from New to Assigned to Resolved and,
being assigned back to the reporter, to Closed. Many projects use this to implement complex
bug life cycles and bug handling processes. Distributed bug trackers don't have the luxury
of supporting this feature in a reliable manner. There are two central reasons for this.
The first is that while centralized bug trackers operate on centralized and controlled
servers, distributed bug trackers run on the developers' own machines. The developer won't be
able to short circuit the twelve step bug process on the server, but if they are aggravated
enough they'll disable the process enforcement code on their own copies of the repository.
With no way to trust that every step has been performed in an allowable order the only way
to confirm the process has been followed is to verify after the fact.
Unfortunately this verification comes with its own problems, even if the developers follow
the process locally. Merging state between concurrent modifications can, depending on the
complexity of the bug process, result in invalid or at least ambiguous states. Merging the
output state of two identical but independently run state machines is not guaranteed to
result in a valid state of the state machine. It is possible to verify that a valid state has
been reached as the result of a merge, but that will involve manual resolution, often of a
frustratingly tedious nature. Merging of bugs makes it difficult to maintain a verified bug
state since the transitions cannot necessarily be observed.
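Verification after the fact can be sketched as a check of a recorded state history against the allowed transitions. The ALLOWED table below is a hypothetical life cycle loosely based on the New, Assigned, Resolved, Closed example, and it assumes the tracker records a history of states at all, which not every tracker does.

```python
# Hypothetical process: the allowed transitions of a small bug life cycle.
ALLOWED = {
    "New": {"Assigned"},
    "Assigned": {"Resolved"},
    "Resolved": {"Closed", "Assigned"},  # reopening goes back to Assigned
    "Closed": set(),
}

def invalid_transitions(history):
    """Return the (old, new) steps in a state history that the process forbids."""
    return [
        (old, new)
        for old, new in zip(history, history[1:])
        if new not in ALLOWED.get(old, set())
    ]
```

A check like this could run in a hook at a canonical repository. It can only report that a merged history skipped a step; deciding what the state should have been is still left to a human.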
In the end it seems that bug tracker automation will either be done mostly with wrapper
tools or VCS hooks. As with DVCS hooks versus CVCS hooks I believe we'll find that
distributed bug tracking results in the adoption of less stringent processes and additional
trust put into the users of the bug database because only after the fact can hooks be
executed at a canonical repository.
Comments, Attachments and Fields
The oldest form of distributed bug tracking is a TODO file committed beside the code. This
is usually a simple list of tasks or bugs to be fixed, perhaps with a single brief comment
explaining the issue in detail. This is the simplest form of bug tracking, just a list of
titles, maybe with a description. At the other end there are massively complex centralized
systems with bug processes, multiple comments, attachments and more fields, both free form
and constrained, than you can shake a stick at.
Distributed bug tracking covers this entire range. Simple TODO lists are not very
interesting because they are simple and quite limited, massively complex systems are
unlikely to succeed as distributed bug trackers for the reasons described in the previous
section. Most interesting is the middle ground along the lines of a basic Bugzilla
installation. Such a bug tracker supports a handful of useful fields: severity, component,
state, owner, etc. They also support comments on bugs and attachments on bugs. Systems of
these moderate complexities are commonly found in open source applications and smaller
corporations.
The handling of the metadata fields is not terribly complex. These are the singular semantic
data concerning a bug which computers will find difficult to correctly merge in all
situations. Having a large number of these is not an engineering challenge, but beyond some
number it will strain the patience of the developer and be ignored. A large number of
metadata fields may also not be as useful in distributed bug tracking as centralized bug
tracking. Since relational database formats are troublesome when it comes to distributed
bug storage, many distributed trackers use less structured file formats. As a result, running
arbitrarily complex queries on the bug database is cumbersome, often requiring hundreds or
thousands of files to be parsed into memory before each record is checked in a loop. This is
more difficult than simply
using existing text processing tools to run regex queries on the database. If a tool like
grep is used then there is no point in having a field for every possible situation since all
the comments will be searched anyways. This being the case I believe that only the most
useful of fields will be formalized with any other data being put into a structured form
appropriate for the project and placed into comments.
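The grep-style approach amounts to something like the sketch below, which scans every file under a bug directory with a regular expression. The directory layout and the search_bugs name are assumptions for illustration. The same pattern matches a formal field line and a structured line inside a comment alike.

```python
import re
from pathlib import Path

def search_bugs(root, pattern):
    """Yield (path, line number, line) for every line matching the regex."""
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield path, number, line
```

For example, search_bugs("bugs", r"^severity: Critical$") would list every bug file carrying that line, wherever it appears.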
The issue of attachments is also not complicated other than the fact that most existing
distributed bug trackers ignore this feature entirely. This is likely just an oversight due
to the relative immaturity of the field. Attachments play an important role in the operation
of a bug tracker by being able to store data that is too large to fit conveniently into a
comment. Examples of this include logs or configuration files.
Comments in a bug tracker are a critical collaboration feature. Comments allow one developer
to communicate through time with themselves, users, watchers or other developers. They
provide an organized area to maintain comments and investigation notes concerning a bug.
One particular issue related to comments and distributed bug tracking is comment order. Many
bug trackers use a flat comment model where comments are made in a linear order. In a
centralized model this works well since there is a definite order to the comments and there
are only small windows where a comment can be posted while another is being prepared. In
fact, many bug trackers detect this situation and prevent submitting the latter comment
until the user has read the former comment. This is a form of real time merging. Because a
consistent linear order is maintained in the views of different users the comments can
constructively reference each other. However, in a distributed world concurrent comments
will be the norm rather than the exception and not until much after the fact will it be
possible to determine a canonical comment ordering.
It is not an insurmountable challenge to make flat comments work in a distributed world, but
it is also not clear that it is the best way. One alternative is to work along the lines of
email, where you respond to particular comments in a tree. This can then be displayed in a
nested fashion, which makes it clear which comments reply to which parent comments.
Perhaps this might ease the difficulties of creating a consistent canonical ordering. One
particular additional requirement of a nested presentation might be the necessity to show
the user which comments are new when they revisit a bug thread. None of the trackers I
investigated appear to support this at the moment.
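As a sketch of how a nested presentation could give every clone the same ordering, the following renders comments as an indented tree and sorts siblings by timestamp and then by identifier. The tuple layout and the tie-breaking rule are assumptions of mine, not something any of the trackers is known to do.

```python
from collections import defaultdict

def render_thread(comments):
    """Render comments as an indented tree.

    `comments` is an iterable of (comment_id, parent_id, timestamp, text)
    tuples; parent_id is None for top-level comments.
    """
    children = defaultdict(list)
    for comment in comments:
        children[comment[1]].append(comment)

    lines = []

    def walk(parent_id, depth):
        # Sort siblings by (timestamp, id) so every clone shows the same order.
        ordered = sorted(children[parent_id], key=lambda c: (c[2], c[0]))
        for comment_id, _, _, text in ordered:
            lines.append("  " * depth + text)
            walk(comment_id, depth + 1)

    walk(None, 0)
    return lines
```

Since the order depends only on data stored in the comments themselves, two repositories which have merged the same set of comments will display them identically regardless of the order in which they arrived.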
User Interfaces
The most popular bug tracker user interfaces are web interfaces. A web interface is
convenient for centralized trackers because it is graphical in nature and has an easy
communication path from the centrally controlled web server to the centrally controlled
database server, often the same machine. The web interface also provides realtime feedback.
The less common, but nonetheless effective, interface types often seen are CLI interfaces,
email interfaces and GUI interfaces. Often these are used in concert with a web interface.
Of these there is no intrinsic argument against any but email interfaces. It is too
burdensome to expect a developer to always have local email configured and to integrate
every project or branch of a project into such a system. Most of the distributed bug
trackers offer a CLI interface. This is a popular option because most interactions with a
bug tracker during development are changing the state or commenting on a particular bug. For
these purposes a CLI is more than adequate. CLI interfaces also have the great advantage of
fitting well with the other CLI development tools such as VCSes, editors, build systems and
test runners. CLI interfaces are also easy to script which allows developers to automate or
integrate the tracker with other tools, such as their editor.
In general CLI interfaces are very convenient for the developer who is working on a bug.
They are less convenient if the developer has to wade through a list of bugs to find a
particular bug or otherwise navigate a large amount of data. CLI interfaces are also
entirely inappropriate for users of a project. It is unreasonable to expect a user to
check out the source repository and use a CLI to see what bugs a project has or the current
state of their particular bug.
Many of the disadvantages of the CLI interface could be ameliorated with a curses interface
to allow interactive navigation and modification of issues. However this would still limit
the interface to textual information. A related approach which offers additional flexibility
is to support a local web browser interface. If this interface has reasonable support for
terminal browsers then the effect can be almost as good as a dedicated curses interface with
the advantage of supporting GUI browsers with all the niceties that entails.
Distributed bug tracking brings one additional wrinkle to a web browser interface. If the
bug tracker is running locally and stores its database near the source code then having
multiple concurrent users against one instance brings numerous difficulties. Among these are
handling commit attribution and avoiding conflicts. VCSes provide tools to do this when one user
uses a checkout at a time, but tend to provide no help on a finer level. The result of this
is that any bug tracker intending to support this will likely end up reimplementing much of
the isolation support of formal databases. Since this isn't required in the common case of a
single developer working on their own checkout this seems to be wasted effort.
Thus many distributed bug trackers will have two web interfaces if they have any. One will
be for local use and one will be for public use. I am not aware of any existing distributed
bug tracker which provides a read-write public web interface and stores the bugs either
on-branch or off-branch, but there are several which have a read-only public interface. In
a later section I will discuss possible ways in which this can be made to work when
interacting with the public. If one is to write a read-write public web interface there are
several design issues which need to be thought through first.
The first of these is how to get bug changes from the webserver to the source repository. A
traditional centralized bug tracker stores its bug repository in a mutable database. This
allows data to be deleted at will. Additionally the integrity of a separate bug
database is usually not considered as critical as the project's source repository. Thus if a
malicious user comes along and fills the centralized tracker up with hundreds of megabytes
of bugs the effects are relatively minor and a system administrator can easily delete the
greater portion of the mess. If a distributed bug tracker stores its database in the VCS
then it may not be possible to permanently delete junk data. It could be made to not appear
in recent versions, but would still exist in the immutable history. A rapid increase in size
could also cause severe problems as a source checkout which was less than a megabyte
suddenly turns into one several gigabytes in size, even if the checkout size later decreases
back to the original size after a cleanup.
The second is related to the interface for resolving conflicts between the public interface
and the canonical bug repository. Since distributed bug tracking is distributed many bug
changes can happen concurrently only to be merged later. This is handled using VCS
capabilities in the developer case, but it is likely that using a VCS backend to a public
web interface would be cumbersome or be used differently because of the possibility of many
concurrent public users. If VCS help isn't possible in the same way as the developer use
case then a separate tool might need to be provided to pick and choose which public changes
pass moderation.
The rest of the major issues in considering a read-write public interface are those of any
other public web site with user generated content and won't be covered here.
One possible solution to this is to have some sort of staging system where the new data from
the public interface for the bug repository is manually vetted before inclusion in the
permanent copy of the bug database. This moderation would need to either be performed
frequently or have the unmoderated modifications appear on the public tracker immediately to
ensure the public users receive timely interface feedback.
Though there is no specific reason a dedicated GUI interface could not be written none of
the major software described does so. This is likely partially due to the effort required
compared to a web interface or CLI. Modern web technologies coupled with a single user web
server would seem to provide nearly all the advantages of a GUI with significantly better
portability and reduced development effort.
Bug Identification
As with the change from centralized VCSes to DVCSes global identification is a tricky
subject. It is undeniable that the traditional linear numbering of bugs is an obvious
method, where possible, and easier to remember when the numbers are small. Unfortunately
such a system cannot be globally unique in a distributed world.
As with DVCSes there appears to be no alternative to random or pseudo-random identifiers,
such as cryptographic hashes. This has proven to not be overly burdensome in practice as
long as the tracker attempts to disambiguate hashes from a subset of the full string. For
example the tracker should be able to determine that the bug identified by a8d82 is actually
a8d82ff764188578 as long as the shorter prefix isn't shared by more than one full hash.
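Prefix disambiguation of this kind is simple to sketch. The function name and the error messages below are made up for the example, and apart from a8d82ff764188578 the identifiers used with it are invented.

```python
def resolve_prefix(prefix, known_ids):
    """Return the single full ID starting with `prefix`, or raise ValueError."""
    matches = sorted(i for i in known_ids if i.startswith(prefix))
    if not matches:
        raise ValueError(f"no bug matches {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"{prefix!r} is ambiguous: {', '.join(matches)}")
    return matches[0]
```

Reporting the candidate IDs when a prefix is ambiguous lets the user simply type one or two more characters and try again.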
There are several methods that can meet this need, but the most common are encoded UUIDs and
taking a cryptographic hash of the contents. Obviously these should be presented in a human
readable format such as base-64 or hexadecimal. Base-64 has the advantage of being a denser
representation, but it suffers from using most of the keyboard characters and both upper and
lower case letters. This latter can cause trouble with typing correctly or going through
some systems which may stomp on the case. Hexadecimal, on the other hand, is slightly less
dense, but doesn't suffer from the case problem. Also, since it uses a limited set of
characters it is easier to include in identifiers in other systems, such as version codes.
It seems that there is room for an encoding of the pseudo-random hashes which is both denser
than hexadecimal and yet avoids the major issues of base-64. Perhaps something like base-36
(0-9a-z) would fit the bill, though some of those characters may be difficult to type on
some keyboards in some languages.
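To make the density trade-off concrete, the following Python sketch encodes the same 128-bit
value (the size of a UUID) in hexadecimal, base-36 and base-64 and compares the lengths. The
helper names are made up for this illustration, and the URL-safe base-64 alphabet without
padding is an assumption about how a tracker might choose to present such identifiers.

```python
import base64
import string

BASE36_DIGITS = string.digits + string.ascii_lowercase  # 0-9a-z


def to_base36(value):
    """Encode a non-negative integer using the digits 0-9a-z."""
    if value == 0:
        return "0"
    digits = []
    while value:
        value, remainder = divmod(value, 36)
        digits.append(BASE36_DIGITS[remainder])
    return "".join(reversed(digits))


def encodings(raw):
    """Return the hex, base-36 and unpadded URL-safe base-64 forms of raw bytes."""
    return {
        "hex": raw.hex(),
        "base36": to_base36(int.from_bytes(raw, "big")),
        "base64": base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii"),
    }


# Worst case for a 128-bit identifier: every bit set.
largest = encodings(b"\xff" * 16)
for name, text in largest.items():
    print(name, len(text))
```

For a full 128-bit value this gives 32 characters in hexadecimal, 25 in base-36 and 22 in
base-64, so base-36 recovers a good part of the density of base-64 while staying within a
single letter case.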
The inability to support linear numerical identifiers would seem to be a severe
disadvantage. For projects with a small number of bugs, less than one or two thousand say,
this is definitely the case. Beyond that number however the situation is less clear cut.
When the number of required digits in an identifier is greater than four or five, or the new
bug rate is more than a handful per day, the numbers themselves become more difficult
to remember and lose meaning. On many large projects bug IDs are copied and pasted anyway
since they are difficult to remember and easy to mistype. A similar situation appears to
have won out in the DVCS world: a large project will have millions of commits, and in such a
situation linear numbers can be no easier to tell apart than the pseudo-random
hashes which replaced them.
Software Comparison
Above I've listed all the distributed bug tracking software, both defunct and active, I
could find. In this section I will compare them briefly. First I will make any notes about
the software and then I will have a summary table of the major aspects. Most of the aspects
which are specific to distributed bug tracking have been discussed and explained above.
After all the software has been described individually I will compare the most usable
software in a table.
Artemis
Artemis is a basic tracker built as a
Mercurial extension. It has pretty complete filtering options including the ability to store
custom filters.
Last commit/release: Feb 2012
Language/Runtime: Python / Mercurial plugin
Bug storage: On-branch
Dog food: Yes
CLI: Yes
Local Web UI: No
Public Web UI: No
GUI: No
File format: Maildir per issue
VCSes: Mercurial
Custom Fields:
Comments: Nested
Attachments: Yes
BugID: Hash
Multiuser: No
Bug Dependencies: No
b
b is another Mercurial extension with
a simpler model than Artemis. Note that the last release is quite old, but the development
tree has activity as of late last year. b is based off the t extension but adapted to
provide for more bug tracker-like use cases. b doesn't provide a public website itself, but
the hgsite extension will take a b bug database and
produce a simple static website from it.
Last commit/release: Oct 2012
Language/Runtime: Python / Mercurial plugin
Bug storage: On-branch
Dog food: Yes
CLI: Yes
Local Web UI: No
Public Web UI: Readonly via hgsite
GUI: No
File format: Sectioned text fields
VCSes: Mercurial
Custom Fields: No
Comments: Yes
Attachments: No
BugID: Hash
Multiuser: Yes
Bug Dependencies: No
Bugs Everywhere
Bugs Everywhere is likely the most mature of the
distributed bug trackers. It has a reasonably active user base and seems to have most of the
features to be expected of a distributed bug tracker. The project has had multiple
contributors and is currently on its third maintainer since 2005. Bugs Everywhere
additionally has an email interface, which is rare among distributed bug trackers.
Last commit/release: March 2013
Language/Runtime: Python
Bug storage: On-branch
Dog food: Yes
CLI: Yes
Local Web UI: Yes
Public Web UI: Readonly
GUI: No
File format: JSON, one file per comment
VCSes: Arch, Bazaar, Darcs, Git, Mercurial, Monotone, Others possible
Custom Fields: No?
Comments: Yes
Attachments: Yes
BugID: UUID
Multiuser: Yes
Bug Dependencies: Yes
cil
cil is another small CLI only distributed bug
tracker. It provides some basic integration with Git, but can also be used with other VCSes
as long as you are willing to add and commit changes to the bug repository manually.
cil uses a unique bug repository format where every issue and comment has a file inside a
single directory. Each issue and comment has a link to its children or parent. Thus adding
a comment may cause a merge conflict in the issue file if another comment was added
concurrently, but the conflict will be restricted to references.
Last commit/release: Oct 2011
Language/Runtime: Perl
Bug storage: On-branch
Dog food: Yes
CLI: Yes
Local Web UI: No
Public Web UI: No
GUI: No
File format: Simple key-value-freeform markup
VCSes: Git-supported but not required
Custom Fields: No
Comments: Yes
Attachments: Yes
BugID: Hash
Multiuser: Yes
Bug Dependencies: Yes
DisTract
DisTract is one of the older distributed bug
trackers, but seems to have fallen off the Internet. You can find the last copy of the site
at Archive.org. DisTract is
interesting in that it doesn't provide a CLI interface, but instead all the bug interactions
are performed from within a page in Firefox (not any other browser) which uses Javascript to
access the filesystem directly. Unfortunately since I have been unable to find any copies of
DisTract not a lot is known about it.
From the archived website it was clear that the author intended to have a bug specific merge
algorithm, though it seems unlikely that ever came to pass.
A bug tracker which didn't make my list because it requires realtime access to a central
repository but takes a similar implementation view is Artifacts for Web. The bug tracker runs locally in the browser but
all the bug storage happens on a central SVN server directly.
Last commit/release: mid-2007
Language/Runtime: Haskell / Javascript / Firefox
Bug storage: ?
Dog food: Yes
CLI: No
Local Web UI: Yes
Public Web UI: ?
GUI: No
File format: JSON?
VCSes: Monotone
Custom Fields: ?
Comments: ?
Attachments: ?
BugID: ?
Multiuser: ?
Bug Dependencies: ?
DITrack
DITrack is the first off-branch distributed bug tracker in
this list. DITrack is interesting in that it only supports SVN, a centralized VCS. As
such, several of its design features are rare. The first is a linear bug ID scheme: bugs are
numbered in sequential order. Each issue is a directory made up of multiple files. Each file
is numbered in sequence and appears to be immutable. Thus each issue is the sum of the log-type
entries from the files. While sequential numbering has obvious problems in a decentralized
system, the log structure does present an interesting solution to the merging problem. Since
the bug is the combined last state of various fields from the log, there need never be any
manual merging: regular file merging will always automatically result in a last-wins bug
metadata merging strategy.
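The last-wins behaviour of such a log structure can be illustrated with a short Python
sketch. This is not DITrack's actual file format or code; the entry layout (a sequence number
paired with a dictionary of changed fields) and the function name current_state are
assumptions made purely for illustration.

```python
def current_state(entries):
    """Fold log entries oldest first; later values overwrite earlier ones."""
    state = {}
    for _, changes in sorted(entries, key=lambda entry: entry[0]):
        state.update(changes)
    return state


# Hypothetical log for one issue: (sequence number, fields set by that entry).
entries = [
    (1, {"title": "Crash on start", "status": "open"}),
    (2, {"owner": "alice"}),
    (3, {"status": "closed"}),
]
print(current_state(entries))
```

Because entries are only ever appended and never edited, merging two copies of an issue
reduces to taking the union of their entries and folding them again; no field-level conflict
resolution is needed.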
Last commit/release: Aug 2008
Language/Runtime: Python
Bug storage: Off-branch
Dog food: Yes
CLI: Yes
Local Web UI: No
Public Web UI: Read-only
GUI: No
File format: RFC-822
VCSes: SVN
Custom Fields: No
Comments: Yes
Attachments: Yes
BugID: Linear
Multiuser: Yes
Bug Dependencies: No
dits
dits appears to be the aborted beginnings of a
distributed bug tracker. Its functionality isn't very complete and it doesn't appear usable.
Last commit/release: Apr 2010
Language/Runtime: Python
Bug storage: On-branch
Dog food: Yes
CLI: No
Local Web UI: Yes
Public Web UI: No
GUI: No
File format: JSON
VCSes: HG, Git?
Custom Fields: No
Comments: No
Attachments: No
BugID: Hash
Multiuser: No
Bug Dependencies: No
Ditz
Ditz is a distributed bug tracker which was, at one
time, fairly popular as distributed bug trackers go in the Ruby community. Now it seems to
be abandoned though several people have created personal forks on gitorious. Ditz has no
native support for any particular VCS, but it does have a plugin system which has been used
to integrate with Git. Of interest, especially to Emacs users, is that Ditz has an
accompanying Emacs major mode. Ditz has a particular focus on grouping issues into releases.
There appears to be a local web UI, "Sheila", but I am unsure of its usability state.
Last commit/release: Sept 2011
Language/Runtime: Ruby
Bug storage: On-branch
Dog food: Yes
CLI: Yes
Local Web UI: Yes
Public Web UI: Read-only
GUI: ditz-commander
File format: YAML
VCSes: Agnostic
Custom Fields: No
Comments: Yes
Attachments: No
BugID: Hash
Multiuser: With plugin
Bug Dependencies: No
Fossil
Fossil is not just a
distributed bug tracker, but an entire development forge in a box. It includes a DVCS, wiki,
bug tracker and web server. One might call it a distributed forge. Since Fossil stores the
tickets in its distributed database there is a custom merging algorithm which appears to be
mostly newest-wins, but avoids any manual merging of bug files.
Last commit/release: Apr 2013
Language/Runtime: C
Bug storage: Out-of-tree
Dog food: Yes
CLI: Yes
Local Web UI: Yes
Public Web UI: Read-write
GUI: No
File format: Database
VCSes: Fossil
Custom Fields: Yes
Comments: Yes
Attachments: Yes
BugID: UUID
Multiuser: Yes, but no ownership
Bug Dependencies: No
UPDATE 2013-06-03: As C2H5OH mentioned in the comments it is possible to add custom
fields.
git-case
git-case is a bare bones proof of concept
distributed bug tracker built in the style of the git porcelain. The website claims that
some operations are sluggish, but no further details are given.
Last commit/release: Oct 2010
Language/Runtime: Bash
Bug storage: Off-branch
Dog food: No
CLI: Yes
Local Web UI: No
Public Web UI: No
GUI: No
File format: Plain text
VCSes: Git
Custom Fields: Yes
Comments: Yes
Attachments: Yes
BugID: Hash
Multiuser: Yes
Bug Dependencies: No
git-issues
git-issues is a mostly defunct tracker
built on top of Git in a similar manner to git-case.
Last commit/release: June 2012
Language/Runtime: Python
Bug storage: Off-branch
Dog food: No
CLI: Yes
Local Web UI: No
Public Web UI: No
GUI: No
File format: XML
VCSes: Git
Custom Fields: No
Comments: Yes
Attachments: Yes
BugID: Hash
Multiuser: Yes
Bug Dependencies: No
gitissius
gitissius started off as a fork of
git-issues, but then diverged significantly.
Last commit/release: Dec 2011
Language/Runtime: Python
Bug storage: Off-branch
Dog food: Yes
CLI: Yes
Local Web UI: No
Public Web UI: No
GUI: No
File format: JSON
VCSes: Git
Custom Fields: No
Comments: Yes
Attachments: No
BugID: Hash
Multiuser: Yes
Bug Dependencies: No
gitli
gitli
is really more of a single user TODO list than a fully fledged distributed bug tracker. All
the issues are contained within a single file. With that setup, linear BugIDs and no
comments this isn't really suitable for any except the simplest of project needs.
Last commit/release: March 2011
Language/Runtime: Python
Bug storage: On-branch
Dog food: Yes
CLI: Yes
Local Web UI: No
Public Web UI: No
GUI: No
File format: Custom text
VCSes: Git
Custom Fields: No
Comments: No
Attachments: No
BugID: Linear
Multiuser: No
Bug Dependencies: No
gitstick
gitstick is apparently based upon Ticgit. It
still seems to be a young distributed bug tracker. Unfortunately I wasn't able to determine
much information about how this project operates from inspection. It may be less appropriate
to call this a standalone distributed bug tracker than a local web UI for Ticgit.
Last commit/release: Jan 2013
Language/Runtime: Scala
Bug storage: Off-branch
Dog food: No
CLI: No
Local Web UI: Yes
Public Web UI: No
GUI: No
File format: ?
VCSes: Git
Custom Fields: ?
Comments: ?
Attachments: ?
BugID: ?
Multiuser: Yes
Bug Dependencies: No
klog
klog appears to be greatly in flux at this time
so it is difficult to say much which is likely to be accurate in a year. There appear to be
a great many features planned, but only the most basic features are implemented. According
to the bug database a complete rework of the way the bug database is stored is planned.
Last commit/release: Mar 2013
Language/Runtime: Javascript
Bug storage: On-branch
Dog food: Yes
CLI: Yes
Local Web UI: Prototype?
Public Web UI: No?
GUI: Mac OSX
File format: Key-value-text
VCSes: Agnostic
Custom Fields: No
Comments: No
Attachments: No
BugID: Hash
Multiuser: No
Bug Dependencies: No
Mercurial Bugtracker Extension
Mercurial Bugtracker Extension
uses an unusual layout for bugs. There is one directory for open bugs and another for closed
bugs. Such a layout may cause issues when there are concurrent modifications such as one
person modifying an open bug and another closing it.
Last commit/release: May 2012
Language/Runtime: Python / Mercurial plugin
Bug storage: On-branch
Dog food: Yes
CLI: Yes
Local Web UI: No
Public Web UI: No
GUI: No
File format: INI
VCSes: Mercurial
Custom Fields: No
Comments: No
Attachments: No
BugID: Hash
Multiuser: Yes
Bug Dependencies: No
milli
milli seems to have disappeared during the
lengthy research period so no further information is available.
Last commit/release: ?
Language/Runtime: ?
Bug storage: ?
Dog food: ?
CLI: ?
Local Web UI: ?
Public Web UI: ?
GUI: ?
File format: ?
VCSes: Agnostic
Custom Fields: ?
Comments: ?
Attachments: ?
BugID: ?
Multiuser: ?
Bug Dependencies: ?
Nitpick
Disclosure: Nitpick is written by the author.
Nitpick is a relatively young distributed
bug tracker with most of the significant features discussed in this article. One notable
feature of Nitpick not present in other distributed bug trackers is the ability to combine
multiple Nitpick databases, via the foreign project feature, into a single view. This allows
viewing bugs both across several projects and across several branches in a single instance of
Nitpick.
Last commit/release: Apr 2013
Language/Runtime: Python
Bug storage: On-branch
Dog food: Yes
CLI: Yes
Local Web UI: Yes
Public Web UI: Read-only
GUI: No
File format: Simple markup
VCSes: git, hg, svn
Custom Fields: No
Comments: Nested
Attachments: Yes
BugID: Hash
Multiuser: Yes
Bug Dependencies: Yes
pitz
pitz started off as a reimplementation of Ditz.
Last commit/release: Aug 2012
Language/Runtime: Python
Bug storage: On-branch
Dog food: Yes
CLI: Yes
Local Web UI: No
Public Web UI: No
GUI: No
File format: YAML
VCSes: Agnostic
Custom Fields: No
Comments: Yes
Attachments: Yes
BugID: UUID
Multiuser: Yes?
Bug Dependencies: No
scm-bug
scm-bug is not a standalone distributed bug tracker.
Instead it ties source code to an existing bug tracker. It might be possible to use this
with a locally installed tracker in a distributed fashion.
Last commit/release: Feb 2011
Language/Runtime: Perl
Bug storage: Out-of-tree
Dog food: ?
CLI: ?
Local Web UI: ?
Public Web UI: ?
GUI: ?
File format: ?
VCSes: svn, git, cvs, hg
Custom Fields: ?
Comments: ?
Attachments: ?
BugID: ?
Multiuser: ?
Bug Dependencies: ?
Simple Defects
Simple Defects is more than just a distributed bug tracker,
it is also capable of synchronizing bidirectionally with several centralized bug trackers.
SD uses a distributed database instead of storing the bug repository alongside the
source code in a VCS. As such the VCS support it does have is mostly limited to adding
commands to the VCS command itself. Since SD is capable of synchronizing bugs in multiple
ways it might be possible to use it as an intermediate step between a central project bug
tracker and a locally installed centralized bug tracker for developer use.
Last commit/release: Sept 2012
Language/Runtime: Perl
Bug storage: Out-of-tree
Dog food: Yes
CLI: Yes
Local Web UI: Yes
Public Web UI: No
GUI: No
File format: Database
VCSes: git, darcs and other
Custom Fields: ?
Comments: Yes
Attachments: Yes
BugID: Linear
Multiuser: Yes
Bug Dependencies: ?
Stick
Stick is another one of those distributed bug
trackers which seems to have fallen off the Internet. I'm unable to retrieve the source to
get much concrete information but the website makes it seem as if Stick was mostly in the
idea conception phase with little actual working functionality.
Last commit/release: ?
Language/Runtime: ?
Bug storage: ?
Dog food: ?
CLI: ?
Local Web UI: ?
Public Web UI: ?
GUI: ?
File format: ?
VCSes: Git
Custom Fields: ?
Comments: ?
Attachments: ?
BugID: Hash
Multiuser: No
Bug Dependencies: ?
ticgit-ng
ticgit-ng does dogfood itself, but that
isn't evident from the main repository. I had to search through some forks on Github to find
the bug branch. Ticgit-ng uses an interesting approach to managing the data by having a
single 'file' per field. Thus there is a file for the state and one for each comment. Not
evident in the feature summary is that Ticgit-ng supports tagging issues, though it isn't
clear if it supports multiple tags or only one.
Last commit/release: Oct 2012
Language/Runtime: Ruby
Bug storage: Off-branch
Dog food: Yes
CLI: Yes
Local Web UI: Yes
Public Web UI: No
GUI: No
File format: Plain text
VCSes: git
Custom Fields: No
Comments: Yes
Attachments: Yes
BugID: Hash
Multiuser: Yes
Bug Dependencies: No
Veracity
Veracity is another distributed forge in that it
is not only a distributed bug tracker, but also wiki and source control. Again the bugs are
stored in a distributed database which has some special logic and interfaces for helping
merging along.
Last commit/release: Mar 2013
Language/Runtime: C
Bug storage: Out-of-tree
Dog food: Yes
CLI: No
Local Web UI: Yes
Public Web UI: No?
GUI: No
File format: Database
VCSes: Veracity
Custom Fields: No?
Comments: Yes
Attachments: Yes
BugID: Linear
Multiuser: Yes
Bug Dependencies: No
Summary Table
Only what I consider to be fully fledged distributed bug trackers worth considering for use
in a project of more than one developer will find a place in this summary table for reasons
of space. All the same information is available for every tracker I evaluated in their
respective sections. The primary determinants of suitability are multi-user support (the
ability to assign bugs to users and to determine who made any particular comment or bug
report), a sufficiently recent commit or release and what appeared to be at least one mature
interface for developer use. The range of project complexity these trackers are suitable for
varies, but since a small two-person project should use a bug tracker just as large projects
should, I list trackers of multiple complexities.
To save space I have skipped the fields for the GUI (since none of the selected trackers
have a GUI), support for custom fields (since, apart from Fossil, none of them appear to have
such support) and multiuser support, since that was one of the requirements and all have some
such support. It is important to note that all but Fossil have full multi-user support out of
the box. Fossil lacks the ability to assign a bug to a particular person for resolution, but it
can be added as a set of custom fields.
Comparison part 1
Software | Last Commit / Release | Language | Bug Storage | Dogfood | CLI | Local Web UI | Public Web UI |
b | Oct 2012 | Python | On-branch | Yes | Yes | No | Read only |
Bugs Everywhere | Mar 2013 | Python | On-branch | Yes | Yes | Yes | Read only |
Fossil | Apr 2013 | C | Out-of-tree | Yes | Yes | Yes | Read-write |
git-issues | June 2012 | Python | Off-branch | No | Yes | No | No |
Mercurial Bugtracker Extension | May 2012 | Python | On-branch | Yes | Yes | No | No |
Nitpick | Apr 2013 | Python | On-branch | Yes | Yes | Yes | Read only |
Simple Defects | Sept 2012 | Perl | Out-of-tree | Yes | Yes | Yes | No |
ticgit-ng | Oct 2012 | Ruby | Off-branch | Yes | Yes | Yes | No |
Veracity | Mar 2013 | C | Out-of-tree | Yes | No | Yes | No? |
Comparison part 2
Software | File format | VCSes | Comments | Attachments | BugID | Bug Dependencies |
b | Sectioned text | hg | Yes | No | Hash | No |
Bugs Everywhere | JSON | Many | Yes | Yes | UUID | Yes |
Fossil | Database | Fossil | Yes | Yes | UUID | No |
git-issues | XML | git | Yes | Yes | Hash | No |
Mercurial Bugtracker Extension | INI | hg | No | No | Hash | No |
Nitpick | Simple Markup | svn git hg | Nested | Yes | Hash | Yes |
Simple Defects | Database | git darcs other | Yes | Yes | Linear | ? |
ticgit-ng | Plain text | git | Yes | Yes | Hash | No |
Veracity | Database | Veracity | Yes | Yes | Linear | No |
Bug Handling Strategies
By analogy with DVCSes, distributed bug tracking provides some new capabilities, makes some
older techniques easier and makes some traditional centralized bug tracking methods all but
impossible. In this section I'll try to cover the most common of these cases and some ways to
work within the limits, tighter and looser, which distributed bug tracking software as
it exists today provides.
Distributed Use Cases
Most of the talk around distributed bug tracking is about replacing a centralized bug
tracker completely. This is so for the obvious reason that most developers don't want more
than one bug tracker per project. There are, however, some interesting alternative uses
which are not in direct conflict with a centralized tracker. One such use is the aggregation
of multiple trackers into a single one. Consider the case of a developer who works on
several different projects. If these projects don't share one bug tracker then the developer
must regularly check these separate trackers. Some of the distributed bug trackers described
above support bidirectional communication with other bug trackers, centralized or not. As
such the developer could configure a local distributed bug tracker to give an overview of
several trackers.
An alternative is a hybrid centralized-decentralized setup, similar to how DVCS is used in
practice in many cases. If the project or organization has a single centralized tracker a
developer could set up a distributed tracker as a mirror, full or partial, of the centralized
tracker for personal consumption and modification when they are disconnected or operating
over a poor network link. Whenever it is convenient they would then trigger a bidirectional
synchronization. Thus they gain all the advantages of distributed bug tracking without many
of the disadvantages. This model is similar to individual developers who use git as an
interface to a Perforce or Subversion repository.
Yet another use case, again depending on aggregation, is to combine various bug trackers for
a single project. As an example consider the case where the bug trackers of an open source
package and bugs against that package in the trackers of all the major Linux distributions
could be combined for a more complete view of the issues users are having with the software.
The various ways a distributed bug tracker could be used are not fully explored, so these are
just a few examples of how they could be integrated into workflows.
Non-Developer Members
One common concern with large projects moving to distributed bug tracking is how to
integrate QA and project managers. The predominant view among existing large projects is
that QA and project managers, for the most part, should neither need nor have access to the
VCS. Bringing this stance to distributed bug tracking would imply that QA would have no way
to directly interact with the bug tracker in other than a readonly fashion. The solution to
this predicament is to give those QA and project managers read-write access to the VCS.
There are a few reasons such a move is resisted. Many of them are obsolete or misinformed
notions based upon limitations of old VCSes or poor bug trackers. The first of these is an
entirely valid argument that requiring all the QA and project managers to become experts in
the VCS of choice is overly onerous. At more than one place I have seen the local VCS expert
set up special wrappers which performed only the limited set of functionality a particular
artist or QA person needed to get their job done and hid all the other complexity. In a similar
vein any good distributed bug tracker will provide a sufficiently simple interface to the
VCS for the bug operations that minimal training should be necessary.
A second common claim, especially among open source projects, is that the VCS is for source
code only and everything else should be kept separate. While it is possible to have a
parallel VCS repository, or some other arrangement as will be discussed below, modern VCSes
are not simply source control systems, but generalized version control systems. Though some
VCSes handle them less well than others, many large projects have good success storing large
assets or even build chain tools in the VCS alongside the project. As such there is no
reason not to also store the bug database and all the input from the QA people as well. The
VCS can be viewed as the project state and not just the project output.
A final possible complaint is that the QA people may, not being VCS experts, make disastrous
mistakes relating to merging or reviving stale commits or just editing the other parts of
the project inadvertently. While this is true when no protections are put in place, most
VCSes provide the ability to either restrict different users to different portions of the
checkout tree or otherwise have a knowledgeable person double check their changes before
accepting them into the main development repository.
Integrating QA, project managers and other non-developers such that they can make full use
of the distributed bug tracker is not a difficult matter, it merely requires that sufficient
training and protections be put in place. These less technical people will likely, however,
not be pleased with purely a command line interface to the tracker. Partially this is
because their use cases tend to not deal with one bug at a time but normally traversing,
reading, commenting on and modifying several in quick succession. Partially this aversion
will be due to reduced familiarity with CLI tools as compared to the average developer. For this
reason any distributed bug tracker used should also provide a good read-write local web or
graphical interface for these less technical users.
Care must also be taken when it comes to helping them know which branch of development to
find the appropriate bugs in. For support type staff this is as easy as having them choose
the version to file the bug against first and choosing the correct bug repository version
based upon that. For QA users it is more a matter of ensuring that the builds they test come
along with the bug database. This is most easily accomplished with a fully automated build
system which can produce QA testable builds on demand. With such a system QA is given a
source tree which is trivially built into a product to test. Then QA need merely use that
branch to handle any bugs for that build.
Public Users
As previously discussed a major outstanding issue, especially for open source projects, is
how to provide the public with a useful interface into the project bug tracker. Few
read-write web interfaces suitable for public consumption with distributed bug trackers
have been created, though no insurmountable obstacle appears to block the way in most cases.
Currently I can only recommend one approach to solving this issue, namely having a readonly
user web interface which is updated frequently and handling any bug modification or creation
on the part of users as part of a support mailing list.
This will not be as convenient for either the developers or users as a public bug tracker,
but is likely to provide better results for both parties. The users, instead of creating a
new bug which is likely never to be answered if it is ever read by the developer, will
interact with a developer or other support person for the project directly. This will allow
the developer to not only determine if this is an existing bug, a step users often never
perform correctly, but also ensure that all the necessary information has been acquired
before the user leaves and is never heard from again. Many bugs in open source bug trackers
are full of incomplete information and the reporting user is nowhere to be found. The user
is also better off as there may be a solution to the particular issue they are facing which
they will be told about immediately instead of waiting for a WorksForMe resolution of their
bug, if that ever comes.
As previously mentioned the developer will be able to extract all the necessary information
from the user more easily because the discussion will happen immediately instead of days or
weeks later when somebody gets around to viewing the newest bugs on the tracker. Developers
also benefit by having fewer duplicate bugs with slightly different information cluttering
up the tracker because they'll deduplicate as they go along. An additional advantage is a
greater likelihood of a user actually reporting the issue. If the recommended way for a user
to report a problem is to a mailing list they are reasonably likely to do so. They are less
likely to create yet another account for yet another bug tracker which they will never use
again such that they can file one bug which will almost certainly not receive a response.
One critical and necessary aspect of this to remember is that responses to the users on the
list must be timely and efficient. It is this requirement of good communication which brings
about the benefits for both the developers and users. In fact, this method is how many
commercial companies operate: the customers interact directly with support staff who
navigate and fill in the necessary information in the bug tracker. The second critical
aspect is that the public readonly web view is updated frequently. An up-to-date place
for users to track the state of their bug, to look up resolutions to other similar issues, or to
which users can be pointed when they are having a known issue is invaluable and saves developers time.
Users prefer to get the answers they want without having to bother the developers and they
like to see progress.
Multibranch Overview
One particular issue with distributed bug tracking is that there is not necessarily a single
complete view of the bugs at any time. Instead different branches may have different bugs in
different and conflicting states. For example a development branch may have fixed a bug but
since that branch hasn't yet merged to the trunk the trunk doesn't have that bug marked as
fixed. A further example is a release branch having a bug created against it on the
complaint of a user, but that bug not having made its way via merging to the trunk and so
that bug exists nowhere else in the VCS. These are all examples of the power of distributed
bug tracking when it is used to have bug states follow the code flow within the project.
However, sometimes it is useful to have a complete, or more complete view of the bugs. As an
example, a project manager may want to know what bugs have been fixed for the coming
release, even if not all of those changes have made it to the release branch yet. Perhaps
the code must move from a development branch, through a QA branch before arriving in the
release branch. It is still important to note that a bug which is otherwise marked
as open in the release branch is actually closed in some branch. Another situation is one of
a developer who works in a development branch. This developer would like to be able to, when
he views the bug database, see not just the bug information as it appears in his branch, but
also in the trunk in case some new bug or comment relevant to his current branch appears.
This cross-branch bug database merging is an important feature to ensure a wider view of the
state of the bug database when such a view is useful. At the time of this writing only
one distributed bug tracker which I am aware of, Nitpick, supports such a facility directly.
Indirectly it is possible to script a CLI interface to merge the bug query results across
many VCS checkouts. Of course any distributed bug tracker which used off-branch or
out-of-tree storage will have neither this disadvantage nor the advantage of having
differing branch versions of the bug database.
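As a rough illustration of the scripted approach, the Python sketch below combines per-branch
bug states into a single view and reports bugs which are still open in one branch but already
closed elsewhere. It assumes the per-branch data has already been gathered, for example by
running the tracker's CLI in each checkout; that step, the branch names, the bug IDs and the
"open"/"closed" state names are all hypothetical.

```python
def combined_view(branches):
    """Map each bug ID to {branch name: status} over every branch that has it."""
    view = {}
    for branch, bugs in branches.items():
        for bug_id, status in bugs.items():
            view.setdefault(bug_id, {})[branch] = status
    return view


def fixed_somewhere(view, open_in):
    """List bugs still open in open_in but already closed in another branch."""
    return sorted(
        bug_id
        for bug_id, states in view.items()
        if states.get(open_in) == "open" and "closed" in states.values()
    )


# Hypothetical query results, one dictionary per VCS checkout.
branches = {
    "release": {"a8d82ff7": "open", "17c4e5aa": "open"},
    "dev": {"a8d82ff7": "closed", "17c4e5aa": "open", "9b03d1c2": "open"},
}
view = combined_view(branches)
print(fixed_somewhere(view, "release"))
```

Here the only bug reported is a8d82ff7, which is closed in the development branch but has not
yet been merged to the release branch. The bug 9b03d1c2 exists only in the development branch
and shows up in the combined view with a single entry.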
Using On-Branch as Off-Branch
Distributed bug tracking holds many possible advantages and uses which cannot be filled by
traditional centralized bug trackers. But it may be that not all of this power is desired
for a particular project. In many cases it is possible to configure the distributed bug
tracker, with some scripting effort, to work in a less powerful mode.
For example, while off-branch bug repositories will likely have an easier interface for
storing bugs that way, it is possible to use an on-branch bug tracker as an off-branch
tracker. Simply create a separate branch or checkout for the bug repository and write some
scripts which direct the bug tracker to use that branch or checkout instead of putting the
bugs beside the source code. This relatively simple step will produce an off-branch bug
tracker with the bugs stored in the VCS.
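A wrapper script of this kind could be as small as the sketch below. It assumes the tracker locates its database relative to the current working directory, which may not hold for every tracker, and that a dedicated bug checkout already exists. The tracker command and the checkout path in the usage comment are placeholders.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def run_in_bug_checkout(tracker_cmd: Sequence[str],
                        bug_checkout: Path) -> subprocess.CompletedProcess:
    """Run the tracker with its working directory set to the bug checkout.

    Assumption: the tracker finds its bug database from the directory it is
    started in, so changing the working directory is enough to redirect it.
    """
    if not bug_checkout.is_dir():
        raise FileNotFoundError(f"bug checkout not found: {bug_checkout}")
    return subprocess.run(list(tracker_cmd), cwd=bug_checkout,
                          capture_output=True, text=True, check=False)


# Hypothetical usage, with "mytracker" standing in for the real command:
#   result = run_in_bug_checkout(["mytracker", "list"], Path("../project-bugs"))
#   print(result.stdout)
```

Developers would then call the wrapper instead of the tracker itself, so that every bug lands in the dedicated checkout no matter which source branch they happen to be working in.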
Similarly, a setup even closer to a centralized bug tracker, but with the bugs stored
entirely in the VCS, could be set up by presenting a web interface to each developer and
having it commit changes directly to the VCS. There is even the possibility of using these
simpler setups for some users of the repository while allowing the full distributed
capabilities for others, perhaps the remote workers.
In much the same way an on-branch or off-branch bug tracker can be turned into an
out-of-tree tracker simply by having a separate VCS repository which contains only the bug
repository.
Surviving A Manual Bug Process
In the beginning bug trackers started as simple TODO lists, perhaps with some notes. From
there the massive spectrum of bug tracking tools and processes evolved. At the
extreme end there are very complicated bug processes and tools. While these sorts of
processes can be translated to distributed bug tracking, they are likely to be cumbersome and
disappointing. Instead, distributed bug tracking is better suited to simpler processes and
fewer fields. Because the entire bug database is available locally to a developer, it is
simpler to run complex queries as scripts locally. Any datum which isn't extremely common is
likely better off as a formatted comment to a bug instead of a custom field with complex
automation behind it.
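As a rough illustration of a formatted comment standing in for a custom field, the sketch below assumes a convention where a comment line of the form "Key: value" carries the datum. The convention, the field names and the in-memory shape of the bug database are all assumptions made for the example. A real tracker would supply its own way of reading comments.

```python
import re
from typing import Dict, List

# Assumed convention: a comment line such as "Customer: Acme" holds a datum.
# The key must be a single word followed by a colon and whitespace, so that
# URLs such as "http://example.com" are not mistaken for fields.
FIELD_LINE = re.compile(r"^([A-Za-z][A-Za-z0-9_-]*):\s+(\S.*)$")


def extract_fields(comment: str) -> Dict[str, str]:
    """Collect "Key: value" lines from one comment, keyed in lower case."""
    fields: Dict[str, str] = {}
    for line in comment.splitlines():
        match = FIELD_LINE.match(line.strip())
        if match:
            fields[match.group(1).lower()] = match.group(2).strip()
    return fields


def bugs_with_field(bugs: Dict[str, List[str]], key: str, value: str) -> List[str]:
    """Return sorted IDs of bugs where any comment carries key: value."""
    wanted = key.lower()
    return sorted(
        bug_id for bug_id, comments in bugs.items()
        if any(extract_fields(c).get(wanted) == value for c in comments)
    )


if __name__ == "__main__":
    bugs = {
        "bug-1": ["Crash on save\nCustomer: Acme"],
        "bug-2": ["Customer: Initech"],
    }
    print(bugs_with_field(bugs, "customer", "Acme"))
```

Because the whole database is local, a query like this is an ordinary script rather than a feature the tracker has to provide.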
Along these lines, a simpler bug process is recommended in general, with only a small number
of bug states, priorities and the like. If the bug process is simple enough then no
automation will be necessary because the developer will have only one obvious choice and
will be able to arrive directly at the state they desire. This is contrary to a common setup
where the developer must first mark a bug as assigned before marking it resolved before
marking it closed. For many simple bugs this is overkill and the developer will spend more
time navigating the process than fixing the bug. In such cases the developer wants to be
able to skip straight to closing the bug.
With distributed bug tracking it pays to have a clear and simple process. Only a handful of
states are needed and most information is better suited to being in a comment than a custom
field.
Where to Enter Bugs
One issue which comes up when discussing the abstract theory of distributed bug tracking is
that it would be ideal if a bug could be associated with the original commit which
introduced the error across all the various branches and clones. This is nice in theory but
also impossible in theory. For one thing, there may be no single commit which introduced the
error. While it is possible to associate a bug with any commits which do introduce an issue,
that is really just a mapping from bug ID to commit ID. It is possible to construct a bug
tracker in this way, and it would be able to cut across branches using this mapping.
Lacking such a system the next best that can be done is to ensure that bugs are entered
where the fix would be placed. As an example consider a project with a recent release and a
trunk where development continued. During the release stabilization process a branch would
have been created for the release while any remaining major bugs were fixed there. All
those changes would then be merged back into trunk at a later time. Any bugs found by QA
against that stabilizing release should be raised in the release branch. Then when any fixes
propagate so too will the bug information.
In a similar way bugs in maintenance releases should be raised in the branch for that
release to eventually be merged into the trunk. The fix itself may or may not still be
applicable, but since some changes will be, the bug database changes should be merged up as
well.
Now all this depends on the particular branching and versioning strategy the project uses.
If the project doesn't have maintenance releases or doesn't move changes around like that
then a different location to report or modify bugs will be appropriate.
Other Distributed Tracking Options
As previously stated distributed bug tracking really started as simple TODO files. As such
there are ways of tracking bugs which don't require fully fledged bug tracking software,
distributed or otherwise. Most of these are severely limited in several ways, but a project
may not hit these limits.
Beyond the simplest TODO lists are tools such as Emacs Org mode. This can work well,
but may fall apart when multiple developers are involved and will make providing a public
read-only view into the bug database cumbersome.
Another alternative, which doesn't suffer from this last limitation, is to use wiki software
to track bugs. There exist VCS-based wikis, such as ikiwiki. These will tend to be usable
with a standard text editor but still provide an easy way to render the bugs for users on
the web. Using a wiki like this will tend to make it difficult for users to find issues that
may apply to them except by reading all the existing issues.
Is This Worth Doing At All?
It may seem odd to have a section which deals with the question of the value of distributed
bug tracking so late in the article, but without understanding distributed bug tracking as
it is currently known it is quite difficult to make a reasoned judgement on the matter.
There are opinions in both directions. Proponents of distributed bug tracking focus on the
isolation capabilities, offline support and bug branching, while opponents focus on the
collaborative aspects of bug tracking which distributed bug tracking slows down. Both
groups have points and the strength of any particular point truly depends on the project in
question.
To start consider a project where all work is done on feature or bugfix branches, there is a
thorough review process and all real discussion happens on a mailing list with relevant
messages copied into the bug tracker manually for reference. In this case distributed bug
tracking would seem to have few downsides. All the discussion happens in a broadcast medium,
the email list, so every developer can easily get a sense of the current stage and latest
debugging information. Since the bugs are fixed on branches, and a thorough review process
may cause a large span of time to pass between the bug being fixed and that fix being merged
into the trunk, the ability of bug states to follow the fixes is very useful, especially if
there is some tool support to aggregate bug states across multiple branches.
A different project might instead do the vast majority of its development on the mainline
with all discussion occurring via the bug tracker. Here distributed bug tracking seems to
have no detriment. Surely the full capabilities are not being used, but perhaps offline
support is sufficiently useful. As soon as the developer synchronizes their local copy they
will have all the new discussion. This does require that the developer frequently and
regularly synchronize, which may be a change in workflow. This is less of a burden with
DVCSes, but can be an issue for projects or developers which prefer checking in single,
complete units of work as a single commit, for example committing a few days of work at
one time instead of as several commits over those days. In these situations frequently
merging in changes from the trunk may be onerous.
There is then a third case of a project with the branching structure of the first case, but
the communication system of the second. That is, all work is done on branches and the vast
majority of the communication occurs exclusively via the bug tracker. This situation can
cause some difficulty when using a distributed bug tracker. The time to push a new bug
comment up and then have another developer pull it down can be quite significant. There are
simple solutions, however. The simplest is to give the canonical VCS repository a hook
which emails out new bug comments and state changes when those changes are pushed to
it. Having a centralized bug tracker email out such information is very common already, so
this shouldn't be an issue.
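A minimal sketch of such a hook follows, assuming the canonical repository is Git, where a post-receive hook is given one "old new ref" line per updated ref on standard input. The subject format, the addresses and the idea of passing in a precomputed list of changed bug files are assumptions for illustration. A real hook would have to work out which bug files changed between the old and new revisions, and then hand the message to a mail server.

```python
from email.message import EmailMessage
from typing import List, NamedTuple


class RefUpdate(NamedTuple):
    old: str   # revision the ref pointed at before the push
    new: str   # revision the ref points at after the push
    ref: str   # full ref name, such as refs/heads/main


def parse_updates(stdin_text: str) -> List[RefUpdate]:
    """Parse the "old new ref" lines a post-receive hook reads from stdin."""
    updates = []
    for line in stdin_text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            updates.append(RefUpdate(*parts))
    return updates


def build_notification(update: RefUpdate, changed_bug_files: List[str],
                       sender: str, recipient: str) -> EmailMessage:
    """Build (but do not send) a mail listing the bug files that changed."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = (
        f"[bugs] {len(changed_bug_files)} bug file(s) changed on {update.ref}"
    )
    msg.set_content("\n".join(changed_bug_files) + "\n")
    return msg


if __name__ == "__main__":
    # Placeholder revisions, file path and addresses, for illustration only.
    update = parse_updates("1111111 2222222 refs/heads/main\n")[0]
    mail = build_notification(update, ["bugs/12/comment-3"],
                              "tracker@example.com", "dev-list@example.com")
    print(mail["Subject"])
```

Sending is left out on purpose, since the delivery mechanism depends entirely on the server hosting the repository.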
It remains to be seen which side of the argument will win out, but it currently appears that
distributed bug tracking fills a real need, especially when coupled with a DVCS, and has few
downsides which lack relatively simple technical fixes. For the time being at least,
distributed bug tracking appears to be a useful tool worth using.
Future Thoughts
Distributed bug tracking is a young concept, even considering the age of other concepts in
the various fields of computing. As such there are many areas which have not been thought
through and it is unlikely that any of the current generation of distributed bug trackers
have all the features and functionality which will one day be considered essential. Here are
a few ideas or issues which still need resolution with respect to distributed bug tracking.
Tracking Changes
One of the first advantages which comes up when considering distributed bug tracking is the
ability for the closing of a bug to follow the change which fixes the bug as it propagates
through branches and releases. This works fine with on-branch storage, but there are
arguments against on-branch storage related to bug visibility and the length of time it
takes for a comment on a bug to propagate to the branch a developer is watching. Off-branch
bug storage, however, gives up on the ability for bug state to follow code fixes. One
possible solution to this is to store, with VCS support, the IDs of the changes which fix
the bug and then query the VCS to see if those changes exist on the branch in question when
showing the bug state.
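The idea can be sketched without committing to any particular VCS by modelling history as a mapping from each change ID to its parent change IDs. Everything here is an assumption made for illustration: the data shapes, the function names, and the rule that a bug counts as closed on a branch only when every recorded fix is reachable from that branch's head. With Git, the reachability test would correspond to `git merge-base --is-ancestor`. A fix that was cherry-picked under a new ID would not be recognised by this simple check.

```python
from typing import Dict, Iterable, List, Set


def ancestors(parents: Dict[str, List[str]], head: str) -> Set[str]:
    """Return every change reachable from head, including head itself."""
    seen: Set[str] = set()
    stack = [head]
    while stack:
        change = stack.pop()
        if change in seen:
            continue
        seen.add(change)
        stack.extend(parents.get(change, []))
    return seen


def state_on_branch(fix_ids: Iterable[str], parents: Dict[str, List[str]],
                    head: str) -> str:
    """Report "closed" only if all recorded fixes are present on the branch."""
    fixes = list(fix_ids)
    if not fixes:
        return "open"  # no fix recorded yet, so the bug is open everywhere
    reachable = ancestors(parents, head)
    return "closed" if all(f in reachable for f in fixes) else "open"


if __name__ == "__main__":
    # Toy history: c1 <- c2 <- c3 on one branch, c1 <- d2 on another.
    parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "d2": ["c1"]}
    print(state_on_branch(["c2"], parents, "c3"))
    print(state_on_branch(["c2"], parents, "d2"))
```

In this way the bug record itself lives off-branch, while the state shown to a developer is still computed per branch.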
VCS Storage Limitations
It is appealing to store the bugs directly in the VCS of the project, either beside the code
or in their own branch. For a moderate number of bugs and comments this is not an issue.
However, the file layouts and formats which are the easiest for the VCSes to merge are not
especially efficient and may cause issues when scaling. There does not yet seem to be
enough experience to determine how this scaling should be dealt with or even how much of an
issue it will become. Should old issues be archived into a more efficient format? Should old
issues be deleted from the HEAD of the VCS, relying on the VCS history to retrieve them? Is
there some other option which is superior to those mentioned?