A structured approach to labeling
Everything these days has labels: emails, GitHub issues and pull requests, todo list items, saved articles, etc. More importantly, labels are often the only available tool for classification. Yet bad labels can increase clutter and confuse more than they inform.
In this post I’ll make a case for structuring labels, using GitHub issue and pull request labels as an example.
Sending the right message
One label, two meanings
It’s often overlooked that a label has two meanings: one when it’s applied, and one when it isn’t. A label adds a piece of information not only to the items that bear it, but also, by contrast, to the items that don’t. One striking example is the `help wanted` label on GitHub: using this label on any subset of items conveys (deliberately or not) that help is not wanted on the others.
Take the time to consider the implicit classification of unlabeled items. When `help wanted` exists as a label, the default situation is implicitly that help isn’t wanted. Rename this label to `help not needed` and you have just reversed that perception. Such details can matter when aiming to build a welcoming project.
The label soup
The default set of labels on GitHub is the following: `bug`, `duplicate`, `enhancement`, `help wanted`, `invalid`, `question`, `wontfix`.
The problem with a flat collection of labels is that there’s no obvious purpose and no explicit relationship between them. For example, which labels are mutually exclusive? It might be generally accepted that an issue is either a `bug` or an `enhancement`, but what about `question`? Can an issue be both a `bug` and a `question`? Both an `enhancement` and a `question`? What about an `invalid` `question`?
A flat collection of labels is a heterogeneous pool of values to pick from: like anything without a clear intent, it becomes subject to interpretation. Luckily, we have a nice tool to give meaning to values: types!
A structured approach to labeling
Thinking with types
If you were writing code, how would you define a type to hold the metadata you care about? Taking an imaginary set of labels as an example, this is what the typical label organization would look like when expressed as code:
type Label int

const (
	API Label = iota
	Bug
	CodeReview
	DesignReview
	Networking
	Question
	Storage
)

type Metadata []Label
For those unfamiliar with Go, this simply defines an enumerated `Label` type, and a `Metadata` type that is a collection of `Label` values.
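To make the limitation concrete, here is a runnable version of the flat model. It only demonstrates that the type has no opinion about which combinations make sense: any rule such as “a bug is not a question” would have to live outside of it.

```go
package main

import "fmt"

// Label and Metadata are copied from the flat model above.
type Label int

const (
	API Label = iota
	Bug
	CodeReview
	DesignReview
	Networking
	Question
	Storage
)

type Metadata []Label

func main() {
	// The compiler accepts any combination: this item is simultaneously a bug
	// and a question, and in design review and code review at the same time.
	m := Metadata{Bug, Question, DesignReview, CodeReview}
	fmt.Println(len(m)) // 4
}
```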
That’s the “label soup” described above, but we can be more expressive about the intent and constraints:
type Area int

const (
	API Area = iota
	Networking
	Storage
)

type Kind int

const (
	Bug Kind = iota
	Feature
	Question
)

type Status int

const (
	DesignReview Status = iota
	CodeReview
)

type Metadata struct {
	Areas  []Area
	Kind   Kind
	Status Status
}
The `Metadata` type properly captures that a given item can have multiple areas, a single kind, and a single status. Similarly, it’s now clear that an item is either a `Bug`, a `Feature`, or a `Question`.
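One subtlety when choosing the Go representation is the zero value. In the model above, `Bug` is the first `iota` constant, so a `Metadata` whose kind was never set reads as a `Bug`, which is the same implicit-meaning problem as with unlabeled items. Below is a sketch of one way around it, assuming a hypothetical `KindUnset` constant reserved for the zero value. It also shows that the scalar `Kind` field holds a single value while `Areas` accepts several.

```go
package main

import "fmt"

type Area int

const (
	API Area = iota
	Networking
	Storage
)

// Kind mirrors the type above with one hypothetical change: the zero value is
// reserved for "no kind assigned yet", so an untriaged item does not silently
// read as a Bug.
type Kind int

const (
	KindUnset Kind = iota // hypothetical; not part of the model above
	Bug
	Feature
	Question
)

type Metadata struct {
	Areas []Area
	Kind  Kind
}

func main() {
	var m Metadata // nothing labeled yet
	fmt.Println(m.Kind == KindUnset) // true

	// A single Kind field holds one value at a time: assigning Question
	// replaces Bug instead of adding to it.
	m.Kind = Bug
	m.Kind = Question
	fmt.Println(m.Kind == Question) // true

	// Areas, on the other hand, is a slice and accepts several values.
	m.Areas = append(m.Areas, API, Networking)
	fmt.Println(len(m.Areas)) // 2
}
```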
Expressing as labels
Now armed with the model we want, we need a way to represent it with labels. Labels are typically just strings, so we need some kind of micro-format: the convention used on the Docker project, which I also use throughout my tools, is a straightforward `field/value` syntax.
Area | Kind | Status |
---|---|---|
areas/api | kind/bug | status/design-review |
areas/networking | kind/feature | status/code-review |
areas/storage | kind/question | |
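Any consumer of these labels (a bot, a dashboard, a script) first has to split each string back into its field and value. Here is a minimal Go sketch, assuming the first `/` separates the two; `parseLabel` is a hypothetical helper, not part of any GitHub client library. Labels that don’t follow the micro-format, like `help wanted`, are reported as unstructured rather than rejected.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLabel splits a structured label such as "kind/bug" into its field
// ("kind") and value ("bug"). It reports ok == false for labels without a
// slash, or with an empty field or value.
func parseLabel(label string) (field, value string, ok bool) {
	field, value, ok = strings.Cut(label, "/")
	if !ok || field == "" || value == "" {
		return "", "", false
	}
	return field, value, true
}

func main() {
	for _, l := range []string{"areas/networking", "kind/bug", "status/code-review", "help wanted"} {
		field, value, ok := parseLabel(l)
		fmt.Printf("%q -> field=%q value=%q structured=%v\n", l, field, value, ok)
	}
}
```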
This gives us quite a different approach to labeling: setting a label is assigning a value to a named field of a metadata type. The intended structure:
Metadata{
	Areas:  []Area{API, Networking},
	Kind:   Bug,
	Status: CodeReview,
}
… translates to the following set of labels: `areas/api`, `areas/networking`, `kind/bug`, `status/code-review`. Easy enough!
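The translation is mechanical enough to write down. The sketch below reuses the types from the previous section and adds a hypothetical `Labels` method; the lookup tables mapping each constant to its label spelling are assumptions taken from the table above, not something GitHub provides.

```go
package main

import "fmt"

type Area int

const (
	API Area = iota
	Networking
	Storage
)

type Kind int

const (
	Bug Kind = iota
	Feature
	Question
)

type Status int

const (
	DesignReview Status = iota
	CodeReview
)

type Metadata struct {
	Areas  []Area
	Kind   Kind
	Status Status
}

// Hypothetical lookup tables: the value part of each label, as spelled in
// the table above.
var (
	areaNames   = map[Area]string{API: "api", Networking: "networking", Storage: "storage"}
	kindNames   = map[Kind]string{Bug: "bug", Feature: "feature", Question: "question"}
	statusNames = map[Status]string{DesignReview: "design-review", CodeReview: "code-review"}
)

// Labels flattens m into "field/value" strings: one per area, then the
// single kind and the single status.
func (m Metadata) Labels() []string {
	labels := make([]string, 0, len(m.Areas)+2)
	for _, a := range m.Areas {
		labels = append(labels, "areas/"+areaNames[a])
	}
	return append(labels, "kind/"+kindNames[m.Kind], "status/"+statusNames[m.Status])
}

func main() {
	m := Metadata{
		Areas:  []Area{API, Networking},
		Kind:   Bug,
		Status: CodeReview,
	}
	fmt.Println(m.Labels()) // [areas/api areas/networking kind/bug status/code-review]
}
```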
Of course, none of the services I know of technically enforce that an item can have multiple `areas/` labels but only a single `kind/`. The best we can do is rely on the explicit purpose each label now carries, and on the pluralized `areas/` field name to signal a list rather than a scalar value. On the Docker project, we took the extra step of documenting the labels.
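Since the service won’t enforce the model, a bot or CI check could do it instead. A minimal sketch, assuming the only scalar fields are `kind` and `status` (everything else, such as `areas`, may repeat); `validate` is a hypothetical helper that works on plain label strings.

```go
package main

import (
	"fmt"
	"strings"
)

// scalarFields lists the fields assumed to allow at most one label per item.
// Any other field (such as "areas") may appear several times.
var scalarFields = []string{"kind", "status"}

// validate reports every scalar field that carries more than one value.
// Labels that don't use the "field/value" format are ignored.
func validate(labels []string) []string {
	values := map[string][]string{}
	for _, l := range labels {
		if field, value, ok := strings.Cut(l, "/"); ok {
			values[field] = append(values[field], value)
		}
	}
	var problems []string
	for _, field := range scalarFields {
		if vs := values[field]; len(vs) > 1 {
			problems = append(problems, fmt.Sprintf("%s/ has %d values: %v", field, len(vs), vs))
		}
	}
	return problems
}

func main() {
	fmt.Println(validate([]string{"areas/api", "areas/networking", "kind/bug", "status/code-review"}))
	// []
	fmt.Println(validate([]string{"kind/bug", "kind/question", "help wanted"}))
	// [kind/ has 2 values: [bug question]]
}
```

Iterating over the `scalarFields` slice rather than a map keeps the reported problems in a stable order.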
As a bonus, structured labels make it way easier to visualize data according to specific attributes: for example, breaking down GitHub issues by `areas`, `kind`, and `version`.
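Producing such a breakdown mostly comes down to grouping labels by their field prefix. A minimal sketch with made-up sample data, where each item is represented only by its label strings; `breakdown` is a hypothetical helper. An item with two areas is counted once under each.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// breakdown counts how many items carry each value of the given field.
// An item with several values for the same field (e.g. two areas) is
// counted once per value.
func breakdown(items [][]string, field string) map[string]int {
	counts := map[string]int{}
	prefix := field + "/"
	for _, labels := range items {
		for _, l := range labels {
			if strings.HasPrefix(l, prefix) {
				counts[strings.TrimPrefix(l, prefix)]++
			}
		}
	}
	return counts
}

func main() {
	// Made-up sample data, for illustration only.
	items := [][]string{
		{"areas/api", "kind/bug"},
		{"areas/networking", "areas/storage", "kind/feature"},
		{"areas/networking", "kind/bug", "help wanted"},
	}
	counts := breakdown(items, "areas")

	// Print in a stable order, since Go map iteration order is randomized.
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %d\n", k, counts[k])
	}
	// api: 1
	// networking: 2
	// storage: 1
}
```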
Summing up
Whether on GitHub issues, emails, todo list tasks, or unread articles, labels are often the only tool provided for classification, and getting them right is key to managing large sets of items. Take the time to consider their explicit and implicit meanings, and how much is left to interpretation. A structured approach to labeling can truly help here.
As the implementer of a service that supports labels, choose the default set wisely. And if your service targets a technical audience, I think it’s worth exploring how a labeling system could embrace a structured approach and let users specify a model in order to enforce constraints.