A structured approach to labeling

Everything these days has labels: emails, GitHub issues and pull requests, todo list items, saved articles, etc. More importantly, labels are often the only available tool for classification. Yet bad labels can increase clutter and confuse more than they inform.

In this post I’ll make a case for structuring labels, using GitHub issue and pull request labels as an example.

Sending the right message

One label, two meanings

It’s often overlooked that a label has two meanings: one when it’s applied, and one when it isn’t. A label adds a piece of information not only to the items that bear it, but also, by contrast, to the items that don’t. One striking example is the help wanted label on GitHub: using this label on any subset of items conveys (deliberately or not) that help is not desired on the others.

Take the time to consider the implicit classification of unlabeled items. When help wanted exists as a label, the default situation is implicitly that help isn’t wanted. Rename this label to help not needed and you reverse that perception. Such details can matter when aiming to build a welcoming project.

The label soup

The default set of labels on GitHub is the following: bug, duplicate, enhancement, help wanted, invalid, question, wontfix.

The problem with a flat collection of labels is that there’s no obvious purpose, and no explicit relationship between them. For example, which labels are mutually exclusive? It might be generally accepted that an issue is either a bug or an enhancement, but what about question? Can I have a bug question and an enhancement question? What about an invalid question?

A flat collection of labels is a heterogeneous pool of values to pick from: like anything that doesn’t have clear intent, it becomes subject to interpretation. Luckily, we have a nice tool to give meaning to values: types!

A structured approach to labeling

Thinking with types

If you were writing code, how would you define a type to hold the metadata you care about? Taking an imaginary set of labels as an example, this is what the typical label organization would look like when expressed as code:

type Label int

const (
    API Label = iota
    Bug
    CodeReview
    DesignReview
    Networking
    Question
    Storage
)

type Metadata []Label

For those unfamiliar with Go, this simply defines a Label enumeration, and the Metadata type as a collection of Label values.

That’s the “label soup” described above, but we can be more expressive about the intent and constraints:

type Area int

const (
	API Area = iota
	Networking
	Storage
)

type Kind int

const (
	Bug Kind = iota
	Feature
	Question
)

type Status int

const (
	DesignReview Status = iota
	CodeReview
)

type Metadata struct {
	Areas  []Area
	Kind   Kind
	Status Status
}

The Metadata type properly captures that a given item can have multiple areas, a single kind, and a single status. Similarly, it’s now clear that an item is either a Bug, a Feature, or a Question.

Expressing as labels

Now armed with the model we want, we need a way to represent it with labels. Labels are typically just strings, so we need some kind of micro-format: the way it’s set up on the Docker project and what I use throughout my tools is a straightforward field/value syntax.

Area              Kind           Status
areas/api         kind/bug       status/design-review
areas/networking  kind/feature   status/code-review
areas/storage     kind/question

This gives us quite a different approach to labeling: setting a label is assigning a value to a named field of a metadata type. The intended structure:

Metadata{
	Areas:  []Area{API, Networking},
	Kind:   Bug,
	Status: CodeReview,
}

… translates to the following set of labels: areas/api, areas/networking, kind/bug, status/code-review. Easy enough!
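The translation from the typed model to label strings can be sketched in a few lines. This is only an illustration of the field/value convention, not how any particular tool does it: the name tables (areaNames, kindNames, statusNames) and the Labels method are hypothetical names introduced for this example.

```go
package main

import "fmt"

type Area int

const (
	API Area = iota
	Networking
	Storage
)

type Kind int

const (
	Bug Kind = iota
	Feature
	Question
)

type Status int

const (
	DesignReview Status = iota
	CodeReview
)

type Metadata struct {
	Areas  []Area
	Kind   Kind
	Status Status
}

// Hypothetical name tables, indexed by the constants above.
var (
	areaNames   = []string{"api", "networking", "storage"}
	kindNames   = []string{"bug", "feature", "question"}
	statusNames = []string{"design-review", "code-review"}
)

// Labels renders the metadata as field/value label strings:
// one areas/ label per area, then a single kind/ and a single status/.
func (m Metadata) Labels() []string {
	var labels []string
	for _, a := range m.Areas {
		labels = append(labels, "areas/"+areaNames[a])
	}
	labels = append(labels, "kind/"+kindNames[m.Kind])
	labels = append(labels, "status/"+statusNames[m.Status])
	return labels
}

func main() {
	m := Metadata{
		Areas:  []Area{API, Networking},
		Kind:   Bug,
		Status: CodeReview,
	}
	fmt.Println(m.Labels())
	// prints: [areas/api areas/networking kind/bug status/code-review]
}
```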

Of course, none of the services I know of would technically enforce that an item can have multiple areas/ labels but a single kind/ label. The best we can do is rely on the explicit purpose each label now carries, and on the pluralization of the areas/ field name to signify a list rather than a scalar value. On the Docker project, we took the extra step of documenting the labels.
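The constraint can still be checked outside the service, for instance from a bot or a script that reads an item’s labels. Here is a minimal sketch of such a check, assuming the three fields used in this post; Parsed and Parse are hypothetical names, and the label strings are handled as plain text rather than fetched from any real API.

```go
package main

import (
	"fmt"
	"strings"
)

// Parsed holds label values grouped by field, as plain strings.
type Parsed struct {
	Areas  []string
	Kind   string
	Status string
}

// Parse splits each "field/value" label and rejects sets that assign a
// scalar field (kind/ or status/) more than once. areas/ may repeat.
func Parse(labels []string) (Parsed, error) {
	var p Parsed
	for _, l := range labels {
		parts := strings.SplitN(l, "/", 2)
		if len(parts) != 2 || parts[1] == "" {
			return Parsed{}, fmt.Errorf("label %q is not of the form field/value", l)
		}
		field, value := parts[0], parts[1]
		switch field {
		case "areas":
			p.Areas = append(p.Areas, value)
		case "kind":
			if p.Kind != "" {
				return Parsed{}, fmt.Errorf("kind/ set twice: %q and %q", p.Kind, value)
			}
			p.Kind = value
		case "status":
			if p.Status != "" {
				return Parsed{}, fmt.Errorf("status/ set twice: %q and %q", p.Status, value)
			}
			p.Status = value
		default:
			return Parsed{}, fmt.Errorf("unknown field %q", field)
		}
	}
	return p, nil
}

func main() {
	_, err := Parse([]string{"areas/api", "kind/bug", "kind/question"})
	fmt.Println(err)
	// prints: kind/ set twice: "bug" and "question"
}
```

A check like this only reports violations after the fact; it does not prevent someone from applying a second kind/ label in the first place.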

As a bonus: structured labels make it way easier to visualize data according to specific attributes. For example, breaking down GitHub issues by area, kind, and version:

Issues breakdown.
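Because the field name is part of each label, a breakdown along one field comes down to a prefix match. The sketch below counts items per value of a given field; Breakdown is a hypothetical helper, and the issues are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Breakdown counts, for the given field, how many labels carry each value.
// Each item is represented by its list of labels.
func Breakdown(items [][]string, field string) map[string]int {
	counts := make(map[string]int)
	prefix := field + "/"
	for _, labels := range items {
		for _, l := range labels {
			if strings.HasPrefix(l, prefix) {
				counts[strings.TrimPrefix(l, prefix)]++
			}
		}
	}
	return counts
}

func main() {
	// Made-up issues, for illustration only.
	issues := [][]string{
		{"areas/api", "kind/bug"},
		{"areas/api", "areas/storage", "kind/feature"},
		{"areas/networking", "kind/bug"},
	}

	counts := Breakdown(issues, "kind")

	// Sort the values so the output order is stable.
	values := make([]string, 0, len(counts))
	for v := range counts {
		values = append(values, v)
	}
	sort.Strings(values)

	for _, v := range values {
		fmt.Printf("%s: %d\n", v, counts[v])
	}
	// prints:
	// bug: 2
	// feature: 1
}
```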

Summing up

Whether on GitHub issues, emails, todo list tasks, or unread articles, labels are often the only tool provided for classification, and getting them right is key to being able to manage large sets of items. Take the time to consider their explicit and implicit meanings, and how much is subject to interpretation. A structured approach to labeling can truly help here.

As the implementer of a service that supports labels, define the default set wisely. And if it happens that your service is targeted at a technical audience, I think it’s worth exploring how a labeling system could embrace a structured approach and offer the ability to specify a model in order to enforce constraints.

Arnaud Porterie

Arnaud Porterie is currently Deputy CTO at Veepee, and was previously a core maintainer of the Docker open-source project and engineering leader for the Engine team inside Docker, Inc.
