Violence data: what practitioners need to know

The demand for numerical data on human rights (and human wrongs) has never been higher. Today human rights workers (i.e., aid workers, human rights activists, policy-makers and other practitioners) use numbers to show patterns of abuses, argue for funding, set priorities and figure out what works in service provision and advocacy. The United Nations Security Council has specifically mandated the collection of data on sexual violence during wartime. Advocacy organizations have increasingly deployed numerical data to make the case for attention and funding to particular emergencies. However, while data-driven approaches to human rights are laudable in many contexts, not all data are created equal. Some data illuminate; others just mislead.

The biggest problem with violence data is their uncertain relationship to true patterns of violence. For example, the reported murder rate declined pretty dramatically in my West Philadelphia neighborhood over the last ten years. This might mean that the true murder rate has fallen. Or, it might mean that people have stopped reporting crimes to the police, or that authorities are cooking the books. Without detailed knowledge about how crime reporting in my neighborhood has changed, it’s impossible to say which of these scenarios is accurate (or whether it’s a mix of all three, or something else entirely). But we often use numerical data in situations where we don’t have detailed knowledge of the context. What can be done?

Human rights workers can start by examining what kind of data they have access to. There are, roughly speaking, three kinds of data: statistical inferences, expert guesses and lists. Statistical inferences come from either systematically sampled survey data or multiple systems estimation (a technique that compares overlapping lists of victims to estimate how many cases were never recorded on any list). With a well-designed survey, the relationship between the population (for example, all households in my neighborhood) and the sample (for example, 100 randomly selected households in my neighborhood in 2006; another 100 randomly selected households in 2016) is known. If we assume that households are equally available to answer a survey, and are likely to give correct answers about whether someone in the household was murdered, then we can make a rigorous statistical inference (or estimate) of the true murder rate in the neighborhood. Of course, these are big assumptions, ones that often aren’t met in violent contexts.
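To make the survey logic concrete, here is a minimal sketch in Python of how a rate and its uncertainty might be estimated from a simple random sample of households. The counts are invented for illustration and are not from any real survey, and the function name estimate_rate is hypothetical. The sketch uses a simple normal-approximation interval, which is known to behave poorly for rare events, and it assumes every household answers and answers truthfully.

```python
import math


def estimate_rate(sample_size, households_reporting, z=1.96):
    """Estimate the share of households reporting a murder.

    Returns the point estimate plus a normal-approximation (Wald)
    confidence interval; z=1.96 corresponds to roughly 95% coverage.
    Assumes a simple random sample with full, truthful response.
    """
    p = households_reporting / sample_size
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    low = max(0.0, p - z * standard_error)
    high = min(1.0, p + z * standard_error)
    return p, low, high


# Hypothetical counts, invented purely for illustration.
p_2006, low_2006, high_2006 = estimate_rate(100, 8)
p_2016, low_2016, high_2016 = estimate_rate(100, 4)

print(f"2006: {p_2006:.2f} (interval {low_2006:.3f} to {high_2006:.3f})")
print(f"2016: {p_2016:.2f} (interval {low_2016:.3f} to {high_2016:.3f})")
```

With these invented counts the two intervals overlap, so a sample of 100 households per year could not by itself show that the rate fell. That is before accounting for any violation of the assumptions described above.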

[Image: Flickr/ILRI (Some rights reserved). Caption: In violent contexts, the big assumptions made by household survey data are often not met.]

The second type of data that human rights workers may have access to is an “expert guess”, which is my term for an estimate that involves extrapolation from some existing data to a broader population by someone who knows the situation well—but doesn’t include a rigorous statistical inference. Taking the example of Philadelphia homicides again, I could ask: how many homicides were there in the city last year? I could reason like this:

There were five known homicides in my neighborhood last year, and my neighborhood has about 2% of Philly’s population. I think the level of violence here is average for the city. Five times 50 (the inverse of 2%) equals 250, so there were about 250 killings in Philadelphia last year.

This is the type of extrapolation often used to derive estimates of deaths or other casualties in war zones, where access to many areas is limited and there is little possibility of a survey sample or multiple systems estimate. Yet such estimates rely on strong assumptions: that the violence I know about is accurately reported, that situations are similar in all the areas I’m extrapolating to, and so on.
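The arithmetic behind an expert guess, and its sensitivity to those assumptions, can be sketched in a few lines of Python. The function extrapolate and its parameters reporting_rate and relative_violence are hypothetical names introduced here for illustration. Only the figures of five homicides and a 2% population share come from the example above; the alternative values are invented to show how much the answer moves.

```python
def extrapolate(known_count, population_share,
                reporting_rate=1.0, relative_violence=1.0):
    """Scale a local count up to a wider population.

    known_count:       incidents known in the local area
    population_share:  local area's share of the total population (0 to 1)
    reporting_rate:    assumed fraction of local incidents that are known
    relative_violence: assumed local violence level relative to the
                       wider average (1.0 means "average")
    """
    estimated_local_total = known_count / reporting_rate
    return estimated_local_total / population_share / relative_violence


# The reasoning in the text: everything is reported, neighborhood is average.
baseline = extrapolate(5, 0.02)

# Invented alternatives: only 60% of local killings are known, or the
# neighborhood is twice as violent as the city average.
under_reported = extrapolate(5, 0.02, reporting_rate=0.6)
more_violent = extrapolate(5, 0.02, relative_violence=2.0)

print(round(baseline), round(under_reported), round(more_violent))
```

The same five known homicides yield citywide figures ranging from roughly 125 to over 400, depending only on assumptions that the expert usually cannot check.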

A third type of data is list data, sometimes referred to as “convenience” data to distinguish them from systematic samples. List data include any source of data derived directly from individual reports: media mentions, reports via crowd-sourcing apps, NGO casefiles, hospital records, police records, border crossing records, or any combination of these. These are the most common data that human rights practitioners see, and in many ways they are extraordinarily compelling. For example, unlike surveys and expert guesses, list data often include detailed data about the targets, timing, location and other details of violence. Unfortunately, because the relationship between lists and true patterns of violence is unknown, these data can be dangerously misleading. Returning to the homicide example above: in the US, we’re generally confident that most murders are reported to the police. But what if Philly were more like Colombia, where it appears that only 60% of murders were reported to any source between 2003 and 2010? Not only would the number of murders be drastically wrong, but the patterns reported could also be incorrect. Human rights practitioners might carefully comb the data, establishing answers about the who, what, when, where and why of violence that are dead wrong.
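A small Python sketch shows how uneven reporting can distort patterns as well as totals. All numbers here are invented: two hypothetical areas with identical true levels of violence but different assumed reporting rates, chosen so that overall coverage works out to 60%. The helper names observed_counts and shares are also hypothetical.

```python
def observed_counts(true_counts, reporting_rates):
    """Expected number of incidents that reach the list in each group."""
    return {group: true_counts[group] * reporting_rates[group]
            for group in true_counts}


def shares(counts):
    """Each group's share of the total."""
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


# Invented figures: violence is equal in both areas, reporting is not.
true_counts = {"urban": 100, "rural": 100}
reporting_rates = {"urban": 0.9, "rural": 0.3}

observed = observed_counts(true_counts, reporting_rates)
overall_coverage = sum(observed.values()) / sum(true_counts.values())

true_shares = shares(true_counts)
observed_shares = shares(observed)

print(f"overall coverage: {overall_coverage:.0%}")
print(f"true urban share: {true_shares['urban']:.0%}")
print(f"observed urban share: {observed_shares['urban']:.0%}")
```

In this invented case the list suggests that three quarters of the violence is urban, when the true split is even. Nothing in the list itself reveals the distortion; it only becomes visible with outside knowledge of who reports and who does not.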

Understanding what kind of data you have access to is obviously vital for understanding whether you should trust it. However, having a less-than-ideal type of data shouldn’t (necessarily) mean discarding or disregarding it. Instead, human rights workers should ask themselves a series of questions about how and whether data represent underlying reality. Many human rights workers are aware of the ways that political bias can alter violence statistics. However, logistical issues can cause flaws that are just as serious. Thus, human rights workers who need to interpret numerical data should always ask:

Who collected these data? Which groups of victims trust them? Might data collectors’ political or other affiliations affect the violence they report, either intentionally or unintentionally?
What victims are most and least likely to report to this source? Consider political affiliations, cultural taboos, language barriers, geography, age, gender, and networks of (mis)trust.
What types of violence are likely to be reported in this source? Are some victim populations overlooked, such as those who have experienced torture, sexual violence, or property crimes?
When were these data collected? Has the data collection organization’s mission, location, capabilities or access changed over time? Has the population suffering violence moved or changed over time? If so, what would the likely effect on the data be?
Where were these data collected? Did access issues impede data collection? Is the source in a capital city or in a difficult-to-access location? Think about everyday issues like weather, road quality, and staffing as well as more obvious access problems like heavy fighting.

Asking questions like these may not lead to concrete answers. However, human rights workers who have thought through these issues can offer clear, concrete statements about the assumptions that underlie their conclusions. In the long run, carefully considering data quality issues can lead to practical improvements in data collection and interpretation. More importantly, asking questions about data quality takes the magic out of numerical data. It reminds us that, no matter how scientific the information may initially appear, data come from specific, often very messy, processes. If the goal is an accurate understanding of patterns of violence, then no data can be taken at face value.