Quantitative data in human rights: what do the numbers really mean?

Everybody loves rankings. We love comparative information that stacks up organizations or governments relative to others. But are human rights numbers accurate? Can we actually count things like “victims of torture”?

The simple answer is no. The number of violations of rights is fundamentally unobservable. We will never know, for example, how many times during a year the US government violated the Convention against Torture. Recent discussions of this issue include the number of police killings in the United States, the number of people killed in the Syrian civil war, and the amount of sexual violence in conflicts.

For decades now, political scientists, sociologists, economists and other researchers have been creating and studying datasets to better understand what makes governments increase their respect for human rights. Do international treaties make a difference? Are international tribunals helpful? Do written constitutions or democratic elections improve respect for rights?

But if we cannot count abuses, then what justification exists for this research? My current work seeks to strengthen what is often called “quantitative human rights research” by first criticizing conventional practice, and then offering a path forward.

Quantitative researchers have been sweeping a pervasive “dirty little secret” under the rug for decades: the written reports we use to generate cross-national datasets of governments' (lack of) respect for human rights are a record of alleged violations, not a census of actual violations. They represent a small fraction of governments' violations of rights. Yet using these data as if they were actual measures of performance has become standard operating procedure.

To appreciate the scope of this problem, consider trying to collect a census of each and every violation by a certain government of the International Covenant on Civil and Political Rights (ICCPR). How would you go about it? Perhaps you’ve immediately concluded that it is not humanly feasible to do this, and you’d be right. As a consequence, all records of violations are what social scientists refer to as a “biased undercount” of the true violations: an undercount because most violations are never recorded, and biased because the ones that are recorded are not a random sample of the whole.
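A small simulation can make the idea of a biased undercount concrete. The sketch below is purely illustrative: the two governments, the violation counts, and the reporting probabilities are invented numbers, not estimates from any dataset. It assumes each true violation independently becomes a public record with some probability, and that this probability differs across governments.

```python
import random


def count_allegations(true_violations, report_prob, rng):
    """Return how many of the true violations become public allegations.

    Each violation is independently recorded with probability report_prob.
    """
    return sum(1 for _ in range(true_violations) if rng.random() < report_prob)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical governments: identical true behaviour, different visibility.
governments = {
    "A": {"true_violations": 1000, "report_prob": 0.30},
    "B": {"true_violations": 1000, "report_prob": 0.05},
}

observed = {
    name: count_allegations(g["true_violations"], g["report_prob"], rng)
    for name, g in governments.items()
}

for name, count in observed.items():
    print(f"Government {name}: {count} allegations out of 1000 true violations")
```

Both hypothetical governments commit the same number of violations, yet a ranking built on the recorded counts would place A far below B. The gap reflects only how visible the abuse is, which is the sense in which the record is biased as well as incomplete.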

In 2007, Courtenay Conrad and I conceived the Ill Treatment & Torture data collection project (ITT). It had two main objectives: 1) Produce data about the state agencies, victims, and types of alleged abuse; and 2) Serve as a demonstration project that addresses the problem of treating allegations as census data.

The ITT project coded all public Amnesty International (AI) documents from 1995-2005 and produced a number of datasets that codify various aspects of AI's public allegations of violations of the Convention Against Torture (CAT). With the exception that we coded considerable detail about specific allegations, our approach was essentially equivalent to existing practices that use advocacy reports as an information source.

However, rather than treat the data as a measure of governments’ (lack of) respect for the CAT, we treat them as a measure of exactly what they record: AI's allegations of CAT violations. Because previous efforts use advocacy allegations as a measure of performance, fully aware that they are not that, our measure has significantly higher validity.

[Photo: In Virginia, United States, activists march outside a CIA entrance in protest of the government's use of torture. Data sets such as those used in Amnesty's reports on torture, despite their presentation, usually represent only a fraction of actual violations. Credit: Justin Norman/Flickr (Some rights reserved)]

Of course, adopting our conceptual shift creates a problem: measuring advocacy allegations is not as interesting or useful as measuring government violations. Our solution is to statistically model not only government (lack of) respect for rights, which researchers have done for decades, but also the process by which advocacy groups identify violations and make allegations public.

If we model both the process via which a group like AI produces public allegations and government violations of the CAT in what is known as a mixture model, we can get estimates for testing hypotheses about government (lack of) respect for the CAT, despite having only data on public allegations. At first blush this may sound a bit magical—how can one estimate the impact of an independent variable upon another variable for which we have no observable data? We provide a brief sketch of the intuition below, and further detail can be found in our working paper.

While it is impossible to observe the actual number of violations, we are able to observe public allegations made by organizations like AI. Further, and critically, we know quite a bit about how groups like AI operate. First, they would rather not report an allegation than have to issue a correction later. As such, we can assume that their public allegations contain very few “false” allegations. Second, we know that they are advocacy organizations that take a variety of organizational factors into account, including whether their supporters and volunteers are likely to be responsive to a call for action, whether news media and other governments are likely to echo the allegation and press for change, and whether the government they are calling out is likely to respond to pressure. We can use this knowledge of how allegations are produced to collect data on how strongly each of those pressures is likely to be in play, and then estimate the extent to which public allegations undercount true violations.
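One way to write this intuition down is as a thinned count model. The notation below is an illustrative assumption and not necessarily the specification used in the working paper: $V_i$ is the unobserved number of violations by government $i$, $A_i$ is the observed number of public allegations, $x_i$ collects factors thought to drive violations, and $z_i$ collects factors thought to drive reporting.

```latex
\begin{align*}
V_i &\sim \mathrm{Poisson}(\lambda_i), & \log \lambda_i &= x_i^\top \beta, \\
A_i \mid V_i &\sim \mathrm{Binomial}(V_i,\, p_i), & \operatorname{logit}(p_i) &= z_i^\top \gamma, \\
\Rightarrow \quad A_i &\sim \mathrm{Poisson}(\lambda_i\, p_i), & \log \mathbb{E}[A_i] &= x_i^\top \beta + \log p_i .
\end{align*}
```

The binomial step encodes the assumption that allegations are almost never false, so $A_i \le V_i$. Under this sketch the expected share of violations that go unreported is $1 - p_i$, and $\beta$ describes how the factors in $x_i$ relate to violations themselves rather than to allegations. Separating $\beta$ from $\gamma$ relies on the reporting factors $z_i$ differing from the violation factors $x_i$, which is why knowledge of how advocacy groups operate matters.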

More concretely, in our research we have built statistical models that can produce estimates of the percentage of violations that occurred but went unreported by organizations like AI. As with all statistical estimates, the result is a range, not a precise number, and it is an estimate, not an actual count. For some purposes an estimate is not good enough. But for researchers who want to study the questions described above, estimates are perfectly serviceable, and in our working paper we provide one such example.

The Human Rights Data Analysis Group has pioneered the use of statistical estimates in human rights trials and other advocacy settings, and while much of this work might seem esoteric, there are two key takeaways. First, because we cannot observe, and thus accurately count, all of the violations of rights, we must think carefully about advocacy groups that make claims about changes in the precise levels of violations, or otherwise make comparative claims about changes. Supporting such claims requires considerable statistical expertise that is not yet widely distributed. Second, while we cannot observe and count all violations of rights, quantitative modelers are developing tools that permit them to properly use data from human rights advocacy groups.

Everyone might love numbers and rankings, but how we use these numbers is paramount. And in human rights work, we must acknowledge what we are actually counting and the limitations of that data, rather than continue to knowingly misrepresent the facts.

- - -

The article is based on a podcast interview on "How do we count victims of torture?", produced by the Rights Track.