Human rights datasets are pointless without methodological rigour

There has been an ongoing debate on the desirability and feasibility of using human rights databases, specifically the Cingranelli-Richards (CIRI) human rights dataset, to analyze trends in human rights. Neve Gordon and Nitza Berkovitch’s broadside against the alleged bias inherent in the cross-national data used in human rights research was quickly followed by Todd Landman’s critique of this allegation. Chad Clay, one of the CIRI dataset’s principal investigators, then joined in to acknowledge some inherent weaknesses in cross-national datasets while offering a qualified defense of their use.

My contribution to this debate focuses squarely on the inherent methodological weaknesses of the CIRI dataset from the perspective of a quantitative political scientist. These weaknesses are serious enough to merit a complete rethink of the way data are coded.

At first glance, the CIRI dataset offers an impressive array of operational variables, fourteen in total. They include variables on direct human rights violations (e.g., disappearances, torture), limitations on freedoms (of the press, assembly, etc.), and the protection of rights (such as workers’ rights). The coding manual (latest version 5.20.14) provides comprehensive explanations of how data were collected and coded.