Can snowball sampling estimate human trafficking?

The news is full of widely (and wildly) varying estimates of the scale and scope of human trafficking. This is partly because vulnerable individuals are often hidden and are therefore hard to count. You can’t simply do a telephone survey of bonded workers or sex trafficking victims who are held against their will. Given these challenges, there has been a proliferation of techniques for estimating the prevalence of human trafficking, though so far none has clearly proven to be superior to others. In pursuit of data, academics and NGOs such as Walk Free often aggregate estimates from news reports, government administrative statistics, and public perception surveys (including asking about missing persons), among other sources. But they rarely pound the pavement to come up with their own broadly comparable figures.

As a potential fix, some scholars and government agencies seeking to investigate hidden groups have increasingly favored the sampling technique known as ‘snowball sampling’ or, in its more formalized version, ‘Respondent-Driven Sampling’ (RDS). In RDS, known trafficking victims are contacted to identify new victims, who are then asked to identify yet more victims, and so on, causing the number of data points to ‘snowball’. Variants of this technique exist under names such as key informant sampling, targeted sampling, and chain-referral sampling. In this article, we discuss why RDS is popular, how it works, and its limitations. We then assess the prospects for effectively adapting this technique to human trafficking. Unfortunately, those prospects are not good.

The problem

Researchers from fields as diverse as criminology, economics, political economy and investigative journalism need methods to study clandestine populations and activities, as noted in a review by Peter Andreas. Traditional sampling methods are often inadequate for such groups because “the size and boundaries of the population are unknown,” as Douglas Heckathorn notes, and because of ethical and privacy concerns.

RDS has been used to study vulnerable and hard-to-reach populations as diverse as HIV patients, drug users, street-based individuals and sex workers (see Salganik and Heckathorn 2004). This sampling method taps the social networks of an initial ‘convenience sample’ to draw additional subjects into the study. This new, larger sample is then used to make inferences about the characteristics of a larger population. Much like public opinion polling, the underlying assumption is that this sample population is reasonably representative of the overall target population. Unlike public opinion polling, however, RDS depends on being able to find good samples beyond the original sample.
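To make the referral mechanism concrete, here is a minimal sketch in Python of how a chain-referral sample grows from its seeds. It is an illustration under stated assumptions, not the procedure of any particular study: the function name `referral_sample`, the `coupons` limit on recruits per respondent, and the toy contact network are all hypothetical.

```python
import random
from collections import deque


def referral_sample(network, seeds, coupons=2, target_size=50, rng=None):
    """Grow a sample wave by wave: each respondent refers up to `coupons` contacts.

    `network` maps each person to a list of their contacts (an assumed input;
    in a real hidden population this network is unknown to the researcher).
    """
    rng = rng or random.Random(0)
    sampled = list(seeds)
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sampled) < target_size:
        recruiter = queue.popleft()
        # Only contacts who have not already taken part can be recruited.
        candidates = [c for c in network[recruiter] if c not in seen]
        rng.shuffle(candidates)
        for recruit in candidates[:coupons]:
            if len(sampled) >= target_size:
                break
            seen.add(recruit)
            sampled.append(recruit)
            queue.append(recruit)
    return sampled


# Hypothetical toy network: 20 people in a ring, each knowing two neighbours,
# plus one person (id 99) who has no contacts at all.
network = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
network[99] = []

sample = referral_sample(network, seeds=[0], target_size=100)
```

In this toy setup the sample stops growing once every reachable person has been recruited, and person 99 is never found, however large the target. The sketch shows only the recruitment step; it does not attempt the weighting used to turn such a sample into population estimates.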

RDS would seem a natural fit for studying human trafficking, since human traffickers and trafficking victims are difficult to track, measure and study. It is no wonder, then, that several human trafficking researchers and organizations have gravitated toward RDS. Indeed, the U.S. National Institute of Justice (NIJ) even latched on to it in its recent request for proposals for studies of trafficking prevalence within the U.S.

The limitations of RDS

The problem is that with an RDS hammer, everything looks like a (hidden population) nail. It turns out that productively applying RDS to human trafficking is wishful thinking for a number of reasons.

First, even existing RDS studies on ‘traditional’ hidden populations have limitations. While it may be possible to obtain accurate initial convenience samples for some contexts or topics of study, this is not always the case. Also, members of the initial sample group may not have the capacity to identify additional survey subjects in the field. It is therefore no surprise that Sharad Goel and Matthew J. Salganik’s 2010 study of the methodological validity of RDS finds that the technique is “substantially less accurate than generally acknowledged and that reported RDS confidence intervals are misleadingly narrow.” This warning suggests that RDS applications may be problematic even in public health surveillance, the domain where the technique was originally developed.

Second, there are several challenges to applying RDS to human trafficking. A key constraint is that human trafficking victims are not like the populations to which RDS has previously been applied. Anette Brunovskis and Rebecca Surtees point out that since trafficking involves control over victims, it can be hard to identify victims, and those who are identified may not be free to identify others. Simic et al. (2006) attempted RDS in the context of human trafficking and labor abuses by sampling sex workers in Serbia, Montenegro, and Russia. This and other studies describe the various practical challenges to implementing RDS among trafficking victims, including:

• a necessary focus on more socially connected respondents, which may cause relatively ‘less’ exploited victims to be oversampled

• physical and geographical limitations

• lack of diversity of initial ‘seed’ sampling groups

• lack of overlapping social networks

• the need for long-running surveys to obtain sufficient samples
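The first challenge in the list, oversampling of socially connected respondents, can be illustrated with a small calculation. The sketch below rests on an idealized assumption that is ours, not the cited studies': that referrals behave like a random walk on the contact network, in which case a person's long-run chance of being referred is proportional to their number of contacts. The function name `referral_share_by_group` and the toy network are hypothetical.

```python
def referral_share_by_group(network, group_of):
    """Long-run share of referrals reaching each group under a random-walk model.

    Under that idealized model, a person's chance of being referred is
    proportional to how many contacts they have (their degree).
    """
    total_degree = sum(len(contacts) for contacts in network.values())
    degree_by_group = {}
    for person, contacts in network.items():
        group = group_of[person]
        degree_by_group[group] = degree_by_group.get(group, 0) + len(contacts)
    return {group: deg / total_degree for group, deg in degree_by_group.items()}


# Hypothetical toy network: A-D all know one another, while E-H each know
# only a single person. Each group is half of the population.
network = {
    "A": ["B", "C", "D", "E"],
    "B": ["A", "C", "D", "F"],
    "C": ["A", "B", "D", "G"],
    "D": ["A", "B", "C", "H"],
    "E": ["A"],
    "F": ["B"],
    "G": ["C"],
    "H": ["D"],
}
group_of = {p: "connected" for p in "ABCD"}
group_of.update({p: "isolated" for p in "EFGH"})

shares = referral_share_by_group(network, group_of)
```

In this toy network the four well-connected people attract 80 percent of referrals and the four isolated people 20 percent, even though each group makes up half of the population. Statistical corrections for this imbalance depend on knowing how many contacts each respondent has, which is precisely the kind of information that is hard to obtain from people under someone else's control.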

There are also ethical concerns: RDS could expose trafficking victims and put at risk the very people researchers are hoping to protect.

Third, RDS is resource intensive. While it could be applied to studying small areas, most researchers are interested in studying human trafficking over vast geographies, such as entire cities, states and countries. Given the high costs of conducting RDS, scaling up sufficiently to estimate trafficking prevalence across vast areas is unlikely to be feasible.

The bottom line

In sum, despite the enthusiasm for RDS, the literature indicates it is unlikely to be efficient, practical or cost-effective for large-scale, accurate estimates of human trafficking prevalence. The RDS approach could be useful in very particular human trafficking situations, but it should be used sparingly, if at all. Instead, researchers would do well to consider less invasive and less costly statistical techniques that may or may not sacrifice accuracy compared to RDS, depending on the geographic area of study. Another option is safer, more microscopic ethnographic techniques, even if they may sacrifice generalizability.

The methodological requirements of RDS simply don’t fit well with the realities of human trafficking. We should certainly continue to study human trafficking and continue working to estimate its prevalence, but it will have to be with methods and approaches that are better suited to the unique constraints imposed by the issue. Moreover, given the limitations of RDS and all other current methods for sampling hidden populations, it is crucial that we admit how little we really know about the prevalence of trafficking in the world today.

Ashley Greve

Oliver Kaplan
