Victims of Crime Research Digest No. 3

Accessing Hard-to-Reach Populations: Respondent-Driven Sampling

By Sidikat Fashola, Research Assistant with the Research and Statistics Division of the Department of Justice Canada

According to Lauritsen and Archakova (2008), one of the primary challenges of empirical research concerning victims of crime relates to collecting data that is representative of a known population of victims. Recruiting victims of crime to participate in research studies is difficult for a number of ethical and methodological reasons. The General Social Survey on Criminal Victimization, currently undertaken by Statistics Canada every five years, has the most representative sample of victims of crime in Canada. It is limited, however, to the prevalence, types, and basic impacts of victimization and to risk factors. Researchers seeking to better understand victims’ experiences with the criminal justice system and with programs and services, or to better understand victimization itself, must look elsewhere for a more nuanced picture of victimization in Canada.

This article focuses on one particular method for recruiting hard-to-reach populations. It is important to study hard-to-reach populations for the purposes of informing the evidence-based policy development process. Having no empirical evidence about the hard-to-reach population(s) of interest may limit the effectiveness of policy. Hard-to-reach populations include groups for which no exhaustive list of the population members is available; they may be widely distributed, and there may be no specific knowledge about them. There may also be strong privacy concerns; the group may be engaged in illicit behaviour (for example, HIV-infected drug users, sex-trade workers) or may not trust the dominant culture (for example, undocumented migrant workers). In particular, this article discusses how respondent-driven sampling (RDS) was recently employed in a research project undertaken by the Research and Statistics Division of the Department of Justice Canada on the community impact of hate crimes. A description of RDS, its advantages and disadvantages, and the reasons it was chosen for this research project are provided.

Respondent-Driven Sampling

RDS uses a modified snowball sampling method to recruit research participants (Heckathorn 1997, 2002). Snowball sampling, also known as reputational sampling, relies upon personal contacts of the people interviewed to gather information about other prospective respondents (Trochim 2006). In snowball sampling, initial samples cannot be drawn at random. As a result, snowball samples may be biased because they tend to attract more co-operative participants who volunteer to be part of the study, while less co-operative subjects are excluded. Such samples may also be biased because participants may try to protect friends by not referring them or because participants only recruit friends who share the same characteristics as them. In addition, since referrals occur through network connections, network outsiders are excluded from the sample (Heckathorn 1997, 2002).

RDS is similar to snowball sampling in that respondents recruit their peers, and researchers keep track of who recruited whom into the sample, as well as the number of people each participant reports having in their social network (Heckathorn 1997, 2002). Unlike snowball sampling, RDS requires direct recruitment of peers by their peers, recruitment quotas,[1] and a system of incentives to recruit peers, where respondents are rewarded for their participation and for their referrals (Abdul-Quader et al. 2006).

Similar to a snowball sample, an RDS sample is collected with a chain referral design. The sampling process begins with the selection of a set of people from the target population who serve as “seeds.” After participating in the study, these seeds are provided with a fixed number of recruitment coupons, which they use to recruit other people in the target population with whom they have a pre-existing relationship. Each recruitment coupon has a unique numerical code. Seeds are also requested to report their “degree.”[2] After participating in the study, the new sample members are provided with the same fixed number of coupons, which they then use to recruit others. The new recruits are also asked to report their “degree.” This sampling process continues until the desired sample size is reached (Heckathorn 1997).
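The chain referral process described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the ring-shaped network, the helper names (build_ring_network, simulate_recruitment), the number of seeds and coupons, and the rule that every coupon offered to an eligible peer is redeemed are all hypothetical and are not taken from the study. It shows how each record can keep the coupon code, the recruiter, the wave, and the reported degree.

```python
import random
from collections import deque


def build_ring_network(size):
    """Hypothetical network: each person knows the two nearest people on each side."""
    return {
        person: [(person + step) % size for step in (-2, -1, 1, 2)]
        for person in range(size)
    }


def simulate_recruitment(network, seeds, coupons, target_size, rng):
    """Chain referral: every participant may recruit up to `coupons` peers."""
    sampled = set(seeds)
    # Seeds form wave 0; they have no recruiter and no coupon.
    records = [
        {"person": s, "recruiter": None, "coupon": None, "wave": 0,
         "degree": len(network[s])}
        for s in seeds
    ]
    queue = deque((s, 0) for s in seeds)
    next_coupon = 1  # each coupon carries a unique numerical code
    while queue and len(records) < target_size:
        recruiter, wave = queue.popleft()
        # Only peers who are not yet in the sample can redeem a coupon.
        peers = [p for p in network[recruiter] if p not in sampled]
        rng.shuffle(peers)
        for peer in peers[:coupons]:
            if len(records) >= target_size:
                break
            sampled.add(peer)
            records.append({"person": peer, "recruiter": recruiter,
                            "coupon": next_coupon, "wave": wave + 1,
                            "degree": len(network[peer])})
            next_coupon += 1
            queue.append((peer, wave + 1))
    return records


network = build_ring_network(200)
records = simulate_recruitment(network, seeds=[0, 50, 100, 150], coupons=3,
                               target_size=60, rng=random.Random(0))
print(len(records), "participants recruited over",
      max(r["wave"] for r in records), "waves")
```

In a real study, recruitment depends on whether participants actually pass on their coupons, so the sample would grow more slowly and less regularly than in this idealized run.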

The mathematical model upon which RDS analysis is based eliminates some of the biases typically associated with snowball sampling (Heckathorn 1997, 2002). The RDS mathematical model combines principles from Markov chain theory[3] and biased network theory[4] into a single data analysis framework. The RDS mathematical model suggests that if peer recruitment occurs through a sufficiently large number of recruitment waves, the representativeness of the population within the sample will stabilize and further recruitment waves will not change the sample’s representativeness by a significant amount. This process is called “reaching equilibrium” (Heckathorn 1997, 2002). The RDS model of the recruitment process mathematically weights the sample and by doing so creates a sample that is independent from the biases that may have been introduced by the non-random choice of “seeds” from which recruitment began (Heckathorn 1997, 2002). Within this framework, unbiased prevalence estimates for the population of interest can be produced and confidence intervals[5] can be constructed around these estimates (Salganik 2006).
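The idea of “reaching equilibrium” can be illustrated with a two-group Markov chain. The sketch below is an illustration only: the groups A and B and the recruitment probabilities in the transition table are made-up values, and the function names are hypothetical. It shows that the composition of successive waves converges to the same values whether all seeds come from group A or all from group B.

```python
def next_wave(composition, transition):
    """Composition of the next wave, given who recruits whom."""
    groups = range(len(composition))
    return [
        sum(composition[i] * transition[i][j] for i in groups)
        for j in groups
    ]


def composition_after(seed_composition, transition, waves):
    """Apply the recruitment step repeatedly, starting from the seeds."""
    composition = list(seed_composition)
    for _ in range(waves):
        composition = next_wave(composition, transition)
    return composition


# Assumed probabilities: a recruiter in group A recruits from A 70% of the
# time; a recruiter in group B recruits from B 60% of the time.
transition = [[0.7, 0.3],
              [0.4, 0.6]]

from_a = composition_after([1.0, 0.0], transition, waves=10)  # all seeds in A
from_b = composition_after([0.0, 1.0], transition, waves=10)  # all seeds in B

print([round(x, 4) for x in from_a])
print([round(x, 4) for x in from_b])
```

With these assumed probabilities both starting points settle near 4/7 for group A and 3/7 for group B. The equilibrium composition depends on the recruitment patterns rather than on the seeds, and it need not equal the population composition, which is why the RDS model also weights the sample.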

Respondent-driven sampling can be used as a sampling method to recruit research participants from a hard-to-reach population. In addition to this recruitment capacity, a statistical software package, the RDS Analytical Tool (RDSAT), allows researchers to analyze their data using the RDS mathematical model (RDS Incorporated 2006). RDS has been used to study a wide range of “hard-to-reach populations” including injection drug users (Heckathorn and Rosenstein 2002), HIV epidemiology (Frost et al. 2006), sex workers (Johnston et al. 2006), and jazz musicians (Heckathorn and Jeffri 2001, 2003). RDS was developed by Douglas Heckathorn in 1997 as part of an HIV-prevention research project funded by the National Institute on Drug Abuse and targeting drug injectors in various Connecticut cities (Heckathorn n.d.).

Advantages and Disadvantages

RDS’s recruitment method allows researchers to access, in a systematic way, members of typically hard-to-reach populations who may not otherwise be accessible. Because RDS is a probability sampling method, researchers are able to provide unbiased population estimates as well as measure the precision of those estimates. It also has the potential for rapid recruitment because every participant becomes a recruiter. So, for each subsequent participant, there is the potential for exponential growth in recruitment. This is particularly true when participants have large social networks and strong ties within those networks. RDS can be especially successful at rapid recruitment in dense urban environments (Abdul-Quader et al. 2006).

While the potential for rapid recruitment is one of the advantages to using RDS, there is still the possibility that recruitment may be very slow if participants are not recruiting their peers. There are a variety of reasons why rapid recruitment may be a challenge, including small network size, lack of connections among members of the target population, privacy concerns, or a high level of stigma associated with the target population. As a result, recruitment rates may be unpredictable. One solution to privacy concerns would be to provide alternative options to respondents which would allow them to complete the selected data collection method without having to make face-to-face contact with the researchers, for example telephone interviews or a self-administered online questionnaire.

Other disadvantages to using RDS relate to the difficulties that may arise when analyzing collected data. For instance, since RDS must take into account weighting for network size and recruitment patterns, the statistical strength of the sample as it applies to the target population decreases if participants only recruit people who share the same characteristics as themselves. In addition, the RDSAT only provides basic statistical estimates, such as estimates of population proportions, and cannot handle more complicated statistics, such as the sample size required, design effects, and statistical significance between groups. Moreover, researchers using RDS often ignore the fact that their data was collected with a complex sample design and construct confidence intervals as though they had a random sample. This is called the “naïve method” (Salganik 2006, 100).

To estimate the sample size required, Salganik (2006) proposes selecting a sample size for RDS that is twice as large as the sample size that would be needed under simple random sampling. He also proposes that a bootstrap sampling method be employed to overcome the “naïve method” built into RDS. Bootstrapping is a sampling method whereby the data collected in an initial sample is randomly resampled over and over again. Any number of resamples can be taken, and statistics are computed for each resample. The average statistical values from all of the resamples are used to evaluate the statistical accuracy of the original sample’s statistics (Howell 2002). While bootstrapping may not be exact, as confidence intervals are constructed for an imaginary population, the bootstrap method is still argued to be better than the naïve method because resamples are created in a randomized way.
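The general resampling idea can be sketched as follows. This is a plain percentile bootstrap that resamples respondents independently with replacement; it is a simplification and does not reproduce a recruitment-based resampling scheme of the kind an RDS-specific bootstrap would need. The responses (60 "yes" answers out of 200), the function names, and the number of resamples are hypothetical.

```python
import random


def proportion(values):
    """Share of respondents coded 1."""
    return sum(values) / len(values)


def bootstrap_interval(values, statistic, n_resamples, alpha, rng):
    """Percentile bootstrap interval for `statistic` at level 1 - alpha."""
    n = len(values)
    estimates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical survey item: 60 of 200 respondents answered "yes".
responses = [1] * 60 + [0] * 140
low, high = bootstrap_interval(responses, proportion, n_resamples=2000,
                               alpha=0.05, rng=random.Random(42))
print(f"estimate = {proportion(responses):.2f}, "
      f"95% interval = ({low:.3f}, {high:.3f})")
```

The spread of the resampled estimates, rather than a textbook formula for a simple random sample, is what determines the width of the interval.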

Researching the Community Impact of Hate Crimes

The main purpose of this research was to understand the impact of hate crimes on different communities—geographic as well as ethnic/racial or “identity” communities. The research design involved two case studies where an allegedly hate-motivated crime had occurred.

The first case study was a violent attack on a Sudanese refugee by a group of about 10 men at Victoria Park in Kitchener, Ontario, in 2006. The second case study involved the assault by two men of a Chinese-Canadian male who was fishing near the Mossington Bridge on the Black River in Sutton, Georgina. This particular incident was one of a series of attacks against Asian-Canadian anglers on Lake Simcoe in Georgina.

Data were collected at the two sites where the incidents took place, specifically in Kitchener-Waterloo and the Greater Toronto Area (GTA). At each site, two main communities were selected for data gathering. At the first site, data were collected from the “African identity community” of Kitchener-Waterloo (individuals from the racial/ethnic community of the victim) and the “Kitchener geographic community” (individuals living in the Kitchener, Ontario, region). At the second site, data were collected from the “Chinese identity community” of the GTA (individuals from the racial/ethnic community of the victim) and the “Georgina geographic community” (individuals living in the Georgina, Ontario, region).

A survey was administered to the geographic and identity communities. After describing the allegedly hate-motivated incident, the survey asked a number of questions about the “impact of the event” (Marren 2005) on the community. RDS was chosen as a sampling method for this study because the racial/ethnic identity communities were communities for which no exhaustive list of all their members was available for the purpose of simple random sampling. The RDSAT was employed for data analysis because it allows researchers to make population prevalence estimates with confidence intervals. Using only a sample of individuals from each identity community, it was possible to draw conclusions about the identity community populations with a greater degree of statistical reliability than that which would be possible without the bootstrap method. Stratified random sampling was used to generate a statistically reliable sample for the geographic communities.

Using RDS recruitment (i.e., a chain referral design), the study began with five seeds in each of the two identity communities – the African identity community and the Chinese identity community. Each seed was given four coupons with which to recruit four new participants from his/her social network (degree). The four referred participants received four coupons each and were expected to refer four more participants with the coupons. There were a total of five recruitment waves in the sample. As a reward for their participation, respondents were given the option to enter into a raffle draw for a prize.
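The recruitment design implies a theoretical ceiling on sample size that is easy to compute. The sketch below assumes that the seeds count as wave 0, that five waves of recruits follow, and that every coupon is redeemed; these are assumptions made for illustration, and the function name is hypothetical.

```python
def max_recruits_by_wave(seeds, coupons, waves):
    """Upper bound on participants per wave if every coupon is redeemed."""
    return [seeds * coupons ** wave for wave in range(waves + 1)]


# Five seeds per identity community, four coupons each, five waves.
by_wave = max_recruits_by_wave(seeds=5, coupons=4, waves=5)
ceiling = sum(by_wave)

print(by_wave)   # [5, 20, 80, 320, 1280, 5120]
print(ceiling)   # 6825
```

This ceiling is reached only if every participant redeems every coupon, so samples achieved in practice are normally far smaller.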

Generally, a 95% confidence level is considered statistically reliable. In most cases, this would entail collecting a sample of about 400 for each identity community. Using RDS to collect a sample of 400 for each identity community proved to be a challenge because many participants were unwilling to provide contact information for their friends. There also appeared to be little motivation for them to contact their friends on behalf of the researchers. In total, there were 196 survey respondents from the African identity community in Kitchener and 288 survey respondents from the Chinese identity community of the Greater Toronto Area.
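The figure of about 400 is consistent with the standard sample size formula for a proportion under simple random sampling. The sketch below assumes a margin of error of plus or minus 5 percentage points and the most conservative proportion of 0.5; neither value is stated in the text, and the function names are hypothetical. It also shows the margins implied by the achieved samples, first as if they were simple random samples and then with a design effect of 2, matching Salganik's (2006) suggestion to double the sample size for RDS.

```python
import math

Z_95 = 1.96  # critical value for a 95% confidence level


def srs_sample_size(z, margin, p=0.5):
    """Sample size needed for a proportion under simple random sampling."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def margin_of_error(n, z, p=0.5, design_effect=1.0):
    """Half-width of the confidence interval for a proportion."""
    return z * math.sqrt(design_effect * p * (1 - p) / n)


n_srs = srs_sample_size(Z_95, margin=0.05)   # 385
n_rds = 2 * n_srs                            # 770, doubling rule of thumb

for label, n in [("African identity community", 196),
                 ("Chinese identity community", 288)]:
    simple = margin_of_error(n, Z_95)
    doubled = margin_of_error(n, Z_95, design_effect=2.0)
    print(f"{label}: n = {n}, margin = {simple:.3f} "
          f"(design effect 2: {doubled:.3f})")
```

Under these assumptions, a sample of 196 gives a margin of roughly 7 percentage points when treated as a simple random sample, and a wider one once a design effect is taken into account.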


RDS is a sampling method utilized in instances where researchers are attempting to study hard-to-reach populations. RDS combines “snowball sampling” with a mathematical model that weights the sample in such a way that eliminates some of the biases that may have been introduced into the sample by the non-random choice of initial recruits. Less biased prevalence estimates can then be produced and confidence intervals can be constructed around those estimates.

RDS is a relatively new sampling method, and as its use increases, so will researchers’ familiarity with its possibilities and limitations. In this study, RDS did not eliminate some of the challenges in recruiting victims of crime; there remained issues where potential participants did not trust the researchers or just did not want to talk about the incident. As such, recruitment was still difficult. Using RDS, however, contributed to increasing the statistical reliability of the sample compared to using a snowball sample. It also made possible drawing conclusions from the sample about the impact of hate crime on communities of identity with a greater degree of statistical reliability. This, in and of itself, suggests that RDS merits further attention as a sampling method when trying to reach hard-to-reach populations.