Sampling and Generalization1
In: The SAGE Handbook of Qualitative Data Collection
By: Margrit Schreier
Edited by: Uwe Flick
Pub. Date: 2018
Access Date: October 12, 2021
Publishing Company: SAGE Publications Ltd
City: London
Print ISBN: 9781473952133
Online ISBN: 9781526416070
DOI: https://dx.doi.org/10.4135/9781526416070
Print pages: 84-97
© 2018 SAGE Publications Ltd All Rights Reserved.

Introduction

In their textbook about empirical research methodology in the social sciences, 6 and Bellamy write: ‘Making warranted inferences is the whole point and the only point of doing social research’ (2012, p. 14). Empirical research, in other words, does not limit itself to describing those instances included in a given study. It wants to go beyond those instances and arrive at conclusions of broader relevance. This is true of qualitative just as much as of quantitative research (on the extent and type of generalizations in qualitative research see Onwuegbuzie and Leech, 2010).2 When Lynd and Lynd (1929), for example, set about studying the community they called ‘Middletown’ in the early twentieth century, their goal was not to provide an in-depth description of this one community. Instead, they wanted to draw more general conclusions about contemporary life in a Midwestern US community on the threshold of industrialization. And they selected the community in question very carefully to make sure that it was indeed typical of Midwestern communities at that time.
They looked at climate, population size, growth rate, presence and number of industries, presence of local artistic life, and any local problems, and chose ‘Middletown’ on all of those grounds (see Gobo, Chapter 5, this volume). They were thus very much aware that the conclusions we can draw, that is, the kinds of generalizations we can make, are closely connected to the instances we study, that is, our sample. The instances, in 6 and Bellamy's (2012) terms, act as the ‘warrants’ for our conclusions.

Qualitative research, with its holistic and in-depth approach, typically limits itself to a few instances or units only, ranging from the single case study (as in the study of ‘Middletown’) to a sample size of around 20 to 40 (although sample sizes can, in rare cases, also be considerably larger). These units or instances can be very diverse in nature: not only people, but documents, events, interactions, behaviours, etc. can all be sampled. Sample sizes in qualitative research are thus much smaller than in quantitative research.

If qualitative research wants to arrive at conclusions that go beyond the instances studied, but can only include comparatively few units, one would expect qualitative researchers to reflect all the more carefully on selection and generalization. But this is not the case. The topic of sampling has long been neglected (Higginbottom, 2004; Onwuegbuzie and Leech, 2007; Robinson, 2014), although there has been an increased interest in the topic in recent years (e.g. the monograph on the topic by Emmel, 2013, and the increasing attention to sampling in textbooks). What it means to generalize in qualitative research, what kinds of conclusions can be drawn based on the units we have studied, is a topic that is only occasionally touched upon (e.g. Gobo, 2008; Lincoln and Guba, 1979; Maxwell and Chmiel, 2014; Stake, 1978).
In the following, I will start out with some methodological considerations, focusing first on concepts of generalizing, next on sampling and criteria and considerations underlying sampling in qualitative research. This includes the question of sample size and the use of saturation as a criterion for deciding when to stop sampling. In the next section, selected sampling strategies in qualitative research will be described in more detail, followed by considerations of how generalization and selection strategies are related in some selected qualitative research traditions. The chapter closes with describing some recent developments in qualitative sampling methodology. Throughout the chapter, the terms ‘sampling’, ‘selecting units’, and ‘selecting instances’ will be used interchangeably.

Generalizing in qualitative research: Methodological considerations

Generalizing in Quantitative Social Science

In quantitative research, the concept of generalization is closely linked to that of external validity, specifically to the concept of population generalization, namely the extent to which we can generalize from the sample to the population. This type of generalization has also been termed empirical generalization (Lewis et al., 2014; Maxwell and Chmiel, 2014) or numerical generalization (Flick, 2004). In quantitative research, empirical generalization is typically realized as statistical or probabilistic generalization: It is possible to generalize from the sample to the population to the extent that the sample is indeed representative of the population, and statistics is used to provide the level of confidence or conversely the margin of error that underlies this estimate of representativeness (Williams, 2002). Statistical generalization in this sense has become the default understanding of generalization in the social sciences. But empirical generalization does not equal statistical generalization.
Statistics is a tool that in quantitative research is used to warrant the conclusion from the sample to the population. But there may be other ways of justifying this conclusion. It is also worth keeping in mind another characteristic of both statistical and empirical generalization: They are essentially context-free. The conclusion from the sample to the population applies, regardless of the specific context and the specific circumstances (Williams, 2002).

When quantitative researchers argue that qualitative research does not allow for generalization, this criticism is typically based on an understanding of generalizability in the sense of statistical generalizability. Indeed, samples in qualitative research are mostly not representative of a population, and using statistics as a warrant underlying the conclusion from sample to population is then not an option (although Onwuegbuzie and Leech, 2010, report that 36 per cent of the qualitative studies they examined used statistical generalization).

Some qualitative methodologists have been just as sceptical of achieving generalizability in qualitative research. This is expressed by the famous dictum of Lincoln and Guba (1979, p. 110): ‘The only generalization is that there is no generalization.’ For quantitative methodologists the supposed inability of qualitative research to arrive at empirical generalizations constitutes a criticism of qualitative research. Lincoln and Guba, on the other hand, as well as Denzin (1983), reject the notion of statistical and empirical generalizability precisely because they do not take context into account – because they are, one might say, too general:

It is virtually impossible to imagine any kind of human behaviour that is not heavily mediated by the context in which it occurs.
One can easily conclude that generalizations that are intended to be context-free will have little that is useful to say about human behaviour. (Guba and Lincoln, 1981, p. 62)

Reconceptualizing Generalization in Qualitative Research

Several suggestions have been made for reconceptualizing generalization so as to make it more compatible with the principles underlying qualitative research. These suggestions fall into three groups or types: modifying the notion of empirical generalization, transferability as an alternative conceptualization, and theoretical generalization as an alternative conceptualization (Lewis et al., 2014; Maxwell and Chmiel, 2014; Polit and Beck, 2010; for a more complex and extensive classification see Gobo, 2008; and Gobo, Chapter 5, this volume).

Modifying the notion of empirical generalization entails ‘lowering the threshold’ for what can be called a generalization. This applies, for example, to the notion of so-called moderatum generalizations proposed by Williams (2002) for interpretive social research. They are based on the assumption of ‘cultural consistency’, of a constant structural element in the area under research, and they involve an inductive inference from the particular instance(s) studied to this underlying structure. Moderatum generalizations constitute a ‘weaker version’ of the type of empirical generalization used elsewhere in the social and the natural sciences.

Another reconceptualization of generalization in qualitative research focuses on the concept of transferability (Maxwell and Chmiel, 2014; Schofield, 1990). This notion takes the highly contextualized nature of qualitative research as its starting point, that is, the very characteristic which, from the perspective of the quantitative social sciences, stands in the way of empirical generalization in qualitative research.
With transferability, the core concern is not to generalize to an abstract and decontextualized population, but to determine whether the findings obtained for one instance or set of instances in one specific context also apply to other instances in a different context. The extent to which the findings can be transferred from one case to another depends on the similarity between the respective contexts. Assessing the similarity of a ‘source’ and a ‘target’ context in turn requires detailed information about the context in which the study was conducted. Lincoln and Guba (1979) speak of the degree of fittingness between the two contexts and refer to the need for thick description, according to Geertz (1973), of the context in which the first study was carried out. It is noteworthy that the notion of transferability entails, so to speak, a division of tasks between the authors and the readers of a study. It is the responsibility of the authors to provide a sufficiently ‘thick description’, but only the reader can assess the degree of fittingness between the context of the study and any other context to which the findings may or may not apply. The idea of transferability underlies several reconceptualizations of generalization that have been proposed in the literature, such as the concept of naturalistic generalization developed by Stake (1978) or the notion of generalization as a working hypothesis suggested by Cronbach (1975).

Both the notions of moderatum generalization and transferability are based on considerations of the relationship between a sample and a population (or the relationship between a sample and another sample), that is, what Yin (2014, pp. 57–62) calls a sampling logic. The third alternative conceptualization of generalization, that of theoretical generalization (Schwandt, 2001, pp.
2–3; also called analytic generalization), moves away from the idea of population and sample and is based on what Yin terms a replication logic (2014, pp. 57–62). With theoretical generalization, the purpose of the research is not to generalize to a population or to other instances, but to build a theory or to identify a causal mechanism. Instances are selected either so as to be similar to each other (literal replication) or different from each other in one key aspect (theoretical replication), the same way studies build upon each other, leading to a differentiation of theory. This notion of theoretical generalization is – in somewhat different versions – used in case study research, in grounded theory methodology, and in analytic induction.

External and Internal Generalization

As mentioned in the previous section, the concept of generalizability has, in the quantitative research tradition, been discussed as one aspect of external validity. This suggests a focus on the relationship between the sample and the population, on how representative the instances included in the sample are of the population. But in qualitative research, where typically few instances are examined in detail, another relationship gains in importance, namely the relationship between our observations and the case in its entirety: how well is the variability within a given instance represented in our observations? Hammersley and Atkinson (1995) point to the importance of adequately representing contexts as well as points in time or a time period. They refer to this as within-case sampling. Along similar lines, Onwuegbuzie and Leech (2005) refer to the ‘truth space’ of an interviewee and whether the data obtained in any given interview adequately represent that truth space (see also the concept of internal validity in Maxwell and Chmiel, 2014).

The considerations underlying internal and external generalization are similar in structural terms.
Generalizing within an instance is subject to the same restrictions and considerations as generalizing beyond that instance: in both situations, we have to ask ourselves what kind of generalization is appropriate – moderatum empirical, transferability, or theoretical generalization – and select our sampling strategy accordingly.

Sampling in qualitative research: Methodological considerations

Sampling Strategies: An Overview

In the methodological literature, three types of sampling strategies are distinguished: random, convenience, and purposive sampling. In the following, key characteristics and underlying concepts of the three groups of sampling strategies are briefly described.

Random sampling

Random sampling is typically used in quantitative research, especially in survey-type research, in order to support empirical generalization, that is, generalizing from a sample to a population (on random sampling, see Daniel, 2012, chapter 3). This is possible to the extent that the sample is indeed representative of the population. The importance of random sampling in quantitative research derives from its role in generating such a representative sample. Based on a sampling frame, that is, a list of all members of the population, the sample is chosen such that every member of the population has an equal chance (above zero) of being included in the sample, and the members of the sample are selected using a truly random procedure (e.g. a random number generator). If these steps are followed and if the population and the sample are sufficiently large, the procedure of random sampling results in a sample that is (sufficiently) representative. The margin of error in generalizing from the sample to the population can be specified using confidence intervals and inferential statistics.
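The procedure of simple random sampling described above can be illustrated with a minimal sketch. The sampling frame, the sample size, and the survey result below are invented for illustration only, and the interval shown is the simple normal-approximation interval for a proportion, which is only one of several ways of specifying a margin of error.

```python
import math
import random


def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)
    return rng.sample(list(sampling_frame), n)


def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation interval for a sample proportion (z=1.96 ~ 95%)."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Hypothetical sampling frame: a list of all 10,000 members of a population.
frame = [f"person_{i}" for i in range(10_000)]
sample = simple_random_sample(frame, n=400, seed=1)

# Hypothetical result: 240 of the 400 sampled persons agree with a statement.
low, high = proportion_confidence_interval(240, 400)
print(f"Estimated share: {240 / 400:.2f}, interval: {low:.3f} to {high:.3f}")
```

The sketch presupposes exactly what the text names as preconditions: a complete sampling frame and a genuinely random selection procedure. Where no such list of the population exists, the interval has no clear interpretation.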
Various subtypes of random sampling have been developed beside the process of simple random sampling described above, such as systematic, cluster, or stratified random sampling.

But a random sample is not necessarily a representative sample (Gobo, 2004). In the first place, random sampling will result in a representative sample only if the above conditions are met. Also, representativeness of a sample with respect to a population constitutes a goal, whereas random sampling is a procedure, a means towards that goal. And random sampling is not the only way of obtaining a representative sample. Other strategies include, for instance, selecting typical cases or even – depending on the population – selecting any case at all (see the section on ‘phenomenology’ below).

Purposive sampling

The term purposive sampling (also called purposeful sampling) refers to a group of sampling strategies typically used in qualitative research. The key idea underlying purposive sampling is to select instances that are information rich with a view to answering the research question (for an overview see Emmel, 2013; Flick, 2014, chapter 13; Mason, 2002, chapter 7; Patton, 2015, module 30; Ritchie et al., 2014). A large variety of purposeful sampling strategies has been described in the literature, including, for example, homogeneous sampling, heterogeneous sampling, maximum variation sampling, theoretical sampling, sampling according to a qualitative sampling guide, snowball sampling, sampling typical, extreme, intense, critical, or outlier cases, and many others (Patton, 2015, module 30; Teddlie and Yu, 2007). The precise meaning of ‘information rich’, however, and therefore the selection of a specific strategy, depends on the research question and on the goal of the study (Marshall, 1996; Palinkas et al., 2015). Describing a phenomenon in all its variations, for example, requires a type of maximum variation sampling (Higginbottom, 2004; Merkens, 2004).
If the goal is to generate a theory, theoretical sampling in the tradition of grounded theory methodology will often be the strategy of choice (see the section on ‘theoretical sampling’ below). If a theory is to be tested, selecting an atypical or critical case would be a useful strategy (Mitchell, 1983). And transferability requires a detailed description of specific types of cases, for example, a typical or common case or an intense case (Yin, 2013, pp. 51–5).

Several criteria have been used to distinguish between different types of purposive sampling strategies and different ways of conducting purposive sampling. A first criterion concerns the point in time when a decision concerning the sample composition is made (Flick, 2014, pp. 168–9; Merkens, 2004): this can either be specified in advance, in analogy to the sampling procedure in quantitative research (examples would be stratified purposive sampling or selecting specific types of cases). Or else the composition of the sample can emerge over the course of the study, as in sequential sampling, snowball sampling, theoretical sampling, or analytic induction. This latter emergent procedure is generally considered to be more appropriate to the iterative, emergent nature of qualitative research (Palinkas et al., 2015). In actual research, advance decisions about sample composition are often combined with modifications as they emerge during the research process. A second criterion relates to the relationship between the units in the sample (Boehnke et al., 2010; Palinkas et al., 2015; Robinson, 2014), distinguishing between homogeneous samples (as in criterion sampling, or in selecting typical cases) and heterogeneous samples (e.g. maximum variation sampling, or theoretical sampling).
A third criterion for distinguishing between purposeful sampling strategies relates to the underlying goal (Onwuegbuzie and Leech, 2007; Patton, 2015, chapter 5), such as selecting specific types of cases (typical, intense, extreme, etc.), or selecting with a view to representativeness or to contrast.

Convenience sampling

Convenience sampling (also called ad hoc sampling, opportunistic sampling) constitutes the third type of sampling strategy. Here cases are selected based on availability. Asking one's fellow students, for example, to participate in a study for this semester's research project would constitute a case of convenience sampling. This sampling strategy has a ‘bad reputation’ with both quantitative and qualitative researchers: from the perspective of quantitative research, it fails to produce a representative sample (Daniel, 2012, chapter 3); from the perspective of qualitative research, it has been criticized for insufficiently taking the goal of the study and the criterion of information richness into account. Depending upon the goal of the research and the population under study, ‘any case’ can, however, be perfectly suitable (Gobo, 2008; see the section on ‘phenomenology’ below).

Is My Sample Large Enough?

The role of sample size in qualitative research

The question of sample size in qualitative research is highly controversial. Some authors argue that, unlike in quantitative research, where sample size in relation to the population is crucial for statistical generalization, sample size is irrelevant in qualitative research or at best of secondary concern. According to this position, sample composition and the selection of information-rich instances that are relevant to the research question are more important than sample size (e.g. Crouch and McKenzie, 2006; Patton, 2015, chapter 5). Others argue that sample size plays a role in qualitative research as well (e.g. Onwuegbuzie and Leech, 2005; Sandelowski, 1995).
In purely practical terms, researchers are often required to specify an approximate sample size, for example, when submitting grant applications or presenting a PhD proposal. Not surprisingly, methodologists also differ when it comes to making recommendations for sample size in such cases. Some authors do make recommendations (for overviews, see Guest et al., 2006; Guetterman, 2015; Mason, 2010). Others argue that deciding on sample size before engaging in data collection contradicts the emergent nature of qualitative research, and call for a sampling process that is constantly adjusted as the research unfolds (e.g. Mason, 2010; Palinkas et al., 2015; Robinson, 2014; Trotter, 2012). A middle way between these two extremes is the suggestion to work with an advance specification of minimal sample size, which is then adjusted during the research process (Francis et al., 2010; Patton, 2015, module 40).

Positions concerning recommendations for sample size in qualitative research thus range from specific numbers to ‘it depends’. Key factors to take into consideration when deciding on a suitable sample size include the extent of variation in the phenomenon under study (Bryman, 2016; Charmaz, 2014; Francis et al., 2010; Palinkas et al., 2015; Robinson, 2014), the research goal (Marshall, 1996; Patton, 2015, module 40), and the scope of the theory or conclusions (Charmaz, 2014; Morse, 2000). The overall recommendation is that sample size should increase with the heterogeneity of the phenomenon and the breadth and generality of the conclusions aimed for. Depending on the research goal, however, a single instance may be perfectly sufficient (e.g. Patton, 2015, module 40; Yin, 2014, pp. 51–6).
Some authors also draw attention to external constraints, such as the time and the resources available for the study or the requirements by external agencies such as review boards (Flick, 2014; Patton, 2015, module 40). Another factor concerns the research tradition in which the study is carried out (Guest et al., 2006; see the section on ‘sampling in different traditions’ below).

The advance specification of sample size runs the danger of oversampling, that is, including more instances than necessary (Francis et al., 2010). In his analysis of sample sizes in the 51 most frequently cited qualitative studies in five research traditions, Guetterman (2015) found a surprisingly high average sample size of 87 participants. Mason (2010), in examining sample size in qualitative dissertations, found sample sizes of 20–30 participants to be most frequent, with a surprising number of sample sizes constituting multiples of 10. Guetterman concludes from this that such round numbers are most likely the result of an advance specification of sample size – which may well be higher than needed. Oversampling carries the methodological danger of allowing for only an insufficient analysis of each individual case (Guetterman, 2015), and it carries the ethical danger of unnecessarily drawing upon the resources of participants (Francis et al., 2010). Sampling in qualitative research should therefore include as many cases as are needed with a view to the research question, but it should not model itself on quantitative standards of ‘the more, the better’ or a preference for round numbers.

The criterion of saturation

But how exactly do we know that we have included as many cases as needed? The criterion that is most often used in qualitative research to conclude the sampling process is the criterion of saturation.
Saturation was initially developed in the context of grounded theory methodology, and it specifies that sampling should stop when including more cases does not contribute any new information about the concepts that have been developed and about their dimensions (Schwandt, 2001, p. 111). Today, however, the concept of saturation is often used in the more general sense of thematic saturation (Bowen, 2008; Guest et al., 2006). But it often remains unclear what the exact criteria are for determining when saturation has been reached (Bowen, 2008; Francis et al., 2010; O'Reilly and Parker, 2012).

Guest et al. (2006) examined their own interview analysis for evidence of saturation. They concluded that saturation was reached after 12 interviews, with key themes emerging from the analysis of only 6 interviews. Francis et al. (2010) arrive at similar conclusions. They suggest specifying an initial sample size of n = 10 at which to examine the degree of saturation reached. This initial sample size, they argue, should be combined with a set number of additional cases at which the degree of saturation is to be re-examined; for their own research, they set this number at n = 3. Both Guest et al. (2006) and Francis et al. (2010) emphasize, however, that their recommendation targets interview studies on comparatively homogeneous phenomena.

Saturation, despite its prevalence, has been criticized on methodological grounds. Dey (1999) argues that saturation is always a matter of degree. Also, saturation is not always the most appropriate criterion for deciding when to stop sampling (O'Reilly and Parker, 2012).
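The suggestion of an initial sample of n = 10 followed by re-examination after a set number of additional cases (n = 3) can be sketched as a simple stopping rule. The sketch rests on one possible operationalization that is an assumption here, not a quotation of the authors' procedure: sampling stops once three consecutive additional interviews yield no theme that has not already been recorded. The function name and the theme codes are hypothetical.

```python
def interviews_until_saturation(themes_per_interview, initial_sample=10, stopping_run=3):
    """Return the number of interviews analysed when the stopping rule is met.

    Assumed rule: after the initial sample has been analysed in full, stop as
    soon as `stopping_run` consecutive interviews add no new theme.
    Returns None if the rule is never met with the interviews available.
    """
    seen = set()
    run_without_new = 0
    for count, themes in enumerate(themes_per_interview, start=1):
        new_themes = set(themes) - seen
        seen |= new_themes
        if count <= initial_sample:
            continue  # the initial sample is always analysed completely
        run_without_new = 0 if new_themes else run_without_new + 1
        if run_without_new >= stopping_run:
            return count
    return None


# Hypothetical theme codes assigned to 15 interviews, in order of analysis.
interviews = [
    {"a", "b"}, {"b", "c"}, {"a"}, {"d"}, {"c", "d"},
    {"e"}, {"a", "e"}, {"b"}, {"f"}, {"a", "f"},   # initial sample of 10
    {"g"}, {"a", "g"}, {"b"}, {"c", "f"}, {"h"},   # additional interviews
]
stop_at = interviews_until_saturation(interviews)
print("Stop after interview:", stop_at)
```

Such a rule makes the decision transparent, but it does not answer the objection that saturation is a matter of degree: interview 15 in the invented data would have introduced a further theme after the rule had already been satisfied.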
Selected purposive sampling strategies

In this section, some selected purposive sampling strategies that are frequently used in qualitative research are presented in more detail: theoretical sampling, stratified purposive sampling, criterion sampling, and selecting specific cases. It should be noted that the different strategies are not mutually exclusive and that several strategies can be combined in one study.

Theoretical Sampling

Theoretical sampling was developed in the context of grounded theory methodology and is very much a part of the overall iterative grounded theory methodology in combination with a process of constant comparison (Glaser, 1978; Strauss, 1987; for an overview see Draucker et al., 2007). Theoretical sampling takes place in constant interrelation with data collection and data analysis, and it is guided by the concepts and the theory emerging in the research process. More instances and more data are added so as to develop the emerging categories and their dimensions, and relate them to each other:

Theoretical sampling means that the sampling of additional incidents, events, activities, populations, and so on is directed by the evolving theoretical constructs. Comparisons between the explanatory adequacy of the theoretical constructs and the empirical indicators go on continuously until theoretical saturation is reached (i.e. additional analysis no longer contributes to anything new about this concept). (Schwandt, 2001, p. 111)

In terms of sample composition, theoretical sampling yields a heterogeneous sample that allows for comparing different instantiations of a concept. The sampling process is emergent and flexible, and the goal of the sampling strategy, as the name says, is to develop a theory that is grounded in the data.
While the strategy is well established in the methodological literature and is the default strategy in grounded theory studies, it is nevertheless difficult to find detailed descriptions of how sampling choices are revised and modified in response to emergent concepts (Draucker et al., 2007). The above case study by Wuest (2001) on the experiences of women providing care to others constitutes an exception. While Wuest acknowledges the difficulty of documenting what is essentially a process of following up on various conceptual dimensions simultaneously, she describes the theoretical sampling process and the choices she made at various stages of the research process in exemplary and enlightening detail.

Stratified Purposive Sampling

Like theoretical sampling, stratified purposive sampling results in a heterogeneous sample that represents different manifestations of the phenomenon under study (Ritchie et al., 2014; termed ‘quota sampling’ in Patton, 2015, p. 268). In contrast to theoretical sampling, however, stratified purposive sampling entails a top-down approach, that is, decisions about the composition of the sample are made before data collection. In a first step, the researcher has to decide which factors are known or likely to cause variation in the phenomenon of interest. In a second step, two to a maximum of four such factors are selected for constructing a sampling guide. Step three involves combining the factors of choice in a cross-table. At this point the researcher has to decide whether all possible combinations of all factors are to be realized or, if not, which factor combinations will be included. With more than two factors, it is usually not possible to conduct sampling for all possible factor combinations. The resulting sampling guide is displayed in a table, with each factor combination corresponding to a cell. In a final step, the researcher will decide how many units to sample for each cell or factor combination.
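The steps of constructing a sampling guide can be sketched as follows. The two factors and their levels, the function name, and the number of units per cell are hypothetical and serve only to show how the cross-table of factor combinations is formed and how individual combinations can be left out.

```python
from itertools import product

# Steps 1 and 2 (hypothetical choice): two factors assumed to cause variation.
factors = {
    "age_group": ["younger", "middle", "older"],
    "training": ["with qualification", "without qualification"],
}


def build_sampling_guide(factors, units_per_cell=1, excluded=()):
    """Cross all factor levels (step 3) and assign a target per cell (step 4).

    `excluded` lists factor combinations that will deliberately not be sampled.
    """
    names = list(factors)
    guide = {}
    for combination in product(*(factors[name] for name in names)):
        if combination in excluded:
            continue
        guide[combination] = units_per_cell
    return guide


guide = build_sampling_guide(factors, units_per_cell=2)
for cell, target in guide.items():
    print(cell, "->", target, "unit(s)")
print("Planned sample size:", sum(guide.values()))
```

With three and two levels the guide has six cells; adding a third factor with three levels would already yield eighteen, which illustrates why not all combinations can usually be realized with more than two factors.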
Depending on how many cells there are in the sampling guide (‘sample matrix’ according to Ritchie et al., 2014), one or two units will typically be included. Stratified purposive sampling is useful for exploring the various manifestations of a phenomenon for similarities and differences.

As the term ‘sampling guide’ implies, a sampling guide is not cast in stone and can be modified as the selection process unfolds (Morse, 2000). In the above study, for example, it proved difficult to find participants without any training qualification, especially in the oldest age group, and participants from other age groups or with a training qualification were substituted. A sampling guide can and should also be modified if it emerges during the study that factors other than the ones informing the sampling guide affect the phenomenon under study. In this case a combination of concept-driven and data-driven sampling is realized (for an example, see Johnson, 1991).

Another, more flexible variant of stratified purposive sampling is maximum variation sampling (Patton, 2015, p. 267). As in stratified purposive sampling, the researcher starts out by identifying factors that lead to variation in the phenomenon under study. But instead of systematically combining these factors, they serve as a broad framework orienting the sampling process, with a view to including as much variation in the sample as possible. In their interview study on how older persons experience living at home, for example, De Jonge et al. (2011) included participants who differed with respect to age, gender, living conditions, type of dwelling, tenure, and location in the city or in the countryside, without specifying which combinations were to be included.
They used a matrix, however, to record the specific combination of characteristics represented by the participants in their sample, to ensure and document sufficient variation of their cases.

Criterion Sampling

In criterion sampling, the objective is to include instances in the sample that match a predefined profile (Coyne, 1997; Patton, 2015, p. 281). Usually this involves a combination of characteristics, which, together, specify a phenomenon under study or a restricted population in which this phenomenon is likely to occur. The resulting sample is homogeneous with respect to the selected criteria (but may be heterogeneous in other respects), and decisions about these criteria are made in advance. This sampling strategy is especially useful for exploring a phenomenon in depth.

Sampling in different research traditions

One of the factors influencing the type of generalization aimed for and, consequently, the optimal sample size is the methodological tradition in which a study is carried out (Bryman in Baker and Edwards, 2012; Higginbottom, 2004; Robinson, 2014). Empirical analysis of sample sizes in different qualitative research traditions shows that the number of cases differs quite substantially between approaches (Guetterman, 2015; Mason, 2010). In the following, a few selected approaches will be discussed with a view to sampling and generalization issues: interview studies, the case study, and phenomenology (for grounded theory methodology see the sections on ‘theoretical sampling’ and ‘theoretical saturation’).

Interview Studies

Many recommendations concerning sample size in qualitative research relate to the use of interview data (see Roulston and Choi, Chapter 15, this volume) in particular (e.g. Crouch and McKenzie, 2006; Mason, 2010). Recommendations range from 10 to 13 units (Francis et al., 2010) up to between 60 and 150 (Gerson and Horowitz, 2002, p. 223).
This large variety of recommendations is not surprising, considering that as a method for data collection, the interview can be used within a great variety of different research traditions and with a view towards different kinds of generalization. Instead of examining optimal sample size in interview studies, it seems more promising to look at sample sizes in different approaches where interviews, observation, and other methods for collecting qualitative data are used (cf. the analysis of interview-based dissertations from different traditions in Mason, 2010).

The Case Study

In the case study, different methods of data collection are combined to allow for an in-depth analysis of one or several cases and their subunits (Yin, 2014, chapter 1). Descriptive case studies are especially suitable for yielding ‘thick descriptions’ and therefore lend themselves well to generalization in the sense of transferability. Explanatory case studies are more suitable for analytic generalization, being used for generating and for building theory (Mitchell, 1983).

In terms of sampling, case studies require complex decisions on multiple levels. In a first step, the case or cases have to be selected. In single case studies, this often involves selecting a case with a view to a population, e.g. a typical case or an intense case (Yin, 2014, pp. 51–6). Single case studies allow for moderatum generalizations based on structural and cultural consistency (Williams, 2002). In multiple case studies, the key concern in sampling is the underlying logic of replication, that is, the question of how the cases relate to each other (Yin, 2014, pp. 56–63). Based on a literal replication logic, cases are selected so as to be similar to each other. If the study follows a theoretical replication logic, cases are selected so as to contrast with each other on relevant dimensions.
In a second step, within-case sampling is necessary to ensure internal generalizability (Maxwell and Chmiel, 2014). In principle, any purposive sampling strategy can be used, but maximum variation sampling seems especially useful for representing different aspects of a case, such as persons, points in time, or contexts (Higginbottom, 2004).

In his analysis of sample size for case studies in the fields of health and education, Guetterman (2015) found that the number of cases ranged from 1 to 8, and the number of participants or observations ranged from 1 to 700. This suggests that researchers conducting case studies tend to limit themselves to a few cases only, thus allowing for a detailed analysis of each case. The wide range in the number of participants and observations, however, indicates that the ‘thickness’ of the resulting descriptions varies considerably.

Phenomenology

Phenomenology has the aim of identifying the ‘essence’ of the human experience of a phenomenon (overview in Lewis and Staehler, 2010). By aiming to describe the common characteristics of that experience, phenomenology by definition also aims for (empirical) generalization. Because the respective experience is assumed to be universal, the experience of any human being qualified to have that experience is considered a case in point. Consequently, no special sampling strategy is required, that is, convenience sampling would be sufficient: any individual who meets the conditions for having the experience under study would be a suitable participant, and because of the relative homogeneity of the phenomenon, comparatively small samples would be acceptable. This is, indeed, reflected in the smaller sample sizes found in Guetterman's (2015) analysis, ranging from 8 to 52 participants. Similar considerations concerning the assumed universal nature of a phenomenon or an underlying structure are found in studies on the organization of everyday talk (e.g.
Schegloff and Sacks, 1974; see Jackson, Chapter 18, this volume) or in objective hermeneutics (overview in Wernet, 2014).

Conclusion

Recently there has been increasing attention to what qualitative researchers actually do during the sampling process. These articles have already been cited in the previous sections: Onwuegbuzie and Leech (2010) examined studies published in Qualitative Report for whether the authors generalized their findings and what type of generalization they drew upon. Guetterman (2015) and Mason (2010) conducted research on sample sizes in studies from different qualitative traditions, and Guest et al. (2006) examined their own data for evidence of saturation. More empirical studies of this kind are needed to better understand what qualitative researchers actually do in terms of sampling and generalization.

Another development is related to the increasing use of mixed methods designs in the social sciences (see Hesse-Biber, Chapter 35, this volume; Morse et al., Chapter 36, this volume). Discussions of sampling in mixed methods designs include descriptions of purposive sampling strategies, thereby contributing to making them more visible in the social science methods literature (e.g. Teddlie and Yu, 2007). Some strategies that are considered purposive in a qualitative research context, such as stratified purposive sampling or random purposive sampling, have in fact been classified as ‘mixed’ strategies within this mixed methods context.

The mixed methods research tradition is relevant to developments in purposive sampling in yet another respect. Many purposive sampling strategies, such as stratified purposive sampling or selecting certain types of cases, require prior knowledge about the phenomenon in question and its distribution. This is where mixing methods can be highly useful in informing the purposive sampling process.
Within a sequential design, for example, the findings of a first quantitative phase can be used in order to purposefully select instances for a second qualitative phase of the research (for sequential designs see Creswell and Plano Clark, 2011, chapter 3).

While the topic of purposive sampling has been receiving increasing attention, this is not the case for the topic of generalizing in qualitative research, and even less so for the relationship between types of generalization and sampling strategies. This will be an important focus of future qualitative research methodology in this area.

Case Study: ‘Precarious Ordering’

In her study of the experiences of women providing care, Wuest (2001) develops a middle-range theory with a focus on what she calls precarious ordering, based on a total of 65 interviews with women facing a variety of demands for care (caring for children with otitis media with effusion, Alzheimer's disease, and leaving abusive relationships). Precarious ordering involves a two-stage iterative process of negotiating demands for care and own resources, moving from daily struggles to re-patterning care. She starts out her process of theoretical sampling by talking to childrearing middle-class women, both employed and unemployed. Her initial data analysis points her to the importance of increasing demands and types of demands and of the dissonance between demands and resources. To further explore the role of varying demands, she continues by interviewing women who face higher than average demands (mothers of physically and mentally disabled children) and women who face fewer demands (women with adult children and children away from home).
Her interviews with mothers of disabled children lead her to the question of strengths and strategies developed by the women during the coping process and the role of the relationship with the partner. To explore these two factors, she talks to other women with heavy, but different demands for care (sick relatives) and women with diverse partners (e.g. a lesbian partner, a partner from a different culture). The role of resources and the distinction between helpful and unhelpful resources are further highlighted through interviews with women in economically difficult situations. Wuest thus moves through cycles of comparing and contrasting different types of caring demands, different kinds of settings, support systems, and resources, especially in terms of the relationship with a partner.

Case Study: ‘Stakeholder Opinions on Priority Setting in Health Care’

In the following study we used stratified purposive sampling to explore the range of opinions from different stakeholder groups and their reasons surrounding the setting of priorities in the German health care system (Schreier et al., 2008; Winkelhage et al., 2013). In a first step, we selected six stakeholder groups representing a variety of different positions and roles. We assumed that, because of these different positions, they would be likely to differ in their interests and opinions concerning priority setting in health care: healthy members of the general population, patients, physicians, nursing personnel, representatives of the public health insurance system, and politicians. In a second step, a literature search was carried out for each stakeholder group separately in order to identify factors likely to affect attitudes towards health care.
Taking patients as an example, relevant factors included age (18–30, 31–62, above 62), the severity of a patient's disease (light versus severe; as judged by a physician), level of education (no training qualification, training qualification, university degree), and area of origin (former Federal Republic of Germany versus former German Democratic Republic). With four relevant factors, not all factor combinations could be realized, and some cells remained empty. The factors were combined into the following sampling guide, resulting in a sample of 12 participants (see Table 6.1). The study showed, for example, the different interests and attitudes of physicians compared to patients. Patients were more likely to place hope even in less effective treatments, and they emphasized the importance of obtaining the consent of the family when deciding about life-prolonging measures (Winkelhage et al., 2013).

Table 6.1 Sampling guide for stratified purposive sampling of patients

Case Study: ‘Meanings of House, Home, and Family among Vietnamese Refugees in Canada’

Huyen Dam and John Eyles (2012) used criterion sampling in their study exploring the meanings of house, home, and family among former Vietnamese refugees in the Canadian city of Hamilton. To be included in the study, participants had to be former Boat people who had lived in Hamilton for at least 15 years. Bounding the phenomenon in terms of origin, refugee history, and place ensured that the experiences of the participants were sufficiently comparable. Requiring the participants to have lived in Hamilton for a period of 15 years allowed the researchers to capture the experience of settling and how this changed over time.
The study shows, for example, the importance of culture and family for the participants, and how these core values allow them to re-establish a sense of home after having been uprooted and relocated.

Notes

1. I thank Uwe Flick, Giampietro Gobo, and an anonymous reviewer for their helpful comments.

2. The term ‘qualitative research’ encompasses so many different research traditions and approaches that it constitutes a gross oversimplification to lump them together under this one label. It would be more appropriate to examine each tradition separately. As this is beyond the scope of this chapter, I will continue to use the term throughout, but ask the reader to keep in mind the diversity of approaches. I will also look at a few approaches in more detail in section 5 below.

Further Reading

Gobo, Giampietro (2004) ‘Sampling, representativeness, and generalizability’, in Clive Seale, Giampietro Gobo, Jaber F. Gubrium, and David Silverman (eds), Qualitative Research Practice. London: Sage, pp. 435–56.
Maxwell, Joseph A., and Chmiel, Margaret (2014) ‘Generalization in and from qualitative analysis’, in Uwe Flick (ed.), SAGE Handbook of Qualitative Data Analysis. London: Sage, pp. 540–53.
Patton, Michael Q. (2015) ‘Designing qualitative studies’, in Michael Q. Patton (ed.), Qualitative Evaluation and Research Methods (4th edn). Newbury Park: Sage, pp. 243–326.

References

6, Perri, and Bellamy, Christine (2012) Principles of Methodology. Research Design in Social Science. London: Sage.
Baker, Sarah E., and Edwards, Rosalind (2012) ‘How many qualitative interviews is enough? Expert voices and early career reflections on sampling and cases in qualitative research’, National Centre for Research Methods Review Paper, available at http://eprints.ncrm.ac.uk/2273/4/how_many_interviews.pdf.
Boehnke, Klaus, Lietz, Petra, Schreier, Margrit, and Wilhelm, Adalbert (2010) ‘Sampling: The selection of cases for culturally comparative psychological research’, in Fons van de Vijver and David Matsumoto (eds), Methods of Cross-cultural Research. Cambridge: Cambridge University Press, pp. 101–29.
Bowen, G. A. (2008) ‘Naturalistic enquiry and the saturation concept: A research note’, Qualitative Research, 8(1): 137–52.
Bryman, Alan (2016) Social Research Methods (5th edn). Oxford: Oxford University Press.
Charmaz, Kathy (2014) Constructing Grounded Theory: A Practical Guide Through Qualitative Analysis (2nd edn). London: Sage.
Coyne, I. T. (1997) ‘Sampling in qualitative research. Purposeful and theoretical sampling: merging or clear boundaries?’ Journal of Advanced Nursing, 26(3): 623–30.
Creswell, John W., and Plano Clark, Vicky L. (2011) Designing and Conducting Mixed Methods Research (2nd edn). Thousand Oaks, CA: Sage.
Cronbach, L. J. (1975) ‘Beyond the two disciplines of scientific psychology’, American Psychologist, 30(2): 116–27.
Crouch, M., and McKenzie, H. (2006) ‘The logic of small samples in interview-based qualitative research’, Social Science Information, 45(4): 483–99.
Dam, H., and Eyles, J. (2012) ‘“Home tonight? What? Where?” An exploratory study of the meanings of house, home and family among the former Vietnamese refugees in a Canadian city [49 paragraphs]’, Forum Qualitative Sozialforschung/Forum: Qualitative Social Research, 13(2): Art. 19, available at http://nbn-resolving.de/urn:nbn:de:0114-fqs1202193.
Daniel, Johnny (2012) Sampling Essentials. London: Sage.
De Jonge, D., Jones, A., Philipps, R., and Chung, M. (2011) ‘Understanding the essence of home: Older people's experience of home in Australia’, Journal of Occupational Therapy, 18(1): 39–47.
Denzin, Norman K. (1983) ‘Interpretive interactionism’, in Gareth Morgan (ed.), Beyond Method. Strategies for Social Research. Beverly Hills, CA: Sage, pp. 129–48.
Dey, Ian (1999) Grounding Grounded Theory: Guidelines for Qualitative Inquiry. Bingley: Emerald Group.
Draucker, C. B., Martsolf, D. S., Ross, R., and Rusk, T. B. (2007) ‘Theoretical sampling and category development in grounded theory’, Qualitative Health Research, 17(8): 1137–48.
Emmel, Nick (2013) Sampling and Choosing Cases in Qualitative Research. A Realist Approach. London: Sage.
Flick, Uwe (2004) ‘Design and process in qualitative research’, in Uwe Flick, Ernst von Kardorff and Ines Steinke (eds), A Companion to Qualitative Research. London: Sage, pp. 146–53.
Flick, Uwe (2014) Introduction to Qualitative Research (5th edn). London: Sage.
Francis, J. J., Johnston, M., Robertson, C., Glidewell, L., Entwistle, V., Eccles, M. P., and Grimshaw, J. M. (2010) ‘What is an adequate sample size? Operationalising data saturation for theory-based interview studies’, Psychology and Health, 25(10): 1229–45.
Geertz, Clifford (1973) ‘Thick description: Toward an interpretive theory of culture’, in Clifford Geertz (ed.), The Interpretation of Cultures. New York: Basic Books, pp. 3–30.
Gerson, Kathleen, and Horowitz, Ruth (2002) ‘Observation and interviewing: Options and choices’, in Tim May (ed.), Qualitative Research in Action. London: Sage, pp. 199–224.
Glaser, Barney (1978) Theoretical Sensitivity. Mill Valley, CA: Sociology Press.
Gobo, Giampietro (2004) ‘Sampling, representativeness, and generalizability’, in Clive Seale, Giampietro Gobo, Jaber F. Gubrium and David Silverman (eds), Qualitative Research Practice. London: Sage, pp. 435–56.
Gobo, Giampietro (2008) ‘Re-conceptualizing generalization. Old issues in a new frame’, in Pertti Alasuutari, Leonard Bickman and Julia Brannen (eds), The SAGE Handbook of Social Research Methods. London: Sage, pp. 193–213.
Guba, Egon G., and Lincoln, Y.S. (1981) Effective Evaluation. San Francisco, CA: Jossey-Bass.
Guest, G., Bunce, A., and Johnson, L. (2006) ‘How many interviews are enough? An experiment with data saturation and variability’, Field Methods, 18(1): 59–82.
Guetterman, Timothy C. (2015) ‘Descriptions of sampling practices within five approaches to qualitative research in education and the health sciences [48 paragraphs]’, Forum Qualitative Sozialforschung/Forum: Qualitative Social Research, 16(2): Art. 25, available at http://nbn-resolving.de/urn:nbn:de:0114-fqs1502256.
Hammersley, Martyn, and Atkinson, Paul (1995) Ethnography: Principles in Practice (2nd edn). Milton Park: Routledge.
Higginbottom, G. (2004) ‘Sampling issues in qualitative research’, Nurse Researcher, 12(1): 7–19.
Johnson, Jeffrey C. (1991) Selecting Ethnographic Informants. London: Sage.
Lewis, Jane, Ritchie, Jane, Ormston, Rachel, and Morrell, Gareth (2014) ‘Generalising from qualitative research’, in Jane Ritchie, Jane Lewis, Carol M. Nicholls and Rachel Ormston (eds), Qualitative Research Practice. A Guide for Social Science Students and Researchers. London: Sage, pp. 347–63.
Lewis, Michael, and Staehler, Tanja (2010) Phenomenology: An Introduction. New York: Continuum.
Lincoln, Yvonne S., and Guba, Egon G. (1979) Naturalistic Inquiry. Newbury Park, CA: Sage.
Lynd, Robert S., and Lynd, Helen M. (1929) Middletown. A Study in Modern American Culture. New York: Harcourt Brace Jovanovich.
Marshall, M. N. (1996) ‘Sampling for qualitative research’, Family Practice, 13(6): 522–25.
Mason, Jennifer (2002) Qualitative Researching (2nd edn). London: Sage.
Mason, M. (2010) ‘Sample size and saturation in PhD studies using qualitative interviews [63 paragraphs]’, Forum Qualitative Sozialforschung/Forum: Qualitative Social Research, 11(3): Art. 8, available at http://nbn-resolving.de/urn:nbn:de:0114-fqs100387.
Maxwell, Joseph A., and Chmiel, Margaret (2014) ‘Generalization in and from qualitative analysis’, in Uwe Flick (ed.), SAGE Handbook of Qualitative Data Analysis. London: Sage, pp. 540–53.
Merkens, Hans (2004) ‘Selection procedures, sampling, case construction’, in Uwe Flick, Ernst von Kardorff and Ines Steinke (eds), A Companion to Qualitative Research. London: Sage, pp. 165–71.
Mitchell, C. (1983) ‘Case and situation analysis’, The Sociological Review, 31(2): 187–211.
Morse, J. (2000) ‘Editorial: Determining sample size’, Qualitative Health Research, 10(1): 3–5.
Onwuegbuzie, A. J., and Leech, N. (2005) ‘Taking the “q” out of research: Teaching research methodology courses without the divide between quantitative and qualitative paradigms’, Quality & Quantity, 39(3): 267–96.
Onwuegbuzie, Anthony J., and Leech, Nancy (2007) ‘A call for qualitative power analyses’, Quality & Quantity, 41(1): 105–21.
Onwuegbuzie, A. J., and Leech, N. (2010) ‘Generalization practices in qualitative research: A mixed methods case study’, Quality & Quantity, 44(5): 881–92.
O'Reilly, M., and Parker, N. (2012) ‘“Unsatisfactory saturation”: A critical exploration of the notion of saturated sample sizes in qualitative research’, Qualitative Research, 13(2): 190–7.
Palinkas, L. A., Horwitz, S. M., Green, C. A., Wisdom, J. P., Duan, N., and Hoagwood, K. (2015) ‘Purposeful sampling for qualitative data collection and analysis in mixed method implementation research’, Administration and Policy in Mental Health and Mental Health Services Research, 42(5): 533–44.
Patton, Michael Q. (2015) Qualitative Evaluation and Research Methods (4th edn). Newbury Park: Sage.
Polit, D. F., and Beck, C. (2010) ‘Generalization in quantitative and qualitative research: Myths and strategies’, International Journal of Nursing Studies, 47(11): 1451–8.
Ritchie, Jane, Lewis, Jane, Elam, Gilliam, Tennant, Rosalind, and Rahim, Nilufer (2014) ‘Designing and selecting samples’, in Jane Ritchie, Jane Lewis, Carol M. Nicholls and Rachel Ormston (eds), Qualitative Research Practice. A Guide for Social Science Students and Researchers. London: Sage, pp. 111–45.
Robinson, O. C. (2014) ‘Sampling in interview-based qualitative research: A theoretical and practical guide’, Qualitative Research in Psychology, 11(1): 25–41.
Sandelowski, M. (1995) ‘Sample size in qualitative research’, Research in Nursing & Health, 18(2): 179–83.
Schegloff, E. A., and Sacks, H. (1974) ‘A simplest systematics for the organization of turn-taking for conversation’, Language, 50(4): 696–735.
Schofield, Janet W. (1990) ‘Increasing the generalizability of qualitative research’, in Elliot W. Eisner and Alan Peshkin (eds), Qualitative Inquiry in Education: The Continuing Debate. New York: Teachers College Press, pp. 201–42.
Schreier, Margrit, Schmitz-Justen, Felix, Diederich, Adele, Lietz, Petra, Winkelhage, Jeanette, and Heil, Simone (2008) Sampling in qualitativen Untersuchungen: Entwicklung eines Stichprobenplanes zur Erfassung von Präferenzen unterschiedlicher Stakeholdergruppen zu Fragen der Priorisierung medizinischer Leistungen, FOR655, 12, available at http://www.priorisierung-in-der-medizin.de/documents/FOR655_Nr12_Schreier_et_al.pdf.
Schwandt, Thomas A. (2001) Dictionary of Qualitative Inquiry (2nd edn). Thousand Oaks, CA: Sage.
Stake, R. E. (1978) ‘The case study method in social inquiry’, Educational Researcher, 7(2): 5–8.
Strauss, Anselm (1987) Qualitative Analysis for Social Scientists. Cambridge: Cambridge University Press.
Teddlie, C., and Yu, F. (2007) ‘Mixed methods sampling: A typology with examples’, Journal of Mixed Methods Research, 1(1): 77–100.
Trotter, R. L. (2012) ‘Qualitative research sample design and sample size: Resolving and unresolved issues and inferential imperatives’, Preventive Medicine, 55(5): 398–400.
Wernet, Andreas (2014) ‘Hermeneutics and objective hermeneutics’, in Uwe Flick (ed.), SAGE Handbook of Qualitative Data Analysis. London: Sage, pp. 125–43.
Williams, Malcolm (2002) ‘Generalization in interpretive research’, in Tim May (ed.), Qualitative Research in Action. London: Sage, pp. 125–43.
Winkelhage, J., Schreier, M., and Diederich, A. (2013) ‘Priority setting in health care. Attitudes of physicians and patients’, Health 2013, 5(4): 712–19.
Wuest, J. (2001) ‘Precarious ordering: Toward a formal theory of women's caring’, Health Care for Women International, 22(1–2): 167–93.
Yin, Robert K. (2014) Case Study Research. Design and Methods (5th edn). Thousand Oaks, CA: Sage.

http://dx.doi.org/10.4135/9781526416070.n6