Methods of Impact Evaluation: A Review
Abhisek Mishra
Research Scholar (Ph.D.)
Department of Business Administration
Sambalpur University
Email: [email protected]
Tushar Kanti Das
Reader, Department of Business Administration
Sambalpur University
Email: [email protected]
Abstract:
Government and non-government organisations initiate various development programmes for different strata of society. The extent of those programmes' benefits is known only after evaluation. Impact evaluation provides the causal effect of a programme on an outcome. Measuring the impact of any developmental programme is a difficult task. This paper provides a systematic review of the various quantitative methods available for impact evaluation and suggests the use of a combination of methods to enhance the robustness of the estimated impact.
Key words: Impact evaluation, methods, randomisation, regression discontinuity, propensity
score matching.
1. Introduction
Economic development is a policy intervention that aims at the economic and social well-being of people. Developmental programmes are initiated by the Government along with Non-Government Organisations (NGOs) to bring a change in an outcome (for example, raising income, improving well-being, improving livelihoods, improving learning) for different strata of society at both national and international levels.
The extent of the benefits of those programmes is known only after evaluation. Evaluation is the process of systematic and objective assessment of a programme using a set of governed standards. In simple words, evaluation answers questions relating to the design, implementation and benefits of a programme.
Evaluation addresses three types of questions: (a) descriptive questions, which explain what is taking place (the detailed process and condition); (b) normative questions, which compare what has taken place with what should be taking place; and (c) cause-and-effect questions, which examine the differences in outcomes after an intervention (Imas & Rist, 2009). Impact evaluation answers the third type of question.
Impact evaluation is structured around a particular kind of question: what is the impact (causal effect) of the programme on the outcome (Gertler, Martinez, Premand, Rawlings, & Vermeersch, 2007)? The evaluation of a policy is based on two approaches: the structural approach and the treatment-effect approach. The structural approach is applicable where there is universal participation, whereas the treatment-effect approach is applicable where there are two groups: (a) a treatment group, which takes part in the programme, and (b) a comparison group, which does not participate in the programme (Heckman & Vytlacil, 2005).
In evaluating any intervention, data (quantitative or qualitative) are required. Quantitative data are suitable for measuring levels of, and changes in, impacts, and inferences are drawn with their help. But they are less effective in understanding process, i.e., the mechanism by which an intervention activates the series of events that is reflected in its impact. Qualitative methods, on the other hand, are more effective in understanding the process of impact. Most econometric analyses (using quantitative data) fail to examine the detailed process of project implementation (which requires qualitative data). As a result, it becomes difficult to know the reason behind the failure of a project, that is, whether the failure lies in its design or in its implementation. In other words, in all those cases the research questions are shaped by the data instead of the data by the questions. To avoid this, qualitative data (tape recordings of village meetings, free-ranging open-ended interviews, focus groups) are to be used (Bamberger, Rao, & Woolcock, 2010).
Owing to the importance of both kinds of method, concepts such as multi-method and mixed-method research have come into existence. Creswell and Clark (2007) draw a clear distinction between multi-method research (i.e., using either qualitative or quantitative methods) and mixed-method research (i.e., integrating quantitative and qualitative methods). Greene, Caracelli, and Graham (1989) likewise define mixed methods as methods that include at least one quantitative method (to collect numbers) and one qualitative method (to collect words). When a combination of quantitative and qualitative approaches is used as the research methodology in a single study, it is referred to as a mixed-method study (Tashakkori & Teddlie, 1998). Beyond methods and approaches, the combination of quantitative and qualitative techniques, concepts or languages in a single study is also called mixed-method research (Johnson & Onwuegbuzie, 2004).
Both quantitative and qualitative data have their relative importance. A researcher should therefore not give primacy to one over the other, as different methods are used to tackle different problems. A combination of techniques provides greater insight than using either in isolation (White, 2002). In that context, Bamberger, Rao, and Woolcock (2010) take the view that mixed methods (using both quantitative and qualitative data) significantly strengthen the validity and operational utility of evaluation designs. Rao and Woolcock (2003) argue that a judicious mix of quantitative and qualitative methods is to be used to obtain a comprehensive evaluation of an intervention. They add that parallel integration of quantitative and qualitative methods is suitable for large government projects such as national-level poverty assessments. Therefore, a combination of quantitative and qualitative methods is to be used in the evaluation of an intervention.
A description of how and why a programme/policy is supposed to deliver the desired results is known as a theory of change, whereas the sequence of inputs, activities and outputs intended to improve the outcome constitutes a results chain. To establish causality between a programme and an outcome, impact evaluation methods are used.
The causal effect or impact of a programme P on an outcome of interest Y is given by α = (Y | P = 1) − (Y | P = 0), where α represents the impact of the programme on the outcome, (Y | P = 1) is the outcome with the programme and (Y | P = 0) is the outcome without the programme. The estimated impact for the units to whom the programme has been offered, regardless of their enrolment in it, is called the intention-to-treat (ITT) effect. The estimated impact is called the treatment-on-the-treated (TOT) effect when it is computed for the units to whom the programme has been offered and who have actually enrolled in it. ITT and TOT become the same when all units to whom a programme has been offered actually decide to enrol (Gertler et al., 2007). The sketch below illustrates the distinction.
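As a minimal illustration of the ITT/TOT distinction, the following Python sketch simulates a randomly offered programme with incomplete take-up; all numbers and the scaling of ITT by the take-up rate are illustrative assumptions, not drawn from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

offered = rng.integers(0, 2, n)                     # P: random offer of the programme
enrolled = (offered == 1) & (rng.random(n) < 0.7)   # only ~70% of offered units enrol

# Potential outcomes: enrolment raises the outcome by 5 units on average.
y = 50 + 5 * enrolled + rng.normal(0, 10, n)

itt = y[offered == 1].mean() - y[offered == 0].mean()  # effect of being offered
tot = itt / enrolled[offered == 1].mean()              # ITT scaled by the take-up rate
print(f"ITT = {itt:.2f}, TOT = {tot:.2f}")             # TOT recovers roughly 5 here
```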
2. Methods of Impact Evaluation
Evaluation methods in empirical economics fall into five broad categories, each providing an alternative approach to constructing the counterfactual and minimising selection bias. The choice among alternative evaluation methods depends on several criteria: (a) the nature of the programme (i.e., whether the programme/policy is local or national, small-scale or global), (b) the nature of the questions to be answered, and (c) the nature of the data available (Blundell & Dias, 2000). Heckman, Smith, and Clements (1997) and Heckman, Ichimura, and Todd (1998a, b) show that data quality is also a crucial ingredient in determining the appropriate estimation strategy. A review of the various methods of impact evaluation follows.
2.1. Randomisation method
Randomisation is defined as the incorporation of a fully controlled element of randomness in the process of data generation. The advantage of the physical act of randomisation was noted by Fisher (1935) while discussing Galton's analysis of Darwin's experiment with 15 pairs of self-fertilised and cross-fertilised seeds. This idea of Fisher's was immediately generalised by Pitman (1937) and then pushed to its natural boundary by Kempthorne (1952) and many others (Basu, 1980).
Randomised assignment is often used in both large- and small-scale impact evaluation work, because the programme manager can ensure that every eligible person or unit has the same chance of receiving the programme. When the number of observations is very large, treatment and comparison groups created through randomised assignment will have similar distributions of characteristics.
An evaluation is internally valid when it uses a valid comparison group. In the context of forming comparison and treatment groups, Fisher (1935) advocated the use of combinatoric analysis (which is rarely used in practice) to calculate the exact probabilities of each possible outcome (Deaton, 2009). When the impact estimated in the evaluation sample can be generalised to the total population, the evaluation is externally valid. Randomised assignment is used when there is excess demand for a programme and when a programme needs to be phased in until it covers the entire population (Gertler et al., 2007).
The randomisation method has been used to estimate the impacts of various programmes initiated in different contexts, as the following studies show.
Vermeersch (2002), using the randomised technique, examines the effect of providing breakfast on school participation. He finds a participation rate 30% higher in twenty-five treatment schools than in twenty-five comparison schools.
The effect of providing textbooks to schools in Kenya on test scores is studied by Glewwe, Kremer, and Moulin (2002). Using the randomisation technique, the authors find no effect of the intervention on the test scores of the bottom 60% of students. On the contrary, they find an increase in the test scores of students who performed well on the pre-test exam.
Banerjee, Jacob, and Kremer (2002) evaluate a programme that monitors the attendance of both teachers and children. The authors find a decrease in the number of school closing days after the introduction of the programme (44% in one-teacher and 39% in two-teacher schools). They also find an increase in girls' participation.
In the evaluation of a Colombian programme for extending the coverage of secondary school (where vouchers for private schools were allocated by lottery owing to the limits of the programme's budget), Angrist, Bettinger, Bloom, King, and Kremer (2002) take advantage of the randomly assigned treatment and find that lottery winners are 15-20% more likely to attend private schools, 10% more likely to complete the 8th grade, and score on average 0.2 standard deviations higher on standardised tests.
Kremer, Moulin, and Namunyu (2002) evaluate a programme in which an NGO provided uniforms, textbooks, and classroom construction to seven schools chosen randomly from fourteen poorly performing schools in Kenya, and find considerably lower dropout rates in the treatment schools.
Miguel and Kremer (2003a) evaluate the effect of a twice-yearly school-based mass treatment programme on absenteeism in Kenya, in which de-worming drugs were provided to students in seventy-five randomly selected schools; they find a 25% lower absenteeism rate in the treatment schools.
Glewwe, Ilias, and Kremer (2003) evaluate a programme in which parent school committees provide gifts to teachers whose students perform well. They find an initial increase in the test scores of the students who take part in the programme, which falls back to the level of the comparison group by the end of the programme.
Banerjee, Cole, Duflo, and Linden (2003) evaluate the impact of a remedial education programme introduced by Pratham, an Indian NGO, in which young women hired from the communities provide remedial education to children in Government schools. After two years of the programme, they find an increase in students' test scores of 0.39 standard deviations on average. Moreover, the authors conclude that children at the bottom of the distribution gain the most from the programme. They also conclude that hiring remedial education teachers from the community is ten times more cost-effective than hiring new teachers.
Schultz (2004) studies the impact of transferring funds to poor mothers in rural Mexico on their children's enrolment in school. Based on the randomised assignment, the author finds a positive effect on enrolment after the intervention.
Apart from the above, the randomised method has also been used to study impact in various other fields, as the following cases make clear. For the targeted wage subsidy programmes in the U.S., Burtless (1985), Woodbury and Spiegelman (1987) and Dubin and Rivers (1993) use the randomised technique of evaluation.
In assessing the impact of credit, Karlan and Zinman (2010), using a randomised experiment, conclude that marginal loans produce significant net benefits for borrowers over a wide range of outcomes.
In assessing the effect of performance-based payment on the use and quality of maternal and child health services provided by health-care providers in Rwanda, Basinga et al. (2011) conclude that financial performance incentives (i.e., payment for performance) improve the use and quality of maternal and child health services.
The randomised promotion method is similar to randomised offering. Under this method, units are randomly selected to receive promotion of the treatment, instead of units being randomly selected to receive the treatment itself; the programme is thereby left open to every unit. There are three types of units under the randomised promotion method: (1) Always: units that always want to enrol in the programme; (2) Enrol-if-promoted: units that enrol only when additional promotion is provided; and (3) Never: units that never enrol in the programme, whether or not promotion is offered (Gertler et al., 2007). A sketch of how randomised promotion identifies the effect for the middle group follows.
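Under the standard assumptions for such encouragement designs, the effect on the Enrol-if-promoted units can be recovered by a Wald ratio: the effect of promotion on the outcome divided by the effect of promotion on enrolment. The Python sketch below illustrates this on simulated data; the 50/30/20 split of unit types and the effect size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
promoted = rng.integers(0, 2, n)                    # randomly assigned promotion

# The three unit types from the text, with an assumed 50/30/20 split.
utype = rng.choice(["always", "if_promoted", "never"], size=n, p=[0.5, 0.3, 0.2])
enrolled = (utype == "always") | ((utype == "if_promoted") & (promoted == 1))
y = 40 + 6.0 * enrolled + rng.normal(0, 8, n)       # enrolment raises the outcome by 6

wald = ((y[promoted == 1].mean() - y[promoted == 0].mean())
        / (enrolled[promoted == 1].mean() - enrolled[promoted == 0].mean()))
print(f"effect on Enrol-if-promoted units ≈ {wald:.2f}")   # close to 6
```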
The use of this technique is found in Newman et al. (2002), where the authors evaluate the effect of a programme providing social investment funds for small-scale investments in education, health and water infrastructure in Bolivia. The authors find little impact on educational outcomes, except for a decrease of about 2.5 percent in drop-out rates.
Gertler, Martinez and Vivo (2008) use this method in the evaluation of a maternal and child health insurance programme introduced by the Argentine Government after the 2001 economic crisis. They find an improvement in the health status of the population. Gertler (2008) also evaluates the impact of a maternal and child health insurance programme in Argentina using the method.
In the present-day scenario, the clustered randomised controlled trial is increasingly popular for evaluating the impact of interventions that apply to intact groups of individuals (Abe & Gee, 2014). When clustered data are taken into consideration, evaluators select from a variety of methods that appropriately account for the correlation between study participants. The choice of method depends on factors such as (a) the professional judgement and prior training of the evaluator and (b) the disciplinary field (i.e., public health, education, etc.) in which the study is conducted. Therefore, the evaluator is required to do a sensitivity analysis, i.e., to carry out the analysis using different methods and check the consistency of the results across those methods (Thabane et al., 2013).
A clustered randomised controlled trial (RCT) refers to an experiment in which intact groups of individuals are randomly assigned to receive an offer to participate or not. Whether randomisation occurs at the group level or at the individual level is referred to as the unit of randomisation. Potential cross-contamination between control and treatment conditions can be avoided when a clustered RCT is applied (Raudenbush, 1997).
To determine the impact of the programme on the outcome in an RCT, an analyst applies a standard t-test or ordinary least squares (OLS) regression to estimate the effect of the treatment condition; but individuals are members of existing groups and may not be completely independent of each other. The degree of interdependency of individuals within a cluster is measured quantitatively by the intra-class correlation coefficient (ICC).
The ICC ranges from 0 to 1; values closer to 1 indicate a high degree of correlation, and values closer to 0 a low degree. Abe and Gee (2014) address four methods for analysing clustered data. The first is hierarchical linear modelling (HLM, also called random-coefficient, mixed-level or multi-level modelling). The second is feasible generalised least squares, which models the correlated nature of the errors and provides a more efficient estimate than standard OLS (Cameron & Miller, 2011). The third is the generalised estimating equation (GEE), an extension of the generalised linear model (Burton, Gurrin, & Sly, 1998), and the last is ordinary least squares. A sketch of the ICC calculation appears below.
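For balanced clusters, the ICC can be estimated from a one-way analysis of variance. The Python sketch below does so on simulated clustered data; the cluster count, cluster size, and variance components are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
clusters, size = 40, 25
cluster_effect = rng.normal(0, 2, clusters)                # between-cluster sd = 2
y = np.repeat(cluster_effect, size) + rng.normal(0, 4, clusters * size)
df = pd.DataFrame({"y": y, "g": np.repeat(np.arange(clusters), size)})

msb = size * df.groupby("g")["y"].mean().var(ddof=1)       # mean square between
msw = df.groupby("g")["y"].var(ddof=1).mean()              # mean square within
icc = (msb - msw) / (msb + (size - 1) * msw)
print(f"estimated ICC ≈ {icc:.3f}")                        # true ICC = 4/(4+16) = 0.2
```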
Various studies report heterogeneous results across these four methods. Hubbard et al. (2010) argue that the GEE model describes the population parameter more accurately than HLM. Drawing a further contrast, Gelman (2006) notes that HLM (the mixed-level model) allows the parameters to vary, whereas GEE does not. From the literature in the medical field, Galbraith, Daniel, and Vissel (2010), using simulated data, show how the results vary across this range of methods. The use of RCTs is found in the following studies.
Li et al. (2017), using an RCT, study the effect of a conditional cash transfer (CCT) on the matriculation of junior high school students into rural China's high schools. The authors find no effect of the CCT voucher on the students' performance.
In studying the impact of matching grants (a common type of private-sector development programme), McKenzie, Assaf and Cusolito (2017) use an RCT and find a positive impact of the programme through more product innovation, updated accounting systems, increased capital investment and growth in sales after one year.
2.2. Regression Discontinuity Design
In impact evaluation, the regression discontinuity design (RDD) method is used for a programme that has a continuous eligibility index with a clearly defined cut-off score determining the eligibility of participants. The impact estimator under RDD is E(Y_T | M = m − ε) − E(Y_C | M = m + ε), where M_i denotes the score received by unit i in a proxy means test and m denotes the eligibility cut-off, with T_i = 1 for M_i ≤ m and T_i = 0 otherwise. A minimal sketch of this estimator appears below, after which various studies based on RDD are reviewed.
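The following Python sketch estimates the jump at the cut-off with a local linear regression on each side, on simulated data; the cut-off, bandwidth, and effect size are illustrative assumptions, and real applications would choose the bandwidth more carefully.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, cutoff, bandwidth = 5_000, 50.0, 10.0
score = rng.uniform(0, 100, n)                       # continuous eligibility index M
treated = (score <= cutoff).astype(float)            # T = 1 when M <= m, as in the text
y = 20 + 0.3 * score + 4.0 * treated + rng.normal(0, 5, n)   # true jump = 4

near = np.abs(score - cutoff) <= bandwidth           # local window around the cut-off
centred = score[near] - cutoff
X = sm.add_constant(np.column_stack([treated[near], centred, centred * treated[near]]))
fit = sm.OLS(y[near], X).fit()
print(f"estimated jump at the cut-off ≈ {fit.params[1]:.2f}")   # close to 4
```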
Duflo (2003) uses this approach in studying the impact of an expanded old-age pension programme (by regressing anthropometric outcomes on a number of covariates), taking the age-eligibility requirements for men and women as instruments for receiving a pension. The author finds a significant positive effect on the weight and height of girls when the pension is received by women.
Martinez (2004) studies the effect of an old-age pension programme on consumption and finds results consistent with the presence of credit constraints. In assessing the effect on labour market outcomes of the social assistance programme funded through the Canadian Assistance Plan in Quebec, Canada, Lemieux and Milligan (2005), using the regression discontinuity design method (and limiting the sample to men), find that access to greater social assistance benefits reduces employment by about 4.5 percent for men.
Further, the impact of a school-fee reduction programme on school enrolment (in the city of Bogotá, Colombia) is studied by Barrera, Linden and Urquiola (2007). They use the regression discontinuity design method and find a positive impact on school enrolment rates. Similarly, the regression discontinuity design method has also been used to evaluate a social safety net initiative in Jamaica.
In 2001, the Government of Jamaica initiated a programme, the Programme of Advancement through Health and Education (PATH), in which grants are given to children in eligible poor households on the condition of regular school attendance and health visits. Levy and Ohls (2007), using the regression discontinuity design, find that the PATH programme increases school attendance for children aged 6 to 17 by an average of 0.5 days per month.
Filmer and Schady (2009) study the impact of scholarships on the school enrolment and test scores of poor students in Cambodia and find an increase of approximately 25 percent in school enrolment and test scores after scholarships are provided to poor students.
Other works based on this method have also presented regression discontinuity as an alternative to the experimental method, as is clear from the study of Buddlemeyer and Skofias (2003). The study explains that the regression discontinuity method is useful when policy discontinuities are regularly enforced. Cook, Shadish and Wong (2006) support the result of Buddlemeyer and Skofias (2003), adding that randomised and non-randomised experiments provide the same results when the regression discontinuity method is applied.
Although a good deal of work has been done using the RDD technique, it suffers from a number of drawbacks: (a) there may be specification error, and (b) when RDD is used, large evaluation samples are required to obtain sufficient statistical power. Despite these limitations, RDD yields an unbiased estimate of the impact in the neighbourhood of the eligibility cut-off.
2.3. Propensity score matching (PSM) method
Propensity score matching (PSM) has become a popular approach to estimating causal treatment effects (Caliendo & Kopeinig, 2008). The propensity score matching method pairs each programme participant with a single non-participant, where pairing is done on the basis of the degree of similarity in the estimated probability of participating in the programme. The method selects comparators according to their propensity score, P(Z) = Pr(T = 1 | Z), with 0 < P(Z) < 1, where Z is a vector of pre-exposure control variables. It is based on the assumption that there is no unobserved difference between the treatment and comparison groups. A minimal sketch of the matching step appears below.
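The Python sketch below estimates P(Z) with a logistic regression and then performs one-to-one nearest-neighbour matching on the estimated score, on simulated data; the covariates, selection model, and effect size are illustrative assumptions, and matching with replacement and without a caliper is a simplification.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(4)
n = 4_000
Z = rng.normal(size=(n, 3))                                  # pre-exposure controls
T = (rng.random(n) < 1 / (1 + np.exp(-(Z[:, 0] + 0.5 * Z[:, 1])))).astype(int)
y = Z @ np.array([1.0, 0.5, 0.2]) + 3.0 * T + rng.normal(0, 1, n)   # true effect = 3

ps = LogisticRegression().fit(Z, T).predict_proba(Z)[:, 1]   # estimated P(Z)

# Match each treated unit to the untreated unit with the closest propensity score.
nn = NearestNeighbors(n_neighbors=1).fit(ps[T == 0].reshape(-1, 1))
_, idx = nn.kneighbors(ps[T == 1].reshape(-1, 1))
att = (y[T == 1] - y[T == 0][idx.ravel()]).mean()
print(f"matched estimate of the effect on the treated ≈ {att:.2f}")  # close to 3
```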
This method is applied in situations where there exists a group of treated individuals and a group of untreated individuals. LaLonde (1986) uses this setting in measuring the impact of a training programme on trainees' earnings. Dehejia and Wahba (1999), using National Supported Work (NSW) data, re-evaluate the impact of the training programme (earlier studied by LaLonde, 1986) using a matching estimator. In the medical field (in pharmacoepidemiologic research), Perkins et al. (2000) use this method. In the field of human resources, the effect of union membership on employees' wages is studied by Bryson (2000). In studying the effect of the migration decision on wage growth, Ham, Li and Reagan (2004) use the matching method. Trujillio, Portillo and Vernon (2005) evaluate the impact of Colombia's subsidised health programme on the utilisation of medical care. Further, Brand and Halaby (2006) analyse the effect of attending elite colleges on career outcomes using this method. The method is also used by Gebrehiwot and Van der Veen (2015) for assessing the impact of a food security programme. Similarly, Guerzoni and Raiteri (2015) use it to examine the impact of various technology policies on firms' innovative behaviour. Further, Mohapatra and Sahoo (2016) use the same method in assessing the impact of women's participation in a microfinance programme on their empowerment.
Attempts have been made to test propensity score matching (PSM) against randomised evaluation, with mixed results. Dehejia and Wahba (1999), using the NSW data, conclude that matching approaches are generally more reliable than standard econometric estimators, as they find that matching estimators are able to replicate the result obtained by the experimental method. Studying the PROGRESA data in the evaluation of a Mexican poverty programme, Diaz and Handa (2006) find that PSM performs well when the same survey instrument is used for measuring outcomes for both the control and treatment groups. In the evaluation of a U.S. training programme, Heckman, Smith and Clements (1997) and Heckman et al. (1998) likewise emphasise the use of the same survey instrument.
Alongside the above advantages, LaLonde (1986) concludes that non-experimental methods are subject to specification errors and suggests caution in implementing them. Rubin and Thomas (2000) set out a disadvantage of using propensity score matching: impacts estimated from full unmatched samples are generally more biased. They also advocate that variables with weak predictive ability for outcomes are helpful in reducing bias when the propensity score matching method is used. Taking a contradictory view, Smith and Todd (2005a) argue that PSM does not solve the selection problem studied by LaLonde. In reply, Dehejia (2005) points to wrongly applied specifications in Smith and Todd (2005a). Further, Agodini and Dynarski (2004) find no consistent evidence of PSM replicating experimental results.
Pipeline Comparison
In the pipeline comparison method, the comparison group is created from people who apply for a programme but do not receive it (Ravallion, 2008). Examples of this method are found in Chase (2002), where communities apply for a social fund in Armenia, and in Galasso and Ravallion (2004), who evaluate a large social protection programme in Argentina.
Although the matching design can be applied to any programme, its limitation is that it requires extensive data sets on large samples; moreover, in practice the method is used only when all the other methods (i.e., randomised selection, regression discontinuity design and difference-in-differences) are impossible to use.
2.4. Difference-in-Differences (DD) Design
The difference-in-differences (DD) design nets out any difference between the treatment and comparison groups that is constant over time. Although it differences out such time-invariant gaps between the two groups, it does not eliminate differences between treatment and comparison groups that change over time. Thus, when the DD method is used, researchers must assume that, in the absence of the programme, the outcome in the treatment group would move in tandem with the outcome in the comparison group. A minimal sketch appears below.
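The following Python sketch computes the simplest two-period DD estimate on simulated data in which the groups differ at baseline but share a common trend; group sizes, trends, and the effect size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2_000
treat = rng.integers(0, 2, n)                        # treatment-group indicator

y_before = 10 + 1.5 * treat + rng.normal(0, 1, n)    # groups differ at baseline
common_trend = 2.0                                   # shared change over time
y_after = y_before + common_trend + 3.0 * treat + rng.normal(0, 1, n)  # effect = 3

dd = ((y_after[treat == 1] - y_before[treat == 1]).mean()
      - (y_after[treat == 0] - y_before[treat == 0]).mean())
print(f"difference-in-differences estimate ≈ {dd:.2f}")   # close to 3
```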
The use of the double difference method is seen in Binswanger, Khandker and Rosenzweig (1993), where the authors estimate the impact of rural infrastructure on agricultural productivity in India using district-level data.
Pitt and Khandker (1998) use the DD design to estimate the impact of gender-wise participation in the Grameen Bank and two other microcredit programmes on labour supply, schooling, household expenditure and assets in Bangladesh. The study of the impact of building schools on schooling and earnings in Indonesia by Duflo (2001) is another example of the double difference design.
Galiani, Gertler and Schargrodsky (2005) use a double difference design to study the impact of privatising water services on child mortality in Argentina and suggest that the privatisation of water services reduces child mortality. The classic design for a double difference estimator tracks the difference over time between participants and non-participants. Ditella and Schargrodsky (2005), using the DD design, examine the effect of an increase in police forces on the reduction of crime. Chaudhury and Parajuli (2008) use this method in examining the impact of a female school stipend programme on school enrolment in the Punjab province of Pakistan, using panel data (from 2003) and follow-up data (from 2005). Das (2016) uses this method in examining the impact of the Mahatma Gandhi National Rural Employment Guarantee Act (MGNREGA, an employment generation programme of the Government of India ensuring 100 days of work in a year) on livelihood security using NSS data.
2.5. Instrumental Variable (IV) Method
The instrumental variable (IV) method is one of the signature techniques in the econometric toolkit (Angrist & Krueger, 2001). This method is used when it is not possible to create a comparison group randomly. It consists of using one or more variables (instruments): first, an instrument is used to predict programme participation; then the programme's impact is estimated based on that prediction. Geographic placement of the programme, political variables and discontinuities created by the programme design are three popular sources of instrumental variables in evaluating anti-poverty programmes in developing countries (Ravallion, 2008). A minimal two-stage sketch appears below.
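The Python sketch below implements the two steps just described as two-stage least squares on simulated data with an unobserved confounder; the instrument, first-stage strength, and effect size are illustrative assumptions, and only the point estimate is shown (the manual second stage does not give correct standard errors).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 5_000
z = rng.integers(0, 2, n).astype(float)       # instrument: shifts participation only
u = rng.normal(0, 1, n)                       # unobserved confounder
d = (0.5 * z + u + rng.normal(0, 1, n) > 0.5).astype(float)  # endogenous participation
y = 2.0 * d + u + rng.normal(0, 1, n)         # true effect of participation = 2

# Stage 1: predict participation from the instrument.
d_hat = sm.OLS(d, sm.add_constant(z)).fit().predict()
# Stage 2: regress the outcome on predicted participation.
fit = sm.OLS(y, sm.add_constant(d_hat)).fit()
print(f"2SLS point estimate ≈ {fit.params[1]:.2f}")   # near 2; naive OLS is biased
```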
An example of the IV method is found in Angrist and Lavy (1999), where the authors, taking a class size of 40 as the instrumental variable (Maimonides' rule), show a significant and substantial increase in the test scores of grade-four and grade-five students due to the reduction in class size.
Besley and Case (2000) study the impact of workers' compensation on wages and employment, taking the presence of women in the state parliaments as the IV for workers' compensation insurance. Similarly, Paxson and Schady (2002), using the IV method, study the impact of the Peruvian Social Fund (FONCODES) and conclude that the investment has a positive effect on school attendance rates for young children.
Following Case and Deaton (1998), Duflo (2003), taking eligibility as the instrumental variable, finds that a pension received by women has a greater impact on the anthropometric status of girls and no effect on boys; no effect of the programme is found when the pension is received by men. Another example of this approach is found in the study of the impacts of a nutrition programme in rural Colombia, where food and child-care facilities are provided through local community centres (Attanasio and Vera-Hernandez, 2004). A further example of the IV method is found in Duflo and Pande (2007), who study the impact of dam construction on poverty.
3. Conclusion
From the above review of the various methods of impact evaluation, it emerges that most studies are based on the experimental method (i.e., the randomisation method), as studies based on this method provide the most robust estimates. Researchers have advocated this method as ideal for estimating the main impact of a programme (Ravallion, 2008).
At the same time, the studies of Dehejia and Wahba (1998, 1999), re-examining the data of LaLonde (1986), suggest that non-experimental methods can be reliable and provide the same results as the experimental method. But omitted variables (Glazerman et al., 2003) and publication bias (DeLong & Lang, 1992) are major problems when non-experimental methods are used.
Thus, none of these evaluation tools is ideal in all circumstances. There is therefore a need to combine methods in order to increase the robustness of the estimated counterfactual and offset the limitations of individual methods. A good example of a combination of methods is found in Cattaneo, Galiani, Gertler, Martinez and Titiunik (2009), who use both difference-in-differences and matching methods to study the impact of Piso Firme on child cognitive development. By implementing the matched difference-in-differences method, the researcher offsets the risk that unobserved characteristics (a risk that arises when only propensity score matching is used) explain why units enrol in a programme and also affect the outcomes. Accordingly, when combining propensity score matching with the double difference method, the researcher first performs matching based on observable characteristics and then uses the difference-in-differences method to estimate a counterfactual for the change in outcomes in each subgroup of matched units. Finally, the double differences are averaged across the matched subgroups (Gertler et al., 2007). A brief sketch of this matched difference-in-differences combination appears below.
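The Python sketch below combines the two steps described above on simulated data: matching on observables via an estimated propensity score, then differencing outcomes over time within matched pairs; the covariates, selection model, and effect size are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(7)
n = 4_000
Z = rng.normal(size=(n, 2))                            # observable characteristics
T = (rng.random(n) < 1 / (1 + np.exp(-Z[:, 0]))).astype(int)
y_before = Z @ np.array([1.0, 0.5]) + rng.normal(0, 1, n)
y_after = y_before + 1.0 + 2.5 * T + rng.normal(0, 1, n)   # true effect = 2.5

# Step 1: match on observables via the estimated propensity score.
ps = LogisticRegression().fit(Z, T).predict_proba(Z)[:, 1]
nn = NearestNeighbors(n_neighbors=1).fit(ps[T == 0].reshape(-1, 1))
_, idx = nn.kneighbors(ps[T == 1].reshape(-1, 1))

# Step 2: difference-in-differences within matched pairs, averaged over pairs.
gain_treated = y_after[T == 1] - y_before[T == 1]
gain_matched = (y_after[T == 0] - y_before[T == 0])[idx.ravel()]
print(f"matched DiD estimate ≈ {(gain_treated - gain_matched).mean():.2f}")  # ≈ 2.5
```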
References
Abe, Y., & Gee, K.A. (2014). Sensitivity analysis for clustered data: An illustration from a large-scale clustered randomized controlled trial in education, Evaluation and Program Planning, 47, 26-34.
Agodini, R., & Dynarski, M. (2004). Are experiments the only option? A look at dropout prevention programs, Review of Economics & Statistics, 86(1), 180-194.
Angrist, J., & Lavy, V. (1999). Using Maimonides' rule to estimate the effect of class size on scholastic achievement, Quarterly Journal of Economics, 114(2), 533-575.
Angrist, J., & Krueger, A.B. (2001). Instrumental variables and the search for identification: From supply and demand to natural experiments, NBER Working Paper No. 8456.
Angrist, J., Bettinger, E., Bloom, E., King, E., & Kremer, M. (2002). Vouchers for private schooling in Colombia: Evidence from a randomised natural experiment, American Economic Review, 92(5), 1535-38.
Attanasio, O., & Vera-Hernandez, A.M. (2004). Medium and long run effects of nutrition and
child care: Evaluation of a community nursery programme in rural Colombia,
Working paper EWP04/06, Centre for the Evaluation of Development Policies,
Institute of Fiscal Studies, London.
Bamberger, M., Rao, V., & Woolcock, M. (2010). Using mixed methods in monitoring and evaluation: Experiences from international development, Policy Research Working Paper, The World Bank.
Banerjee, A., Cole, S., Duflo, E., & Linden, L. (2007). Remedying education: Evidence from two randomized experiments in India, Quarterly Journal of Economics, 122(3), 1235-1264.
Barrera, O., Linden, L., & Urquiola, M. (2007). The effects of user fee reductions on enrollment: Evidence from a quasi-experiment, Columbia University and World Bank, Washington DC.
Basu, D. (1980). Randomization analysis of experimental data: The Fisher randomization test, Journal of the American Statistical Association, 75, 305-325.
Basinga, P., Gertler, P.J., Binagwaho, A., Soucat, A.L., Sturdy, J., & Vermeersch, C.M. (2011). Effect on maternal and child health services in Rwanda of payment to primary health-care providers for performance: An impact evaluation, The Lancet, 377(9775), 1421-1428.
Besley, T., & Case, A. (2000). Unnatural experiments? Estimating the incidence of endogenous policies, Economic Journal, 110, 672-694.
Binswanger, H., Khandker, S.R., & Rosenzweig, M. (1993). How infrastructure and financial institutions affect agricultural output and investment in India, Journal of Development Economics, 41, 337-366.
Blundell, R., & Dias, M.C. (2000). Evaluation methods for non-experimental data, Fiscal Studies, 21(4).
Brand, J.E., & Halaby, C.N. (2006). Regression and matching estimates of the effects of elite college attendance on educational and career achievement, Journal of the Royal Statistical Society, Series A, 168(3), 473-512.
Buddlemeyer, H., & Skofias, E. (2003). An evaluation of the performance of regression discontinuity design on PROGRESA, Institute for the Study of Labor, Discussion Paper No. 827.
Burtless, G. (1985). Are targeted wage subsidies harmful? Evidence from a wage voucher experiment, Economic Perspectives, 9(2), 63-84.
Burton, P., Gurrin, L., & Sly, P. (1998). Tutorial in biostatistics, extending the simple linear regression model to account for correlated responses: An introduction to generalised estimating equations and multi-level modelling, Statistics in Medicine, 17, 1261-1291.
Cameron, A.C., & Miller, D.L. (2011). Robust inference with clustered data, Handbook of Empirical Economics and Finance.
Case, A., & Deaton, A. (1998). Large cash transfers to the elderly in South Africa, Economic
Journal, 108, 1330–1361.
Chaudhury, N., & Parajuli, D. (2008). Conditional cash transfer and female schooling: the
impact of female school stipend program on public school enrolments in Punjab,
Pakistan, World Bank Policy Research, Working paper No. 4102.
Cook, T.D., Shadish, W.R., & Wong, V.C. (2006). Three conditions under which experiments and observational studies produce comparable causal estimates: New findings from within-study comparisons, Journal of Policy Analysis and Management, 27(4), 724-750.
Chase, R. (2002). Supporting communities in transition: The impact of the Armenian Social Investment Fund, World Bank Economic Review, 16(2), 219-240.
Creswell, J., & Clark, P.V. (2007). Designing and conducting mixed methods research, Sage, Thousand Oaks.
Deaton, A.S. (2009). Instruments for development: Randomization in the tropics, and the search for the elusive keys to economic development, NBER Working Paper 14690, available at http://www.nber.org/papers/w14690 (accessed on 9th Dec. 2016).
Dehejia, R., & Wahba, S. (1999). Causal effects in non-experimental studies: Re-evaluating the evaluation of training programs, Journal of the American Statistical Association, 94, 1053-1062.
Dehejia, R.H., & Wahba, S. (2002). Propensity score matching methods for non-experimental causal studies, Review of Economics and Statistics, 84(1), 151-161.
Dehejia, R. (2005). Practical propensity score matching: A reply to Smith and Todd, Journal of Econometrics, 125(1-2), 355-364.
DeLong, J.B., & Lang, K. (1992). Are all economic hypotheses false?, Journal of Political Economy, 100(6), 1257-1272.
Diaz, J.J., & Handa, S. (2006). An assessment of propensity score matching as a nonexperimental impact estimator: Evidence from a Mexican poverty program, The Journal of Human Resources, 41(2), 319-345.
Ditella, R., & Schargrodsky, E. (2005). Do police reduce crime? Estimates using the
Allocation of Police forces after a Terrorist Attack, American Economic Review, 94
(1), 115-33.
Dubin, J.A., & Rivers, D. (1993). Experimental estimates of the impact of wage subsidies, Journal of Econometrics, 56(1/2), 219-242.
Duflo, E. (2001). Schooling and labour market consequences of school construction in Indonesia: Evidence from an unusual policy experiment, American Economic Review, 91(4), 795-813.
Duflo, E. (2003). Grandmothers and granddaughters: Old age pension and intra household
allocation in South Africa, World Bank Economic Review, 17(1), 1-26.
Duflo, E., & Pande, R. (2007). Dams, Quarterly Journal of Economics, 122 (2), 601–646.
Filmer, D., & Schady, N. (2009). School Enrollment, Selection and Test Scores, World Bank
Policy Research Working Paper 4998, World Bank, Washington DC.
Fisher, R.A. (1935). The design of experiments (8th edition, 1960), New York.
Galasso, E., & Ravallion, M. (2004). Social protection in a crisis: Argentina's Plan Jefes y Jefas, World Bank Economic Review, 18(3), 367-399.
Galbraith, S., Daniel, J.A., & Vissel, B. (2010). A study of clustered data and approaches to its analysis, The Journal of Neuroscience, 30(32), 10601-10608.
Galiani, S., Gertler, P., & Schargrodsky, E. (2005). Water for life: The impact of the privatisation of water services on child mortality, Journal of Political Economy, 113(1), 83-119.
Gelman, A. (2006). Multilevel (hierarchical) modelling: What it can and cannot do, Technometrics, 48(3).
Gertler, P., Martinez, S., Premand, P., Rawlings, L.B., & Vermeersch, C.M.J. (2007). Impact evaluation in practice, Washington, DC: World Bank.
Gertler, P., Martinez, S., & Vivo, S. (2008). Child-mother provincial investment project Plan Nacer, University of California, Berkeley and World Bank, Washington DC. Cited in Gertler, P., Martinez, S., Premand, P., Rawlings, L.B., & Vermeersch, C.M.J. (2007). Impact evaluation in practice, Washington, DC: World Bank.
Glazerman, S., Levy, D., & Myers, D. (2003). Nonexperimental replications of social experiments: A systematic review, Mathematica Policy Research, Inc., Princeton, NJ.
Glewwe, P., Kremer, M., & Moulin, S. (2002). Many children left behind? Textbooks and test scores in Kenya, American Economic Journal: Applied Economics, 1(1), 112-135.
Glewwe, P., Ilias, N., & Kremer, M. (2003). Teacher incentives, NBER Working Paper No. 9671.
Greene, J., Caracelli, V., & Graham, W. (1989). Toward a conceptual framework for mixed-method evaluation designs, Educational Evaluation and Policy Analysis, 11, 255-274.
Ham, J., Li, X., & Reagan, P. (2004). Propensity score matching, a distance based measure of migration, and the wage growth of young men, working paper. Cited in Caliendo, M., & Kopeinig, S. (2008). Some practical guidance for the implementation of propensity score matching, Journal of Economic Surveys, 22(1), 31-72.
Heckman, J., Smith, J., & Clements, N. (1997). Making the most out of programme evaluations and social experiments: Accounting for heterogeneity in programme impacts, Review of Economic Studies, 64(4), 487-535.
Heckman, J., Ichimura, H., & Todd, P.E. (1998a). Characterising selection bias using experimental data, Econometrica, 66(5), 1017-1098.
Heckman, J., Ichimura, H., & Todd, P.E. (1998b). Matching as an econometric evaluation estimator, Review of Economic Studies, April, 261-294.
Heckman, J.J., & Vytlacil, E. (2005). Structural equations, treatment effects and econometric policy evaluation, Econometrica, 73(3), 669-738.
Imas, L. G.M., & Rist, R.C. (2009). The road to Results: Designing and conducting Effective
Development Evaluations, Washington, DC: World Bank.
Johnson, B., & Onwuegbuzie, A. (2004). Mixed Methods Research: A Research Paradigm
Whose Time Has Come, Educational Researcher, 33(7), 14-26.
Karlan, D., & Zinman, J. (2010). Expanding credit access: Using randomised supply decisions to estimate the impacts, Review of Financial Studies, Society for Financial Studies, 23(1), 433-464.
Kremer, M., Moulin, S., & Namunyu, R. (2002). Decentralization: A cautionary tale, Harvard University.
LaLonde, R.J. (1986). Evaluating the econometric evaluations of training programs with experimental data, The American Economic Review, 76(4), 604-620.
Lee, B.K., Lessler, J., & Stuart, E.A. (2010). Improving propensity score weighting using machine learning, Statistics in Medicine, 29, 337-346.
Levy, D., & Ohls, J. (2007). Evaluation of Jamaica's PATH program: Final report, Mathematica Policy Research, Inc., Ref. 8966-090, Washington DC.
Lemieux, T., & Milligan, K. (2005). Incentive effects of social assistance: A regression discontinuity approach, NBER Working Paper 10541, National Bureau of Economic Research, Cambridge, MA.
Li, F., Song, Y., Yi, H., Zhang, L., & Shi, Y. (2017). The impact of conditional cash transfers on the matriculation of junior high school students into rural China's high schools, Journal of Development Effectiveness, 9(1), 41-60.
Martinez, S.(2004). Pensions, Poverty and Household Investments in Bolivia, University of
California, Berkeley, CA.
McKenzie, D., Assaf, N., & Cusolito, A.P. (2017). The additionality impact of a matching grant programme for small firms: Experimental evidence from Yemen, Journal of Development Effectiveness, 9(1), 1-14.
Miguel, E., & Kremer, M. (2003a). Worms: Identifying impacts on education and health in the presence of treatment externalities, Econometrica.
Newman, J., Pradhan, M., Rawlings, L.B., Ridder, G., Coa, R., & Evia, J.L. (2002). An impact evaluation of education, health and water supply investments by the Bolivian Social Investment Fund, World Bank Economic Review, 16(2), 241-274.
Paxson, C., & Schady, N.R. (2002). The allocation and impact of social funds: Spending on
school infrastructure in Peru, World Bank Economic Review, 16, 297–319.
Perkins, S.M., Tu, W., Underhill, M.G., Zhou, X., & Murray, M. (2000). The use of propensity scores in pharmacoepidemiologic research, Pharmacoepidemiology and Drug Safety, 9(2), 93-101.
Pitman, E.J.G. (1937). Significance tests which can be applied to samples from any populations III. The analysis of variance test, Biometrika, 29, 322-335.
Pitt, M.M., & Khandker, S.R. (1998). The impact of group-based credit programs on poor households in Bangladesh: Does the gender of participants matter?, Journal of Political Economy, 106(5), 958-996.
Rao, V., & Woolcock, M. (2003). Integrating qualitative and quantitative approaches in program evaluation. Qualitative and quantitative approaches, Chapter 8, 165-190.
Raudenbush, S.W., & Bryk, A.S. (2002). Hierarchical linear models: Applications and data analysis methods, Thousand Oaks, CA: Sage Publications.
Ravallion, M., & Wodon,Q. (2000). Does child labour displace schooling? Evidence on
behavioural responses to an enrolment subsidy, Economic Journal, 110, 158-176.
Ravallion, M. (2008).Evaluating anti- poverty programs, Handbook of Development
Economics, 4, chapter 59, 3788-3840.
Rubin, D.B., & Thomas, N. (2000). Combining propensity score matching with additional
adjustments for prognostic covariates, Journal of American Statistical Association,
95, 573- 585.
Schultz, P.T. (2004). School subsidies for the poor: Evaluating the Mexican Progresa poverty program, Journal of Development Economics, 74(1), 199-250.
Smith, J., & Todd, P. (2005a). Does matching overcome LaLonde's critique of nonexperimental estimators?, Journal of Econometrics, 125(1-2), 305-353.
Tashakkori, A., & Teddlie, C. (1998). Mixed methodology: Combining qualitative and quantitative approaches, Sage, Thousand Oaks.
Thabane, L., Mbuagbaw, L., Zhang, S., Samaan, Z., Marcucci, M., Ye, C., & Kosa, D. (2013). A tutorial on sensitivity analyses in clinical trials: The what, when and how, BMC Medical Research Methodology, 13(1), 92.
Vermeersch, C. (2002). School meals, educational achievements and school competition: Evidence from a randomised experiment, mimeo, Harvard University.
White, H. (2002). Combining quantitative and qualitative approaches in poverty analysis, World Development, 30(3), 511-522.
Woodbury, S.A., & Spiegelman, R.G. (1987). Bonuses to workers and employers to reduce unemployment: Randomised trials in Illinois, American Economic Review, 77(4).