Psychol. Inj. and Law (2010) 3:133–147
DOI 10.1007/s12207-010-9073-0
Error Rates in Forensic Child Sexual Abuse Evaluations
Steve Herman & Tiffany R. Freitas
Received: 14 January 2010 / Accepted: 29 March 2010 / Published online: 27 May 2010
© Springer Science+Business Media, LLC 2010
Abstract When mental health, medical, and social work
professionals and paraprofessionals make false positive or
false negative errors in their judgments about the validity of
allegations of child sexual abuse, the consequences can be
catastrophic for the affected children and adults. Because of
the high stakes, practitioners, legal decision makers, and
policy makers should have some idea of the magnitude and
variability of error rates in this domain. A novel approach
was used to estimate individual error rates for 110
professionals (psychologists, physicians, social workers,
and others) who conduct or participate in forensic child
sexual abuse evaluations. The median estimated false
positive and false negative error rates were 0.18 and 0.36,
respectively. Estimated error rates varied markedly from
one participant to the next. For example, the false positive
error rate estimates ranged from 0.00 to 0.83. These
estimates are based on participants’ self-reported substantiation rates and on their subjective frequency distributions
for the probability of truth for the abuse allegations they
evaluate.
Keywords Child sexual abuse · Forensic evaluation · Judgment · Overconfidence · Accuracy
This research was supported in part by grant #P20MD001125 from the
National Institutes of Health.
S. Herman (*) · T. R. Freitas
Department of Psychology, University of Hawaii at Hilo,
200 West Kawili Street,
Hilo, HI 96720, USA
e-mail: [email protected]
URL: http://steveherman.com
Each year in the USA, caseworkers for Child Protective
Services (CPS) agencies conduct approximately 150,000
forensic evaluations in cases of alleged or suspected child
sexual abuse (US Department of Health and Human
Services 2009). Tens of thousands of additional child
sexual abuse (CSA) evaluations by mental health, social
work, and medical¹ professionals and paraprofessionals
(henceforth, collectively referred to as mental health
professionals or MHPs) occur in other settings, for
example, in Child Advocacy Centers (National Children's
Alliance 2007), in specialized sexual assault clinics (Davies
et al. 1996), and in disputed child custody cases (Bow et al.
2002). Forensic CSA evaluations by MHPs are also
common in other countries (Hershkowitz et al. 2007b;
Lamb 1994; Sternberg et al. 2001; Wilson 2007).
Some have argued that MHPs who perform forensic
CSA evaluations should not explicitly state their opinions
about the “ultimate legal issue”—whether or not abuse
allegations are true—because of the lack of a sound
scientific basis for such opinions (e.g., Kuehnle and Sparta
2006). However, in most forensic CSA evaluations, MHPs
are expected to provide explicit or implicit opinions about
the likelihood that the abuse allegations are true (Berliner
and Conte 1993). For example, CPS caseworkers in the
US are generally required to classify abuse allegations as
¹ Psychiatrists, other physicians, and nurses collect and evaluate both
medical evidence (the physical signs of abuse) and psychosocial
evidence (verbal reports during interviews with the alleged child
victim and others, documentation of past verbal reports, opinions of
other professionals, and behavioral observations). This article focuses
on psychosocial evaluations. To the extent that medical personnel
evaluate the validity of abuse allegations on the basis of psychosocial
data, they are performing psychosocial evaluations.
either substantiated² or unsubstantiated following their
investigations.
When MHPs make false positive errors—judging false
or erroneous CSA allegations to be true—the consequences can be catastrophic for the adults and children
involved (e.g., Bikel 1997; Boyer and Kirk 1998; Bruck
et al. 1998; Ceci and Bruck 1995; Fukurai and Butler
1994; Garven et al. 1998; Humphrey 1985; Johnson 2004;
Nathan and Snedeker 2001; Rabinowitz 2003; Robinson
2005, 2007; Rosenthal 1995; San Diego County Grand
Jury 1992, 1994; Schreiber et al. 2006; Seattle Post-Intelligencer 1998; Wexler 1990). Although there are
fewer anecdotal reports of false negative errors by MHPs,
the reports that do exist indicate that the consequences of
these errors can also be dire (e.g., Pawloski 2005; Swarns
and Cooper 1998; Waldman and Jones 2008).
With so much riding on MHPs’ judgments about the
validity of allegations of CSA, one might think that legal
decision makers would be keenly interested in the “known
or potential error rate”³ for these judgments. In fact,
relatively little legal attention has been focused directly on
the accuracy of either professional or lay judgments about
the validity of allegations of CSA. For example, the Oregon
Supreme Court recently ruled that an expert opinion that an
allegation of sexual abuse was true was admissible as
scientific evidence, even though the opinion was based
primarily on an expert’s assessment of the credibility of the
verbal reports of the alleged child victim and others (State
v. Southard 2009). In their analysis of the indicia of
scientific validity for the proffered diagnosis of sexual
abuse, the Court relied primarily on the general acceptance
criterion (State v. Brown 1984; Frye v. United States
1923), summing up their analysis as follows: “Because
there was no physical evidence of sexual abuse in this case,
the [experts] based [their] diagnosis on: (1) the boy's
² The verb to substantiate and the adjective substantiated are used
throughout this article in the sense that is familiar in this context: to
describe a data collection and evaluation process which culminates in
an MHP’s judgment that there is enough evidence that an allegation of
sexual abuse is true to warrant State intervention. This usage may be
misleading and confusing to those not familiar with child abuse
jargon, because it is not consistent with the general meaning of the
verb to substantiate. The Oxford English Dictionary defines substantiate as to “provide evidence to support or prove the truth of” (Soanes
and Stevenson 2005). However, there are many cases in which MHPs
“substantiate” abuse allegations that are not supported by any
evidence other than the child’s report. In other words, in many cases
that they “substantiate,” MHPs do not actually find or provide any
new evidence that “supports or proves the truth of” an abuse
allegation. In essence, “substantiation” often boils down to an MHP
expressing an opinion that a child is telling the truth. Despite this
fundamental definitional problem, for the sake of convenience and
readability, the terms to substantiate and substantiated are used
throughout this article in the familiar way.
³ This phrase comes from the well-known US Supreme Court
decision, Daubert v. Merrell Dow Pharmaceuticals, Inc. (1993).
reported behaviors and (2) [their] determination that the
boy's reports of sexual abuse were credible…. The experts
were all qualified, the techniques used are generally
accepted, the procedures rely on specialized literature in
the field, and the procedures used are not novel” (State v.
Southard 2009, p. 137). The Court did not address the issue
of the known or potential error rate for expert judgments
about the validity of CSA allegations. The question of
whether or not a diagnostic technique produces accurate
results is arguably more important than whether or not the
technique is generally accepted by a scientific or clinical
community, supported by “specialized literature,” novel, or
used by qualified experts. Interestingly, the Court went on
to rule that the experts’ opinion that the abuse allegations
were true should, nevertheless, have been excluded at trial
because the potential for prejudice—the likelihood that
jurors would ascribe too much weight to the experts’
opinion—outweighed the probative value of that opinion.
The Court explained that, in their opinion, the jury was just
as capable of judging the credibility of the child and other
witnesses as were the experts and that, therefore, the
incremental probative value of the experts’ opinion that
the allegations were true was “minimal” (State v. Southard
2009, p. 141).
A notable exception to the US courts’ general failure to
clearly focus on the accuracy of expert and lay opinions
about the validity of allegations of CSA comes from the US
Supreme Court, which recently overturned a death penalty
sentence in a child sexual abuse case. Among other reasons
for their decision, the Court, in effect, acknowledged that
false positive errors were more likely in professional and
lay judgments about the validity of CSA allegations than in
judgments about other types of allegations of criminal
conduct:
The problem of unreliable, induced, and even imagined child testimony means there is a “special risk of
wrongful execution” in some child rape cases.…
Studies conclude that children are highly susceptible
to suggestive questioning techniques like repetition,
guided imagery, and selective reinforcement…. Similar criticisms pertain to other cases involving child
witnesses; but child rape cases present heightened
concerns because the central narrative and account of
the crime often comes from the child herself. She and
the accused are, in most instances, the only ones
present when the crime was committed.
(Kennedy v. Louisiana 2009, p. 2663)
Like the legal community, the scientific community has
shown a relative lack of interest in attempting to estimate
error rates for professional and lay judgments about the
validity of CSA allegations. Although there is a widespread
(but not universal) consensus among prominent researchers
in this field that MHPs’ judgments about the validity of
uncorroborated allegations of sexual abuse lack a firm
scientific foundation (Fisher 1995; Fisher and Whiting
1998; Goodman et al. 1998; Horner et al. 1993; Melton and
Limber 1989; Poole and Lindsay 1998), there are very few
empirical studies or reviews that have focused directly on
this issue.
Why have both the legal and scientific communities paid
relatively little attention to the accuracy question and, by
contrast, so much attention to other related issues such as
the influence of questioning methods on the reliability of
children’s reports of past events? After all, the main reason
that researchers are interested in studying questioning
methods is because certain questioning methods can reduce
the reliability of children’s reports, and unreliable reports
from children can lead to errors in expert and lay judgments
about the validity of CSA allegations. Yet there have been
hundreds of research studies and books that have focused
on interview techniques and protocols for forensic child
interviews (e.g., Lamb et al. 2008), but only a handful of
studies and reviews that have focused directly on the
accuracy of expert or lay judgments about the validity of
CSA allegations. One obvious reason for this discrepancy is
that most legal decision makers and researchers apparently
believe that there is no reliable scientific way to study
judgment accuracy in this domain (cf. Horowitz et al. 1995).
It is true that we cannot directly evaluate judgment
accuracy in this domain because we cannot conduct
experiments in which some children are randomly assigned
to be sexually abused in order to determine whether or not
professionals can distinguish between abused and nonabused children. Nor can we subject children to the kinds of
intensive suggestive questioning techniques that sometimes
produce false reports and false memories of sexual abuse in
order to see if professionals can distinguish between true
and false reports, because such experiments would pose a
severe risk of psychological harm to the children involved.
Field studies of real forensic evaluations are also hampered
by the absence, in most cases, of a reliable gold standard
that could be used to independently assess the accuracy of
evaluator judgments.
Despite these methodological constraints, there have
been a handful of empirical studies that have provided data
that is relevant to estimating error rates in professional
judgments about the validity of CSA allegations (Finlayson
and Koocher 1991; Horner and Guyer 1991a,b; Horner et
al. 1992; McGraw and Smith 1992; Realmuto et al. 1990;
Realmuto and Wescoe 1992; Shumaker 2000). These
studies have focused on the subset of CSA cases in which,
from the evaluator’s point of view, there is no strong
corroboration for a child’s verbal report of sexual abuse. Of
course, these are precisely the cases in which MHPs’
opinions are most likely to play critical roles. Herman
(2005) analyzed data from the aforementioned studies, and
other empirical studies of clinical judgment, and concluded
that the overall error rate for professionals’ judgments about
the validity of uncorroborated allegations of CSA was
greater than 0.24, but he did not provide separate estimates
of false positive and false negative error rates.
A ground-breaking study by Hershkowitz et al. (2007a,
b) provides the best empirical data so far on MHPs’ ability
to distinguish between uncorroborated true and false reports
of sexual abuse made by children during forensic interviews. In that study, 42 experienced forensic evaluators
read transcripts of 24 actual investigative interviews that
were conducted with alleged child victims of sexual abuse.
The researchers combed through a large historical database
to select transcripts from 12 cases in which there was strong
independent evidence that the child’s interview report was
true and from 12 cases in which there was strong
independent evidence that the child’s report was false.
Study participants were asked to judge the validity of the
children’s verbal reports of sexual abuse without access to
the independent evidence. The false positive and false
negative error rates were 0.44 and 0.33, respectively; the
overall error rate was 0.39. See the original report by
Hershkowitz et al. and an analysis by Herman (2009) for
more on this important study.
Another body of psychological research that is relevant
to assessing the accuracy of evaluator judgments about
CSA allegations is the research on the ability of laypersons
and experts to determine when people are lying. Although
studies in this area are usually referred to as “deception
detection” studies, they might be better described as
“honesty and deception detection” studies because they
usually evaluate participants’ ability to correctly detect false
and true statements. The reason this research is important to
assessing the accuracy of MHPs’ judgments about CSA
allegations is because, in many cases, MHPs’ judgments are
based primarily on their determinations that a child or adult
is either telling the truth or lying. The evidence from many
experimental studies is remarkably consistent: the majority
of laypersons and professionals have little or no ability to
discriminate between true and false statements about past
events made by either children or adults. In a meta-analysis
of 206 deception detection studies, Bond and DePaulo
(2006) found that the average rate of correct classification
was 54%, only slightly higher than the expected chance
accuracy rate of 50% (equivalent to a correlation of r≈0.08
between judgments and reality). It is disturbing to note that
in the 19 studies in the Bond and DePaulo meta-analysis
that directly compared the abilities of professionals (MHPs,
police officers, detectives, judges, customs officials, polygraph examiners, job interviewers, federal agents, and
auditors) and laypersons to correctly classify true and false
statements, the experts, the people we rely on to ferret out
the truth, were just as inaccurate as the laypersons:
laypersons were correct 55.7% of the time in these 19
studies, the experts, 54.1% of the time.
One experimental study that specifically examined the
abilities of professionals and laypersons to discriminate
between true and false statements made by young children
(ages 5-6), adolescents, and adults found that adults are no
better at detecting truth and deception in young children
than they are at detecting them in adolescents and adults
(Vrij et al. 2006). In the Sam Stone experiment, Leichtman
and Ceci (1995) used suggestive interviewing techniques
to induce children to make false reports about classroom
events. They showed videotapes of true and false
statements made by children who participated in the
experiment to approximately 1,500 experienced child
professionals (judges, child development researchers, and
MHPs). These professionals “generally failed to detect
which of the children’s claims were accurate and which
were inaccurate, despite being confident in their judgments… The very children who were least accurate were
rated as most accurate” (Ceci and Bruck 1995, p. 281).
Other experimental studies have also consistently found
that most adults have little or no ability to discriminate
between true and false statements made by children
(Connolly et al. 2008; Edelstein et al. 2006; Leach et al.
2004; Talwar et al. 2006; Vrij 2002; Vrij and van
Wijngaarden 1994).
To make matters worse, professionals tend to be
overconfident in their abilities to detect honesty and
deception in both children and adults, and there is little or
no correlation between confidence or experience and
deception detection ability (Aamodt and Custer 2006;
Colwell et al. 2006; DePaulo et al. 1997; DePaulo and
Pfeifer 1986; Elaad 2003; Garrido et al. 2004; Kassin 2002;
Mann et al. 2004; Vrij and Mann 2001). The absence of
significant confidence-accuracy and experience-accuracy
correlations means that confident, experienced experts, the
ones who are most likely to influence legal decisions, are
likely to be just as inaccurate in their judgments about
honesty and deception as are less experienced or less
confident experts.
A New Approach to Estimating the Accuracy of MHPs’
Judgments
The current study was designed to implement a new
approach to estimating false positive and false negative
error rates for MHPs’ judgments about the validity of CSA
allegations. In this retrospective survey study, MHPs who
frequently conduct or participate in forensic CSA evaluations were asked to estimate (a) the percent of their
evaluations in which they substantiated sexual abuse
allegations and (b) the percent of their cases that fell into
each of five mutually exclusive probability-of-truth bins.
The five probability-of-truth bins were 0-20%, 20-40%, 40-60%, 60-80%, and 80-100%. Using these data, it was
possible to calculate for each participant (a) the implied
probability-of-truth threshold required for substantiation
and (b) the implied false positive, false negative, and
overall error rates.
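To make step (a) concrete, the following minimal sketch infers the substantiation threshold implied by a participant's two self-reports. Python is used here purely for illustration (the study's survey application was written in Perl, and its analysis code is not published), the function and example numbers are hypothetical, and cases are assumed to be spread uniformly within each bin, a simplification of the curve-fitting procedure described in the Analysis section below:

```python
def implied_threshold(bin_freqs, substantiation_rate):
    """Infer the probability-of-truth threshold implied by a
    participant's self-reports.

    bin_freqs: fraction of cases in each probability-of-truth bin
        [0-20%, 20-40%, 40-60%, 60-80%, 80-100%]; should sum to 1.
    substantiation_rate: fraction of cases substantiated.

    Returns the threshold t such that the fraction of cases judged
    more likely than t to be true equals the substantiation rate
    (cases assumed spread uniformly within each bin).
    """
    edges = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
    remaining = substantiation_rate
    for i in range(4, -1, -1):  # walk down from the most-certain bin
        if bin_freqs[i] > 0 and bin_freqs[i] >= remaining:
            width = edges[i + 1] - edges[i]
            # Only part of this bin lies above the threshold.
            return edges[i + 1] - width * remaining / bin_freqs[i]
        remaining -= bin_freqs[i]  # this whole bin is substantiated
    return 0.0

# Hypothetical participant: 40% of cases in the 80-100% bin, 20% in
# 60-80%, and so on, with a 50% substantiation rate -> threshold 0.70.
print(implied_threshold([0.10, 0.10, 0.20, 0.20, 0.40], 0.50))
```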
Methods
Data were collected over the Internet using a custom-designed, branching, Web-based survey. The Web application that was used to deliver the survey was written in the
Perl programming language. The survey was lengthy, about
40 Web pages (because this was a branching survey, the
exact number of pages varied); it took about 1 h to
complete. The survey included questions about participants'
backgrounds, education, and training. There were numerous
items that focused on characteristics of participants’ caseloads, substantiation rates, knowledge of relevant research,
and other topics related to forensic CSA evaluations.
Participants were recruited through personal contacts, a
targeted postal mailing to psychologists who conduct child
custody evaluations, and announcements that were sent to
email lists for forensic MHPs (e.g., the PSYLAW-L, child
custody, and ISPCAN email lists). The study announcements stated that English-literate professionals or paraprofessionals who had conducted or participated in at least
ten forensic CSA evaluations were eligible to participate.
Respondents who asserted via email or telephone that they
met the eligibility criteria were emailed a unique ID and
password that they used to access the survey Website.
Participants who completed the survey were paid $50. As in
many other online and paper-based survey studies, it is
impossible to estimate the response rate, since there is no
way of knowing how many eligible MHPs read a request to
participate in the study.
The survey was completed by 123 individuals over a
5-month period. Thirteen surveys (11%) were dropped
because the professional subgroup was too small (i.e., only
five law enforcement investigators completed the survey),
important data was missing or entered incorrectly and could
not be corrected (four surveys), or the responses appeared
to be random or inconsistent (four surveys), leaving a total
of 110 complete, consistent protocols. Because data was
checked for validity as it was entered by participants, and
participants were prompted to correct likely data entry
errors before being allowed to proceed, the rates of missing
data and entry errors were low, with no single variable
missing or likely incorrect for more than 5% of all
respondents.
Survey Items
Because no available survey instruments adequately
addressed the topics covered in the current study, the
survey items were written for this study. The wording of
questions was refined using think-aloud protocol sessions
with forensic MHPs. Written feedback was also obtained
from a number of MHPs who completed early versions of
the survey.
Results
Sample characteristics are described in Table 1. The study
included a broad sampling of child abuse professionals and
paraprofessionals. About half of the sample held advanced
degrees (Ph.D., M.D., or equivalent). The primary child
abuse professions (psychology, medicine, and social
work) were well represented. About half the sample
worked primarily for Child Advocacy Centers or child
protection agencies (including both employees and
contractors). The median participant devoted about 30%
of his or her time to forensic CSA evaluations; 25% of
participants devoted 80% or more of their time to CSA
evaluations. The median number of hours devoted to
each CSA evaluation was six, but about 20% of
participants reported spending 3 or fewer hours per
evaluation. Participants who devoted 3 or fewer hours to
each evaluation were, in general, participants who served
primarily as forensic child interviewers on multidisciplinary teams. A substantial subset of participants, about
45%, worked on a fee-for-service basis; Table 1 data on
hourly and total fees is limited to this subset.
The eligibility criteria specified a minimum of ten
evaluations, so ten was the smallest number of evaluations
performed. The median participant had conducted or
participated in 162 forensic CSA evaluations. About 10%
of participants had participated in 2,000 or more evaluations. There was one participant who reported 8,000
evaluations. This participant and others with very high
numbers of evaluations were contacted via email to ensure
that these were not data entry errors. These were not data
entry errors, but came from participants who worked in
specialized sexual assault clinics in urban areas where they
had performed several child examinations each day for
many years. In all, study participants had conducted or
participated in a cumulative total of approximately 90,000
forensic CSA evaluations.
Table 2 shows basic data on participants’ self-reported
substantiation rates, substantiation thresholds, and
probability-of-truth distributions. In addition to the means
and standard deviation, Table 2 shows means weighted by
number of cases, medians, and ranges for each variable.
The wording of the questions that were used to collect the
data shown in Table 2 was as follows:
[Substantiation rate:] In what percent of your CSA
evaluations do you (or your team) classify the abuse
allegations as substantiated?
[Substantiation threshold:] What is the minimum level
of probability that the allegations are true that you
would require in order to classify an abuse allegation
as substantiated? Please estimate this probability
using a scale from 0% probability (you are absolutely
certain that the abuse allegations are false) to 50%
probability (you believe that there is a 50% chance
the allegations are true and a 50% chance that the
allegations are false) to 100% probability (you are
absolutely certain that the abuse allegations are true).
[Probability-of-truth:] To answer the questions on this
page, please consider your own personal beliefs about
the probability that the CSA allegations are true in the
cases you (or your team) evaluate…. Please try to
estimate the percent of cases that fall into each of
these categories: (the total should be 100%)
1. The probability that the allegations are true is
80-100% (you are certain or fairly certain that the
allegations are true).
2. The probability that the allegations are true is
60-80%.
3. The probability that the allegations are true is
40-60%.
4. The probability that the allegations are true is
20-40%.
5. The probability that the allegations are true is
0-20% (you are certain or fairly certain that the
allegations are false).
As can be seen from the data in Table 2, the range of
responses to each of these questions was extreme; for
example, the substantiation rates ranged from 0.00 to 0.95.
The patterns of subjective frequency distributions in response
to the probability-of-truth question also varied widely; four
representative distributions are shown in Fig. 1.

Fig. 1 Examples of subjective distributions for the probability-of-truth for allegations evaluated (five-bin frequency histograms for Participants 10, 29, 63, and 94; x-axis: probability allegation is true; y-axis: frequency)
The first line in Table 3 shows participants’ self-reported
estimates of their own error rates. Specifically, the rates
shown in the columns headed “1–PPV” and “1–NPV”
show the median response and range of responses to these
questions:
When people make decisions about the validity of
allegations of child sexual abuse, they sometimes make
incorrect classifications. You may discover that a
mistake has been made if new evidence emerges after
Table 1 Sample Characteristics (N = 110)

Variable                                          %    M (SD)           Median   Range
Female gender                                     70
Age                                                    45 (12)          47       26-75
US resident                                       92
Ethnicity/race
  African American                                3
  Asian/Pacific Islander                          8
  Hispanic                                        4
  White                                           85
Highest degree
  AA or some college                              4
  BA or equivalent                                8
  MA or equivalent                                37
  PhD, MD, or equivalent                          51
Profession
  Caseworker                                      21
  Social worker                                   19
  Counselor                                       2
  Nurse                                           7
  Physician                                       23
  Forensic psychologist                           16
  Other psychologist                              12
Employer(s)
  Child advocacy center                           29
  Child protection agency                         26
  Medical or hospital setting                     14
  Court-appointed                                 14
  Law enforcement agency                          3
  Defense in civil case                           3
  Defense in criminal case                        4
  Plaintiff in civil case                         1
  Prosecution in criminal case                    3
  Other                                           5
Annual work income (US dollars)
  $10,000-29,999                                  4
  $30,000-49,999                                  28
  $50,000-69,999                                  24
  $70,000-99,999                                  16
  $100,000 or more                                30
Years performing CSA evaluations                       11 (8)           10       1-30
Proportion of time devoted to CSA evaluations          0.43 (0.31)      0.30     0.00-1.00
Total CSA evaluations performed                        731 (1,309)      162      10-8,000
Number of hours per evaluation                         13 (13)          6        1-60
Member of multi-disciplinary team                 40
Hourly rate in US dollars (n = 58)                     $122 ($87)       $100     $13-$360
Total fee in US dollars per evaluation (n = 51)        $1,565 ($1,498)  $1,000   $60-$6,000
Table 2 Self-reported substantiation rate, substantiation threshold, and probability of truth (N = 110)

Variable                                        M (SD)        Weighted M (SD)   Median   Range
Substantiation rate                             0.47 (0.24)   0.57 (0.24)       0.50     0.00-0.95
Substantiation threshold                        0.68 (0.15)   0.66 (0.16)       0.68     0.40-1.00
Probability-of-truth of allegations evaluated
  80-100%                                       0.41 (0.28)   0.47 (0.25)       0.30     0.00-1.00
  60-80%                                        0.17 (0.16)   0.17 (0.11)       0.14     0.00-1.00
  40-60%                                        0.14 (0.14)   0.15 (0.11)       0.10     0.00-0.80
  20-40%                                        0.10 (0.12)   0.09 (0.10)       0.10     0.00-1.00
  0-20%                                         0.17 (0.20)   0.12 (0.15)       0.10     0.00-0.90

The Weighted M (SD) column shows the mean and standard deviation weighted by the number of cases that each participant reported having evaluated.
an evaluation is complete. For example, an allegation
is classified as unsubstantiated, and then, perhaps
months later, the alleged perpetrator confesses. Even
the most accurate evaluators are bound to make some
errors if they evaluate enough cases. One type of
mistake is when a true allegation of abuse is classified
as unsubstantiated (or inconclusive or unfounded).
Another type of mistake is when a false or erroneous
allegation of abuse is classified as substantiated.
[1–NPV:] Please try to estimate what percent of all of
your (or your team's) decisions to classify an abuse
allegation as unsubstantiated (or inconclusive or
unfounded) are incorrect. In other words, what
percent of all of the CSA allegations that you (or
your team) classify as unsubstantiated (or inconclusive or unfounded) are actually true allegations?
[1–PPV:] Now, please try to estimate what percent of all
of your (or your team’s) decisions to classify an abuse
allegation as substantiated are incorrect. In other
words, what percent of all of the CSA allegations that
you (or your team) classify as substantiated are actually
false or erroneous allegations?
The other figures shown in the first row of Table 3 were
either reported directly by participants (i.e., Substantiation
Threshold) or calculated from the self-reported substantiation
rates and the self-reported values for 1-PPV and 1-NPV.
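For the calculated entries, the standard confusion-matrix identities suffice: with substantiation rate s, the expected share of false positives among all cases is s × (1-PPV) and the share of false negatives is (1 − s) × (1-NPV). A minimal sketch (illustrative Python, not the authors' code; plugging in the study's medians, as below, only approximately reproduces Table 3, whose entries are medians of per-participant values):

```python
def rates_from_self_report(s, one_minus_ppv, one_minus_npv):
    """Convert a self-reported substantiation rate s and self-reported
    1-PPV and 1-NPV into the implied error rates."""
    fp = s * one_minus_ppv           # substantiated, but allegation false
    tp = s - fp                      # substantiated and true
    fn = (1 - s) * one_minus_npv     # unsubstantiated, but allegation true
    tn = (1 - s) - fn                # unsubstantiated and false
    return {"overall error": fp + fn,
            "FPR": fp / (fp + tn),   # share of false allegations substantiated
            "FNR": fn / (fn + tp)}   # share of true allegations missed

# Median self-reports: substantiation rate 0.50, 1-PPV 0.05, 1-NPV 0.10
# -> overall error 0.075, close to the 0.08 in the first row of Table 3.
print(rates_from_self_report(0.50, 0.05, 0.10))
```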
Analysis
The data shown in the first row of Table 3 indicates that the
median study participant believed that he or she erred in 5%
of his or her decisions to classify allegations as substantiated and in 10% of his or her decisions to classify
allegations as unsubstantiated. The estimated overall error
rate based on these self-reported figures is 0.08. This
Table 3 Error rate estimates: median and range (N = 110)

Estimate source                 Substantiation threshold   Error               FPR                 FNR                 1-PPV               1-NPV
                                Med (range)                Med (range)         Med (range)         Med (range)         Med (range)         Med (range)
Participant self-report         0.68 (0.40-1.00)           0.08 (0.00-0.44)    0.05 (0.00-0.90)    0.11 (0.00-0.80)    0.05 (0.00-0.60)    0.10 (0.00-0.97)
Calculated, 0% overextremity    0.70 (0.19-1.00)           0.28 (0.11-0.75)    0.18 (0.00-0.83)    0.36 (0.02-1.00)    0.12 (0.01-0.62)    0.39 (0.09-0.84)
Calculated, 10% overextremity   0.66 (0.25-0.90)           0.33 (0.19-0.70)    0.27 (0.00-0.88)    0.40 (0.03-1.00)    0.20 (0.11-0.60)    0.41 (0.18-0.78)
Calculated, 20% overextremity   0.62 (0.31-0.80)           0.37 (0.26-0.65)    0.34 (0.00-0.91)    0.43 (0.03-1.00)    0.27 (0.20-0.58)    0.43 (0.26-0.71)
Calculated, 30% overextremity   0.58 (0.38-0.70)           0.41 (0.34-0.60)    0.40 (0.00-0.93)    0.46 (0.04-1.00)    0.35 (0.30-0.55)    0.45 (0.34-0.64)

Med, median; Substantiation threshold, the minimum probability of truth required for substantiation; Error, total error rate (false positives + false negatives); FPR, false positive rate; FNR, false negative rate; 1-PPV, 1 - positive predictive value, the proportion of all substantiations that are erroneous; 1-NPV, 1 - negative predictive value, the proportion of all non-substantiations that are erroneous
estimate is significantly lower than the lower bound
estimate of 0.24 that was calculated by Herman (2005) for
uncorroborated allegations. A potential problem with these
self-reported estimates is that, when people are asked to
directly estimate their own error rates, a number of
cognitive heuristics and situational factors are likely to
result in a bias towards underestimation. In this case, these
factors include overconfidence in judgments made under
conditions of uncertainty (Baron 2008; Lichtenstein et al.
1982); a lack of useful corrective feedback, which is
generally necessary for learning from mistakes (Dawes
1994); confirmation bias (Poole and Lamb 1998); and other
biases, heuristics, and cognitive illusions (Baron 2008;
Poole and Lamb 1998).
In order to provide error rate estimates that may be less
subject to underestimation biases, the following approach
was taken. First, continuous probability density functions
were fitted to the five-bin subjective probability-of-truth
distributions; an example, for Participant 32, is shown in Fig. 2.
Participant 32 was chosen to illustrate the error rate estimation
process because her error rates were close to the
median rates. Curves were fitted using weighted beta
distributions; one distribution for unimodal distributions
and two for bimodal distributions. The curve fits were good
for most participants, although there were a few distributions that did not result in good fits (e.g., the distribution for
Participant 94 in Fig. 1). For participants with good curve fits,
there were virtually no differences between error rate
estimates based on empirical integration using the fitted
curves and a simpler approach in which all of the cases in each
bin (or part of a bin) were assigned the average probability for
that bin (or that part of the bin). Because the simpler approach
produced virtually the same results as integration when there
were good curve fits, the simpler approach was used for all
estimates. Second, the self-reported substantiation rate for
each participant was combined with the probability-of-truth
distributions to calculate the implicit minimum probability-of-truth
required to substantiate an allegation. This procedure is
illustrated in Fig. 3 for Participant 32.

Fig. 2 Self-reported probability distribution for Participant 32 (five-bin frequency histogram with fitted density curve)

Fig. 3 Calculation of the substantiation threshold (0.67) from the substantiation rate (0.50) for Participant 32 (unsubstantiated area = .50 below the threshold; substantiated area = .50 above it)

Finally, the area under
the curve for each participant was divided into four subareas
representing true positives, false positives, true negatives,
and false negatives. To do this, the participants’ own
subjective probabilities were used, as illustrated in Fig. 4,
so that, for example, it was assumed that 90% of the cases at
the 90% probability-of-truth level represented true allegations. Implied error rates (overall error, the false positive
error rate, the false negative error rate, 1 – the positive
predictive value, and 1 – the negative predictive value) were
calculated. The median calculated error rates and ranges are
shown in the second row of Table 3.

Fig. 4 Estimated error rates for Participant 32: true positives = .42, false positives = .08, true negatives = .31, false negatives = .19; probability threshold for substantiation = .67. Overall error rate = .28, false positive rate = .21, false negative rate = .32; 1 - positive predictive value = .17, 1 - negative predictive value = .39.
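A minimal sketch of this area-splitting step, continuing the hypothetical implied_threshold sketch given earlier (same uniform-within-bin simplification; the bin frequencies in the last line are again invented, not Participant 32's actual data):

```python
def implied_error_rates(bin_freqs, substantiation_rate):
    """Split a participant's caseload into expected true/false positives
    and negatives, taking the participant's own subjective probabilities
    at face value: a slice of cases judged p likely to be true counts as
    p true cases and (1 - p) false cases."""
    edges = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
    t = implied_threshold(bin_freqs, substantiation_rate)  # earlier sketch
    tp = fp = tn = fn = 0.0
    for i in range(5):
        lo, hi, mass = edges[i], edges[i + 1], bin_freqs[i]
        if mass == 0.0:
            continue
        cut = min(max(t, lo), hi)              # split the bin at the threshold
        below = mass * (cut - lo) / (hi - lo)  # unsubstantiated slice
        above = mass - below                   # substantiated slice
        fn += below * (lo + cut) / 2           # true but unsubstantiated
        tn += below * (1 - (lo + cut) / 2)
        tp += above * (cut + hi) / 2           # true and substantiated
        fp += above * (1 - (cut + hi) / 2)
    return {"error": fp + fn, "FP": fp, "FN": fn,
            "FPR": fp / (fp + tn), "FNR": fn / (fn + tp),
            "1-PPV": fp / (fp + tp), "1-NPV": fn / (fn + tn)}

# Hypothetical participant from the earlier sketch: overall error 0.27,
# FPR 0.18, FNR 0.32 -- incidentally close to the study's medians.
print(implied_error_rates([0.10, 0.10, 0.20, 0.20, 0.40], 0.50))
```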
Although the median calculated error rates shown in the
second row of Table 3 are much higher than the median
self-reported estimates shown in the first row, they are,
nevertheless, likely to represent underestimates of the true
median error rates. Judgment and decision researchers have
found that when experts and laypersons make judgments
about the probability that their own judgments about
uncertain quantities or events are correct, they often
overestimate the probability that their judgments are correct
(Baron 2008). When experts make difficult judgments about
whether or not an event occurred or will occur, overconfidence is often manifested as overextremity (Griffin and
Brenner 2004).
The overextremity effect has been demonstrated in
empirical studies of judgments about the truth or falsity of
various kinds of factual assertions, and it is especially likely to
occur when the judgment task is difficult, error rates are high,
corrective feedback is lacking, and judges believe that the
probability that their judgments are correct is high (Baron
2008). For example, when college students estimated that the
probability that judgments they made were correct was
100%, they were actually correct about 70% of the time, a
30% overextremity effect (Wright and Phillips 1980). In
another study, when physicians estimated that the probability
that patients had pneumonia was 90%, the actual probability
was about 20% (Christensen-Szalanski and Bushyhead
1981). In one oft-cited study in which psychologists made
postdictions about the behavior of a real person after reading
clinical case study information about him, when the
participants believed that 53% of their judgments were
correct, they were only correct 28% of the time; furthermore,
although participants’ confidence in their judgments increased as they received more and more clinical data about
the target person, their judgment accuracy did not improve
(Oskamp 1965). Other experimental studies have also found
overconfidence effects in MHPs’ clinical judgments (Faust et
al. 1988; Moxley 1973).
In the current context, overextremity means, for example, that the true average probability-of-truth for all of the
allegations that are judged to fall into the 80-100% bin is
likely to be lower than 90% and, similarly, the true average
probability-of-truth for all of the cases that are judged to
fall into the 0-20% bin is likely to be higher than 10%. In
order to provide a rough illustration of the impact of three
possible levels (10%, 20%, and 30%) of overextremity in
probability-of-truth estimates on error rates, the estimated
probability-of-truth distributions for each participant were
compressed and error rates were recalculated. For example,
to calculate error rates for 10% overextremity (which means
that only 90% of all allegations judged to be 100% likely to
be true are actually true), the maximum probability of truth
was set to 90% and the minimum to 10%. All five of the
bins were compressed and adjusted so that the total area
contained in the five bins was still 100%. For example, for
an overextremity of 10%, bin 1 was adjusted to range from
a probability-of-truth of 10% to 26%, bin 2 from 26% to
42%, bin 3 from 42% to 58%, bin 4 from 58% to 74%, and
bin 5 from 74% to 90%. Estimated error rates, adjusted for
10%, 20%, and 30% overextremity, are shown in rows three
through five of Table 3.
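The compression itself is just a linear map of the bin edges from [0, 1] onto [e, 1 − e], where e is the assumed overextremity; error rates are then recalculated over the compressed bins. A minimal sketch (illustrative names, not the authors' code):

```python
def compress_edges(edges, overextremity):
    """Linearly map bin edges from [0, 1] onto
    [overextremity, 1 - overextremity]."""
    e = overextremity
    return [e + x * (1 - 2 * e) for x in edges]

# 10% overextremity: [0, .2, .4, .6, .8, 1] -> [.10, .26, .42, .58, .74, .90]
print(compress_edges([0.0, 0.2, 0.4, 0.6, 0.8, 1.0], 0.10))
```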
Discussion
This study used self-report data from professionals who
conduct or participate in forensic CSA evaluations to
estimate error rates for each participant’s judgments about
the validity of the CSA allegations they evaluate. The two
most important conclusions of this study are (a) error rates
are high, higher than most practitioners and legal decision makers realize, and (b) error rates vary markedly from one
practitioner to the next.
If the results of the current study are correct and
generalizable, then 0.18, 0.36, and 0.28 represent optimistic
estimates of the median false positive, false negative, and
overall error rates. These are optimistic estimates, likely to
be lower than the true medians, because they do not take
overextremity bias into account. Variability in these estimated error rates, as shown in Table 3, is extreme. For example,
the estimated overall error rates (with overextremity = 0%)
ranged from 0.11 to 0.75 in the study sample.
There are a number of limitations to this study. First, the
study used a self-selected sample of convenience, and
caution is required in generalizing to the population of
MHPs who conduct or participate in forensic CSA evaluations. Second, the study is based on retrospective self-reports. Self-reports are liable to multiple types of biases.
An attempt has been made to estimate the effects of
different possible levels of one type of expected bias—
overconfidence, in the specific form of overextremity—but
other systematic biases may also have influenced participants’ reports. Third, the survey was delivered over the
Internet. Internet delivery may have biased responses in
unknown ways. No attempt was made to verify the
identities and professional status of participants, so it is
possible that some participants were imposters, although
participants did need to provide a valid email address and a
valid name and postal mailing address in order to receive
their $50 check for participation. Completing this long
survey involved a considerable investment of time, about
1 h, and there was evidence of the kind of random or
inconsistent responding that might be expected from
imposters in only a small number of protocols (four out
of 123); these four protocols were excluded from the
analysis. Fourth, there was no attempt to verify the
accuracy of information supplied by participants. For
example, there was no review of documentation of
participants’ evaluations in order to assess the accuracy of
their self-reported substantiation rates. Fifth, the approach
taken here to the estimation of error rates is novel.
Surprisingly, clear examples of past applications of this
seemingly obvious approach to error rate estimation could
not be located. The general validity of this approach can
and should be experimentally tested by judgment and
decision researchers in contexts in which independent
criteria for assessing the correctness of judgments are
available. Pending further validation of this approach,
results and conclusions should be viewed as tentative.
Arguments that indirectly support the external validity of
the results of the current study include the following. First,
key data collected in this retrospective survey study were
consistent with data collected in field studies. For example,
the mean self-reported substantiation rate in the current
study was 0.47. The weighted mean substantiation rate
across 3,841 forensic CSA evaluations that were examined
in eight other studies was 0.52 (Bowen and Aldous 1999;
Drach et al. 2001; Elliott and Briere 1994; Everson and
Boat 1989; Haskett et al. 1995; Jones and McGraw 1987;
Keary and Fitzpatrick 1994; Oates et al. 2000). The mean
substantiation rates in these eight studies ranged from 0.25
to 0.63. Second, the high error rates found in this study are
consistent with the widespread consensus among leading
researchers that MHPs’ opinions about the validity of
uncorroborated abuse allegations do not have a firm
foundation in empirical science and with empirical findings
from a handful of studies that have directly or indirectly
addressed the issue of evaluator accuracy. In a review and
analysis of relevant empirical studies, Herman (2005)
concluded that the overall error rate for professionals’
judgments about the validity of uncorroborated CSA
allegations exceeded 0.24.
The findings of this study—if correct and generalizable—
indicate that judgment errors by MHPs are common. It is
natural to ask what can be done to improve diagnostic
performance in this judgment domain. As Swets et al. (2000)
note, there are two general approaches to improving
diagnostic performance: the first is to improve the accuracy
of diagnostic procedures. The second is to adjust decision
thresholds in order to maximize overall utility. The potential
application of each approach to improving diagnostic
performance in forensic CSA evaluations is considered in
turn.
Improving Judgment Accuracy
There are several promising, feasible approaches to improving the accuracy of judgments about the validity of
CSA allegations. The most obvious reform would be to
require the use of the National Institute of Child Health and
Human Development (NICHD) forensic child interview
protocol (Lamb et al. 2008, pp. 283-315) in all forensic
child interviews. There is now a considerable body of
empirical evidence indicating that use of this interview
protocol leads to more accurate and detailed reports of past
events by child interviewees (Lamb et al. 2008; Lamb et al.
2007). In Hershkowitz et al. (2007a,b), the use of NICHD
interview protocol reduced the false negative rate from 0.62
to 0.05. Unfortunately, the use of the NICHD protocol did
not impact the false positive rates, which were 0.40 for
nonprotocol and 0.48 for NICHD protocol interviews.
There is no other forensic child interview protocol that
comes close to the NICHD in terms of empirical validation.
Mandating the use of the NICHD interview protocol in all
(or almost all⁴) forensic child interviews in cases of alleged
or suspected CSA would be likely to have an immediate and
dramatic positive impact on the false negative error rate.
Thanks in large part to research conducted by Irit Hershkowitz,
Michael Lamb, and their colleagues, the use of the NICHD
interview protocol has already been mandatory in all CSA
investigations in Israel for several years (Hershkowitz et al.
2007a).
There are other reforms that might improve the accuracy
of judgments about the validity of CSA allegations. An
allegation of CSA is an allegation that a serious crime has
been committed. MHPs are not trained to investigate crimes
or to collect the corroborative evidence (e.g., confessions,
DNA evidence) needed to prove that a crime has or has not
occurred. It may be time for primary responsibility for CSA
investigations to be turned over to those who are trained to
investigate crimes, the police (cf. Cross et al. 2005; Walsh
1993). In Sweden, the UK, and other countries, police
already often take the lead role in investigating allegations
of CSA and performing investigative interviews (Cederborg
et al. 2000; Lamb et al. 2009). In the US, CPS caseworkers
perform most initial investigations of allegations of sexual
abuse by parents or caretakers. A few US jurisdictions are
experimenting with having law enforcement personnel
conduct all initial investigations of CSA allegations,
including allegations of parental or caretaker abuse (Cross
et al. 2005).
⁴ See Lamb et al. (2008) for a discussion of situations in which
modifications to the protocol may be necessary.
Other steps that may lead to improved accuracy include
the following:
1. Develop, test, and refine empirically based decision aids
for evaluating CSA allegations (cf. Baird and Wagner
2000; Baird et al. 1999; Herman 2005; Milner 1994).
2. Update published practice guidelines for forensic CSA
evaluations (e.g., American Academy of Child and
Adolescent Psychiatry 1997; American Professional
Society on the Abuse of Children 1997) in order to
incorporate evidence about the high risks of error when
evaluators make judgments about uncorroborated allegations. As an adjunct to these guidelines, it would be
helpful to create written checklists that could be used in
the field to document compliance with best investigation and evaluation practices. The use of simple
checklists has been shown to be a remarkably effective
way of reducing morbidity and mortality in medical
practice (e.g., Haynes et al. 2009).
3. Video- or audio-tape all child and adult interviews from
start to finish (Berliner and Lieb 2001).
4. Given the abysmal performance of most human lie
detectors (Bond and DePaulo 2006), it is time to take a
closer look at both new (brain imaging) and old
(polygraph) technological approaches to lie detection.
In fact, numerous prosecutors and police investigators
continue to use polygraph examinations to make
decisions about which CSA allegations to pursue.
Other promising approaches to improving practice in this
area are described in a recent volume edited by Kuehnle
and Connell (2009).
Maximizing Utility
As Swets et al. (2000) explain, a second approach to
improving diagnostic performance is to adjust decision
thresholds. In some cases, adjusting decision thresholds can
reduce overall error rates. In most cases, decision thresholds
are adjusted in order to control the balance of false positive
and false negative errors. In the current context, policy
makers and legal decision makers could adjust decision
thresholds by either increasing or decreasing the strength of
evidence required for substantiation.
There is a tradeoff between false positive and false
negative error rates. As the evidentiary threshold for
substantiation is raised, the false positive rate will decrease
and the false negative error rate will increase. If the
threshold is set so high that no allegations are substantiated,
then there would be no false positives and all of the true
allegation cases would be false negatives. Conversely, if all
allegations were substantiated, then there would be no false
negatives and all of the false allegation cases would
represent false positives.
The effect of raising or lowering the substantiation
threshold for a typical study participant, Participant 32, is
shown in Fig. 5. Figure 5 reflects an assumption of 10%
overextremity. If Participant 32 substantiates in 50% of
cases (as she reported doing), then her estimated false
positive and false negative error rates are 0.28 and 0.35,
respectively; her estimated overall error rate is 0.32. The
ratio of false negatives to false positives at 50% substantiation is 1.8:1.

Fig. 5 Error rate tradeoffs for Participant 32 (overextremity = 10%): overall error, false positive, and false negative rates as a function of the substantiation rate, with her self-reported substantiation rate (0.50) marked.

The lowest total error rate, 0.29, would
occur if Participant 32 lowered her evidentiary threshold so
that she substantiated in 70% of cases; at 70% substantiation, the ratio of false negatives to false positives would be
1:2.2. To achieve a false negative to false positive ratio of
10:1 (cf. Blackstone 1769, “it is better that ten guilty
persons escape than that one innocent suffer”) would
require lowering the substantiation rate to 0.24. To achieve
a ratio of 100:1 (cf. Franklin 1785/1907, “it is better that a
hundred guilty Persons escape than one innocent Person
should suffer”) would require lowering the substantiation
rate to 0.05.
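The full tradeoff curve in Fig. 5 can be approximated by sweeping the substantiation rate through the hypothetical implied_error_rates sketch from the Analysis section (again with invented bin frequencies, and without the 10% overextremity adjustment used in Fig. 5, so the numbers are illustrative rather than Participant 32's):

```python
bins = [0.10, 0.10, 0.20, 0.20, 0.40]        # hypothetical caseload
for rate in (0.05, 0.25, 0.50, 0.70, 0.95):  # candidate substantiation rates
    r = implied_error_rates(bins, rate)
    print(f"rate={rate:.2f}  overall error={r['error']:.2f}  "
          f"FN:FP ratio = {r['FN'] / r['FP']:.1f}:1")
```

As the substantiation rate rises, false negatives are traded for false positives, which is why a policy target such as Blackstone's 10:1 ratio pins down a specific substantiation rate for a given caseload.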
Conclusion
This study contributes to our understanding of the disturbing magnitude and variability of error rates in professionals’
judgments about the validity of allegations of CSA.
Findings of this study are consistent with the conclusions
of the few past studies that provide data that are relevant to
the empirical assessment of error rates in this domain. This
study casts severe doubt on legal decision makers’ current
reliance on MHPs’ decisions to substantiate uncorroborated
sexual abuse allegations. Policy makers and legal decision makers should be informed that the median false positive
rate for experts’ opinions about CSA allegations is likely to
exceed 0.18, that false positive rates are highly variable,
and that there are no reliable ways to identify accurate
experts.
Policy makers, with the assistance of scientific researchers, should carefully examine the tradeoff between false
positive and false negative errors and determine whether or
not current practices are consistent with core sociolegal
values. If policy makers determine, for example, that
current false positive rates are too high relative to false
negative rates, then laws and policies should be enacted to
modify the balance of error by adjusting evidentiary
requirements for substantiation.
References
Aamodt, M. G., & Custer, H. (2006). Who can best catch a liar?: A
meta-analysis of individual differences in detecting deception.
The Forensic Examiner, 15, 6–11.
American Academy of Child and Adolescent Psychiatry. (1997). Practice
parameters for the forensic evaluation of children and adolescents
who may have been physically or sexually abused. Journal of the
American Academy of Child and Adolescent Psychiatry, 36, 423–
442. doi:10.1097/00004583-199703000-00026.
American Professional Society on the Abuse of Children. (1997).
Psychosocial evaluation of suspected sexual abuse in children
(2nd ed.). Chicago: Author.
Baird, C., & Wagner, D. (2000). The relative validity of actuarial- and
consensus-based risk assessment systems. Children and Youth
Services Review, 22, 839–871. doi:10.1016/S0190-7409(00)00122-5.
Baird, C., Wagner, D., Healy, T., & Johnson, K. (1999). Risk
assessment in child protective services: Consensus and actuarial
model reliability. Child Welfare Journal, 78, 723–748.
Baron, J. (2008). Thinking and deciding (4th ed.). New York:
Cambridge University Press.
Berliner, L., & Conte, J. R. (1993). Sexual abuse evaluations:
Conceptual and empirical obstacles. Child Abuse & Neglect,
17, 111–125. doi:10.1016/0145-2134(93)90012-t.
Berliner, L., & Lieb, R. (2001). Child sexual abuse investigations:
Testing documentation methods. Olympia, WA: Washington State
Institute for Public Policy.
Bikel, O. (Writer). (1997). Innocence lost: The plea [Television
broadcast]. Boston and Washington, DC: Frontline/Public Broadcasting Service. Information about this television broadcast
retrieved on June 11, 2008 from: http://www.pbs.org/wgbh/
pages/frontline/shows/innocence/.
Blackstone, W. (1769). Commentaries on the laws of England (Vol.
4). Retrieved May 30, 2007, from Yale University, The Avalon
Project Website: http://www.yale.edu/lawweb/avalon/blackstone/
blacksto.htm.
Bond, C. F., Jr., & DePaulo, B. M. (2006). Accuracy of deception
judgments. Personality and Social Psychology Review, 10, 214–
234. doi:10.1207/s15327957pspr1003_2.
Bow, J. N., Quinnell, F. A., Zaroff, M., & Assemany, A. (2002).
Assessment of sexual abuse allegations in child custody cases.
Professional Psychology: Research and Practice, 33, 566–575.
doi:10.1037/0735-7028.33.6.566.
Bowen, K., & Aldous, M. B. (1999). Medical evaluation of sexual
abuse in children without disclosed or witnessed abuse. Archives
of Pediatric and Adolescent Medicine, 153, 1160–1164.
Boyer, P. J., & Kirk, M. (Writers). (1998). The child terror [Television
broadcast]. Boston and Washington, DC: Frontline/Public Broadcasting Service. Information about this television broadcast
retrieved on June 11, 2008 from: http://www.pbs.org/wgbh/
pages/frontline/shows/terror/.
Bruck, M., Ceci, S. J., & Hembrooke, H. (1998). Reliability and
credibility of young children’s reports. From research to policy
and practice. American Psychologist, 53, 136–151. doi:10.1037/
0003-066X.53.2.136.
Ceci, S. J., & Bruck, M. (1995). Jeopardy in the courtroom: A
scientific analysis of children's testimony. Washington, DC:
American Psychological Association.
Cederborg, A. C., Orbach, Y., Sternberg, K. J., & Lamb, M. E.
(2000). Investigative interviews of child witnesses in Sweden.
Child Abuse & Neglect, 24, 1355–1361. doi:10.1016/S0145-2134(00)00183-6.
Christensen-Szalanski, J. J., & Bushyhead, J. B. (1981). Physicians’
use of probabilistic information in a real clinical setting. Journal
of Experimental Psychology: Human Perception and Performance, 7, 928–935. doi:10.1037/0096-1523.7.4.928.
Colwell, L. H., Miller, H. A., Miller, R. S., & Lyons, P. M., Jr. (2006).
US police officers’ knowledge regarding behaviors indicative of
deception: Implications for eradicating erroneous beliefs through
training. Psychology, Crime & Law, 12, 489–503. doi:10.1080/
10683160500254839.
Connolly, D. A., Price, H. L., Lavoie, J. A. A., & Gordon, H. M.
(2008). Perceptions and predictors of children’s credibility of a
unique event and an instance of a repeated event. Law and
Human Behavior, 32, 92–112. doi:10.1007/s10979-006-9083-3.
Cross, T. P., Finkelhor, D., & Ormrod, R. (2005). Police involvement
in child protective services investigations: Literature review and
secondary data analysis. Child Maltreatment, 10, 224–244.
doi:10.1177/1077559505274506.
Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993).
Davies, D., Cole, J., Albertella, G., McCulloch, L., Allen, K., &
Kekevian, H. (1996). A model for conducting forensic interviews
with child victims of abuse. Child Maltreatment, 1, 189–199.
doi:10.1177/1077559596001003002.
Dawes, R. M. (1994). House of cards: Psychiatry and psychotherapy
built on myth. New York: The Free Press.
DePaulo, B. M., & Pfeifer, R. L. (1986). On-the-job experience and
skill at detecting deception. Journal of Applied Social Psychology,
16, 249–267. doi:10.1111/j.1559-1816.1986.tb01138.x.
DePaulo, B. M., Charlton, K., Cooper, H., Lindsay, J. J., &
Muhlenbruck, L. (1997). The accuracy-confidence correlation in
the detection of deception. Personality and Social Psychology
Review, 1, 346–357. doi:10.1207/s15327957pspr0104_5.
Drach, K. M., Wientzen, J., & Ricci, L. R. (2001). The diagnostic
utility of sexual behavior problems in diagnosing sexual abuse in
a forensic child abuse evaluation clinic. Child Abuse & Neglect,
25, 489–503. doi:10.1016/S0145-2134(01)00222-8.
Edelstein, R. S., Luten, T. L., Ekman, P., & Goodman, G. S. (2006).
Detecting lies in children and adults. Law and Human Behavior,
30, 1–10. doi:10.1007/s10979-006-9031-2.
Elaad, E. (2003). Effects of feedback on the overestimated capacity to
detect lies and the underestimated ability to tell lies. Applied
Cognitive Psychology, 17, 349–363. doi:10.1002/acp.871.
Elliott, D. M., & Briere, J. (1994). Forensic sexual abuse evaluations of
older children: Disclosures and symptomatology. Behavioral
Sciences and the Law, 12, 261–277. doi:10.1002/bsl.2370120306.
Everson, M. D., & Boat, B. W. (1989). False allegations of sexual
abuse by children and adolescents. Journal of the American
Academy of Child and Adolescent Psychiatry, 28, 230–235.
doi:10.1097/00004583-198903000-00014.
Faust, D., Hart, K., & Guilmette, T. J. (1988). Pediatric malingering:
The capacity of children to fake believable deficits on neuropsychological testing. Journal of Consulting and Clinical Psychology,
56, 578–582. doi:10.1037/0022-006X.56.4.578.
Finlayson, L. M., & Koocher, G. P. (1991). Professional judgment and
child abuse reporting in sexual abuse cases. Professional Psychology:
Research and Practice, 22, 464–472. doi:10.1037/0735-7028.22.6.464.
Fisher, C. B. (1995). American Psychological Association’s (1992)
Ethics Code and the validation of sexual abuse in day-care
settings. Psychology, Public Policy, and Law, 1, 461–478.
doi:10.1037/1076-8971.1.2.461.
Fisher, C. B., & Whiting, K. A. (1998). How valid are child sexual abuse
validations? In S. J. Ceci & H. Hembrooke (Eds.), Expert witnesses
in child abuse cases: What can and should be said in court (pp.
159–184). Washington, DC: American Psychological Association.
Franklin, B. (1785/1907). The writings of Benjamin Franklin (Vol. 9).
New York: Macmillan.
Frye v. United States, 293 F. 1013 (D.C. Cir. 1923).
Fukurai, H., & Butler, E. W. (1994). Sociologists in action: The
McMartin sexual abuse case, litigation, justice, and mass hysteria.
American Sociologist, 25, 44–71. doi:10.1007/BF02691989.
Garrido, E., Masip, J., & Herrero, C. (2004). Police officers’ credibility
judgments: Accuracy and estimated ability. International Journal
of Psychology, 39, 254–275. doi:10.1080/00207590344000411.
Garven, S., Wood, J. M., Malpass, R. S., & Shaw, J. S., III. (1998).
More than suggestion: The effect of interviewing techniques from
the McMartin Preschool case. Journal of Applied Psychology, 83,
347–359. doi:10.1037/0021-9010.83.3.347.
Goodman, G. S., Emery, R. E., & Haugaard, J. J. (1998).
Developmental psychology and law: Divorce, child maltreatment, foster care and adoption. In W. Damon, I. E. Sigel, & K. A.
Renninger (Eds.), Handbook of child psychology (5th ed., Vol. 4,
pp. 775–874). Hoboken, NJ: John Wiley & Sons Inc.
Griffin, D., & Brenner, L. (2004). Perspectives on probability
judgment calibration. In D. J. Koehler & N. Harvey (Eds.),
Blackwell handbook of judgment and decision making (pp. 177–
198). Malden, MA: Blackwell Publishing.
Haskett, M. E., Wayland, K., Hutcheson, J. S., & Tavana, T. (1995).
Substantiation of sexual abuse allegations: Factors involved in
the decision-making process. Journal of Child Sexual Abuse, 4,
19–47. doi:10.1300/J070v04n02_02.
Haynes, A., Weiser, T., Berry, W., Lipsitz, S., Breizat, A., Dellinger,
E., et al. (2009). A surgical safety checklist to reduce morbidity
and mortality in a global population. New England Journal of
Medicine, 360, 491–499. doi:10.1056/NEJMsa0810119.
Herman, S. (2005). Improving decision making in forensic child
sexual abuse evaluations. Law and Human Behavior, 29, 87–120.
doi:10.1007/s10979-005-1400-8.
Herman, S. (2009). Forensic child sexual abuse evaluations: Accuracy,
ethics, and admissibility. In K. Kuehnle & M. Connell (Eds.),
The evaluation of child sexual abuse allegations: A comprehensive guide to assessment and testimony. Hoboken, NJ: John
Wiley and Sons.
Hershkowitz, I., Fisher, S., Lamb, M. E., & Horowitz, D. (2007a).
Improving credibility assessment in child sexual abuse allegations: The role of the NICHD investigative interview protocol.
Child Abuse & Neglect, 31, 99–110. doi:10.1016/j.chiabu.2006.
09.005.
Hershkowitz, I., Horowitz, D., & Lamb, M. E. (2007b). Individual and
family variables associated with disclosure and nondisclosure of
child abuse in Israel. In M. E. Pipe, M. E. Lamb, Y. Orbach, & A.
C. Cederborg (Eds.), Child sexual abuse: Disclosure, delay, and
denial (pp. 65–75). Mahwah, NJ: Lawrence Erlbaum Associates.
Horner, T. M., & Guyer, M. J. (1991a). Prediction, prevention, and
clinical expertise in child custody cases in which allegations of
child sexual abuse have been made: I. Predictable rates of
diagnostic error in relation to various clinical decision making
strategies. Family Law Quarterly, 25, 217–252.
Horner, T. M., & Guyer, M. J. (1991b). Prediction, prevention, and
clinical expertise in child custody cases in which allegations of
child sexual abuse have been made: II. Prevalence rates of child
sexual abuse and the precision of “tests” constructed to diagnose
it. Family Law Quarterly, 25, 381–409.
Horner, T. M., Guyer, M. J., & Kalter, N. M. (1992). Prediction,
prevention, and clinical expertise in child custody cases in
which allegations of child sexual abuse have been made: III.
Studies of expert opinion formation. Family Law Quarterly,
26, 141–170.
Horner, T. M., Guyer, M. J., & Kalter, N. M. (1993). Clinical expertise
and the assessment of child sexual abuse. Journal of the
American Academy of Child and Adolescent Psychiatry, 32,
925–931. doi:10.1097/00004583-199309000-00006.
Horowitz, S. W., Lamb, M. E., Esplin, P. W., Boychuk, T., & Reiter-Lavery, L. (1995). Establishing the ground truth in studies of
child sexual abuse. Expert Evidence, 4, 42–51.
Humphrey, H. H. (1985). Report on Scott County investigations.
Minneapolis, MN: Minnesota Attorney General.
Johnson, J. (2004, May 1). Conviction tossed after 19 years: A
man’s molestation trial is nullified after several witnesses retract
testimony they gave as children. Los Angeles Times, p. B1.
Jones, D. P., & McGraw, J. M. (1987). Reliable and fictitious accounts
of sexual abuse to children. Journal of Interpersonal Violence, 2,
27–45. doi:10.1177/088626087002001002.
Kassin, S. M. (2002). Human judges of truth, deception, and
credibility: Confident but erroneous. Cardozo Law Review, 23,
809–816.
Keary, K., & Fitzpatrick, C. (1994). Children’s disclosure of sexual
abuse during formal investigation. Child Abuse & Neglect, 18,
543–548. doi:10.1016/0145-2134(94)90080-9.
Kennedy v. Louisiana, 128 S. Ct. 2641 (2008).
Kuehnle, K., & Connell, M. (Eds.). (2009). The evaluation of child
sexual abuse allegations: A comprehensive guide to assessment
and testimony. Hoboken, NJ: John Wiley & Sons Inc.
Kuehnle, K., & Sparta, S. N. (2006). Assessing child sexual abuse
allegations in a legal context. In S. N. Sparta & G. P. Koocher (Eds.),
Forensic mental health assessment of children and adolescents
(pp. 129–148). New York, NY: Oxford University Press.
Lamb, M. E. (1994). The investigation of child sexual abuse: An
interdisciplinary consensus statement. Child Abuse & Neglect,
18, 1021–1028. doi:10.1016/0145-2134(94)90127-9.
Lamb, M. E., Orbach, Y., Hershkowitz, I., Esplin, P. W., & Horowitz,
D. (2007). A structured forensic interview protocol improves the
quality and informativeness of investigative interviews with
children: A review of research using the NICHD Investigative
Interview Protocol. Child Abuse & Neglect, 31, 1201–1231.
doi:10.1016/j.chiabu.2007.03.021.
Lamb, M. E., Hershkowitz, I., Orbach, Y., & Esplin, P. W.
(2008). Tell me what happened: Structured investigative interviews of child victims and witnesses. Hoboken, NJ: John Wiley
and Sons.
Lamb, M. E., Orbach, Y., Sternberg, K. J., Aldridge, J., Pearson, S.,
Stewart, H. L., et al. (2009). Use of a structured investigative
protocol enhances the quality of investigative interviews with
alleged victims of child sexual abuse in Britain. Applied
Cognitive Psychology, 23, 449–467. doi:10.1002/acp.1489.
Leach, A., Talwar, V., Lee, K., Bala, N., & Lindsay, R. C. L. (2004).
“Intuitive” lie detection of children's deception by law enforcement officials and university students. Law and Human Behavior,
28, 661–685. doi:10.1007/s10979-004-0793-0.
Leichtman, M. D., & Ceci, S. J. (1995). The effects of stereotypes and
suggestions on preschoolers' reports. Developmental Psychology,
31, 568–578. doi:10.1037/0012-1649.31.4.568.
Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1982). Calibration of
probabilities: The state of the art to 1980. In D. Kahneman, P.
Slovic, & A. Tversky (Eds.), Judgment under uncertainty:
Heuristics and biases. New York: Cambridge University Press.
Mann, S., Vrij, A., & Bull, R. (2004). Detecting true lies: Police
officers’ ability to detect suspects’ lies. Journal of Applied
Psychology, 89, 137–149. doi:10.1037/0021-9010.89.1.137.
McGraw, J. M., & Smith, H. A. (1992). Child sexual abuse allegations
amidst divorce and custody proceedings: Refining the validation
process. Journal of Child Sexual Abuse, 1, 49–62. doi:10.1300/
J070v01n01_04.
Melton, G. B., & Limber, S. (1989). Psychologists’ involvement in cases
of child maltreatment: Limits of role and expertise. American
Psychologist, 44, 1225–1233. doi:10.1037/0003-066X.44.9.1225.
Milner, J. S. (1994). Assessing physical child abuse risk: The Child
Abuse Potential Inventory. Clinical Psychology Review, 14, 547–
583. doi:10.1016/0272-7358(94)90017-5.
Moxley, A. W. (1973). Clinical judgment: The effects of statistical
information. Journal of Personality Assessment, 37, 86–91.
Nathan, D., & Snedeker, M. (2001). Satan’s silence: Ritual abuse and
the making of a modern American witch hunt. Lincoln, NE:
Authors Choice Press.
National Children's Alliance. (2007). Press release. Retrieved June 1,
2008 from the National Children's Alliance Website: http://www.
nca-online.org/pages/page.asp?page_id=6835.
Oates, R. K., Jones, D. P., Denson, D., Sirotnak, A., Gary, N., &
Krugman, R. D. (2000). Erroneous concerns about child sexual
abuse. Child Abuse & Neglect, 24, 149–157. doi:10.1016/S0145-2134(99)00108-8.
Oskamp, S. (1965). Overconfidence in case-study judgments.
Journal of Consulting Psychology, 29, 261–265. doi:10.1037/
h0022125.
Pawloski, J. (2005). Abuse warning ignored. Albuquerque Journal.
Article citation retrieved on June 11, 2008 from the Albuquerque
Journal archive Website: http://www.abqjournal.com/archives/
search_newslib.htm.
Poole, D. A., & Lamb, M. E. (1998). Investigative interviews of
children: A guide for helping professionals. Washington, DC:
American Psychological Association.
Poole, D. A., & Lindsay, D. S. (1998). Assessing the accuracy of
young children's reports: Lessons from the investigation of child
sexual abuse. Applied & Preventive Psychology, 7, 1–26.
doi:10.1016/S0962-1849(98)80019-X.
Rabinowitz, D. (2003). No crueler tyrannies: Accusation, false
witness, and other terrors of our times. New York: Free Press.
Realmuto, G. M., & Wescoe, S. (1992). Agreement among
professionals about a child’s sexual abuse status: Interviews
with sexually anatomically correct dolls as indicators of abuse.
Child Abuse & Neglect, 16, 719–725. doi:10.1016/0145-2134(92)90108-4.
Realmuto, G. M., Jensen, J. B., & Wescoe, S. (1990). Specificity and
sensitivity of sexually anatomically correct dolls in substantiating
abuse: A pilot study. Journal of the American Academy of Child
and Adolescent Psychiatry, 29, 743–746. doi:10.1097/00004583-199009000-00011.
Robinson, B. A. (2005). “McMartin” ritual abuse cases in Manhattan
Beach, CA. Retrieved June 11, 2008 from the Ontario Consultants
on Religious Tolerance Website: http://www.religioustolerance.org/
ra_mcmar.htm.
Robinson, B. A. (2007). 42 Multi-victim/multi-offender court cases
with allegations of sexual and physical abuse of multiple
children. Retrieved June 11, 2008 from the Ontario Consultants
on Religious Tolerance Website: http://www.religioustolerance.
org/ra_case.htm.
Rosenthal, R. (1995). State of New Jersey v. Margaret Kelly Michaels:
An overview. Psychology, Public Policy, and Law, 1, 246–271.
doi:10.1037/1076-8971.1.2.246.
San Diego County Grand Jury. (1992). Child sexual abuse, assault,
and molest issues. San Diego, CA: Author.
San Diego County Grand Jury. (1994). Analysis of child molestation
issues. San Diego, CA: Author.
Schreiber, N., Bellah, L. D., Martinez, Y., McLaurin, K. A., Strok,
R., Garven, S., et al. (2006). Suggestive interviewing in the
McMartin Preschool and Kelly Michaels daycare abuse cases:
A case study. Social Influence, 1, 16–47. doi:10.1080/
15534510500361739.
Seattle Post-Intelligencer. (1998). Special report: A record of abuses in
Wenatchee. Retrieved July 10, 2004 from http://seattlepi.
nwsource.com/powertoharm/.
Shumaker, K. R. (2000). Measured professional competence
between and among different mental health disciplines when
evaluating and making recommendations in cases of suspected
child sexual abuse. Dissertation Abstracts International, 60,
5791B.
Soanes, C., & Stevenson, A. (Eds.). (2005). The Oxford dictionary of
English (2nd ed.). Oxford, UK: Oxford University Press.
State v. Brown, 297 Or 404 (1984).
State v. Southard, 347 Or 127 (2009).
Sternberg, K. J., Lamb, M. E., Davies, G. M., & Westcott, H. L.
(2001). The memorandum of good practice: Theory versus
application. Child Abuse & Neglect, 25, 669–681. doi:10.1016/
S0145-2134(01)00232-0.
Swarns, R. L., & Cooper, M. (1998). 5 held in sex abuse of several
children despite monitoring. New York Times. Retrieved June
11, 2008 from the New York Times Website: http://nytimes.
com.
Swets, J. A., Dawes, R. M., & Monahan, J. (2000). Psychological
science can improve diagnostic decisions. Psychological
Science in the Public Interest, 1, 1–26. doi:10.1111/1529-1006.001.
Talwar, V., Lee, K., Bala, N., & Lindsay, R. C. L. (2006). Adults’
judgments of children’s coached reports. Law and Human
Behavior, 30, 561–570. doi:10.1007/s10979-006-9038-8.
U.S. Department of Health and Human Services. (2009). Child
maltreatment 2007. Washington, DC: U.S. Government Printing
Office.
Vrij, A. (2002). Deception in children: A literature review and
implications for children's testimony. In H. L. Westcott, G. M.
Davies, & R. H. C. Bull (Eds.), Children's testimony: A
handbook of psychological research and forensic practice (pp.
175–194). Chichester: Wiley and Sons.
Vrij, A., & Mann, S. (2001). Who killed my relative? Police officers'
ability to detect real-life high-stake lies. Psychology, Crime &
Law, 7, 119–132. doi:10.1080/10683160108401791.
Vrij, A., & van Wijngaarden, J. J. (1994). Will the truth come out?
Two studies about the detection of false statements expressed by
children. Expert Evidence, 3, 78–83.
Vrij, A., Akehurst, L., Brown, L., & Mann, S. (2006). Detecting lies in
young children, adolescents and adults. Applied Cognitive
Psychology, 20, 1225–1237. doi:10.1002/acp.1278.
Waldman, H., & Jones, D. P. (2008). Why wasn't he stopped?
Hartford Courant, p. A1, February 24.
Walsh, B. (1993). The law enforcement response to child sexual abuse
cases. Journal of Child Sexual Abuse, 2, 117–121. doi:10.1300/
J070v02n03_11.
Wexler, R. (1990). Wounded innocents. Amherst, NY: Prometheus Books.
Wilson, K. (2007). Forensic interviewing in New Zealand. In M. E.
Pipe, M. E. Lamb, Y. Orbach, & A. C. Cederborg (Eds.), Child
sexual abuse: Disclosure, delay, and denial (pp. 265–280).
Mahwah, NJ: Lawrence Erlbaum Associates.
Wright, G. N., & Phillips, L. D. (1980). Cultural variation in
probabilistic thinking: Alternative ways of dealing with uncertainty. International Journal of Psychology, 15, 239–257.
doi:10.1080/00207598008246995.