TECNOLOGÍAS DE ALTO RENDIMIENTO EN GENÓMICA. Curso 2012
Real-Time Quantitative PCR Data Analysis
Josep Lluís Mosquera
UNITAT D’ESTADÍSTICA I BIOINFORMÀTICA
OUTLINE
● Recapitulation
● Normalization
○ Absolute Quantification
○ Relative Quantification
● Data Analysis Pipeline
● Software
RECAPITULATION (1): Basic Concepts
● RT-qPCR is a method for determining the amount of
nucleic acid present in a sample.
● ∆Rn: increment of fluorescent signal at each time point.
● Baseline: cycles in which a signal is accumulating but is beneath the limits of detection.
● Threshold: arbitrary level of fluorescence chosen on the basis of the baseline variability.
● Ct: the fractional PCR cycle number at which the fluorescence is greater than the threshold.
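The definitions above can be illustrated with a small sketch. Everything specific in it is an assumption made for illustration and does not come from the slides: the synthetic logistic curve, the baseline window (cycles 3 to 15), the rule "baseline mean plus 10 standard deviations" for the threshold, and the hypothetical helper `fractional_ct`, which interpolates linearly between the two cycles that bracket the threshold.

```python
import math
import statistics


def fractional_ct(delta_rn, threshold):
    """Return the fractional cycle at which delta_rn first exceeds threshold.

    delta_rn[i] is the signal at cycle i + 1. Linear interpolation between
    the two bracketing cycles is an assumption of this sketch.
    """
    for i in range(1, len(delta_rn)):
        lo, hi = delta_rn[i - 1], delta_rn[i]
        if lo <= threshold < hi:
            # lo belongs to cycle i, hi to cycle i + 1 (cycles are 1-based)
            return i + (threshold - lo) / (hi - lo)
    return None  # the threshold was never crossed


# Synthetic amplification curve (hypothetical data): logistic growth plus a
# small deterministic ripple standing in for baseline noise.
cycles = range(1, 41)
curve = [1.0 / (1.0 + math.exp(-(c - 25) / 1.5)) + 0.001 * math.sin(c)
         for c in cycles]

baseline = curve[2:15]  # cycles 3..15 (assumed baseline window)
threshold = statistics.mean(baseline) + 10 * statistics.stdev(baseline)
ct = fractional_ct(curve, threshold)
print(f"threshold = {threshold:.4f}, Ct = {ct:.2f}")
```

Real instruments use their own baseline and threshold settings, so the numbers printed here only show the mechanics.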
RECAPITULATION (2): Basic Equations
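● Target reporter fluorescence is determined by
$R_{Ct} = R_0 \cdot (1 + E_{exp})^{Ct}$
● Amplification efficiency (at threshold)
$E_{exp} = 10^{(-1/s)} - 1$
where $s$ is the slope of the standard curve (Ct vs. log10 of the quantity).
● The fluorescence increase is proportional to the amount of target DNA
$I = k \cdot R_{Ct}$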
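A minimal sketch of the two equations above. The function names are hypothetical, and the slope value is chosen only because it corresponds to a doubling per cycle; it is not taken from the slides.

```python
def efficiency_from_slope(slope):
    """E_exp = 10^(-1/s) - 1, with s the slope of Ct vs. log10(quantity)."""
    return 10 ** (-1.0 / slope) - 1.0


def reporter_fluorescence(r0, efficiency, cycle):
    """R_Ct = R0 * (1 + E_exp)^Ct."""
    return r0 * (1.0 + efficiency) ** cycle


# A slope of about -3.32 corresponds to E_exp close to 1 (100 % efficiency).
e_exp = efficiency_from_slope(-3.3219)
print(f"E_exp = {e_exp:.3f}")

# With E_exp = 1 the signal doubles every cycle: 2^10 = 1024.
print(reporter_fluorescence(1.0, 1.0, 10))
```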
PIPELINE OF RT-QPCR DATA ANALYSIS
1. Quality assessment
2. Normalization
3. Data visualization
4. Testing for statistical significance
5. Annotation/Mapping features
QUALITY ASSESSMENT
NORMALIZATION
When analyzing the results of RT-qPCR assays you are faced with several uncontrolled variables, which can lead to misinterpretation of the results.
Uncontrolled variation:
● The amount of starting material
● Enzymatic efficiencies
● Differences between tissues, individuals, experimental conditions
● …
⇒ NORMALIZATION: to correct systematic variation BUT NOT biological variation
NORMALIZATION: Methods
The most commonly known and used methods of normalization:
● Normalization to the original number of cells
● Normalization to the total RNA mass
● Normalization to one or more housekeeping genes
● Normalization to an internal or external calibrator
NORMALIZATION: Quantification Methods
NORMALIZATION: Biological Meaning and Quantification Methods
BIOLOGICAL QUESTIONS: If I’d like to know…
1) the number of viral particles in a given amount of blood, or
2) the fold change of p53 mRNA in an “equivalent amount” of cancerous vs. normal tissue
ANALYSIS METHODS: … what can I do?
1) Absolute Quantification, or
2) Relative Quantification
… are commonly used to address these two scenarios
ABSOLUTE QUANTIFICATION
● Absolute quantification requires a standard curve of known copy numbers
● It can be constructed using several standards
Figure: Most frequently used quantification standards. From the Nucleic Acid Research Group (NARG) survey 2007, http://www.abrf.org/NARG/
DATA ANALYSIS: Absolute Quantification. Standard Curve
● Absolute quantification is achieved by comparing the Ct values of each sample to a standard curve
● The standard curve is obtained by
○ using different known concentrations,
○ for which Ct values are calculated
○ and plotted vs. the log of the known quantity
DATA ANALYSIS: Absolute Quantification. Standard Calibration Curve
EXAMPLE:
● Determining Absolute Copy Number from Absolute Quantification
SAMPLE   REPLICATE   Ct      COPIES
A        1           18.61   204.577
A        2           18.41   234.115
A        3           19.87   172.300
A        Average             203.664 ± 30.917
B        1           17.06   564.789
B        2           17.07   563.823
B        3           17.00   591.173
B        Average             574.928 ± 14.381
● The standard curve is used only for interpolation but not for
extrapolation (relation may not be linear outside the limits tested)
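The standard-curve procedure can be sketched as follows. The dilution series (`std_qty`, `std_ct`) is hypothetical and is not the curve behind the table, so the interpolated copy number will not match the COPIES column. The function names are also illustrative. The last lines summarize the three sample A replicates from the table as mean ± standard deviation.

```python
import math
import statistics


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = intercept + slope * log10(quantity)."""
    xs = [math.log10(q) for q in quantities]
    mx, my = statistics.mean(xs), statistics.mean(cts)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, my - slope * mx


def copies_from_ct(ct, slope, intercept, ct_range):
    """Interpolate a quantity; refuse to extrapolate beyond the standards."""
    lo, hi = ct_range
    if not lo <= ct <= hi:
        raise ValueError(f"Ct {ct} is outside the standard curve ({lo}..{hi})")
    return 10 ** ((ct - intercept) / slope)


# Hypothetical 10-fold dilution series (assumed values, not from the slides)
std_qty = [1e1, 1e2, 1e3, 1e4, 1e5]
std_ct = [23.3, 20.0, 16.6, 13.3, 10.0]
slope, intercept = fit_standard_curve(std_qty, std_ct)
ct_range = (min(std_ct), max(std_ct))

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"Ct 18.61 -> {copies_from_ct(18.61, slope, intercept, ct_range):.1f}")

# Replicate summary for sample A, using the COPIES values of the table
sample_a = [204.577, 234.115, 172.300]
mean_a, sd_a = statistics.mean(sample_a), statistics.stdev(sample_a)
print(f"{mean_a:.3f} ± {sd_a:.3f}")
```

Raising an error for a Ct outside the range of the standards is one way to enforce the interpolation-only rule stated above.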
RELATIVE QUANTIFICATION (1)
● Relative quantification is the most widely used technique.
● Gene expression levels are calculated by the ratio between the
amount of target gene and an endogenous reference gene, which is
present in all samples.
● The reference gene has to be chosen so that its expression does
not change under the experimental conditions or between
different tissues (Cook NL et al., 2008).
● There are simple and more complex methods for relative
quantification, depending on the PCR efficiency, and the number of
reference genes used.
RELATIVE QUANTIFICATION (2)
The most common approaches are:
● Livak or ∆∆Ct method
● Pfaffl method
● Relative Standard Curve method
RELATIVE QUANTIFICATION: Delta delta Ct (∆∆Ct) method
● The simplest one: a direct comparison of Ct values, target gene vs
reference gene.
● The PCR efficiencies of both genes should
• be close to 100 % and
• not differ by more than 10 %.
● Involves the choice of a calibrator sample:
• the untreated sample,
• the time = 0 sample, or
• any sample you want to compare your unknown to.
RELATIVE QUANTIFICATION: Delta delta Ct (∆∆Ct) method
1) Normalize the Ct of the target gene to the Ct of the reference gene, for each sample
$\Delta Ct = Ct_{target} - Ct_{reference}$
2) Normalize the ∆Ct of the test sample to the ∆Ct of the calibrator
$\Delta\Delta Ct = (Ct_{target} - Ct_{reference})_{test} - (Ct_{target} - Ct_{reference})_{calibrator}$
3) Calculate the fold difference in expression
$2^{-\Delta\Delta Ct}$ = normalized expression ratio
RELATIVE QUANTIFICATION: Delta delta Ct (∆∆Ct) method
EXAMPLE:
SAMPLE                 Ct p53 (target)   Ct GAPDH (reference)
Control (calibrator)   15.0              16.5
Tumor (test)           12.0              15.9
1) $\Delta Ct_{calibrator} = 15.0 - 16.5 = -1.5$ and $\Delta Ct_{test} = 12.0 - 15.9 = -3.9$
2) $\Delta\Delta Ct = \Delta Ct_{test} - \Delta Ct_{calibrator} = -3.9 - (-1.5) = -2.4$
3) $2^{-\Delta\Delta Ct} = 2^{-(-2.4)} = 5.3$
Tumor cells express p53 at a 5.3-fold higher level than control cells
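The worked example can be reproduced with a few lines of code. This is a minimal sketch: the function name `delta_delta_ct` is hypothetical, and the base 2 assumes 100 % efficiency for both genes, as the method requires.

```python
def delta_delta_ct(ct_target_test, ct_ref_test, ct_target_cal, ct_ref_cal):
    """Return (ddCt, fold change) following the three steps of the method."""
    d_ct_test = ct_target_test - ct_ref_test   # step 1, test sample
    d_ct_cal = ct_target_cal - ct_ref_cal      # step 1, calibrator
    dd_ct = d_ct_test - d_ct_cal               # step 2
    return dd_ct, 2.0 ** (-dd_ct)              # step 3


# Ct values of the p53 / GAPDH example
dd_ct, fold = delta_delta_ct(12.0, 15.9, 15.0, 16.5)
print(f"ddCt = {dd_ct:.1f}, fold change = {fold:.1f}")
# prints: ddCt = -2.4, fold change = 5.3
```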
RELATIVE QUANTIFICATION: Pfaffl Method
● If the PCR efficiencies of the reference gene and the target gene differ by more than 10 % ⇒ the ∆∆Ct method is inaccurate
● The relative quantity (RQ) is then calculated with the Pfaffl method
$RQ = \frac{E_{target}^{\Delta Ct_{target}(calibrator - test)}}{E_{reference}^{\Delta Ct_{reference}(calibrator - test)}}$
where
$E_{gene}$ is the efficiency of the gene (gene = target or reference), expressed as the amplification factor per cycle, $E_{gene} = 1 + E_{exp}$ (2 for 100 % efficiency)
$\Delta Ct_{gene}(calibrator - test) = Ct_{gene}(calibrator) - Ct_{gene}(test)$
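A minimal sketch of the Pfaffl ratio, with a hypothetical function name. It takes E as the amplification factor per cycle (2 for 100 % efficiency). The first call reuses the Ct values of the p53 / GAPDH example and shows that the formula reduces to the ∆∆Ct result when both factors are 2. The efficiencies in the second call are invented for illustration.

```python
def pfaffl_ratio(e_target, e_reference,
                 ct_target_cal, ct_target_test,
                 ct_ref_cal, ct_ref_test):
    """RQ = E_target^dCt_target / E_reference^dCt_reference.

    Both dCt terms are (calibrator - test); E is the amplification
    factor per cycle (2.0 means perfect doubling).
    """
    d_target = ct_target_cal - ct_target_test
    d_reference = ct_ref_cal - ct_ref_test
    return e_target ** d_target / e_reference ** d_reference


# Same Ct values as the ddCt example; with E = 2 for both genes the
# result equals 2^2.4, i.e. about 5.3.
rq_ideal = pfaffl_ratio(2.0, 2.0, 15.0, 12.0, 16.5, 15.9)
print(f"RQ (E = 2 for both genes) = {rq_ideal:.2f}")

# Hypothetical unequal efficiencies (assumed values)
rq_unequal = pfaffl_ratio(1.95, 1.80, 15.0, 12.0, 16.5, 15.9)
print(f"RQ (E_target = 1.95, E_reference = 1.80) = {rq_unequal:.2f}")
```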
RELATIVE QUANTIFICATION: Relative Standard Curve Method
● It is used to determine changes in amount of a given sample
relative to another, internal, control sample
● Does NOT require standards with known concentrations
1) Normalize the target gene to the reference gene
$Sample_{test} = \frac{Qty_{target}(test)}{Qty_{reference}(test)}$
$Calibrator = \frac{Qty_{target}(calibrator)}{Qty_{reference}(calibrator)}$
2) Normalize the test sample to the calibrator
$RQ = \frac{Qty_{target}(test) / Qty_{reference}(test)}{Qty_{target}(calibrator) / Qty_{reference}(calibrator)}$
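The two normalization steps can be sketched as follows. The curve parameters (slope and intercept per gene, in arbitrary relative dilution units) are hypothetical, as are the function names. The Ct values are reused from the p53 / GAPDH example only to have concrete inputs.

```python
def quantity_from_curve(ct, slope, intercept):
    """Relative quantity read off a curve Ct = intercept + slope * log10(Qty)."""
    return 10 ** ((ct - intercept) / slope)


def relative_quantity(qty_target_test, qty_ref_test,
                      qty_target_cal, qty_ref_cal):
    """Step 1: target / reference per sample. Step 2: test / calibrator."""
    sample_test = qty_target_test / qty_ref_test
    calibrator = qty_target_cal / qty_ref_cal
    return sample_test / calibrator


# Hypothetical relative standard curves, one per gene (assumed values)
target_curve = {"slope": -3.4, "intercept": 30.0}
reference_curve = {"slope": -3.3, "intercept": 28.0}

rq = relative_quantity(
    quantity_from_curve(12.0, **target_curve),     # target, test
    quantity_from_curve(15.9, **reference_curve),  # reference, test
    quantity_from_curve(15.0, **target_curve),     # target, calibrator
    quantity_from_curve(16.5, **reference_curve),  # reference, calibrator
)
print(f"RQ = {rq:.2f}")
```

Because only ratios enter the result, the units of the dilution series cancel, which is why no standards of known concentration are needed.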
DATA ANALYSIS: Relative Standard Curve Method
Example:
NORMALIZATION: Other Methods
● There are many other normalization methods, among them:
○ Geometric mean: calculates the average Ct value for each sample, and scales all Ct values according to the ratio of these mean Ct values across samples.
○ Scale rank invariant: computes the pairwise rank-invariant features, but then takes only the features found in a certain number of samples, and uses the average Ct value of those as a scaling factor for correcting all Ct values.
○ Normal rank invariant: computes all rank-invariant sets of features between pairwise comparisons of each sample against a reference, such as a pseudo-mean. The rank-invariant features are used as a reference for generating a smoothing curve, which is then applied to the entire sample.
○ Quantile: makes the distribution of Ct values more or less identical across samples.
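As an illustration of the last item, here is a minimal sketch of quantile normalization on a small, made-up set of Ct values. It is a textbook version written for this example, not the implementation of any of the packages mentioned in these slides. Ties are broken by position, which is a simplification.

```python
def quantile_normalize(ct_by_sample):
    """Give every sample the same Ct distribution.

    ct_by_sample maps a sample name to a list of Ct values, with the
    same features in the same order for every sample.
    """
    samples = list(ct_by_sample)
    n = len(ct_by_sample[samples[0]])
    sorted_cols = [sorted(ct_by_sample[s]) for s in samples]
    # Mean of the k-th smallest Ct across all samples
    rank_means = [sum(col[k] for col in sorted_cols) / len(samples)
                  for k in range(n)]
    normalized = {}
    for s in samples:
        values = ct_by_sample[s]
        order = sorted(range(n), key=lambda i: values[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]  # replace value by the mean at its rank
        normalized[s] = out
    return normalized


# Hypothetical Ct values for three features in two samples
ct = {"s1": [20.0, 25.0, 30.0], "s2": [22.0, 31.0, 26.0]}
norm = quantile_normalize(ct)
print(norm)
# prints: {'s1': [21.0, 25.5, 30.5], 's2': [21.0, 30.5, 25.5]}
```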
STATISTICAL ANALYSIS
● Two main types of analyses
● Comparative analyses:
○ Relatively rigorous
○ Check a predefined hypothesis
○ Rely on statistical testing
● Expression profiling:
○ Search for trends and patterns in the data
○ Exploratory, hypothesis-generating approach
○ Less rigorous
○ Cluster analysis or PCA
STATISTICAL ANALYSIS : Basic Premises
● Statistical analyses of RT-qPCR data rely on three assumptions:
○ One gene at a time
○ We are sampling from two different (unknown) independent populations
○ There exist unknown mechanisms that contribute to variability
STATISTICAL ANALYSIS: From Assumptions to Strategies
● Use random sampling and randomization to obtain independent and representative samples
● Apply experimental design principles to minimize confounding variability
● Perform statistical testing
● DO NOT FORGET about multiple testing adjustments
● Standard statistical approach:
○ Confirmatory study ⇒ reject or accept a predefined hypothesis
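To make the multiple testing point concrete, the sketch below adjusts a list of per-gene p-values with the Benjamini-Hochberg procedure. The slides do not name a specific adjustment, so this choice is an assumption for illustration, and the p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, one per gene tested
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"p = {p:.3f} -> adjusted = {q:.3f}")
```

With four genes tested, a raw p-value of 0.005 becomes 0.02 after adjustment, which shows why one-gene-at-a-time testing needs this correction.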
STATISTICAL ANALYSIS: Comparing Two Groups
STATISTICAL ANALYSIS: Comparing More Than Two Groups
SOFTWARE
SOURCE   SOFTWARE
ABI DataAssist
GeneExpression
Biogazelle REST – Relative Expression Software Tool
Bioconductor HTqPCR, ddCt,…
Integromics StatMiner
bioMCC GenEx
UEB CAN HELP YOU…
REMEMBER!!!!
“To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.”
Sir Ronald A. Fisher (1)
(1) Father of modern Mathematical Statistics and developer of Experimental Design and ANOVA.