TECNOLOGÍAS DE ALTO RENDIMIENTO EN GENÓMICA. Curso 2012
Real-Time Quantitative PCR Data Analysis
Josep Lluís Mosquera
UNITAT D’ESTADÍSTICA I BIOINFORMÀTICA

OUTLINE
● Recapitulation
● Normalization
  ○ Absolute Quantification
  ○ Relative Quantification
● Data Analysis Pipeline
● Software

RECAPITULATION (1): Basic Concepts
● RT-qPCR is a method for determining the amount of nucleic acid present in a sample.
● ∆Rn: increment of the fluorescent signal at each time point.
● Baseline: the cycles in which a signal is accumulating but is still beneath the limit of detection.
● Threshold: an arbitrary level of fluorescence, chosen on the basis of the baseline variability.
● Ct: the fractional PCR cycle number at which the fluorescence first exceeds the threshold.

RECAPITULATION (2): Basic Equations
● Target reporter fluorescence is determined by $R_{C_t} = R_0 \cdot (1 + E_{exp})^{C_t}$, where $R_0$ is the initial amount of target.
● Amplification efficiency (at threshold): $E_{exp} = 10^{-1/s} - 1$, where $s$ is the slope of the standard curve (Ct plotted against log10 of the input quantity).
● The fluorescence increase is proportional to the amount of target DNA: $I = k \cdot R_{C_t}$

PIPELINE OF RT-qPCR DATA ANALYSIS
1. Quality assessment
2. Normalization
3. Data visualization
4. Testing for statistical significance
5. Annotation/mapping of features

QUALITY ASSESSMENT

NORMALIZATION
When analyzing the results of RT-qPCR assays, you are faced with several uncontrolled variables, which can lead to misinterpretation of the results.
● Uncontrolled variation:
  ○ the amount of starting material
  ○ enzymatic efficiencies
  ○ differences between tissues, individuals, experimental conditions, …
● ⇒ Goal: correct systematic variation, BUT NOT biological variation.
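The RECAPITULATION definitions (threshold, fractional Ct) and the model $R_{C_t} = R_0 \cdot (1 + E_{exp})^{C_t}$ can be illustrated with a minimal sketch. It assumes an idealized, purely exponential signal $I_n = k \cdot R_0 \cdot (1 + E_{exp})^n$ with no baseline noise and no plateau, and reads the fractional Ct by interpolating on the log scale between the two cycles that bracket the threshold. Instrument software uses its own baseline and curve-fitting procedures, so the function names and the interpolation choice here are illustrative assumptions.

```python
import math


def amplification_curve(r0, e_exp, k=1.0, n_cycles=40):
    """Idealized signal I_n = k * R0 * (1 + E_exp)**n for cycles n = 1..n_cycles.

    Pure exponential model from the basic equations: no baseline noise, no plateau.
    """
    return [k * r0 * (1.0 + e_exp) ** n for n in range(1, n_cycles + 1)]


def fractional_ct(signal, threshold):
    """Fractional cycle at which the signal first exceeds the threshold.

    signal[i] is the reading after cycle i + 1. Interpolation is linear on the
    log scale between the two bracketing cycles, which is exact for a purely
    exponential curve. Returns None if no crossing is found.
    """
    for i in range(1, len(signal)):
        prev, curr = signal[i - 1], signal[i]
        if prev <= threshold < curr:
            frac = (math.log(threshold) - math.log(prev)) / (math.log(curr) - math.log(prev))
            return i + frac  # prev is cycle i, curr is cycle i + 1
    return None


def analytical_ct(r0, e_exp, threshold, k=1.0):
    """Solve threshold = k * R0 * (1 + E_exp)**Ct for Ct."""
    return math.log(threshold / (k * r0)) / math.log(1.0 + e_exp)


# Usage: at 100 % efficiency, halving R0 delays Ct by exactly one cycle.
for r0 in (2.0, 1.0):
    curve = amplification_curve(r0, e_exp=1.0)
    print(r0, round(fractional_ct(curve, threshold=2 ** 10.5), 3))  # prints 2.0 9.5, then 1.0 10.5
```

At 100 % efficiency a 2-fold difference in starting amount shifts Ct by one cycle, which is the relationship the $2^{-\Delta\Delta C_t}$ formula of relative quantification relies on.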
NORMALIZATION: Methods
The most commonly known and used methods of normalization are:
● normalization to the original number of cells
● normalization to the total RNA mass
● normalization to one or more housekeeping genes
● normalization to an internal or external calibrator

NORMALIZATION: Quantification Methods

NORMALIZATION: Biological Meaning and Quantification Methods
BIOLOGICAL QUESTIONS: If I’d like to know…
  1) the number of viral particles in a given amount of blood, or
  2) the fold change of p53 mRNA in an “equivalent amount” of cancerous vs. normal tissue,
ANALYSIS METHODS: … what can I do?
  1) Absolute quantification or 2) relative quantification, respectively, are commonly used to address these two scenarios.

ABSOLUTE QUANTIFICATION
● Absolute quantification requires a standard curve of known copy numbers.
● It can be constructed using several types of standards.
Figure: Most frequently used quantification standards. From the Nucleic Acid Research Group (NARG) survey 2007, http://www.abrf.org/NARG/

DATA ANALYSIS: Absolute Quantification. Standard Curve
● Absolute quantification is achieved by comparing the Ct value of each sample to a standard curve.
● The standard curve is obtained by:
  ○ using a series of different known concentrations,
  ○ calculating the Ct for each of them,
  ○ and plotting the Ct values against the (log) known quantity.
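The standard-curve procedure just described can be sketched as follows: fit Ct against log10(quantity) by least squares, derive $E_{exp} = 10^{-1/s} - 1$ from the slope, and read an unknown's quantity off the curve while refusing to extrapolate. This is a minimal sketch; the dilution series below is hypothetical (invented for illustration, not taken from the slides), and the helper names are illustrative.

```python
import math
from statistics import mean


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept.

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log10(q) for q in quantities]
    x_bar, y_bar = mean(xs), mean(cts)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, cts))
    ss_tot = sum((y - y_bar) ** 2 for y in cts)
    return slope, intercept, 1.0 - ss_res / ss_tot


def efficiency_from_slope(slope):
    """E_exp = 10**(-1/s) - 1; a slope near -3.32 gives roughly 100 % efficiency."""
    return 10 ** (-1.0 / slope) - 1.0


def quantity_from_ct(ct, slope, intercept, ct_range):
    """Read an unknown's quantity off the curve, refusing to extrapolate."""
    low, high = ct_range
    if not low <= ct <= high:
        raise ValueError("Ct lies outside the standard-curve range; extrapolation is not allowed")
    return 10 ** ((ct - intercept) / slope)


# Hypothetical 10-fold dilution series (illustrative numbers only).
standards = [1e2, 1e3, 1e4, 1e5, 1e6]
standard_cts = [33.0, 29.7, 26.4, 23.1, 19.8]

s, b, r2 = fit_standard_curve(standards, standard_cts)
print(f"slope={s:.2f} intercept={b:.2f} R^2={r2:.4f} E_exp={efficiency_from_slope(s):.1%}")
ct_range = (min(standard_cts), max(standard_cts))
print(f"unknown at Ct 25.0: {quantity_from_ct(25.0, s, b, ct_range):.0f} copies")
```

The explicit range check in `quantity_from_ct` encodes the rule that a standard curve is for interpolation only.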
DATA ANALYSIS: Absolute Quantification. Standard Calibration Curve
EXAMPLE: Determining absolute copy number from absolute quantification

SAMPLE   REPLICATE   Ct      COPIES
A        1           18.61   204.577
A        2           18.41   234.115
A        3           19.87   172.300
A        Average             203.664 ± 30.917
B        1           17.06   564.789
B        2           17.07   563.823
B        3           17.00   591.173
B        Average             574.928 ± 14.381

● The standard curve should be used only for interpolation, not for extrapolation: the relationship may not be linear outside the range tested.

RELATIVE QUANTIFICATION (1)
● Relative quantification is the most widely used technique.
● Gene expression levels are calculated as the ratio between the amount of the target gene and that of an endogenous reference gene, which is present in all samples.
● The reference gene has to be chosen so that its expression does not change under the experimental conditions or between different tissues (Cook NL et al., 2008).
● There are simple and more complex methods for relative quantification, depending on the PCR efficiency and on the number of reference genes used.

RELATIVE QUANTIFICATION (2)
The most common approaches are:
● the Livak or ∆∆Ct method
● the Pfaffl method
● the relative standard curve method

RELATIVE QUANTIFICATION: Delta delta Ct (∆∆Ct) method
● The simplest method: a direct comparison of the Ct values of the target gene vs. the reference gene.
● The PCR efficiencies of both genes should
  ○ be close to 100 %, and
  ○ not differ from each other by more than 10 %.
● It involves choosing a calibrator sample, e.g.:
  ○ the untreated sample,
  ○ the time = 0 sample, or
  ○ any sample you want to compare your unknowns to.
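The two efficiency conditions for the ∆∆Ct method can be checked from the standard-curve slopes of the target and reference genes. This is a minimal sketch under two stated assumptions that the slides do not spell out: "not differ by more than 10 %" is read as an absolute difference of at most 0.10 on the $E_{exp}$ scale, and "close to 100 %" as both efficiencies falling in a hypothetical 90 to 110 % window. The slopes in the usage lines are hypothetical.

```python
def efficiency_from_slope(slope: float) -> float:
    """E_exp = 10**(-1/s) - 1, with s the standard-curve slope (Ct vs log10 quantity)."""
    return 10 ** (-1.0 / slope) - 1.0


def ddct_is_appropriate(slope_target: float, slope_reference: float,
                        max_difference: float = 0.10,
                        acceptable: tuple = (0.90, 1.10)) -> tuple:
    """Check the two efficiency conditions of the ddCt method.

    ASSUMPTIONS (not stated on the slides): "not differ by more than 10 %" is
    read as |E_target - E_reference| <= max_difference on the E_exp scale, and
    "close to 100 %" as both efficiencies lying inside the hypothetical
    `acceptable` window.
    Returns (ok, e_target, e_reference).
    """
    e_target = efficiency_from_slope(slope_target)
    e_reference = efficiency_from_slope(slope_reference)
    low, high = acceptable
    close_to_100 = low <= e_target <= high and low <= e_reference <= high
    similar = abs(e_target - e_reference) <= max_difference
    return close_to_100 and similar, e_target, e_reference


# Hypothetical slopes: the first pair passes, the second calls for the Pfaffl method.
for slopes in [(-3.32, -3.40), (-3.32, -3.90)]:
    ok, e_t, e_r = ddct_is_appropriate(*slopes)
    print(slopes, f"E_target={e_t:.1%} E_reference={e_r:.1%}", "use ddCt" if ok else "use Pfaffl")
```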
RELATIVE QUANTIFICATION: Delta delta Ct (∆∆Ct) method
1) Normalize the Ct of the target gene to that of the reference gene, for each sample:
   $\Delta C_t = C_{t,\text{target}} - C_{t,\text{reference}}$
2) Normalize the ∆Ct of the test sample to the ∆Ct of the calibrator:
   $\Delta\Delta C_t = (C_{t,\text{target}} - C_{t,\text{reference}})_{\text{test}} - (C_{t,\text{target}} - C_{t,\text{reference}})_{\text{calibrator}}$
3) Calculate the fold difference in expression:
   $2^{-\Delta\Delta C_t}$ = normalized expression ratio

RELATIVE QUANTIFICATION: Delta delta Ct (∆∆Ct) method
EXAMPLE:
SAMPLE                 Ct p53 (target)   Ct GAPDH (reference)
Control (calibrator)   15.0              16.5
Tumor (test)           12.0              15.9

1) $\Delta C_{t,\text{calibrator}} = 15.0 - 16.5 = -1.5$ and $\Delta C_{t,\text{test}} = 12.0 - 15.9 = -3.9$
2) $\Delta\Delta C_t = \Delta C_{t,\text{test}} - \Delta C_{t,\text{calibrator}} = -3.9 - (-1.5) = -2.4$
3) $2^{-\Delta\Delta C_t} = 2^{-(-2.4)} = 2^{2.4} \approx 5.3$
Tumor cells express p53 at a 5.3-fold higher level than control cells.

RELATIVE QUANTIFICATION: Pfaffl Method
● If the PCR efficiencies of the reference gene and the target gene differ by more than 10 %, the ∆∆Ct method is inaccurate.
● In that case the relative quantity is calculated with the Pfaffl method:
   $RQ = \dfrac{E_{\text{target}}^{\Delta C_{t,\text{target}}(\text{calibrator} - \text{test})}}{E_{\text{reference}}^{\Delta C_{t,\text{reference}}(\text{calibrator} - \text{test})}}$
   where $E_{\text{gene}}$ is the amplification efficiency of each gene (gene = target or reference), expressed as the per-cycle amplification factor $E = 1 + E_{exp} = 10^{-1/s}$ (E = 2 at 100 % efficiency), and
   $\Delta C_{t,\text{gene}}(\text{calibrator} - \text{test}) = C_{t,\text{gene}}(\text{calibrator}) - C_{t,\text{gene}}(\text{test})$
● With E = 2 for both genes, the ratio reduces to $2^{-\Delta\Delta C_t}$.

RELATIVE QUANTIFICATION: Relative Standard Curve Method
● It is used to determine the change in amount of a target in a test sample relative to another, internal control sample (the calibrator).
● It does NOT require standards with known concentrations.
1) Normalize the target gene to the reference gene:
   Test sample: $\dfrac{Qty_{\text{target}}(\text{test})}{Qty_{\text{reference}}(\text{test})}$   Calibrator: $\dfrac{Qty_{\text{target}}(\text{calibrator})}{Qty_{\text{reference}}(\text{calibrator})}$
2) Normalize the test sample to the calibrator:
   $RQ = \dfrac{Qty_{\text{target}}(\text{test}) / Qty_{\text{reference}}(\text{test})}{Qty_{\text{target}}(\text{calibrator}) / Qty_{\text{reference}}(\text{calibrator})}$
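The three relative quantification formulas above can be written as small functions and checked against the p53/GAPDH example. This is a minimal sketch: samples are passed as (Ct_target, Ct_reference) or (Qty_target, Qty_reference) pairs, a layout chosen here for illustration, and the 95 % target efficiency and the relative quantities in the tests are hypothetical.

```python
def delta_ct(ct_target, ct_reference):
    """dCt = Ct_target - Ct_reference for one sample."""
    return ct_target - ct_reference


def livak_fold_change(test, calibrator):
    """2**(-ddCt). `test` and `calibrator` are (Ct_target, Ct_reference) pairs.

    Only valid when both efficiencies are close to 100 % (amplification factor 2).
    """
    ddct = delta_ct(*test) - delta_ct(*calibrator)
    return 2.0 ** (-ddct)


def pfaffl_ratio(test, calibrator, e_target, e_reference):
    """RQ = E_target**dCt_target(cal - test) / E_reference**dCt_reference(cal - test).

    e_target and e_reference are amplification factors 1 + E_exp (2.0 at 100 %).
    """
    dct_target = calibrator[0] - test[0]
    dct_reference = calibrator[1] - test[1]
    return e_target ** dct_target / e_reference ** dct_reference


def relative_standard_curve_rq(qty_test, qty_calibrator):
    """RQ = (Qty_target / Qty_reference)_test / (Qty_target / Qty_reference)_calibrator."""
    return (qty_test[0] / qty_test[1]) / (qty_calibrator[0] / qty_calibrator[1])


# Slide example: p53 (target) vs GAPDH (reference).
control = (15.0, 16.5)  # calibrator
tumor = (12.0, 15.9)    # test
print(round(livak_fold_change(tumor, control), 1))        # 5.3
print(round(pfaffl_ratio(tumor, control, 2.0, 2.0), 1))   # 5.3: Pfaffl with E = 2 equals 2**(-ddCt)
print(round(pfaffl_ratio(tumor, control, 1.95, 2.0), 1))  # hypothetical 95 % target efficiency
```

The second line shows why ∆∆Ct is a special case of Pfaffl: when both amplification factors equal 2, the exponents combine into $-\Delta\Delta C_t$.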
DATA ANALYSIS: Relative Standard Curve Method
Figure: Example of the relative standard curve method.

NORMALIZATION: Other Methods
There are many other normalization methods, among them:
● Geometric mean: calculates the average Ct value for each sample and scales all Ct values according to the ratio of these mean Ct values across samples.
● Scale rank invariant: computes the pairwise rank-invariant features, keeps only the features found in a certain number of samples, and uses the average Ct value of those as a scaling factor for correcting all Ct values.
● Normal rank invariant: computes all rank-invariant sets of features between pairwise comparisons of each sample against a reference, such as a pseudo-mean. The rank-invariant features are used as a reference for generating a smoothing curve, which is then applied to the entire sample.
● Quantile: makes the distribution of Ct values more or less identical across samples.

STATISTICAL ANALYSIS
● Two main types of analyses:
  ○ Comparative analyses: relatively rigorous; test a predefined hypothesis; rely on statistical testing.
  ○ Expression profiling: searches for trends and patterns in the data; an exploratory, hypothesis-generating approach; less rigorous; e.g., cluster analysis or PCA.

STATISTICAL ANALYSIS: Basic Premises
● Statistical analyses of RT-qPCR data rely on three assumptions:
  ○ genes are analyzed one at a time;
  ○ we are sampling from two different (unknown) independent populations;
  ○ there exist unknown mechanisms that contribute to variability.
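Two of the normalization methods listed under NORMALIZATION: Other Methods, geometric mean scaling and quantile normalization, are simple enough to sketch directly on Ct values. This is a minimal sketch, not the implementation of any particular package: the per-sample summary (geometric mean of the Ct values), the common reference (the average of those summaries), and the positional tie-breaking in the quantile step are all assumptions, and the Ct values are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geometric_mean_scaling(samples):
    """Scale each sample's Ct values by the ratio of mean Ct values across samples.

    `samples` is a list of equal-length Ct lists (one list per sample).
    ASSUMPTION: the per-sample summary is the geometric mean of its Ct values and
    the common reference is the average of those summaries; dedicated packages
    may choose a different reference (for example, one particular sample).
    """
    summaries = [geometric_mean(cts) for cts in samples]
    reference = sum(summaries) / len(summaries)
    return [[ct * reference / summary for ct in cts]
            for cts, summary in zip(samples, summaries)]


def quantile_normalize(samples):
    """Give every sample the same Ct distribution (the mean of the sorted samples).

    Ties are broken by position, which is a simplification.
    """
    n_features = len(samples[0])
    sorted_samples = [sorted(cts) for cts in samples]
    rank_means = [sum(s[r] for s in sorted_samples) / len(samples)
                  for r in range(n_features)]
    normalized = []
    for cts in samples:
        order = sorted(range(n_features), key=lambda i: cts[i])  # feature indices by rank
        out = [0.0] * n_features
        for rank, feature in enumerate(order):
            out[feature] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical Ct values for three features in two samples.
ct_data = [[20.0, 22.0, 24.0], [22.0, 24.2, 26.4]]
print(geometric_mean_scaling(ct_data))
print(quantile_normalize(ct_data))
```

Both functions act on Ct values directly, as the method descriptions do; the second sample in the usage data is the first scaled by 1.1, so geometric mean scaling makes the two samples identical.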
STATISTICAL ANALYSIS: From Assumptions to Strategies
● Use random sampling and randomization to obtain independent and representative samples.
● Apply experimental design principles to minimize confounding variability.
● Perform statistical testing.
  ○ DO NOT FORGET multiple testing adjustments.
● Standard statistical approach:
  ○ confirmatory study ⇒ reject or accept a predefined hypothesis.

STATISTICAL ANALYSIS: Comparing Two Groups

STATISTICAL ANALYSIS: Comparing More Than Two Groups

SOFTWARE
SOURCE         SOFTWARE
ABI            DataAssist, GeneExpression
Biogazelle     REST – Relative Expression Software Tool
Bioconductor   HTqPCR, ddCt, …
Integromics    StatMiner
bioMCC         GenEx

UEB CAN HELP YOU…

REMEMBER!!!!
“To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.”
Sir Ronald A. Fisher¹
¹ Father of modern mathematical statistics and developer of experimental design and ANOVA.
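The "perform statistical testing" and "multiple testing adjustments" strategies can be sketched without external libraries: an exact two-group permutation test on ∆Ct values, one gene at a time, followed by Benjamini-Hochberg adjustment across genes. The permutation test is a swapped-in choice to keep the sketch dependency-free; a t-test or Mann-Whitney test from a statistics package is the more common route. Gene names and ∆Ct values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means.

    Enumerates every way of relabelling the pooled values, so it is only
    practical for small groups (a handful of samples each).
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        if abs(mean(a) - mean(b)) >= observed - 1e-12:  # tolerance for float ties
            extreme += 1
    return extreme / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical dCt values (Ct_target - Ct_reference): (control, treated) per gene.
delta_cts = {
    "gene_1": ([-1.5, -1.2, -1.8, -1.4], [-3.9, -3.5, -4.1, -3.7]),
    "gene_2": ([2.0, 2.3, 1.9, 2.1], [2.2, 1.8, 2.4, 2.0]),
    "gene_3": ([0.5, 0.9, 0.7, 0.6], [-0.4, 0.1, -0.2, 0.0]),
}
raw = {gene: permutation_p_value(c, t) for gene, (c, t) in delta_cts.items()}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
for gene in delta_cts:
    print(gene, round(raw[gene], 4), round(adjusted[gene], 4))
```

With four samples per group there are only 70 relabellings, so the smallest attainable p-value is 2/70; small designs limit what any test can detect, which is one reason to involve the statistician before the experiment.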