Shapiro-Wilk Test: Normality Testing Explained

Shapiro-Wilk Original Test Basic Concepts We present the original approach to performing the Shapiro-Wilk Test. This approach is limited to samples between 3 and 50 elements. By clicking here you can also review a revised approach using the algorithm of J. P. Royston which can handle samples with up to 5,000 (or even more). The basic approach used in the Shapiro-Wilk (SW) test for normality is as follows:   Arrange the data in ascending order so that x1 ≤ … ≤ xn. Calculate SS as follows:  If n is even, let m = n/2, while if n is odd let m = (n–1)/2 Calculate b as follows, taking the ai weights from Table 1 (based on the value of n) in the Shapiro-Wilk Tables. Note that if n is odd, the median data value is not used in the calculation of b.  Calculate the test statistic W = b2 ⁄ SS  Find the value in Table 2 of the Shapiro-Wilk Tables (for a given value of n) that is closest to W, interpolating if necessary. This is the p-value for the test. For example, suppose W = .975 and n = 10. Based on Table 2 of the Shapiro-Wilk Tables the p-value for the test is somewhere between .90 (W = .972) and .95 (W = .978).  Examples Example 1: A random sample of 12 people is taken from a large population. The ages of the people in the sample are shown in column A of the worksheet in Figure 1. Is this data normally distributed? Figure 1 – Shapiro-Wilk test for Example 1 We begin by sorting the data in column A using Data > Sort & Filter|Sort (see Sorting and Filtering) or the Real Statistics QSORT function (see Sorting and Removing Duplicates), putting the results in column B. We next look up the coefficient values for n = 12 (the sample size) in Table 1 of the Shapiro-Wilk Tables, putting these values in column E. Corresponding to each of these 6 coefficients a1,…,a6, we calculate the values x12 – x1, …, x7 – x6, where xi is the ith data element in sorted order. E.g. since x1 = 35 and x12 = 86, we place the difference 86 – 35 = 51 in cell H5 (the same row as the cell containing coefficient a1). Column I contains the product of the coefficients and difference values. E.g. cell I5 contains the formula =E5*H5. The sum of these values is b = 44.1641, which is found in cell I11 (and again in cell E14). We next calculate SS as DEVSQ(B4:B15) = 2008.667 (cell E13). Thus W = b2 ⁄ SS = 44.1641^2/2008.667 = .971026 (cell E15). We now look for .971026 when n = 12 in Table 2 of the Shapiro-Wilk Tables and find that the p-value lies between .50 and .90. The W value for .5 is .943 and the W value for .9 is .973. Interpolating .971026 between these value (using linear interpolation), we arrive at p-value = .873681. Since p-value = .87 > .05 = α, we retain the null hypothesis that the data are normally distributed. Since this p-value is based on linear interpolation, it is not very accurate, but the important thing is that it is much higher than the alpha value, and so we can retain the null hypothesis that the data is normally distributed. Example 2: Using the SW test, determine whether the data in Example 1 of Graphical Tests for Normality and Symmetry (repeated in column A of Figure 2) are normally distributed. Figure 2 – Shapiro-Wilk test for Example 2 As we can see from the analysis in Figure 2, p-value = .0432 < .05 = α, and so we reject the null hypothesis and conclude with 95% confidence that the data are not normally distributed, which is quite different from the results using the KS test that we found in Example 2 of Kolmogorov-Smironov Test, but consistent with the QQ plot shown in Figure 5 of that webpage. Real Statistics Support Real Statistics Function: The Real Statistics Resource Pack contains the following functions. SHAPIRO(R1, FALSE) = the Shapiro-Wilk test statistic W for the data in R1 SWTEST(R1, FALSE, h) = p-value of the Shapiro-Wilk test on the data in R1 SWCoeff(n, j, FALSE) = the jth coefficient for samples of size n SWCoeff(R1, C1, FALSE) = the coefficient corresponding to cell C1 within sorted range R1 SWPROB(n, W, FALSE, interp) = p-value of the Shapiro-Wilk test for a sample of size n for test statistic W The functions SHAPIRO and SWTEST ignore all empty and non-numeric cells. The range R1 in SWCoeff(R1, C1, FALSE) should not contain any empty or non-numeric cells. When performing the table lookup, the default is to use the recommended type of interpolation (interp = TRUE). To use linear interpolation, set interp to FALSE. See Interpolation for details. For Example 1 of Chi-square Test for Normality, SHAPIRO(A4:A15, FALSE) = .874 and SWTEST(A4:A15, FALSE, FALSE) = SWPROB(15,.874,FALSE,FALSE) = .0419 (referring to the worksheet in Figure 2 of Chi-square Test for Normality). Note that SHAPIRO(R1, TRUE), SWTEST(R1, TRUE), SWCoeff(n, j, TRUE), SWCoeff(R1, C1, TRUE) and SWPROB(n, W, TRUE) refer to the results using the Royston algorithm, as described in Shapiro-Wilk Expanded Test. For compatibility with the Royston version of SWCoeff, when j ≤ n/2 then SWCoeff(n, j, False) = the negative of the value of the jth coefficient for samples of size n found in the Shapiro-Wilk Tables. When j = (n+1)/2, SWCoeff(n, j, FALSE) = 0 and when j > (n+1)/2, SWCoeff(n, j, FALSE) = -SWCoeff(n, n–j+1, FALSE). Reference Shapiro, S.S. & Wilk, M.B. (1965) An analysis of variance for normality (complete samples). Biometrika, Vol. 52, No. 3/4. http://webspace.ship.edu/pgmarr/Geo441/Readings/Shapiro%20and%20Wi lk%201965%20%20An%20Analysis%20of%20Variance%20Test%20for%20Normality.pdf

Shapiro-Wilk Test: Normality Testing Explained

Documentos relacionados

Productos

Apoyo

Shapiro-Wilk Test: Normality Testing Explained

Documentos relacionados

Añadir este documento a la recogida (s)

Añadir a este documento guardado

Sugiéranos cómo mejorar StudyLib