Standard score








Compares the various grading methods in a normal distribution. Includes: Standard deviations, cumulative percentages, percentile equivalents, Z-scores, T-scores


In statistics, the standard score is the signed number of standard deviations by which the value of an observation or data point is above the mean value of what is being observed or measured. Observed values above the mean have positive standard scores, while values below the mean have negative standard scores. The standard score is a dimensionless quantity obtained by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation. This conversion process is called standardizing or normalizing (however, "normalizing" can refer to many types of ratios; see normalization for more).


Standard scores are also called z-values, z-scores, normal scores, and standardized variables. They are most frequently used to compare an observation to a standard normal deviate, though they can be defined without assumptions of normality.


Computing a z-score requires knowing the mean and standard deviation of the complete population to which a data point belongs; if one only has a sample of observations from the population, then the analogous computation with sample mean and sample standard deviation yields the t-statistic.




Contents






  • 1 Calculation from raw score


  • 2 Applications


    • 2.1 Z-test


    • 2.2 Prediction intervals


    • 2.3 Process control


    • 2.4 Comparison of scores measured on different scales: ACT and SAT


    • 2.5 Percent of observations below a z-score


    • 2.6 Cluster analysis and multidimensional scaling


    • 2.7 Principal components analysis


    • 2.8 Relative importance of variables in multiple regression: Standardized regression coefficients




  • 3 Standardizing in mathematical statistics


  • 4 T-score


  • 5 See also


  • 6 References


  • 7 Further reading


  • 8 External links





Calculation from raw score


If the population mean and population standard deviation are known, the standard score of a raw score x[1] is calculated as


$$z = \frac{x - \mu}{\sigma}$$

where:




μ is the mean of the population.


σ is the standard deviation of the population.


The absolute value of z represents the distance between the raw score and the population mean in units of the standard deviation. z is negative when the raw score is below the mean, positive when above.
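As a minimal sketch of the formula above, the helper below computes a standard score from known population parameters. The function name `z_score` and the numbers used (a population mean of 100 and a standard deviation of 15) are illustrative assumptions, not values from the text.

```python
def z_score(x, mu, sigma):
    """Standard score of raw value x, given population mean mu and
    population standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical population: mean 100, standard deviation 15.
above = z_score(130, 100, 15)  # raw score above the mean -> positive z
below = z_score(85, 100, 15)   # raw score below the mean -> negative z
print(above, below)
```

The sign of the result shows on which side of the mean the raw score lies, and its absolute value gives the distance in standard deviations.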


Calculating z using this formula requires the population mean and the population standard deviation, not the sample mean or sample standard deviation. But knowing the true mean and standard deviation of a population is often unrealistic except in cases such as standardized testing, where the entire population is measured.


When the population mean and the population standard deviation are unknown, the standard score may be calculated using the sample mean and sample standard deviation as estimates of the population values.[2][3][4][5]


In these cases, the z score is


$$z = \frac{x - \bar{x}}{S}$$

where:




$\bar{x}$ is the mean of the sample.

S is the standard deviation of the sample.
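The sample-based version can be sketched with Python's standard `statistics` module. The text does not say which divisor S uses; this sketch assumes the usual n − 1 sample standard deviation (`statistics.stdev`), and the data values are made up for illustration.

```python
import statistics


def sample_z_scores(values):
    """Standardize each value using the sample mean and sample standard deviation."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)  # assumption: n - 1 divisor
    return [(v - x_bar) / s for v in values]


# Hypothetical sample of eight observations.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = sample_z_scores(data)
print([round(v, 3) for v in z])
```

By construction the resulting scores have sample mean 0 and sample standard deviation 1.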



Applications



Z-test



The z-score is often used in the z-test in standardized testing – the analog of the Student's t-test for a population whose parameters are known, rather than estimated. As it is very unusual to know the entire population, the t-test is much more widely used.
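A rough sketch of a one-sample, two-sided z-test is shown below, assuming the population standard deviation is known and the sample mean is approximately normally distributed. The function name `z_test` and all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n):
    """Return the z statistic and two-sided p-value for H0: population mean == mu0,
    assuming a known population standard deviation sigma."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 25 observations averaging 103, tested against mu0 = 100.
z, p = z_test(sample_mean=103.0, mu0=100.0, sigma=15.0, n=25)
print(z, round(p, 4))
```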



Prediction intervals



The standard score can be used in the calculation of prediction intervals. A prediction interval [L,U], consisting of a lower endpoint designated L and an upper endpoint designated U, is an interval such that a future observation X will lie in the interval with high probability $\gamma$, i.e.


$$P(L < X < U) = \gamma,$$

For the standard score Z of X it gives:[6]


$$P\left(\frac{L - \mu}{\sigma} < Z < \frac{U - \mu}{\sigma}\right) = \gamma.$$

By determining the quantile z such that


$$P(-z < Z < z) = \gamma$$

it follows:


$$L = \mu - z\sigma, \quad U = \mu + z\sigma$$
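The endpoints can be computed numerically once the quantile z is known. The sketch below assumes X is normally distributed with known mean and standard deviation, so that Z is standard normal and z is the (1 + γ)/2 quantile; the function name and the example parameters are illustrative.

```python
from statistics import NormalDist


def prediction_interval(mu, sigma, gamma):
    """Symmetric interval [L, U] with P(L < X < U) = gamma,
    assuming X is normal with mean mu and standard deviation sigma."""
    # P(-z < Z < z) = gamma  <=>  z is the (1 + gamma) / 2 quantile of Z.
    z = NormalDist().inv_cdf((1 + gamma) / 2)
    return mu - z * sigma, mu + z * sigma


# Hypothetical parameters: mu = 100, sigma = 15, gamma = 0.95.
lower, upper = prediction_interval(mu=100, sigma=15, gamma=0.95)
print(round(lower, 2), round(upper, 2))
```

For γ = 0.95 the quantile is about 1.96, so the interval extends roughly 1.96 standard deviations on either side of the mean.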


Process control


In process control applications, the Z value provides an assessment of how off-target a process is operating.



Comparison of scores measured on different scales: ACT and SAT


When scores are measured on different scales, they may be converted to z-scores to aid comparison. Diez et al.[7] give the following example comparing student scores on the SAT and ACT high school tests. The table shows the mean and standard deviation for total score on the SAT and ACT. Suppose that student A scored 1800 on the SAT, and student B scored 24 on the ACT. Which student performed better relative to other test-takers?



















                      SAT     ACT
Mean                  1500    21
Standard deviation    300     5

The z-score for student A is $z = \frac{x - \mu}{\sigma} = \frac{1800 - 1500}{300} = 1$


The z-score for student B is $z = \frac{x - \mu}{\sigma} = \frac{24 - 21}{5} = 0.6$


Because student A has a higher z-score than student B, student A performed better compared to other test-takers than did student B.
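The comparison can be reproduced in a few lines. The means, standard deviations, and scores come from the example; the dictionary layout and variable names are only one possible way to organize them.

```python
# Mean and standard deviation of each test, as given in the example.
tests = {
    "SAT": {"mean": 1500, "sd": 300},
    "ACT": {"mean": 21, "sd": 5},
}


def z_score(x, mean, sd):
    """Standard score of x relative to the given mean and standard deviation."""
    return (x - mean) / sd


z_a = z_score(1800, **tests["SAT"])  # student A, SAT
z_b = z_score(24, **tests["ACT"])    # student B, ACT

better = "A" if z_a > z_b else "B"
print(z_a, z_b, better)
```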



Percent of observations below a z-score


Continuing the example of ACT and SAT scores, if it can be further assumed that both ACT and SAT scores are normally distributed (which is approximately correct), then the z-scores may be used to calculate the percent of test-takers who received lower scores than students A and B. The R function pnorm() and the Excel function NORM.DIST() give the probability that a random observation from a normal distribution will have a z score less than a specified z score.


For Student A, with a z score of 1 on the SAT, NORM.DIST(1,0,1,TRUE) = 0.84, indicating that 84% of students taking the SAT scored lower than Student A. For Student B, with a z score of 0.6 on the ACT, NORM.DIST(0.6,0,1,TRUE) = 0.73, indicating that 73% of students taking the ACT scored lower than Student B.
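The same percentages can be obtained in Python with `statistics.NormalDist` from the standard library (Python 3.8 or later), used here as a stand-in for the R and Excel functions named above. As in the text, the calculation assumes the scores are normally distributed.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Fraction of a standard normal distribution lying below each z-score.
below_a = standard_normal.cdf(1.0)  # student A, z = 1 on the SAT
below_b = standard_normal.cdf(0.6)  # student B, z = 0.6 on the ACT

print(f"{below_a:.2f} {below_b:.2f}")
```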



Cluster analysis and multidimensional scaling


"For some multivariate techniques such as multidimensional scaling and cluster analysis, the concept of distance between the units in the data is often of considerable interest and importance … When the variables in a multivariate data set are on different scales, it makes more sense to calculate the distances after some form of standardization."[8]



Principal components analysis


In principal components analysis, "Variables measured on different scales or on a common scale with widely differing ranges are often standardized."[9]
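The effect of scale on multivariate methods is easiest to see in a distance calculation of the kind used in cluster analysis and multidimensional scaling. The sketch below uses made-up data (height in centimetres and income in dollars) and assumes column-wise standardization with the sample standard deviation; none of these choices come from the cited sources.

```python
import math
import statistics


def standardize_columns(rows):
    """Convert each column of a list of rows to z-scores."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


# Hypothetical units: (height in cm, income in dollars).
rows = [(170.0, 30000.0), (180.0, 31000.0), (171.0, 60000.0)]

# Raw Euclidean distances are dominated by the income column.
raw_d01 = math.dist(rows[0], rows[1])
raw_d02 = math.dist(rows[0], rows[2])

# After standardization both variables contribute on a comparable scale.
std_rows = standardize_columns(rows)
std_d01 = math.dist(std_rows[0], std_rows[1])
std_d02 = math.dist(std_rows[0], std_rows[2])

print(raw_d02 / raw_d01, std_d02 / std_d01)
```

In the raw data the second distance is about thirty times the first purely because income is measured in large units; after standardization the two distances are of similar size.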



Relative importance of variables in multiple regression: Standardized regression coefficients


Standardization of variables prior to multiple regression analysis is sometimes used as an aid to interpretation. Afifi et al.[10] (page 95) state the following.


"The standardized regression slope is the slope in the regression equation if X and Y are standardized… Standardization of X and Y is done by subtracting the respective means from each set of observations and dividing by the respective standard deviations… In multiple regression, where several X variables are used, the standardized regression coefficients quantify the relative contribution of each X variable."


However, Kutner et al.[11] (p 278) give the following caveat: "… one must be cautious about interpreting any regression coefficients, whether standardized or not. The reason is that when the predictor variables are correlated among themselves, … the regression coefficients are affected by the other predictor variables in the model … The magnitudes of the standardized regression coefficients are affected not only by the presence of correlations among the predictor variables but also by the spacings of the observations on each of these variables. Sometimes these spacings may be quite arbitrary. Hence, it is ordinarily not wise to interpret the magnitudes of standardized regression coefficients as reflecting the comparative importance of the predictor variables."
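For the simplest case of one predictor, the relationship between the raw and standardized slope can be checked numerically. The sketch below is restricted to simple regression (not the multiple-regression setting discussed in the quotations), uses made-up data, and computes the least-squares slope by hand rather than relying on any particular library.

```python
import statistics


def slope(xs, ys):
    """Least-squares slope of y on x for a single predictor."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

raw_slope = slope(xs, ys)
std_slope = slope(standardize(xs), standardize(ys))

# Equivalent route: rescale the raw slope by the ratio of standard deviations.
rescaled = raw_slope * statistics.stdev(xs) / statistics.stdev(ys)
print(raw_slope, std_slope, rescaled)
```

The standardized slope equals the raw slope multiplied by the ratio of the standard deviations of X and Y, which is why it depends on the spread of the observations as well as on the underlying relationship.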



Standardizing in mathematical statistics



In mathematical statistics, a random variable X is standardized by subtracting its expected value $\operatorname{E}[X]$ and dividing the difference by its standard deviation $\sigma(X) = \sqrt{\operatorname{Var}(X)}$:


$$Z = \frac{X - \operatorname{E}[X]}{\sigma(X)}$$

If the random variable under consideration is the sample mean of a random sample $X_1, \dots, X_n$ of X:


$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

then the standardized version is



$$Z = \frac{\bar{X} - \operatorname{E}[X]}{\sigma(X)/\sqrt{n}}.$$
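A small simulation can illustrate that the standardized sample mean has mean 0 and standard deviation 1. The sketch below assumes X is uniform on [0, 1), for which E[X] = 1/2 and σ(X) = √(1/12); the sample size, number of replications, and seed are arbitrary choices.

```python
import math
import random
import statistics


def standardized_sample_mean(sample, expected_value, sigma):
    """(x_bar - E[X]) / (sigma(X) / sqrt(n)) for one random sample."""
    n = len(sample)
    x_bar = sum(sample) / n
    return (x_bar - expected_value) / (sigma / math.sqrt(n))


rng = random.Random(0)       # fixed seed so the run is repeatable
expected_value = 0.5         # E[X] for X uniform on [0, 1)
sigma = math.sqrt(1 / 12)    # sigma(X) for X uniform on [0, 1)

# 2000 replications, each a random sample of size n = 25.
z_values = [
    standardized_sample_mean([rng.random() for _ in range(25)], expected_value, sigma)
    for _ in range(2000)
]

print(statistics.mean(z_values), statistics.pstdev(z_values))
```

The printed mean should be close to 0 and the printed standard deviation close to 1, up to simulation noise.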


T-score



In educational assessment, T-score is a standard score Z shifted and scaled to have a mean of 50 and a standard deviation of 10.[12][13][14]
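Shifting and scaling a standard score in this way corresponds to T = 50 + 10z. A minimal sketch, with a hypothetical function name and example inputs:

```python
def t_score(z):
    """Educational T-score: standard score rescaled to mean 50, standard deviation 10."""
    return 50 + 10 * z


print(t_score(0.0), t_score(1.0), t_score(-0.5))
```

A raw score exactly at the mean therefore maps to 50, and each standard deviation above or below the mean adds or subtracts 10 points.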


In bone density measurements, the T-score is the standard score of the measurement compared to the population of healthy 30-year-old adults.[15]



See also



  • Omega ratio

  • Standard normal deviate



References





  1. ^ E. Kreyszig (1979). Advanced Engineering Mathematics (Fourth ed.). Wiley. p. 880, eq. 5. ISBN 0-471-02140-7.


  2. ^ Spiegel, Murray R.; Stephens, Larry J (2008), Schaum's Outlines Statistics (Fourth ed.), McGraw Hill, ISBN 978-0-07-148584-5


  3. ^ Mendenhall, William; Sincich, Terry (2007), Statistics for Engineering and the Sciences (Fifth ed.), Pearson / Prentice Hall, ISBN 978-0131877061


  4. ^ Glantz, Stanton A.; Slinker, Bryan K.; Neilands, Torsten B. (2016), Primer of Applied Regression & Analysis of Variance (Third ed.), McGraw Hill, ISBN 978-0071824118


  5. ^ Aho, Ken A. (2014), Foundational and Applied Statistics for Biologists (First ed.), Chapman & Hall / CRC Press, ISBN 978-1439873380


  6. ^ E. Kreyszig (1979). Advanced Engineering Mathematics (Fourth ed.). Wiley. p. 880, eq. 6. ISBN 0-471-02140-7.


  7. ^ Diez, David; Barr, Christopher; Çetinkaya-Rundel, Mine (2012), OpenIntro Statistics (Second ed.), openintro.org


  8. ^ Everitt, Brian; Hothorn, Torsten J (2011), An Introduction to Applied Multivariate Analysis with R, Springer, ISBN 978-1441996497


  9. ^ Johnson, Richard; Wichern, Dean (2007), Applied Multivariate Statistical Analysis, Pearson / Prentice Hall


  10. ^ Afifi, Abdelmonem; May, Susanne K.; Clark, Virginia A. (2012), Practical Multivariate Analysis (Fifth ed.), Chapman & Hall/CRC, ISBN 978-1439816806


  11. ^ Kutner, Michael; Nachtsheim, Christopher; Neter, John (2004), Applied Linear Regression Models (Fourth ed.), McGraw Hill, ISBN 978-0073014661


  12. ^ John Salvia; James Ysseldyke; Sara Witmer (29 January 2009). Assessment: In Special and Inclusive Education. Cengage Learning. pp. 43–. ISBN 0-547-13437-1.


  13. ^ Edward S. Neukrug; R. Charles Fawcett (1 January 2014). Essentials of Testing and Assessment: A Practical Guide for Counselors, Social Workers, and Psychologists. Cengage Learning. pp. 133–. ISBN 978-1-305-16183-2.


  14. ^ Randy W. Kamphaus (16 August 2005). Clinical Assessment of Child and Adolescent Intelligence. Springer. pp. 123–. ISBN 978-0-387-26299-4.


  15. ^ "Bone Mass Measurement: What the Numbers Mean". NIH Osteoporosis and Related Bone Diseases National Resource Center. National Institute of Health. Retrieved 5 August 2017.





Further reading




  • Carroll, Susan Rovezzi; Carroll, David J. (2002). Statistics Made Simple for School Leaders (illustrated ed.). Rowman & Littlefield. ISBN 978-0-8108-4322-6. Retrieved 7 June 2009.


  • Larsen, Richard J.; Marx, Morris L. (2000). An Introduction to Mathematical Statistics and Its Applications (Third ed.). p. 282. ISBN 0-13-922303-7.



External links



  • Interactive Flash on the z-scores and the probabilities of the normal curve by Jim Reed







