Statistical significance

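In statistical hypothesis testing,^[1]^[2] statistical significance (also called a significant difference) is an assessment of the difference between sets of data. A result is statistically significant when it would be very unlikely to occur if the null hypothesis were true.

More precisely, a study first fixes a significance level α. This is the probability of rejecting the null hypothesis when it is in fact true.^[3] The p-value is then the probability, assuming the null hypothesis is true, of obtaining the observed result or one more extreme.^[4] If $p \le \alpha$, the result is considered statistically significant; that is, the data show a significant difference.^[5]^[6]^[7]^[8]^[9]^[10]^[11]

The significance level should be decided before any data are collected. By convention it is set at 5%^[12] or lower, and the requirement may differ between fields.^[13]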
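
The decision rule $p \le \alpha$ can be illustrated with a minimal Python sketch that uses only the standard library. It assumes a one-sided exact binomial test, chosen purely as an example. The function names and the coin-tossing data are hypothetical. The p-value is computed as the probability, under $H_0$, of a result at least as extreme as the one observed.

```python
from math import comb

def binomial_p_value(k: int, n: int, p0: float = 0.5) -> float:
    """One-sided exact p-value: P(X >= k) when X ~ Binomial(n, p0) under H0."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """Decision rule: the result is significant if and only if p <= alpha."""
    return p_value <= alpha

# Hypothetical data: 9 heads in 10 tosses of a coin that is fair under H0.
p = binomial_p_value(9, 10)
print(round(p, 5), is_significant(p))  # prints: 0.01074 True
```

With 9 heads in 10 tosses, $p = 11/1024 \approx 0.0107 \le 0.05$, so the result would be called significant at the 5% level. With 6 heads, $p \approx 0.377$, and the result would not be significant.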

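In any experiment or observational study that draws a random sample from a population, an observed result may be due to sampling error alone.^[14]^[15] However, if the p-value of an observed result is less than or equal to the significance level α, the researcher may conclude that the result reflects a characteristic of the whole population^[1] and may reject the null hypothesis.^[16]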
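
A small simulation shows the role of sampling error. This is a sketch under stated assumptions. Both samples are drawn from one hypothetical normal population with mean 100 and standard deviation 15, so any difference between the two sample means is due to sampling error alone.

```python
import random
import statistics

POP_MEAN, POP_SD = 100.0, 15.0  # hypothetical population parameters

def mean_difference(n: int, seed: int) -> float:
    """Draw two samples of size n from the SAME population and return
    the difference between their sample means."""
    rng = random.Random(seed)
    a = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    b = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    return statistics.fmean(a) - statistics.fmean(b)

n = 30
diffs = [mean_difference(n, seed) for seed in range(1000)]
# The true difference is 0, yet the sample differences scatter around it.
# Their spread should be close to the standard error POP_SD * sqrt(2 / n).
print(statistics.stdev(diffs), POP_SD * (2 / n) ** 0.5)
```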

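A significant difference may arise, for example, because:

The data being compared come from different groups of subjects. For example, on the Binet–Simon general ability test, the scores of subjects with a university education differ significantly from those of subjects with a primary-school education.
An experimental treatment has fundamentally changed the subjects, so that pre-test and post-test data differ significantly. For example, research on mnemonics has found that subjects' memory scores differ significantly before and after they learn a mnemonic technique. This difference is most likely due to the change the technique makes to their memory ability.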
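
The second case compares pre-test and post-test scores on the same subjects. One simple option for such data is the exact sign test, which only asks whether each subject's score went up or down. The Python sketch below assumes that test, and the scores are invented for illustration. The first case, with different groups of subjects, would instead call for an independent-samples test.

```python
from math import comb
from typing import Sequence

def sign_test_p_value(before: Sequence[float], after: Sequence[float]) -> float:
    """Two-sided exact sign test for paired pre-test/post-test scores.

    Pairs with no change are dropped. Under H0, each remaining change is
    equally likely to be an increase or a decrease (Binomial(n, 0.5))."""
    changes = [post - pre for pre, post in zip(before, after) if post != pre]
    n = len(changes)
    ups = sum(1 for c in changes if c > 0)   # subjects who improved
    k = min(ups, n - ups)                    # size of the smaller tail
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * one_tail)

# Hypothetical memory scores for 10 subjects before and after a mnemonic course.
before = [12, 15, 11, 14, 10, 13, 12, 16, 11, 14]
after = [16, 18, 15, 14, 13, 17, 15, 19, 12, 18]
print(sign_test_p_value(before, after))  # 0.00390625, i.e. 2 / 2**9
```

One subject shows no change and is dropped. All 9 remaining subjects improved, so $p = 2/2^9 \approx 0.0039$.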

History

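The idea of statistical significance dates back to the 18th century. John Arbuthnot and Pierre-Simon Laplace computed p-values for the human sex ratio at birth under the null hypothesis that a birth is equally likely to be male or female.^[17]^[18]^[19]^[20]^[21]^[22]^[23]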

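In 1925, Ronald Fisher put forward the idea of statistical hypothesis testing, which he called "tests of significance", in his book Statistical Methods for Research Workers.^[24]^[25]^[26] Fisher suggested a probability of 1/20 (= 0.05) as the cutoff for rejecting the null hypothesis.^[27] In a 1933 paper, Jerzy Neyman and Egon Pearson called this cutoff the "significance level" and gave it the symbol $\alpha$. They recommended that $\alpha$ be set before any data are collected.^[27]^[28]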

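Fisher initially set the significance level at 0.05 but did not intend it to be fixed. In his 1956 book Statistical Methods and Scientific Inference, he recommended choosing the significance level according to the specific circumstances.^[27]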

Significance level

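In a two-tailed test, the rejection region at significance level $\alpha = 0.05$ consists of the two extreme tails of the sampling distribution. Together they cover 5% of the area under the curve (2.5% in each tail).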
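
The boundaries of the two-tailed rejection region can be computed directly. The sketch below assumes the test statistic follows a standard normal distribution under $H_0$ (a z-test). It uses Python's `statistics.NormalDist`, and the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def two_tailed_critical_value(alpha: float) -> float:
    """Return c such that P(|Z| >= c) = alpha, i.e. alpha / 2 in each tail."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2)

def in_rejection_region(z: float, alpha: float = 0.05) -> bool:
    """True if z lies in either extreme tail of the rejection region."""
    return abs(z) >= two_tailed_critical_value(alpha)

c = two_tailed_critical_value(0.05)
print(round(c, 2))  # 1.96
# Area in both tails together: 2.5% + 2.5% = 5%.
print(round(STD_NORMAL.cdf(-c) + (1 - STD_NORMAL.cdf(c)), 4))  # 0.05
```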

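The significance level (symbol: α) is used in hypothesis testing to judge whether experimental results are consistent with a hypothesis. It is the probability of rejecting the null hypothesis (written $H_{0}$) when it is in fact true. In other words, it is the probability of a Type I error, also called an α error or the error of rejecting a true hypothesis: $\alpha = P(\text{reject } H_{0} \mid H_{0} \text{ true})$.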
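
The meaning of α as a long-run Type I error rate can be checked by simulation. This sketch assumes a two-sided one-sample z-test with a known standard deviation. Every sample is drawn from a population in which $H_0$ is true, so every rejection is a Type I error. The sample size, number of trials and seed are arbitrary choices.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(alpha: float = 0.05, n: int = 25,
                      trials: int = 20_000, seed: int = 1) -> float:
    """Estimate how often a two-sided one-sample z-test rejects a TRUE H0.

    Every sample comes from N(0, 1), so H0: mu = 0 holds by construction,
    and sigma = 1 is treated as known. Each rejection is a Type I error."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # (sample mean - 0) / (1 / sqrt(n))
        if abs(z) >= critical:
            rejections += 1
    return rejections / trials

print(type_i_error_rate(0.05))  # should come out close to 0.05
```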

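Suppose we draw a random sample from each of two populations and find that the samples differ significantly at the .05 level. This means the following: if the two populations did not actually differ, a difference at least as large as the one observed would arise from sampling error alone with a probability of at most 5%. It does not mean that there is a 95% probability that the populations differ, nor a 5% probability that they do not. The same point can be stated in two other ways:

If we reject the null hypothesis that the two populations do not differ, the test's probability of a Type I error is 5%. A Type I error means rejecting the null hypothesis when it is in fact true.
Let A be the event "the two populations do not differ" and B the event "the two samples differ significantly". Then $P(B \mid A) = 0.05$. The reverse probability $P(A \mid B)$ is not fixed by α. It also depends on the test's power and on how plausible the null hypothesis was beforehand.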
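
Let A be the event "the null hypothesis is true" and B the event "the result is significant". The significance level fixes $P(B \mid A)$, not $P(A \mid B)$. Bayes' theorem shows how far apart the two can be. The sketch below is illustrative only. The test's power and the prior probability that the null hypothesis is true are assumptions here, not quantities from the text.

```python
def prob_null_given_significant(alpha: float, power: float, prior_null: float) -> float:
    """Bayes' theorem for P(A | B), where
    A = "the null hypothesis is true" and B = "the result is significant".

    alpha      = P(B | A), the Type I error rate
    power      = P(B | not A), the chance of detecting a real difference
    prior_null = P(A) before the data are seen (an assumption)"""
    p_b_and_a = alpha * prior_null
    p_b_and_not_a = power * (1 - prior_null)
    return p_b_and_a / (p_b_and_a + p_b_and_not_a)

# Hypothetical inputs: alpha = 0.05 and power = 0.8 in both cases.
print(round(prob_null_given_significant(0.05, 0.8, 0.5), 4))  # 0.0588
print(round(prob_null_given_significant(0.05, 0.8, 0.9), 4))  # 0.36
```

When most tested null hypotheses are true, a significant result can still leave a substantial probability that the null hypothesis holds, even though α is only 5%.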

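In a hypothesis test, if the data show a significant difference, the null hypothesis is rejected and the alternative hypothesis is supported. If the data show no significant difference, the null hypothesis is not rejected. This does not prove the null hypothesis or refute the alternative hypothesis, because a real difference may have gone undetected.

Conventionally, a result must reach the .05 or .01 level before the data are regarded as significantly different. A looser threshold would increase the risk of the Type I error described above. A conclusion should also state the direction of the difference (for example, significantly greater or significantly smaller).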

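Formally, the p-value of the observed test statistic can be defined as the smallest significance level at which the null hypothesis would be rejected. Suppose $\alpha = 0.01$ or $\alpha = 0.05$, and the probability $p$ computed under the null hypothesis satisfies $p \le \alpha$. Then the probability of a result at least this extreme, if the null hypothesis is true, is at most 1% or 5% respectively. The null hypothesis can therefore be rejected at that significance level.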
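
The description of the p-value as the smallest significance level at which $H_0$ would be rejected can be checked numerically. This sketch assumes a two-sided z-test and a hypothetical observed statistic $z = 2.3$. Comparing $p$ with α and comparing $|z|$ with the critical value give the same decision for every α.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for Z ~ N(0, 1), i.e. the p-value under H0."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def reject_by_critical_value(z: float, alpha: float) -> bool:
    """Reject H0 if |z| reaches the two-tailed critical value for alpha."""
    return abs(z) >= STD_NORMAL.inv_cdf(1 - alpha / 2)

z_obs = 2.3                     # hypothetical observed test statistic
p = two_sided_p_value(z_obs)    # about 0.0214
for alpha in (0.01, 0.02, 0.03, 0.05):
    # Both rules agree: reject exactly when alpha >= p.
    print(alpha, p <= alpha, reject_by_critical_value(z_obs, alpha))
```

Here $p \approx 0.0214$, so $H_0$ is rejected at $\alpha = 0.03$ or $0.05$ but not at $\alpha = 0.01$ or $0.02$.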

$p < 0.05$ is called "significant"; the statistical software SPSS marks such results with *;
$p < 0.01$ is called "highly significant" and is usually marked with **.
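
The star convention amounts to a small lookup. This is a sketch of the convention described above, not SPSS's own code. The function name is hypothetical, and the strict inequalities follow the text.

```python
def significance_stars(p: float) -> str:
    """Return the star label for a p-value under the convention above:
    '**' if p < 0.01, '*' if p < 0.05, and '' otherwise."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

for p in (0.003, 0.02, 0.2):
    print(p, significance_stars(p) or "(not significant)")
```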

See also

References

[1] Sirkin, R. Mark (2005). "Two-sample t tests". Statistics for the Social Sciences (3rd ed.). Thousand Oaks, CA: SAGE Publications, Inc. pp. 271–316. ISBN 978-1-412-90546-6.
[2] Borror, Connie M. (2009). "Statistical decision making". The Certified Quality Engineer Handbook (3rd ed.). Milwaukee, WI: ASQ Quality Press. pp. 418–472. ISBN 978-0-873-89745-7.
[3] Dalgaard, Peter (2008). "Power and the computation of sample size". Introductory Statistics with R. Statistics and Computing. New York: Springer. pp. 155–56. doi:10.1007/978-0-387-79054-1_9. ISBN 978-0-387-79053-4.
[4] "Statistical Hypothesis Testing". www.dartmouth.edu. Archived from the original on 2020-08-02. Retrieved 2019-11-11.
[5] Johnson, Valen E. (October 9, 2013). "Revised standards for statistical evidence". Proceedings of the National Academy of Sciences. 110 (48): 19313–19317. Bibcode:2013PNAS..11019313J. doi:10.1073/pnas.1313476110. PMC 3845140. PMID 24218581.
[6] Redmond, Carol; Colton, Theodore (2001). "Clinical significance versus statistical significance". Biostatistics in Clinical Trials. Wiley Reference Series in Biostatistics (3rd ed.). West Sussex, United Kingdom: John Wiley & Sons Ltd. pp. 35–36. ISBN 978-0-471-82211-0.
[7] Cumming, Geoff (2012). Understanding The New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. New York, USA: Routledge. pp. 27–28.
[8] Krzywinski, Martin; Altman, Naomi (30 October 2013). "Points of significance: Significance, P values and t-tests". Nature Methods. 10 (11): 1041–1042. doi:10.1038/nmeth.2698. PMID 24344377.
[9] Sham, Pak C.; Purcell, Shaun M (17 April 2014). "Statistical power and significance testing in large-scale genetic studies". Nature Reviews Genetics. 15 (5): 335–346. doi:10.1038/nrg3706. PMID 24739678. S2CID 10961123.
[10] Altman, Douglas G. (1999). Practical Statistics for Medical Research. New York, USA: Chapman & Hall/CRC. pp. 167. ISBN 978-0412276309.
[11] Devore, Jay L. (2011). Probability and Statistics for Engineering and the Sciences (8th ed.). Boston, MA: Cengage Learning. pp. 300–344. ISBN 978-0-538-73352-6.
[12] Craparo, Robert M. (2007). "Significance level". In Salkind, Neil J. (ed.). Encyclopedia of Measurement and Statistics. Vol. 3. Thousand Oaks, CA: SAGE Publications. pp. 889–891. ISBN 978-1-412-91611-0.
[13] Sproull, Natalie L. (2002). "Hypothesis testing". Handbook of Research Methods: A Guide for Practitioners and Students in the Social Science (2nd ed.). Lanham, MD: Scarecrow Press, Inc. pp. 49–64. ISBN 978-0-810-84486-5.
[14] Babbie, Earl R. (2013). "The logic of sampling". The Practice of Social Research (13th ed.). Belmont, CA: Cengage Learning. pp. 185–226. ISBN 978-1-133-04979-1.
[15] Faherty, Vincent (2008). "Probability and statistical significance". Compassionate Statistics: Applied Quantitative Analysis for Social Services (With exercises and instructions in SPSS) (1st ed.). Thousand Oaks, CA: SAGE Publications, Inc. pp. 127–138. ISBN 978-1-412-93982-9.
[16] McKillup, Steve (2006). "Probability helps you make a decision about your results". Statistics Explained: An Introductory Guide for Life Scientists (1st ed.). Cambridge, United Kingdom: Cambridge University Press. pp. 44–56. ISBN 978-0-521-54316-3.
[17] Brian, Éric; Jaisson, Marie (2007). "Physico-Theology and Mathematics (1710–1794)". The Descent of Human Sex Ratio at Birth. Springer Science & Business Media. pp. 1–25. ISBN 978-1-4020-6036-6.
[18] Arbuthnot, John (1710). "An argument for Divine Providence, taken from the constant regularity observed in the births of both sexes". Philosophical Transactions of the Royal Society of London. 27 (325–336): 186–190. doi:10.1098/rstl.1710.0011.
[19] Conover, W. J. (1999). "Chapter 3.4: The Sign Test". Practical Nonparametric Statistics (3rd ed.). Wiley. pp. 157–176. ISBN 978-0-471-16068-7.
[20] Sprent, P. (1989). Applied Nonparametric Statistical Methods (2nd ed.). Chapman & Hall. ISBN 978-0-412-44980-2.
[21] Stigler, Stephen M. (1986). The History of Statistics: The Measurement of Uncertainty Before 1900. Harvard University Press. pp. 225–226. ISBN 978-0-67440341-3.
[22] Bellhouse, P. (2001). "John Arbuthnot". In Heyde, C. C.; Seneta, E. (eds.). Statisticians of the Centuries. Springer. pp. 39–42. ISBN 978-0-387-95329-8.
[23] Hald, Anders (1998). "Chapter 4. Chance or Design: Tests of Significance". A History of Mathematical Statistics from 1750 to 1930. Wiley. p. 65.
[24] Cumming, Geoff (2011). "From null hypothesis significance to testing effect sizes". Understanding The New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. Multivariate Applications Series. East Sussex, United Kingdom: Routledge. pp. 21–52. ISBN 978-0-415-87968-2.
[25] Fisher, Ronald A. (1925). Statistical Methods for Research Workers. Edinburgh, UK: Oliver and Boyd. pp. 43. ISBN 978-0-050-02170-5.
[26] Poletiek, Fenna H. (2001). "Formal theories of testing". Hypothesis-testing Behaviour. Essays in Cognitive Psychology (1st ed.). East Sussex, United Kingdom: Psychology Press. pp. 29–48. ISBN 978-1-841-69159-6.
[27] Quinn, Geoffrey R.; Keough, Michael J. (2002). Experimental Design and Data Analysis for Biologists (1st ed.). Cambridge, UK: Cambridge University Press. pp. 46–69. ISBN 978-0-521-00976-8.
[28] Neyman, J.; Pearson, E. S. (1933). "The testing of statistical hypotheses in relation to probabilities a priori". Mathematical Proceedings of the Cambridge Philosophical Society. 29 (4): 492–510. Bibcode:1933PCPS...29..492N. doi:10.1017/S030500410001152X.
[29] "Conclusions about statistical significance are possible with the help of the confidence interval. If the confidence interval does not include the value of zero effect, it can be assumed that there is a statistically significant result." Prel, Jean-Baptist du; Hommel, Gerhard; Röhrig, Bernd; Blettner, Maria (2009). "Confidence Interval or P-Value?". Deutsches Ärzteblatt Online. 106 (19): 335–9. doi:10.3238/arztebl.2009.0335. PMC 2689604. PMID 19547734.
[30] StatNews #73: Overlapping Confidence Intervals and Statistical Significance.
[31] Neyman, J. (1937). "Outline of a Theory of Statistical Estimation Based on the Classical Theory of Probability". Philosophical Transactions of the Royal Society A. 236 (767): 333–380. Bibcode:1937RSPTA.236..333N. doi:10.1098/rsta.1937.0005. JSTOR 91337.