The Statistics of Psychology

This portion of the textbook describes the basic statistical terms used in research. Included are definitions, examples, and real life applications of each term intended to help those who read not only understand statistical terminology but better interpret data and research papers.


Hallie Skinner - 04/10/2024

As we search for truth through empirical thinking and research, we will inevitably encounter data that we need to interpret. The better we understand the statistics of psychology and science and the terms that go with them, the better we will understand what research projects and their results mean and how they apply to the real world. In education especially, statistics can tell us what is most effective in helping children, and learners at every level, develop into the people they have the potential to be. This chapter defines key statistical terms, explains how they are used, and offers relatable, real-life examples to make them easier to understand.


Alpha

The alpha, also referred to as the level of significance, is a value chosen before a study that represents the probability of obtaining your results by chance alone. It is the highest p value that researchers are willing to tolerate and still say that an effect or difference in a sample is statistically significant. An easier way to understand alpha is through a pole-vaulting analogy: in pole vaulting, a bar is set as the standard that an athlete needs to clear; likewise, alpha is the standard a scientist sets that a result must clear before it counts as significant. Formally, alpha is the probability of rejecting the null hypothesis when it is actually true. (The null hypothesis states that no statistically significant effect exists in a set of observations.) Scientists typically choose .05, or 5%, as their alpha, but that is not a fixed rule. When the consequences of a false positive (rejecting a true null hypothesis) are more severe, such as in medical research, researchers lower the alpha. For example, they might choose .01 as their alpha to decrease the chance of a false positive.

Another way to phrase what an alpha entails would be to say, “I want to be 95% confident that I am not rejecting a true null hypothesis and creating a false positive, so I will select an alpha of .05, or 5%.”
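To make the decision rule concrete, here is a minimal Python sketch; the p value of .02 is an invented example, not from a real study:

```python
# A minimal sketch of alpha as a decision threshold.
alpha = 0.05    # chosen BEFORE the study: the tolerated false-positive rate
p_value = 0.02  # computed AFTER the study from the data (invented here)

if p_value < alpha:
    print("Reject the null hypothesis: the result is statistically significant.")
else:
    print("Fail to reject the null hypothesis.")
```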


P Values

A p value measures how rare your results would be if the null hypothesis were true, and it is therefore calculated after an experiment, once the results are in. For instance, a p value of .02 means that if the null hypothesis were true and you ran the experiment 100 times, you would expect results at least this extreme only about 2 times. Any p value less than the alpha (typically .05) is considered statistically significant, meaning the finding is unlikely to be due to chance alone.

To understand what a p value tells us, we also need to understand the null hypothesis and alpha (refer to the previous section on alpha). The null hypothesis states that no statistically significant effect exists in a set of observations. If the p value of a data set is greater than the alpha, the result is unremarkable and you fail to reject the null hypothesis. However, if the p value is less than the alpha, researchers reject the null hypothesis and conclude that the data show statistical significance. Why would we need to know whether an experiment yields statistical significance? An important example is medical research on the effects of a drug on an illness. If the results of a drug trial show statistical significance, as determined by the p value, researchers know the drug had an effect on the illness and can replicate that research to help those affected by it. This is dense material with a lot of big words and intertwined concepts, so here is a video that is helpful for understanding p values and what they determine in a real-life scenario:

https://www.youtube.com/watch?v=vemZtEM63GY 
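For a hands-on illustration, here is a small sketch using SciPy’s independent-samples t test; the symptom scores are invented, not from a real trial:

```python
# A sketch of obtaining a p value from a two-group comparison with SciPy.
from scipy import stats

drug_group = [12, 9, 11, 14, 10, 13, 12, 11]      # symptom scores with the drug
placebo_group = [15, 14, 16, 13, 17, 15, 14, 16]  # symptom scores with a placebo

t_stat, p_value = stats.ttest_ind(drug_group, placebo_group)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

alpha = 0.05  # compare against the alpha chosen before the study
if p_value < alpha:
    print("Statistically significant: reject the null hypothesis.")
```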


Statistical Power

Statistical power is the likelihood that a significance test will detect an effect in a study when there actually is one. The higher the power, the more likely the test is to detect a real effect; the lower the power, the more likely a real effect will be missed. A significance test with low power leaves results distorted by random and systematic error. Enough data must be collected from a sample to test whether the null hypothesis can reasonably be rejected in favor of an alternative hypothesis. High statistical power lets researchers draw accurate conclusions about a population from sample data. For this reason, the power of a significance test is conventionally set at 80%, meaning that if a real effect exists, roughly 80 out of 100 such tests will detect it. Why is this important? If power is too low and real effects go undetected, a study can waste money and time, and collecting data from trial participants for an uninformative study can be unethical. That said, if statistical power is set much higher than 80%, a test may be overly sensitive and detect very small effects that are statistically significant but of little practical use.
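As an illustration, here is a sketch of a power analysis using the statsmodels library (the text does not name a tool, so statsmodels is our assumption). It solves for the number of participants needed per group to reach 80% power for a medium effect (d = 0.5) at an alpha of .05:

```python
# A sketch of an a-priori power analysis (assumed tool: statsmodels).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"Participants needed per group: {n_per_group:.0f}")  # roughly 64
```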


Pearson’s R Coefficient

Pearson’s r coefficient is a number between -1 and 1 that measures the strength and direction of the relationship between two variables. Any value between 0 and 1 indicates a positive correlation, meaning that when one variable changes, the other variable changes in the same direction. For example, a baby’s length and weight generally have a positive correlation: as babies get longer, they also weigh more. A negative correlation, any value between 0 and -1, means that when one variable changes, the other moves in the opposite direction. An example of this is that as elevation increases, air pressure decreases. A value of 0 indicates no linear relationship at all.

Here are some graphs that will help you visualize what Pearson’s R looks like!

[Figure: example scatterplots showing r = 1 (perfect positive), r > .5 (strong positive), r < .5 (weaker positive), 0 < r < .3 (weak positive), and r = 0 (no correlation)]
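If you want to compute r yourself, here is a small sketch using SciPy; the baby measurements are invented for illustration:

```python
# A sketch of computing Pearson's r for two variables.
from scipy import stats

lengths_cm = [50, 55, 60, 65, 70, 75]        # baby length
weights_kg = [3.5, 4.2, 5.1, 6.0, 7.2, 8.1]  # baby weight

r, p_value = stats.pearsonr(lengths_cm, weights_kg)
print(f"r = {r:.2f}")  # close to 1: a strong positive correlation
```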


Cohen's D

Cohen’s d is used when comparing two groups and the difference between them, that is, when comparing two means. It is calculated as the difference between the two means divided by the pooled standard deviation. A good real-life example is comparing the effectiveness of medications: medication A is more effective than medication B. We can also see this in education: intervention A led to better results in student achievement than intervention B. As measured by Cohen’s d, an effect size is small if d = 0.2, medium if d = 0.5, and large if d is greater than or equal to 0.8. Cohen’s d can help us understand whether an effect is practically significant, or meaningful in real-life scenarios: whether one medication or intervention does more to help than another and therefore should be used more.
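Here is a minimal sketch of computing Cohen’s d by hand using the pooled standard deviation; the achievement scores are invented:

```python
# A sketch of Cohen's d: (mean difference) / (pooled standard deviation).
import statistics

group_a = [78, 85, 90, 88, 84, 91, 87]  # e.g., scores under intervention A
group_b = [72, 80, 75, 78, 74, 79, 76]  # e.g., scores under intervention B

mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
n_a, n_b = len(group_a), len(group_b)

# The pooled SD weights each group's variance by its degrees of freedom.
pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
d = (mean_a - mean_b) / pooled_sd
print(f"d = {d:.2f}")  # d >= 0.8 counts as a large effect
```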


Statistical and Practical Significance


“Just because something is true doesn’t mean it matters all that much.”


In interpreting the effect of a medication, treatment, intervention, or any experiment, a scientist needs to know both the statistical and the practical significance of the results. Knowing only one or the other is not enough; knowing both shows whether an effect is in fact happening and whether it is worth applying to real life and real people. Simply put, statistical significance indicates that a study shows an effect (represented by the p value), while practical significance indicates that the effect has applications in the real world (represented by effect sizes).

An example of this principle relates to vaccines. If a vaccine shows statistical significance but little practical significance, as shown by the effect size, a company may not want to distribute that vaccine widely, since it has little practical use. Companies can use effect sizes, the practical significance, to determine whether they want to move forward with a vaccine or return to research and development. Another example involves reduced college tuition leading to higher-paying jobs: if lowering tuition raises a person’s earnings by only $100, it is worth asking whether the investment in lower tuition pays off.
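A short simulation makes the distinction vivid: with a very large sample, even a tiny effect produces a statistically significant p value, while the effect size reveals that the effect is practically negligible. The data below are simulated, not from a real study:

```python
# A sketch contrasting statistical significance (p) with practical
# significance (effect size d) on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=100.0, scale=15.0, size=100_000)
treated = rng.normal(loc=100.5, scale=15.0, size=100_000)  # tiny true effect

t_stat, p_value = stats.ttest_ind(treated, control)
d = (treated.mean() - control.mean()) / np.sqrt(
    (treated.var(ddof=1) + control.var(ddof=1)) / 2)

print(f"p = {p_value:.2e}")  # far below .05: statistically significant
print(f"d = {d:.2f}")        # roughly 0.03: practically negligible
```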


Interpreting Effect Size

Effect size describes how meaningful the difference between variables or groups is. In other words, it is the practical significance of a finding or results. A large effect size means that a research finding has practical significance while a small effect size means that there are limited practical applications. Interpreting the effect size also includes understanding the strength and direction of the relationship between two variables.


Effect Size    Cohen's d         Pearson's r
Small          0.2               0.1 to 0.3 (or -0.1 to -0.3)
Medium         0.5               0.3 to 0.5 (or -0.3 to -0.5)
Large          0.8 or greater    0.5 or greater (or -0.5 or less)



*The closer the value is to 0, the smaller the effect size; values closer to -1 or 1 indicate a larger effect.



Law of Large Numbers and Sample Sizes

The law of large numbers states that as a sample size grows, it becomes more representative of the whole population. 

Example: Imagine you want to estimate the average height of all the students in your school. You could measure the heights of a few students, maybe 2% of the student body, and the average height you calculate would not be very accurate. However, if you were to measure the height of a larger group of students, maybe 40% of the student body, the average height you calculated would be more representative of the whole student body. 
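Here is a small simulation of this idea; the “school” and its heights are invented:

```python
# A sketch of the law of large numbers: sample means settle toward the
# population mean as the sample grows.
import numpy as np

rng = np.random.default_rng(42)
population = rng.normal(loc=165.0, scale=8.0, size=10_000)  # heights in cm

for n in (10, 100, 1_000, 4_000):  # from a tiny sample up to 40% of the school
    sample = rng.choice(population, size=n, replace=False)
    print(f"n = {n:5d}: sample mean = {sample.mean():.2f} cm")
print(f"population mean = {population.mean():.2f} cm")
```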


Type 1 and Type 2 Errors 

Type 1 and type 2 errors relate to the alpha standard and to rejecting or failing to reject the null hypothesis (the null hypothesis states that no statistically significant effect exists in a set of observations). A type 1 error is a false positive: rejecting the null hypothesis when it is actually true. The probability of a type 1 error is the alpha. A type 2 error is a false negative: failing to reject the null hypothesis when a real effect actually exists.

Scientists can reduce the chance of errors by increasing the sample size. The larger a sample size is, the smaller the chance that it will differ substantially from a whole population and give misleading results. 
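To see why the alpha sets the type 1 error rate, here is a sketch that simulates many studies in which the null hypothesis is true (both groups come from the same population); false positives should occur in roughly 5% of the runs:

```python
# A sketch simulating the type 1 (false positive) error rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, runs, false_positives = 0.05, 1_000, 0

for _ in range(runs):
    a = rng.normal(size=30)  # both samples drawn from the SAME population,
    b = rng.normal(size=30)  # so the null hypothesis is true by construction
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1  # rejecting here is a type 1 error

print(f"Type 1 error rate: {false_positives / runs:.3f}")  # near 0.05
```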


Descriptive vs. Inferential Statistics 

Descriptive statistics report the features or characteristics of data and provide insights about the data through numerical or graphic representations. Descriptions include measures of central tendency such as the mean, median, or mode of a data set. This also includes measures of dispersion such as standard deviation or range, and the frequency of a particular outcome in a data set.
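Here is a quick sketch of descriptive statistics using Python’s built-in statistics module; the quiz scores are invented:

```python
# A sketch of common descriptive statistics for a small data set.
import statistics

scores = [7, 8, 8, 9, 10, 6, 8, 7, 9, 8]  # invented quiz scores

print("mean:  ", statistics.mean(scores))    # central tendency
print("median:", statistics.median(scores))
print("mode:  ", statistics.mode(scores))
print("stdev: ", round(statistics.stdev(scores), 2))  # dispersion
print("range: ", max(scores) - min(scores))
```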

Inferential statistics are used to make conclusions about a whole set of data from smaller samples. This involves analyzing a random sample from a much broader dataset and then applying the conclusions to the whole population. Inferential statistics can reveal correlations, probabilities, and other relationships in a data set. 


Standard Measures of Center: Mean, Median, Mode, and Percentiles

The mean is the average of a data set, the median is the middle value when the data are ordered, and the mode is the value that appears most often. A percentile, meanwhile, tells you the percentage of scores that fall at or below a given value. So why do we use percentiles? They are helpful for comparing scores or measurements in ways that raw percentages cannot. For instance, at a checkup, a baby might be in the 98th percentile for height and weight among babies their age. This helps parents and doctors know whether the baby is healthy compared with children their age or is measuring significantly lower than others. A similar process applies to test scores in school, showing how well a student is doing compared with peers in their class.

Here is a helpful video explaining how to calculate percentiles and put them in context: 

https://www.khanacademy.org/math/ap-statistics/density-curves-normal-distribution-ap/percentiles-cumulative-relative-frequency/v/calculating-percentile 
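For a quick way to check a percentile by hand, here is a sketch using SciPy; the class scores are invented:

```python
# A sketch of finding where one score sits relative to a group.
from scipy import stats

class_scores = [55, 62, 68, 70, 74, 77, 80, 83, 88, 95]
my_score = 83

pct = stats.percentileofscore(class_scores, my_score)
print(f"A score of {my_score} falls at about the {pct:.0f}th percentile.")
```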


Standard Measures of Spread: Range and Standard Deviation 

The range is the difference between the largest and smallest values in a data set, while the standard deviation measures how far data points typically fall from the mean.

Extra knowledge: how to calculate standard deviation, as described at https://www.nlm.nih.gov/oet/ed/stats/02-900.html#:~:text=A%20standard%20deviation%20(or%20%CF%83,data%20are%20more%20spread%20out.

σ = √( Σ (xᵢ − µ)² / N )

“In this formula, σ is the standard deviation, xi is each individual data point in the set, µ is the mean, and N is the total number of data points. In the equation, xi, represents each individual data point, so if you have 10 data points, subtract x1 (first data point) from the mean and then square the absolute value. This process is continued all the way through x10 (last data point). The results are then summed (symbolized as Σ), which is the numerator of the fraction from the equation.”
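Here is a minimal sketch that follows the formula step by step; the data points are invented:

```python
# A sketch of the population standard deviation computed by hand.
import math

data = [4, 8, 6, 5, 3, 7, 9, 6]
mu = sum(data) / len(data)                         # the mean, µ
squared_diffs = [(x - mu) ** 2 for x in data]      # (xi − µ)² for each point
sigma = math.sqrt(sum(squared_diffs) / len(data))  # σ ≈ 1.87 for these data
print(f"σ = {sigma:.2f}")
```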



