Lesson 10: Making Predictions From Data
Seek Learning by Study and by Faith
As you seek to follow Jesus Christ and develop Christlike attributes, you will be filled with charity. An ancient American prophet named Mormon invited, “Wherefore, my beloved brethren, pray unto the Father with all the energy of heart, that ye may be filled with this love, which he hath bestowed upon all who are true followers of his Son, Jesus Christ; that ye may become the sons of God; that when he shall appear we shall be like him, for we shall see him as he is; that we may have this hope; that we may be purified even as he is pure. Amen” (Moroni 7:48). True disciples of Christ emulate Him.
Introduction
Data analysis can uncover the truth about a current scenario, a future event, or an event that happened in the past.
Elder David A. Bednar stated, “I long have been impressed with the simple and clear definition of truth outlined in the Book of Mormon: The Spirit speaketh the truth and lieth not. Wherefore, it speaketh of things as they are, and of things as they really will be; wherefore, these things are manifested unto us plainly, for the salvation of our souls” (Elder David A. Bednar, Things as They Really Are).
While we live in a time where God has revealed many truths, there are still many things that God has not yet revealed. As individuals, our futures are still full of uncertainty. This lesson focuses on analyzing data to make predictions, which gives us quantitative tools to understand things “as they are” and predict things “as they really will be” (Jacob 4:13).
Input and Output Variables
One effective method of understanding the truth is by using what we know to predict or explain what we don't know. We do this using input and output variables.
Mathematicians and statisticians use different names for input and output variables. This lesson calls them inputs and outputs, but it can be helpful to know the other terms as well.
Input Variable | Output Variable |
x-variable | y-variable |
independent variable | dependent variable |
explanatory variable | response variable |
predictor variable | response variable |
In Lesson 8, you read about Sarah, a woman trying to improve her health. She used the function \( f(x)=6x+1860 \), where the input variable, \( x \), represented the number of minutes she walked each day, and the output variable, \( f(x) \) or \( y \), represented her total caloric expenditure for the day. The number of calories burned varied depending on the duration of her walk, and the length of her walk could determine or predict the calories burned. In simpler terms, she had the freedom to choose how long she walked, while the total calories burned depended on the duration of her walk. This instance highlights alternative terms for input and output variables. As stated previously, this lesson will continue to use only the terms input and output for variables.
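Writing Sarah's model as a small function makes the roles of input and output concrete. The argument is the input (minutes walked) and the returned value is the output (total calories). This is a minimal Python sketch. The function name `calories_burned` is our own and does not come from Lesson 8.

```python
def calories_burned(minutes_walked):
    """Sarah's model f(x) = 6x + 1860.

    Input:  minutes walked in a day (x)
    Output: total calories burned that day (f(x))
    """
    return 6 * minutes_walked + 1860


# Try a few inputs and print the outputs they produce.
for minutes in (0, 30, 60):
    print(minutes, "minutes ->", calories_burned(minutes), "calories")
```

For example, walking 30 minutes gives 6(30) + 1860 = 2040 calories. Sarah chooses the input, and the output depends on it.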
When two variables might be related, use data analysis to explore relationships between the input variable and the output variable. Create a scatter plot with the input variable on the horizontal axis (x-axis) and the output variable on the vertical axis (y-axis). Looking at the scatter plot helps you see if a pattern or trend exists that can be described using a function.
Trendlines
When data shows a pattern or trend, we can use the data to make predictions. The predictions may be about some future event, an estimate of what happened in the past, or even a statement about where things are currently. We often use trendlines to describe a trend or correlation in data.
Although the term trendline includes the word "line," trendlines are not always lines. They can also be curved. We will be using linear trendlines, quadratic trendlines, and exponential trendlines.
Community Service Example
A study conducted in 2014 showed that “Americans who actively work to better their communities have higher overall well-being than those who do not” (Americans Serving Their Communities Gain Well-Being Edge). Although the study's actual data is confidential and cannot be shared here, the following data was created to reflect the study's results. This fabricated data shows the scores earned by nine fictitious individuals on a well-being index and the number of hours of community service each of them completed in the past month.
Well-Being Index Score (Scale of 0–100) | Hours of Community Service Completed in the Past Month |
32 | 0.5 |
43 | 0 |
47 | 1 |
58 | 2.5 |
61 | 3.5 |
72 | 8 |
76 | 6.5 |
87 | 4 |
92 | 10 |
To study the trend this data shows, use Excel to create a scatterplot. Decide if the trend is linear, quadratic, or exponential. Then add the appropriate trendline and equation to the graph.

Looking at the graph, you can see that not all of the points lie directly on the line. However, the line does show the general trend of the data. The general relationship between the Well-Being Index Score and the Hours of Community Service is described by the function:
where the input, \( x \), represents the number of hours of community service completed in the past month, and the output, \( y \), represents the score on the well-being index.
We can use this function to predict the well-being score for individuals based on the number of hours of community service they have completed in the previous month.
For individuals who have completed 10 hours of community service in the past month,
For individuals who have completed 5 hours of community service in the past month,
On average, individuals who complete 10 hours of community service have a higher well-being score than individuals who complete 5 hours of community service.
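Excel's linear trendline is the least-squares line, which is the line that makes the sum of the squared vertical distances from the points to the line as small as possible. The sketch below computes that line for the nine community service points in plain Python and then makes the two predictions described above. The helper names `fit_line` and `predict` are hypothetical, chosen only for this example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the trendline at input x."""
    return slope * x + intercept


# Input: hours of community service; output: well-being index score
hours = [0.5, 0, 1, 2.5, 3.5, 8, 6.5, 4, 10]
scores = [32, 43, 47, 58, 61, 72, 76, 87, 92]

slope, intercept = fit_line(hours, scores)
print(f"y = {slope:.4f}x + {intercept:.3f}")
print("10 hours:", round(predict(slope, intercept, 10), 1))
print(" 5 hours:", round(predict(slope, intercept, 5), 1))
```

For these nine points, the sketch gives a slope of about 4.97 and an intercept of about 43.2. On average, each additional hour of service goes with a score about 5 points higher. Because Excel also uses least squares, its trendline for the same points should match this one up to rounding.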
Excel Instructions: Trendlines
Follow each of the five steps below to learn how to create the community service scatterplot and trendline with the following data:
| Hours of Community Service | Well-Being Index Score |
| 0.5 | 32 |
| 0 | 43 |
| 1 | 47 |
| 2.5 | 58 |
| 3.5 | 61 |
| 8 | 72 |
| 6.5 | 76 |
| 4 | 87 |
| 10 | 92 |
Step 1: Create an Excel file that lists the input variable in the first column and the output variable in the second column.

Step 2:
- Highlight the data you want to graph.
- Select Insert.
- Select the Scatterplot icon and the Scatter option.

Step 3: Once the scatterplot has been created, do the following:
- Right-click on one of the points.
- From the right-click menu, select Add Trendline.

Step 4: From the Format Trendline menu, do the following:
- Because the scatterplot for these data shows a linear trend, select Linear. (Select Polynomial with an order of 2 to create a quadratic trendline, or Exponential to create an exponential trendline.)
- Check the box to Display Equation on Chart.

Step 5: Add chart elements (see 1), such as axis titles and a chart title, to create a finished graph (see 2).

Practice Creating a Trendline
Practice Problem 1
The following data gives the height and shoe size for each student in a high school class. The height is measured in inches and the shoe sizes are in men's US sizes.
Enter this data into Excel. Then follow the "Excel Instructions: Trendlines" given above to create a linear trendline that describes the data.
| Height | Shoe Size |
| 71 | 9.5 |
| 70 | 10 |
| 70 | 11 |
| 62 | 6.5 |
| 71.5 | 11.5 |
| 61 | 4.5 |
| 71 | 12 |
| 61 | 4.5 |
| 67 | 10 |
| 67 | 10 |
| 66 | 11 |
| 66 | 8 |
| 64 | 5.5 |
| 65 | 9 |
Enter the equation of the trendline below:
y = ____x + ____
Correlation versus Causation
If you are not careful, trendlines can lead to a common flaw in logical reasoning.
When a scatterplot shows a trend or relationship between two variables, the variables are correlated. However, this does not mean that changes in the input variable cause a change in the output variable.
A causal relationship means that changes in one variable lead to changes in another variable. When variables are only correlated, there may be another explanation for the relationship besides causation. Here are a few illustrative examples.
Eating Ice Cream Causes Shark Attacks
The following scatterplot and trendline show the relationship between monthly ice cream sales and the number of shark attacks per month (Ice Cream Sales Cause Shark Attacks. A Real Estate Perspective).

This scatterplot shows a linear trend where increased monthly ice cream sales correspond with a rise in shark attacks. While the graph shows a correlation between ice cream sales and shark attacks, you cannot conclude that eating ice cream causes shark attacks.
The correlation seen in these data has a more plausible explanation. Because of warmer weather, people are more likely to eat ice cream in the summer than in the winter, so ice cream sales are higher during the summer months. Similarly, warmer weather during the summer means that people are more likely to go to the ocean and go swimming. Due to the increased number of people in the water, and the migratory patterns of sharks, shark attacks are more common during the summer than in the winter. So, the relationship between ice cream sales and shark attacks is due to summer: both are more common during the summer when the temperature is warmer. This alternative explanation accounts for the trend we see in the data.
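A small simulation shows how a common cause alone can produce a strong trend. In this sketch, the monthly temperatures and both formulas are made up for illustration. Ice cream sales and shark attacks are each computed only from temperature, and neither quantity uses the other. The correlation coefficient r, a number from -1 to 1 that measures how closely the points follow a line, still comes out close to 1.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical average temperature (degrees C) for each month, January to December
temps = [5, 8, 12, 17, 22, 26, 29, 28, 23, 16, 10, 6]

# Each quantity depends ONLY on temperature (the common cause).
ice_cream_sales = [100 + 30 * t for t in temps]  # hypothetical sales model
shark_attacks = [t // 5 for t in temps]          # hypothetical attack model

r = correlation(ice_cream_sales, shark_attacks)
print(f"r = {r:.2f}")
```

Nothing in the code lets ice cream affect sharks, yet r is about 0.99. The strong correlation comes entirely from the two quantities' shared dependence on temperature.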
In the sharks and ice cream example, the need for an alternative explanation is obvious. However, no matter what the data represents, remember that a correlation or trend in the data does not tell us that there is a causal relationship (where changes in one variable cause a change in the other variable). Here is an example where it is more tempting to say a causal relationship exists. We first saw this example in Lesson 3.
Correlation without Causation
A research study published in 2013 examined the effect that drinking soda (or other sugary drinks) has on cardiovascular health. The data examined in this study showed a trend where men who drank more soda had a higher risk of a heart attack. However, a CBS News article reporting on the study states that “other doctors caution that just because there may be a link between sugary drinks and heart attacks does not mean sugar is causing them” (Soda a Day May Lead to Heart Attacks in Men). The article points out that men who drink sugary drinks consume them with burgers, fries, and other unhealthy food choices. Having a generally unhealthy diet and other poor health habits may be the actual cause of increased heart attack risk.
Here are more examples of correlation without causation:
Ice cream sales and drowning incidents both increase during the summer months. However, it would be incorrect to conclude that eating ice cream causes more drownings. The common factor is the warm weather, which leads to both higher ice cream sales and more people going swimming, thus increasing the drowning incidents.
There is a positive correlation between the number of firefighters at a scene and the extent of property damage. However, it would be fallacious to assume that having more firefighters causes more damage. The larger fire incidents naturally require more firefighters to respond, leading to a correlation but not a causal relationship.
Among students of different ages, those who score higher on standardized tests tend to have larger shoe sizes. This correlation does not suggest that having larger feet improves test scores, or vice versa. The link is not causal; it stems from the fact that older students tend to have both higher test scores and larger shoe sizes.
As the number of storks increases in a region, the birth rate rises. However, this is not evidence that storks deliver babies. The correlation arises from the fact that both the number of storks and the birth rate are influenced by the size of the population or habitat suitability.
Community Service and Well-Being
We should apply the rule that correlation does not imply causation to the community service example from the previous section. This example shows a trend where people who do more community service tend to have higher scores on a well-being index. However, you cannot conclude that community service causes them to have better well-being. We just know there is a correlation and that an individual who does more community service is more likely to have a higher score on the well-being index. It might be the community service that causes a feeling of well-being, but there might also be an alternative explanation. The information we have is not enough to tell.
Linear, Quadratic, or Exponential?
Excel can create linear, quadratic, and exponential trendlines. To decide which of these three trendline types would apply in a particular situation, look at the shape of the scatterplot. If the general shape of the points is close to being a line, use a linear trendline. If the shape of the graph resembles a U-shaped parabola, use a quadratic trendline. If the points curve but keep increasing (or keep decreasing) without turning around, use an exponential trendline.
Use linear trendlines with linear data that has a constant rate of change. Use quadratic and exponential trendlines with nonlinear data with a variable rate of change.
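For data measured at equally spaced inputs, the rate-of-change idea can be checked directly. Linear data have constant first differences, quadratic data have constant second differences, and exponential data have a constant ratio between consecutive outputs. The sketch below applies those three checks. The function names and the tolerance are assumptions chosen for illustration. Real data are noisy, so these patterns will only hold approximately, and the scatterplot remains the main guide.

```python
import math


def differences(values):
    """Differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


def roughly_constant(values, rel_tol):
    """True if every value is close to the first one."""
    return all(math.isclose(v, values[0], rel_tol=rel_tol) for v in values)


def suggest_trendline(ys, rel_tol=1e-9):
    """Suggest a trendline type for outputs ys measured at equally spaced inputs.

    Needs at least four values so that each check compares more than one number.
    """
    first = differences(ys)
    if roughly_constant(first, rel_tol):
        return "linear"          # constant rate of change
    second = differences(first)
    if roughly_constant(second, rel_tol):
        return "quadratic"       # rate of change changes by a constant amount
    if all(y != 0 for y in ys):
        ratios = [b / a for a, b in zip(ys, ys[1:])]
        if roughly_constant(ratios, rel_tol):
            return "exponential" # each output is a constant multiple of the last
    return "unclear"


print(suggest_trendline([3, 5, 7, 9]))        # linear
print(suggest_trendline([1, 4, 9, 16, 25]))   # quadratic
print(suggest_trendline([2, 6, 18, 54]))      # exponential
```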
Practice Problem 2
Compare the graphs below with the linear, quadratic, and exponential trendlines. After comparing the three trendlines, select the one that best fits this data.
Linear: 
Quadratic: 
Exponential: 
What type of trendline best fits this data?
Exponential
Linear
Quadratic
Excel Writing Exponential Functions
Because of the rules of exponents (reviewed in Lesson 1), exponential functions can be written in a variety of different ways. For example, \( 2^{-x} = (2^{-1})^x = (\frac{1}{2})^x \).
Therefore, even though the equations \( f(x)=2^{-x} \) and \( f(x)=(\frac{1}{2})^x \) look different from one another, they represent the same function and have the same graph.
Excel uses the exponential number \( e \) when giving the equation for an exponential function. The exponential number \( e \) is a constant that is approximately equal to 2.71828. Like \( π \), the exponential number \( e \) is a non-terminating, non-repeating decimal. Because \( e \) is a number that shows up naturally in many applications, it has a unique name and symbol.
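We can use the rules of exponents to show that \( f(x)=e^{1.098612x} \) and \( f(x)=3^x \) are different equations that represent the same function. (Because 1.098612 is a rounded value, \( e^{1.098612} \) is not exactly 3, but it is so close that the two graphs are indistinguishable.)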
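As a sketch of that argument, the key step is the power rule \( a^{mn} = (a^m)^n \):

\[
e^{1.098612x} = \left(e^{1.098612}\right)^{x} \approx 3^{x}, \qquad \text{because } e^{1.098612} \approx 2.999999 \approx 3.
\]

The same rewriting works for any exponential trendline Excel reports: \( ae^{bx} = a\left(e^{b}\right)^{x} \), so \( e^{b} \) is the factor the output is multiplied by each time \( x \) increases by 1. For example, \( e^{-0.1} \approx 0.905 \).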
E vs. e in Excel
View these instructional videos to learn the difference between 'e' and 'E' in Excel and to understand the methods for handling equations involving these notations.
Using EXP in Excel to evaluate a quadratic trendline equation
Trendline equations with e and E
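As a rough summary of the distinction (the videos remain the reference for Excel itself): an uppercase E in a number such as 2.5E-03 is scientific notation and means "times 10 to the power of," while the lowercase e in an exponential trendline is the constant 2.71828..., which Excel's EXP function raises to a power (for example, =EXP(2) gives \( e^2 \)). The Python sketch below shows the same two ideas. Python happens to use the same E notation for numbers, and `math.exp` plays the role of EXP.

```python
import math

# Uppercase E is scientific notation: 2.5E-03 means 2.5 x 10^(-3).
coefficient = float("2.5E-03")
print(coefficient)                                 # 0.0025
print(math.isclose(coefficient, 2.5 * 10 ** -3))   # True

# Lowercase e is the constant 2.71828..., not a power of ten.
print(math.e)

# math.exp(x) computes e^x, like Excel's =EXP(x).
print(math.isclose(math.exp(2), math.e ** 2))      # True

# Evaluating an exponential trendline y = 18422 e^(-0.1x) at x = 10,
# the same calculation as the Excel formula =18422*EXP(-0.1*10)
print(round(18422 * math.exp(-0.1 * 10)))          # 6777
```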
Predictions from Trendlines
Creating a scatterplot and trendline provides a way to describe the relationship between two variables. You can then use the trendline's equation to make predictions. The following example demonstrates how to use trendline predictions as part of the Quantitative Reasoning Process.
1. Understand the Problem
Carolina lives in Buenos Aires. She has a 10-year-old Fiat Cronos. She has paid for the vehicle. Now Carolina is saving money each month to purchase her next car. She currently has ARG$9,000 saved.

The new car Carolina wants to buy will cost ARG$35,300, including all taxes and fees. Carolina needs to determine whether she will get enough money from selling her old Fiat Cronos to purchase the new car.
2. Identify Variables & Assumptions
Carolina identifies the following key variables:
The total cost of the new car.
The condition of the Fiat Cronos she is selling.
The amount of money from selling her Fiat Cronos.
She realizes she is making the following assumptions:
She will pay cash for her new car and not borrow money.
She can use all the money from the sale of the Fiat Cronos to purchase her new car.
She can buy a new car for ARG$35,300.
She has ARG$9,000 to use to purchase her new car.
If she can sell her Fiat Cronos for at least ARG$26,300, she can buy a new car.
3. Apply Quantitative Tools
Carolina knows she needs to be able to sell her 10-year-old Fiat Cronos for at least ARG$26,300. She finds several Fiat Cronos for sale, but none are the same model year as hers, so she isn’t sure how much she will get for her old car. Carolina creates a spreadsheet listing the age and sale price of Fiat Cronos cars in her area with features and condition similar to hers.
The scatterplot shows how the value of the Fiat Cronos changes over time. After looking at the scatter plot, Carolina realizes that since the data is nonlinear, an exponential trendline will better describe the relationship between the age and sale price of Fiat Cronos.

From the scatterplot shown above, we see that the function relating the age and sale price of a Fiat Cronos is \( y=18422e^{-0.1x} \), where \( x \) represents the car's age and \( y \) represents the sale price.
Carolina uses this equation to predict how much she will get from selling her 10-year-old car: \( y = 18422e^{-0.1(10)} = 18422e^{-1} \approx 6777 \).
Rounding up, Carolina concludes that she can sell her car for around ARG$6,800. However, she realizes that this prediction provides an average price and that there will be variations in the selling prices of a Fiat Cronos. The car may sell for a little more or less than this predicted price.
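Carolina's car-price data is not listed in the lesson, so this sketch uses made-up points that follow \( y = 2 \cdot 3^x \) exactly. It shows one standard way to fit \( y = ae^{bx} \): take the natural logarithm of each output, fit an ordinary least-squares line to the points \( (x, \ln y) \), and convert back using \( a = e^{\text{intercept}} \) and \( b = \text{slope} \). This log-transform approach is commonly used for spreadsheet exponential trendlines, but treat it as an illustration rather than a description of Excel's internals. Every output must be positive for the logarithm to exist.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * e^(b x) by least squares on (x, ln y). Requires every y > 0."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b = sxl / sxx                       # slope of the line through (x, ln y)
    a = math.exp(mean_l - b * mean_x)   # e raised to that line's intercept
    return a, b


# Hypothetical data lying exactly on y = 2 * 3^x
xs = [0, 1, 2, 3]
ys = [2, 6, 18, 54]

a, b = fit_exponential(xs, ys)
print(f"y = {a:.4f}e^({b:.6f}x)")
```

The fitted exponent is about 1.098612, the same number that appeared earlier in \( e^{1.098612x} = 3^x \).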
4. Make an Informed Decision
Carolina has ARG$9,000 saved. If she sells her car for ARG$6,800, she will have a total of ARG$15,800. Carolina needs at least ARG$35,300 to purchase the new car, so she will likely be about ARG$19,500 short of the amount she needs. Carolina decides to wait to sell her car until she has saved more money.
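Carolina's arithmetic can be collected in one short script, which makes it easy to rerun if a price changes. This is a sketch: the variable names are our own, and rounding the predicted price to the nearest hundred mirrors the lesson's choice of ARG$6,800.

```python
import math

savings = 9000            # ARG$ already saved
new_car_cost = 35300      # ARG$ including taxes and fees
age = 10                  # age of her Fiat Cronos in years

# Exponential trendline from the lesson: y = 18422 e^(-0.1x)
predicted_price = 18422 * math.exp(-0.1 * age)
rounded_price = round(predicted_price, -2)   # nearest hundred pesos

total_available = savings + rounded_price
shortfall = new_car_cost - total_available
needed_from_sale = new_car_cost - savings

print(f"Predicted sale price: ARG${predicted_price:,.0f}")
print(f"Total available:      ARG${total_available:,.0f}")
print(f"Needed from the sale: ARG${needed_from_sale:,.0f}")
print(f"Shortfall:            ARG${shortfall:,.0f}")
```

Changing `new_car_cost` to 16000 models the cheaper car in step 5: she would then need at least ARG$7,000 from the sale.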
5. Evaluate Your Reasoning
As Carolina reflects on her decision, she is disappointed. Because she still wants to purchase a new car, she looks at her options again. She finds a less expensive car that she likes and that will meet her needs. The alternative vehicle costs only ARG$16,000, which is within reach: with her ARG$9,000 in savings, she needs to sell her Fiat Cronos for at least ARG$7,000, only slightly more than the predicted price. She revises her decision and puts her Fiat Cronos up for sale. She lists the sale price as ARG$7,800, so she still has room to negotiate with potential buyers.
Practice Making a Prediction
Practice Problem 3
Earlier in this lesson, you found the following trendline: \( y = 0.6057x - 31.555 \)
Use the trendline to predict the shoe size of a student who is 68 inches tall.
Predicted Shoe Size =
Solution:
To predict the shoe size of a student who is 68 inches tall, we enter 68 as the input of the trendline function:
\( y = 0.6057x - 31.555 \)
\( y = 0.6057(68) - 31.555 \)
\( y = 41.1876 - 31.555 \)
\( y = 9.6326 \)
The student would need a shoe size of about 9.5 or 10.
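To check Practice Problems 1 and 3 outside Excel, the sketch below computes the least-squares line for the height and shoe-size data and uses it for a 68-inch student. The helper name `fit_line` is hypothetical. Because the sketch keeps full precision instead of the rounded coefficients 0.6057 and -31.555, its prediction differs from 9.6326 by a few thousandths.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


heights = [71, 70, 70, 62, 71.5, 61, 71, 61, 67, 67, 66, 66, 64, 65]       # inches
shoe_sizes = [9.5, 10, 11, 6.5, 11.5, 4.5, 12, 4.5, 10, 10, 11, 8, 5.5, 9]  # men's US

slope, intercept = fit_line(heights, shoe_sizes)
prediction = slope * 68 + intercept

print(f"y = {slope:.4f}x + ({intercept:.3f})")
print(f"Predicted shoe size at 68 inches: {prediction:.2f}")
```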
Example: Old Faithful
Old Faithful is the most well-known feature of Yellowstone National Park. The travel article About Old Faithful boasts that Old Faithful is “the most famous geyser in the world.”
Old Faithful’s name comes from its predictable eruptions. Visitors to the park come from all over the world to see the eruption of Old Faithful. The geyser is so predictable that the park posts signs letting visitors know how many minutes until the next eruption. Other geysers in Yellowstone and elsewhere erupt less frequently. Their eruptions are much more difficult to predict.
The scatterplot below shows how the park uses the length of the previous eruption to determine how long it will be until the next eruption. The scatterplot shows the general pattern found in actual data recorded by researchers for past eruptions.

Notice that if the previous eruption of Old Faithful lasted for 4.0 minutes, you would predict the time until the next eruption to be anywhere from 68–90 minutes. See the blue points in the plot below.
The red points on the scatterplot predict that if the previous eruption lasts 2.0 minutes, you can expect a waiting time between 42 and 66 minutes until the next eruption. The longer the previous eruption lasts, the longer you will wait until the next eruption.

If you can fit a mathematical model to the data, then you can create an equation that will allow you to predict the average waiting time for the next eruption based on the length of the current eruption.
Note that the general trend in the data is linear, so a line is an appropriate mathematical model for this data.

The mathematical model we get from this data is the equation \( y=33.47+10.73x \), where the input variable, \( x \), represents the length of the most recent eruption in minutes, and the output variable, \( y \), represents the waiting time in minutes until the next eruption. If the most recent eruption lasted 4.0 minutes, then the model estimates that the next eruption will occur in \( y = 33.47 + 10.73(4.0) = 76.39 \) minutes.
The Park Rangers can add 76.39 minutes to the time at which the previous eruption ended and then post a sign near Old Faithful stating when the next eruption will occur (see this list of recent Old Faithful Activity). As shown in the plot above, the actual waiting time could be anywhere from 68 to 90 minutes, so the Rangers can let visitors know that the eruption could come a little earlier or a little later than the predicted 76.39 minutes after the end of the previous eruption.
Practice Problem 4
On a visit to Yellowstone National Park, you time the eruption of the Old Faithful Geyser. The geyser erupted for 3.7 minutes, and the eruption ended at 3:14 p.m.
Use the model we found above to give a prediction for the time at which the geyser will erupt next.
Time of Next Eruption:
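Solution: The model predicts a wait of \( y = 33.47 + 10.73(3.7) \approx 73.17 \) minutes. Adding about 73 minutes to 3:14 p.m. gives a predicted next eruption at about 4:27 p.m.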
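The ranger's calculation (eruption length in, clock time out) can be sketched with Python's standard `datetime` module. The calendar date below is hypothetical because the problem gives only a time of day; the model coefficients come from the lesson.

```python
from datetime import datetime, timedelta


def predicted_wait_minutes(eruption_minutes):
    """Old Faithful model from the lesson: y = 33.47 + 10.73x."""
    return 33.47 + 10.73 * eruption_minutes


def next_eruption(end_of_eruption, eruption_minutes):
    """Add the predicted wait to the time the eruption ended."""
    wait = timedelta(minutes=predicted_wait_minutes(eruption_minutes))
    return end_of_eruption + wait


# Hypothetical date; the eruption lasted 3.7 minutes and ended at 3:14 p.m.
ended = datetime(2024, 7, 1, 15, 14)
print(round(predicted_wait_minutes(3.7), 2))        # 73.17
print(next_eruption(ended, 3.7).strftime("%H:%M"))  # 16:27
```

Replacing 3.7 with 4.0 gives the 76.39-minute wait from the example above.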
Example: Snow Machines in Yellowstone
Yellowstone National Park officials must conserve natural resources, while at the same time allowing for use of the Park by the public. Conservation and public use can be at odds with one another. The use of snow machines in Yellowstone is an example of where there has been a lot of controversy.

Yellowstone National Park requires snow machines to have the best available technology (BAT) to protect the Park’s environment. Older snow machines have less efficient engines that emit more hydrocarbons and carbon monoxide, which harm the environment. However, restricting the types of snow machines restricts winter access to Yellowstone.
The officials at the National Park Service used data analysis to help make decisions related to the conservation and use of Yellowstone National Park, such as snow machine use. The following example outlines the process used by Yellowstone officials to determine the winter use policy in Yellowstone.
1. Understand the Problem
By the late 1990s, the number of snow machines entering Yellowstone during the winter had increased to the point that there was a noticeable impact on air quality. In 1999, an environmental group petitioned the National Park Service and asked them to ban recreational snowmobiling in Yellowstone and all other National Parks. In 2000, in response to noticeable impacts on air quality and the request by an environmental group, the National Park Service decided to phase out the use of snow machines in Yellowstone National Park. In 2001, the National Park Service was sued over their proposed ban on snow machines. Because the National Park Service has a legal mandate to allow for public use of park resources, a federal court overturned Yellowstone’s snow machine ban. Yellowstone officials then decided to carry out several studies to determine the best way to balance environmental conservation and public use of the park.
2. Identify Variables & Assumptions
The key variables identified by researchers were as follows:
The types of vehicles entering the park
The number of vehicles entering the park
Hydrocarbon emissions
Carbon monoxide emissions
Types of snow machine engines (2-stroke engines, 4-stroke engines, BAT engines)
The researchers made several assumptions:
Hydrocarbon and carbon monoxide are harmful to the environment.
Decreasing the number of vehicles entering the park will decrease hydrocarbon and carbon monoxide emissions.
BAT snow machines have lower hydrocarbon and carbon monoxide emissions than non-BAT snow machines.
Traveling with a guide will reduce harmful emissions.
3. Apply Quantitative Tools
Between 2002 and 2011, the National Park Service released eight different reports on the impact of winter use on air quality. The following explanation shows a part of the data examined by Park officials as they used data to make decisions on Park policy.
At the beginning of their research in 2000, researchers compared carbon monoxide and hydrocarbon emissions from different vehicle types entering Yellowstone. The following bar chart shows the estimated annual emissions from automobiles, RVs, snow machines, snow coaches, and buses from 1992 through 1999.

They also looked at pie charts that showed the sources of all annual hydrocarbon and carbon monoxide emissions:


It is clear from the bar and pie charts that snow machines are the largest source of these harmful emissions. Park officials decided that, although snow machines should not be banned altogether in Yellowstone, changes to the winter use policy did need to occur.
They then gathered data on how the engine type of snow machines affected carbon monoxide levels.
Type of snow machine engine | Number of snow machines entering the Park | CO (ppm) |
2-stroke | 20 | 0.36 |
2-stroke | 22 | 1.72 |
2-stroke | 28 | 2.41 |
2-stroke | 42 | 3.16 |
2-stroke | 67 | 4.90 |
2-stroke | 470 | 25.73 |
BAT | 110 | 3.20 |
BAT | 140 | 2.78 |
BAT | 160 | 3.02 |
BAT | 470 | 4.89 |
BAT | 701 | 7.64 |
BAT | 2750 | 24.19 |
To test their assumption that snow machines with the best available technology (BAT) have lower emissions than older snow machines with 2-stroke engines, they created a scatterplot of these data that shows trendlines for each type of snow machine.

Comparing the slopes of these two lines gives information on the relationship between carbon monoxide emissions and the two types of snow machines.
For snow machines with 2-stroke engines (older technology), the slope of the line is 0.0538. This tells us that each additional 2-stroke snow machine adds about 0.0538 ppm of carbon monoxide. Similarly, since the slope of the line for BAT engines is 0.0081, each additional BAT snow machine adds only about 0.0081 ppm of carbon monoxide. These numbers are small and hard to compare. Multiplying each by 1,000 shows that 1,000 2-stroke snow machines would add about 53.8 ppm of carbon monoxide, while 1,000 BAT snow machines would add only about 8.1 ppm.
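The two slopes can be reproduced from the table with the same least-squares calculation used for a linear trendline. This Python sketch fits a separate line to each engine type. The helper name `fit_slope` is hypothetical. Each slope measures how much the carbon monoxide reading rises, on average, per additional snow machine of that type.

```python
def fit_slope(xs, ys):
    """Slope of the least-squares line through the points (x, y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Number of snow machines entering the Park, and CO in ppm, from the table
two_stroke_count = [20, 22, 28, 42, 67, 470]
two_stroke_co = [0.36, 1.72, 2.41, 3.16, 4.90, 25.73]
bat_count = [110, 140, 160, 470, 701, 2750]
bat_co = [3.20, 2.78, 3.02, 4.89, 7.64, 24.19]

slope_2 = fit_slope(two_stroke_count, two_stroke_co)
slope_bat = fit_slope(bat_count, bat_co)

# Scale up to 1,000 machines to make the comparison easier to read
print(f"2-stroke: {slope_2:.4f} ppm per machine, {1000 * slope_2:.1f} ppm per 1,000")
print(f"BAT:      {slope_bat:.4f} ppm per machine, {1000 * slope_bat:.1f} ppm per 1,000")
```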
4. Make an Informed Decision
Based on these calculations (and others not shown here), park officials developed a new policy for winter use in Yellowstone, and new winter use restrictions were put in place in 2003 and 2004. The Park was sued again because these plans restricted winter use. Officials did more research, and in 2007 they completed a new plan that allowed 540 commercially guided, BAT snow machines and 83 snow coaches to enter the park daily.
5. Evaluate Your Reasoning
Between 2007 and 2013 the Park reevaluated their winter use plan several more times. Some of the reevaluation occurred as a result of further suits filed in federal court and some of the reevaluation was based on public comment and input.
In 2011, Park officials compared the number of winter vehicles entering the Park through the West Entrance with carbon monoxide measurements taken near the entrance.
This scatterplot shows how the number of winter vehicles correlated with carbon monoxide measurements. There is a strong linear relationship: more vehicles entering the park is associated with higher carbon monoxide measurements.

The following graph shows a bar chart and a time series plot overlaid on top of each other. It shows how the number of winter vehicles and carbon monoxide measurements have changed over the period the park has been changing their policy.

These graphs suggest that the changes made in 2003–2011 are helping reduce carbon monoxide emissions in the Park.
In 2013 they revised their policy again based on further study. One common criticism of the plan before 2013 was that the public was not allowed to enter the Park unless they were on a commercially guided tour. Also, the management plan allowed 83 snow coaches per day, but on some days fewer than 83 snow coaches entered the Park while snow machine users were turned away.
Park officials revised their guidelines to allow for flexibility in the ratio of snow machines and snow coaches. If fewer snow coaches enter the park on a given day, they are now able to allow more snow machines to enter. They also added an option for one non-commercially guided group to enter the Park each day.
Most of the examples of the Quantitative Reasoning Process have shown how it can be used by individuals to make decisions about their personal and family lives. This example shows how the Quantitative Reasoning Process was used by the National Park Service. The Quantitative Reasoning Process is commonly used by businesses, government agencies, non-profit organizations, communities, and other organizations. Scientists, mathematicians, statisticians, business people, and other professionals use the Quantitative Reasoning Process to help them make decisions in their work.
Lesson Checklist
By the end of this lesson, you should be able to do the following:
Identify the input and output variables from a given scenario.
Create a scatterplot of two quantitative variables in Excel.
Identify for given data whether a linear, exponential, or quadratic trendline would be the most appropriate.
Fit a linear, exponential, or quadratic trendline to a scatterplot of data in Excel.
Use a trendline to predict the average value of the response variable at a given value of the explanatory variable.
Interpret predictions and draw a contextual conclusion.
Assess the appropriateness of a trendline for given data.
Explain why correlation does not imply causation.
The Hand of the Lord
Our Heavenly Father loves us and blesses us. His love can be seen in personal revelation, answered prayers, or miracles. However, it can also be seen in a moment of inspiration, a feeling of reassurance, or a quiet prompting. Sometimes the Hand of the Lord in our lives may be difficult to recognize and sometimes it is more clear. However, reflecting each day on how the Lord has blessed us will help us better recognize His influence in our lives.
Each day, we invite you to reflect on how the Hand of the Lord was evident in your life and record your thoughts and feelings. Keeping a record of how you have felt or seen the Hand of the Lord can be a blessing for you and your family. It may also bless countless others. As you feel prompted to share how you have seen the Hand of the Lord in your life, please consider posting your experience in the course WhatsApp community. Your post should bless the lives of others and strengthen testimonies that the Lord is actively involved in our lives.
© 2020 Brigham Young University-Idaho