Data Analysis

Your submitted document should include the following items. Points will be deducted if any of these items are not included.

- Type your name and STAT 250 with your correct section number (e.g., STAT 250-xxx), right-justified, and then Data Analysis Assignment #1 centered at the top of page 1 below your name. Then begin your document.
- Number the pages across your entire solutions document.
- Include the ANSWERS ONLY, with each answer labeled by its corresponding problem number and subpart. Keep the answers in order. Do not include the questions in your submitted document.
- Generate all requested graphs and tables using StatCrunch.
- Upload your document to Blackboard as a Word (docx) or PDF file using the link provided by your instructor. It is your responsibility to upload a readable file.

The full assignment instructions, as well as an example, are attached as a Word file.

Access to StatCrunch is required: https://www.statcrunch.com/5.0/group.php?groupid=8220
I will provide the login info…

Extra Notes:
- Each graph title should start with “Distribution of…”.
- For the questions that require calculation, you may work them out on paper, but you must type the solution into your Word document.
- Please complete each part.


Problem 1: Appropriateness of Inference

For the following scenarios, answer the questions for each part. In each part, the underlined text is the name of the StatCrunch data set to be used for that part. Please note: do not conduct inference in either of these parts; just answer each question.

a) Food Prices: Target versus Safeway. Grocery prices of the same randomly selected items were collected and compared at Target and Safeway. Imagine you were interested in conducting a hypothesis test to determine whether the mean prices were significantly different. Note: to answer the questions below, compute Target price − Safeway price (i.e., subtract the Safeway price from the Target price).

i) What is (are) the parameter(s) of interest? Choose one of the following symbols: μ (the population mean for a single sample), μ_D (the population mean difference for paired (dependent) samples), or μ_1 − μ_2 (the difference between the population means of two independent samples). Then describe the parameter in the context of this question in one sentence.

ii) Depending on your answer to part (i), construct one or two relative frequency histograms. Remember to properly title and label the graph(s). Copy and paste these graphs into your document.

iii) Describe the shape of the histogram(s) in one sentence.

iv) Depending on your answer to part (i), construct one or two boxplots, and copy and paste these graphs into your document.

v) Does the boxplot (or do the boxplots) show any outliers? Answer this question in one sentence and identify any outliers if they are present.

vi) Considering your answers to parts (iii) and (v), is inference appropriate in this case? Why or why not? Defend your answer using the graphs in two to three sentences.
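Parts (i) through (v) depend on working with the paired differences rather than the two price columns separately. The minimal sketch below assumes the two price columns have been copied out of StatCrunch into Python lists in matching item order (the names `target` and `safeway` are hypothetical). It computes Target − Safeway for each item and flags values outside the common 1.5 × IQR boxplot fences. Quartile conventions differ between software packages, so these fences may not match the StatCrunch boxplot exactly; the StatCrunch graphs remain the basis for your answers.

```python
import statistics


def paired_differences(target, safeway):
    """Return Target price minus Safeway price for each item (same item order)."""
    if len(target) != len(safeway):
        raise ValueError("both price lists must describe the same items")
    return [t - s for t, s in zip(target, safeway)]


def iqr_outliers(values):
    """Flag values outside the 1.5 * IQR fences used by many boxplots.

    statistics.quantiles(..., n=4) returns [Q1, Q2, Q3] using its default
    'exclusive' method, which may differ slightly from StatCrunch's quartiles.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]
```

The same `iqr_outliers` check applies unchanged to a single column of data, such as the wait times in part (b).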

b) GMU Health Center Waiting Time. During the flu season, the waiting time at the GMU Health Center is known to be extreme at times. A statistics student wanted to test her claim that the mean wait time was greater than 100 minutes. She took a random sample of wait times during the flu season and recorded them in StatCrunch.

i) What is (are) the parameter(s) of interest? Choose one of the following symbols: μ (the population mean for a single sample), μ_D (the population mean difference for paired (dependent) samples), or μ_1 − μ_2 (the difference between the population means of two independent samples). Then describe the parameter in the context of this question in one sentence.

ii) Depending on your answer to part (i), construct one or two relative frequency histograms. Remember to properly title and label the graph(s). Copy and paste the graph(s) into your document.

iii) Describe the shape of the histogram(s) in one sentence.

iv) Depending on your answer to part (i), construct one or two boxplots, and copy and paste these graphs into your document.

v) Does the boxplot (or do the boxplots) show any outliers? Answer this question in one sentence and identify any outliers if they are present.

vi) Considering your answers to parts (iii) and (v), is inference appropriate in this case? Why or why not? Defend your answer using the graphs in two to three sentences.

Problem 2: GPA of Students Depending on Where They Sit

A professor wanted to know whether there was a difference in students’ grade point averages (GPAs) depending on whether they sit in the front half or the back half of the classroom. In a previous semester, a random sample of students was selected from the front of a classroom and another random sample was selected from the back of a classroom, and each student’s current GPA was recorded. The data provided in StatCrunch represent the GPAs from each random sample. The file is called “GPA Versus Seating Location.” At the 0.01 significance level, can the professor conclude from these data that the mean GPA for front sitters is higher than that for back sitters? Assume all conditions for conducting inference are satisfied.

Conduct a full hypothesis test by following the steps below. Enter an answer for each of these steps in your document.

a) Define the population parameter of interest in the context of this question in one sentence.

b) State the null and alternative hypotheses using correct notation.

c) State the significance level for this problem.

d) Calculate the test statistic in StatCrunch using Stat → T Stats → 2 Sample → With Data. Copy and paste the output table into your document.

e) Label the p-value seen in the output table produced in part (d) using probability notation (it begins with P(…)).

f) State whether you reject or do not reject the null hypothesis, and give your reason, in one sentence.

g) State your conclusion in the context of the problem (i.e., interpret your results and/or answer the question being posed) in one or two complete sentences.
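To see what the test statistic in part (d) is built from, here is a minimal sketch of the unpooled (Welch) two-sample t statistic, assuming the two GPA columns are available as Python lists (the names `front` and `back` are hypothetical). It computes the statistic and the Welch–Satterthwaite degrees of freedom only; the p-value should come from StatCrunch. StatCrunch's two-sample dialog also offers a pooled-variance option, so compare against its output only when the same variance option was used.

```python
import math
import statistics


def welch_t(front, back):
    """Unpooled (Welch) two-sample t statistic for mean(front) - mean(back).

    Returns (t, df), where df is the Welch-Satterthwaite approximation.
    """
    n1, n2 = len(front), len(back)
    m1, m2 = statistics.mean(front), statistics.mean(back)
    # Squared standard errors; statistics.variance divides by n - 1.
    v1 = statistics.variance(front) / n1
    v2 = statistics.variance(back) / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```

The sign of t depends on which group is treated as Sample 1, so enter the groups in the same order as in your hypotheses.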


Problem 3: Metal Hardness Testing

The manufacturer of hardness testing equipment uses steel-ball indenters to indent the metal being tested. However, the manufacturer thinks there might be a difference in hardness readings when using a diamond indenter. The metal specimens to be tested are large enough that two indentations can be made. Therefore, the manufacturer wants to use both indenters on each specimen and compare the readings. The order of the indentations will be random. This particular design is called the paired design (or matched pairs design, or dependent samples design). Assume all conditions are satisfied in this problem. The data set used for this problem is called “Metal Hardness Testing.”

a) Calculate the difference for each specimen by subtracting Steel Ball − Diamond. For example, the first difference is 51 − 52 = −1. List the difference for each of the 14 pairs in your document.

b) For the first piece of metal, which indenter produced the larger hardness reading? Answer this question in a complete sentence.

c) Obtain the mean and the standard deviation of these differences in StatCrunch. You may copy and paste the output box from StatCrunch or list the values. Please round these values to four decimal places.

d) Construct a 95% confidence interval using the above data. Please do this “by hand” using the formula and showing your work (please type your work). Use your t-table (found on the last page of our formula packet) to obtain the t* critical value needed for the confidence interval. Present this confidence interval as (lower limit, upper limit).

e) Use StatCrunch to obtain a 95% confidence interval for the above data by selecting Stat → T Stats → Paired. Enter Steel Ball for Sample 1 and Diamond for Sample 2. Copy and paste your output into your document.

f) Does your confidence interval capture 0? Answer this question and briefly explain what this implies, in the context of the question, in one or two sentences.

g) Using your answer to part (f), imagine you were using a hypothesis test to determine whether a significant difference exists in mean hardness reading between the two indenters (the hypotheses would be H0: μ_D = 0 vs. Ha: μ_D ≠ 0). What decision and conclusion can be made in this case? Provide an answer and a reason for your choice in one or two sentences. Please use only your confidence interval to answer this question (i.e., do not run this hypothesis test).
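The "by hand" interval in part (d) is d̄ ± t* · s_d / √n, where d̄ and s_d are the mean and standard deviation of the differences from part (c) and n is the number of pairs. A minimal sketch of that arithmetic follows, assuming the Steel Ball and Diamond readings are available as Python lists in specimen order (the names `steel_ball` and `diamond` are hypothetical). The default t* = 2.160 is the standard t-table value for 95% confidence with df = 14 − 1 = 13; confirm it against the table in your formula packet, and pass a different `t_star` if n changes.

```python
import math
import statistics

# t* for 95% confidence with df = 14 - 1 = 13, read from a standard t-table
# (upper-tail area 0.025).
T_STAR_95_DF13 = 2.160


def paired_ci(steel_ball, diamond, t_star=T_STAR_95_DF13):
    """Confidence interval for mu_D using differences d = Steel Ball - Diamond."""
    diffs = [s - d for s, d in zip(steel_ball, diamond)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)          # sample standard deviation (n - 1)
    margin = t_star * s_d / math.sqrt(n)   # t* times the standard error
    return d_bar - margin, d_bar + margin


def captures_zero(interval):
    """True if 0 lies inside the (lower, upper) interval."""
    lower, upper = interval
    return lower <= 0 <= upper
```

Because a two-sided test at the 0.05 level and a 95% confidence interval use the same t* and standard error, `captures_zero` mirrors the reasoning asked for in parts (f) and (g).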


Problem 4: Lego Prices

The data set named “Lego Prices” contains a selection of Lego sets sold on the Lego website in August 2016. The goal of this problem is to explore one variable (the number of pieces a set contains) that may help a buyer predict the price of a Lego set. The Price variable is the response variable in this problem.

a) Investigate the relationship between the explanatory variable “Pieces” and the response variable “Price” by doing the following:

i) Make a scatterplot and copy and paste it into your solutions (use Graph → Scatter Plot in StatCrunch).

ii) Calculate the correlation coefficient (use Stat → Summary Stats → Correlation in StatCrunch). Provide this value in your document.

iii) Interpret the scatterplot and correlation coefficient in terms of trend, strength, and shape (form) in one complete sentence.

b) Using the “Pieces” variable as the explanatory variable, run a simple linear regression analysis in StatCrunch using Stat → Regression → Simple Linear. Copy and paste only the StatCrunch results output (no tables).

c) Add the fitted line plot to your document. This graph appears on page 2 of your output.

d) Type the regression equation into your document.

e) Interpret the slope of the regression line (in the context of this data set).

f) Is it meaningful to interpret the y-intercept? Why or why not?

g) State r-squared (i.e., the coefficient of determination) and explain what this value means in the context of the data set.

h) Use the regression equation from part (d) to predict the price of a randomly selected set containing 556 pieces. State your predicted value in a sentence that is in the context of the data. Do not forget to mention the units. Note: You can do this calculation “by hand” or using StatCrunch.

i) Is your prediction in part (h) an example of extrapolation? Why or why not?

Sample Solution Showing Formatting

Problem X: Students’ Grades

A random sample of 30 students was selected from a STAT 250 course taught during the summer session, and their first exam scores were recorded.

a) Create a histogram in StatCrunch. Be sure to title and label it correctly.

b) Interpret the histogram’s shape.

See sample solution and formatting on page 2.

Notes about submission

Following the main points will help you submit a professionally completed assignment.

Kenneth Strazzeri

STAT 250-0xx (your correct section)

Data Analysis Assignment 1

Problem X

a)

b) The shape of this distribution is left skewed because I see the majority of the data values falling in the upper end of the distribution and a few scores in the 50s and 60s skewing the shape. There do not seem to be any outliers visible on the graph.

…

