Question 1 (8 marks = 2 + 2 + 2 + 2)
A company is having a new website developed. In the final testing phase the download time to open the new home page is recorded for a large number of computers in home and office settings. The mean download time for the site is 2.5 seconds. Suppose that the download times for the site are normally distributed with a standard deviation of 0.5 seconds. If you select a random sample of 30 download times,
a. What is the probability that the sample mean is less than 2.75 seconds?
b. What is the probability that the sample mean is between 2.70 and 2.90 seconds?
c. The probability is 80% that the sample mean is between what two values symmetrically distributed around the population mean?
d. The probability is 90% that the sample mean is less than what value?
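One possible way to set up Question 1 numerically is sketched below. It assumes the sample mean is normally distributed with mean 2.5 and standard error 0.5/√30, which follows from the normality stated in the question. The variable names are illustrative only, and the sketch uses Python's standard-library `statistics.NormalDist` rather than any method prescribed by the question.

```python
from math import sqrt
from statistics import NormalDist

# Values given in the question
mu, sigma, n = 2.5, 0.5, 30

# Standard error of the sample mean: sigma / sqrt(n)
standard_error = sigma / sqrt(n)

# Sampling distribution of the sample mean (assumed normal, as stated)
sampling_dist = NormalDist(mu, standard_error)

# (a) P(sample mean < 2.75)
p_a = sampling_dist.cdf(2.75)

# (b) P(2.70 < sample mean < 2.90)
p_b = sampling_dist.cdf(2.90) - sampling_dist.cdf(2.70)

# (c) Symmetric 80% interval: 10% in each tail
lower_c = sampling_dist.inv_cdf(0.10)
upper_c = sampling_dist.inv_cdf(0.90)

# (d) Value below which the sample mean falls with probability 90%
value_d = sampling_dist.inv_cdf(0.90)

print(f"(a) {p_a:.4f}")
print(f"(b) {p_b:.4f}")
print(f"(c) {lower_c:.3f} to {upper_c:.3f}")
print(f"(d) {value_d:.3f}")
```

Note that the upper limit in (c) and the value in (d) coincide, because both are the 90th percentile of the same sampling distribution.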
Question 2 (9 marks = 3 + 3 + 3)
At a large south-east Asian airport flights are classified as being on time if they land less than 15 minutes after the scheduled time. A study of airlines using the airport finds that a Middle Eastern airline has the lowest proportion of late flights, 0.149. Suppose you have been asked to perform a follow-up study for this airline in order to update the estimated proportion of late arrivals. What sample size would you use to estimate the population proportion to within an error of
a. ± 0.06 with 95% confidence?
b. ± 0.04 with 95% confidence?
c. ± 0.02 with 95% confidence?
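The sample sizes in Question 2 can be sketched with the usual formula n = Z² p(1 − p) / e², rounded up to the next whole number. The sketch below assumes the earlier estimate 0.149 is used as the planning value for p. The function name `required_sample_size` is hypothetical and not part of the question.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(p, margin, confidence=0.95):
    """Sample size for estimating a proportion to within +/- margin.

    Uses n = z^2 * p * (1 - p) / margin^2, rounded up.
    p is a planning value (here assumed to be the earlier estimate).
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


for margin in (0.06, 0.04, 0.02):
    print(margin, required_sample_size(0.149, margin))
```

Because the margin of error enters the formula squared, halving it roughly quadruples the required sample size.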
Question 3 (8 marks = 2 + 2 + 2 + 2)
The owner of a petrol station wishes to study the petrol-purchasing habits of customers at her station. You select a random sample of 60 motorists during a certain week with the following results.
• Amount purchased: X̄ = 42.8 litres, S = 11.7 litres.
• 11 motorists purchased premium unleaded petrol.
a. At the 0.05 level of significance, is there evidence that the mean purchase is different from 38 litres?
b. Find the p-value in (a).
c. At the 0.05 level of significance, is there evidence that fewer than 20% of all her service station customers purchase premium unleaded petrol?
d. What is your answer to (a) if the sample mean equals 39 litres?
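A minimal sketch of the test statistics for Question 3 follows. It assumes a two-tailed one-sample t test for the mean (population standard deviation unknown) and a lower-tailed Z test for the proportion. The critical value 2.0010 is an assumed table value for 59 degrees of freedom, not a figure from the question. The standard library has no t distribution, so the p-value in (b) would need a t table or a routine such as SciPy's `scipy.stats.t.sf`.

```python
from math import sqrt
from statistics import NormalDist

n = 60                    # sample size from the question
sample_sd = 11.7          # S, in litres
hypothesised_mean = 38.0  # litres


def t_statistic(sample_mean):
    """One-sample t statistic: (x_bar - mu0) / (S / sqrt(n))."""
    return (sample_mean - hypothesised_mean) / (sample_sd / sqrt(n))


# (a) and (d): same test, different sample means
t_a = t_statistic(42.8)
t_d = t_statistic(39.0)

# Assumed two-tailed critical value from a t table (alpha = 0.05, df = 59)
T_CRITICAL = 2.0010
reject_a = abs(t_a) > T_CRITICAL
reject_d = abs(t_d) > T_CRITICAL

# (c) Lower-tailed Z test for a proportion, H0: pi >= 0.20
p_hat = 11 / n
p0 = 0.20
z_c = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value_c = NormalDist().cdf(z_c)  # lower-tail probability

print(f"(a) t = {t_a:.3f}, reject H0: {reject_a}")
print(f"(d) t = {t_d:.3f}, reject H0: {reject_d}")
print(f"(c) z = {z_c:.3f}, p-value = {p_value_c:.4f}")
```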
Question 4 (13 marks = 2 + 2 + 2 + 2 + 2 + 2 + 1)
An event organiser would like to predict the number of ‘portaloo’ lavatories required for outdoor entertainment events based on the volume of food to be consumed at the event. The organiser has collected the following data.
Number of lavatories   Volume of food consumed (kg)
a. Plot a scatter diagram for this data. Describe (briefly) the relationship between the volume of food consumed and number of portaloos.
b. Assuming a linear relationship, calculate the regression equation for this data. Please show your calculations or insert your summary output from Excel.
c. Interpret the estimated Y intercept, b0, and the estimated slope, b1.
d. Perform residual analyses and determine whether the sample data meet the regression assumptions of equal variance (homoscedasticity) and normality.
e. Is the amount of food consumed a statistically significant predictor of portaloos? Please explain your answer.
f. Comment on the goodness of fit of this model.
g. Predict the number of lavatories if event attendance is 50,000.
Question 5 (12 marks = 2 + 2 + 2 + 3 + 3)
The traffic controller at Sydney’s Kingsford Smith International Airport believes that the main reasons for late aircraft departures are external to the airport management—namely, late aircraft arrivals and mechanical failures. In a report to management discussing this issue the traffic controller decides to test his assertion using data collected from a random sample of 20 days.
No. of delays   No. of late arrivals   No. of mechanical problems
4               2                      3
8               5                      6
1               0                      1
0               0                      1
7               4                      3
2               1                      1
0               1                      1
0               1                      1
3               2                      1
4               4                      0
2               2                      0
8               6                      2
12              6                      5
5               4                      2
3               2                      1
0               1                      0
2               1                      1
6               4                      2
13              10                     3
7               3                      4
a. Write out the estimated multiple regression equation. Please show your calculations or insert your summary output from Excel.
b. Interpret the meaning of the slopes b1 and b2 in this equation.
c. Predict the number of delayed flights if four aircraft arrive late and there are three mechanical failures.
d. Comment on the goodness of fit of the estimated regression model.
e. Is there a significant relationship between the number of delays and the two independent variables (number of late arrivals and number of mechanical problems) at the 0.01 level of significance?
On the basis of these results, indicate the most appropriate regression model for this data.
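For readers without Excel, the multiple regression in Question 5 can be sketched in plain Python by solving the normal equations (XᵀX)b = Xᵀy. This is an illustrative least-squares fit of the tabulated data, not the Excel summary output the question asks for. The helper `solve` and all variable names are hypothetical.

```python
# Data from the table in Question 5 (20 days)
delays = [4, 8, 1, 0, 7, 2, 0, 0, 3, 4, 2, 8, 12, 5, 3, 0, 2, 6, 13, 7]
late_arrivals = [2, 5, 0, 0, 4, 1, 1, 1, 2, 4, 2, 6, 6, 4, 2, 1, 1, 4, 10, 3]
mechanical = [3, 6, 1, 1, 3, 1, 1, 1, 1, 0, 0, 2, 5, 2, 1, 0, 1, 2, 3, 4]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(size):
        # Partial pivoting: bring the largest entry in this column up
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(size):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][size] / m[i][i] for i in range(size)]


# Design matrix with a leading column of ones for the intercept
rows = [[1.0, x1, x2] for x1, x2 in zip(late_arrivals, mechanical)]
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * y for r, y in zip(rows, delays)) for i in range(3)]
b0, b1, b2 = solve(xtx, xty)

# Goodness of fit: R^2 and the overall F statistic (k = 2 predictors)
n, k = len(delays), 2
fitted = [b0 + b1 * x1 + b2 * x2 for x1, x2 in zip(late_arrivals, mechanical)]
residuals = [y - f for y, f in zip(delays, fitted)]
mean_y = sum(delays) / n
sse = sum(e ** 2 for e in residuals)
sst = sum((y - mean_y) ** 2 for y in delays)
r_squared = 1 - sse / sst
f_statistic = ((sst - sse) / k) / (sse / (n - k - 1))

# (c) Prediction for four late arrivals and three mechanical failures
predicted = b0 + b1 * 4 + b2 * 3

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
print(f"R^2 = {r_squared:.4f}, F = {f_statistic:.2f}")
print(f"Predicted delays = {predicted:.2f}")
```

The F statistic would then be compared with the critical value from an F table with 2 and 17 degrees of freedom at the 0.01 level. Testing each slope individually would additionally require the coefficient standard errors, which this sketch does not compute.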