Week 7 Confidence interval estimation

STAM4000
Quantitative Methods
Week 7
Confidence interval estimation
COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969
WARNING
This material has been reproduced and communicated to you by or on behalf of Kaplan
Business School pursuant to Part VB of the
Copyright Act 1968 (the Act).
The material in this communication may be subject to copyright under the Act. Any further
reproduction or communication of this material by you may be the subject of copyright
protection under the Act.
Do not remove this notice.

Week 7 Confidence interval estimation
Learning Outcomes
#1 Distinguish between point and interval estimators
#2 Interval estimation of the population mean
#3 Interval estimation of the population proportion

Why does this matter?
In the real world, we are concerned about the confidence of our estimates.
#1 Distinguish between point and interval estimators

#1 Point and interval estimators
Point estimator
• A statistic, measured at one point in time, to estimate a parameter.
Interval estimator
• A range of values, based on a statistic, to estimate a parameter.
#1 Point estimators for μ and p
Two types of point estimators this week:
• Sample mean, x̄, to estimate the population mean, μ
• Sample proportion, p̂, to estimate the population proportion, p
#1 Interval estimators
#1 General form of an interval estimator
#1 What is the margin of error?
General form of CI:
point estimate ± margin of error
The margin of error:
• Is the amount by which sample results are likely to differ from population results.
• Is related to the sampling error, as we are using a sample from the population.
E.g. Each week, you go to your local delicatessen and ask for, on average, 100 grams of Provolone cheese. Say you are willing to accept a 10 gram margin of error. This tells us that the average amount of cheese you buy lies between 90 grams and 110 grams.
#2 Interval estimation of the population mean
https://econtent.frontrange.edu/~dplatt/135comics.htm
#2 Confidence interval for the population mean

Confidence Intervals for μ
• σ is known: use the Z distribution
$\bar{x} \pm z_{crit} \frac{\sigma}{\sqrt{n}}$
• σ is unknown: use the t distribution
$\bar{x} \pm t_{crit} \frac{s}{\sqrt{n}}$
#2 Conditions to check before creating a CI for μ
https://pixabay.com/photos/bulldog-cute-easter-animal-dog-2952049/

#2 The CI for μ may or may not actually include μ
Is μ in the CI?
We are only a specified percentage confident that μ lies in the interval.
#2 Understand interval estimation for μ when σ is known
The confidence interval formula to estimate μ, when σ is KNOWN:
$\bar{x} \pm z_{crit} \frac{\sigma}{\sqrt{n}}$
x̄ is the sample mean
z crit is the Z critical value
σ is the KNOWN population standard deviation
n is the sample size
± is read as “plus or minus” and may be written as +/-
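#2 Finding Z critical from the Z tables
The percentage (%) of the confidence interval determines the Z critical value.
Example: Find the Z critical value for a 95% CI.
A 95% CI has a central area of 95% = 0.95 out of a total area of 100% = 1, leaving 0.025 in each tail. The area to the left of the upper Z critical value is therefore 0.975, which the Z tables give as Z crit = 1.96. By symmetry, the lower Z critical value is -1.96.
[Figure: standard normal curve with the central 95% area between Z crit = -1.96 and Z crit = 1.96.]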
#2 Summary table of common Z critical values for CI
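As a minimal sketch, the common Z critical values can be recomputed with Python's standard library instead of being read from the Z tables. The function name `z_critical` is an illustrative choice, not part of the unit materials; it looks up the Z value with area 1 minus the tail area to its left.

```python
from statistics import NormalDist


def z_critical(confidence_pct: float) -> float:
    """Two-sided Z critical value for a confidence level given in percent."""
    tail_area = (1 - confidence_pct / 100) / 2   # area in each tail
    return NormalDist().inv_cdf(1 - tail_area)   # area to the left of +Z crit


# Print a small table of common confidence levels.
for level in (90, 95, 99):
    print(f"{level}% CI: Z critical = {z_critical(level):.3f}")
# 90% CI: Z critical = 1.645
# 95% CI: Z critical = 1.960
# 99% CI: Z critical = 2.576
```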
#2 Margin of error (ME) of a CI for μ when σ is KNOWN
$\bar{x} \pm z_{crit} \frac{\sigma}{\sqrt{n}} = \bar{x} \pm ME$, where $ME = z_{crit} \frac{\sigma}{\sqrt{n}}$
[Figure: the CI extends one ME below and one ME above x̄.]
The size of ME depends on:
• Z critical, based on the percentage of the CI
• sample size, n
• population standard deviation, σ
#2 Some facts about confidence intervals
• Holding all else constant, if the sample size (n) is increased, the margin of error decreases, making the confidence interval narrower. Thus, increasing the sample size is a way to counteract the loss of precision associated with high confidence.
• Holding all else constant, if the percentage of confidence increases, the critical value increases. This increases the margin of error and the confidence interval becomes wider.
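The two facts about sample size and confidence level can be checked numerically. The sketch below assumes a hypothetical population standard deviation of σ = 10 (not taken from the slides) and a helper named `margin_of_error`, and shows how ME responds when n or the confidence level changes while everything else is held constant.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(confidence_pct: float, sigma: float, n: int) -> float:
    """ME = Z crit * sigma / sqrt(n) for a CI for the mean, sigma known."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence_pct / 100) / 2)
    return z_crit * sigma / sqrt(n)


sigma = 10.0  # hypothetical value, for illustration only

# Larger n, same confidence: ME shrinks, so the CI is narrower.
for n in (25, 100, 400):
    print(f"95% CI, n = {n}: ME = {margin_of_error(95, sigma, n):.3f}")

# Higher confidence, same n: ME grows, so the CI is wider.
for level in (90, 95, 99):
    print(f"{level}% CI, n = 100: ME = {margin_of_error(level, sigma, 100):.3f}")
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.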
#2 Two common interpretations of the CI for μ
Interpretation 1: We are CI% confident that the population mean lies between the lower confidence limit (LCL) and the upper confidence limit (UCL).
Note:
• Use of “confidence”
• Use the name of the variable
• Use the values of the LCL and UCL
• Include units
Interpretation 2: If all possible samples of the same size n are taken, CI% of those CIs would contain the population mean and about (100 – CI)% would not contain the population mean.
Note:
• The sample size is the same
• Each CI is different
• We are theoretically creating an infinite number of CIs
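The repeated-sampling interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is Normal with hypothetical values μ = 50 and σ = 8, σ is treated as known, and the function name `coverage` is invented for illustration. It draws many samples of the same size, builds a CI from each, and reports the fraction of CIs that contain μ.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, confidence_pct, repetitions, seed=0):
    """Fraction of simulated CIs (sigma known) that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence_pct / 100) / 2)
    me = z_crit * sigma / sqrt(n)  # same ME for every sample, as n is fixed
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)  # each sample gives a different x-bar, so a different CI
        if x_bar - me <= mu <= x_bar + me:
            hits += 1
    return hits / repetitions


# Hypothetical population values, for illustration only.
print(coverage(mu=50, sigma=8, n=30, confidence_pct=95, repetitions=5000))
```

The printed fraction should be close to 0.95, not exactly 0.95, because only a finite number of samples is simulated.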
#2 Example
BUSINESS QUESTION: What is the mean age of a bungee jumper?
In 1989 A.J. Hackett started the world’s first commercial bungee jumping site in Queenstown, New Zealand. In 2020 a random sample of 49 bungee jumpers had a mean age of 26.7 years. Assume the population standard deviation is known to be 3.5 years.
a) Check the conditions to find a confidence interval for the population mean age of a bungee jumper.
b) Find and interpret a 95% CI for the mean age.
c) Holding all else the same, explain whether a 99% CI for the mean age would be narrower, the same width or wider than the 95% CI. Do not create another CI.
d) What is the business application here?
#2 Example solution
a) Random Sample Condition: Satisfied, as we are told it is a random sample of 49 jumpers.
10% Condition: We are not told if the sampling was without replacement. As n = 49, we must assume that there are at least 490 bungee jumpers in the population.
Normal or Large Enough Sample Condition: As n = 49 > 30, we can use the Central Limit Theorem and conclude that X̄ ~ Normal.
The conditions are satisfied, so we have normality.
As σ is known, we can use the Z tables.
#2 Example solution
b) $\bar{x} \pm z_{crit} \frac{\sigma}{\sqrt{n}} = 26.7 \pm 1.96 \times \frac{3.5}{\sqrt{49}} = 26.7 \pm 1.96 \times \frac{3.5}{7} = 26.7 \pm 1.96 \times 0.5 = 26.7 \pm 0.98$
= (25.72 years, 27.68 years)
Interpretation: We are 95% confident the population mean age of a bungee jumper lies between 25.72 and 27.68 years.
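The bungee calculation can be reproduced in a few lines. This is a minimal sketch; `z_interval` is a hypothetical helper name, and the Z critical values 1.96 and 2.576 are the usual table values for 95% and 99% confidence.

```python
from math import sqrt


def z_interval(x_bar, sigma, n, z_crit):
    """CI for the population mean when sigma is known: x_bar +/- z * sigma / sqrt(n)."""
    me = z_crit * sigma / sqrt(n)  # margin of error
    return x_bar - me, x_bar + me


# Bungee example: n = 49, x-bar = 26.7 years, sigma = 3.5 years, 95% CI.
lcl, ucl = z_interval(26.7, 3.5, 49, 1.96)
print(f"95% CI: ({lcl:.2f}, {ucl:.2f}) years")  # 95% CI: (25.72, 27.68) years

# Same data with the 99% critical value, to compare the widths.
lcl99, ucl99 = z_interval(26.7, 3.5, 49, 2.576)
print(f"width at 95%: {ucl - lcl:.3f}, width at 99%: {ucl99 - lcl99:.3f}")
```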
#2 Example solution
c) Holding all else the same, the 99% confidence interval (CI) will be wider than the 95% CI, because its Z critical value of 2.576 is larger than the 1.96 used for the 95% CI. The 99% CI gives more confidence, but less precision.
d) The confidence interval tells us that the population mean age for a bungee jumper lies within a narrow range, between 25.72 years and 27.68 years. To capture a wider age group, a marketing campaign targeting younger and older individuals could be used to increase sales and profit.
#2 Exercise
BUSINESS QUESTION: What is the average number of almonds per bag of a healthy snack?
A company sells 40 gram bags of almonds as a healthy snack and wants to estimate the number of almonds that are packed into each bag. A random sample of 36 bags was selected from a production run and the number of almonds counted in each bag. The sample mean number of almonds is 20.6. Assume the population standard deviation is known to be 1.7 almonds per bag.
a) What is the point estimate of the number of almonds per bag?
b) Check the conditions to create a confidence interval for μ.
c) Find and interpret a 90% confidence interval for the population mean.
d) Say that 100 bags were now randomly selected, and assume that the mean was miraculously the same at 20.6 almonds. Without calculating another confidence interval, explain what happens to the width of the 90% confidence interval created in part c).
#2 Understand interval estimation for μ when σ is unknown
We can use the sample standard deviation (s), as we do not know the population standard deviation (σ).
The confidence interval formula to estimate μ when σ is unknown:
$\bar{x} \pm t_{crit} \frac{s}{\sqrt{n}}$
x̄ is the sample mean
t crit is the t critical value
s is the sample standard deviation, an estimate of σ
n is the sample size
± is read as “plus or minus”, sometimes written as +/-
#2 No σ? No problem, just use s and the t-tables
We can use the sample standard deviation (s) as an estimate of σ.
Now we are using both the sample mean and the sample standard deviation to estimate μ, so we have more variability. We can no longer use Z.
We need a new distribution called the Student’s t distribution.
t distribution: a family of t curves that depend on the sample size.
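#2 Comparison of Z and t curves
[Figure: the standard normal (Z) curve overlaid with t curves for n = 10 and n = 36, all centred at 0.]
• Both the Z and t curves are bell-shaped and symmetric.
• The t curves have ‘thicker’ tails than the standard normal, Z.
As df → ∞, the Student’s t distribution → Z, the standard normal distribution.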
#2 Reading t-tables for CI about μ when σ is unknown
• t-table row: degrees of freedom, df = n – 1 for a CI for μ.
• t-table column: $\frac{(100 - CI\%)/100}{2}$ is the area in the right tail of the t curve, and this area is denoted in the subscript of “t” in the first row of the t-table.
• Read off the required t critical value, where the row and column intersect.
#2 Example
Determine the t critical value for each of the following:
a) 95% CI and n = 10
Use row: df = n – 1 = 10 – 1 = 9
Column: $\frac{(100 - CI\%)/100}{2} = \frac{(100 - 95)/100}{2} = 0.025$
Use column $t_{0.025}$
t critical = ± 2.262
b) 99% CI and n = 10
Use row df = 9 and column $t_{0.005}$
t critical = ± 3.250
c) 90% CI and n = 64
df = 63; the table has no row for 63, so use row 60 and column $t_{0.05}$
t critical = ± 1.671
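The table look-ups above can be cross-checked numerically. The sketch below is not how the t-tables were produced; it is one possible standard-library approach, integrating the t density with Simpson's rule and finding the critical value by bisection. The names `t_pdf`, `t_cdf` and `t_critical` are illustrative.

```python
from math import exp, lgamma, log, pi


def t_pdf(t, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + t * t / df))


def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0: 0.5 plus Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence_pct, n):
    """Two-sided t critical value for a CI for the mean, with df = n - 1."""
    df = n - 1
    target = 1 - (1 - confidence_pct / 100) / 2  # area to the left of +t crit
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"{t_critical(95, 10):.3f}")  # 2.262
print(f"{t_critical(99, 10):.3f}")  # 3.250
print(f"{t_critical(90, 64):.3f}")  # uses df = 63 exactly, so slightly below the row-60 value
```

For part c) the computed value uses df = 63 exactly, so it comes out slightly smaller than the table value of 1.671 for row 60. Rounding df down in the table gives a slightly larger critical value and therefore a slightly wider, more cautious interval.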
#2 Example
The council of a city wants to attract more shoppers to the
city centre by proposing the building of a new public carpark.
The council plans to pay for the carpark through parking fees.
They employed a consultant, who found a similar carpark, in
a similar city, and randomly sampled 44 weekdays. The
consultant found daily fees collected averaged $4326, with a
standard deviation of $1500. Assume the conditions are
satisfied.
a) Find a 90% confidence interval for the mean daily income
this new carpark is estimated to generate.
b) Interpret your confidence interval.
c) The consultant who advised the council on this car park
proposal, predicted that parking revenues would average
$4000 per day. Based on your confidence interval, what
do you think of the consultant’s prediction?
eage.com.au/national/victoria/ooops-developer-fails-to-build-two-promised-levels-of-underground-parking-20151028-gkkhxv.html
#2 Example solution
a) Told to assume the conditions are satisfied. As σ is unknown, we must use the t distribution.
For a 90% CI, use column $t_{0.05}$; df = 43, so use row 40: t critical = ± 1.684
$\bar{x} \pm t_{crit} \frac{s}{\sqrt{n}} = 4326 \pm 1.684 \times \frac{1500}{\sqrt{44}} = 4326 \pm 380.809$
= ($3945.19, $4706.81)
b) We are 90% confident the population mean daily income of this new carpark will lie between $3945.19 and $4706.81.
c) The consultant’s prediction of $4000 seems plausible, as $4000 lies inside this 90% confidence interval.
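The carpark interval can be reproduced as follows. This is a minimal sketch: `t_interval` is a hypothetical helper, and the t critical value 1.684 is entered by hand from the t-table (row 40, column t 0.05) rather than computed.

```python
from math import sqrt


def t_interval(x_bar, s, n, t_crit):
    """CI for the population mean when sigma is unknown: x_bar +/- t * s / sqrt(n)."""
    me = t_crit * s / sqrt(n)  # margin of error, using s as the estimate of sigma
    return x_bar - me, x_bar + me


# Carpark example: n = 44 weekdays, x-bar = $4326, s = $1500, 90% CI.
lcl, ucl = t_interval(4326, 1500, 44, 1.684)
print(f"90% CI: (${lcl:.2f}, ${ucl:.2f})")  # 90% CI: ($3945.19, $4706.81)

# Part c): does the consultant's predicted $4000 fall inside the interval?
print(lcl <= 4000 <= ucl)  # True
```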
#2 Exercise
BUSINESS QUESTION: What is the average number of puppies per litter for a cavoodle?
Demand for pet puppies has increased with the onset of COVID-19, as the companionship of a pet is comforting to many. In Australia, one of the most popular breeds of dog is the cavoodle, a cross between a cavalier spaniel and a poodle. Of 25 recent cavoodle puppy litters, the mean was 3.65 puppies with a standard deviation of 1.56 puppies. Assume conditions are satisfied.
a) Find and interpret a 95% confidence interval.
b) What is the width of your confidence interval in part a)?
c) Holding all else constant, and without doing the calculations, would a 99% confidence interval be narrower, wider or the same width as your confidence interval from part a)? Explain.
https://www.facebook.com/adlcavoodles/reviews/
#3 Interval estimation of the population proportion
https://photostockeditor.com/free-images/bungee
#3 Interval estimation of the population proportion
Confidence interval formula to estimate the population proportion, p:
$\hat{p} \pm z_{critical} \times \sqrt{\frac{\hat{p}\hat{q}}{n}}$
p̂ = sample proportion of interest
q̂ = 1 – p̂
Z critical = Z value related to the CI %
n = sample size
Equivalently: $\hat{p} \pm z \sqrt{\hat{p}(1 - \hat{p})/n} = \hat{p} \pm ME$
Hints about CI for p:
• To estimate p, we only use Z
• Use the decimal form of the proportions
• Work to at least 3 decimal places
#3 Conditions to check before creating a CI for p
#3 Example
Owners of a start-up business want to open a market stall to sell their products. They are trying to decide whether to accept credit card payments or rely solely on cash. They took a random sample of 100 market customer purchases for other stalls and found 70 of these were paid by credit card.
a) Describe in words what p and p̂ are, in the context of this example.
b) Check the conditions.
c) Find and interpret a 95% confidence interval.
Solution:
a) p = population proportion of market customers who pay by credit card
p̂ = sample proportion of market customers who paid by credit card = 70/100 = 0.7, q̂ = 1 – p̂ = 0.3
b) Check conditions: Told random sample; must assume there are at least 1000 market customers in the population; np̂ = 100(0.7) = 70 > 10 and nq̂ = 100(0.3) = 30 > 10. Conditions satisfied; use Z.
c) $0.7 \pm 1.96 \sqrt{\frac{0.7 \times 0.3}{100}} = 0.7 \pm 0.0898$
= (0.6102, 0.7898)
Interpretation: The owners of the start-up can be 95% confident that the population proportion of market customers who pay by credit card lies between 61.02% and 78.98%.
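The credit card example can be checked with a short script. This is a sketch only: `proportion_interval` is a hypothetical helper, and the check that np̂ and nq̂ are both at least 10 is an assumed coding of the success/failure condition used in the worked solution.

```python
from math import sqrt


def proportion_interval(successes, n, z_crit):
    """CI for a population proportion: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    # Assumed form of the sample-size condition: both counts at least 10.
    if n * p_hat < 10 or n * q_hat < 10:
        raise ValueError("n * p_hat and n * q_hat should both be at least 10")
    me = z_crit * sqrt(p_hat * q_hat / n)  # margin of error
    return p_hat - me, p_hat + me


# Market stall example: 70 of 100 purchases paid by credit card, 95% CI.
lcl, ucl = proportion_interval(70, 100, 1.96)
print(f"95% CI: ({lcl:.4f}, {ucl:.4f})")  # 95% CI: (0.6102, 0.7898)
```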
#3 Exercise
A journalist for an adventure sports magazine is writing an article on the proportion of bungee jumpers who sustain an injury. He takes a random sample of 200 bungee jumpers and finds 10 of these claimed to have sustained an injury from their jump. Assume the conditions are satisfied.
a) Describe p and p̂ in the context of this exercise.
b) Construct and interpret a 90% Confidence Interval (CI).
c) Without calculating another confidence interval, what happens to the width of your 90% CI from part b) if the sample size were increased to 400 but the sample proportion is unchanged at 0.05? Briefly explain your answer.
Supplementary Exercises
• Students are advised that Supplementary Exercises to this topic may be found on the subject portal under “Weekly materials”.
• Solutions to the Supplementary Exercises may be available on the portal under “Weekly materials” at the end of each week.
• Time permitting, the lecturer may ask students to work through some of these exercises in class.
• Otherwise, it is expected that all students work through all Supplementary Exercises outside of class time.
Extension
• The following slides are an extension to this week’s topic.
• The work covered in the extension:
o Is not covered in class by the lecturer.
o May be assessed.
Quick quiz: interpretation of a CI for μ
A researcher has calculated a 95% confidence interval (CI) for the population
mean number of screens (smartphones, television, laptops, etc.) per
household to be (2.0, 5.4) screens.
Are any of the following interpretations of
this CI correct?
a) The probability that the population mean is greater than 1 is at least 0.95.
b) There is a 95% probability that the population mean lies between 2.0 and
5.4 screens.
c) If we were to repeat the experiment over and over, then 95% of the time
the population mean would fall between 2.0 and 5.4 screens.
d) We are 95% confident the sample mean lies between 2.0 and 5.4 screens
per household.
e) 95% of all households have between 2.0 and 5.4 screens.
Quick quiz solution
a) The probability that the population mean is greater than 1 is at least 0.95.
Incorrect: probability ≠ confidence. The population mean is either in the CI, with a probability of 1, or not in the CI, with a probability of 0.
b) There is a 95% probability that the population mean lies between 2.0 and 5.4 screens.
Incorrect: probability ≠ confidence.
c) If we were to repeat the experiment over and over, then 95% of the time the population mean would fall between 2.0 and 5.4 screens.
Incorrect: the confidence intervals created from repeated samples cannot possibly all have these exact same values of (2.0, 5.4).
d) We are 95% confident the sample mean lies between 2.0 and 5.4 screens per household.
Incorrect: the CI is used to estimate the population mean. We had to know the value of the sample mean to create the CI.
e) 95% of all households have between 2.0 and 5.4 screens.
Incorrect: the CI estimates the population mean number of screens, and is not about the measurements of individual households.
Summarise factors affecting the width of a CI for μ
$\bar{x} \pm z_{crit} \frac{\sigma}{\sqrt{n}}$ or $\bar{x} \pm t_{crit} \frac{s}{\sqrt{n}}$
• The margin of error (ME) determines the width of the confidence interval.
• Note: ME = half the width.
• Holding all else constant, if the sample size, n, is increased, the confidence interval becomes narrower, and more precise. Why?
• Holding all else constant, if the percentage of confidence is decreased, the critical value (either Z or t) is smaller and the confidence interval becomes narrower, and more precise. Why?
• Holding all else constant, if the standard deviation decreases, the confidence interval becomes narrower and more precise. Why?
Trade-off between cost, confidence and precision
• Increasing n increases information. This decreases the width of the CI, making the CI more precise, but at the cost of collecting a larger sample.
• Decreasing the CI % decreases confidence but also decreases the width of the CI, increasing the precision of the CI.
• A balance must be struck between:
o cost
o confidence
o precision
Can we ever be 100% confident with a CI?
Determine the minimum sample size for a CI about μ
$\bar{x} \pm z_{crit} \frac{\sigma}{\sqrt{n}} = \bar{x} \pm ME$
Rearranging $ME = z_{crit} \frac{\sigma}{\sqrt{n}}$ for n:
$n = \frac{z_{crit}^2 \sigma^2}{ME^2} = \left( \frac{z_{crit}\, \sigma}{ME} \right)^2$
As this will give us the minimum sample size needed, we must always round up to the next integer.
Example:
If σ = 45, what is the minimum sample size needed to estimate the mean within ± 5 with 90% confidence?
Solution:
$n = \left( \frac{z_{crit}\, \sigma}{ME} \right)^2 = \left( \frac{1.645(45)}{5} \right)^2 = 219.19$, which rounds up to 220.
The minimum sample size needed is 220.
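The rearranged formula and the round-up rule translate directly into code. A minimal sketch, with `min_sample_size_mean` as a hypothetical helper name; `math.ceil` does the rounding up.

```python
from math import ceil


def min_sample_size_mean(z_crit, sigma, me):
    """Minimum n so that a CI for the mean has margin of error at most me."""
    n_exact = (z_crit * sigma / me) ** 2
    return ceil(n_exact)  # always round up, never to the nearest integer


# Example: sigma = 45, ME = 5, 90% confidence (Z crit = 1.645).
print((1.645 * 45 / 5) ** 2)                # about 219.19 before rounding
print(min_sample_size_mean(1.645, 45, 5))   # 220
```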
Illustration of confidence intervals for p
Say, a bowl contains 1,000 different coloured balls.
We want to estimate p, the population proportion of white balls in the bowl.
Forty students were each asked to do the following:
• Take a random sample of 25 balls from the bowl
• Calculate p̂, the sample proportion of white balls in their sample
• Use their p̂ to create a 95% confidence interval for the population proportion of white balls in the bowl of balls, p.
The next slide shows the forty confidence intervals created.
• The real population proportion of white balls in the bowl is p = 50% = 0.50
• How many CIs did NOT include p of 0.50?
Illustration continued
Only one student (number 34) had a confidence interval that did not contain the true (population) proportion, p of 0.5.
The student did everything correctly.
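The classroom activity can be imitated with a simulation. This is a sketch under assumptions: the bowl is modelled as 500 white and 500 non-white balls, each student samples 25 balls without replacement, and the function name `simulate_class` and the seed are arbitrary. The simulated result will generally differ from the one miss seen in class, because each run draws different samples.

```python
import random
from math import sqrt


def simulate_class(students=40, sample_size=25, balls=1000, white=500,
                   z_crit=1.96, seed=1):
    """Return the student numbers whose 95% CI for p misses the true p."""
    rng = random.Random(seed)
    bowl = [1] * white + [0] * (balls - white)  # 1 = white ball
    p = white / balls                           # true population proportion
    misses = []
    for student in range(1, students + 1):
        sample = rng.sample(bowl, sample_size)  # draw without replacement
        p_hat = sum(sample) / sample_size
        me = z_crit * sqrt(p_hat * (1 - p_hat) / sample_size)
        if not (p_hat - me <= p <= p_hat + me):
            misses.append(student)
    return misses


misses = simulate_class()
print(f"{len(misses)} of 40 intervals missed p = 0.5: students {misses}")
```

On average about 5% of the intervals, roughly 2 in 40, are expected to miss p, even though every student follows the procedure correctly.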

Determine the minimum sample size for a CI about p
• The CI for the proportion, $\hat{p} \pm z_{critical} \sqrt{\frac{\hat{p}\hat{q}}{n}}$, can be rearranged to find n:
$n = \left( \frac{z_{critical} \sqrt{\hat{p}\hat{q}}}{ME} \right)^2$
Z critical is the Z value for the % CI
ME = margin of error in decimal form
q̂ = 1 – p̂, in decimal form
p̂ is either:
o The estimated sample proportion provided by a previous study
o If there is no value of p̂ provided, use p̂ = 0.50 as this gives the largest standard error and the largest minimum value of n.
Example: A researcher is estimating the proportion of students who buy their lunch every day. A recent survey found the sample proportion to be 0.30. What sized sample, n, would be needed to ensure a 95% CI has a margin of error of 0.04?
$n = \left( \frac{z_{critical} \sqrt{\hat{p}\hat{q}}}{ME} \right)^2 = \left( \frac{1.96 \sqrt{0.3(1 - 0.3)}}{0.04} \right)^2 = (22.455)^2 = 504.21$ students
Minimum sample size needed is 505 students.
Example
A researcher is estimating the proportion of students who buy their lunch every day. What sized sample, n, would be needed to ensure a 95% CI has a margin of error of 0.04?
As p̂ is not provided, we must use 0.5, as this gives the largest minimum value of n.
$n = \left( \frac{z_{critical} \sqrt{\hat{p}\hat{q}}}{ME} \right)^2 = \left( \frac{1.96 \sqrt{0.5(1 - 0.5)}}{0.04} \right)^2 = \left( \frac{1.96 \sqrt{0.25}}{0.04} \right)^2 = (24.5)^2 = 600.25$ students
Minimum sample size is 601 students.
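Both lunch examples can be reproduced with one helper. A minimal sketch, where `min_sample_size_proportion` is a hypothetical name and the default of p̂ = 0.5 mirrors the rule for when no prior estimate is available.

```python
from math import ceil, sqrt


def min_sample_size_proportion(z_crit, me, p_hat=0.5):
    """Minimum n so that a CI for p has margin of error at most me.

    p_hat defaults to 0.5, the value that gives the largest required n.
    """
    q_hat = 1 - p_hat
    n_exact = (z_crit * sqrt(p_hat * q_hat) / me) ** 2
    return ceil(n_exact)  # round up to the next whole respondent


# With a prior estimate of 0.30 from a recent survey:
print(min_sample_size_proportion(1.96, 0.04, p_hat=0.30))  # 505

# With no prior estimate, falling back to 0.5:
print(min_sample_size_proportion(1.96, 0.04))              # 601
```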
Things to note with finding n:
1. Round up not off (that is, here we round up for any decimal value).
Why? Because this is the minimum sample size required.
2. Remember that the sample size required does not depend on the
population size.
3. Calculating the sample size is part of the design of a survey and should be
done at the start of the survey process. The sample size is a guideline of the
minimum sample size required to give you the data you need.
4. Note: If doing a survey, the sample size, n is the number of respondents, not
the number of individuals surveyed.