Data visualization and descriptive statistics

STAM4000
Quantitative Methods
Week 2
Data visualization and descriptive statistics
https://twitter.com/thesmartjokes/status/681927905073606656
COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969
WARNING
This material has been reproduced and communicated to you by or on behalf of Kaplan
Business School pursuant to Part VB of the
Copyright Act 1968 (the Act).
The material in this communication may be subject to copyright under the Act. Any further
reproduction or communication of this material by you may be the subject of copyright
protection under the Act.
Do not remove this notice.

Week 2: Data visualisation and descriptive statistics
Learning Outcomes
#1 Create data visualisations
#2 Distinguish between measures of central tendency
#3 Distinguish between measures of dispersion

#1 Create data visualisations

Are you more of a VISUAL thinker or a VERBAL thinker?
Count how many times you reply “yes” to the following quiz questions:
Do you like to draw diagrams when explaining something?
When using Google Maps for directions, do you prefer to watch the map and mute the audio?
When meeting new people, do you find it easier to remember faces, instead of names?
Do you use a mind map diagram with links and words to organize and remember things?
Why does this matter? A picture tells a thousand words.

#1 Example of visualisations
Examples of charts
Categorical: pie chart; bar chart and Pareto chart
Quantitative: pie chart; histogram; frequency polygon; frequency curve; stem-and-leaf plot; ogive; box plot
#1 Charts for categorical data
Pie chart:
Label segments or use a legend.
Check segment size.
Check segment values.
Check categories are mutually exclusive and collectively exhaustive.
Check total value of pie chart:
o If frequencies, check the total equals the sample size.
o If relative frequencies, check the total equals 1 or 100%.
o Note: If the pie chart is for quantitative data and displays numerical values, check the total equals the sum of the values.
Pie chart for top 10 ASX companies in Australia (%):
Biotechnology 10%
Capital Markets 10%
Diversified Banks 40%
Grocery Stores 10%
Home Improvement Retail 10%
Metals & Mining 20%
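The "check total value" step can be sketched in Python. This is a minimal illustration, assuming the six segment percentages shown in the pie chart on this slide; the function name `check_pie_total` is hypothetical and not part of the course material.

```python
# Segment percentages as read from the pie chart on this slide.
segments = {
    "Biotechnology": 10,
    "Capital Markets": 10,
    "Diversified Banks": 40,
    "Grocery Stores": 10,
    "Home Improvement Retail": 10,
    "Metals & Mining": 20,
}


def check_pie_total(values, expected_total, tolerance=1e-9):
    """Return True if the segment values sum to the expected total.

    expected_total is the sample size for frequencies,
    or 1 (or 100) for relative frequencies.
    """
    return abs(sum(values) - expected_total) <= tolerance


# Relative frequencies given in percent should total 100.
print(check_pie_total(segments.values(), 100))
```

The same function could be called with the sample size as `expected_total` when the segments hold raw frequencies.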
#1 More charts for categorical data
Bar chart:
One bar per category
Bar height reflects frequency
Equal bar width
Gaps between bars
Pareto bar chart:
Sorted bars in ascending or descending order
[Figure: Bar chart of top 10 companies from Australia (%); vertical axis from 0% to 45%]
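The sorting behind a Pareto bar chart can be sketched as follows. This is only an illustration: it assumes the industry percentages from the earlier pie chart, sorts in descending order (the slide allows either direction), and adds a running cumulative percentage, which is a common Pareto convention rather than something stated on the slide. The function name `pareto_order` is hypothetical.

```python
# Percentages per industry, taken from the pie chart earlier in these slides.
shares = {
    "Biotechnology": 10,
    "Capital Markets": 10,
    "Diversified Banks": 40,
    "Grocery Stores": 10,
    "Home Improvement Retail": 10,
    "Metals & Mining": 20,
}


def pareto_order(frequencies):
    """Sort categories from largest to smallest and attach a running total."""
    ordered = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    rows = []
    cumulative = 0
    for category, value in ordered:
        cumulative += value
        rows.append((category, value, cumulative))
    return rows


for category, value, cumulative in pareto_order(shares):
    print(f"{category:<25} {value:>3}% {cumulative:>4}%")
```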

#1 Common chart for quantitative data
Histogram
One bar per class
Bar height may reflect frequency or relative frequency
Equal class widths
NO gaps between bars. Why? The bars follow a number scale.
Description of histogram:
General shape:
o symmetric (evenly balanced) OR
o skewed (tail on either end)
Peaks: number and position
Unusual features: gaps, multiple peaks, no peak etc.
Example: The call centre of an electricity provider has received a number of complaints from customers that the call wait time is too long. The manager of the call centre claims that most wait times are 15 minutes or less. To investigate the complaints, a consumer group telephoned the electricity provider 25 times and recorded the call wait times. This histogram displays the data collected by the consumer group.
[Figure: Histogram of call wait times; horizontal axis: Class (minutes), with classes (0, 5], (5, 10], (10, 15], (15, 20], (20, 25], (25, 30]; vertical axis: Frequency; bar labels 1, 2, 0, 3, 7, 12]
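Counting values into equal-width classes such as (0, 5] and (5, 10] can be sketched in Python. The wait times below are hypothetical and are NOT the consumer group's 25 observations; the function name `class_frequencies` is also an assumption made for illustration.

```python
import math
from collections import Counter

# Hypothetical call wait times in minutes (not the data behind the slide's histogram).
wait_times = [3.5, 7.0, 12.2, 15.0, 18.4, 21.0, 24.9, 25.0, 27.3, 29.8]


def class_frequencies(values, width=5, upper=30):
    """Count values in right-closed classes (0, 5], (5, 10], ..., up to `upper`."""
    counts = Counter()
    for v in values:
        if not 0 < v <= upper:
            raise ValueError(f"{v} is outside (0, {upper}]")
        # ceil puts a boundary value such as 15.0 into the lower class (10, 15].
        index = math.ceil(v / width) - 1
        counts[index] += 1
    n_classes = upper // width
    return {
        f"({i * width}, {(i + 1) * width}]": counts.get(i, 0)
        for i in range(n_classes)
    }


# Print a simple text histogram: one row per class, no gaps between classes.
for label, freq in class_frequencies(wait_times).items():
    print(f"{label:<9} {'#' * freq} {freq}")
```

Because every class has the same width and the classes cover the number scale without breaks, a class with zero frequency still appears as an empty row, which is how a gap in the data shows up.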

#1 Understanding the importance of shape
Bimodal
Multimodal
Uniform
Unimodal and symmetric
Positively skewed (or skewed to the right)
Negatively skewed (or skewed to the left)
How would you describe the shape of the histogram for the call waiting example?
#1 How do we describe a data set?
We use descriptive statistics.
Shape
•For a histogram or frequency curve:
o Is there a single peak or several peaks?
o Is it symmetrical or skewed?
Centre
•If you had to pick a single number to describe all the data, what would you choose?
Spread
•Since statistics is about variation, how dispersed is our data?
Unusual features
•Are there any gaps in the data set?
•Is there more than one mode? If so, is there a lurking variable?
#1 Compare datasets with visualizations
© 2010 Pearson Education
Example: These histograms compare the daily volume (number) of shares traded on the New York Stock Exchange (NYSE) in one year, divided into January to June and July to December. Histograms are OK for comparing two groups; box and whisker plots (or boxplots) are better when comparing several groups. See the next slide.

#1 Example
This chart of box and whisker plots compares the daily volume (number) of shares traded by month on the New York Stock Exchange (NYSE) in one year. The months follow a calendar year and are denoted by numbers, e.g., 1 = January.
#1 Example continued
From this visualization, we can ascertain the following:
March had the least variation overall; June and December had the greatest variation overall.
May and November had the highest median volume of shares traded; August had the lowest median.
March had the smallest interquartile range; December had the largest interquartile range.
March, May, June, July, September and November each had trading days with extreme values.
All months had skewed distributions.
#1 Box and whisker plot (boxplot)
Displays a five-number summary:
o minimum
o Q1
o median, Q2
o Q3
o maximum
Median shown inside box
Length of box displays interquartile range (IQR)
Whiskers show data values considered usual
Shapes, e.g., dot or asterisk, represent unusual data values (outliers):
o dot to represent values more than 1.5 IQR from the nearest quartile
o asterisk to represent values more than 3 IQR from the nearest quartile
https://twitter.com/statsols/status/929006600664354816
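The five-number summary and the 1.5 IQR and 3 IQR rules can be sketched in Python. This is an illustration under stated assumptions: the sample is hypothetical, the function names are made up for this sketch, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`. Textbooks and software packages use several different quartile conventions, so hand calculations may give slightly different Q1 and Q3 values.

```python
import statistics

# Hypothetical sample with one very large value.
data = [4, 7, 8, 10, 12, 13, 40]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using inclusive quartiles."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def classify_outliers(values):
    """Split unusual values into 'dot' (beyond 1.5 IQR) and 'asterisk' (beyond 3 IQR)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    dots, asterisks = [], []
    for v in values:
        # Distance beyond the nearest quartile; zero or negative inside the box.
        distance = max(q1 - v, v - q3)
        if distance > 3 * iqr:
            asterisks.append(v)
        elif distance > 1.5 * iqr:
            dots.append(v)
    return dots, asterisks


print(five_number_summary(data))
print(classify_outliers(data))
```

With this sample, the value 40 lies more than 3 IQR above Q3, so it would be drawn as an asterisk rather than a dot.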
#1 Boxplot
https://lsc.deployopex.com/box-plot-with-jmp/
#1 General shapes of frequency curves and boxplots
Negatively skewed
Unimodal and symmetric
Positively skewed
#2 Distinguish between measures of central tendency
http://methods.sagepub.com/book/testing-and-measurement/n4.xml

#1 Population parameters and sample statistics
Population parameters
•Measurements based on the entire data set.
Sample statistics
•Measurements based on a sample of data.
Notation
•Greek letters for population parameters.
•English letters for sample statistics.
https://www.causeweb.org/cause/resources/fun/cartoons/parameter-notation

#2 What is the typical value for a data set?
https://nebusresearch.wordpress.com/tag/statistics/
#2 Mode, Mo
Modal value: most frequently occurring value
Modal class: the class(es) with the highest frequency, or tallest peak(s) in a bar chart or histogram
Advantage of the mode:
•It can be found for both categorical and quantitative data.
Disadvantages of the mode:
•Its use is limited to descriptive statistics.
•It does not use all the values in a data set.

#2 Number of modes
A dataset with one mode is unimodal.
E.g., a sample of latte prices ($): 5, 3, 6, 5, 4, 6, 5
Mo = $5
A dataset with two modes is bimodal.
E.g., a sample of espresso prices ($): 4, 5, 6, 3, 6, 5, 6, 4, 5
Mo = $5 and $6
A dataset with three or more modes is multimodal.
E.g., a sample of ice-coffee prices ($): 5, 8, 7, 6, 5, 9, 7, 6
Mo = $5, $6 and $7
A dataset with no mode is uniform.
E.g., a sample of cappuccino prices ($): 5, 3, 4, 6
No mode
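The four cases above can be checked with a short Python sketch. It assumes the convention used on this slide, where a data set in which every value occurs equally often has no mode; the function name `modes` is hypothetical. Python's built-in `statistics.multimode` follows a different convention and would return every value for the cappuccino sample.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes, or an empty list if there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    # Slide convention: if all distinct values occur equally often, there is no mode.
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


latte = [5, 3, 6, 5, 4, 6, 5]
espresso = [4, 5, 6, 3, 6, 5, 6, 4, 5]
ice_coffee = [5, 8, 7, 6, 5, 9, 7, 6]
cappuccino = [5, 3, 4, 6]

for name, prices in [("latte", latte), ("espresso", espresso),
                     ("ice-coffee", ice_coffee), ("cappuccino", cappuccino)]:
    print(name, modes(prices))
```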
#2 Median, Me
Median: the middle value (midpoint) in an ordered set of numbers.
Advantage of the median:
•It is not influenced by extreme high or by extreme low values. Hence when we have a skewed data set, the median is usually the best measure of central tendency.
Disadvantages of the median:
•It does not use all the values in a data set.
•Only used in descriptive statistics.
•It is tedious to calculate manually.
•Cannot find the median for categorical data.

#2 Median, Me
If n is ODD, the median is the middle value in a sorted dataset.
E.g., a sample of customer sales ($): 8, 12, 4, 10, 7
Sorted: 4, 7, 8, 10, 12
n = 5
Median = $8
If n is EVEN, the median is the average of the two middle values in a sorted dataset.
E.g., a sample of customer sales ($): 8, 12, 4, 10, 7, 13
Sorted: 4, 7, 8, 10, 12, 13
n = 6
Median = (8 + 10)/2 = $9
HINT: Sort data first
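The odd and even cases can be sketched in Python as a check on the two worked examples. This is a minimal illustration with a hypothetical function name; Python's standard library offers `statistics.median`, which gives the same results for these samples.

```python
def median(values):
    """Return the median: sort first, then take the middle value(s)."""
    ordered = sorted(values)  # HINT from the slide: sort data first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # n is odd: the single middle value
        return ordered[middle]
    # n is even: the average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([8, 12, 4, 10, 7]))      # odd n = 5
print(median([8, 12, 4, 10, 7, 13]))  # even n = 6
```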
#2 Mean, μ or