BUS708 Statistics and Data Analysis
Trimester 2, 2019
1 OVERVIEW OF THE ASSIGNMENT
This assignment will test your ability to collect, summarise and present data using Microsoft Excel and/or
other approved tools. It will also test your ability to interpret the output produced by the
software in order to solve business problems.
You will need to use the dataset provided, collect your own dataset, and produce a
numerical and graphical summary of each. You will need to submit an Excel file that follows the submission requirements.
2 TASK DESCRIPTION
There are two datasets involved in this assignment: Dataset 1 and Dataset 2, detailed below.
Dataset 1: You will receive an email that contains a dataset that is specifically allocated to you.
This dataset is edited from the Google Play Store Apps dataset provided by Lavanya Gupta, which
can be obtained from Kaggle (https://www.kaggle.com/lava18/google-play-store-apps) under
the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit
http://creativecommons.org/licenses/by/3.0/. The number of cases from the original dataset
has been reduced and all NaN values have been removed.
Dataset 2: You will need to collect a dataset via a survey to answer the question given in
Section 6 below. You will need to collect data from international students from 3 to 4
different countries of origin, with at least 5 students per country.
Both datasets should be saved in an Excel file (see Submission Requirement below). All data
processing should be performed in Excel or StatKey (http://www.lock5stat.com/StatKey). Specific
instructions on which tool to use for each section will be given during tutorials.
Your tasks are to provide a description of each dataset in Section 1, and to answer the
research questions given in Section 2 to Section 6 using Dataset 1 or Dataset 2 as indicated in each
section.
1. Section 1: Description about Data
a. Dataset 1: Give a short but clear description of this dataset. Is this primary or
secondary data? What are the cases? What are the variables and their types?
b. Dataset 2: Explain how you collected the data and discuss its limitations (e.g. whether
your sample is biased). Is this primary or secondary data? What are the variables and
their types?
2. Section 2: Are most Google Play apps free?
Using Dataset 1, describe the proportion of phone apps that are free. You need to
provide both a numerical summary and a graphical display that clearly shows the
proportion of free apps.
3. Section 3: What is the price distribution of paid apps after an iteration of outlier removal?
Using Dataset 1, perform one iteration of outlier detection on the price of paid apps
using the method described in the lecture notes. After removing those outliers,
describe the price distribution of paid apps using both a numerical and a graphical
summary that shows the remaining outliers, if any.
4. Section 4: Is there a difference in prices among paid apps from the categories
Communication, Games, and Tools?
Using Dataset 1, describe the distribution of prices of paid apps from the categories
Communication, Games and Tools. You need to provide both a numerical summary and
a graphical display that shows the outliers, if any.
5. Section 5: Is there any relationship between Rating and Review?
Using Dataset 1, describe the relationship between the rating of an app and the
number of reviews it receives. You need to provide both a numerical summary and
a graphical display.
6. Section 6: Do international students from different countries tend to use different
communication apps?
Using Dataset 2, describe the relationship between a student’s country of origin and
the main communication app the student is using (e.g. WhatsApp, Facebook Messenger,
WeChat, LINE, Viber, etc.). You need to provide both a numerical summary and
a graphical display.
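The assignment itself must be completed in Excel or StatKey, so the Python sketch below is only an illustration of the calculations behind Sections 2 to 6, useful for checking spreadsheet results by hand. All function names are hypothetical. The outlier rule shown is the common 1.5 × IQR fence rule, which is an assumption: the method in the lecture notes is not reproduced here and may differ. Quartiles use the "inclusive" interpolation method, intended to mirror Excel's QUARTILE.INC.

```python
from collections import Counter, defaultdict
from statistics import quantiles


def proportion_free(types):
    """Section 2: share of apps whose type is 'Free'."""
    return sum(t == "Free" for t in types) / len(types)


def five_number_summary(values):
    """Min, Q1, median, Q3, max (inclusive quartiles; needs >= 2 values)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def remove_outliers_once(prices):
    """Section 3: ONE pass of the 1.5*IQR rule (assumed method)."""
    _, q1, _, q3, _ = five_number_summary(prices)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [p for p in prices if low <= p <= high]


def summary_by_group(rows):
    """Section 4: five-number summary of price for each category."""
    groups = defaultdict(list)
    for category, price in rows:
        groups[category].append(price)
    return {c: five_number_summary(v) for c, v in groups.items()}


def pearson_r(xs, ys):
    """Section 5: Pearson correlation (undefined if either list is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def two_way_table(pairs):
    """Section 6: counts for each (country, app) combination."""
    return Counter(pairs)


if __name__ == "__main__":
    # Made-up toy values, NOT taken from Dataset 1 or Dataset 2.
    print(proportion_free(["Free", "Paid", "Free", "Free"]))
    print(remove_outliers_once([0.99, 1.99, 2.99, 3.99, 4.99, 399.99]))
    print(two_way_table([("India", "WhatsApp"), ("China", "WeChat")]))
```

The outlier function is deliberately applied once only, matching the "one iteration" wording of Section 3; outliers found in the cleaned data are then reported, not removed.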
3 SUBMISSION REQUIREMENT
Deadline to submit the report: Week 7, Sunday 1 Sep 2019, 23:59
You need to submit an Excel file to Turnitin which consists of:
1. Dataset 1 and Dataset 2, each in a separate worksheet with an appropriate sheet name
2. A numerical and graphical summary for each section; each section should be answered in
a separate worksheet with an appropriate sheet name (e.g. “Section 1”, “Section 2”, etc.)
Arrange the worksheets starting with Dataset 1, Dataset 2, Section 1, Section 2, etc.
4 MARKING CRITERIA
Students are advised to read the marking rubric provided on Moodle. Detailed marking criteria
based on this rubric will be provided during the Week 6 tutorial.
5 DEDUCTION, LATE SUBMISSION AND EXTENSION
Late submission penalty: – 5% of the total available marks per calendar day unless an extension is
approved. This means 0.75 marks (out of 15 marks) per day.
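The penalty arithmetic above can be sketched as follows. The function name is hypothetical, and capping the deduction at the total available marks is an assumption that the assignment does not state.

```python
def late_penalty(days_late, total_marks=15.0, rate_per_day=0.05):
    """Marks deducted for a late submission with no approved extension.

    5% of the total available marks per calendar day, i.e. 0.75 of 15 marks.
    The cap at total_marks is an assumption, not a stated rule.
    """
    if days_late <= 0:
        return 0.0
    return min(total_marks, days_late * rate_per_day * total_marks)


if __name__ == "__main__":
    for days in (1, 2, 3):
        print(days, round(late_penalty(days), 2))
```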
For the extension application procedure, please refer to Section 3.3 of the Subject Outline. Please do
NOT email the lecturer or tutor to seek an extension; you need to follow the procedure described in
the Subject Outline.
Please read Section 3.4, Plagiarism and Referencing, of the Subject Outline. Below is part of that
section:
“Students plagiarising run the risk of severe penalties ranging from a reduction through to 0 marks for a first
offence for a single assessment task, to exclusion from KOI in the most serious repeat cases. Exclusion has
serious visa implications.”
“Authorship is also an issue under Plagiarism – KOI expects students to submit their own original work in both
assessment and exams, or the original work of their group in the case of a group project. All students agree to a
statement of authorship when submitting assessments online via Moodle, stating that the work submitted is
their own original work.
The following are examples of academic misconduct and can attract severe penalties:
• Handing in work created by someone else (without acknowledgement), whether copied from another
student, written by someone else, or from any published or electronic source, is fraud, and falls under
the general Plagiarism guidelines.
• Students who willingly allow another student to copy their work in any assessment may be considered
to be assisting in copying/cheating, and similar penalties may be applied.”