Workshops and Assignment

Module 1 Workshops and Assignment Copyright © Jacob L. Cybulski
Assignment A1 Preview:
RM Studio Explore+Classify
Indian company Bangalore Food Assist (BFA), in association with the Zomato restaurant search and discovery site, asked you to develop a data mining method for determining whether restaurants should provide the following services to their patrons:
Booking a table;
Online meal ordering.
BFA provided you with a sample of 48,000 restaurant reviews, which include:
Restaurant name, its type and phone numbers;
Address, location and neighbourhood;
Cuisine and meal types, and the menu when available;
Average meal cost for a couple;
Customer feedback, i.e. dishes liked, rating of the visit, number of votes cast and review text.
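Several of the fields listed above are likely to arrive as text and need type conversion, de-duplication and missing-value handling before analysis (the More Prep task). The minimal standard-library Python sketch below assumes formats such as costs written as "1,200", ratings as "4.1/5", "NEW" or "-" for unrated restaurants, and "Yes"/"No" service flags. These formats, the field names (name, address, cost, rate, book_table) and the helper functions are all hypothetical, so check them against the actual BFA data.

```python
from typing import Optional

# Assumed markers for "not yet rated"; check the real data for others.
MISSING_RATE_MARKERS = {"", "-", "NEW"}


def parse_cost(raw: str) -> Optional[float]:
    """Turn an assumed cost string such as '1,200' into 1200.0 (None if unparseable)."""
    cleaned = raw.replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def parse_rate(raw: str) -> Optional[float]:
    """Turn an assumed rating string such as '4.1/5' into 4.1 (None if missing or bad)."""
    text = raw.strip()
    if text in MISSING_RATE_MARKERS:
        return None
    try:
        return float(text.split("/")[0])
    except ValueError:
        return None


def parse_flag(raw: str) -> Optional[bool]:
    """Map assumed 'Yes'/'No' service flags to True/False; anything else is missing."""
    return {"yes": True, "no": False}.get(raw.strip().lower())


def dedupe(records, key_fields=("name", "address")):
    """Keep the first record per key; the (name, address) key is a hypothetical choice."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(rec.get(f, "").strip().lower() for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical raw records (field names are illustrative, not BFA's schema).
raw = [
    {"name": "Cafe A", "address": "Addr 1", "cost": "1,200", "rate": "4.1/5", "book_table": "Yes"},
    {"name": "Cafe A", "address": "Addr 1", "cost": "1,200", "rate": "4.1/5", "book_table": "Yes"},
    {"name": "Cafe B", "address": "Addr 2", "cost": "300", "rate": "NEW", "book_table": "No"},
]
rows = dedupe(raw)
print(len(rows), parse_cost(rows[0]["cost"]), parse_rate(rows[1]["rate"]), parse_flag(rows[1]["book_table"]))
# prints: 2 1200.0 None False
```

Returning None rather than raising keeps bad values visible as missing, so they can be counted and then imputed or dropped as a separate, justified step.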
BFA would like to get some preliminary insight into Bangalore restaurants. The following questions are of interest to BFA:
A. Which neighbourhoods have the most attractive restaurants (cost vs rating)?
B. What should be the table booking and online ordering strategy for a new restaurant?
C. What should be the strategy for an already established restaurant (feedback available)?
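Question A leaves "attractive" open. As one possible, hypothetical definition, the sketch below scores each neighbourhood by its mean rating per 1,000 units of mean cost for two, skipping neighbourhoods with too few rated restaurants. The field names, the score and the min_restaurants threshold are assumptions made for illustration; a scatter plot of mean cost against mean rating per neighbourhood would be a natural visual companion.

```python
from collections import defaultdict
from statistics import mean


def rank_neighbourhoods(rows, min_restaurants=2):
    """Rank neighbourhoods by a hypothetical score: mean rating per 1,000 units of mean cost."""
    groups = defaultdict(list)
    for r in rows:
        # Skip unrated restaurants and missing/zero costs (also avoids division by zero).
        if r["rate"] is not None and r["cost"]:
            groups[r["neighbourhood"]].append(r)
    ranked = []
    for hood, members in groups.items():
        if len(members) < min_restaurants:
            continue  # too few restaurants for a stable average
        avg_rate = mean(m["rate"] for m in members)
        avg_cost = mean(m["cost"] for m in members)
        ranked.append((hood, avg_rate, avg_cost, avg_rate / avg_cost * 1000))
    return sorted(ranked, key=lambda t: t[3], reverse=True)


# Hypothetical cleaned rows: rating out of 5, cost for two in local currency units.
rows = [
    {"neighbourhood": "North", "rate": 4.0, "cost": 400.0},
    {"neighbourhood": "North", "rate": 3.6, "cost": 600.0},
    {"neighbourhood": "South", "rate": 4.4, "cost": 1500.0},
    {"neighbourhood": "South", "rate": 4.2, "cost": 1300.0},
    {"neighbourhood": "East", "rate": 4.9, "cost": 200.0},  # only one restaurant: skipped
]
for hood, rate, cost, score in rank_neighbourhoods(rows):
    print(f"{hood}: mean rating {rate:.2f}, mean cost {cost:.0f}, score {score:.2f}")
# North: mean rating 3.80, mean cost 500, score 7.60
# South: mean rating 4.30, mean cost 1400, score 3.07
```

A ratio rewards cheap, well-rated neighbourhoods; a different weighting of cost against rating would be equally defensible if justified.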
BFA wants you to clean up and explore the restaurant reviews, develop and evaluate a classifier for restaurant table booking and online ordering, and minimise erroneous classifications.
Tasks and Deliverables
Part LP1 (First Submission)
Exec: Briefly define your problem in business terms.
Prep: Choose or calculate the most suitable numerical and nominal attributes for the task (excluding review text), and justify your choice. Visualise and interpret attribute characteristics.
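For the Prep task, a quick way to characterise attributes before charting them is a numeric summary for numerical attributes, frequency counts for nominal ones and, where useful, a calculated attribute. The minimal standard-library sketch below assumes missing values are already None and that cuisines arrive as a comma-separated string; n_cuisines is a hypothetical derived attribute, not one specified by BFA, and the sample values are made up.

```python
from collections import Counter
from statistics import mean, median, stdev


def numeric_summary(values):
    """Summarise a numerical attribute; None is treated as a missing value."""
    present = [v for v in values if v is not None]
    summary = {"count": len(present), "missing": len(values) - len(present)}
    if present:
        summary.update(
            min=min(present),
            median=median(present),
            mean=mean(present),
            stdev=stdev(present) if len(present) > 1 else 0.0,
            max=max(present),
        )
    return summary


def nominal_bars(values, width=20):
    """Print a crude text bar chart of category frequencies, most common first."""
    counts = Counter(values)
    if not counts:
        return
    top = max(counts.values())
    for label, n in counts.most_common():
        print(f"{label:<14} {'#' * round(width * n / top)} {n}")


def n_cuisines(raw):
    """Hypothetical derived attribute: number of cuisines in a comma-separated string."""
    return len([c for c in raw.split(",") if c.strip()])


# Hypothetical sample values, not BFA data.
print(numeric_summary([300.0, 800.0, None, 1200.0]))
nominal_bars(["Casual Dining", "Quick Bites", "Quick Bites", "Cafe"])
print(n_cuisines("North Indian, Chinese, Biryani"))  # 3
```

Comparing the median with the mean, and counting missing values per attribute, gives a first hint of skew and data quality before any charts are drawn.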
Part LP2 (Final Submission)
Exec: Describe your solution in business terms. Provide recommendations. List
More Prep: Deal with duplicates, bad and missing values, and incompatible
attribute types. Transform these attributes or create new ones as needed.
Rels (A): Use appropriate analysis and data visualization to
investigate relationships between attributes. Interpret results.
Model (B&C): Create and explain one or two classification models, i.e. k-NN
and/or Decision Tree. Report and justify their properties. Investigate and
deal with the class imbalance.
Evaluate (B&C): Evaluate, validate and test the models.
Solution (B&C): Explain step-by-step how your analytic processes
can be executed and results replicated, e.g. with new data.
Extend: Extend your work, e.g. undertake independent research
to use models and visualisations other than those discussed in
class (may include work in R, Python or other analytics platforms).
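The Model and Evaluate tasks can be illustrated with a from-scratch k-NN classifier. This is a minimal standard-library sketch on hypothetical toy data (cost for two and rating, predicting a Yes/No table-booking label), rather than an RM Studio process. Features are min-max scaled using training data only; class imbalance is handled by random oversampling of the minority class in the training set, which is one technique among several (undersampling or class weights are alternatives); and the held-out test set is scored with a confusion matrix and balanced accuracy, the mean of per-class recalls.

```python
import math
import random
from collections import Counter


def min_max_fit(rows):
    """Learn per-feature (min, max) bounds from the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def min_max_apply(rows, bounds):
    """Rescale features to [0, 1] with training bounds; constant features map to 0."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
            for row in rows]


def oversample(X, y, seed=0):
    """Randomly duplicate rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training rows nearest to x (Euclidean distance)."""
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]


def confusion(y_true, y_pred, labels):
    """Confusion matrix as {(actual, predicted): count}."""
    m = {(a, p): 0 for a in labels for p in labels}
    for a, p in zip(y_true, y_pred):
        m[(a, p)] += 1
    return m


def balanced_accuracy(m, labels):
    """Mean per-class recall, which is not inflated by a dominant majority class."""
    recalls = []
    for a in labels:
        total = sum(m[(a, p)] for p in labels)
        recalls.append(m[(a, a)] / total if total else 0.0)
    return sum(recalls) / len(recalls)


# Hypothetical toy data: [cost for two, rating] -> table booking "Yes"/"No".
X_train = [[300, 3.5], [400, 3.7], [350, 3.6], [500, 3.9], [450, 3.8], [1500, 4.4], [1800, 4.5]]
y_train = ["No", "No", "No", "No", "No", "Yes", "Yes"]
X_test = [[380, 3.6], [1600, 4.3]]
y_test = ["No", "Yes"]

bounds = min_max_fit(X_train)
X_bal, y_bal = oversample(min_max_apply(X_train, bounds), y_train)  # training data only
preds = [knn_predict(X_bal, y_bal, x) for x in min_max_apply(X_test, bounds)]

labels = ["No", "Yes"]
m = confusion(y_test, preds, labels)
print(preds, m, balanced_accuracy(m, labels))
```

Oversampling is applied only to the training rows after the train/test split, so duplicated rows never leak into the test set. With real data, k and the resampling choice would still need validating, for example with cross-validation.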
The following mini-case study will be used in assignment A1.
Data (in UTF-8 encoding):
Flickr by Kirti Poddar (CC BY-NC-SA 2.0)
No MS Excel!
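The data is supplied in UTF-8 and MS Excel is ruled out; one plausible reason is that Excel does not always detect UTF-8 when opening a CSV file, which can garble non-ASCII characters in restaurant names and reviews. Assuming the data arrives as a CSV file (the format is not stated here), a minimal Python sketch for loading it with explicit UTF-8 decoding follows. load_reviews and the column names are hypothetical, and the demo writes a small temporary file rather than using the real data.

```python
import csv
import os
import tempfile


def load_reviews(path):
    """Read a UTF-8 CSV file into a list of dicts, one per row (hypothetical helper)."""
    # newline="" lets the csv module handle line breaks inside quoted review text.
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))


# Demo: write a tiny UTF-8 file with a non-ASCII name and a multi-line review, then read it back.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", newline="", suffix=".csv", delete=False) as tmp:
    writer = csv.writer(tmp)
    writer.writerow(["name", "review"])
    writer.writerow(["Café Déjà Vu", "Great dosa.\nWould return!"])
    path = tmp.name

rows = load_reviews(path)
os.remove(path)
print(rows[0]["name"])  # Café Déjà Vu
```

If the file begins with a byte-order mark, opening it with encoding="utf-8-sig" instead removes the mark from the first column name.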