Workshops and Assignment

Module 1 Workshops and Assignment Copyright © Jacob L. Cybulski
Assignment A2 Preview:
RM Cluster+NN+Text
Australian Wine Importers (AWI) asked you
to develop a method of estimating rating
(points) of imported wines based on their
text and structured attributes.
AWI provided you with a sample of 130,000
wine tasting results, which include:
Wine “title” (name + vintage);
Country, Province and Region;
Variety and Winery;
Description and Designation;
Price (US$)
However:
Taster name and Points to be excluded.
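The attribute list above can be turned into a small loading step. The following is a minimal Python sketch, not part of the assignment brief: it assumes the data arrives as a CSV file, that the column names follow the public Kaggle file (`points`, `taster_name`, and so on), and that "excluded" means the taster name and the points are kept out of the predictors, with points retained only as the rating to be estimated. The two sample rows are made up for illustration.

```python
import csv
import io

# Assumed column names, modelled on the public Kaggle wine-reviews file;
# check them against the header of the file you actually download.
TARGET = "points"
EXCLUDED = {"taster_name", "points"}


def split_predictors_and_target(rows):
    """Return (predictors, targets), dropping excluded columns and unrated rows."""
    predictors, targets = [], []
    for row in rows:
        raw = (row.get(TARGET) or "").strip()
        if not raw:
            continue  # a row without a rating cannot be used for training
        targets.append(float(raw))
        predictors.append({k: v for k, v in row.items() if k not in EXCLUDED})
    return predictors, targets


# Tiny in-memory stand-in for the real file (made-up rows, illustration only).
sample_csv = io.StringIO(
    "title,country,variety,price,taster_name,points,description\n"
    "Example Estate 2015 Shiraz,Australia,Shiraz,25,A. Taster,90,Ripe plum and pepper\n"
    "Sample Cellars 2016 Riesling,Germany,Riesling,18,B. Taster,,Crisp lime and slate\n"
)
predictors, targets = split_predictors_and_target(csv.DictReader(sample_csv))
print(sorted(predictors[0]), targets)
```

For the real data, replace `sample_csv` with an opened file handle; the function itself does not change.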
In the future, AWI would like to gain
preliminary insight into wine quality
based on social media reviews. The
following questions are of interest to AWI:
A. Which group of wines is the new wine
most similar to, and why / how?
B. What is the estimated rating of the
newly introduced wine to the
Australian market? (fractional ratings
permitted)
AWI wants you to clean up and explore wine
tasting data, develop and evaluate a wine
rating estimator, and minimize the
estimation error in the process.
The following mini-case study will be used in assignment A2.
Data: www.deakin.edu.au/~jlcybuls/pred/data/Wine-Reviews.zip
Source: https://www.kaggle.com/zynicide/wine-reviews
Part LP4
Exec: Create a problem
definition and write a brief
spec of its possible solution.
Model: Create at least two models: (M1) decision trees
and (M2) neural nets. Ensure your solution considers three types
of models, based on (A1) structured data only, (A2) text
data only, (A3) a mix of structured and text data. Describe
the operators' properties.
Optionally create model ensembles.
Optionally utilise clusters and deal with anomalies; use PCA to
visualise them.
Optionally answer question (B).
Validate & Optimise: Optimise the models’ performance to
minimise overall error in ratings. Compare performance of all
models (including ensembles), using R², correlation and others.
Visualise optimisation results.
Optionally use grid optimisation.
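To make the comparison measures concrete, here is a minimal Python sketch of how they can be computed from actual and estimated ratings. It is an illustration only: the function name `regression_metrics` and the four sample ratings are hypothetical, and R² is taken here as the coefficient of determination (some tools report the squared correlation instead, so both are returned).

```python
import math


def regression_metrics(actual, predicted):
    """Compute common error and fit measures for a rating estimator."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    ss_pred = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    correlation = cov / math.sqrt(ss_tot * ss_pred)
    return {
        "rmse": math.sqrt(ss_res / n),             # root mean squared error
        "mae": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,
        "r2": 1 - ss_res / ss_tot,                 # coefficient of determination
        "correlation": correlation,                # Pearson correlation
        "squared_correlation": correlation ** 2,
    }


# Made-up ratings, for illustration only.
actual = [88, 90, 92, 94]
predicted = [89, 90, 91, 95]
metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Computing the same measures for every model, on the same hold-out data, gives a like-for-like basis for the comparison.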
Solution: Create a quality deployment process. Score the best
model, and demonstrate how to apply the model to new data.
Extend: Conduct research and use novel data mining approaches.
Tasks and Deliverables
Part LP3
Exec: Briefly define a
problem in business terms.
Rels: Perform cluster
analysis of wines’ text.
Conduct segmentation
analysis, including both
text and structured data.
Identify relationships in
data. Visualise and interpret
results. Answer the
management question (A).
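One possible way to approach question (A) is sketched below in plain Python: descriptions are turned into TF-IDF vectors, each group is summarised by its centroid, and a new description is assigned to the group with the highest cosine similarity. The shared high-weight terms hint at the "why / how". Everything here is an assumption for illustration: the two hand-labelled groups stand in for clusters that would come from an actual cluster analysis, the descriptions are made up, and the tokeniser and stop-word list are deliberately crude.

```python
import math
from collections import Counter

STOP_WORDS = {"and", "with", "the"}  # tiny illustrative list


def tokenize(text):
    """Lower-case, keep letters only, drop short words and stop words."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return [w for w in cleaned.split() if len(w) > 2 and w not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf


def centroid(vectors):
    """Average a list of sparse vectors."""
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return {t: w / len(vectors) for t, w in total.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical groups standing in for the output of a cluster analysis.
groups = {
    "fruity reds": ["Ripe blackberry and plum with soft tannins",
                    "Dark cherry, plum and a peppery finish"],
    "crisp whites": ["Crisp lime and green apple with bright acidity",
                     "Zesty lemon, apple and mineral notes"],
}
docs = [d for members in groups.values() for d in members]
vectors, idf = tfidf_vectors(docs)

centroids, start = {}, 0
for name, members in groups.items():
    centroids[name] = centroid(vectors[start:start + len(members)])
    start += len(members)


def most_similar_group(text):
    """Return (group name, similarity, top shared terms) for a new description."""
    vec = {t: tf * idf[t] for t, tf in Counter(tokenize(text)).items() if t in idf}
    scores = {name: cosine(vec, c) for name, c in centroids.items()}
    best = max(scores, key=scores.get)
    shared = sorted(set(vec) & set(centroids[best]),
                    key=lambda t: -vec[t] * centroids[best][t])
    return best, scores[best], shared[:3]


best, score, shared_terms = most_similar_group(
    "Juicy blackberry and plum with soft tannins and cherry")
print(best, round(score, 3), shared_terms)
```

Terms that never occurred in the grouped descriptions are ignored when the new text is vectorised, so a review written in very different vocabulary will score low against every group.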