Task 1 (25 marks)

Consider the data set below, which represents the assessment results of 40 students in a subject consisting of four assignments and a final exam.

Assignment-1, Assignment-2, Assignment-3, Assignment-4, Final_Exam
?,94,34,30,42
35,94,85,33,45
31,46,22,35,48
46,90,60,36,50
52,94,49,48,50
58,94,30,34,51
47,90,?,23,52
37,94,25,?,52
35,94,45,31,54
57,94,100,29,54
51,94,5,30,54
45,94,33,33,55
44,0,35,36,55
52,95,56,42,56
35,94,?,36,57
57,97,57,42,57
45,90,71,43,57
39,94,54,33,57
31,94,63,31,57
45,94,?,29,59
35,90,84,49,59
37,90,40,50,61
83,97,26,39,61
68,97,55,45,62
50,95,56,46,62
77,93,?,41,63
84,48,18,35,63
45,90,21,38,63
62,95,38,?,63
38,94,40,39,64
50,90,?,29,64
32,90,38,32,64
44,90,43,36,65
57,94,52,39,68
50,94,39,42,70
55,90,62,?,71
43,94,54,36,72
50,90,30,30,74
54,90,82,28,77
64,95,5,8,78

a) Create an ARFF file for this data set using a text editor and open the ARFF file in WEKA. (10 marks)

b) Observe the summary data for the data set and the histograms for all attributes on the Preprocess tab page. Use the Visualize tab page to view the scatter plots between the variables of the data set. Put a screenshot of the tab in your assignment. (5 marks)

c) Apply the unsupervised Discretize filter to the Assignment-4 marks. Put a screenshot of the filter output in your assignment and make remarks on the data. (5 marks)

d) Practice filling in the missing values for all columns in the Viewer window in WEKA, both manually and by using filters. Put a screenshot of the filter outputs in your assignment and comment on the values WEKA suggests for the missing values. (5 marks)

Task 2 (20 marks)

In WEKA, load the data set from weather.arff. Perform classifications using the following methods:
• OneR (10 marks)
• NaiveBayesSimple (10 marks)

For each method, give a summary of the models produced and comment on their accuracy using the confusion matrix and other performance metrics used in WEKA.

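For Task 1(a), it may help to see the general layout of an ARFF file before typing one by hand. The Python sketch below only illustrates that layout for the first three records of the data set. The helper name `to_arff` and the relation name `student_results` are assumptions made for illustration, not names required by the task, and all five attributes are assumed to be numeric.

```python
# Hypothetical helper: builds ARFF text for a data set whose
# attributes are all numeric (an assumption for this sketch).
def to_arff(relation, attributes, rows):
    lines = [f"@relation {relation}", ""]
    # One @attribute declaration per column, in column order.
    for name in attributes:
        lines.append(f"@attribute {name} numeric")
    lines += ["", "@data"]
    # One comma-separated line per record.
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


attributes = ["Assignment-1", "Assignment-2", "Assignment-3",
              "Assignment-4", "Final_Exam"]

# First three records from Task 1; "?" is the ARFF missing-value marker.
rows = [
    ["?", 94, 34, 30, 42],
    [35, 94, 85, 33, 45],
    [31, 46, 22, 35, 48],
]

# "student_results" is an assumed relation name.
arff_text = to_arff("student_results", attributes, rows)
print(arff_text)
```

The printed text has the same three parts a hand-written file needs: the `@relation` line, one `@attribute` line per column, and the `@data` section with one record per line.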
Task 3 (30 marks)

Perform decision tree induction in WEKA on the soybean.arff data set using:
• J48 (10 marks)
• NBTree (10 marks)
• REPTree (10 marks)

For each method, give a summary of the tree and rules produced and comment on their accuracy using the performance metrics used in WEKA.
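When commenting on accuracy, it can be useful to know how the headline figures relate to a confusion matrix. The sketch below is a minimal Python illustration, assuming the layout WEKA prints (rows are actual classes, columns are predicted classes). The function name `summarize`, the class labels, and the counts are all hypothetical and are not results from any WEKA run.

```python
# Hypothetical helper: derives accuracy and per-class precision/recall
# from a confusion matrix (rows = actual class, columns = predicted class).
def summarize(matrix, labels):
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    report = {"accuracy": correct / total}
    for i, label in enumerate(labels):
        true_positive = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total
        report[label] = {
            "precision": true_positive / predicted if predicted else 0.0,
            "recall": true_positive / actual if actual else 0.0,
        }
    return report


# Made-up two-class counts, for illustration only.
labels = ["a", "b"]
matrix = [
    [7, 2],  # actual a: 7 classified as a, 2 classified as b
    [3, 2],  # actual b: 3 classified as a, 2 classified as b
]

report = summarize(matrix, labels)
print(report)
```

With these made-up counts, 9 of the 14 instances lie on the diagonal, so accuracy is 9/14. Class `a` has precision 7/10 and recall 7/9, which shows how a classifier can look better on one class than on another.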
