Advance Database & Big Data

 

PROGRAMME  BSc (Hon) Computing and BEng Software Engineering Foundation Year
DATE  Week 15
MODULE CODE  SWE5204
MODULE TITLE  Advance Database & Big Data
ASSESSMENT TYPE  Coursework – Portfolio
WEIGHTING  100%

 

Learning Outcomes:

 

·         LO1: Evaluate new and emerging developments in database technologies
·         LO2: Compare and contrast multi-paradigm solutions to domain specific database constructs
·         LO3: Apply appropriate database concepts and techniques to solve given problems.
·         LO4: Demonstrate the application of appropriate Big Data tools for advanced analytics.

 

 

 

 

 

Assignment Description

 

Marks allocation

This assessment has three sections. Each section carries marks as follows:

Question 1: 10 %

Question 2: 10 %

Question 3: 80 %

 

 

Question 1: [LO1 & LO2]  [10 marks]

 

  • (a) Discuss, with supporting examples, the characteristics (subject-oriented, integrated, time-variant, non-volatile and support for the management decision-making process) that differentiate data warehouses from other database systems. (5 marks)

  • (b) Use ONE detailed example to discuss and analyse why a company can benefit from data warehousing. You are expected to use the characteristics discussed in 1(a) to detail your discussion and analysis. (5 marks)

 

 

 

Question 2: [LO1 & LO3]  [10 marks]

 

  • Describe and discuss THREE of the FOUR main categories of NoSQL database: key-value store, column store, document store and graph database. For each category you choose, give a definition, an overview of its key characteristics, its comparative advantages, and an example of an appropriate application area.

 

Question 3: [LO3 & LO4]  [80 marks]

 

For the Big Data task, you need to perform the required task on the given dataset, following a Big Data architecture framework (see the lecture on the Big Data project development process and its stages):

 

Acquisition: Although we are providing the location of the dataset, you need to research how the data was assembled.  (5 marks)

 

Storage: You are probably storing the dataset in a local resource. For this point, outline a brief what-if scenario describing what you would need to do to store your data in a cloud-based system.  (5 marks)

 

Analysis: This is the core of your assessment. You need to carry out a full data analytics workflow (based on the lectures and tutorials):

3.1.  Data Wrangling: Follow the recommendations from the lectures and the tutorial. You must generate a clean version of the dataset.  (20 marks)

3.2.  Descriptive analysis: You are expected to fully describe your data as we did in the tutorials. Remember that you are answering the "what happened?" question. Please provide visualisations appropriate to the data you are describing. You must include, as a minimum, a histogram and a scatter or density plot.  (20 marks)

3.3.  Diagnostic analysis: Here you should try to answer the "why?" question. For this point, a regression analysis (based on a scatter plot, for example) would be sufficient. As before, you must support your analysis with appropriate plots.  (20 marks)

 

Action: You need to provide a sensible recommendation based only on the assigned task and your analysis.  (10 marks)

 

General considerations:

 

Please be aware that each plot should be fully described in your assessment. However, the plot itself must be clear enough to tell a story on its own. Remember that when dealing with text data, you must trim leading and trailing blanks before conducting any analysis.
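
The trimming step can be sketched as follows. This is a minimal example, assuming each record is held as a Python dictionary; the helper names `clean_text` and `clean_record` and the sample record are hypothetical. Collapsing internal runs of whitespace goes slightly beyond trimming and is an extra assumption, added because titles can contain embedded line breaks.

```python
import re


def clean_text(value):
    """Trim leading/trailing blanks and collapse internal whitespace runs."""
    if value is None:
        return ""
    return re.sub(r"\s+", " ", value).strip()


def clean_record(record, text_fields):
    """Return a copy of the record with the named text fields cleaned."""
    cleaned = dict(record)
    for field in text_fields:
        cleaned[field] = clean_text(cleaned.get(field))
    return cleaned


# Hypothetical record, not taken from the real dataset.
raw = {"id": "0001", "title": "  Deep   learning\n for graphs ", "comments": None}
cleaned = clean_record(raw, ["title", "comments"])
print(cleaned)
```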

The following list gives four points for the dataset: the name of the file, the source of the file, the required task and the suggested action.

Under tutorials, you will find a list of examples of possible data descriptive tasks that you could perform for the dataset.

 

arXiv Dataset:

 

  1. File: Look at source
  2. Source: Download the latest version from: https://www.kaggle.com/Cornell-University/arxiv (This is a large JSON file, ~2.81 GB). To deal with JSON files in Python, please refer to the working JSON tutorial.
  3. Task: Create a new dataset in csv format with only records containing comments and without the word COVID in the title. This new dataset will have only 5 columns: id, title, comments, journal-ref and categories. Using this new dataset, report if there is a relation between journal-ref and categories; construct a ranking for all the words in the titles.
  4. Action: Based on the ranking of words, pick the top ten and recommend a journal for each word. For example, if the top word is “human”, you should rank journals in terms of articles with “human” in the title and then pick the one at the top.
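
The task and action above can be sketched roughly as follows, using only the Python standard library. This is an illustration under stated assumptions rather than a model answer: it assumes the downloaded file holds one JSON object per line, it treats the COVID check as case-insensitive, and all function names and sample records are hypothetical.

```python
import csv
import io
import json
import re
from collections import Counter

COLUMNS = ["id", "title", "comments", "journal-ref", "categories"]


def keep(record):
    """True if the record has comments and no 'COVID' in its title."""
    comments = (record.get("comments") or "").strip()
    title = record.get("title") or ""
    return bool(comments) and "covid" not in title.lower()  # assumption: ignore case


def filter_records(lines):
    """Yield the five required columns for each qualifying JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if keep(record):
            # Normalise whitespace; missing values become empty strings.
            yield {col: " ".join((record.get(col) or "").split()) for col in COLUMNS}


def write_csv(rows, handle):
    """Write the rows to an open text handle in CSV format."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


def title_words(title):
    return re.findall(r"[a-z]+", title.lower())


def rank_words(rows):
    """Return (word, count) pairs for all title words, most frequent first."""
    counts = Counter()
    for row in rows:
        counts.update(title_words(row["title"]))
    return counts.most_common()


def top_journal_for(word, rows):
    """Return the journal-ref with the most titles containing the word."""
    journals = Counter(
        row["journal-ref"]
        for row in rows
        if row["journal-ref"] and word in title_words(row["title"])
    )
    return journals.most_common(1)[0][0] if journals else None


# Hypothetical sample lines standing in for the real file.
sample = [
    '{"id": "1", "title": "Human pose estimation", "comments": "10 pages", "journal-ref": "Journal A", "categories": "cs.CV"}',
    '{"id": "2", "title": "COVID-19 spread models", "comments": "5 pages", "journal-ref": null, "categories": "q-bio.PE"}',
    '{"id": "3", "title": "Human motion priors", "comments": null, "journal-ref": null, "categories": "cs.CV"}',
    '{"id": "4", "title": "Modelling human gait", "comments": "preprint", "journal-ref": "Journal A", "categories": "cs.CV"}',
]
rows = list(filter_records(sample))
buffer = io.StringIO()  # with the real data, open a .csv file with newline="" instead
write_csv(rows, buffer)
ranking = rank_words(rows)
print(ranking[:3], top_journal_for("human", rows))
```

With the real file, the sample list would be replaced by the open file handle, so that records are read one line at a time rather than loading the whole file into memory. Two choices are left open here: whether to drop very common words such as "the" and "of" before ranking, and how to normalise journal-ref strings, which typically include volume and page details in addition to the journal name.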

 

Note:

After you have completed the challenge, we would like you to present your findings as a report. The report should be a .docx or .pdf file that walks us through your code and plots.

 

 

You need to submit the implemented code for Q3 as a Jupyter Notebook (".ipynb") file, together with the report. Answers to Q1 and Q2 must be submitted in PDF or Word format.

 

 

 

 

 

General Assessment Guidelines for Written Assessments Level HE5

Each band below gives its mark range (%) followed by the criteria: Relevance, Knowledge, Argument/Analysis, Structure, Presentation, Written English, Research/Referencing.
Class 1 (Exceptional Quality) 85-100%
  • Relevance: Directly relevant to title. Expertly addresses the assumptions of the title and/or the requirements of the brief.
  • Knowledge: Demonstrates an exceptional knowledge/understanding of theory and practice for this level through the identification and critical analysis of the most important issues and themes.
  • Argument/Analysis: Makes exceptional use of appropriate arguments and/or theoretical models. Demonstrates some distinctive or independent thinking. Presents an exceptional critical analysis of the material resulting in clear, logical and original conclusions.
  • Structure: Coherently articulated and logically structured. An appropriate format is used.
  • Presentation: The presentational style & layout is correct for the type of assignment. Effective inclusion of figures, tables, plates (FTP).
  • Written English: An exceptionally well written answer with standard spelling and grammar. Style is clear, resourceful and academic.
  • Research/Referencing: Sources accurately cited in the text. An extensive range of contemporary and relevant references cited in the reference list in the correct style.

Class I (Excellent Quality) 70-84%
  • Relevance: Directly relevant to title. Addresses the assumptions of the title and/or the requirements of the brief.
  • Knowledge: Demonstrates an excellent knowledge/understanding of theory and practice for this level. Demonstrates the ability to identify and critically appraise the most important issues and themes.
  • Argument/Analysis: Makes creative use of appropriate arguments and/or theoretical models. Presents an excellent analysis of the material resulting in clear, logical conclusions.
  • Structure: Coherently articulated and logically structured. An appropriate format is used.
  • Presentation: The presentational style & layout is correct for the type of assignment. Effective inclusion of figures, tables, plates (FTP).
  • Written English: An excellently written answer with standard spelling and grammar. Style is clear, resourceful and academic.
  • Research/Referencing: Sources accurately cited in the text. A wide range of contemporary and relevant references cited in the reference list in the correct style.

Class II/i (Very Good Quality) 60-69%
  • Relevance: Directly relevant to title. Addresses most of the assumptions of the title and/or the requirements of the brief.
  • Knowledge: Demonstrates a very good knowledge/understanding of theory and practice for this level through the identification and analysis of key issues.
  • Argument/Analysis: Uses sound arguments or theoretical models. Presents a clear and valid analysis of the material in the main with clear, logical conclusions.
  • Structure: Logically constructed in the main. An appropriate format is used.
  • Presentation: The presentational style & layout is correct for the type of assignment. Effective inclusion of FTP.
  • Written English: A very well written answer with standard spelling and grammar. Style is clear and academic.
  • Research/Referencing: Sources accurately cited in the text and a wide range of appropriate references cited in reference list in the correct style.

Class II/ii (Good Quality) 50-59%
  • Relevance: Generally addresses the title/brief, but sometimes considers irrelevant issues.
  • Knowledge: Demonstrates a good knowledge/understanding of theory and practice for this level through the identification and critical appraisal of some key issues, themes and questions.
  • Argument/Analysis: Presents largely coherent arguments. Evidence of attempted analysis and critical evaluation, with some descriptive or narrative passages. Conclusions are fairly clear and logical.
  • Structure: For the most part coherently articulated and logically structured. An acceptable format is used.
  • Presentation: The presentational style & layout is correct for the type of assignment. Inclusion of FTP but lacks selectivity.
  • Written English: Competently written with minor lapses in spelling and grammar. Style is readable and academic in the main.
  • Research/Referencing: Most sources accurately cited in the text and an appropriate reference list is provided which is largely in the correct style.

Class III (Satisfactory Quality) 40-49%
  • Relevance: Some degree of irrelevance to the title/brief. Superficial consideration of the issues.
  • Knowledge: Demonstrates an adequate knowledge/understanding of theory and practice for this level. An attempt is made to critically appraise some key issues.
  • Argument/Analysis: Presents basic arguments, but focus and consistency lacking in places. Issues are vaguely stated. Descriptive or narrative passages evident which lack clear purpose. Conclusions are not always clear or logical.
  • Structure: Adequate attempt at articulation and logical structure. An acceptable format is used.
  • Presentation: The presentational style & layout is largely correct for the type of assignment. Inappropriate use of FTP or not used where clearly needed to aid understanding.
  • Written English: Generally competently written although intermittent lapses in grammar and spelling pose obstacles for the reader. Style limits communication and is non-academic in a number of places.
  • Research/Referencing: Some relevant sources cited. Some weaknesses in referencing technique.

Borderline Fail 35-39%
  • Relevance: Significant degree of irrelevance to the title/brief. Only the most obvious issues are addressed at a superficial level and in unchallenging terms.
  • Knowledge: Demonstrates weaknesses in knowledge of theory and practice for this level, with poor understanding of key issues.
  • Argument/Analysis: Limited argument, which is descriptive or narrative in style with little evidence of analysis. Conclusions are neither clear nor logical.
  • Structure: Poorly structured. Lack of articulation. Format deficient.
  • Presentation: For the type of assignment the presentational style &/or layout is lacking. FTP ignored in text or not used where clearly needed.
  • Written English: Deficiencies in spelling and grammar make reading difficult. Simplistic or repetitious style impairs clarity. Style is non-academic.
  • Research/Referencing: Limited sources and weak referencing.

Fail <34%
  • Relevance: Relevance to the title/brief is intermittent or missing. The topic is reduced to its vaguest and least challenging terms.
  • Knowledge: Demonstrates a lack of basic knowledge of either theory or practice for this level, with little evidence of understanding.
  • Argument/Analysis: Inadequate arguments and no analysis. Conclusions are sparse.
  • Structure: Unstructured. Lack of articulation. Format deficient.
  • Presentation: For the type of assignment the presentational style &/or layout is lacking. FTP as above.
  • Written English: Poorly written with numerous deficiencies in grammar, spelling and expression. Style is non-academic.
  • Research/Referencing: An absence of academic sources and poor referencing technique.