A Review on Security and Privacy Challenges of Big Data

Manbir Singh, Mariia Talalaeva, Malka N. Halgamuge*, Ali Syed and Gullu Ekici
School of Computing and Mathematics, Charles Sturt University, Melbourne, Victoria 3000, Australia

Abstract - Big data raises a growing number of confidentiality and security issues. New technology undoubtedly brings people benefits, privileges, convenience and efficiency. At the same time, technological advances are accompanied by threats that can pose dangerous privacy risks. The privacy of data is a source of much concern to researchers throughout the globe. A question that remains unanswered is what exactly can be done to resolve the confidentiality and privacy issues of big data. To help answer it, the content (data) for this chapter was collected from 58 peer-reviewed articles published from 2007 to 2016 and analyzed in order to find solutions to Big Data confidentiality issues. The articles cover different industries, including healthcare, finance, robotics, web applications, social media, and mobile communication. The selected journal articles were used to make a comparative analysis of security issues in different areas and to identify solutions. This chapter consists of four main parts: introduction, materials and method, results, and discussion and conclusion. This inquiry aims to find the different security issues of big data in various areas and to offer solutions by analyzing the results. The results of the content analysis suggest that internet applications and financial institutions are dealing with specific security problems, whereas social media and other industries are dealing with confidentiality issues of sensitive information that have heightened privacy concerns. Both kinds of issue are addressed in this study, and the results retrieved from the data highlight gaps that can be researched further.
The data for this chapter was gathered by analyzing studies that each deal with a particular confidentiality issue. After the analysis and evaluation, suggestions for confronting confidentiality issues are presented together with the different algorithms used. This research addresses gaps in the literature by highlighting the security and privacy issues that large companies face with recent technological advancements. By doing this, the research sheds light on the issues of big data and provides future research directions to solve them.

Index Terms - Big Data, Security, Privacy.

I INTRODUCTION

A significant portion of information technology research effort goes into analyzing and monitoring data about events on servers, networks and other connected devices. Big data is a fairly new concept in the modern technological world. As the usage of big data has increased, the problem of security has become very important (Kim, Kim & Chung, 2013). This chapter covers different aspects of big data security, in particular the challenges related to big data variety, velocity and volume. The amount of sensitive information that needs to be protected is constantly increasing (Islam, 2014). In the era we live in, information must be protected from hazards, and this applies to any profession. Insufficient protection can bring various security challenges (Faulkner & Kritzstein & Zimmerman, 2011; Zhang & Dou & Pei & Nepal & Yang & Liu & Chen, 2015; Abou-Tair & Berlik & Kelter, 2007). The notion of big data came into use not long ago; it relates to the large amounts of information that companies produce and need to store. This information is then used, for example, to predict increases and decreases in future sales by analyzing annual trends. The growing volume of data raises storage issues, as no established designs exist yet.
In fact, any amount of generated information raises security and privacy issues. Confidentiality of sensitive information becomes a perplexing issue for companies that do not devote a considerable amount of time, effort, and resources to it. Big Data techniques are used to manage a great number of datasets, much of which is unstructured and gathered from different sources (Ferretti & Pierazzi & Colajanni 2014; Chang & Tsai & Lin 2009). Traditional access control mechanisms are no longer sufficient to ensure privacy; growing demand calls for a fine-grained access control mechanism that reflects every aspect of privacy. One such framework is called an ontology-driven XACML context (Abou-Tair & Kelter 2007). Providing privacy in the cloud, on the other hand, is much more complex than one might think (Huang & Du, 2014). The majority of data preservation techniques target small data sets and often fall short at scale; to gain high scalability, an algorithm is designed with MapReduce that performs the computation in parallel. This method is called local record anonymization (Huang & Du 2014). The hybrid cloud is a very different approach and is difficult to implement. The idea is to separate sensitive data from non-sensitive data and store them in different trusted clouds. This method of isolation is best suited to processing image files, as it is used to deal with data at rest (Zhang & Dou & Pei & Yang & Chen, 2015). A multilevel identity encryption method is applied at both the file level and the block level to satisfy data protection requirements. This process helps to leverage cloud providers because of its transparency (Raghuwanshi & Rajagopalan 2014). Preserving the privacy of information is one big issue; providing security to IT systems is another, and both need to be taken in hand.
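The hybrid-cloud idea described above, separating sensitive from non-sensitive data before storage, can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the cited authors: the field names in `SENSITIVE_FIELDS`, the `record_id` join key, and the two in-memory lists standing in for a trusted store and a public store are all hypothetical.

```python
# Hypothetical set of field names treated as sensitive. A real deployment
# would derive this from a data-classification policy.
SENSITIVE_FIELDS = frozenset({"name", "ssn", "diagnosis"})


def split_record(record, sensitive_fields=SENSITIVE_FIELDS):
    """Split one record into its sensitive and non-sensitive parts."""
    sensitive = {k: v for k, v in record.items() if k in sensitive_fields}
    public = {k: v for k, v in record.items() if k not in sensitive_fields}
    return sensitive, public


def route(records):
    """Send each half of every record to a separate (simulated) store.

    The shared record_id lets an authorized party re-join the halves later.
    """
    private_store, public_store = [], []
    for record_id, record in enumerate(records):
        sensitive, public = split_record(record)
        private_store.append({**sensitive, "record_id": record_id})
        public_store.append({**public, "record_id": record_id})
    return private_store, public_store


if __name__ == "__main__":
    patients = [{"name": "A. Smith", "diagnosis": "flu", "age": 41, "city": "Melbourne"}]
    private_store, public_store = route(patients)
    print("trusted store:", private_store)
    print("public store: ", public_store)
```

Keeping the join key in both stores is a design choice for the sketch; in practice the key itself may need protection so that the public half cannot be linked back to an individual.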
There are many security risks in big data. The main one is privacy leakage, one of the most dangerous issues, which has already caused many problems for multinational companies (Abou-Tair & Berlik & Kelter 2007). A whole range of data therefore needs different computational techniques to make it secure. The first step in cloud security is to secure the entry points. This helps to detect possible attacks (Pham & Syed & Mohammad & Halgamuge 2010) and to alert users through an intrusion detection system (Tan & Nagar & He & Nanda, 2014). Encryption is another way of keeping data safe; in a local area network, VPN encryption is used to safeguard data (Mengke & Xiaoguang & Jianqiu & Jianjian 2016). In the case of web applications, the randomization-based Random4 encryption algorithm is used in a similar way (Tan & Nagar & He & Nanda, 2014). Large corporations are prompted to use the MuteDB architecture, which incorporates data encryption, key management, authorization, and authentication. This architecture offers a scalable solution to guarantee the confidentiality of information in the database (Ferretti & Pierazzi & Colajanni & Marchetti 2014). Security of medical data falls into the same category, as both cloud-based technologies and attribute-based encryption are used for storage and retrieval (Syed & Teja 2014). Another way is to use a pocket-sized computer called the Raspberry Pi, which ensures that regional data is collected and kept isolated (Feng & Onafeso & Liu 2016).

II DATA PROCESSING METHOD

Knowledge Discovery from Data (KDD), which is often treated as a synonym for 'data mining', is a method used to discover information from data while avoiding leakage. Every day millions of bytes of data are generated throughout the world, and the analysis of trends in this data helps researchers understand the need to organize it.
This process in turn allows companies to grow and remain competitive in the market by identifying seasonal trends and launching products in peak seasons. Research in this area is therefore significant and requires further development. The KDD method usually involves four steps, which are performed iteratively:

Step 1. Data processing: This step identifies inconsistent or missing data fields, removes them, and reintegrates the cleaned data into the data pool. The data is presented in a form that can be read quickly so that potential results can be reached.
Step 2. Data transformation: This step transforms the data into forms appropriate for mining. The data is not initially in a suitable form, so it must be sorted into a representation that can yield useful information.
Step 3. Data mining: Various methods and algorithms are employed to extract information from the data pool.
Step 4. Pattern evaluation: After the information is extracted, patterns are evaluated to obtain knowledge about trends.

PRIVACY

Every day, large amounts of data are generated and processed in an array of industries. The privacy of this data can be established by methodical procedures. Four types of participants are involved: (i) the data provider, (ii) the data collector, (iii) the data miner, and (iv) the decision maker. These are the people involved in processing the collected data and deriving knowledge from it. Each faces different challenges in privacy protection, as discussed below.

Approaches to privacy protection by:

A. Data Provider: Data providers can provide data voluntarily according to the demands of the data collector. Data collectors can also retrieve data from the providers' daily activities. However, there are several ways to limit the data collectors' access to this data.
Internet companies now have a strong motivation to track users' movements over the Internet so that valuable information can be extracted from the data produced by users' online activities. Providers can counter this with tools that block advertisements and scripts on sites, for example AdBlock. Encryption tools can also be used to convert data into ciphertext, which is not readable, so that it can be transferred safely.

B. Data Collector: The original data retrieved from the data provider generally contains sensitive information. If experts do not take specific precautions before passing it to the data miner, the sensitive information can be disclosed to the public, and confidentiality becomes a troubling issue for companies. One remedy is generalization, which replaces some values with a parent value and is a good way to hide sensitive information. Permutation de-associates the relationship between quasi-identifiers and numerical attributes by dividing the data set into groups and shuffling the sensitive values within each group. Perturbation replaces original values with altered ones to hide them; this includes adding noise, swapping data, and generating synthetic data.

C. Data Miner: The data miner applies an algorithm to the data obtained from the data collector. Two types of privacy issue can risk confidentiality in this process. First, when the data is directly observed, information can be leaked. Second, the data mining results themselves may leak private information. Several approaches help here. The heuristic distortion approach resolves how to select the appropriate data sets for modification; it works by replacing certain attributes of data items with a particular symbol. The probabilistic distortion approach distorts data through random numbers generated from a predefined probability distribution function.
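The collector-side and miner-side techniques described above (replacing a value with a parent value, shuffling values inside groups, and distorting values with random numbers) can be illustrated with a short Python sketch. The function names, the ten-year age bands, and the use of zero-mean Gaussian noise as the "predefined probability distribution" are assumptions chosen for the example, not details taken from the reviewed papers.

```python
import random


def generalize_age(age, width=10):
    """Replace an exact age with its parent range, e.g. 37 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def permute_within_groups(values, group_size, rng):
    """Shuffle a sensitive column inside fixed-size groups of records.

    Each group keeps the same multiset of values, but the link between a
    value and its original record (its quasi-identifiers) is broken.
    """
    shuffled = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]  # slicing copies the group
        rng.shuffle(group)
        shuffled.extend(group)
    return shuffled


def perturb(values, scale, rng):
    """Add zero-mean Gaussian noise with standard deviation `scale`."""
    return [v + rng.gauss(0.0, scale) for v in values]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only to make the demo repeatable
    ages = [23, 37, 41, 58]
    salaries = [52000.0, 61000.0, 48000.0, 75000.0]
    print([generalize_age(a) for a in ages])
    print(permute_within_groups(salaries, group_size=2, rng=rng))
    print(perturb(salaries, scale=500.0, rng=rng))
```

Larger groups and larger noise scales hide more but also reduce the usefulness of the data for mining, so both parameters are a trade-off between privacy and utility.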
The reconstruction-based approach generates a database from scratch that is compatible with a set of non-sensitive rules.

D. Decision Maker: The ultimate goal of data mining is to provide information to decision makers; however, to achieve this objective it is compulsory to meet the confidentiality rules and regulations that protect people. At first glance it seems that the decision maker has no responsibility, but in actuality they have a duty of care. If the results are disclosed to competitors, the decision makers will suffer a loss. In addition, the openness, freedom and anonymity of the Internet make data provenance, that is, tracing the origin of information, a great challenge. The decision maker must also look at five aspects of information: authority, accuracy, objectivity, currency and coverage.

SECURITY

Security has always been an issue; however, with the recent growth of big data, conventional security mechanisms designed to secure small-scale or static data are no longer adequate. Systems often contain loopholes that allow intruders to exploit services, and several security challenges are prevalent. The majority of organizations deal with sensitive information under the threat of data theft. Table 1 displays the types of security challenges and threats that need attention.

INTERNET OF THINGS

In today's society, technological gadgets are the new modern-day slaves. The public uses the Internet for shopping, reading, paying bills, and essentially running all their errands. Life has been made easier by novel inventions built on the multipurpose Internet. Countless studies have assessed people's daily use of the Internet, and predictions have been made about its future use. Despite the usefulness of the Internet, it remains an under-researched area in which facts are insufficient.
Therefore, this paper goes through 12 research papers to analyze emerging themes and capture research developments. The correlation among these 12 papers provides the data for analyzing emerging themes of Internet usage. These findings offer a forward-looking view in the direction of automated systems and of the main benefits of technologically advanced plans. The "Internet of Everything" is among the most significant ideas established in the real world for use by individuals globally. Innovations under this theme help to ease people's lives in their daily dealings. Internet development was gradual at first; nonetheless, improvements now occur at a noteworthy rate. The Internet emerged in 1969 with the ARPANET (Advanced Research Projects Agency Network), which connected only a few sites (Hussain, 2017). It is estimated that by 2020 around 20 billion devices will be connected to the Internet. More innovative work will address how to connect people who are not yet connected. Many new studies have been undertaken to find future measures of Internet usage. Drawing data from 12 studies on the same topic in different areas will show the connection between them in connecting the unconnected to the Internet. This document can be consulted when a problematic issue is encountered in a certain area under the topic. Cloud computing, together with the related fog computing, is an essential area when the topic is connectivity between every device in a household (Johnson, 1998). Cloud computing is emerging at great speed and in several ways that can be used to connect the unconnected. Fog computing and cloud computing are utilized as a common architecture of the "Internet of Everything".
The "Internet of Everything" comprises numerous crucial notions that are important in learning and teaching (Kopetz, 2011). The core difference between the "Internet of Everything" and the "Internet of Things" is that the latter concerns physical objects that are physically present, whereas the "Internet of Everything" extends beyond them. One of its most widely documented aspects is that Internet-connected devices are coordinated through an Internet message that, for example, automatically alters the temperature on a Nest indoor controller. People might feel that they can use the "Internet of Everything" simply to examine information from these devices. However, Internet pages occupy intangible cyberspaces, and we are perhaps unaware that devices include not only physically tangible gadgets but also intangible cyber ones. This is evident from the world's leading site, "Google", which is not a physical, tangible tool; it exists in cyberspace amid the wires. The same holds for services used every day: Dropbox and Instagram, for example, are also virtual spaces. This is the main expansion of the "Internet of Everything": it covers services that one cannot put a finger on and say that they exist in a physical space. The Internet covers a significantly greater number of things, although we use only the online services. People therefore need to understand that the Internet consists of information streams and relations between associations. Similarly, one can claim that the Internet includes its clients, the majority of the general population, as participants. The "Internet of Everything" links different concepts into one secure idea and allows devices to communicate with each other. The "Internet of Things" might be compared to a railroad line, including the tracks and the connections, while the "Internet of Everything" is all of the trains, ticket machines, staff, clients, weather conditions, and so forth. This research elaborates on whether and how the themes of the Internet of Things and the Internet of Everything are developing, as this is a fairly new area. It also identifies the major developments made so far.
The gaps will also be identified and clearly explained.

III MATERIAL AND METHOD

This chapter categorizes the security and privacy issues of big data according to the type of issue and several parameters. The data was collected through a content analysis of 58 publications from 2007 to 2016; each context was analyzed across different industries, including healthcare, finance, robotics, web applications, social media, and mobile communications. An agile approach is used for this project because IT-dependent data requires massive computation ("The 5 Methodology Milestones for Big Data", 2015). Data about security issues and solutions was collected from different journal articles, summarized through thorough analysis, and then structured and analyzed. The data is displayed in a table for better comprehension, showing the area of each security issue, the solution, the algorithms used for the solution, and accompanying general and technical remarks. After the analysis of the data, the results and conclusions were drawn objectively.

Figure 1: Graphical abstract of the analysis: Security and Privacy Challenges of Big Data

A. Data Collection Method: This analysis pooled data from 58 peer-reviewed scientific studies published between 2007 and 2016. The raw data presented in Table 3 specifies the variables used for the analysis, including application, security issues, and algorithm.

B. Data Analysis Method: The data is analyzed by categorizing the collected data and displaying emerging themes in a table. The data sets include attributes such as volume of data, application area, and prevailing issues. The methodology was adopted, and the algorithms were selected, for the purpose of addressing these issues.
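The categorization step described in the data analysis method can be sketched in a few lines of Python. This is only an illustration of tallying reviewed articles by application area: the five rows are transcribed from Table 3, while the `normalize` helper and the choice to group on the application column are assumptions made for the example rather than the authors' documented procedure.

```python
from collections import Counter

# A handful of rows transcribed from Table 3 as
# (article, application area, algorithm). The selection is illustrative.
ROWS = [
    ("Kaur et al. (2015)", "E-Commerce", "RSA"),
    ("Mirarab et al. (2014)", "E-Commerce", "LSB steganography"),
    ("Jyothirmai et al. (2015)", "HealthCare", "Role-based access control"),
    ("Feng et al. (2015)", "Healthcare", "Raspberry Pi"),
    ("Gang (2015)", "Social network", "K-anonymity"),
]


def normalize(area):
    """Fold case and whitespace so 'HealthCare' and 'Healthcare' match."""
    return " ".join(area.lower().split())


def tally_by_application(rows):
    """Count how many reviewed articles fall into each application area."""
    return Counter(normalize(application) for _, application, _ in rows)


def group_algorithms(rows):
    """Collect the algorithms reported for each application area."""
    groups = {}
    for _, application, algorithm in rows:
        groups.setdefault(normalize(application), []).append(algorithm)
    return groups


if __name__ == "__main__":
    for area, count in tally_by_application(ROWS).most_common():
        print(f"{area}: {count} article(s), algorithms: {group_algorithms(ROWS)[area]}")
```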
Table 1: Different Security Challenges and Solutions

1. Security challenge - Real-Time Monitoring: Real-time monitoring has always been a big issue on account of the number and frequency of security alerts it generates. It is now easy to rectify loopholes or dangers; nonetheless, it takes a lot of effort to find such threats (Kizza 2015).
   Solution for data security - Layered Protection: In computer hardware, the security of an information system is formed through the expansion of layers. The security of the outer layers relies on the security of the inner layers. The more layers, the better the security (Kuhn & Walsh & Fries 2005).
2. Security challenge - Granular Audits: When the real-time monitoring system does not capture an attack, audits are needed (Lee et al.).
   Solution for data security - Protection of Different Domains: DNS can be divided into local region, network perimeter, network transmission, and infrastructure. Different technologies are therefore used in various procedures to secure each area in order to establish a distributed security system (Stouffer & Falco & Scarfone 2011).
3. Security challenge - Secure Computations in Distributed Systems: In this system, parallel computation and storage are used to process massive amounts of data. Securing the mappers and the data in the presence of an untrusted mapper becomes a primary concern (Montlick, 1996).
   Solution for data security - Hierarchical Protection: The importance of the same information differs between institutes. Hierarchical protection is therefore needed, and different access control measures are used so that a particular user covers only a specific parameter (Zissis & Lekkas, 2012).
4. Security challenge - Secure Data Storage and Transaction Logs: Data and transactions are stored in multiple tiers. Moving data manually among tiers is not an issue in itself; nonetheless, given the amount of data that is generated, auto-tiering is needed for management. Auto-tiering, however, does not keep track of where the data is stored. Maintaining 24/7 availability is thus a big issue (Montlick, 1996).
   Solution for data security - Time-Sharing Protection: Information security in big data is a dynamic process. Taking time into consideration can greatly enhance the securing of Big Data (Vashist, 2015).
5. Security challenge - Endpoint Evaluation: Many large organizations require data collection from various sources. A key challenge here is to validate the input; validating and filtering data can be daunting because of the challenges posed by untrusted data sources (Goel & Hong 2015).
   Solution for data security - 3KDEC Algorithm: A symmetric key block encipherment algorithm is used as a practical solution to the problem; numeric data is converted to alphanumeric type, and thus encrypted data is not stored in existing numeric fields (Kaur & Dhindsa & Singh, 2009).

Table 2: Description of Algorithms and other Key Terms Used

MapReduce (Zhang & Dou & Pei & Nepal & Yang & Liu & Chen 2015): The MapReduce programming model is for generating and processing data sets with a distributed and parallel algorithm on clusters.
AES (Huang & Du 2014): The AES (Advanced Encryption Standard) algorithm is a symmetric block cipher used to protect sensitive and confidential information by encrypting it into an unreadable form.
Multilevel identity encryption (Raghuwanshi & Rajagopalan 2014): An extension of normal encryption in which the identity of the data in the cloud is encrypted and protected by multiple layers.
ORAM (Li & Guo, 2014, April): Oblivious RAM (Random Access Memory) allows clients to access their data on a remote server.
XACML framework (Abou-Tair & Berlik 2007): eXtensible Access Control Markup Language, used to implement attribute-based access control policies.
Data masking (Motiwalla & Li 2010): The method of creating inauthentic but structurally similar versions of data for testing and training purposes.
Random4 (Avireddy et al 2012): An application-specific encryption algorithm used to prevent SQL (Structured Query Language) injection.
StarLight (Faulkner & Kritzstein & Zimmerman, 2011): A tool used to gather information from different sources, such as visual intelligence and geospatial data, to alert staff at sea ports.
3DES (Islam & Islam, 2014): The Triple Data Encryption Algorithm is a symmetric key block cipher that applies DES (Data Encryption Standard) three times to the same data.
VPN (Mengke & Xiaoguang & Jianqiu & Jianjian, 2016): A VPN (Virtual Private Network) extends a private network across a public network. It is created by establishing a virtual point-to-point connection through dedicated connections and traffic tunneling.
OBEX (Krishnan & Helberg & Merve, 2016): Object Exchange is a communication protocol that supports the exchange of binary objects between devices.
MuteDB Architecture (Ferretti & Pierazzi & Colajanni & Marchetti, 2014): This architecture incorporates data encryption, authentication, authorization and key management to assure the confidentiality of data in the cloud.

Table 3: Security Issues in Big Data: Algorithms used in published papers

No. | Article | Application | Security Issues | Algorithms Used | Data Size | Method Used
1 | Krishnan, Helberg, Merve (2016) | Health and Finance | Privacy of customers' and employees' information | FPE (Format Preserving Encryption) algorithm that combines one algorithm to encrypt, another to decrypt and one to sample | Dataguise | This helps to decrease the risk of data breaches. Dataguise is created to detect, protect and also handle compliance with regulatory mandates.
2 | Moura et al. (2016) | Cloud computing | Users' privacy and management of big data | Functional encryption algorithm consisting of Key Generation, Encryption, Decryption, and Evaluation | Homomorphic encryption | Keeps private information secure. This form of encryption allows certain computations to be executed on ciphertext and creates encrypted results.
3 | Swarna et al. (2016) | Cloud environment | Privacy of transmitted and stored information | Ring signatures include only two algorithms: Sign and Verify | Ring signature | This type of signature can be executed by any group member. Ring signature uses PSA algorithm.
4 | Moura et al. (2016) | Social network | Privacy of users' information | Rendering algorithm | User rights management | Secure storage of information. The user identifies rights and limits the content; all users are registered securely.
5 | Xiaoguang et al. (2016) | Local area networks | Interface security | VPN encryption | - | Background linkage for managers including remote lock, data wiping, and automatic alarm.
6 | Charishma et al. (2015) | Business organizations | Privacy of company's and employees' information | Cryptographic algorithms | Analytic tool Splunk | Provides log management by taking and analyzing the logs using certain patterns.
7 | Chandankere (2015) | Cloud storage | Privacy of stored information | Group signature algorithms consist of four algorithms: KeyGen, Sign, Verify, and Trace. KeyGen and Sign are randomized; Verify and Trace are deterministic. | Dynamic encryption and group signature | Enables secure sharing of information. Dynamic encryption enables transmitting encrypted data to group members by adding members to admin. Group signature enables disclosure of identity by admin in case of dispute.
8 | Gang (2015) | Social network | Privacy of users | K-anonymity algorithm | Anonymity protection | Anonymity protection is used to protect data that can include relationship, attribute and identity anonymities.
  | Usman (2015) | Business organizations | Privacy of company's and employees' information | Symmetric encryption algorithm | AES encryption | This type of encryption makes data unreadable for attackers. It contains operations including substitutions and permutations.
9 | Gang (2015) | Multimedia | Privacy of transmitted media files | Watermarking detection algorithms (cryptographic algorithms) | Data watermarking | "Data Watermarking" is applied to protect copyright. "Data Watermarking" relates to identification information that is inserted imperceptibly.
10 | Kaur et al. (2015) | E-Commerce | Privacy of customers' information | Rivest-Shamir-Adleman (RSA) algorithm | A new E-banking security system | In the new system activities and functions are s... This new system is grounded on strong access control.
11 | Jyothirmai et al. (2015) | Healthcare | Privacy of patients' information | Access control algorithms (cryptographic algorithms) | Role-Based Access Control | Role-Based Access Control is a tool for managing access to data and making data safe. It is able to manage different access control policies grounded on role hierarchies.
12 | Krishnan et al. (2015) | Mobile phone | Intrusion detection in mobile phones | BLA, Bluetooth Object Exchange (OBEX) protocol | Covers 60% of Bluetooth market | Using a Bluetooth logging agent, and also using database rules to authenticate.
13 | Gang (2015) | Commercial companies | Privacy of company's and employees' information | Access control algorithms (cryptographic algorithms) | Role-based access control | Roles can be generated and the behavior of each user can be checked. It is grounded on authorization of "Users-Object" and role optimization.
14 | Feng et al. (2015) | Healthcare | Regional secure data processing and collection to limit issues in future health care | Raspberry Pi | All digital healthcare industry | Raspberry Pi is a pocket-sized computer used in forensic medicine and forensic entomology.
15 | Raji et al. (2015) | Social networking | Privacy challenges in online social networks (OSN) | P2P-ONS Architecture | Can include Facebook, Twitter, Messenger, etc. | Architecture composed of a privacy-enabled start-up for the user's social communication and an adaptive replica for ensuring availability of shared data.
17 | Gang (2015) | Business environment | Privacy of company's and employees' information | Clustering algorithm | Access control (risk adaptive) | Risk adaptive access control is appropriate when it is not clear which data is accessible for users. This method uses information theory and statistical methods to identify the quantization algorithm.
18 | Gang (2015) | Commercial organizations | Privacy of company's and employees' information | Provenance graph algorithms | Data provenance | Data provenance is used through labelling, so it is able to check whether the results are correct, to differentiate the data in the table or to update the data.
19 | Gadepally et al. (2015); Kepner et al. (2014) | Bioinformatics and social media | Privacy of patients' information in bioinformatics, personal information of users in social media | Graph algorithms that are created using associative arrays | Computing on masked data | Computations can be executed directly on masked data, and authorized recipients are allowed to unmask the data. CMD (Computing on Masked Data) includes methods of cryptographic encryption and associative arrays that represent big data.
20 | Wagh et al. (2014) | Educational organizations | Privacy of education resources and personal information of users, integrity | Digital signature: a randomized KeyGen algorithm, a randomized Sign algorithm, and a deterministic Verify algorithm | Digital signature, data encryption, access control | Access control is achieved through verification by transmitting secure data using data encryption and digital signature.
21 | Merkel (2014) | Health and Finance | Privacy of customers' information | Mathematical algorithms | Bash tool | It simplifies a complex process. It is used mostly as an intermediary.
22 | Hsu et al. (2014) | Group communication in social media | Privacy of users' information | Changed RSA algorithm that relies on NP class | Group key transfer protocol | This protocol protects from attacks and decreases system implementation overhead. It is based on LSSS (Large Scale Survey System) and DH key agreement and does not have an online KGC (Key Generator Center).
23 | Pace (2014) | Scientific computing | Privacy of scientists' works | ARC algorithm for storage pool | Data management | Data management simplifies data workflow and makes it secure. Data management assigns to pool service management.
24 | Bertine et al. (2014) | Web application | Encrypted data is not secure | PPDM and PPDA | Sheer amount of data in cloud | Using cryptography to encrypt data.
25 | Ferretti et al. (2014) | Cloud database services | Scalable solution to guarantee confidentiality of information in the database | MuteDB architecture | Cloud providers | Data encryption, authentication and authorization to form new MuteDB.
26 | Huang et al. (2014) | Web application and hybrid cloud service providers | Privacy of image data stored in public cloud | AES algorithm | Image data | Dividing images into blocks and shuffling them to make them unrecognizable.
27 | Mehak (2014) | Cloud storage | Privacy of stored information | MapReduce algorithms including Master Key Algorithm | Hadoop | Enables processing of big data sets. It parallelizes processing of data across computers in a cluster.
28 | Mirarab et al. (2014) | E-Commerce | Privacy of customers' information | Steganography: least significant bit (LSB) algorithm | Encryption, steganography | Encryption and steganography make E-Commerce safer and more secure. Encryption is executed through Elliptic Curve Cryptography. LSB (Least Significant Bit) steganography is used for image compression.
29 | Syed et al. (2014) | Cloud services including medical data, confidential defense records, etc. | Encryption of data in the database is not sophisticated enough to provide enhanced security | Encryption for frequent access node | Potential to enhance the security of all cloud providers | Combining cloud-based technologies and attribute-based encryption for secure storage and retrieval.
30 | Raghuwansi et al. (2014) | Cloud service providers | Privacy of data at rest in cloud | Multilevel identity encryption | Cloud vendors and consumers | By using encryption and verification services at both file and block storage level.
40 | Li et al. (2014) | Google Drive, Dropbox, Amazon S3, SkyDrive, iCloud, Egnyte, OneDrive, etc. | Privacy-preserving data access to cloud | ORAM algorithm | Geo-distributed cloud sites | Using ORAM (Oblivious Random Access Memory) for load balancing while hiding the access patterns.
41 | Tan et al. (2014) | Any cloud-based application, e.g. OneDrive, cloud | Cloud security | Secured entry points | Terabytes | Sensing attacks and alerting the user by intrusion detection system and data leak prevention system.
43 | Islam et al. (2014) | Multimedia like email, music, and images | Dealing with structured data and unstructured data | Statistical learning algorithms | Infinite | Text analysis by filtering, clustering and classification, building security node.
44 | Abawajy et al. (2014) | Robotics and control system | Malware detection | LIME classifier | Massive and expected to grow exponentially | A designer has to initialize a four-tier LIME (Large Iterative Multitier Ensemble Classifiers) classifier by specifying which ensemble meta-classifier will operate at the fourth tier.
45, 46 | Islam et al. (2014) | Financial system | Security of unstructured data | 3DES | 1200-1400 Exabytes | Digital certificate, using hash functions.
47 | Tankard (2012) | Multi-silo environment | Privacy and law of data protection | Symmetric encryption key algorithms | Vormetric | Vormetric manages data access control that combines storage elements, policy management and data encryption.
48 | Kaplan (2012) | Business organizations | Privacy of company's and employees' information | Cryptographic algorithms | Encryption | It protects the system from attacks. Encryption protects information by encoding messages.
49 | Pramila et al. (2012) | Healthcare | Current system to diagnose the patient has slight range and is not secure | Using location tracking technology, telediagnosis, access using PKC | Covers the person suffering from Alzheimer's Disease | A new method proposed for the long-range outdoor environment with GPS (Global Positioning System) and fine-grained distributed data access control.
50 | Viddy et al. (2012) | Web applications or websites | Web application is suffering security attacks, especially SQL injection attacks | Random4 encryption algorithm | Has potential to provide security to web traffic | Using an encryption algorithm based on randomization.
51 | Faulkner et al.
(2011) Military Port has very serious security issues; they lack port specific security technologies to alert security personnel in case of any hazard or danger. StarLight uses Visual intelligence, entity detection, and intrusion Commercial and military ports StarLight combines information from different sources and integrates text, geospatial and temporal data to alert security staff. 52 Motiwalla et al. (2010) Healthcare Privacy preserving for healthcare data. Data masking Spent $39.4 billion in 2008 Changing the data values by using noise perturbation, data aggregation, and data swapping. 53 Yun et al. (2010) Database or Data Warehouse of any company Information security structure for database processor Following PlanDo-Check-Act cycle to implement nine principles established by OECD Cover all organization having a database Using a structure that complies with laws requirements and conflicts between consumer and database processor. 54 Chang et al. (2009) Mobile Phones RFID enabled credit cards lacks sophisticated computation A new RFID system based on mobile phones Have Potential to expand it in credit card Proposed an efficient and secure mechanism using mobile devices like RFID (Radio-Frequency Identification) readers together IV RESULTS This study has collected and analyzed data from 58 peer reviewedarticles published from 2007 to 2016 in order to find answers for confidentiality issues of Big Data. The observations clearly show that, industries that have kept sensitive data of customers are trying to preserve their privacy whereas, the industries, which have their computations in real time, are working hard to keep it as a secret. Moreover, it is significant to say that most of solutions are based on encryption algorithms. Furthermore, in different areas, as for example Finance and Health (role-based access control), can be applied as a solution. 
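The role-based access control mentioned above for Finance and Health can be illustrated with a minimal sketch. The role names, the permission names, and the `is_allowed` helper are hypothetical and chosen only for illustration; they are not taken from any of the reviewed articles.

```python
from dataclasses import dataclass, field

# Hypothetical role-to-permission mapping (illustrative only,
# not taken from any of the reviewed articles).
ROLE_PERMISSIONS = {
    "physician": {"read_record", "write_record"},
    "billing_clerk": {"read_invoice"},
    "auditor": {"read_record", "read_invoice"},
}


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)


def is_allowed(user: User, permission: str) -> bool:
    """Grant access only if at least one of the user's roles carries the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in user.roles
    )


alice = User("alice", {"physician"})
bob = User("bob", {"billing_clerk"})

print(is_allowed(alice, "write_record"))  # True: physicians may write records
print(is_allowed(bob, "read_record"))     # False: billing clerks may not read records
```

In this sketch, permissions are attached to roles rather than to individual users, so changing what a role may do is a single edit to the mapping; a user with no roles is denied everything by default.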
In addition, most of the observed solutions are directed at protecting access to big data, while other security aspects of big data receive less attention.

Figure 2: Overview of the different applications, using the data from the 58 peer-reviewed scientific articles published in 2007–2016 on Security and Privacy Challenges of Big Data.

The percentage of the most frequently used applications is given in Figure 2. After analyzing the data from Table 3, the following conclusions can be drawn.

a. Healthcare: Preserving the privacy of data is a critical issue. Healthcare information should not be disclosed or retrieved easily, or leaked to hackers, as it is sensitive information whose exposure may cause financial detriment. To avoid this, the techniques in use include anonymization of records with MapReduce and data masking. Additionally, a small computer called Raspberry Pi is used to collect data in a secure manner, and location tracking technology is used to treat patients.

b. Web Application: Web applications suffer the majority of data breaches. The violations occur not only at the backend but also during data transmission and data collection. The proposed solutions are to verify the identity of the data owner, to encrypt data before transmission, to encrypt frequently accessed nodes, to secure entry points, and to use specific architectures for authentication, authorization, and key management.

c. Mobile Devices: In mobile devices, the first issue is verifying the user. Another fundamental problem is securing transactions from all devices. The possible solutions include using a concrete architecture like OBEX, monitoring the behavior of data and the network to detect intrusions, and using RFID-based mobile phones.

d. Social Networking: Privacy of, and access to, shared data cause the majority of the problems.
Two-way authentication of a user is one of the possible privacy-preserving solutions.

e. Finance: This application area has been under scrutiny, and the majority of attacks and threats are motivated by monetary gain. Both the privacy of sensitive information and data stored in the cloud or in databases are at risk. Authentication and authorization of the user are among the first challenges that need immediate attention. Banks use intrusion detection systems to detect threats, but this alone is not safe enough. Various encryption algorithms are used to make computations and transactions secure.

V DISCUSSION

Big data is a fairly new concept in IT, and research in the area is not yet thorough; more research needs to be done. There is an important gap in many articles: research is biased towards traditional methods, and Big Data is under-researched. Firstly, most of the articles consider only one area, which makes the research findings one-sided and incomplete. This limitation applies to all articles used except that of Gang (2015), whose study presents different areas and different solutions for security issues. For instance, social networks, multimedia, commercial organizations, companies, business environments, anonymity protection, data watermarking, data provenance, role-based access control and risk-adaptive access control are all aspects that can be looked into by scholars. To draw more conclusive results, a wider range of information needs to be considered in most articles. Moreover, the lack of comparison between different solutions, and of explanation of why or how they could be applied, is another gap in these articles. The literature chosen for this study also has gaps regarding how certain security measures could be applied in practice. Secondly, most articles give no detailed information about solutions, including the algorithms used for particular problems; case studies, for example, would be valuable for understanding solutions in a particular area. Thirdly, in many articles security issues are presented in a general form, without specific information related to a particular problem in an area. In particular, descriptions of the most common security and privacy issues do not specify what kind of privacy is meant or what kind of information needs to be protected.
Technological development brings a variety of benefits, but it can also bring threats that result in breaches of privacy; if important information is made public, the companies responsible can face hefty fines. Big data is a new area that refers to the vast amount of information that needs to be analyzed and stored, and this must be done without confidentiality breaches. There are many security issues in different areas of big data, and sensitive information needs to be protected. As this area is still new, much more research is needed, because many questions remain unanswered. The results clearly show that security issues are similar in different areas, that the same solutions can be applied in different areas, and that many solutions are grounded in encryption algorithms. Moreover, protection of access is very significant in big data. To summarize, this research has added knowledge about big data security issues and highlighted research gaps in the area. However, this area is neglected and requires continuing research covering different aspects of big data.

VI CONCLUSION

In conclusion, this chapter has analyzed data obtained from 58 peer-reviewed scientific publications from 2007 to 2016. This study has highlighted certain gaps in the literature in order to evaluate possible solutions to the growing privacy and security issues in different areas of big data. The service provider needs to ensure security for a safer infrastructure and the protection of customers’ information, and the handling of this data has to comply with confidentiality standards. In some areas, such as Health and Finance, solving security issues is key to the effective and successful work of the company. Many different technologies have been created to protect against security issues; however, the existing technology is not able to solve them completely, and research in this area is still limited but continuing.
Although much research has been done on big data issues, big data is still a fairly new advancement in IT, and many questions and aspects of its security problems have not been answered or covered sufficiently. In particular, there is a lack of comparative analysis of security issues in different areas that would allow solutions to be drawn by comparing and contrasting studies. Therefore, this work aimed to find and compare important security issues in big data in different areas, and it also evaluated solutions that can address them. This analysis is also important in the sense of providing grounds for further research and enriching existing information about big data. In order to address these gaps and highlight security and privacy issues of big data, certain tools and techniques have been used to find possible answers to particularly alarming issues of Big Data. The data was first categorized and then, as a second step, grouped under different parameters. The analysis showed that web applications and financial institutions are dealing with security problems, and each problem is resolved in a different way. Social media and other industries dealing with sensitive information have individual privacy concerns, which are treated with a uniform approach. This research has addressed gaps in the literature by highlighting security and privacy issues that big companies face with recent technological advancements in corporate societies. Evaluating these gaps may shed some light on these issues of big data and provide future research directions to solve them. For future work, more data should be collected to observe the security and privacy challenges of Big Data.

REFERENCES

A Survey of Information Security in HealthCare Sector. (2015). Retrieved from http://www.ijera.com/special_issue/NCDATES/CSE/PART-3/CSE%20135-2935.pdf
Abawajy, J. H., Kelarev, A., & Chowdhury, M. (2014). Large iterative multitier ensemble classifiers for security of big data. IEEE Transactions on Emerging Topics in Computing, 2(3), 352-363.
Agrawal, R., & Srikant, R. (2000, May). Privacy-preserving data mining. In ACM SIGMOD Record (Vol. 29, No. 2, pp. 439-450). ACM.
Avireddy, S., Perumal, V., Gowraj, N., Kannan, R. S., Thinakaran, P., Ganapthi, S., … & Prabhu, S. (2012, June). Random4: an application specific randomized encryption algorithm to prevent SQL injection. In 2012 IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications (pp. 1327-1333). IEEE.
Bertino, E., & Samanthula, B. K. (2014). Security with privacy-A research agenda. In Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom), 2014 International Conference on (pp. 144-153). IEEE.
Chandankere, B. (2015). Secure Data Sharing in an Untrusted Cloud. Journal of Engineering Research and Applications, 5(8).
Chang, A. Y., Tsai, D. R., Tsai, C. L., & Lin, Y. J. (2009, October). An improved certificate mechanism for transactions using radio frequency identification enabled mobile phone. In 43rd Annual 2009 International Carnahan Conference on Security Technology (pp. 36-40). IEEE.
Charishma, P., & Venkatesh, K. (2015). Big Data Security Analytic Solution using Splunk. Journal of Engineering Research and Applications, 5(4).
Choi, C. (2013). A new type of security chip guards against big data snooping. Scientific American, 309(6).
Computing on Masked Data to improve the Security of Big Data. (2015). Retrieved from http://arxiv.org/pdf/1504.01287v1.pdf
Computing on Masked Data: a High Performance Method for Improving Big Data Veracity. (2014). Retrieved from https://arxiv.org/ftp/arxiv/papers/1406/1406.5751.pdf
Dataguise Reveals Five Big Data Security Pitfalls. (2015). Retrieved from http://search.proquest.com.ezproxy.csu.edu.au/docview/1667178648?OpenUrlRefId=info:xri/sid:primo&accountid=10344
Dell’s says its new solutions allow customers to tackle BYOD, big data and security concerns. (2013). Entertainment Close-Up. Retrieved from http://search.proquest.com.ezproxy.csu.edu.au/docview/1468158864?accountid=10344
Dhiah el Diehn, I., Berlik, S., & Kelter, U. (2007). Enforcing privacy by means of an ontology driven xacml framework. In Third International Symposium on Information Assurance and Security (pp. 279-284). IEEE.
Faulkner, L. L., Kritzstein, B. P., & Zimmerman, J. J. (2011). Security infrastructure for commercial and military ports. Oceans’11 MTS/IEEE Kona, Program Book, pp. 1-6, Art. No. 6107174.
Feng, X., Onafeso, B., & Liu, E. (2016). Investigating Big Data Healthcare Security Issues with Raspberry Pi. In Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing (CIT/IUCC/DASC/PICOM), 2015 IEEE International Conference on (pp. 2329-2334). IEEE.
Ferretti, L., Pierazzi, F., Colajanni, M., & Marchetti, M. (2014). Scalable architecture for multi-user encrypted SQL operations on cloud database services. IEEE Transactions on Cloud Computing, 2(4), 448-458.
Gang, Z. (2015). Research on Privacy Protection in Big Data Environment. Journal of Engineering Research and Applications, 5(5).
Goel, S., & Hong, Y. (2015). Security Challenges in Smart Grid Implementation. In Smart Grid Security (pp. 1-39). Springer London.
Hsu, C., Zeng, B., & Zhang, M. (2014). A novel group key transfer for big data security. Applied Mathematics and Computation, 2014(249).
Huang, X., & Du, X. (2014, April). Achieving big data privacy via hybrid cloud. In Computer Communications Workshops (INFOCOM WKSHPS), 2014 IEEE Conference on (pp. 512-517). IEEE.
Hussain, F. (2017). Internet of Everything. In Internet of Things (pp. 1-11). Springer International Publishing.
Internet Security Flaws in the Age of Big Data. (2014). Retrieved from http://search.proquest.com.ezproxy.csu.edu.au/docview/1638234956?OpenUrlRefId=info:xri/sid:primo&accountid=10344
Islam, M. R., & Islam, M. E. (2014, December). An approach to provide security to unstructured Big Data. In Software, Knowledge, Information Management and Applications (SKIMA), 2014 8th International Conference on (pp. 1-5). IEEE.
Johnson, S. M. (1998). The Internet changes everything: Revolutionizing public participation and access to government information through the Internet. Administrative Law Review, 277-337.
Kaur, K., Dhindsa, K. S., & Singh, G. (2009, March). Numeric to Numeric Encryption of Databases: Using 3Kdec Algorithm. In Advance Computing Conference, 2009. IACC 2009. IEEE International (pp. 1501-1505). IEEE.
Kaur, K., Pathak, A., Kaur, P., & Kaur, K. (2015). E-Commerce Privacy and Security System. Journal of Engineering Research and Applications, 5(5).
Kim, S., Kim, N., & Chung, T. (2013). Attribute Relationship Evaluation Methodology for Big Data Security. IT convergence and security. Retrieved from http://ieeexplore.ieee.org.ezproxy.csu.edu.au/stamp/stamp.jsp?tp=&arnumber=6717808
Kizza, J. M. (2015). Introduction to computer network vulnerabilities. In Guide to Computer Network Security (pp. 87-103). Springer London.
Kopetz, H. (2011). Internet of things. In Real-time systems (pp. 307-323). Springer US.
Kuhn, D. R., Walsh, T. J., & Fries, S. (2005). Security considerations for voice over IP systems. NIST special publication, 800-58.
Lee, W., Stolfo, S. J., Chan, P. K., Eskin, E., Fan, W., Miller, M., … & Zhang, J. (2001). Real time data mining-based intrusion detection. In DARPA Information Survivability Conference & Exposition II, 2001. DISCEX’01. Proceedings (Vol. 1, pp. 89-100). IEEE.
Li, P., & Guo, S. (2014). Load balancing for privacy-preserving access to big data in cloud. In Computer Communications Workshops (INFOCOM WKSHPS), 2014 IEEE Conference on (pp. 524-528). IEEE.
Li, Y., & Xiangsheng, L. (2010, October). Information security structure for database processer. In 2010 International Conference on Computer Application and System Modeling (ICCASM 2010) (Vol. 15, pp. V15-97). IEEE.
Mehak, G. (2014). Improving Data Storage Security in Cloud using Hadoop. Journal of Engineering Research and Applications, 4(9).
Mirarab, A., & Kenari, A. (2014). A New Framework for Secure M-Commerce. Journal of Engineering Research and Applications, 4(11).
Montlick, T. F. (1996). U.S. Patent No. 5,561,446. Washington, DC: U.S. Patent and Trademark Office.
Motiwalla, L., & Li, X. (2010, July). Value added privacy services for healthcare data. In 2010 6th World Congress on Services (pp. 64-71). IEEE.
Pace, A. (2014). Technologies for large data management in scientific computing. International Journal of Modern Physics C: Computational Physics & Physical Computation, 25(2).
Pham, D. V., Halgamuge, M. N., Syed, A., & Mendis, P. (2010, July). Optimizing windows security features to block malware and hack tools on USB storage devices. In Progress in electromagnetics research symposium.
Pham, D. V., Syed, A., Mohammad, A., & Halgamuge, M. N. (2010). Threat analysis of portable hack tools from USB storage devices and protection solutions. In Information and Emerging Technologies (ICIET), 2010 International Conference on (pp. 1-5). IEEE.
Pramila, R. S., Nargunam, A. S., & Affairs, A. (2012, March). A study on data confidentiality in early detection of Alzheimer’s disease. In Computing, Electronics and Electrical Technologies (ICCEET), 2012 International Conference on (pp. 1004-1008). IEEE.
Privacy and Security Solutions for Interoperable Health Information Exchange: Final Implementation Plans. (2007). Retrieved from https://healthit.ahrq.gov/sites/default/files/docs/page/FIP_0.pdf
Raghuwanshi, D. S., & Rajagopalan, M. R. (2014, January). MS2: Practical data privacy and security framework for data at rest in cloud. In Computer Applications and Information Systems (WCCAIS), 2014 World Congress on (pp. 1-8). IEEE.
Security and Privacy Issues of Big Data. (2016). Retrieved from https://arxiv.org/ftp/arxiv/papers/1601/1601.06206.pdf
Skinner, G., Chang, E., McMahon, M., Aisbett, J., & Miller, M. (2004, November). Shield privacy Hippocratic security method for virtual community. In Industrial Electronics Society, 2004. IECON 2004. 30th Annual Conference of IEEE (Vol. 1, pp. 472-479). IEEE.
Stouffer, K., Falco, J., & Scarfone, K. (2011). Guide to industrial control systems (ICS) security. NIST special publication, 800(82), 16-16.
Swarna, S., & Maryam, S. (2016). Increasing Security Level in Data Sharing Using Ring Signature in Cloud Environment. Journal of Engineering Research and Applications, 6(2).
Syed, S., & Teja, P. S. (2014, November). Novel data storage and retrieval in cloud database by using frequent access node encryption. In Contemporary Computing and Informatics (IC3I), 2014 International Conference on (pp. 353-356). IEEE.
Tan, Z., Nagar, U. T., He, X., Nanda, P., Liu, R. P., Wang, S., & Hu, J. (2014). Enhancing big data security with collaborative intrusion detection. IEEE Cloud Computing, 1(3), 27-33.
Tankard, C. (2012). Big data security. Network Security, 2012(7).
The 5 Methodology Milestones for Big Data. (2015). Retrieved from https://icrunchdatanews.com/5methodology-milestones-big-data/
The Big Picture. (2012). Retrieved from http://search.proquest.com.ezproxy.csu.edu.au/docview/1011329843?rfr_id=info%3Axri%2Fsid%3Aprimo
Usman, I. (2015). The Risk and Challenges of Cloud Computing. Journal of Engineering Research and Applications, 5(12).
Vashist, R. (2015). Cloud Computing Infrastructure for Massive Data: A Gigantic Task Ahead. In Big Data in Complex Systems (pp. 1-28). Springer International Publishing.
Wagh, K., Jathar, R., Bangar, S., & Bhakthadas, A. (2014). Securing Data Transfer in Cloud Environment. Journal of Engineering Research and Applications, 4(5).
Yang, M., Zhou, X., Zeng, J., & Xu, J. (2016). Challenges and Solutions of Information Security Issues in the Age of Big Data. Volume 3, pp. 139-202.
Zhang, X., Dou, W., Pei, J., Nepal, S., Yang, C., Liu, C., & Chen, J. (2015). Proximity-aware local-recoding anonymization with mapreduce for scalable big data privacy preservation in cloud. IEEE Transactions on Computers, 64(8), 2293-2307.
Zissis, D., & Lekkas, D. (2012). Addressing cloud computing security issues. Future Generation Computer Systems, 28(3), 583-592.