Big Data Architecture Lab (Copyright © Arcitura Education Inc.
www.arcitura.com) v2.1

Table of Contents

About Using Answers and Hints
Reading Exercise 12.1 (15 minutes) In-Class Reading and Discussion: SFI Case Study Background
    Technical Infrastructure and Automation Environment
    Business Goals and Obstacles
Lab Exercise 12.2 (90 minutes) Design Big Data Pipeline for SLA Compliance
    Plan Data Acquisition and Storage
    Plan Data Processing
    Plan Data Export
Lab Exercise 12.3 (60 minutes) Alleviate Customer Dissatisfaction
    Plan Data Acquisition and Storage
    Plan Data Processing and Export
Lab Exercise 12.4 (30 minutes) Reduce Data Storage Cost
    Identify Alternative Data Storage Solution
Reading Exercise 12.5 (15 minutes) In-Class Reading and Discussion: LOC Case Study Background
    Technical Infrastructure and Automation Environment
    Business Goals and Obstacles
Lab Exercise 12.6 (60 minutes) Solution for Intelligent Oil Exploration
    Plan Data Acquisition and Storage
    Plan Data Analysis
Lab Exercise 12.7 (60 minutes) Enhance Oil Well Production
    Plan Well Logs Acquisition, Storage and Processing
    Democratize Well Logs
Lab Exercise 12.8 (60 minutes) Reduce Maintenance Costs and Achieve Regulatory Compliance
    Develop Predictive Maintenance Solution
    Develop Continuous Asset Monitoring Solution
Reading Exercise 12.9 (15 minutes) In-Class Reading and Discussion: TXC Case Study Background
    Technical Infrastructure and Automation Environment
    Business Goals and Obstacles
Lab Exercise 12.10 (45 minutes) Identify Fraud and Eliminate Waste
    Collate and Correlate Datasets
Lab Exercise 12.11 (45 minutes) Prioritize Resource Allocation and Enable Open Data Access
    Enable Social Media Data Analysis and Public Data Access
Answers/Hints for Exercise 12.2
    Plan Data Acquisition and Storage
    Plan Data Processing
    Plan Data Export
Answers/Hints for Exercise 12.3
    Plan Data Acquisition and Storage
    Plan Data Processing and Export
Answers/Hints for Exercise 12.4
    Identify Alternative Data Storage Solution
Answers/Hints for Exercise 12.6
    Plan Data Acquisition and Storage
    Plan Data Analysis
Answers/Hints for Exercise 12.7
    Plan Well Logs Acquisition, Storage and Processing
    Democratize Well Logs
Answers/Hints for Exercise 12.8
    Develop Predictive Maintenance Solution
    Develop Continuous Asset Monitoring Solution
Answers/Hints for Exercise 12.10
    Collate and Correlate Datasets
Answers/Hints for Exercise 12.11
    Enable Social Media Data Analysis and Public Data Access

About Using Answers and Hints

Answers and hints are located in the back of this booklet. To get the most out of these course materials, be sure to complete the lab exercises on your own to whatever extent possible before reading these sections.

Reading Exercise 12.1 (15 minutes)
In-Class Reading and Discussion: SFI Case Study Background

SFI is a large internet service provider (ISP) and a website hosting company. It provides internet services, including broadband and TV, to around 7.5 million customers, 5 million of which are residential customers and 2.5 million of which are business customers. SFI hosts a large number of websites and provides 24/7 support to its customers via telephone, email and online chat.

Technical Infrastructure and Automation Environment

SFI’s IT landscape can mainly be divided according to its business functions: broadband/TV and website hosting.
The broadband/TV services are provided through a fiber optic/cable network stretched over hundreds of miles. Fiber optic lines carry data between exchanges and the cabinets located at street level. From the cabinets to the customers’ premises, a cable is used that carries both the broadband and TV data. A wireless router/modem is installed at the customers’ premises for providing the broadband service, while a set-top box is used for the TV service. A number of multiplexers, routers and gateways enable communication of data between the client location and the Internet. Video content is stored on multiple CDN servers.

Website hosting infrastructure includes load balancers, DNS servers, numerous web servers, email servers, FTP servers, relational databases, routers and switches.

An incident management system is used for recording and resolving service-related issues. This system is linked with the CRM system, which is further used by the customer care agents for registering and answering customer queries. An ERP system is used for the automation of various business processes and activities, such as payroll, accounts and purchases of equipment. A range of operational dashboards is used to monitor the state of services and to ensure that service delivery is within the published SLA. An account management application provides the customer with the ability to create an account, manage subscriptions and view service usage. A billing application keeps track of customers’ subscriptions/contracts and service usage and generates end-of-month bills.

Business Goals and Obstacles

SFI guarantees an uptime of 98.99%. However, for the past 6 months, it has not been able to keep its published SLA. Figures show that monthly downtime has been more than 10 hours, whereas the published SLA says the downtime cannot exceed 7.5 hours each month.
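As a rough check of these figures, the relationship between an uptime percentage and the monthly downtime it permits can be worked out directly. The sketch below assumes a 31-day month (744 hours); the function names are hypothetical and are not part of the case study.

```python
def allowed_downtime_hours(uptime_pct: float, days_in_month: int = 31) -> float:
    """Maximum monthly downtime (in hours) permitted by an uptime guarantee."""
    hours_in_month = days_in_month * 24
    return hours_in_month * (1 - uptime_pct / 100)


def achieved_uptime_pct(downtime_hours: float, days_in_month: int = 31) -> float:
    """Uptime percentage actually delivered for a given monthly downtime."""
    hours_in_month = days_in_month * 24
    return 100 * (1 - downtime_hours / hours_in_month)


# A 98.99% guarantee over an assumed 744-hour month allows roughly 7.5 hours.
allowed = allowed_downtime_hours(98.99)

# 10 hours of downtime in the same month falls short of the guarantee.
achieved = achieved_uptime_pct(10)

print(f"allowed downtime: {allowed:.2f} h, achieved uptime: {achieved:.2f}%")
```

Under that assumption, 10 hours of downtime corresponds to about 98.66% uptime, which is below the published 98.99%.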
The inability to keep up with the published SLA has resulted in customer defection to competitors, which is reflected in the recent quarter’s financial reports. SFI’s management is concerned that if the service provision issues are not resolved in time, SFI’s profit levels may decline exponentially.

A customer satisfaction survey conducted by an independent ISP comparison organization has caught the eye of the CEO. According to the survey, SFI’s customer satisfaction level is on a continuous decline. Comments left by customers reveal that one of the top reasons cited for decreasing satisfaction levels is the time it takes to resolve customers’ issues. This is causing frustration among the customers and resulting in customers cancelling their contracts/subscriptions.

While going through the financial reports related to IT spending, the CFO noticed an upward trend in the amount spent on data storage software. Upon further querying, the IT managers reveal that the cost corresponds to the acquisition of new licenses for relational databases in order to keep up with the customers’ increased demand for data storage. A breakdown of websites shows that not all hosted websites impose strict relational data storage requirements. Even websites that do, such as ecommerce websites, do not require ACID support for all of their data storage operations. The CFO directs the IT managers to look into the issue of the spiraling cost of data storage and devise a solution to keep costs to a minimum.

SFI recently launched a pay-per-view service as part of its TV service. In order to entice viewers, advertisements about different TV shows are displayed via their set-top boxes. However, viewer response has not been as projected, and the revenue target is not being met.

Based on an assessment of the current challenges and the benefits promised by Big Data, SFI’s IT team decides to adopt Big Data technology and techniques.
However, none of the IT team members is conversant with the use of Big Data technologies and techniques. Consequently, a consulting company that specializes in the field of Big Data is engaged.

Lab Exercise 12.2 (90 minutes) Design Big Data Pipeline for SLA Compliance

A team of consultants from the Big Data consultation company holds a meeting with SFI’s management and IT staff in order to prioritize the goals that need to be addressed. After much deliberation, SLA fulfillment is given the top priority, because management believes that achieving SLA compliance will serve as a means of regaining customer confidence and will ultimately help towards customer retention.

The consultants start looking into the reasons for non-conformance with the published SLA. Compliance with the SLA starts to slip when any of the offered services (broadband/TV/website/email) becomes unavailable for more than the agreed downtime or when, although the service is available, the data transfer speed becomes too slow, resulting in severely degraded service. Normally, the main reason for total or partial unavailability is a hardware failure, such as a failed router or web server.

The current procedure for rectifying service-related issues follows a reactive approach, where an issue is only fixed once it becomes known, either when a customer reports it as an incident or through the operational dashboards. Once it is known that there is a service disruption, the next step is to identify the culprit hardware through manual inspection of various log files. At times, the identification of the related log file itself takes a long time. All the time taken to find the actual cause of the issue makes SLA compliance harder to achieve.

The consultants propose a proactive strategy for rectifying total or partial service unavailability issues by developing a Big Data analytics solution that can continuously analyze log files to find error conditions.
They are planning to develop a Big Data pipeline that enables SFI to automatically collect log files from a variety of data sources, process these log files within a short time period and generate insights. The pipeline would achieve this via a simple computation of statistics or through the application of machine learning algorithms, and it would help the IT team quickly find the cause of an issue.

Each of the following three exercises requires you to identify one or more design patterns that help the development of a Big Data pipeline.

The Big Data Pipeline compound pattern, provided for reference purposes.

Plan Data Acquisition and Storage

A list of hardware devices that take part in the transferring or delivering of data in any shape and form is compiled. The list includes load balancers, DNS servers, web servers, email servers, FTP servers, relational databases, gateways, routers, switches and CDN servers. Each of these data sources is allocated a device id, and each data source creates a delimited log file in textual format that registers the functional aspects of the device. Each line represents a separate record entry. For example, the web server log could contain information on the time a client requested a particular resource, such as a webpage or an image, the client’s IP address, the requested resource, the size of the data returned to the client and the HTTP status code. Different data sources produce log files at different intervals.

Successful analysis of log data requires that log files from all identified data sources are acquired in their raw forms and on a periodic basis. Furthermore, it is decided that to ease the burden of adding and removing data sources, the management of data sources should be possible via point-and-click operations. Once ingested, the log files will need to be saved in a redundant manner in order to cope with data loss due to hardware failure.
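To make the web server log record described above more concrete, the following minimal sketch parses a few delimited lines and counts HTTP status codes. The tab delimiter, the field order, the field names and the sample records are all assumptions made for illustration; the case study does not specify the actual log layout.

```python
import csv
import io
from collections import Counter

# Hypothetical layout: tab-delimited, one record per line, fields in this order.
FIELDS = ["timestamp", "client_ip", "resource", "bytes_sent", "status"]

# Invented sample records, used only to exercise the parser.
SAMPLE_LOG = (
    "2024-01-01T10:00:00\t203.0.113.7\t/index.html\t5120\t200\n"
    "2024-01-01T10:00:02\t203.0.113.9\t/logo.png\t20480\t200\n"
    "2024-01-01T10:00:05\t203.0.113.7\t/cart\t0\t503\n"
)


def parse_log(text, delimiter="\t"):
    """Yield one dict per log line, converting numeric fields to int."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS, delimiter=delimiter)
    for row in reader:
        row["bytes_sent"] = int(row["bytes_sent"])
        row["status"] = int(row["status"])
        yield row


def status_counts(records):
    """Count how many records carry each HTTP status code."""
    return Counter(record["status"] for record in records)


records = list(parse_log(SAMPLE_LOG))
counts = status_counts(records)

# Status codes of 500 and above indicate server-side errors.
server_errors = sum(n for status, n in counts.items() if status >= 500)
print(counts, server_errors)
```

A per-device count of server-side errors is one example of the kind of simple statistic the pipeline is meant to compute from raw log files.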
Log files will first be cleansed, and then the cleansed files corresponding to the same type of data source will be processed together as a group. The log files will be processed using a distributed processing engine, such that records in each file are processed sequentially. The processed data will consist of different types of statistics for each type of device. The computed statistics for each individual device will need to be stored in such a way that a timeline for all values of each statistic can be established and queried by using the device id as the key. Anticipating that the computed statistics will be heavily queried, the statistics need to be saved in such a way that they lend themselves to achieving maximum read performance. Queries will be restricted to a specific type of hardware device.

The Poly Source compound pattern, provided for reference purposes.

The Poly Storage compound pattern, provided for reference purposes.

The Random Access Storage compound pattern, provided for reference purposes.

The Streaming Access Storage compound pattern, provided for reference purposes.

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements and describe the application. (Note that any pattern referenced must be a core member pattern of the Big Data Pipeline compound pattern.)
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

B.
Illustrate the Big Data analytics logical architecture resulting from the application of the previously identified pattern(s) by identifying the required mechanisms, and explain how the mechanisms enable the application of each of the identified pattern(s).
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

Plan Data Processing

A number of data wrangling operations, including data cleansing, removal of unwanted data and validation and extraction of data from certain fields, will be performed on the ingested log files. Each line in the textual file will be processed separately, while log files originating from the same type of device, such as the router, will be processed together as a single lot. Due to the number and complexity of the data wrangling operations, SFI’s IT team requires a solution that makes the data wrangling logic easy to manage, such that the function of each piece of logic can be easily understood and the required piece of logic can easily be identified and changed in consideration of future requirements.

Once the data is cleansed, certain statistics need to be computed that will eventually be used for SLA compliance analysis. Due to the importance of these statistics, SFI requires the means of verifying their authenticity. Apart from the computation of statistics, the cleansed data will further be used to apply correlation, regression and clustering techniques in order to help SFI quickly find the cause of an issue or to predict if an issue is about to occur.

The Big Data Processing Environment compound pattern, provided for reference purposes.

A.
Identify the design pattern(s) that need(s) to be applied to fulfill these requirements and describe the application. (Note that any pattern referenced can be a core or an optional member pattern of the Big Data Pipeline compound pattern.)
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of the previously identified pattern(s) by identifying the mechanisms required by the pattern(s), as well as any other mechanism(s) not directly covered by the pattern(s), and explain their relevance.
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

Plan Data Export

The computed statistics and the results obtained from the application of statistical and machine learning techniques need to be passed to an operational dashboard that is observed by the IT support team 24/7. The dashboards are browser-based and are rendered by a reporting application that uses a relational database for populating various charts and graphs. Currently, a script is executed in an ad-hoc manner to insert data into the relational database whenever data pertinent to service monitoring becomes available.
However, SFI requires that up-to-date analysis results be available via the dashboards through periodic log file import and the processing and export of computed results without requiring any human intervention.

The Poly Sink compound pattern, provided for reference purposes.

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements and describe the application. (Note that any pattern referenced can be a core or an optional member pattern of the Big Data Pipeline compound pattern.)
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of the previously identified pattern(s) by identifying the mechanisms required by the pattern(s), as well as any other mechanism(s) not directly covered by the pattern(s), and explain their relevance.
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

Lab Exercise 12.3 (60 minutes) Alleviate Customer Dissatisfaction

The next problem that the management wants the consultants to tackle is the increasing customer dissatisfaction due to longer issue-resolution time. The objective is to decrease the time it takes to resolve customer-reported service issues, which will alleviate customer dissatisfaction and increase SFI’s rating when compared with other ISPs.
A customer can report an incident by calling the customer care team, sending an email, filling in an online form on SFI’s website or through online chat with a customer care team agent. Once an incident is registered, first-line technical support presents the customer with a set of standard troubleshooting solutions that may or may not be relevant to the specific nature of the issue that the customer is currently facing. If unresolved, the incident is forwarded to second-line technical support, where the team uses a combination of previous experience and a search through old support incidents to find a similar incident in the past. If the incident still remains unresolved, in the case of a broadband/TV service issue, an engineer is sent to the customer’s location, which adds to SFI’s operational costs. In the case of a website/email issue, the incident is forwarded to third-line support.

The consulting team proposes an analytics-driven solution to reduce the time it takes to successfully resolve customer service issues. The team plans to empower first-line support by providing its team members with incident-specific troubleshooting information. The idea is that by providing case-specific troubleshooting information, the time it takes to find the right solution can be greatly reduced. This will further reduce support-related costs by saving money on unnecessary callouts to customers’ premises.

The incident management system keeps a record of all issues raised by customers. This system uses a relational database for storing incident-related data. Although the current system has been in use for the past 5 years, due to the large number of incidents that get generated and the limited storage space of the relational database, only incidents going back as far as 2 years are available. Older incidents are periodically archived by exporting the data as XML files that currently amount to around 1.5 petabytes in size.
The proposed solution will employ text analytics and semantic search techniques to find similar incidents reported within the last 5 years. The matched incidents’ resolutions can then be recommended to the first-line support team members in order to achieve a targeted and timely resolution of the current incident.

Furthermore, it is also planned to find the total number of similar incidents reported by customers in the past 24 hours within the same area. This will help support team members determine if it is an issue that is local to a particular customer or a more general issue. The solution will be based on the frequent querying of current incidents to find out the total number of incidents that share the same incident type.

Plan Data Acquisition and Storage

The implementation of the solution requires current incident data from the incident management system’s relational database as well as the archived data in order to build a large-enough repository of different types of incident resolutions. Once acquired, each incident will be processed one by one in order to be converted into a structured form that consists of an extremely wide row of data. This structured form will then be used as an input for clustering and distance-based search techniques.

In order to find the number of similar reported incidents from the past 24 hours, one of the less experienced IT team members proposes that a simple query can be run every 30 minutes that groups newly reported incidents by type and area. A quick test reveals that such a query can take 10 to 15 minutes to complete and that while it is executing, the entire incident management system grinds to a halt. Consequently, this option is discarded. The consultants come up with a viable solution that will import newly reported incidents every 15 minutes from the incident management system’s relational database as a dataset.
CRM data will further be required to get the area information for each reported incident. The CRM data is also stored in a relational database. The two datasets will be joined together to create a single dataset, which will be batch processed to generate per-area statistics. All imported data will be processed using a distributed processing framework. Looking at the amount of data to be imported, the IT team has stipulated that the storage footprint should remain as small as possible.

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements and describe the application.
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of the previously identified pattern(s) by identifying the required mechanisms, and explain how the mechanisms enable the application of the identified pattern(s).
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

Plan Data Processing and Export

Once the incident data from the last 5 years has been imported, incident id, incident details and resolution details will be extracted from each incident record. An algorithm will then be applied to each record’s incident details to convert it into a structured form that represents a large matrix.
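The case study does not name the algorithm used to convert incident details into a structured form. As one possible illustration, the sketch below assumes a simple bag-of-words representation, in which each incident becomes a wide row of term counts and the rows together form a matrix, followed by a cosine similarity comparison as an example of a distance-based method. The incident ids, texts and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(texts):
    """Sorted list of every distinct word; each word becomes one matrix column."""
    return sorted({word for text in texts for word in tokenize(text)})


def to_row(text, vocabulary):
    """Convert one incident's details into a wide row of term counts."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def cosine_similarity(a, b):
    """Similarity in [0, 1] for count vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented historic incidents (ids and details are illustrative only).
historic = {
    "INC-1": "router keeps dropping wireless connection",
    "INC-2": "email server rejects outgoing messages",
    "INC-3": "set-top box shows no signal",
}

vocabulary = build_vocabulary(historic.values())
matrix = {inc_id: to_row(text, vocabulary) for inc_id, text in historic.items()}

# A newly reported incident is converted with the same vocabulary and compared
# against every historic row; words outside the vocabulary are ignored.
new_row = to_row("wireless connection dropping every hour", vocabulary)
scores = {inc_id: cosine_similarity(new_row, row) for inc_id, row in matrix.items()}
best_id = max(scores, key=scores.get)
print(best_id, round(scores[best_id], 3))
```

With real incident volumes the matrix would be very wide and sparse, which is why the scenario calls for processing on a cluster rather than in a single process as shown here.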
The structured form of the incident records will be subjected to a clustering algorithm to find groups with similar incident details. The clustering algorithm is a highly iterative algorithm that requires the data to be processed repeatedly. When a new incident gets reported, its incident details will be converted to the aforementioned structured form, and then it will be compared against each of the already processed historic incidents using a distance-based comparison method. The historic incident records that are in close proximity to the new incident will then be exported to the relational database of the incident management system. The entire data processing will take place on a cluster of machines.

With regard to finding the total number of similar incidents reported by customers in the past 24 hours within the same area, the imported incidents dataset will first be joined with the CRM customer dataset using customer id as the joining criterion. The joined data will then be processed in a sequential manner, such that the incidents with the same incident type and customer location fields will be grouped together. Then the total will be counted for each group. The generated totals for each area will then be forwarded to the incident management system. The total statistic will be automatically recomputed every 15 minutes once newly imported incidents data becomes available and sent to the incident management system.

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements and describe the application.
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

B.
Illustrate the Big Data analytics logical architecture resulting from the application of the previously identified pattern(s) by identifying the required mechanisms, and explain how the mechanisms enable the application of each of the identified patterns.
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

Lab Exercise 12.4 (30 minutes) Reduce Data Storage Cost

SFI currently hosts a large number and variety of websites. Some of these are ecommerce sites, some act as a frontend for a variety of browser-based applications, some host blogs and only a handful display static informational content that gets updated infrequently. The websites that only display static content use a file system as backend storage for the website content. However, all the other hosted websites use relational databases for storing a variety of data.

Some of these websites require relational storage with ACID support for enabling transactional operations, such as order processing and payment processing operations. However, not all operations require relational storage, such as the storage of non-mutable data or the update of data without strict consistency requirements (data can remain stale for some period of time). Also, most of the websites store both structured data and unstructured data, such as images and videos. Semi-structured data, such as blog entries and XML data, is also stored within the relational databases.

In the recent past, ecommerce and social media-driven websites have been generating very large amounts of data.
To manage the increase in demand for data storage, SFI has had to add additional database servers and buy licenses, resulting in a steep increase in its IT spending. While SFI charges its customers for the amount of data stored, the charge is heavily subsidized by SFI in order to remain competitive.

Although SFI can cope with the current data storage demand, the IT team envisages that the added capacity will soon hit its limit, requiring a further increase in capacity. On the other hand, some customers with technical understanding have also started demanding alternative data storage solutions that are more scalable and provide better performance.

Identify Alternative Data Storage Solution

SFI’s IT team needs to implement a data storage solution that will help SFI cut down its spending on the provisioning of the data storage service to its customers. The IT team conducted a survey of SFI’s entire customer base and has compiled a list of the different types of data that currently reside in its relational databases. The compiled list includes: images; videos; nested data, such as invoices and emails in JSON and XML format; blog entries; product/service-related comments; social media messages consisting of timestamp, user id, location and message text; and customer profiles that consist of a large number of fields, some of which are grouped, such as address.

All data manipulation operations performed on semi-structured and unstructured data require accessing each record based on a unique key, whereas some data manipulation operations performed on semi-structured data require accessing records based on the value of non-key fields. The IT team further requires that the new storage architecture ensure data availability in the face of hardware failure and provide high-performance read/write operations.

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements and describe the application.
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of the previously identified pattern(s) by identifying the required mechanisms, and explain how the mechanisms enable the application of the identified pattern(s).
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

Reading Exercise 12.5 (15 minutes) In-Class Reading and Discussion: LOC Case Study

Background

LOC is a large oil company that deals with the exploration, extraction, storage and refining of oil. LOC has been in operation for nearly 4 decades and operates over 5,000 wells, both onshore and offshore, that jointly account for one-fourth of the entire country’s daily oil production.

Oil is extracted from reservoirs by drilling wells. There can be multiple oil wells in a single oil field. The extracted oil is then transferred to different refineries using a network of pipelines, trucks and trains. The refined petroleum (gasoline and diesel) is then delivered to various gas stations across the country.

Technical Infrastructure and Automation Environment

A number of applications and information management systems make up LOC’s IT environment. Some of these systems and applications are legacy in nature and are specific to the oil industry.
A range of dashboard-based applications is used for monitoring reservoirs, wells, pipelines and refinery operations. Specialized geographical information systems (GIS) are used for analyzing existing wells and exploring prospective sites for drilling new wells. High performance computing (HPC) systems are used for creating various types of models and running simulations.

Details about the amount of crude oil extracted from each well, the amount of crude oil entering and leaving the refinery and various other production-related statistics are recorded in various spreadsheets. These spreadsheets are then imported into an ERP system. Financial data, such as data on various costs, volume of oil sold, profit and loss statements and balance sheets, is also stored in the ERP system. Data regarding various types of equipment and their maintenance is stored in an asset management system.

An enterprise data warehouse, which is periodically updated with data from a range of applications and systems, is used to generate different reports at different intervals, such as end-of-day and end-of-month, for analyzing production and refinery operations and the profitability of LOC, as well as for ensuring regulatory compliance.

Business Goals and Obstacles

A number of LOC’s oil wells are nearing the ends of their lives, due to which LOC is constantly exploring new oil reservoirs. However, LOC’s latest financial reports show an upward trend in the amount spent on oil exploration, with suboptimal returns. The main contributory reason is the selection of sites that either contain substandard oil or where the oil reserve contains less oil than originally predicted. Any well drilled on such substandard sites results in a loss of millions of dollars.
LOC needs to find a way to select only sites that contain quality and plentiful oil reserves and accelerate its oil exploration operations in order to keep a healthy inventory of oil wells and to gain a competitive edge over other oil companies.

Oil exploration reports further show that new oil reserves are getting harder to find. This, coupled with the fact that LOC operates in an industry where the resource (oil) depletes over time, requires LOC to focus on making existing oil wells more profitable. To ensure profitability, LOC’s board of directors emphasizes obtaining the maximum return from LOC’s existing oil wells by making sure that each well is delivering maximum output. While focusing on the optimization of well operations, the board observes that maintenance costs are devouring a large portion of the profit. Unplanned repairs lead to downtime, thereby affecting the yield. The board advises the operations managers to investigate the issue in order to reduce maintenance costs.

Another issue that is affecting LOC’s profitability is its inability to fully comply with the newly introduced industry regulations, due to which LOC has had to pay heavy fines on different occasions. The main areas of regulatory compliance include operational safety, environmental considerations and detailed oil production and financial reporting.

In order to address its business goals and objectives, LOC needs to adopt a data-driven approach such that all of its operations and decision-making take into account all available data. To implement this approach, LOC decides to incorporate Big Data technologies and techniques. However, in the absence of any in-house Big Data skills, LOC turns to you to guide them towards achieving their goals via Big Data.
Lab Exercise 12.6 (60 minutes)
Solution for Intelligent Oil Exploration

The first issue that you have been asked to look into is how to enhance oil exploration so that only sites that can provide the best ROI are chosen. In order to design the required Big Data solution environment, you perform some preliminary analysis of how the process works and the type of data involved.

Oil exploration involves analyzing large amounts of rock formation data, seismic data and geospatial data. Historical reservoir data and well production data within the same area or between similar areas is further analyzed to determine the quality and quantity of the potential oil reserves. Once an oil reservoir is found, the required land is leased via bidding. The amount of the bid and the duration of the lease depend upon the predicted amount and grade of the oil reserves and how much oil can be extracted each day. Determining these factors takes a considerable amount of time because the engineers have to analyze, correlate and develop models from terabytes of data held in different information systems, as each system specializes in handling only a specific type of dataset.

The engineers believe that the process of finding the right oil reserve can be greatly expedited if all the data that needs to be analyzed is available in one place. They further believe that access to an increased amount of data will help them improve the accuracy of their predictions. However, the current IT infrastructure does not provide a means for storing and analyzing large volumes of non-relational data.

Based on your findings, you plan to design a repository of semi-structured and unstructured data through the implementation of an unstructured data store. Each of the next two exercises requires you to identify one or more design patterns that help towards the development of a Big Data unstructured data store.
[Figure: The Unstructured Data Store compound pattern, provided for reference purposes.]

Plan Data Acquisition and Storage

You start by finding the datasets that are analyzed by the engineers. These include country-wide borehole data, geospatial data, seismic data, historical reservoir data and well production data. The borehole dataset contains data on around 500,000 boreholes, along with their high resolution images. The borehole dataset is petabytes in volume and consists of XML and image data. The geospatial dataset contains vector and raster data. The vector data consists of the location and some general attributes of the boreholes, while the raster data comprises aerial photographs, both of which are in binary format. A seismic dataset that contains data from a large number of surveys also consists of binary data and amounts to petabytes in volume. The reservoir and well production datasets consist of data on the entire set of historical oil reservoirs and oil wells that have been drilled in the past, respectively. These two datasets are terabytes in size and consist of thousands of spreadsheets.

Apart from the images, the binary data will be processed record-by-record for extracting attributes required for numerical data analyses. The borehole XML conforms to the standard format for storing borehole data, containing multiple levels of nested data for each borehole, and will be parsed and stored in the original standard format. This is required because the engineers understand the borehole data only when it conforms to the standard format. After preprocessing the spreadsheets, which involves certain data validation checks, records will be stored such that their structure resembles the original spreadsheet so that queries against such data can be made based on the structure that the engineers are familiar with.
All processed data and images should be stored in a way such that specific records that fulfill given criteria can be individually retrieved. Nearly all of these datasets have been obtained from commercial exploration companies. Hence, it is imperative that they are stored in a redundant manner.

It is expected that multiple teams of engineers will evaluate different areas for potential oil reserves at the same time to increase the success rate. You keep this requirement in mind and plan to save datasets in a way such that none of the teams experience degraded data access performance. Bearing in mind LOC’s profitability, you plan to design a storage architecture that offers storage for very large amounts of data without too much investment.

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements and describe the application. (Note that any pattern referenced can be a core or an optional member pattern of the Big Data Unstructured Data Store compound pattern.)

______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of the previously identified pattern(s) by identifying the required mechanisms, and explain how the mechanisms enable the application of the identified pattern(s).
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

Plan Data Analysis

The raw datasets, apart from the image data, are processed such that each record is cleansed and validated independently. The processed borehole dataset and the seismic dataset are used to determine the location of oil reservoirs. The geospatial dataset is used to analyze the location where the well needs to be drilled in order to get an idea of the costs involved, based on factors such as the type of the land, how easy it is to transport equipment to the location and any structures that might need to be removed to enable access.

Once a potential oil reservoir is found, the reservoir and well production datasets are subjected to different regression-based predictive models for predicting the quantity and quality of oil in the reservoir and the time period for which the reservoir will remain productive. A classification algorithm is further used for finding the type of the oil reservoir. However, before any of these advanced analysis techniques can be applied, the datasets need to be preprocessed by applying a range of data reduction techniques.

The entire preprocessing involves multiple steps that need to be executed one after the other, where each step can potentially take a long time to execute. In the case of an error, the entire set of preprocessing steps needs to be executed from scratch. Sensing that this could impact the entire oil exploration data processing operation, you plan to implement a data processing strategy that does not require the re-execution of all data preprocessing steps.

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements and describe the application. (Note that additional pattern(s) not part of the Big Data Unstructured Data Store compound pattern may also be required.)

______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of the previously identified pattern(s) by identifying the required mechanisms, and explain how the mechanisms enable the application of the identified pattern(s).

______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

Lab Exercise 12.7 (60 minutes)
Enhance Oil Well Production

Next, you are asked to help LOC optimize its well operations in order to obtain the maximum possible yield from each well. To design a solution, you investigate how oil wells are currently monitored. Subsurface sensors and sensors installed on the wellhead take continuous measurements in the form of well logs. Gigabytes of data are continuously generated by these sensors each day. However, in the absence of a storage infrastructure that can store gigabytes of data generated by each sensor each day, readings are currently taken manually by the engineers once a day. These readings are entered into a spreadsheet.
The spreadsheets are sent to the head office via FTP on a weekly basis. One of the IT team members then imports all 2500 spreadsheets received for each oil well into the ERP system via a script. Following this, queries are run against the imported data to generate various statistics, which are then made available to the engineers and business managers via different dashboards. The aforementioned process, from receiving the spreadsheets to generating the statistics, takes around 4 to 5 days.

At present, the weekly transfer of well data, coupled with the time it takes to import it into the ERP system, means that the engineers and the managers do not have access to the latest well production data. Any decisions taken to adjust production parameters related to well operations are based on stale data. Furthermore, due to storage space limitations, the ERP dashboard can show production statistics going back only 6 months. The types of statistics displayed in the dashboards are predetermined. Although the engineers and the managers understand SQL, they must ask the IT team whenever they need a new set of statistics because they do not have direct access to the well log data. The IT team can take up to 15 days to implement the requested changes.

Having completed your investigation, you believe that the lack of up-to-date information about the operation of wells is inhibiting LOC from making the right decisions at the right time for optimizing well production. To resolve this issue, you plan to develop a Big Data solution that is capable of ingesting well logs on a daily basis from across all wells and that can process them to generate the required statistics overnight so that the latest statistics are available to the engineers and managers for analysis the very next day. Furthermore, you intend to make the raw well log data available to the engineers and the managers so that they can query the data and generate new statistics as needed.
Apart from this, to enhance tactical decision-making, you aim to provide access to the previous 5 years of well logs. By looking at long-term data, more confidence can be instilled in the decision-making process.

Plan Well Logs Acquisition, Storage and Processing

A single oil well produces around 5 gigabytes of log files each day. This amounts to around 12.5 terabytes of data per day from across all oil wells. For each well, different sensors record different types of readings in different logs. Although they contain different types of readings, all logs are textual in nature, and each log entry consists of comma-separated values on a single line.

Log files from all oil wells will be imported daily and then subjected to a series of validation checks, which involve line-by-line parsing of data and then verifying whether each value falls within the expected range of values. Next, the data needs to be normalized because some sensors that provide the same type of reading use different measurement units. Normalization involves converting data to the same measurement units so that data from different oil wells can be aggregated later.

Various statistics will be generated for each well, for each oil field and across all oil fields. These statistics need to be saved in a way such that data for each well is uniquely identifiable. However, you discover that the calculation of some of the statistics is not possible because some required values currently appear on two separate lines in the log file. You plan to implement a storage strategy that solves this issue. The generated statistics need to be presented to the engineers and the managers via a graphical interface with point-and-click functionality.

The well logs contain very detailed data about various operational aspects of an oil well, and it is anticipated that this data may become the basis of different data-driven applications.
However, it is not known in advance which technologies will be used to develop those applications. Although the managers are excited to be able to view statistics on the latest production, they need to be sure that log data remains secure even if accidentally accessed, because the well logs contain sensitive production data. Furthermore, the IT team has requested that the entire data processing require only minimal human intervention.

After a brief chat with the engineers, you further decide to include a dataset from the ERP system, which uses a relational database as its storage backend, containing reference data to be used for setting threshold values for certain statistics.

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements and describe the application.

______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of the previously identified pattern(s) by identifying the required mechanisms, along with any additional mechanisms that are not directly required by the pattern(s) but are mandatory for the solution, and explain how they enable the application of the identified patterns.

______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
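
The validation and normalization steps described under Plan Well Logs Acquisition, Storage and Processing can be pictured with a small single-machine sketch. It is only an illustration of line-by-line parsing, unit conversion and range checking, not a proposed solution design or an answer to the exercise. The field layout (well ID, timestamp, reading type, unit, value), the reading names, the units and the expected ranges are all assumptions made for this example; the case study states only that each log entry is a single line of comma-separated values.

```python
import csv
import io

# Hypothetical expected ranges per reading, expressed in normalized units.
EXPECTED_RANGES = {
    "pressure_kpa": (0.0, 70000.0),
    "temperature_c": (-40.0, 200.0),
}

# Hypothetical conversions from (reading type, source unit) to a normalized reading.
CONVERTERS = {
    ("pressure", "kpa"): ("pressure_kpa", lambda v: v),
    ("pressure", "psi"): ("pressure_kpa", lambda v: v * 6.894757),
    ("temperature", "c"): ("temperature_c", lambda v: v),
    ("temperature", "f"): ("temperature_c", lambda v: (v - 32.0) * 5.0 / 9.0),
}


def process_log(text):
    """Parse log text line by line and return (valid_records, rejected_lines)."""
    valid, rejected = [], []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        try:
            # Assumed layout: well_id, timestamp, reading type, unit, value.
            well_id, timestamp, kind, unit, raw = [field.strip() for field in row]
            name, convert = CONVERTERS[(kind.lower(), unit.lower())]
            value = convert(float(raw))  # normalize to a common unit
        except (ValueError, KeyError):
            # Wrong field count, non-numeric value or unknown reading/unit.
            rejected.append((line_no, "malformed"))
            continue
        low, high = EXPECTED_RANGES[name]
        if low <= value <= high:
            valid.append(
                {"well_id": well_id, "timestamp": timestamp,
                 "reading": name, "value": value}
            )
        else:
            rejected.append((line_no, "out_of_range"))
    return valid, rejected


if __name__ == "__main__":
    sample = (
        "W-001,2024-01-01T00:00:00,pressure,psi,1000\n"
        "W-001,2024-01-01T00:00:00,temperature,F,212\n"
        "W-002,2024-01-01T00:00:10,temperature,C,999\n"
        "W-002,bad line\n"
    )
    records, problems = process_log(sample)
    print(len(records), "valid;", problems)
```

Because every line is checked independently of the others, the same per-record logic could in principle be applied to many log files in parallel, which is the property that matters at the data volumes described in the scenario.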
Democratize Well Logs

Having set up a Big Data solution for processing well logs, you further plan to democratize well logs so that the engineers and managers can access raw log data without involving the IT team. Although the engineers and managers have SQL knowledge, they do not have any programming skills. Apart from addressing the log data querying requirement, you also find out that the raw log files may be analyzed via oil industry-specific analysis tools. These tools use a relational database as their storage backend. You plan to enable this requirement without duplicating log files. You further notice that all of the information systems within LOC use a federated security model and that LOC’s IT team would like to extend this security model to the Big Data solution environment.

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements and describe the application.

______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of the previously identified pattern(s) by identifying the required mechanisms, and explain how the mechanisms enable the application of the identified pattern(s).

______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
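
To make the querying requirement in Democratize Well Logs more concrete, the sketch below shows the kind of ad hoc SQL an engineer or manager might write against well log records. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in so the example is runnable; it copies the records into a table, which the scenario explicitly wants to avoid, so it illustrates only the query style and not the required architecture. The table name, column names and sample values are hypothetical.

```python
import sqlite3


def load_readings(rows):
    """Load (well_id, reading, value) tuples into an in-memory stand-in table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE well_log (well_id TEXT, reading TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO well_log VALUES (?, ?, ?)", rows)
    return conn


def average_by_well(conn, reading):
    """Return [(well_id, average value)] for one reading type, ordered by well."""
    cursor = conn.execute(
        "SELECT well_id, AVG(value) FROM well_log "
        "WHERE reading = ? GROUP BY well_id ORDER BY well_id",
        (reading,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    # Hypothetical sample records.
    sample_rows = [
        ("W-001", "pressure_kpa", 100.0),
        ("W-001", "pressure_kpa", 300.0),
        ("W-002", "pressure_kpa", 50.0),
        ("W-002", "temperature_c", 80.0),
    ]
    connection = load_readings(sample_rows)
    print(average_by_well(connection, "pressure_kpa"))
```

The point of the example is that a new statistic is a one-line change to the query rather than a change request to the IT team, which is the gap the scenario describes.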
Lab Exercise 12.8 (60 minutes)
Reduce Maintenance Costs and Achieve Regulatory Compliance

LOC’s management is very satisfied with the progress you have made so far. They are already reaping the benefits of Big Data adoption in the form of increased profits via timely analysis of a variety of voluminous data, which LOC was unable to perform in the past. Building on your success, you start looking into LOC’s final set of business objectives: reducing the cost of maintaining equipment and ensuring full compliance with the newly introduced industry regulations.

Equipment is currently serviced or replaced at predetermined intervals or when the engineers perform a visual inspection, the timing of which can vary between engineers and normally depends upon the experience of the engineer. The service, repair and inspection records are stored in the asset management system. An inventory of parts is kept in multiple warehouses across the country. Parts are ordered from different suppliers spread across the globe and can take up to 7 days to arrive. However, parts often fail unexpectedly, and when that happens, drilling, oil production from wells or refinery operations grind to a halt, requiring emergency part replacement. This can further create logistical problems, especially if the breakdown occurs at a remote site.

With regard to the activities undertaken for assuring regulatory compliance, all types of operations, especially well drilling, need to demonstrate adherence to strict safety guidelines at all times. This is a real concern for LOC because operational safety is only maintained via infrequent physical inspections. One of the reasons behind the infrequent inspections is the remote nature of the sites. A simple incident left unchecked can result in a catastrophic accident, such as a blowout, posing grave danger to human lives as well as the surrounding environment.
After a detailed consultation with the engineers and managers, you come to the conclusion that the best way to reduce LOC’s maintenance and repair costs is to develop an intelligent asset management solution based on predictive analytics that can forecast if a part is about to fail. Advance knowledge of service requirements will help the engineers schedule a service in good time before the part fails, thus reducing well or refinery downtime. A proactively planned service will further help ensure that a healthy inventory level of the required parts exists.

On the other hand, for achieving full regulatory compliance, you propose the continuous monitoring of all oil wells and pipelines. Such a monitoring system will provide advance warning of any imminent issues. Furthermore, detailed data regarding all areas of operations will be kept that will become the basis for fulfilling the newly imposed regulatory requirement of detailed operational reporting.

Develop Predictive Maintenance Solution

You plan to develop a predictive asset maintenance solution based on an unsupervised machine learning technique. The unsupervised machine learning technique will work in an offline manner and will provide information on the common reasons for part failure. The input data required by the underlying algorithm consists of millions of service, repair and inspection records stored in the asset management system, based on a relational database, that amount to approximately 250 terabytes of data. Apart from other information, each record also contains the engineers’ notes in the form of free-form text. These notes along with other information will be mined to find failure patterns for different categories of equipment, such as drill heads, pumps, valves and pipes.
Once the commonly occurring reasons have been identified, a day’s worth of sensor data from all oil wells and the entire network of pipes will be gathered once a day as log files and then automatically analyzed to see if the data manifests any of the previously identified failure patterns. The entire log processing will take place without any human intervention.

Due to the extremely large number of service, repair and inspection records, you are considering using a machine learning algorithm that can process data in a distributed fashion. Also, the selected algorithm performs multiple passes over the data. The extracted patterns will be saved in a delimited file format. Daily sensor data log files will then be processed, where each record is compared against the already identified failure reasons. If a match is found, details about the corresponding equipment and part are forwarded to the asset management system for the engineers’ attention.

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements and describe the application.

______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of the previously identified pattern(s) by identifying the required mechanisms, and explain how the mechanisms enable the application of the identified pattern(s).
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

Develop Continuous Asset Monitoring Solution

To enable the continuous monitoring of oil wells and pipelines, you plan to design a solution that is capable of analyzing the sensor data in real time. A number of sensors are installed on the drilling equipment, inside the wells, on the wellhead and across the entire network of pipes to provide a range of measurements, including temperature, pressure, flow rate and revolutions per minute (RPM), every 10 seconds. This data will be ingested as it is generated by the sensors and will be analyzed instantaneously. If any of the values does not fall within the normal range of values, that value will be flagged, and the engineers will be notified instantly via alerts. After its initial analysis, the sensor data will be saved in its raw form, such that the sensor data can be queried by the managers based on different selection criteria for each type of equipment.

You are further thinking of enhancing the predictive maintenance solution (designed previously) by enabling the instant analysis of incoming streams of sensor data rather than analyzing log files, which means that each incoming stream of data also needs to be forwarded to the solution that currently analyzes sensor log files for finding a match against identified failure patterns.

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements and describe the application.
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of the previously identified pattern(s) by identifying the required mechanisms, and explain how the mechanisms enable the application of the identified pattern(s).

______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

Reading Exercise 12.9 (15 minutes)
In-Class Reading and Discussion: TXC Case Study

Background

TXC is the local government of a large metropolis. It collects taxes and provides a range of services to a population of over 15 million. Services include fire, ambulance, police, libraries, waste collection, recycling, social care, streets and parks maintenance and schools. Apart from receiving a subsidy from the federal government, TXC finances its services through the collection of taxes, rates and fines. It is further responsible for enforcing building regulations, managing urban development and maintaining the electoral register.

Technical Infrastructure and Automation Environment

TXC’s IT environment is littered with a number of stand-alone systems. A number of legacy systems are in use, and most of the systems are based on 10 to 15-year-old technology.
The two main reasons behind TXC’s slow adoption of contemporary IT solutions are excessive bureaucracy and a very long consultation period. A separate system is used for managing each service. For example, two different client-server applications exist for managing taxes and rates. A GIS system is used for land management and to perform property-related searches. Benefits-related systems manage childcare and adult care disbursements. A range of services such as rubbish collection and recycling are provided via partner organizations. These partner organizations send invoices at the end of each month that are manually entered into a legacy accounting application that uses a proprietary database. A complaints and incident reporting application, which uses a relational database as its backend, is used to record complaints and incidents reported by citizens. Different document management systems are used for keeping citizens’ personal data, property information and other documents. TXC’s website enables the online payment of taxes, rates and fines and the reporting of complaints and incidents, such as potholes, and further provides information about important developments and events in the metropolis. An HR system is used for maintaining employee records and payroll. Spreadsheets are used as the common means of data analysis and reporting.

Business Goals and Obstacles

In the wake of the recent economic downturn, the federal government has introduced austerity measures. A large portion of the subsidy offered by the federal government is no longer available, due to which TXC needs to make massive cuts in spending and find savings. To enable this, it needs to streamline its revenue acquisition by making sure that the projected amount of revenue actually gets collected. TXC also needs to identify processes where waste is occurring and eliminate that waste. The spending cuts have forced TXC to work with a lean workforce.
Although the budget has been reduced, citizens still expect the provisioning of quality services in a timely manner. TXC needs to strategically allocate its thinly spread resources to make sure that the citizens are satisfied with the level of services provided.

Apart from budgetary issues, TXC also needs to implement the federal government’s vision of open data access. This requires TXC not only to provide public access to a variety of datasets that it holds but also to fulfill custom data requests from citizens within a constrained timeframe.

TXC understands that the solution to its current issues lies in full visibility and understanding across its entire set of operations. For this, TXC looks towards Big Data as a means of fulfilling its business goals. You have been brought in as the lead Big Data architect to design a solution environment based on Big Data technologies.

Lab Exercise 12.10 (45 minutes)
Identify Fraud and Eliminate Waste

TXC’s priority is to maximize its revenue collection, as statistics from the past 5 years reveal that, on average, it has only been able to collect 83% of the targeted taxes and rates. Similarly, the recovery of fines, such as parking fines, has not been 100%. These discrepancies in revenue collection mean a smaller budget for providing services. Statistics further reveal that fraud within childcare and adult social care is responsible for million-dollar losses.

Another major area of improvement that TXC envisages for cost savings is the mitigation of waste that occurs not only within service delivery but also within TXC’s current business practices. For example, different departments procure the same supplies from different suppliers; consolidating these purchases could result in massive savings.
Last but not the least, a study conducted by the auditors has revealed that in some cases, suppliers were paid more than once, further devouring TXC’s already shrinking budget. Preliminary analysis shows that the main reason behind the aforementioned issues is the lack of cross-functional understanding of TXC’s operations and timely reporting. You believe that fraud identification and waste elimination can be achieved through a datadriven strategy that collates data from siloed applications in order to provide full and timely visibility across multiple business functions. How much tax would be charged on a building, domestic and non-domestic, depends on the information provided by the occupant. To fall into a lower band of tax, the payee provides false information, such as false information about property or annual revenue generated by a business, which is lower than the payee’s actual payment. However, TXC can only perform a limited number of physical inspections to verify the facts. The same principle applies to the payment of benefits for social care. You propose a Big Data solution that will correlate TXC’s tax and benefit records both against internal and external datasets in order to detect fraud. Furthermore, the solution will assemble data from different departments to get a unified view of TXC’s operations in order to identify opportunities for reducing waste. Copyright © Arcitura Education Inc. v2.1 45 Collate and Correlate Datasets The data held on domestic and non-domestic properties for tax/rate calculation and data regarding individuals for dispensing monthly social care payouts is stored on proprietary systems that can only export data in a delimited file format. The combined size of the files amounts to around 5 terabytes of data. You are planning to correlate this data with census, revenue and building data. 
The census and revenue datasets are external datasets consisting of XML data, while the building data resides in a relational database that serves as a backend for an application that keeps a record of planning applications submitted by the residents. All datasets will be individually processed in a distributed fashion to extract the required fields from each record. Records corresponding to each dataset will then be stored in a single database so that queries can be executed against disparate datasets based on common fields, such as property address and business name. To identify waste, you are thinking to start by importing procurement data from across all departments that is available in the form of spreadsheets. By collecting all procurement data in one place, inter-department queries can be executed to identify items that are commonly purchased between different departments. You anticipate that queries will be dynamic in nature and will be executed by the data analysts that can only manipulate data using SQL. A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements and describe the application. ______________________________________________________________________ ______________________________________________________________________ ______________________________________________________________________ ______________________________________________________________________ ______________________________________________________________________ Copyright © Arcitura Education Inc. v2.1 46 B. Illustrate the Big Data analytics logical architecture resulting from the application of the previously identified pattern(s) by identifying the required mechanisms, and explain how the mechanisms enable the application of the identified pattern(s). 

______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

Lab Exercise 12.11 (45 minutes)
Prioritize Resource Allocation and Enable Open Data Access

Your next task is to help TXC with the strategic deployment of its limited resources and to enable public access to a variety of datasets.

A meeting is held to decide on the best course of action for deploying resources. It is suggested that, as the services are provided to the general public, it would be ideal to incorporate the public’s opinion on which services should be given priority. Some managers suggest conducting a survey based on a sample of individuals. However, others are of the opinion that doing so would not only take a long time but might also produce biased results, as it would be based on the opinion of a handful of people. You step in and propose that social media data can be analyzed to find out what the public actually values most. Budget and other resources can then be allocated according to public opinion.

The implementation of the open data access policy requires TXC to collate data from across different departments and make it available to the general public. However, before the data is made public, certain information, such as personally identifiable data, will need to be either anonymized or completely removed. Additionally, members of the public may request data that requires gathering specific data elements from multiple datasets based on individual request criteria.

Enable Social Media Data Analysis and Public Data Access

You are planning to incorporate two different sources of social media data: one source provides data the moment a user sends a message to TXC (estimates show that, on average, 15,000 messages may be sent), while the other source provides user comments in the form of a delimited text file at the end of each day, with an average size of 2 gigabytes. At this stage, you are only planning to analyze social media data once a day.

The social media data will be processed using a distributed text analytics algorithm that extracts relevant text from each message or comment and then applies specialized text processing techniques. The results will be displayed in a purpose-built dashboard that requires data in XML format. The dashboard will be automatically refreshed as new data gets incorporated each day.

To enable open data access, spreadsheets will be imported on a monthly basis to create 25 different datasets. However, these datasets will first be processed to remove certain fields and to apply anonymization logic to personally identifiable data. The processed datasets will then be exported to a web server for FTP access. To fulfill custom data requests, the data analysts need to be able to execute different queries against these processed datasets so that they can extract the required data.

You are told that the number of datasets will increase in the future, with each dataset requiring a different set of fields to be anonymized or removed. You anticipate that this can create dataset management issues, such as leaving personally identifiable data untouched while anonymizing a field that does not need to be anonymized. You therefore plan to implement policy-based management of datasets that would enable TXC’s IT team to manage datasets effectively and easily.

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements and describe the application.

______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of the previously identified pattern(s) by identifying the required mechanisms, and explain how the mechanisms enable the application of the identified pattern(s).

______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

Answers/Hints for Exercise 12.2

Plan Data Acquisition and Storage

Patterns:
• Poly Source
  o File-based Source
• Poly Storage
  o Streaming Access Storage
    ▪ Streaming Storage
    ▪ Dataset Decomposition
  o Random Access Storage
    ▪ High Volume Tabular Storage
    ▪ Automatic Data Sharding
  o Automatic Data Replication and Reconstruction
• Automated Dataset Execution

(See the Module 10 and 11 Big Data Design Patterns supplements for pattern descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these mechanisms enable the application of the previously identified patterns.)

Plan Data Processing

Patterns:
• Big Data Processing Environment
  o Large-Scale Batch Processing
  o Complex Logic Decomposition
  o Automated Processing Metadata Insertion

(See the Module 10 and 11 Big Data Design Patterns supplements for pattern descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these mechanisms enable the application of the previously identified patterns and the fulfillment of any requirement not directly covered by these patterns.)

Plan Data Export

Patterns:
• Poly Sink
  o Relational Sink
• Automated Dataset Execution

(See the Module 10 and 11 Big Data Design Patterns supplements for pattern descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these mechanisms enable the application of the previously identified patterns and the fulfillment of any requirement not directly covered by these patterns.)

Answers/Hints for Exercise 12.3

Plan Data Acquisition and Storage

Patterns:
• Poly Source
  o Relational Source
  o File-based Source
• Poly Storage
  o Data Size Reduction
  o Streaming Access Storage
    ▪ Streaming Storage
    ▪ Dataset Decomposition
  o Random Access Storage
    ▪ High Volume Tabular Storage
• Automated Dataset Execution

(See the Module 10 and 11 Big Data Design Patterns supplements for pattern descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these mechanisms enable the application of the previously identified patterns.)

Plan Data Processing and Export

Patterns:
• Big Data Processing Environment
  o Large-Scale Batch Processing
  o Large-Scale Graph Processing
• Poly Sink
  o Relational Sink
• Automated Dataset Execution

(See the Module 10 and 11 Big Data Design Patterns supplements for pattern descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these mechanisms enable the application of the previously identified patterns and the fulfillment of any requirement not directly covered by these patterns.)

Answers/Hints for Exercise 12.4

Identify Alternative Data Storage Solution

Patterns:
• Poly Storage
  o Random Access Storage
    ▪ High Volume Binary Storage
    ▪ High Volume Hierarchical Storage
    ▪ High Volume Tabular Storage
    ▪ Automatic Data Sharding
  o Automatic Data Replication and Reconstruction

(See the Module 10 and 11 Big Data Design Patterns supplements for pattern descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these mechanisms enable the application of the previously identified patterns.)

Answers/Hints for Exercise 12.6

Plan Data Acquisition and Storage

Patterns:
• Poly Source
  o File-based Source
• Poly Storage
  o Streaming Access Storage
    ▪ Streaming Storage
    ▪ Dataset Decomposition
  o Random Access Storage
    ▪ High Volume Binary Storage
    ▪ High Volume Tabular Storage
    ▪ High Volume Hierarchical Storage
    ▪ Automatic Data Sharding
  o Automatic Data Replication and Reconstruction
  o Data Size Reduction

(See the Module 10 and 11 Big Data Design Patterns supplements for pattern descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these mechanisms enable the application of the previously identified patterns.)

Plan Data Analysis

Patterns:
• Big Data Processing Environment
  o Large-Scale Batch Processing
  o Intermediate Results Storage
• Automated Dataset Execution

(See the Module 10 and 11 Big Data Design Patterns supplements for pattern descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these mechanisms enable the application of the previously identified patterns and the fulfillment of any requirement not directly covered by these patterns.)

Answers/Hints for Exercise 12.7

Plan Well Logs Acquisition, Storage and Processing

Patterns:
• Poly Source
  o Relational Source
  o File-based Source
• Poly Storage
  o Streaming Access Storage
  o Random Access Storage
  o Confidential Data Storage
• Big Data Processing Environment
  o Large-Scale Batch Processing
• Automated Dataset Execution
• Canonical Data Format
• Dataset Denormalization

(See the Module 10 and 11 Big Data Design Patterns supplements for pattern descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these mechanisms enable the application of the previously identified patterns.)

Democratize Well Logs

Patterns:
• Processing Abstraction
• Direct Data Access
• Integrated Access

(See the Module 10 and 11 Big Data Design Patterns supplements for pattern descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these mechanisms enable the application of the previously identified patterns.)

Answers/Hints for Exercise 12.8

Develop Predictive Maintenance Solution

Patterns:
• Poly Source
  o Relational Source
  o File-based Source
• Poly Storage
  o Streaming Access Storage
    ▪ Streaming Storage
    ▪ Dataset Decomposition
• Big Data Processing Environment
  o Large-Scale Graph Processing
• Poly Sink
  o Relational Sink
• Automated Dataset Execution

(See the Module 10 and 11 Big Data Design Patterns supplements for pattern descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these mechanisms enable the application of the previously identified patterns.)

Develop Continuous Asset Monitoring Solution

Patterns:
• Poly Source
  o Streaming Source
  o Fan-out Ingress
• Poly Storage
  o Random Access Storage
  o Realtime Access Storage
• Big Data Processing Environment
  o Large-Scale Batch Processing
  o High Velocity Realtime Processing
  o Processing Abstraction
• Poly Sink
  o Streaming Egress

(See the Module 10 and 11 Big Data Design Patterns supplements for pattern descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these mechanisms enable the application of the previously identified patterns.)

Answers/Hints for Exercise 12.10

Collate and Correlate Datasets

Patterns:
• Poly Source
  o Relational Source
  o File-based Source
• Poly Storage
  o Streaming Access Storage
  o Random Access Storage
• Big Data Processing Environment
  o Large-Scale Batch Processing
  o Processing Abstraction

(See the Module 10 and 11 Big Data Design Patterns supplements for pattern descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these mechanisms enable the application of the previously identified patterns.)

Answers/Hints for Exercise 12.11

Enable Social Media Data Analysis and Public Data Access

Patterns:
• Poly Source
  o File-based Source
  o Streaming Source
• Poly Storage
  o Streaming Access Storage
  o Random Access Storage
• Big Data Processing Environment
  o Large-Scale Batch Processing
  o Processing Abstraction
• Poly Sink
  o File-based Sink
• Automated Dataset Execution
• Centralized Dataset Governance

(See the Module 10 and 11 Big Data Design Patterns supplements for pattern descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these mechanisms enable the application of the previously identified patterns.)