Analytics Case Studies
2020-09-05
We go the extra mile to help you get ahead of the curve.
Analytics Case Studies & Big Data Case Studies
Big data is a term that describes the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis. It is not the amount of data that matters; what matters is what organizations do with it. Big data can be analyzed for insights that lead to better decisions and strategic business moves. Read more about Canspirit’s Analytics Cases, Big Data Case Studies, Big Data Project Life Cycle and Data Processing Data Engineering.
A leading client has an existing system that collects account data, account details, transaction details and other account-related details, processes them, and generates a file that is used by an email-sending system to send out emails. Approximately 5 million e-statements among ten need to be processed by the client's existing system. Collecting and processing this data takes approximately 18 hours of production batch run time, which is very long.
End Users of Data Output
IT team and Customer Care departments of the customer's organization.
The client wants to bring down the data processing time from the current 18 hours.
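To put the 18-hour batch window in perspective, the figures quoted above can be turned into a rough throughput number. The short Python sketch below does that arithmetic. The 6-hour target is purely a hypothetical value chosen for illustration; the case study does not state the client's actual target.

```python
# Rough throughput arithmetic based on the figures quoted in the case study.
STATEMENTS = 5_000_000   # approximate e-statements per batch run (from the text)
CURRENT_HOURS = 18       # approximate current batch run time (from the text)
SECONDS_PER_HOUR = 3600


def throughput_per_second(statements: int, hours: float) -> float:
    """Average number of statements processed per second over the window."""
    return statements / (hours * SECONDS_PER_HOUR)


if __name__ == "__main__":
    current = throughput_per_second(STATEMENTS, CURRENT_HOURS)
    print(f"Current rate: {current:.1f} statements/second")

    # Hypothetical target window, NOT taken from the case study.
    hypothetical_target_hours = 6
    needed = throughput_per_second(STATEMENTS, hypothetical_target_hours)
    print(f"Rate needed for a {hypothetical_target_hours}-hour run: "
          f"{needed:.1f} statements/second")
```

At the quoted volume, the existing system averages roughly 77 statements per second, so any shorter batch window requires a proportionally higher sustained rate.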
Big Data Project Lifecycle
The solution chosen for batch processing of this unstructured data is Hadoop, because it can handle unstructured data and can process large volumes quickly through parallel processing. For raw data such as transactions, backup is crucial; Hadoop addresses this by keeping three replicas of the same data as a safeguard, rather than relying on the secondary name node. Hadoop's map and reduce approach makes data processing fast. Map-reduce processing can be implemented with Java, Pig and Hive, and Sqoop, which runs as a map-only process, is the easiest way to import data from SQL databases. Hadoop was therefore selected for this project because it makes it easier to join, filter and sort the datasets, and because it is also good at processing data and maintaining backups.
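The map and reduce idea can be illustrated with a small in-memory sketch. The Python example below is not Hadoop code and not the client's actual job; it only mimics the three stages (map, shuffle, reduce) on a handful of made-up transaction records. The record layout, the account IDs and the function names are all assumptions introduced for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical transaction records: (account_id, amount).
TRANSACTIONS = [
    ("ACC-1", 120.0),
    ("ACC-2", 35.5),
    ("ACC-1", -20.0),
    ("ACC-3", 10.0),
    ("ACC-2", 4.5),
]


def map_phase(record):
    """Emit (key, value) pairs for one input record."""
    account_id, amount = record
    yield account_id, amount


def shuffle(pairs):
    """Group values by key, as a map-reduce framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a per-account summary."""
    return key, {"count": len(values), "total": sum(values)}


def run_job(records):
    """Run map, shuffle and reduce sequentially over the records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    for account, summary in run_job(TRANSACTIONS).items():
        print(account, summary)
```

In a real cluster the map calls run in parallel on different nodes and the shuffle moves data across the network, which is where the speed-up over a single sequential batch comes from. A map-only job, such as a Sqoop import, skips the shuffle and reduce stages entirely.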