IT team and Customer Care departments of the customer's organization.
The client wants to bring down the data processing time from the current 18 hours.
Hadoop was chosen as the solution for batch processing of the unstructured data because it handles both unstructured data and large data volumes quickly by processing them in parallel across a cluster. For raw data and transactional data of this kind, backup is crucial. HDFS addresses this by keeping three replicas of each data block by default, and it is this replication, not the Secondary NameNode, that protects the data against loss. The MapReduce model splits the work into map and reduce phases that run in parallel, which speeds up processing. MapReduce jobs can be written directly in Java or generated from Pig and Hive scripts, and Sqoop, which runs map-only jobs, is a straightforward way to import data from SQL databases. Hadoop was therefore selected for this project because it makes it easy to join, filter, and sort datasets and handles processing and backup well.
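The map and reduce phases described above can be illustrated with a small in-memory sketch. This is not Hadoop code and does not use any Hadoop API. It only mimics the map, shuffle/sort, and reduce steps in plain Python. The record layout (`customer_id,amount,status`), the sample data, and all function names are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical transaction records in the form "customer_id,amount,status".
RECORDS = [
    "c1,20.0,OK",
    "c2,5.5,OK",
    "c1,7.5,FAILED",
    "c2,4.5,OK",
    "c3,1.0,OK",
]


def map_phase(line: str) -> Iterator[Tuple[str, float]]:
    """Parse one record and emit (customer_id, amount).

    Failed transactions are dropped here, which is the 'filter' step.
    """
    customer_id, amount, status = line.split(",")
    if status == "OK":
        yield customer_id, float(amount)


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group values by key and sort the keys, as the framework would
    do between the map and reduce phases."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key: str, values: List[float]) -> Tuple[str, float]:
    """Aggregate all amounts that share the same customer_id."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> List[Tuple[str, float]]:
    """Run map, shuffle, and reduce sequentially on one machine."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return [reduce_phase(k, v) for k, v in shuffle(mapped).items()]


if __name__ == "__main__":
    for customer_id, total in run_job(RECORDS):
        print(customer_id, total)
```

On a real cluster each map call would run on a different block of the input and the reducers would run in parallel per key, which is where the speed-up comes from. Here everything runs sequentially, so the sketch shows only the data flow.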