A leading banking institution has an existing system that collects account data, account details, transaction details, and other account-related information, processes it, and generates a file that an email-sending system uses to send out emails. The client's existing system must process approximately 5 million e-statements out of ten million. Collecting and processing this data takes approximately 18 hours of production batch run time, which is too long.

End Users

IT team and customer care departments of the client's organization.

The Vision

The client wants to bring down the data processing time from the current 18 hours.

Project Lifecycle

Hadoop was chosen for batch processing of this unstructured data because it handles unstructured data well and processes large volumes quickly by running work in parallel. For raw data such as transactions, backup is crucial. Hadoop addresses this by keeping three replicas of each block of data by default; this replication, rather than the Secondary NameNode, is what protects the data. The map and reduce approach speeds up processing by splitting the work across nodes. Map-reduce jobs can be written in Java or expressed through Pig and Hive, and Sqoop, which runs map-only jobs, is an easy way to import data from SQL databases. Hadoop was therefore selected for this project because it makes joining, filtering, and sorting datasets easier and handles both processing and backup well.
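To make the map and reduce idea concrete, here is a minimal single-process sketch in Python. It is not the client's job and does not use Hadoop itself. It only imitates the three stages (map, shuffle, reduce) that Hadoop would distribute across nodes. The record layout (`account_id`, `amount`), the sample values, and the function names are all assumptions made for illustration, since the source does not describe the actual file format.

```python
from collections import defaultdict

# Hypothetical input: (account_id, transaction_amount) pairs.
# The real record layout is not given in the source.
TRANSACTIONS = [
    ("ACC-1", 120.0),
    ("ACC-2", 75.5),
    ("ACC-1", -20.0),
    ("ACC-3", 10.0),
    ("ACC-2", 24.5),
]


def map_phase(record):
    """Emit one (key, value) pair per input record."""
    account_id, amount = record
    yield account_id, amount


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_job(TRANSACTIONS))
```

In a real cluster the map calls would run in parallel on the nodes holding each block of the input, which is where the speed-up over a single sequential batch run would come from. A map-only job, such as the ones Sqoop runs, would stop after the first stage.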