Job title: Big Data Engineer
Company: Trigent Software Limited
Job description:
Roles and Responsibilities
Hiring for KPMG
Location: Gurgaon
Experience: 3.5 to 6 years
Skills: Python, SQL, Big Data

Data Engineer, Big Data: Consultant

Role & Responsibility
* Evaluating, developing, maintaining, and testing big data solutions for advanced analytics projects.
* Design and implement big data platforms with components such as Apache Spark, HBase, Hive, Impala, Pig, Oozie, etc.
* Develop ETL pipelines for processing large volumes of data using the Spark framework (Scala/Java/Python).
* Gather and process raw data at scale (including writing scripts, SQL queries, etc.) to build features that will be used in modelling.
* Responsible for ensuring data processing pipelines and systems are secure, reliable, fault-tolerant, scalable, accurate, and efficient.
* Clean data as per business requirements using streaming APIs or user-defined functions.
* Design and implement column-family schemas for Hive and HBase within HDFS; assign schemas and create Hive tables.
* Develop efficient Hive scripts with joins on datasets using various techniques.
* Responsible for infrastructure that provides insight from raw data and handles diverse sources of data seamlessly.
* Help with performance tuning of the platform, Hive queries, etc.
* Develop highly scalable, performance-efficient APIs/solutions/microservices to enable downstream availability of data for various applications.

The Individual
* Excellent problem-solving skills in object-oriented/functional scripting languages: Java, Scala, Python/R.
* Strong development experience with Apache Spark and its components (Core API, Spark SQL, Streaming).
* Strong understanding of and experience with distributed computing frameworks, particularly Apache Hadoop (YARN, MapReduce, HDFS) and associated technologies (one or more of Hive, Sqoop, Kafka, Flume, Oozie, ZooKeeper, etc.).
* Knowledge of HDFS file formats: ORC, Avro, Parquet, etc.
* Experience building stream-processing systems using solutions such as Spark Streaming, Flume, Kafka, etc.
* Experience developing and deploying Spark jobs in Scala/Java/Python to a cluster programmatically.
* Experience with Hive tuning, bucketing, and partitioning, and with creating UDFs and UDAFs as per business needs.
* Technical expertise in data models, data analytics, big data, database design and development, data mining, and segmentation techniques.
* Experience fine-tuning and optimizing Spark jobs, joins, and MapReduce jobs for performance.
* Experience working with any of the major Hadoop distributions (Cloudera, Hortonworks, etc.).

For those with Data Science skills:
* Familiarity with machine learning frameworks (such as Keras or PyTorch) and libraries (such as scikit-learn).
* Experience with machine learning techniques such as forecasting, classification, clustering, text mining, decision trees, random forests, and search algorithms.
* Understanding of the model lifecycle: cleansing/standardizing raw data, feature creation/selection, writing complex transformation logic to generate independent and dependent variables, model selection, tuning, A/B testing, and generating production-ready code.
* Experience deploying machine learning models onto Spark clusters (or any distributed data processing engine).
* Familiarity with R, PySpark, PyMC3/Theano/TensorFlow, and other scientific Python/R libraries and frameworks.

Qualification: BE/BTech/MCA
3-6 years of strong experience in 3-4 of the above-mentioned skills.
Job date: Tue, 12 Jan 2021 23:33:33 GMT
Apply for the job now!