About the Job
We are looking for a skilled and passionate Big Data Engineer with a strong background in data analysis and distributed computing. This role is ideal for professionals who enjoy working on high-volume data pipelines, developing scalable data solutions, and leveraging cloud-native services to drive business insights. You will work with a collaborative team of engineers and analysts to build and optimize data systems across a modern tech stack.
Education Requirements
- Bachelor’s degree in a related discipline.
Experience Requirements
- 5+ years of experience in Big Data Engineering or Data Analysis roles.
- Strong hands-on expertise with Apache Spark (Spark Core, Spark Streaming, DataFrames, RDDs, Spark SQL) for large-scale data processing.
- Solid experience writing complex SQL queries and tuning query performance in Hive or Impala.
- Experience with AWS cloud services, particularly EMR, Glue, S3, Athena, Lambda, CloudWatch, and IAM, in a serverless architecture.
- Proficiency in Python for data transformation, automation, and scripting tasks.
- Working knowledge of the Hadoop ecosystem and distributed data processing frameworks.
- Experience with Elasticsearch/OpenSearch, including building dashboards in Kibana.
- Familiarity with job schedulers such as Airflow, Autosys, or AWS Data Pipeline.
- Experience with key-value data stores such as HBase.
- Prior exposure to cloud-based infrastructure, data security, and performance optimization is highly desirable.
Role & Responsibilities
- Design, develop, and maintain scalable big data solutions using Apache Spark, Python, and SQL.
- Build and optimize data pipelines and architectures for efficient data ingestion, transformation, and storage.
- Work extensively with AWS services, including EMR, Glue, S3, Athena, and Lambda, to support data processing in serverless and cloud environments.
- Develop streaming and batch data processing solutions using Spark Streaming and related technologies.