(IND-Pune) Data Engineer - Senior
Job Description
- Pune
- Cummins India
**Although the role category specified in the GPP is Remote, the requirement is for Hybrid.**
**Key Responsibilities:**
+ **Design & Development**: Design and automate the deployment of distributed systems for ingesting and transforming data from various types of sources (relational, event-based, unstructured).
+ **Data Quality & Integrity**: Design and implement frameworks for continuously monitoring and troubleshooting data quality and integrity issues.
+ **Data Governance**: Implement data governance processes, ensuring effective management of metadata, access, and retention for both internal and external users.
+ **ETL Pipelines**: Design and provide guidance on building reliable, efficient, and scalable data pipelines that integrate data from diverse sources using ETL/ELT tools or custom scripting.
+ **Database Optimization**: Develop and implement physical data models and optimize database performance through efficient indexing and table relationships.
+ **Cloud Data Solutions**: Create and manage large-scale data storage and processing solutions using cloud-based platforms such as Azure Databricks, Data Lakes, Hadoop, and NoSQL databases (e.g., Cassandra, MongoDB).
+ **Automation & Productivity**: Leverage modern tools and techniques to automate repeatable data preparation and integration tasks, minimizing manual effort and error-prone processes.
+ **Agile Methodologies**: Work within agile development methodologies and practices such as DevOps, Scrum, and Kanban to ensure timely delivery of critical analytics initiatives.
+ **Mentorship & Collaboration**: Mentor junior developers, collaborate with cross-functional teams, and contribute to the overall success of the data platform.
**Qualifications:**
**Knowledge/Skills:**
+ Proven track record in developing efficient data pipelines and mentoring junior developers.
+ Hands-on experience with Spark (Scala/PySpark), SQL, and Spark Streaming.
+ Proficient in troubleshooting batch/streaming data pipeline issues and optimizing pipeline performance.
+ Expertise in Azure Cloud Services (Azure Databricks, ADLS, EventHub, EventGrid, etc.).
+ Strong understanding of SQL and NoSQL data models, including Delta Lake and lakehouse architectures.
+ Experience with CI/CD tools for automating deployments.
+ Knowledge of big data storage strategies, performance optimization, and database indexing.
+ Familiarity with Agile software development methodologies.
**Nice-to-Have:**
+ Understanding of the machine learning lifecycle and experience integrating ML models into data pipelines.
+ Exposure to open-source big data technologies and IoT.
+ Familiarity with building analytical solutions in cloud environments.
+ Experience with large-scale file movement and data extraction tools.
**Education & Certifications:**
+ A degree in Computer Science, Engineering, Information Technology, or a related field, or equivalent relevant experience is required.
+ Additional certifications in Azure, Spark, or cloud-based data engineering solutions are a plus.
**Competencies:**
+ **System Requirements Engineering**: Ability to translate stakeholder needs into verifiable system requirements, ensuring alignment with project goals.
+ **Collaboration**: Strong ability to build partnerships and work effectively within cross-functional teams to achieve shared objectives.
+ **Effective Communication**: Skilled in delivering clear communications to diverse audiences, both technical and non-technical.
+ **Customer Focus**: Dedicated to building strong customer relationships and delivering solutions that meet their needs.
+ **Problem Solving**: Proficient in using systematic analysis and industry-standard methodologies to solve complex technical challenges.
+ **Data Quality**: Knowledgeable in identifying, understanding, and correcting data quality issues across operational processes.
+ **Solution Documentation & Testing**: Thorough in documenting solutions and validating them through structured testing practices to ensure they meet business requirements.
+ **Decision Making**: Able to make timely, data-driven decisions that maintain project momentum.
**Skills & Experience:**
+ **Experience**:
+ 6-8 years of hands-on experience in data engineering, with a focus on building data pipelines and working with cloud-based data solutions (preferably Azure Databricks).
+ Advanced knowledge in Spark (Scala/PySpark), SQL, and cloud platforms like Azure.
+ Familiarity with the design, development, and maintenance of large-scale data storage solutions (Hadoop, NoSQL databases, Data Lakes).
+ Experience in mentoring junior developers and working in Agile development teams.
+ **Technical Skills**:
+ Advanced proficiency in SQL and Spark.
+ Expertise in data pipeline design and automation.
+ Knowledge of cloud data services (Azure Databricks, ADLS, EventHub, EventGrid).
+ Experience with CI/CD tools for pipeline deployment automation.
+ Familiarity with big data tools such as Hive, Kafka, and HBase, and with Delta Lake.
+ Experience in building and optimizing ETL/ELT pipelines.
**Work Schedule:**
This role requires collaboration with stakeholders in the US, with an expected overlap of 2-3 hours during EST working hours on an as-needed basis.
**Job** Systems/Information Technology
**Organization** Cummins Inc.
**Role Category** Remote
**Job Type** Exempt - Experienced
**ReqID** 2412312
**Relocation Package** No