ETL Pipeline Developer, Data Engineer
Job Description:
• Work collaboratively with Product Managers, Designers, and Engineers to set up, develop, and maintain critical back-end integrations for the data and analytics platform.
• Create new and maintain existing data pipelines, Extract, Transform, and Load (ETL) processes, and ETL features using Azure cloud services.
• Build, expand, and optimize data and data pipeline architectures.
• Optimize data flow and collection for cross-functional teams of database architects, data analysts, and data scientists.
• Operate large-scale data processing pipelines and resolve business and technical issues related to processing and data quality.
• Assemble large, complex data sets that meet functional and non-functional business requirements.
• Identify, design, and implement internal process improvements, including re-designing data infrastructure for greater scalability, optimizing data delivery, and automating manual processes.
• Develop and document standard operating procedures (SOPs) for new and existing data pipelines.
• Build analytical tools that use the data pipeline to provide actionable insight into key business performance metrics, including operational efficiency and customer acquisition.
• Write unit and integration tests for all data processing code.
• Read data specifications and translate them into code and design documents.

Requirements:
• All candidates must be able to pass a Public Trust clearance through the U.S. Federal Government.
• Bachelor's degree in Computer Science, Software Engineering, Data Science, Statistics, or a related technical field.
• 8+ years of experience in software/data engineering, including data pipelines, data modeling, data integration, and data management.
• Expertise in data lakes, data warehouses, data meshes, data modeling, and data schemas (star, snowflake, etc.).
• Extensive experience with Azure cloud-native data services, including Synapse, Data Factory, DevOps, and Key Vault.
• Expertise in SQL, T-SQL, and Python, with applied experience in Apache Spark and large-scale processing using PySpark.
• Proficiency with data formats such as Parquet (including Snappy-compressed, partitioned Parquet) and CSV.
• Understanding of common connection protocols, such as SFTP.
• Proven ability to work with incomplete or ambiguous data infrastructure and to design integration strategies.
• Excellent analytical, organizational, and problem-solving skills.
• Strong communication skills, with the ability to translate complex concepts across technical and business teams.
• Proven experience working with petabyte-scale data systems.

Benefits:
• Highly competitive salary
• Full healthcare benefits

Apply to this job.