Senior Data Engineer - Python, PySpark
5+ years
Full-time (40h)
Software Development
Fully remote
Python
PySpark
Azure
Databricks
Requirements
Must-haves
- 5+ years of data engineering experience
- Proficiency in Python and PySpark
- Azure experience
- Databricks experience
- Experience with B2B solutions in areas such as CPG, retail, or supply chain
- Experience designing and deploying large-scale data management systems
- Experience working in optimization- and AI-enabled industries
- Deep understanding of big data technologies and platforms
- Ability to work collaboratively in a cross-functional team environment
- Strong communication skills in both spoken and written English
- Bachelor's Degree in Computer Engineering, Computer Science, or equivalent
Nice-to-haves
- Data Science background
- Experience with data warehousing and ETL technologies (e.g., Airflow, Redshift, Snowflake)
- AWS and/or GCP experience
- Startup experience
What you will work on
- Create and maintain data integration processes, ETL pipelines, and data warehouse structures
- Develop scalable, highly available, fault-tolerant data management systems that support AI models and analytics applications
- Test and validate data to establish results that downstream users and processes can confidently depend on
- Implement robust, adaptable data pipelines that support real-time decision-making
- Develop clear, reusable documentation to facilitate knowledge transfer across teams
- Collaborate with cross-functional teams including Data Scientists, Software Engineers, Product Managers, and Customer Success to identify and solve problems concerning data quality, ingestion, and integration
- Help drive innovation and promote engineering best practices
- Advise on tools to improve the performance, scalability, and security of our data infrastructure