Senior Data Engineer
The GFP Analytics Data & Infrastructure team consists of experienced engineers and manages a suite of core data services that ingest and process all data related to Amazon’s rapidly growing delivery fleet of vans, trucks, electric vehicles, and more. The tech stack designed, built, and operated by the team follows an event-driven architectural paradigm, enabled through creative use of AWS and Amazon-internal data services.
As a data engineer on this team, you will have the exciting opportunity to work on one of the largest vehicle-based datasets in the world. You will work closely with program managers, engineers, analysts, and business teams to design solutions for their respective needs, which will require creativity and strong problem-solving skills. You will also help develop data engineering roadmaps that guide the platforms we build to solve complex problems efficiently and at scale. In this role, you will work backwards from customer problems to design solutions that meet their goals.
Additionally, you will be responsible for designing, developing, and operating a data service platform using Python, Apache Spark, and SQL to build ETL, analytics, and data quality components. You’ll automate deployments using AWS CodeDeploy, AWS CodePipeline, the AWS Cloud Development Kit (CDK), and AWS CloudFormation. You will design and implement complex data models and build the end-to-end infrastructure for reports and dashboards to be created by our customers. You will work with AWS services like Redshift, Glue, S3, IAM, CloudWatch, and more.
This role may be based out of Nashville, Seattle, or Austin.
· Work with external data partners to establish EDI connections to ingest various datasets
· Interface with other technology teams to extract, transform, and load data from a wide variety of data sources using AWS products such as Lambda, Glue, EMR, and Kinesis Data Firehose
· Maintain and enhance existing data pipelines
· Create extensible designs and easy-to-maintain solutions with the long-term vision in mind
· Interface with cross-functional teams to gather requirements and deliver data solutions
· Improve tools and processes, scale existing solutions, and create new solutions as team and stakeholder needs require
· Degree in Computer Science, Engineering, Mathematics, or a related field, or 5+ years of industry experience
· 5+ years of relevant work experience in Big Data engineering, ETL pipeline development, Data Modeling, and Data Architecture
· 5+ years of hands-on experience writing complex, highly optimized SQL queries across large data sets
· Knowledge of software engineering best practices across the development life cycle, including agile methodologies, coding standards, code reviews, source management, build processes, testing, and operations
· Experience with coding languages such as Python, Java, or Scala
· Exposure to large databases, BI applications, data quality, and performance tuning
· Excellent written and spoken communication skills
· Proficiency in the DevOps style of software deployment (infrastructure-as-code)
· Proficiency with AWS database/ETL tools including Lambda, Glue, Redshift, DynamoDB
· Proficiency with AWS technologies including SNS, SQS, SES, Route 53, CloudWatch, VPC
· Background in Big Data, non-relational databases, Machine Learning and Data Mining is a plus