SW Data Engineer
Cerebri AI
Washington, DC
Cerebri AI is an advanced customer analytics company that uses state-of-the-art AI technologies, including reinforcement learning, to analyze customer touchpoints across multiple channels. Cerebri AI measures a customer’s commitment to a brand or product at any point in time, expressed in monetary value, and derives Best Actions that drive customer commitment and financial results.

Our Series A financing was led by M12 (formerly Microsoft Ventures). To date, the team has filed 11 patents pertaining to the Cerebri AI way. We now have 60 employees in three offices: Austin, Toronto, and Washington, DC. Over 80% of the staff are in technical roles in data science and software engineering. Our senior executives average 20+ years of experience selling and deploying software to enterprises worldwide. Cerebri AI is a proud Microsoft Partner and an active member of the Mastercard Start Path network.
To learn more, visit cerebriai.com.

“Cerebri AI was named a 2019 Cool Vendor in Artificial Intelligence for Customer Analytics by Gartner.”

Role: Architect, build, test, and deploy distributed, scalable, and resilient Spark/Scala/Kafka data processing, AI, and machine learning pipelines for batch, micro-batch, and streaming workloads, connecting to various source systems via flat files, APIs, streaming, and other interfaces.
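
A minimal sketch of the kind of pipeline described above, assuming Spark Structured Streaming (Scala) with the spark-sql-kafka connector on the classpath; the broker address, topic, and paths are illustrative assumptions, not Cerebri AI specifics:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical streaming ingest job: Kafka -> Spark -> Parquet landing zone.
object TouchpointIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("touchpoint-ingest") // illustrative app name
      .getOrCreate()

    // Read a stream of customer touchpoint events from a Kafka topic.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // assumed broker
      .option("subscribe", "touchpoints")               // assumed topic
      .load()

    // Kafka delivers keys/values as bytes; cast to strings for downstream parsing.
    val events = raw.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp")

    // Land micro-batches as Parquet; the checkpoint makes the sink restartable.
    events.writeStream
      .format("parquet")
      .option("path", "/data/landing/touchpoints")                 // assumed path
      .option("checkpointLocation", "/data/checkpoints/touchpoints")
      .start()
      .awaitTermination()
  }
}
```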

Responsibilities

  • Architect, build, test, and deploy distributed, scalable, and resilient Spark/Scala/Kafka data processing, AI, and machine learning pipelines for batch, micro-batch, and streaming workloads, connecting to various source systems via flat files, APIs, streaming, and other interfaces
  • Deploy pipelines using scheduling and orchestration frameworks: Cron, Airflow, Oozie, or equivalent
  • Architect, build, test, and deploy distributed, scalable, and resilient persistence and storage layers in support of the data processing, AI, and machine learning pipelines, utilizing a combination of SQL (PostgreSQL) and NoSQL (Cassandra) databases, distributed file systems (HDFS, Azure Blob Storage, AWS S3, or equivalent), search (Elasticsearch), and SQL acceleration layers (Presto)
  • Architect, build, test, and deploy read-optimized data projections to support API and front-end applications, including materialized views (PostgreSQL), OLAP data cubes, Elasticsearch indices, and NoSQL (Cassandra) projections (see the sketch after this list)
  • Deploy fully containerized (Docker/Kubernetes) data processing, AI, and machine learning pipelines into Azure, AWS, or GCP cloud environments and on-premises systems as necessary
  • Tune database queries using best practices as required 
  • Document detailed designs (including source-to-target mappings) and code for data quality frameworks that measure and maintain data completeness, integrity, and validity between interfacing systems
  • Leverage commercial and open-source big data ecosystem tools, BI and data warehousing tools, AI/machine learning frameworks, data quality and metadata management tools, and digital SaaS platforms and APIs to deliver required solutions
  • Ensure all solutions comply with the highest levels of security, privacy, and data governance requirements as outlined by Cerebri AI and client legal and information security guidelines, law enforcement, and privacy legislation, including data anonymization, encryption, and security in transit and at rest
  • Train and mentor junior team members
  • Effectively leverage continuous integration, delivery, and deployment (CI/CD) practices and agile/DevOps tools, including Git, Jira, Jenkins, and others as required
  • Act as a subject matter expert and thought leader, continuously following industry trends and the latest competitive developments, and delivering papers and presentations at major industry conferences and events
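
As a concrete illustration of the read-optimized projections mentioned in the list above, a Spark batch job (Scala) might aggregate pipeline output and load it into PostgreSQL over JDBC, with a materialized view on top serving the API layer. Table names, columns, and connection details here are assumptions for the sketch:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions._

// Hypothetical projection job: curated Parquet -> per-customer table in PostgreSQL.
object CommitmentProjection {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("commitment-projection").getOrCreate()

    // Assumed pipeline output: one row per scored customer touchpoint.
    val scored = spark.read.parquet("/data/curated/scored_touchpoints")

    // Aggregate to a per-customer shape optimized for API reads.
    val projection = scored
      .groupBy("customer_id")
      .agg(
        max("event_time").as("last_touchpoint"),
        avg("commitment_score").as("avg_commitment")
      )

    // Overwrite the backing table; a PostgreSQL materialized view defined on it
    // would then be refreshed (REFRESH MATERIALIZED VIEW) by the orchestration layer.
    projection.write
      .format("jdbc")
      .option("url", "jdbc:postgresql://db:5432/analytics") // assumed connection
      .option("dbtable", "customer_commitment")              // assumed table
      .option("user", sys.env.getOrElse("DB_USER", "app"))
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .mode(SaveMode.Overwrite)
      .save()
  }
}
```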

Qualifications

  • A degree in Computer Science, Engineering, AI, Machine Learning, BI, MIS, or an equivalent technology field
  • Big Data developer certification, including Hadoop, Spark, Scala, Kafka, or equivalent production experience
  • 5+ years of production programming experience in Java, Scala, Spark, Kafka, Elasticsearch, and Python
  • 2+ years of production experience in big data or AI/machine learning solution development, including delivering enterprise data lakes and machine learning solutions using Hadoop, Hive, Scala, Spark, Python, Java, SQL and NoSQL databases (HBase, Cassandra), and orchestration frameworks such as Oozie, Airflow, or equivalent
  • 5+ years of production experience in AI, ML, and BI data pipeline application development, including delivering ETL/ELT and data warehousing solutions
  • 5+ years of production ETL/ELT development experience using BI data integration tools such as Talend, Informatica, Pentaho, SSIS, or equivalent
  • 5+ years of production experience deploying solutions using both SQL (PostgreSQL or equivalent) and NoSQL (Cassandra or equivalent) DBMSs
  • Experience deploying containerized Docker/Kubernetes applications 
  • Experience configuring and deploying cloud applications and services on Azure, AWS, or GCP
  • Streaming and micro-batch application development experience would be an asset, including Kafka, Storm, NiFi, Spark Streaming, Confluent, or equivalent
  • Production systems integration experience
  • Proficiency with Linux/Unix operating systems, utilities and tools
  • Big Data application architecture experience and in-depth understanding of the Big Data ecosystem, applications, services, and design patterns
  • Production experience in BI and big data query performance tuning for SQL databases, Hive, Spark, LLAP, and Druid (a brief tuning sketch follows this list)
  • Experience with open-source BI and visualization tools such as Superset, Apache Drill, or equivalent would be an asset
  • Experience with continuous integration, development, and deployment methodologies and agile development tools, including source control (Git), Jira/Confluence, SharePoint, and Jenkins
  • Experience with the digital ecosystem would be an asset, including clickstream analytics with Omniture, audience and digital advertising ecosystems, and digital campaign management tools (Google DBM, DCM, etc.)
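
To make the query performance tuning expectation concrete, a small Spark sketch (Scala) of two common levers: partition pruning via a filter on a partition column, and a broadcast hint that avoids shuffling a small dimension table. Paths and column names are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

// Hypothetical tuning example: prune partitions, then broadcast-join a small table.
object TuningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("tuning-sketch").getOrCreate()

    // Filtering on the partition column lets Spark skip whole partitions
    // instead of scanning the full dataset (assumes dt-partitioned Parquet).
    val events = spark.read.parquet("/data/curated/events")
      .filter("dt = '2019-06-01'")

    // A small dimension table: broadcasting it avoids a full shuffle join.
    val customers = spark.read.parquet("/data/curated/customers")
    val joined = events.join(broadcast(customers), Seq("customer_id"))

    // Inspect the physical plan to confirm pruning and the broadcast join took effect.
    joined.explain()
  }
}
```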