Director, Productivity and Production Engineering
AppZen, Inc.
San Jose, CA
AppZen delivers the world’s leading AI platform for modern finance teams. Starting with business spend, we automate manual processes, uncover problems, and optimize decision making for enterprises around the globe, including one-fourth of the Fortune 500. Our platform combines patented deep learning, computer vision, and semantic analysis with intelligence from thousands of online data sources to understand financial transactions in business context and make decisions before those transactions happen. AppZen is a must-have for CFOs and their teams to reduce spend, achieve compliance, and streamline processes.

We’ve taken off this year! Since we released our platform in 2016, over 1,500 enterprises have standardized on AppZen, including three of the top ten banks, four of the top ten media companies, three of the top ten pharmaceutical manufacturers, two of the top five aerospace companies, and five of the top ten software providers. We were a Gartner Cool Vendor last year, have been recognized as one of the fastest-growing technology companies in the market, and we just announced $50 million in Series C funding.

As a hands-on engineering leader for Productivity and Production Engineering, you will lead, manage, and inspire engineering teams in a 24/7 environment. You will develop, manage, and operate next-generation platforms and infrastructure. You will work closely with software engineering teams building infrastructure and will focus on driving the productivity, availability, reliability, and security of services using leading cloud-native technologies and public cloud infrastructure.

The role requires deep daily engagement with product development teams across business units to help onboard, manage, troubleshoot, and operate our leading AI applications. In addition to technical acumen and experience with highly scalable and reliable service infrastructure, this role requires expertise in building teams and optimizing them for execution.

Responsibilities:

  • Design, develop, and implement a strategy for 24/7 service operations
  • Provide infrastructure operations vision to enable innovation and leverage industry trends to create business value
  • Establish metrics, key performance indicators, and service level agreements for driving the performance of platforms, software, and infrastructure
  • Set standards, best practices, and measures for monitoring and optimizing deployments at scale
  • Manage the continuous integration and continuous deployment process, quality, reliability management, and automation for systems reliability and scalability
  • Own all aspects of deployment and technical operations
  • Collaborate with developers to automate, build, and deploy using the latest public cloud technologies, such as Docker, Jenkins, and Kubernetes
  • Drive operational excellence in deployments and the SDLC without sacrificing stability and scalability
  • Maintain hands-on technical depth to enable direct oversight
  • Provide problem-solving leadership in capacity, change, crisis, and incident management
  • Provide implementation leadership, identifying automation opportunities and cost savings to improve service quality
  • Optimize, detect, and recover in real time by creating practices around big data operations relevant to data science

Skills and Requirements:

  • BS/MS in Computer Engineering or Computer Science
  • 10+ years leading globally distributed, highly technical teams in both solid-line and matrix environments
  • Experience in managing internal infrastructure, developer tools, or distributed computing infrastructure and/or quality/release engineering
  • Excellent written and verbal communication skills in both technical and non-technical environments, and comfort dealing with ambiguity
  • Experience in building, deploying, and operating complex, real-time software systems involving data and machine learning
  • Experience in real-time execution of machine learning models, and the scale concerns of such systems
  • Knowledge of large-scale data systems, offline batch processing, online stream processing, and queueing systems