High Performance Computing Lab Task Lead (ADV0002VL) - ERC

Engineering Research
 Houston, TX

For more than 30 years, ERC has been delivering the advantage needed to solve our customers’ and the nation’s toughest challenges. As a trusted partner to the Defense and Space communities, we provide the advanced engineering, innovative technology, and subject-matter expertise to deliver the mission results our customers need to keep the nation safe and secure. Our unique solutions enhance readiness, optimize performance, and help ensure success in the air, on the ground, and in space.

An essential part of our success is our corporate culture built on respect, empowerment, and collaboration. Our culture elevates our people, ensuring they provide best-in-class service and solutions to our customers. We always strive to do the right thing. We believe that our people are our best asset and human connection is our greatest strength.

Check us out on ERC.US

Your role:

The High Performance Computing Lab Task Lead will join our team with GeoControl Systems, Inc., a Jacobs Engineering teammate company.

What you’ll do all day:

The JSC Flight Sciences Laboratory (FSL) is seeking a highly qualified team member to lead its high-performance computing (HPC) laboratory.

The FSL is one of JSC's primary computing labs and hosts a wide variety of analyses that support almost all of the major programs at JSC, including the International Space Station (ISS), Orion, Space Launch System (SLS), Commercial Crew Program, Lunar Gateway, Human Landing System (HLS), and many others. The FSL systems currently comprise over 700 machines, 15,000 cores, and 3 PB of storage, and serve over 1000 users.

The selected candidate will:

  • Identify lab deficiencies and plan future purchases.
  • Investigate system problems and proactively monitor system health.
  • Guide a team of system administrators and provide input to management on team members' performance.
  • Help lab users use the system effectively.
  • Perform other duties as required.

You will love this job if you:

  • Are a self-starter and require little oversight to complete given tasks
  • Enjoy a fast-paced work environment
  • Love working with diverse groups of individuals, including engineers and technicians
  • Are good at communicating with people
  • Have a positive attitude
  • Enjoy multitasking

Minimum requirements for this position:

This position has been posted at multiple levels. Depending on the candidate's experience, requirements, and business needs, we reserve the right to consider candidates at any level for which this position has been advertised.

  • BS degree in a computer or system science discipline from an accredited college or university and eight (8) years of progressive, relevant experience, or an MS degree in a computer or system science discipline and seven (7) years of progressive experience, or a Ph.D. degree in a computer or system science discipline and two (2) years of progressive experience.
  • Experience architecting and maintaining systems for an HPC lab in all areas, including:
      • Linux system administration
      • Cluster management
      • System and software configuration management
      • High-speed networking
      • Resource managers and schedulers
      • High-speed parallel storage
      • Monitoring and alerting
  • Experience supporting and managing a group of Linux system administrators.
  • Experience implementing, maintaining, and verifying defined security policies.
  • Willingness to maintain a flexible work schedule.
  • A positive attitude and willingness to help lab users succeed.
  • Excellent guidance and teamwork skills.

Desired skills and experience for this position:

Preferred familiarity with:

  • Red Hat-based systems
  • OpenHPC
  • InfiniBand
  • IPA
  • Lustre architecture and maintenance
  • Provisioners (xCAT, Warewulf)
  • Configuration management software (Ansible, Puppet)
  • Slurm resource manager
  • Spack software manager
  • Log consolidation and monitoring
  • Git/GitLab and software development (CI/CD)
  • Johnson Space Center campus network
  • NASA security mechanisms (security plans, POAMs, ATOs, Risk Assessments)

What we offer:

  • Competitive salaries.
  • Continuing education assistance.
  • Professional development allotment.
  • Multiple healthcare benefit packages.
  • 401(k) with employer matching.
  • Paid time off (PTO) along with a federally recognized holiday schedule.