Head of Platform/AI Cluster Management - System Integrator (San Francisco) Job at Hamilton Barnes Associates Limited, San Francisco, CA

  • Hamilton Barnes Associates Limited
  • San Francisco, CA

Job Description

Ready to lead innovation at the intersection of platforms and artificial intelligence?

Join a pioneering technology company driving advancements in cloud, AI, and data-driven solutions across global markets. The organization is recognized for fostering innovation, scalability, and collaboration through cutting-edge platforms that empower enterprises to evolve intelligently.

The team is hiring a Head of Platform/AI Cluster Management to oversee the strategic development, integration, and optimization of AI and platform initiatives. The role will focus on leading cross-functional teams, enhancing performance and scalability, and aligning technology strategy with long-term business goals.

Shape the future of intelligent platforms and transformative innovation. Apply now!

Responsibilities

  • Own the scheduler/runtime layer (Slurm, Kubernetes, Ray), including multi-tenancy, quotas, and GPU/host fleet management.
  • Lead cluster operations across images, CI/CD, repair/health, performance/telemetry, and incident response.
  • Deliver platform services that ensure workload SLOs and reliable runtime execution.
  • Define and implement namespace/tenancy design, node health automation, golden images, admission controls, on-call runbooks, and go-live gates.
  • Collaborate closely with infra, SRE, and network teams to optimize workload placement and cluster efficiency.
  • Provide hands-on expertise in NCCL behaviours, placement strategies, and congestion signal management.

Requirements

  • Deep expertise in cluster management, scheduling, and runtime environments for large-scale compute.
  • Hands-on background with Slurm, Kubernetes, Ray, or similar orchestration platforms.
  • Strong understanding of NCCL performance tuning, workload isolation, and congestion management.
  • Experience scaling multi-tenant, GPU-heavy clusters with strict SLOs.
  • Ability to thrive in a startup environment with full ownership over platform and cluster strategy.

Salary

  • $500,000 gross per year (Negotiable)

Job Tags

Full time
