Head of Platform/AI Cluster Management - System Integrator (San Francisco) Job at Hamilton Barnes Associates Limited, San Francisco, CA

  • Hamilton Barnes Associates Limited
  • San Francisco, CA

Job Description

Ready to lead innovation at the intersection of platforms and artificial intelligence?

Join a pioneering technology company driving advancements in cloud, AI, and data-driven solutions across global markets. The organization is recognized for fostering innovation, scalability, and collaboration through cutting-edge platforms that empower enterprises to evolve intelligently.

The team is hiring a Head of Platform/AI Cluster Management to oversee the strategic development, integration, and optimization of AI and platform initiatives. The role will focus on leading cross-functional teams, enhancing performance and scalability, and aligning technology strategy with long-term business goals.

Shape the future of intelligent platforms and transformative innovation. Apply now!

Responsibilities

  • Own the scheduler/runtime layer (Slurm, Kubernetes, Ray), including multi-tenancy, quotas, and GPU/host fleet management.
  • Lead cluster operations across images, CI/CD, repair/health, performance/telemetry, and incident response.
  • Deliver platform services that ensure workload SLOs and reliable runtime execution.
  • Define and implement namespace/tenancy design, node health automation, golden images, admission controls, on-call runbooks, and go-live gates.
  • Collaborate closely with infra, SRE, and network teams to optimize workload placement and cluster efficiency.
  • Provide hands-on expertise in NCCL behaviours, placement strategies, and congestion signal management.

Requirements

  • Deep expertise in cluster management, scheduling, and runtime environments for large-scale compute.
  • Hands-on background with Slurm, Kubernetes, Ray, or similar orchestration platforms.
  • Strong understanding of NCCL performance tuning, workload isolation, and congestion management.
  • Experience scaling multi-tenant, GPU-heavy clusters with strict SLOs.
  • Ability to thrive in a startup environment with full ownership over platform and cluster strategy.

Salary

  • $500,000 gross per year (Negotiable)

Job Tags

Full time
