
Principal Engineer - High-Performance AI Infrastructure

Company: Diversity Talent Scouts
Location: San Jose
Posted on: February 15, 2026

Job Description:

As a Principal Engineer for HPC and AI Infrastructure, you'll take a lead role in designing the low-level systems that maximize GPU utilization across large, mission-critical workloads. Working within our GPU Runtime & Systems team, you'll focus on device drivers, kernel-level optimizations, and runtime performance to ensure GPU clusters deliver the highest throughput, lowest latency, and greatest reliability possible. Your work will directly accelerate workloads across deep learning, high-performance computing, and real-time simulation. This position sits at the intersection of systems programming, GPU architecture, and HPC-scale computing: a unique opportunity to shape infrastructure used by developers and enterprises worldwide.

Key Responsibilities:

- Build and optimize device drivers and runtime components for GPUs and high-speed interconnects.
- Collaborate with kernel and platform teams to design efficient memory pathways (pinned memory, peer-to-peer, unified memory).
- Improve data transfers across NVLink, InfiniBand, PCIe, and RDMA to reduce latency and boost throughput.
- Enhance GPU memory operations with NUMA-aware strategies and hardware-coherent optimizations.
- Implement telemetry and observability tools to monitor GPU performance with minimal runtime overhead.
- Contribute to internal debugging/profiling tools for GPU workloads.
- Mentor engineers on best practices for GPU systems development and participate in peer design/code reviews.
- Stay ahead of evolving GPU and interconnect architectures to influence future infrastructure design.

Minimum Qualifications:

- Bachelor's degree in a technical field (STEM), with 10 years in systems programming, including 5 years in GPU runtime or driver development.
- Experience developing kernel-space modules or runtime libraries (CUDA, ROCm, OpenCL).
- Deep familiarity with NVIDIA GPUs, CUDA toolchains, and profiling tools (Nsight, CUPTI, etc.).
- Proven ability to optimize workloads across NVLink, PCIe, Unified Memory, and NUMA systems.
- Hands-on background in RDMA, InfiniBand, GPUDirect, and related communication frameworks (UCX).
- Strong C/C++ programming skills with systems-level expertise (memory management, synchronization, cache coherency).

Preferred Qualifications:

- Expertise in HPC workload optimization and GPU compute/memory tradeoffs.
- Knowledge of pinned memory, peer-to-peer transfers, zero-copy, and GPU memory lifetimes.
- Strong grasp of multithreaded and asynchronous programming patterns.
- Familiarity with AI frameworks (PyTorch, TensorFlow) and Python scripting.
- Understanding of low-level CUDA/PTX assembly for debugging or performance tuning.
- Experience with storage offloads (NVMe, IOAT, DPDK) or DMA-based acceleration.
- Proficiency with system profiling/debugging tools (Valgrind, cuda-memcheck, gdb, Nsight Compute/Systems, perf, eBPF).
- An advanced degree (PhD) with research in GPU systems, compilers, or HPC is a plus.

Keywords: Diversity Talent Scouts, Tracy, Principal Engineer - High-Performance AI Infrastructure, IT / Software / Systems, San Jose, California

