HPC on AWS Specialist/SME/Architect - REMOTE

Remote Full-time
Job Title: AWS High Performance Computing (HPC) Architect & Subject Matter Expert (SME)

Overview:
The AWS HPC Architect & SME is responsible for designing, implementing, and optimizing high-performance computing solutions on the AWS Cloud platform. This role combines deep technical expertise in distributed computing, data-intensive workflows, and AWS HPC services with the ability to lead architecture design sessions, define best practices, and ensure scalability, performance, and cost efficiency across enterprise or research workloads.

Key Responsibilities:
• Architect and Design: Develop scalable, high-performance architectures leveraging AWS HPC services such as AWS ParallelCluster, FSx for Lustre, EFA (Elastic Fabric Adapter), AWS Batch, and EC2 HPC instances.
• Solution Implementation: Deploy, automate, and optimize HPC clusters and data pipelines for compute- and memory-intensive workloads, including modeling, simulation, genomics, CFD, AI/ML training, and financial risk analysis.
• Performance Optimization: Benchmark, tune, and monitor system performance for compute, storage, and networking components to achieve optimal throughput and cost efficiency.
• Infrastructure as Code (IaC): Implement reproducible environments using Terraform, AWS CDK, or CloudFormation to streamline provisioning, CI/CD, and configuration management.
• Data and Storage Management: Design high-throughput parallel storage solutions using S3, FSx for Lustre, EBS, and EFS; integrate with hybrid and on-prem HPC environments.
• Security and Compliance: Apply AWS Well-Architected Framework and HPC security best practices to ensure compliance with enterprise, academic, or government standards.
• Collaboration and Leadership: Partner with application scientists, DevOps teams, and business stakeholders to translate workload requirements into optimized HPC architectures. Provide mentoring and technical leadership across multidisciplinary teams.
• Documentation and Knowledge Sharing: Develop architecture diagrams, reference implementations, and technical playbooks to support ongoing HPC adoption and operations.

Required Skills & Experience:
• 8+ years of experience in high-performance computing, distributed systems, or cloud architecture.
• Proven expertise in AWS HPC services (EC2 HPC, ParallelCluster, Batch, FSx for Lustre, EFA).
• Strong knowledge of Linux systems administration, networking (InfiniBand, EFA, MPI), and job schedulers (Slurm, Torque, PBS Pro).
• Hands-on experience with automation and IaC (Terraform, Ansible, CloudFormation).
• Scripting and development proficiency (Python, Bash, or similar).
• Experience with monitoring tools (CloudWatch, Grafana, Prometheus) and cost-optimization strategies.
• AWS Certified Solutions Architect – Professional or AWS Certified Advanced Networking preferred.
• Bachelor’s or Master’s degree in Computer Science, Engineering, or related technical field.

Preferred Attributes:
• Experience with GPU workloads, containerized HPC (ECS/EKS with ParallelCluster), or hybrid/on-prem to cloud HPC migrations.
• Strong communication and presentation skills for executive and technical audiences.
• Demonstrated thought leadership in HPC strategy, performance benchmarking, and AWS innovation.