Posted May 11, 2026

AWS Cloud Ops SME Remote (Rockville MD) Fulltime FTE

Role: AWS Cloud Ops SME
Location: Rockville MD (Remote)
Duration: Full-time FTE
Experience: 8-10+ years required.

Required Technical Skills:
• AWS, Terraform, IaC, Python
• AWS cloud infrastructure management
• Control Tower, AWS Organizations policies and management
• Multi-account deployment and management
• AWS Backup and SSM patching processes, in detail
• AMI deployments and pushing configuration to multiple accounts
• AWS EC2, ECS, EKS, RDS, S3, SageMaker, CloudFront, Lambda, etc.
• AWS S3, SFTP and site externalization methods
• IaC - Terraform, CloudFormation templates and Python
• IAM policies, access management and restrictions
• AWS networking - VPC, ALB, NLB, Transit Gateway, WAF
• Azure AD SSO and App Proxy
• CI/CD and basic DevOps
• Linux OS troubleshooting, Bash and Ansible
• Any Windows AD skills would be an added advantage

Responsibilities
• Oversee the management and maintenance of cloud infrastructure, ensuring high availability and reliability.
• Act as the primary point of contact for all cloud infrastructure issues and escalations.
• Ensure cloud resources are optimally configured and managed to meet performance and cost objectives.
• Implement and maintain monitoring solutions to track the health and performance of cloud infrastructure.
• Drive major incidents and potential incidents end to end, with periodic updates to client stakeholders for approvals and recommendations.
• Ensure due diligence and impact analysis for all changes implemented in the cloud platforms.
• Lead and mentor a team of cloud engineers and administrators, fostering a collaborative and high-performing work environment.
• Provide guidance and support to team members, facilitating their professional development and growth.
• Coordinate and manage the team's daily activities, ensuring alignment with organizational goals and priorities.
• Lead the response to cloud-related incidents, ensuring timely resolution and minimal impact on business operations.
• Develop and implement incident management processes and procedures.
• Perform root cause analysis and implement preventive measures to avoid recurrence of issues.
• Identify opportunities to automate repetitive tasks and processes to improve efficiency and reduce operational overhead.
• Develop and implement automation scripts and tools, leveraging Infrastructure as Code (IaC) practices.
• Continuously evaluate and improve cloud operations processes and procedures.
• Ensure cloud infrastructure adheres to security policies, standards, and best practices.
• Implement and maintain security controls to protect cloud resources and data.
• Ensure compliance with regulatory requirements and industry standards (e.g., GDPR, HIPAA).
• Monitor and analyze cloud resource usage, ensuring efficient utilization and avoiding over-provisioning.
• Conduct capacity planning to support future growth and demand.
• Implement cost management strategies to optimize cloud spending.
• Develop and implement disaster recovery and business continuity plans for cloud infrastructure.
• Ensure regular testing and validation of disaster recovery procedures.
• Ensure cloud infrastructure is resilient and can recover quickly from failures or disruptions.
• Work closely with other IT teams, business units, and stakeholders to understand requirements and deliver cloud solutions that meet their needs.
• Collaborate with vendors and service providers to evaluate and integrate new cloud technologies and services.
• Communicate effectively with stakeholders, providing regular updates on cloud operations and performance.
• Maintain comprehensive documentation of cloud infrastructure, configurations, processes, and procedures.
• Generate regular reports on cloud performance, incidents, and operational metrics.
• Ensure documentation is up to date and accessible to relevant stakeholders.