Sr DevOps Engineer

Job Title: Sr DevOps Engineer

Location: Chicago, IL or Dallas, TX (3 days – Hybrid)

Duration: 12+ Months

Role Summary:

We are seeking a highly experienced Senior DevOps Engineer (Production Support) with deep expertise in AWS, Kubernetes, CI/CD, and cloud-native platforms. This role will focus on operating, stabilizing, and continuously improving production environments, ensuring high availability, performance, and scalability of mission-critical applications.
The ideal candidate is a hands-on DevOps/SRE professional who thrives in fast-paced production environments and can automate, troubleshoot, and optimize distributed systems at scale.
You will work extensively with AWS, Kubernetes (Rancher), Jenkins, GitHub, Terraform, Kafka, Harness, and Python while partnering with engineering, platform, and product teams. 

Key Responsibilities:

Production Operations & Reliability

  • Provide L2/L3 production support for cloud-native applications running on AWS and Kubernetes.
  • Own incident triage, root cause analysis (RCA), and resolution for high-severity production issues.
  • Participate in on-call rotations and drive post-incident improvements.
  • Improve system reliability, resilience, and observability using SRE best practices. 

AWS & Cloud Infrastructure

  • Design and operate scalable AWS environments using:
      • EC2, EKS, VPC, ALB/NLB
      • S3, RDS, DynamoDB
      • IAM, CloudWatch, EventBridge
  • Optimize cloud cost, performance, and security posture.
  • Implement multi-account, multi-region architectures. 

Kubernetes & Container Platforms

  • Manage and operate Kubernetes clusters (Rancher-managed or EKS).
  • Troubleshoot:
      • Pod failures
      • Resource constraints
      • Networking issues (CNI, ingress)
      • Stateful workloads
  • Improve:
      • Autoscaling strategies
      • Cluster resilience
      • Deployment reliability

CI/CD & Developer Enablement

  • Design and maintain CI/CD pipelines using:
      • Jenkins
      • GitHub Actions
      • Harness (preferred)
  • Implement:
      • Blue/green and canary deployments
      • GitOps workflows
      • Automated rollbacks
  • Enable developer self-service deployment platforms. 

Infrastructure as Code & Automation

  • Build and maintain infrastructure using:
      • Terraform (primary)
      • Python automation
  • Develop reusable:
      • IaC modules
      • Platform templates
      • Deployment accelerators
  • Automate provisioning, scaling, and recovery workflows. 

Kafka & Streaming Platforms

  • Design and manage Kafka infrastructure including:
      • Clusters, topics, brokers
      • Producers/consumers
      • Schema evolution
  • Ensure:
      • High availability
      • Throughput optimization
      • Secure connectivity
  • Integrate Kafka with AWS and Kubernetes ecosystems. 

Observability & Platform Health

  • Implement monitoring and alerting using CloudWatch / Splunk Observability.
  • Define:
      • SLIs/SLOs
      • Alerting thresholds
      • Runbooks
  • Proactively identify bottlenecks and prevent outages. 

Security & Compliance

  • Implement DevSecOps best practices:
      • Secrets management
      • IAM least privilege
      • Container scanning
      • Supply chain security
  • Ensure infrastructure adheres to security and compliance standards. 

Collaboration & Continuous Improvement

  • Partner with development teams to:
      • Improve deployment maturity
      • Reduce operational toil
      • Increase automation coverage
  • Drive:
      • Platform standardization
      • Developer experience improvements
      • Operational excellence initiatives

Qualifications

Experience

  • 4–10 years in DevOps / SRE / Production Support roles
  • Strong experience managing production-grade cloud environments
  • Proven track record in live incident management

Technical Skills

Must Have

  • AWS (deep hands-on)
  • Kubernetes (EKS/Rancher)
  • Splunk
  • Terraform
  • Jenkins / GitHub
  • Kafka
  • Python or Shell scripting
  • Linux systems expertise 

Good to Have

  • Harness CI/CD
  • GitOps (ArgoCD/Flux)
  • Service mesh (Istio/Linkerd)
  • Observability tools (New Relic, Datadog, Prometheus)
  • Platform engineering mindset 

Soft Skills

  • Strong troubleshooting and debugging mindset
  • Excellent communication during incidents
  • Ability to work in high-pressure production environments
  • Ownership-driven and automation-first approach 

Mandatory: Overall DevOps, AWS, Kubernetes/Helm, Terraform/Ansible, Jenkins/Harness, Python/Groovy scripting, Linux, Splunk, Production Support

Secondary: Claude Code, Rancher, DataOps, Consul, Kafka, DevSecOps

 

Job ID:

8863

Location:

Texas, United States

Date Posted:

February 20, 2026
