What jobs are available for DevOps Engineers in Malaysia?

Showing 809 DevOps Engineer jobs in Malaysia

Site Reliability Engineer

Cyberjaya HCL Singapore Pte Ltd

Posted 1 day ago

Job Description

Responsibilities

Administer and support VMware environments including VCF, VCD, NSX, ESXi, vCenter, vSAN, vRA/vRO, and Tanzu.

Design, implement, and maintain automation scripts and tools to improve system reliability and operational efficiency.

Provide expert-level technical support and troubleshooting for infrastructure-related issues.

Collaborate with development and operations teams to integrate CI/CD pipelines and DevOps practices.

Ensure system security, compliance, and performance through proactive monitoring and maintenance.

Document procedures, configurations, and best practices for internal knowledge sharing.

Qualifications

Proven experience as a Senior System Engineer or Cloud Administrator in enterprise environments.

Strong background in customer-facing support roles with a focus on reliability and service excellence.

Hands‑on experience managing multiple VMware products such as vCenter, vSphere, NSX, and vSAN.

Solid understanding of Linux system administration and networking fundamentals.

Familiarity with DevOps methodologies and tools including CI/CD pipelines and infrastructure as code.

Tools & Systems

Automation & Configuration Management: Ansible

Networking & Security: Firewalls, VPNs, VLANs, IDS/IPS

Work set-up: Onsite

Site Reliability Engineer

Sepang HCLTech

Posted 1 day ago

Job Description

Administer and support VMware environments including VCF, VCD, NSX, ESXi, vCenter, vSAN, vRA/vRO, and Tanzu.

Design, implement, and maintain automation scripts and tools to improve system reliability and operational efficiency.

Provide expert-level technical support and troubleshooting for infrastructure-related issues.

Collaborate with development and operations teams to integrate CI/CD pipelines and DevOps practices.

Ensure system security, compliance, and performance through proactive monitoring and maintenance.

Document procedures, configurations, and best practices for internal knowledge sharing.

Requirements

Proven experience as a Senior System Engineer or Cloud Administrator in enterprise environments.

Strong background in customer-facing support roles with a focus on reliability and service excellence.

Hands‑on experience managing multiple VMware products such as vCenter, vSphere, NSX, and vSAN.

Solid understanding of Linux system administration and networking fundamentals.

Familiarity with DevOps methodologies and tools including CI/CD pipelines and infrastructure as code.

Tools & Systems

Automation & Configuration Management: Ansible

Networking & Security: Firewalls, VPNs, VLANs, IDS/IPS

Site Reliability Engineer

Kuala Lumpur, Kuala Lumpur Jobstreet Malaysia

Posted 1 day ago

Job Description

Infrastructure & Server Design: Plan, configure, and optimize on-premise and cloud servers, including DNS, IP routing, and load balancing;

Network Management: Manage network devices, bandwidth, and traffic; implement QoS for critical applications;

System Deployment: Prepare staging or production environments, ensure scalability, backups, and coordinate application deployments;

Security & Compliance: Maintain server or network security, access control, encryption, vulnerability assessments, and incident response;

Network & Edge Architecture: Design and implement multi-CDN, low-latency delivery stacks with caching, security, streaming, and failover strategies;

Monitoring & Maintenance: Track performance and uptime, tune systems, perform preventive maintenance, and manage backups;

Data Planning & Analytics: Analyze system usage and performance, generate reports, and recommend infrastructure improvements;

Documentation & Support: Maintain diagrams or configurations, provide advanced technical support, and coordinate with vendors or data centers.

Requirements

Bachelor’s degree in Computer Science, Information Technology, or a related field;

Minimum 3–5 years in IT infrastructure, system administration, or site reliability engineering;

Strong problem-solving abilities and experience with performance tuning and system optimization;

Excellent communication and teamwork skills across technical and non-technical teams;

Detail-oriented with strong documentation and reporting habits;

Proficient in English; the ability to speak Mandarin is an advantage (for dealing with Mandarin-speaking management and clients).

Full-Time Job (Permanent Employee);

5 Working Days / Week;

Medical Benefit;

Annual Bonus, 13th Month salary;

Increment Adjustment.

Site Reliability Engineer

Kuala Lumpur, Kuala Lumpur Razer Inc.

Posted 1 day ago

Job Description

Razer Inc. offers a Site Reliability Engineer role based in Bangsar South, Federal Territory of Kuala Lumpur, Malaysia.

Joining Razer will place you on a global mission to revolutionize the way the world games. Razer is a place to do great work, offering you the opportunity to make an impact globally while working with a team spread across 5 continents. Razer is also a great place to work, providing a unique, gamer‑centric experience that will accelerate your growth, both personally and professionally.

We are seeking a skilled and driven Site Reliability Engineer (SRE) to join Razer Gold’s growing infrastructure and platform engineering team. The ideal candidate will have hands‑on experience in Amazon Web Services (AWS), strong troubleshooting capabilities, and a passion for building scalable, observable, and resilient systems using modern Infrastructure as Code (IaC) and automation tools.

Job Description

Design, develop, and maintain Infrastructure as Code (IaC) using tools such as Terraform or AWS CloudFormation.

Implement and operate reliable, scalable cloud infrastructure primarily on AWS (e.g., EC2, ECS, RDS, S3, Lambda, ElastiCache, SQS, SES, Auto‑Scaling, Load Balancers).

Lead and participate in architecture reviews focusing on reliability, scalability, security, and performance.

Develop and manage robust monitoring, alerting, and logging solutions (e.g., CloudWatch, Prometheus, Grafana, ELK) to detect and resolve issues proactively.

Perform incident management, post‑mortems, root cause analysis, and implement continuous improvement strategies.

Collaborate with software engineering teams to improve CI/CD pipelines, deployment automation, and release management.

Automate infrastructure operations, reduce manual toil, and improve reliability using scripting (Python, Bash, Node.js, or Ruby).

Maintain and troubleshoot environments involving web servers, databases, firewalls, DNS, load balancers, and networking.

Ensure systems are compliant with security standards, including patching, hardening, and secure access policies.

Provide on‑call support and participate in incident rotations.

Monitor and maintain service‑level objectives (SLOs), SLAs, and error budgets to ensure reliability targets are met.

Support and resolve assigned incidents and tickets.

Requirements

Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or a related field.

Minimum 2 years of experience in SRE, DevOps, Cloud Infrastructure, or Systems Administration roles.

Solid hands‑on experience with AWS Cloud services including (but not limited to) Compute (EC2, Lambda, ECS, Auto Scaling), Networking (VPC, Load Balancers, Route 53), Messaging & Storage (SQS, S3, RDS, ElastiCache, SES), Monitoring (CloudWatch, X‑Ray).

Proficient in Infrastructure as Code using Terraform and/or CloudFormation.

Experience with CI/CD tools (e.g., GitLab CI, Jenkins, CodePipeline, ArgoCD).

Strong understanding of Linux and Windows system administration and troubleshooting.

Comfortable with one or more scripting or programming languages such as Python, Node.js, Bash, or Ruby, and with data formats such as JSON and YAML, for automation.

Strong grasp of network fundamentals, including DNS, HTTP(S), TLS/SSL, firewalls, and TCP/IP.

Experience with containerization and orchestration (Docker, ECS, or Kubernetes is a plus).

Familiar with observability tools and incident management best practices.

Pre‑Requisites

Are you game?

Seniority Level

Entry level

Employment Type

Full‑time

Job Function

Engineering and Information Technology

Computers and Electronics Manufacturing

Site Reliability Engineer

Kuala Lumpur, Kuala Lumpur FINEXUS Group

Posted 1 day ago

Job Description

System Reliability & Operations

Ensure high availability and reliability of IT systems, applications, and PCI DSS‑certified data centres, supporting both internal operations and client‑facing platforms.

Perform system administration (Linux and Windows servers), including installation, configuration, patching, monitoring, and performance tuning.

Manage data storage, backup, and disaster recovery planning (DRP) to ensure data integrity, resilience, and compliance with industry standards.

Conduct capacity planning and lifecycle management of infrastructure resources, ensuring optimal performance and scalability.

Define and monitor Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets to measure and improve reliability.

Implement chaos testing and fault‑injection practices to proactively identify weaknesses and improve system resilience.

Optimize observability and alerting systems (e.g., Prometheus, Grafana, ELK, Nagios or equivalent) to ensure actionable insights and minimal alert fatigue.

Security & Compliance

Implement and maintain system and network security controls, including firewall management, VPN, identity/access management, and endpoint security.

Ensure compliance with BNM RMiT, PCI DSS, and ISO 27001 standards, supporting internal and external audits.

Manage system logs and integrate with SIEM platforms to strengthen monitoring and incident response capabilities.

Support vulnerability management programs by coordinating with Security Operations teams for timely patching and remediation.

Participate in risk assessment and security architecture reviews, ensuring SRE practices align with compliance requirements.

Cloud, Containerization & Automation

Support and optimize hybrid cloud environments (AWS, Azure, GCP) to align with Finexus’ cloud strategy and cost efficiency.

Deploy, configure, and maintain Kubernetes clusters (SUSE Rancher Prime) and containerized workloads to improve scalability and reliability.

Build and maintain CI/CD pipelines for automated deployment, testing, and operational efficiency.

Automate configuration and patch management using tools such as Ansible, Puppet, or equivalent.

Implement Infrastructure as Code (IaC) using Terraform or equivalent for consistent and auditable environment provisioning.

Develop auto‑healing and self‑recovery automation scripts to reduce manual interventions and mean time to recovery (MTTR).

Implement cost optimization and performance monitoring for cloud and container workloads.

Networking & Core Services

Administer and troubleshoot DNS, DHCP, VPN, load balancers, and core network services to ensure smooth operations.

Support virtualization platforms (such as Proxmox) and physical server infrastructure within Finexus data centres.

Integrate network observability tools for real‑time visibility into latency, bandwidth, and routing anomalies.

Collaborate on zero‑trust network segmentation and service mesh integration for improved security and reliability.

Monitoring & Support

Provide on‑call support on a rotational basis for production issues and incidents, ensuring rapid resolution and minimal downtime.

Collaborate with application, database, and security teams to deliver reliable, compliant, and high-performance services for clients.

Lead post‑incident reviews (PIRs) and blameless retrospectives to identify root causes and preventive actions.

Maintain runbooks and operational documentation to streamline response and improve knowledge transfer.

Leverage AIOps or event‑correlation tools to enhance proactive incident detection and reduce false positives.

Job Requirements

Bachelor’s or Master’s Degree in Computer Science, Information Technology, Engineering, or related field.

4+ years of experience in Site Reliability Engineering, System Administration, or IT Infrastructure.

Proven experience in Linux and Windows system administration.

Hands‑on experience with cloud operations (AWS, Azure, GCP) and container orchestration (Kubernetes, Rancher).

Strong knowledge of networking, firewalls, DNS, DHCP, VPN, and enterprise security best practices.

Experience in database management (MySQL, PostgreSQL, or equivalent), including backup, tuning, and recovery.

Knowledge of compliance frameworks (PCI DSS, ISO 27001, BNM RMiT) is highly desirable.

Strong problem‑solving and troubleshooting skills in mission‑critical environments.

Excellent communication skills in English and Malay (spoken and written).

Ability to work independently and collaboratively in a fast‑paced, regulated technology environment.

Experience with SRE toolchains: Prometheus, Grafana, ELK, Terraform, Ansible, Jenkins, GitLab CI/CD, or equivalent.

Possession of relevant certifications, including AWS Certified SysOps Administrator, RHCE, Kubernetes Administrator (CKA), or ISO 27001 Implementer, will be considered an added advantage.

Seniority level

Associate

Employment type

Full‑time

Job function

Engineering, Administrative, and Information Technology

Industries

Technology, Information and Media

Site Reliability Engineer

Kuala Lumpur, Kuala Lumpur Canonical

Posted 3 days ago

Job Description

Overview

Site Reliability Engineer role at Canonical. Location: Globally remote.

What we do

We deploy and run OpenStack, Kubernetes, storage solutions, and open source applications, applying DevOps practices.

What you will do

To succeed in this role, you need to have a strong background in Linux, Python, networking, and knowledge of how clouds work. Your work will encompass the entire stack, from bare-metal networking and kernel up to Kubernetes and open source applications. You can expect to be trained in our core technologies like OpenStack, Kubernetes, security standards, open source products like Kubeflow, Kafka, OpenSearch, databases, and many others. Automation for us is a software engineering problem that we approach with a scientific mindset to bring operations at scale, driven by metrics and code.

Responsibilities

Deploy and run OpenStack, Kubernetes, storage solutions, and open source applications, applying DevOps practices.

Identify and address incidents, monitor and observe applications, anticipate potential issues, and enable product refinement to achieve high-quality standards in our open source portfolio.

Work across the full open source infrastructure stack from bare metal to containers.

Collaborate with global teams to support mission-critical services for brand-name customers.

Qualifications

Degree in software engineering or computer science

Python software development experience

Operational experience in Linux environments

Experience with Kubernetes deployment or operations

Excellent interpersonal skills, curiosity, flexibility, and accountability

Ability to travel internationally twice a year, for company events up to two weeks long

Bonus skills

Familiarity with OpenStack deployment or operations

Familiarity with public cloud deployment or operations

Familiarity with private cloud management

What we offer

We offer a distributed work environment with in-person team sprints twice a year, personal learning and development budget, regular compensation reviews, and a range of benefits to reflect our values and global footprint.

Site Reliability Engineer

Canonical

Posted 7 days ago

Job Description

Overview

Site Reliability Engineer role at Canonical. Location: Globally remote.

What we do

We deploy and run OpenStack, Kubernetes, storage solutions, and open source applications, applying DevOps practices.

What you will do

To succeed in this role, you need to have a strong background in Linux, Python, networking, and knowledge of how clouds work. Your work will encompass the entire stack, from bare-metal networking and kernel up to Kubernetes and open source applications. You can expect to be trained in our core technologies like OpenStack, Kubernetes, security standards, open source products like Kubeflow, Kafka, OpenSearch, databases, and many others. Automation for us is a software engineering problem that we approach with a scientific mindset to bring operations at scale, driven by metrics and code.

Responsibilities

Deploy and run OpenStack, Kubernetes, storage solutions, and open source applications, applying DevOps practices.

Identify and address incidents, monitor and observe applications, anticipate potential issues, and enable product refinement to achieve high-quality standards in our open source portfolio.

Work across the full open source infrastructure stack from bare metal to containers.

Collaborate with global teams to support mission-critical services for brand-name customers.

Qualifications

Degree in software engineering or computer science

Python software development experience

Operational experience in Linux environments

Experience with Kubernetes deployment or operations

Excellent interpersonal skills, curiosity, flexibility, and accountability

Ability to travel internationally twice a year, for company events up to two weeks long

Bonus skills

Familiarity with OpenStack deployment or operations

Familiarity with public cloud deployment or operations

Familiarity with private cloud management

What we offer

We offer a distributed work environment with in-person team sprints twice a year, personal learning and development budget, regular compensation reviews, and a range of benefits to reflect our values and global footprint.

Site Reliability Engineer

Kuala Lumpur, Kuala Lumpur Ampstek

Posted 12 days ago

Job Description

Position Summary

We are looking for a skilled Site Reliability Engineer (SRE) to join our technology operations team. The ideal candidate will be responsible for building scalable, reliable, and high-performance systems while ensuring continuous uptime and operational excellence. The SRE will work closely with development, DevOps, and infrastructure teams to automate processes, enhance observability, and improve system resilience.

Key Responsibilities

Design, build, and maintain highly available and scalable infrastructure across cloud and on-premise environments.

Implement monitoring, alerting, and incident response systems using tools such as Prometheus, Grafana, ELK, or Splunk.

Automate deployment, scaling, and operations using Infrastructure-as-Code (IaC) tools like Terraform, Ansible, or CloudFormation.

Drive CI/CD pipeline enhancements and ensure seamless integration and deployment workflows (e.g., Jenkins, GitLab CI, or Azure DevOps).

Collaborate with development teams to improve system reliability, observability, and performance.

Troubleshoot production issues, perform root cause analysis (RCA), and implement long-term fixes.

Manage incident response and postmortems, reducing Mean Time To Recovery (MTTR).

Work with Kubernetes/Docker environments to support microservices and containerized deployments.

Ensure robust disaster recovery and backup strategies, along with adherence to security and compliance requirements.

Must-Have Skills

Strong experience as an SRE, DevOps Engineer, or Cloud Infrastructure Engineer in large-scale production environments.

Proficiency in Linux/Unix system administration and shell scripting.

Hands-on experience with cloud platforms (AWS, Azure, or GCP).

Expertise in containerization and orchestration tools such as Docker and Kubernetes.

Experience with CI/CD tools (Jenkins, GitLab CI, or Azure DevOps).

Knowledge of Infrastructure-as-Code tools (Terraform, Ansible, or CloudFormation).

Familiarity with monitoring and observability tools (Prometheus, Grafana, ELK Stack, Splunk, Datadog, or New Relic).

Experience in automating repetitive tasks using Python, Bash, or Go.

Seniority level

Mid-Senior level

Employment type

Contract

Job function

Information Technology

Industries

IT Services and IT Consulting

Site Reliability Engineer

Petaling Jaya, Selangor GXBank

Posted 15 days ago

Job Description

Get To Know Our Company

GX Bank Berhad - the Grab-led Digital Bank - is the FIRST digital bank in Malaysia, approved by BNM to commence operations. We aim to leverage technology and innovation to serve the financial needs of the unserved and underserved individuals, and micro and small medium enterprises.

We are driven by our shared purpose and passion to bring positive transformation to the banking industry, starting with solutions that address the financial struggles of Malaysians and businesses.

Get To Know The Role

As a Site Reliability Engineer (SRE) you will help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. Much of our support and software development focuses on optimizing existing systems, building infrastructure and reducing work through automation. You’ll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks. As an SRE you’ll be focused on running better production applications and systems. SRE is a key contributor to core infrastructure and functional development teams throughout the life cycle to help support software for reliability and scale. Key areas of focus include automation, application/platform uptime and quality, packaging/distribution techniques, platform design “operability”, analytics, deployment, adoption, and tool development, among others. The position will wear many hats from owning day to day health and performance, to identifying incidents/developing remediation plans, to working with open source software and experienced packaging techniques, to working with development teams and contributing to the strategic roadmap and execution. Candidates from a variety of software, platform, or automation engineering backgrounds will be considered for this position.

The Day-to-day Activities

Assist in projects across teams, learning from design through to implementation and rollout.

Proactively identify and troubleshoot basic issues across the infrastructure stack and application codebase to ensure system reliability and performance under guidance.

Contribute to the design and improvement of automated infrastructure, aligned with Infrastructure-as-Code (IaC) principles.

Support operational excellence by identifying recurring issues and assisting in the implementation of automation to eliminate them.

Collaborate with engineering teams to enhance system reliability, scalability, and performance.

Actively participate in knowledge sharing and best practices within the team to support growth.

Participate in on-call rotation to ensure maximum service availability.

The Must Haves

Basic knowledge of cloud infrastructure across AWS, GCP, and Azure, along with container orchestration technologies such as Kubernetes and Docker; any relevant certifications will be a plus.

Some hands-on experience with Infrastructure as Code (IaC) tools including Terraform, CloudFormation, and Ansible.

Familiar with observability tools such as Datadog, CloudWatch, Prometheus or ELK stack for effective monitoring and logging.

Proficient in one or more scripting or programming languages, such as Bash, Python, Go, or JavaScript.

Basic understanding of networking fundamentals and internet protocols.

Ability to learn and assist in building and maintaining CI/CD pipelines (GitLab CI/CD, GitHub and Jenkins).

Consistently applies a strong security-first mindset across all tasks and responsibilities.

Seniority level

Entry level

Employment type

Full-time

Job function

Engineering and Information Technology

Industries

Banking

Site Reliability Engineer

Petaling Jaya, Selangor GX Bank Berhad

Posted 15 days ago

Job Description

Location: Petaling Jaya (First Avenue)

Time type: Full time

Posted 9 days ago

Job requisition ID: R-

Get to know our Company:

GX Bank Berhad - the Grab-led Digital Bank - is the FIRST digital bank in Malaysia, approved by BNM to commence operations. We aim to leverage technology and innovation to serve the financial needs of the unserved and underserved individuals, and micro and small medium enterprises. We are driven by our shared purpose and passion to bring positive transformation to the banking industry, starting with solutions that address the financial struggles of Malaysians and businesses.

Get to know the Role:

As a Site Reliability Engineer (SRE) you will help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems.

Much of our support and software development focuses on optimizing existing systems, building infrastructure and reducing work through automation.

You’ll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks.

As an SRE you’ll be focused on running better production applications and systems.

SRE is a key contributor to core infrastructure and functional development teams throughout the life cycle to help support software for reliability and scale.

Key areas of focus include automation, application/platform uptime and quality, packaging/distribution techniques, platform design “operability”, analytics, deployment, adoption, and tool development, among others.

The position will wear many hats from owning day to day health and performance, to identifying incidents/developing remediation plans, to working with open source software and experienced packaging techniques, to working with development teams and contributing to the strategic roadmap and execution.

Candidates from a variety of software, platform, or automation engineering backgrounds will be considered for this position.

The day-to-day activities:

Assist in projects across teams, learning from design through to implementation and rollout.

Proactively identify and troubleshoot basic issues across the infrastructure stack and application codebase to ensure system reliability and performance under guidance.

Contribute to the design and improvement of automated infrastructure, aligned with Infrastructure-as-Code (IaC) principles.

Support operational excellence by identifying recurring issues and assisting in the implementation of automation to eliminate them.

Collaborate with engineering teams to enhance system reliability, scalability, and performance.

Actively participate in knowledge sharing and best practices within the team to support growth.

Participate in on-call rotation to ensure maximum service availability.

The Must Haves:

Basic knowledge of cloud infrastructure across AWS, GCP, and Azure, along with container orchestration technologies such as Kubernetes and Docker; any relevant certifications will be a plus.

Some hands-on experience with Infrastructure as Code (IaC) tools including Terraform, CloudFormation, and Ansible.

Familiar with observability tools such as Datadog, CloudWatch, Prometheus or ELK stack for effective monitoring and logging.

Proficient in one or more scripting or programming languages, such as Bash, Python, Go, or JavaScript.

Basic understanding of networking fundamentals and internet protocols.

Ability to learn and assist in building and maintaining CI/CD pipelines (GitLab CI/CD, Github and Jenkins).

Consistently applies a strong security-first mindset across all tasks and responsibilities.

GX Bank Berhad (GXBank) is Malaysia’s first digital bank that commenced operation on 1 September 2023. With a workforce of more than 95% Malaysians from both the finance and technology sectors, the bank aims to disrupt the current banking industry with customised innovative solutions that empower Malaysians to be financially resilient and support their financial goals. Powered by Grab, GXBank is a subsidiary of GXS Bank Pte. Ltd., – the digital bank joint venture between Grab Holdings Limited and Singapore Telecommunications Limited (Singtel) – and a consortium of other Malaysian investors, including Kuok Group.
