SR. DEVOPS ENGINEER

At TE, you will unleash your potential working with people from diverse backgrounds and industries to create a safer, sustainable and more connected world.

Job Overview

The Singapore AI-Hub at TE is developing applications and tools to support the integration of AI into the product development process. We are on an exciting journey to build and scale the AI-Hub team. TE is looking for a Staff Machine Learning (ML/LLM) Ops Engineer with experience in defining, building, productionizing, and operating ML/LLM workloads. This role is expected to provide thought leadership on architectural best practices by drawing on experience and current industry trends.
As a Staff Machine Learning (ML/LLM) Ops Engineer, you will work with data scientists to deploy data science models to our cloud platform using ML and AWS technologies such as SageMaker. Along with model training and deployment in a production environment, you will be responsible for orchestrating the surrounding processes, such as data cleaning, preprocessing, data management, auditing, logging, monitoring, and security. You will use your expertise to provide recommendations on security, cost, performance, reliability, and operational efficiency to accelerate projects.

The ideal candidate is passionate about data science and stays abreast of the latest developments in the field. You will mature our machine learning engineering processes, which are implemented by a toolchain and built around DevOps practices to orchestrate the components of the ML lifecycle.

Job Responsibilities

Understand current state architecture, including pain points.
Create and document future state architectural options to address specific issues or initiatives using Machine Learning.
Innovate and scale architectural best practices around building and operating ML workloads by collaborating with stakeholders across the organization.
Develop CI/CD and ML pipelines that cover the end-to-end ML model development lifecycle, from data preparation and feature engineering to model deployment and retraining.
Deploy and maintain machine learning models within scalable web applications, ensuring high availability and low latency.
Implement comprehensive logging, monitoring, and alert systems to detect and address model degradation, data drift, and anomalies in production.
Provide recommendations around security, cost, performance, reliability, and operational efficiency, and implement them.
Provide thought leadership around the use of industry standard tools and models (including commercially available models and tools) by leveraging experience and current industry trends.
Collaborate with the Enterprise Architect, consulting partners and client IT team as warranted to establish and implement strategic initiatives.
Make recommendations and assess proposals for optimization.
Identify operational issues and recommend and implement strategies to resolve problems.

Job Requirements

B.S. or Master’s degree in computer science or AI, with 12+ years of equivalent experience.
5+ years of experience in developing CI/CD and ML pipelines for end-to-end productionization of ML models and workloads.
10+ years of experience in a software development or DevOps role.
Strong knowledge of ML operations and DevOps workflows and tools such as Git, AWS CodeBuild and CodePipeline, Jenkins, AWS CloudFormation, and others.
Strong knowledge of the AWS cloud and its technologies such as S3, Redshift, Athena, Glue, SageMaker, etc.
Strong proficiency with containerization and orchestration technologies (Docker, Kubernetes, etc.).
Strong programming skillset with high proficiency in Python, R, etc.
Background in ML algorithm development, AI/ML platforms, deep learning, and ML operations in a cloud environment.
Knowledge of LLM ops is preferred; otherwise, candidates should be willing to learn quickly.
Working knowledge of databases, data warehouses, data preparation and integration tools, along with big data parallel processing layers such as Apache Spark or Hadoop
Knowledge of pure and applied math, ML and DL frameworks, and ML techniques, such as random forest and neural networks
Ability to collaborate with data scientists, data engineers, leaders, and other IT teams.
Ability to work with multiple projects and work streams at one time. Must be able to deliver results based upon project deadlines.
Willingness to flex the daily work schedule to accommodate time-zone differences for global team communications.
Strong interpersonal and communication skills

We Value

Strong problem-solving capabilities. Results oriented. Relies on fact-based logic for decision-making.
Time management skills: ability to manage multiple projects and designs at once, with changing requirements during the intake process.
Experience working in an Agile environment
Ability to work in a fast-paced, resource-constrained environment to deliver value to the business.
Certification in LLM ops or ML ops is preferred.

Location:

Singapore, 01, SG, 239920

City: Singapore

State: 01

Country/Region: SG

Travel: 10% to 25%

Requisition ID: 119646

Alternative Locations:

Function: Engineering & Technology