LEAD DEVOPS
Job Overview
We are on an exciting journey to build and scale our advanced analytics practice. TE is looking for a Senior Machine Learning (ML) Ops Engineer with experience in defining, building, productionizing, and operating ML workloads. This role is expected to provide thought leadership around architectural best practices by leveraging experience and current industry trends.
As a Senior Machine Learning (ML) Ops Engineer, you will work with data scientists to deploy data science models to our cloud platform using ML and AWS technologies such as SageMaker. Along with model training and deployment in a production environment, you will be responsible for orchestrating the supporting processes, such as data cleaning, preprocessing, data management, auditing, logging, monitoring, and security. You will use your expertise to provide recommendations around security, cost, performance, reliability, and operational efficiency to accelerate projects.
The ideal candidate is passionate about data science and stays abreast of the latest developments in the field. You will mature our machine learning engineering processes, which are implemented through a toolchain and built around DevOps practices to orchestrate the components of the ML lifecycle.
Job Responsibilities
- Understand current state architecture, including pain points.
- Create and document future state architectural options to address specific issues or initiatives using Machine Learning.
- Innovate and scale architectural best practices around building and operating ML workloads by collaborating with stakeholders across the organization.
- Develop CI/CD and ML pipelines that support the end-to-end ML model development lifecycle, from data preparation and feature engineering to model deployment and retraining.
- Deploy and maintain machine learning models within scalable web applications, ensuring high availability and low latency.
- Implement comprehensive logging, monitoring, and alert systems to detect and address model degradation, data drift, and anomalies in production.
- Provide and implement recommendations around security, cost, performance, reliability, and operational efficiency.
- Provide thought leadership around the use of industry standard tools and models (including commercially available models and tools) by leveraging experience and current industry trends.
- Collaborate with the Enterprise Architect, consulting partners and client IT team as warranted to establish and implement strategic initiatives.
- Make recommendations and assess proposals for optimization.
- Identify operational issues and recommend and implement strategies to resolve problems.
Job Requirements
- B.S. or Master’s degree in computer science or AI, with 5+ years of equivalent experience.
- 5+ years of experience developing CI/CD and ML pipelines for end-to-end productionization of ML models and workloads.
- Strong knowledge of ML operations and DevOps workflows and tools such as Git, AWS CodeBuild and CodePipeline, Jenkins, AWS CloudFormation, and others.
- Strong knowledge of the AWS cloud and its technologies, such as S3, Redshift, Athena, Glue, and SageMaker.
- Strong proficiency with containerization and orchestration technologies (Docker, Kubernetes, etc.).
- Strong programming skills, with high proficiency in languages such as Python and R.
- Background in ML algorithm development, AI/ML platforms, deep learning, and ML operations in a cloud environment.
- Knowledge of LLM ops is preferred; otherwise, the candidate should be willing to learn quickly.
- Working knowledge of databases, data warehouses, data preparation and integration tools, along with big data parallel processing layers such as Apache Spark or Hadoop
- Knowledge of pure and applied math, ML and DL frameworks, and ML techniques, such as random forest and neural networks
- Ability to collaborate with data scientists, data engineers, leaders, and other IT teams.
- Ability to work on multiple projects and work streams at one time and to deliver results by project deadlines.
- Willing to flex daily work schedule to allow for time-zone differences for global team communications
- Strong interpersonal and communication skills
We Value
- Strong problem-solving capabilities. Results oriented. Relies on fact-based logic for decision-making.
- Time management skills: ability to manage multiple projects and designs at once, with changing requirements during the intake process.
- Experience working in an Agile environment
- Ability to work in a fast-paced, resource-constrained environment to deliver value to the business.
- Certification in LLM ops or ML ops is preferred.
Singapore, 01, SG, 239920