Senior Machine Learning Engineer

Yerevan, AM

Join SADA, an Insight Company, as a Senior Machine Learning Engineer!

The role

We are hiring a Senior ML Engineer to join our GCP AI/ML delivery practice. You'll lead the modeling work on enterprise engagements where ML models are integrated directly into multi-agent systems, with non-negotiable accuracy targets, automated promotion gates, and customer-facing acceptance milestones. The work spans data exploration, feature engineering, model development across multiple business segments, integration into agent response loops, and operating the full MLOps lifecycle on Vertex AI.

This is a hands-on senior IC role. You'll own model architectures end-to-end, commit to estimated accuracy ranges after initial data exploration, defend those commitments to customer engineering leadership, and ship into production through rigorous CI/CD.

What you will do

Build and tune anomaly detection and regression models across multiple business segments, producing tuned model instances per segment from a shared base architecture. Engineer the feature pipelines, define tolerance bands, and surface model decisions in a form a domain SME can act on.

Commit to per-segment accuracy ranges. After initial data exploration per segment, deliver a written estimated accuracy range and a recommended target threshold. Iterate to hit the agreed acceptance criterion within the budgeted iteration cycles, or co-author the recalibration recommendation if the data won't support it.

Integrate models into agent response loops. Models are not API endpoints in a vacuum; they are invoked by ADK agents through a managed gateway. You'll work with agent engineers to wire model calls into agent reasoning, write the integration tests, and own the end-to-end latency budget.

Migrate legacy ML workloads onto Vertex AI. Translate existing models from platforms such as Dataiku or SageMaker into KFP-based pipeline templates on Vertex AI. Use the migration as the proving ground for the broader MLOps platform.

Own drift detection and retraining. Configure input and output distribution shift detection and accuracy regression checks in the evaluation harness. Define per-model thresholds and cadence, wire alerting and the KFP Continuous Training (CT) workflow, and ensure retraining deploys through the standard CD pipeline with canary, approval gate, and automated rollback.
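
As an illustration of the kind of input-distribution check this covers, here is a minimal sketch of a Population Stability Index (PSI) comparison between a training-time reference sample and recent serving data. The posting does not name a specific drift metric; PSI, the function name, the bin count, and the thresholds in the comments are all assumptions chosen for illustration.

```python
import numpy as np


def population_stability_index(reference, current, bins=10, eps=1e-6):
    """PSI between a reference sample and a current sample of one feature.

    Bin edges come from reference quantiles, so each bin holds roughly the
    same share of the reference data. Illustrative sketch only.
    """
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)

    # Quantile edges; np.unique drops duplicates caused by tied values.
    edges = np.unique(np.quantile(reference, np.linspace(0.0, 1.0, bins + 1)))
    # Keep only interior edges so out-of-range values land in the end bins.
    interior = edges[1:-1]
    n_bins = len(interior) + 1

    ref_idx = np.searchsorted(interior, reference, side="right")
    cur_idx = np.searchsorted(interior, current, side="right")
    ref_frac = np.bincount(ref_idx, minlength=n_bins) / reference.size
    cur_frac = np.bincount(cur_idx, minlength=n_bins) / current.size

    # Clip to avoid log(0) when a bin is empty in one of the samples.
    ref_frac = np.clip(ref_frac, eps, None)
    cur_frac = np.clip(cur_frac, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)
stable = rng.normal(0.0, 1.0, 5000)
shifted = rng.normal(1.0, 1.0, 5000)

psi_stable = population_stability_index(reference, stable)
psi_shifted = population_stability_index(reference, shifted)
# A common rule of thumb (an assumption, not a requirement of this role):
# PSI below 0.1 is stable, above 0.25 warrants investigation.
```

In practice the per-model threshold and the check cadence would be set from each segment's data rather than from a rule of thumb, which is what "define per-model thresholds and cadence" refers to.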

Register everything. Every model goes into Vertex AI Model Registry with a model card capturing ownership, lineage, evaluation results, and lifecycle state. No shadow models.

Stand up Vertex AI Feature Store. Configure Feature Store for low-latency online serving and point-in-time correctness. Be ready to argue for or against simplifying to BigQuery managed online serving based on what discovery shows.
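
Point-in-time correctness means each training row sees only feature values that were already known at that row's event time. The sketch below shows the idea with a plain pandas as-of join rather than the Feature Store API; the table names, column names, and values are hypothetical.

```python
import pandas as pd

# Hypothetical label events: one row per (entity, prediction time).
labels = pd.DataFrame({
    "supplier_id": ["a", "a", "b"],
    "event_ts": pd.to_datetime(["2024-01-10", "2024-02-10", "2024-01-20"]),
})

# Hypothetical feature snapshots, each stamped with when it became available.
features = pd.DataFrame({
    "supplier_id": ["a", "a", "b", "b"],
    "feature_ts": pd.to_datetime(
        ["2024-01-01", "2024-02-01", "2024-01-15", "2024-02-15"]
    ),
    "avg_lead_time_days": [12.0, 15.0, 7.0, 9.0],
})


def point_in_time_join(labels, features):
    """Attach the latest feature snapshot at or before each label's event_ts."""
    return pd.merge_asof(
        labels.sort_values("event_ts"),      # merge_asof needs sorted keys
        features.sort_values("feature_ts"),
        left_on="event_ts",
        right_on="feature_ts",
        by="supplier_id",                    # match within the same entity
        direction="backward",                # never look into the future
    )


training_frame = point_in_time_join(labels, features)
```

A naive join on `supplier_id` alone would pair the January label for supplier "a" with the February snapshot as well, which leaks future information into training.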

Contribute reusable Terraform blueprints. Co-author Vertex AI and Feature Store modules and contribute them back to customer Cloud Foundation repositories as engagement deliverables.

Required experience

5+ years of applied ML engineering, with at least 2 years productionizing models at scale. You have shipped models that real users or downstream systems depend on, not just notebooks.

Deep Vertex AI experience. Vertex AI Model Registry, Vertex AI Pipelines (KFP), Vertex AI Experiments, Vertex AI Feature Store, and Vertex AI Workbench. You can stand a project up from zero, not just consume an existing one.

Strong regression and anomaly detection chops. Gradient boosting (XGBoost / LightGBM / CatBoost), classical statistical anomaly methods (IQR, isolation forests, robust z-scores), and at least one deep approach (autoencoders, normalizing flows). You can defend an architecture choice with empirical results, not preferences.
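
For reference, two of the classical methods named above, robust z-scores and the IQR rule, can be sketched in a few lines of NumPy. The function names, the 0.6745 scaling constant, the 3.5 cutoff, and the sample values are conventional or illustrative choices, not anything specified by this role.

```python
import numpy as np


def robust_z_scores(values):
    """Z-scores based on the median and median absolute deviation (MAD).

    The 0.6745 factor rescales MAD so scores are comparable to standard
    z-scores when the data is roughly normal.
    """
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    if mad == 0:
        # Degenerate case: more than half the values are identical. This
        # sketch returns zeros; a real pipeline would need a fallback scale.
        return np.zeros_like(x)
    return 0.6745 * (x - median) / mad


def iqr_outliers(values, k=1.5):
    """Boolean mask of points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


# Illustrative values only: one obvious spike at the end.
daily_volume = [10, 11, 9, 10, 12, 10, 11, 50]
z_flags = np.abs(robust_z_scores(daily_volume)) > 3.5
iqr_flags = iqr_outliers(daily_volume)
```

Both methods flag only the final value here. Because they rely on the median and quartiles rather than the mean and standard deviation, the spike itself does not inflate the scale used to judge it.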

Production Python. Type-annotated, tested, packaged. Comfortable with pandas, NumPy, scikit-learn, and the Vertex AI SDK. Familiar with KFP DSL for pipeline authoring.

BigQuery at intermediate level or above. Window functions, partitioning, query optimization. You can read a complex BigQuery query plan and recommend a fix.

MLOps fundamentals. CI/CD for ML (GitHub Actions or equivalent), promotion gates, automated evaluation, drift detection, and the difference between batch and online evaluation. You understand why a model registry exists and what gets lost when you skip it.

Cross-team execution. You have worked alongside data engineers, software engineers, and business SMEs in the same delivery, not in a research silo.

Customer-facing communication. You can defend a modeling commitment in a room that includes the customer's VP of engineering, a data science lead, and a domain SME, all on the same call.

Strongly preferred

Hands-on with Gemini Enterprise Agent Platform. Agent Runtime, Agent Identity, Agent Sessions, Agent Memory Bank, ADK, managed MCP servers. Even early Public Preview exposure counts.

Agent integration experience. You have integrated an ML model into an agent response loop where the agent decides when to call the model based on a tool spec. Bonus for ADK, LangGraph, AutoGen, or CrewAI.

Migration projects from Dataiku or SageMaker to Vertex AI. You know what breaks in translation: feature engineering DAGs, custom transformers, scheduling, lineage.

Multi-tenant feature engineering. You have designed a single feature pipeline that serves multiple downstream models with shared and tenant-specific features.

Domain experience in supply chain, manufacturing, or financial services. Comfortable with messy enterprise data (master data hierarchies, supplier or counterparty distributions, transactional volumes) and the procurement, planning, or risk workflows that consume model outputs.

Professional services or consulting background. You have worked across multiple customer engagements and understand the difference between an internal product timeline and a customer-facing delivery milestone.

Tooling you'll use day-to-day

Gemini Enterprise Agent Platform: Agent Runtime, ADK, Agent Identity, Agent Sessions, Agent Memory Bank, Agent Registry.

Vertex AI: Model Registry, Pipelines (KFP), Experiments, TensorBoard, Feature Store, Workbench, Colab Enterprise.

Data and storage: BigQuery, Cloud Storage, Cloud SQL, Dataflow.

Networking and gateway: Apigee API Management, Application Load Balancer, Model Armor.

Source and CI/CD: GitHub, GitHub Actions, Terraform, Cloud Build.

Observability: Cloud Logging, Cloud Trace, OpenTelemetry.

Languages and frameworks: Python (primary), SQL, occasional TypeScript for tool integrations.

Selection signals

Strong signals during interviews and the technical screen:

You can walk us through a model you have shipped end-to-end, including how it failed at first, what you measured, and how you fixed it.

You can articulate the difference between training-serving skew, concept drift, and data drift, and which one you would detect with which signal.

You have decided when not to use a feature store, and you can explain why.

You have owned a postmortem for a model in production. Not the writeup, the call.

What we are not looking for

This is not a research role. We are not building novel architectures from first principles. We are not training foundation models. We apply well-understood techniques to messy real-world enterprise data, tune per business segment, and ship into production with rigorous MLOps. If you would rather publish than ship, this is not the role for you.
