Principal Engineer
Company: BlackLine
Location: Pleasanton
Posted on: January 24, 2026
Job Description:
Since being founded in 2001, BlackLine has become a leading
provider of cloud software that automates and controls the entire
financial close process. Our vision is to modernize the finance and
accounting function to enable greater operational effectiveness and
agility, and we are committed to delivering innovative solutions
and services to empower accounting and finance leaders around the
world to achieve Modern Finance. Being a best-in-class SaaS
company, we understand that bringing in new ideas and innovative
technology is mission critical. At BlackLine we are always working
with new, cutting-edge technology that encourages our teams to
learn something new and expand their creativity and technical
skillset, accelerating their careers. Work, Play and Grow at
BlackLine!

Make Your Mark:
The Principal AI/ML Operations Engineer leads the architecture,
automation, and operationalization of both machine learning and AI
systems at scale. This role defines the strategy and technical
standards for ML-Ops and AIOps across the organization, ensuring
models and agents are evaluated, deployed, governed, and monitored
with reliability, efficiency, and compliance. The candidate will
collaborate across AI, data, and product engineering teams to drive
best practices for serving, observability, automated retraining,
evaluation flywheels, and operational guardrails for AI systems in
production.

You'll Get To:
Leadership and Strategy:
• Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems.
• Partner with data science, security, and product teams to set evaluation and governance standards (guardrails, bias, drift, latency SLAs).
• Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments.
• Lead incident response and reliability strategies for ML/AI systems.

AI System Deployment and Integration:
• Lead the deployment of AI models and systems in various environments.
• Collaborate with development teams to integrate AI solutions into existing workflows and applications.
• Ensure seamless integration with different platforms and technologies.
• Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance.
• Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows.
• Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics.
• Implement logging, metering, and auditing for agent behavior, function calls, and compliance alignment.
• Create scalable observability systems that track conversation outcomes, factual accuracy, latency, escalation patterns, and safety events.
• Architect end-to-end guardrails for AI agents, including prompt injection protection, identity-aware routing, and tool usage authorization.
• Collaborate cross-functionally to standardize authentication, authorization, and session governance for multi-agent runtimes.

Model Deployment and Integration:
• Architect and standardize model registries and feature stores to support version tracking, lineage, and reproducibility across environments.
• Lead the deployment of machine learning models into production environments, ensuring scalability, reliability, and efficiency.
• Collaborate with software engineers to integrate machine learning models into existing applications and systems.
• Implement and maintain APIs for model inference.

Infrastructure and Environment Management:
• Design and manage training infrastructure, including distributed training orchestration, GPU/TPU resource allocation, and automatic scaling.
• Implement CI/CD for model workflows using pipelines integrated with model validation, bias checks, and rollback automation.
• Build standardized experimentation frameworks for reproducible training, tuning, and deployment cycles (MLflow, W&B, Kubeflow).
• Manage and optimize the infrastructure required for machine learning operations in the cloud.
• Work closely with other teams to ensure the availability, security, and performance of machine learning systems.

Monitoring and Maintenance:
• Implement robust monitoring solutions for deployed machine learning models to detect issues and ensure performance.
• Collaborate with data scientists and engineers to address and resolve model performance and data quality issues.
• Conduct regular system maintenance, updates, and optimizations to ensure optimal performance of machine learning solutions.

Automation and Orchestration:
• Develop and maintain automation scripts and tools for managing machine learning workflows.
• Implement orchestration systems to streamline the end-to-end machine learning lifecycle, from data preparation to model deployment.

Collaboration with Data Science Teams:
• Collaborate with data scientists to understand model requirements and constraints for deployment.
• Facilitate the transition of machine learning models from research to production, ensuring scalability and efficiency.

Performance Optimization:
• Identify and implement optimizations to enhance the performance and efficiency of machine learning models in production.
• Conduct performance analysis and implement improvements based on resource utilization metrics.

Security and Compliance:
• Implement security measures to protect machine learning systems and data.
• Ensure compliance with regulatory requirements and industry standards related to machine learning and data privacy.
• Integrate audit controls, metadata storage, and lineage tracking across ML and AI workflows.
• Ensure complete monitoring and feedback loops, including event logs, evaluations, and automated retraining triggers.
• Enforce secure deployment patterns with Infrastructure-as-Code and cloud-native secrets management.
• Define SLAs, error budgets, and compliance reporting mechanisms for ML and AI systems.

What You'll Bring:

Education and Experience:
• Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field.
• 10 years of experience in ML infrastructure, DevOps, and software system architecture; 4 years leading MLOps or AIOps platforms.

Technical Skills:
• Strong programming skills in languages such as Python, Java, or Scala.
• Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow).
• Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure).
• Deep familiarity with LangChain, LangGraph, ADK, or similar agentic system runtime management.
• Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation.
• Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking.
• Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads.
• Proficiency in containerization technologies (e.g., Docker, Kubernetes).

Operations and Infrastructure:
• Proficient in scripting languages (e.g., Bash, Python) for automation.
• Experience with workflow orchestration tools (e.g., Apache Airflow).
• Expertise in managing and optimizing cloud-based infrastructure.
• Familiarity with DevOps practices and tools for automated deployment.
• Understanding of network configurations and security protocols.

Problem-solving and Critical Thinking:
• Ability to define problems, collect and analyze data, and propose innovative solutions.
• Strong critical thinking skills to evaluate models and identify limitations.

Adaptability and Learning Agility:
• Comfortable working in a fast-paced, rapidly evolving environment.
• Proactive in staying up to date with the latest trends, techniques, and technologies in AI/data science.

Thrive at BlackLine Because You Are Joining:
• A technology-based company with a sense of adventure and a vision for the future. Every door at BlackLine is open. Just bring your brains and your problem-solving skills, and be part of a winning team at the world's most trusted name in Finance Automation!
• A culture that is kind, open, and accepting. It's a place where people can embrace what makes them unique, and the mix of cultural backgrounds and varying interests cultivates diverse thought and perspectives.
• A culture where BlackLiners' continued growth and learning is empowered. BlackLine offers a wide variety of professional development seminars and inclusive affinity groups to celebrate and support our diversity.