Data Architect, Data Foundry
Company: Eli Lilly and Company
Location: San Francisco
Posted on: March 20, 2026
Job Description:
At Lilly, we unite caring with discovery to make life better for
people around the world. We are a global healthcare leader
headquartered in Indianapolis, Indiana. Our employees around the
world work to discover and bring life-changing medicines to those
who need them, improve the understanding and management of disease,
and give back to our communities through philanthropy and
volunteerism. We give our best effort to our work, and we put
people first. We’re looking for people who are determined to make
life better for people around the world.

Position: Data Architect, Data Foundry
Location: San Diego, CA; San Francisco, CA; Boston, MA; Louisville, CO; Indianapolis, IN

Overview
Lilly Small Molecule
Discovery is purpose-built to create molecules that make life
better for people. Discovery Technology and Platforms (DTP)
accelerates molecule discovery by building optimized foundational
platforms, streamlining lab operations through advanced
technologies and data connectivity, and investing in novel
capabilities. Data Foundry is a multidisciplinary team within DTP
that enables AI-native drug discovery through four integrated
pillars: Architecture4Insight (data infrastructure and scientific
software), Methods4Insight (analytical and computational methods),
Automation & Scale4Insight (lab automation and agentic workflows),
and Preparedness4Insight (data governance and readiness). These
pillars empower every Lilly scientist to make optimal decisions by
providing seamless access to data, insights, and AI-driven
capabilities—serving both human scientists and autonomous AI
agents.

Position Summary
We are seeking Data Architects at multiple
levels to design and build the data infrastructure that makes
AI-native drug discovery possible. You will create the schemas,
ontologies, data models, knowledge graphs, and platform
architectures that transform raw scientific data into
machine-actionable, FAIR-compliant, insight-ready assets—serving
both discovery scientists and autonomous AI agents. This role is
the foundation of Architecture4Insight. Everything the software
engineering team builds—pipelines, APIs, prototypes—depends on the
data models and platform architecture this team designs. You will
work with deep knowledge of scientific data (chemical, biological,
HTE, automation-generated) to create custom-fit solutions, then
partner with Tech@Lilly to scale and maintain them. The role spans
three focus areas depending on expertise: data modeling &
ontologies, data platform & lakehouse architecture, and knowledge
graph & specialized data systems.

Responsibilities

Data Modeling & Ontologies
Design and implement data models, schemas, and
ontologies for chemical, biological, and automation-generated data
that serve discovery workflows across the portfolio. Define and
maintain controlled vocabularies, metadata standards, and
FAIR-compliant data frameworks in partnership with
Preparedness4Insight. Implement semantic data standards (RDF, OWL,
SPARQL) and ontology engineering practices to create interoperable,
machine-readable scientific data.

Data Platform & Lakehouse Architecture
Design and implement data lakehouse architecture using
modern platforms (Databricks, Snowflake, or equivalent), including
data storage patterns, partitioning strategies, and query
optimization. Build and optimize ETL/ELT pipelines using Spark,
dbt, or similar tools to transform raw scientific data into
analytical and ML-ready formats. Implement real-time and streaming
data integration (Kafka, Kinesis, event-driven patterns) connecting
LIMS, instruments, and lab automation systems to the data
infrastructure.

Knowledge Graph & Specialized Data Systems
Design
and implement knowledge graphs (Neo4j, Amazon Neptune, TigerGraph)
that capture molecular, target, pathway, and experimental
relationships across the discovery landscape. Architect specialized
data solutions: array databases (TileDB) for genomics/imaging,
document stores (MongoDB) for experimental records, and vector
databases for embedding-based retrieval supporting ML and RAG
workflows. Build query and traversal patterns that enable
scientists and AI agents to ask relational questions across the
entire data landscape.

Cross-Functional Partnership
Partner with
scientific software engineers to ensure data architectures are
implementable, performant, and well-documented. Collaborate with
Methods4Insight to design data structures that support analytical
model training, deployment, and evaluation. Work with Tech@Lilly to
define scaling strategies, ensure enterprise compliance, and
transition data architectures to production-grade management.
Contribute to build-versus-buy-versus-adopt decisions by evaluating
commercial and open-source data platforms against Data Foundry
requirements.

Basic Requirements
B.S. or M.S. in Computer Science,
Data Science, Bioinformatics, Computational Biology, Information
Science, or related STEM field; Ph.D. valued for ontology and
knowledge graph roles. B.S. with 7 years or M.S. with 5 years of
data architecture, data engineering, or scientific informatics
experience. SQL skills and experience in multiple database
paradigms (relational, graph, document, columnar, key-value).
Qualified applicants must be authorized to work in the United
States on a full-time basis. Lilly will not provide support for or
sponsor work authorization or visas for this role, including but
not limited to F-1 CPT, F-1 OPT, F-1 STEM OPT, J-1, H-1B, TN, O-1,
E-3, H-1B1, or L-1.

Preferred Qualifications
Expertise in at least
one of: data modeling/ontologies, data platform engineering
(Databricks, Snowflake, Spark), or graph/specialized databases
(Neo4j, Neptune, MongoDB). Familiarity with cloud platforms (AWS,
Azure, or GCP) and modern data integration patterns. Understanding
of scientific data types and experimental workflows in life
sciences or pharma (chemical, biological, HTE data). Strong
communication skills with ability to translate data architecture
concepts for both technical and scientific audiences.
Pharmaceutical or biotech research industry experience,
particularly in discovery data management or research informatics.
Experience with semantic web technologies: RDF, OWL, SPARQL,
Protégé, or equivalent ontology engineering tools. Hands-on
experience with graph databases (Neo4j, Neptune, TigerGraph) and
knowledge graph design patterns for scientific data. Data lakehouse
architecture experience: Databricks (Delta Lake, Unity Catalog),
Snowflake, or equivalent; ETL/ELT with Spark, dbt. Experience with
streaming/real-time data platforms (Kafka, Kinesis, Flink) and
event-driven architectures. Familiarity with LIMS, ELN systems
(e.g., Benchling), and laboratory instrument data integration.
Experience with vector databases (Pinecone, Weaviate, pgvector) and
embedding-based retrieval for ML/RAG applications. Array database
experience (TileDB, Zarr) for genomics, imaging, or
high-dimensional scientific data. Experience with bioinformatics
data formats (FASTA, BAM/CRAM, VCF) and biological sequence
databases; familiarity with NGS data pipelines and proteomics data
management. FAIR data principles implementation experience and Data
Readiness Level frameworks. Scientific data standards and
controlled vocabularies in chemistry (InChI, SMILES) or biology
(Gene Ontology, UniProt, pathway databases such as Reactome or
KEGG).

Lilly is dedicated to helping individuals with disabilities
to actively engage in the workforce, ensuring equal opportunities
when vying for positions. If you require accommodation to submit a
resume for a position at Lilly, please complete the accommodation
request form
(https://careers.lilly.com/us/en/workplace-accommodation) for
further assistance. Please note this is for individuals to request
an accommodation as part of the application process and any other
correspondence will not receive a response. Lilly is proud to be an
EEO Employer and does not discriminate on the basis of age, race,
color, religion, gender identity, sex, gender expression, sexual
orientation, genetic information, ancestry, national origin,
protected veteran status, disability, or any other legally
protected status. Our employee resource groups (ERGs) offer strong
support networks for their members and are open to all employees.
Our current groups include: Africa, Middle East, Central Asia
Network, Black Employees at Lilly, Chinese Culture Network,
Japanese International Leadership Network (JILN), Lilly India
Network, Organization of Latinx at Lilly (OLA), PRIDE (LGBTQ
Allies), Veterans Leadership Network (VLN), Women’s Initiative for
Leading at Lilly (WILL), enAble (for people with disabilities).
Learn more about all of our groups.

Actual compensation will depend
on a candidate’s education, experience, skills, and geographic
location. The anticipated wage for this position is $132,000 -
$193,600. Full-time equivalent employees also will be eligible for a
company bonus (depending, in part, on company and individual
performance). In addition, Lilly offers a comprehensive benefit
program to eligible employees, including eligibility to participate
in a company-sponsored 401(k); pension; vacation benefits;
eligibility for medical, dental, vision and prescription drug
benefits; flexible benefits (e.g., healthcare and/or dependent day
care flexible spending accounts); life insurance and death
benefits; certain time off and leave of absence benefits; and
well-being benefits (e.g., employee assistance program, fitness
benefits, and employee clubs and activities). Lilly reserves the
right to amend, modify, or terminate its compensation and benefit
programs in its sole discretion and Lilly’s compensation practices
and guidelines will apply regarding the details of any promotion or
transfer of Lilly employees.
Keywords: Eli Lilly and Company, San Francisco, Data Architect, Data Foundry, Science, Research & Development, San Francisco, California