Selected systems / technical notes

Representative examples across production AI, data platforms, analytics, privacy-aware infrastructure, commerce automation, and ML data systems.

These notes are based on public career history and sanitized project descriptions. Company names provide employment context only; confidential implementation details are omitted.

Technical Focus

Production AI and data systems

Recurring problem spaces from applied engineering work: governed analytics, durable data platforms, reliable LLM workflows, self-service reporting, and commerce data systems.

Applied AI and Analytics Systems

Production AI and analytical systems that turn ambiguous business, editorial, commerce, and operational signals into governed, reviewable decisions.

  • Conversational analytics, anomaly investigation, classification, and defect-detection workflows
  • LLM workflows with structured outputs, guardrails, review gates, and observable failure modes
  • Retrieval, classification, multimodal verification, and privacy-aware NLP patterns

Data Platform Strategy

Practical architecture and sequencing for fragmented pipelines, warehouse models, and ownership boundaries.

  • Architecture and sequencing across BigQuery, dbt, Airflow, streaming, and PySpark
  • Data quality, lineage, and model contracts for shared datasets
  • Operating model for roadmap, support, and stakeholder intake

LLM Workflow Reliability

Auditable AI workflows with deterministic checks, structured outputs, tool use, human review, and production observability.

  • Guardrails for confidence, safety, retries, and failure investigation
  • Evaluation harnesses for SQL generation, anomaly narratives, page verification, and classification
  • Review gates for multimodal, classification, agentic, and privacy-sensitive systems

Analytics Engineering and Self-Service

Semantic models, dashboard contracts, and enablement practices that let analysts and business teams answer repeat questions with less ad hoc support.

  • Reusable dbt models and metrics with tested definitions
  • Self-service paths that keep sensitive logic governed
  • Training and documentation for engineers, analysts, and operators

Commerce and Operational Data Systems

Commerce and operational reporting systems where inconsistent source signals, delayed updates, and reconciliation gaps make the data hard to trust.

  • Reconciliation tests between source systems and reporting contracts
  • Data models that balance explainability, operational ownership, and reporting trust
  • Workflow checks that make source-system drift visible before it compounds

Selected Work

Production systems across AI, data, and ML infrastructure

The examples below emphasize constraints, architecture, role, and outcomes rather than confidential implementation detail.

AI Analytics

Conversational Analytics Agent

A governed natural-language analytics agent that turns business questions into validated BigQuery analysis across Ads, Editorial, Commerce, and operations workflows.

Context
At Hearst Magazines, business teams needed faster analytical answers from shared warehouse data without bypassing metric ownership, access expectations, or reviewable SQL.
Problem
Naive question-to-SQL was not enough: schema names were ambiguous, metric definitions lived across multiple layers, and users expected follow-up questions, not one-shot query generation.
Constraints
The workflow had to stay governed: no unsafe SQL execution, no unverified answers, and graceful handling of zero-result or ambiguous questions.
Architecture
Built a LangGraph workflow on Vertex AI/Gemini and BigQuery with schema metadata retrieval using embeddings, keyword search, business-term matching, and fuzzy search. Added dry-run validation, SQL safety checks, retries, error-driven tool use, zero-result investigation, and AI judge verification.
Role
Led the architecture and implementation path from prototype behavior to a governed internal workflow, including retrieval design, agent state, validation loops, and single-turn and multi-turn interaction patterns.
Outcome
Conceived and built the initial prototype, then expanded it into a governed analytics agent used across Ads, Editorial, and Commerce workflows while preserving source-of-truth data boundaries and answer traceability.
Demonstrates
One slice of broader applied AI work: metadata-grounded retrieval, SQL safety, stateful agents, evaluation discipline, and governed self-service analytics.

Focus

  • AI analytics
  • conversational BI
  • SQL safety
  • semantic retrieval
  • evals

Stack

  • LangGraph
  • Vertex AI/Gemini
  • BigQuery
  • embeddings
  • schema metadata
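
The validation loop described above pairs deterministic SQL safety checks with dry-run validation before anything executes. A minimal sketch of the deterministic gate is below; function and rule names are illustrative, not the production implementation, and the real workflow would follow a clean result with a BigQuery dry-run job.

```python
import re

# Statements an analytics agent should never execute against the warehouse.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|TRUNCATE|ALTER|CREATE|GRANT)\b",
    re.IGNORECASE,
)

def check_sql_safety(sql: str) -> list[str]:
    """Return a list of violations; an empty list means the query may
    proceed to the next gate (e.g. a warehouse dry-run validation)."""
    violations = []
    if FORBIDDEN.search(sql):
        violations.append("contains a non-SELECT statement")
    if ";" in sql.rstrip().rstrip(";"):
        violations.append("contains multiple statements")
    if not re.match(r"\s*(SELECT|WITH)\b", sql, re.IGNORECASE):
        violations.append("does not start with SELECT or WITH")
    return violations

# A generated query that tries to mutate state is rejected outright.
print(check_sql_safety("DELETE FROM ads.daily_spend"))
```

Keeping the gate as a pure function makes it trivial to unit-test and to run before spending any warehouse quota on a dry run.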

Applied AI Systems

AI Anomaly Analysis and Classification Systems

Applied AI and statistical analysis workflows that classify content, investigate performance anomalies, and surface operational changes with structured outputs, rules, and reviewable evidence.

Context
At Hearst Magazines, editorial, commerce, and business teams needed faster ways to understand performance changes and classify content without turning every exception into a manual analytics request.
Problem
Performance anomalies, content classification edge cases, and traffic-channel shifts required a mix of statistical detection, business context, and reviewable AI outputs rather than a single model or dashboard.
Constraints
The workflows had to avoid noisy conclusions, preserve business-rule overrides, support human review, and expose enough evidence for operators and stakeholders to trust the result.
Architecture
Led technical direction for a multi-stage LLM anomaly-analysis platform with deterministic guardrails, and built LLM-based article classification using structured outputs, business-rule overrides, and content signals. Complemented the AI workflows with statistical tests, seasonality-aware anomaly detection, and traffic channel shift detection.
Role
Owned architecture and sequencing across AI workflow design, deterministic validation, model-output structure, business-rule integration, and the handoff path from detected signal to actionable investigation.
Outcome
Created reusable patterns for turning performance changes and content ambiguity into reviewed, structured signals that business, editorial, and commerce teams could act on.
Demonstrates
Applied AI beyond chat: anomaly reasoning, LLM classification, statistical detection, business-rule integration, and production workflows where evidence and reviewability matter.

Focus

  • anomaly analysis
  • LLM classification
  • statistical detection
  • business rules
  • reviewable AI

Stack

  • LLM workflows
  • structured outputs
  • statistical tests
  • seasonality models
  • content signals
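
One common way to make anomaly detection seasonality-aware, as described above, is to compare a metric against its own weekday history rather than a global baseline. The sketch below shows that pattern with a z-score test and a structured label; the names and thresholds are illustrative, not the production system.

```python
from statistics import mean, stdev

def weekday_zscore(history: list[float], today: float) -> float:
    """Compare today's value against the same weekday's history.

    `history` holds prior observations for the same weekday (e.g. the
    last eight Mondays), which absorbs weekly seasonality before the
    usual z-score test is applied.
    """
    mu, sigma = mean(history), stdev(history)
    return 0.0 if sigma == 0 else (today - mu) / sigma

def classify(z: float, threshold: float = 3.0) -> str:
    # Emit a stable category rather than a raw number, so downstream
    # LLM narratives and business-rule overrides can key off it.
    if z >= threshold:
        return "spike"
    if z <= -threshold:
        return "drop"
    return "normal"

mondays = [120.0, 118.0, 125.0, 121.0, 119.0, 122.0, 120.0, 123.0]
print(classify(weekday_zscore(mondays, today=60.0)))  # a clear drop vs. Monday history
```

The structured label is the handoff point: deterministic detection decides that something changed, and the reviewable AI layer explains why.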

Data Platform

Shared Data Platform and Self-Service Analytics

A shared analytics foundation for high-volume digital media and commerce data, built to reduce repeated requests and increase safe self-service.

Context
At Hearst Magazines, a large analytics environment processed 10TB+ of data daily, including 5TB+ of clickstream events, while many teams depended on repeated custom SQL and a small group of specialists.
Problem
Analytics demand was growing faster than the platform operating model. Definitions drifted, pipeline ownership was fragmented, and business users needed governed self-service instead of ad hoc ticket queues.
Constraints
The work had to improve reliability without stopping delivery: existing reporting could not break, teams had different skill levels, and source systems spanned batch, streaming, warehouse, and transformation layers.
Architecture
Led platform architecture across BigQuery, Airflow, dbt, Kinesis, and PySpark. Established modeling standards, semantic-layer patterns, data quality checks, ownership practices, and reusable datasets for common analytical paths.
Role
Set roadmap and standards while remaining hands-on in implementation, stakeholder intake, model design, pipeline delivery, training, and migration planning.
Outcome
Reduced analytics backlog by 60%, delivered 100+ pipelines and data models in six months, and helped engineers and analysts adopt safer self-service practices.
Demonstrates
Data platform leadership at scale: technical architecture, operating model, education, and delivery discipline moving together.

Focus

  • data platform
  • semantic layer
  • self-service analytics
  • data quality
  • enablement

Stack

  • BigQuery
  • Airflow
  • dbt
  • Kinesis
  • PySpark
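
The data quality checks mentioned above are the kind of declarative contracts dbt expresses as generic tests (not_null, unique). A pure-Python sketch of the same idea, with illustrative table and column names:

```python
def check_not_null(rows: list[dict], column: str) -> list[dict]:
    """Rows violating a not-null contract on `column`."""
    return [r for r in rows if r.get(column) is None]

def check_unique(rows: list[dict], column: str) -> list[dict]:
    """Rows whose `column` value has already been seen."""
    seen: set = set()
    failures = []
    for r in rows:
        if r[column] in seen:
            failures.append(r)
        seen.add(r[column])
    return failures

orders = [
    {"order_id": 1, "channel": "web"},
    {"order_id": 1, "channel": "app"},   # duplicate key
    {"order_id": 2, "channel": None},    # missing channel
]
print(len(check_unique(orders, "order_id")), len(check_not_null(orders, "channel")))
```

Returning the failing rows, not just a pass/fail flag, is what makes a shared dataset debuggable by the team that owns it.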

AI Workflow Automation

AI Commerce Defect Detection

A reviewable AI workflow that detects commerce catalog and retailer-page defects before operational issues compound.

Context
At Hearst Magazines, commerce operations depended on product availability, retailer content, and catalog state staying aligned across systems that changed outside direct control.
Problem
Manual review did not scale, rules alone missed visual and contextual failures, and operators needed actionable signals rather than noisy alerts.
Constraints
The system had to tolerate unstable web pages, partial extraction, retailer variation, unavailable products, visual ambiguity, and the need for human-reviewable evidence.
Architecture
Combined async web extraction, deterministic rules validation, structured outputs, and gated multimodal review. Gemini screenshot verification produced existence, availability, and confidence signals rather than opaque pass/fail labels.
Role
Designed the workflow boundaries, validation stages, confidence schema, and review path so AI would be used where visual reasoning added value and deterministic checks would handle known cases.
Outcome
Created a defect-detection loop that lets operators prioritize likely catalog and retailer-page issues instead of manually inspecting every product page.
Demonstrates
Practical multimodal AI system design: use rules where possible, use LLM vision where useful, and expose confidence and evidence for operational decisions.

Focus

  • multimodal review
  • catalog validation
  • operator workflow
  • confidence scoring

Stack

  • Gemini
  • async extraction
  • multimodal review
  • rules engine
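
The rules-first gating described above can be sketched as a small decision function plus a structured verdict: deterministic checks settle the clear cases, and only ambiguous pages escalate to a multimodal screenshot check. The field names and rules here are illustrative assumptions, not the production schema.

```python
from dataclasses import dataclass

@dataclass
class PageSignal:
    """Structured verdict for one retailer-page check.

    Confidence and evidence travel with the label so operators can
    triage instead of trusting an opaque pass/fail flag.
    """
    product_found: bool
    available: bool
    confidence: float
    evidence: str

def needs_vision_review(extracted: dict) -> bool:
    # Deterministic rules first: a clear price plus an in-stock marker
    # means no multimodal call is needed. Missing or conflicting
    # signals escalate to screenshot verification.
    has_price = extracted.get("price") is not None
    stock = extracted.get("stock_status")
    if has_price and stock == "in_stock":
        return False
    if stock == "discontinued":
        return False  # rules are already decisive
    return True

print(needs_vision_review({"price": 19.99, "stock_status": "in_stock"}))  # False
print(needs_vision_review({"price": None, "stock_status": None}))         # True
```

Gating this way keeps the expensive vision step reserved for the cases where visual reasoning actually adds value.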

Privacy Data Infrastructure

Consumer-Scale Event and NLP Privacy Systems

High-scale event processing and privacy-oriented data systems supporting safe analytics over large consumer-product datasets.

Context
At Meta, event and product datasets supported analytics, product decisions, and privacy-sensitive workflows across rapidly changing consumer systems.
Problem
Teams needed safer analytics over high-volume data while reducing storage cost, detecting sensitive information earlier, and preserving continuity during an organizational pivot.
Constraints
The systems had to handle 5B+ daily events, support analytics across 60+ NoSQL collections, maintain backward compatibility, and avoid exposing sensitive personal information through analytical workflows.
Architecture
Worked on real-time NLP-based PII detection, safe analytics patterns over NoSQL-derived datasets, cumulative table design, and a backward-compatible data model that could support changing product and organizational requirements.
Role
Contributed to data modeling, pipeline design, privacy-aware analytics infrastructure, and migration support inside large-scale product data environments.
Outcome
Supported privacy-aware analytics at consumer scale, enabled safer access patterns across broad NoSQL-derived data, and reduced storage by 65% through cumulative table design.
Demonstrates
Experience with high-scale event systems, privacy-sensitive data engineering, storage-efficient modeling, and migration work where compatibility matters.

Focus

  • event data
  • PII detection
  • privacy-safe analytics
  • NoSQL analytics
  • storage optimization

Stack

  • real-time event pipelines
  • NLP classification
  • NoSQL-derived datasets
  • cumulative tables
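
Cumulative table design, mentioned above, folds each day's events into one running row per entity so that only the current state is stored and scanned, rather than every daily snapshot. A pure-Python sketch of the fold, with illustrative field names:

```python
def merge_day(cumulative: dict, daily: dict, day: str) -> dict:
    """Fold one day's per-user event counts into the cumulative table.

    `cumulative` maps user_id -> {"total": int, "last_seen": str}.
    Keeping only this running state, instead of retaining every daily
    snapshot, is what makes the design storage-efficient.
    """
    out = dict(cumulative)
    for user_id, count in daily.items():
        prev = out.get(user_id, {"total": 0, "last_seen": None})
        out[user_id] = {"total": prev["total"] + count, "last_seen": day}
    return out

state: dict = {}
state = merge_day(state, {"u1": 3, "u2": 1}, "2024-01-01")
state = merge_day(state, {"u1": 2}, "2024-01-02")
print(state["u1"])  # {'total': 5, 'last_seen': '2024-01-02'}
```

In a warehouse this same fold is typically a full-outer-join of yesterday's cumulative table with today's partition.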

ML Infrastructure

Fintech and E-Commerce ML Data Infrastructure

Machine-learning data infrastructure for fraud, risk, and marketplace workflows, built to shorten model iteration cycles and improve production responsiveness.

Context
Point Predictive and 1stDibs both operated in environments where model performance, data availability, and response time directly affected fraud, risk, marketplace, and analytics workflows.
Problem
Model lifecycle steps were too slow, derived datasets were not yet centralized, and scaling constraints limited how quickly the team could improve and serve analytical signals.
Constraints
The platform needed reliable orchestration, warehouse-backed derived data, streaming ingestion, batch processing, and model-supporting datasets without disrupting active business workflows.
Architecture
Built infrastructure across AWS Step Functions, Redshift, PySpark on EMR, and Kinesis Firehose. Helped establish the first derived-data warehouse and production paths for model lifecycle data, and worked on ML platform modernization including SageMaker migration and AWS ML service adoption.
Role
Worked across data engineering and ML infrastructure, connecting ingestion, transformation, warehouse modeling, and model-supporting datasets into a more durable platform.
Outcome
Reduced model lifecycle time from weeks to hours, improved model performance by 20%, delivered 5x faster response times, and increased scalability by 10x.
Demonstrates
Ability to build ML data infrastructure where orchestration, derived data, model iteration, and service responsiveness are all part of the same system.

Focus

  • ML data infrastructure
  • fraud analytics
  • model lifecycle
  • derived warehouse
  • ML platform modernization

Stack

  • AWS Step Functions
  • Redshift
  • PySpark
  • EMR
  • Kinesis Firehose
  • SageMaker
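
Orchestration layers like Step Functions express retry semantics declaratively: a bounded number of attempts with backoff between them. A minimal Python sketch of that retry behavior for a single pipeline step (the step and variable names are illustrative):

```python
import time

def run_with_retry(step, max_attempts: int = 3, backoff_s: float = 0.0):
    """Run one pipeline step with bounded retries and linear backoff,
    mirroring the shape of an orchestrator's Retry policy."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(backoff_s * attempt)

calls = {"n": 0}
def flaky_extract():
    # Simulates a source that fails twice before succeeding.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient source error")
    return "rows"

print(run_with_retry(flaky_extract))  # succeeds on the third attempt
```

Steps that are retried this way must be idempotent, which is why derived-data builds are usually written as overwrite-by-partition rather than append.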

Financial Data Systems

Post-Trade and Financial Data Systems

Financial data and post-trade systems spanning reference data, derivatives clearing migration, reporting infrastructure, cloud migration, and developer tooling.

Context
Earlier financial-technology work at Barclays spanned post-trade systems, reference data, derivatives workflows, client reporting, and platform modernization.
Problem
Financial data systems required accuracy, auditability, migration discipline, and user-facing reporting tools while supporting complex securities and derivatives workflows.
Constraints
Work had to fit regulated environments, on-prem infrastructure, SQL Server-backed systems, legacy integration points, data quality expectations, operational reporting needs, and production change-management practices.
Architecture
Built ETL pipelines for Enterprise Security Master data, supported derivatives clearing migration and post-trade technology, contributed to cloud migration and DevOps work from on-prem systems, and built self-service tooling including a SQL generator, visualization platform, and client reporting infrastructure.
Role
Contributed as an engineer across delivery, migration, automation, reporting, and tooling efforts, with earlier algo-trading internship work providing exposure to market-facing systems.
Outcome
Delivered financial-data pipelines and workflow tools that improved reporting access, migration readiness, and operational support across post-trade and reference-data domains.
Demonstrates
Foundation in disciplined financial data engineering: ETL reliability, regulated workflows, reporting infrastructure, and practical tools for technical and business users.

Focus

  • financial data
  • post-trade systems
  • ETL
  • reporting infrastructure
  • migration

Stack

  • SQL Server
  • ETL pipelines
  • reporting infrastructure
  • SQL tooling
  • on-prem infrastructure
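
Self-service SQL tooling of the kind mentioned above lives or dies on identifier validation: user-chosen tables and columns must never be interpolated as free text. A minimal sketch of that guard (the function name and schema are hypothetical, not the original tool):

```python
import re

# Plain SQL identifiers only: letters, digits, underscores.
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def build_select(table: str, columns: list[str], limit: int = 100) -> str:
    """Build a simple SELECT, refusing anything that is not a plain
    identifier so user input cannot smuggle in extra SQL."""
    for name in [table, *columns]:
        if not IDENT.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    return f"SELECT {', '.join(columns)} FROM {table} LIMIT {int(limit)}"

print(build_select("trades", ["trade_id", "notional"]))
# SELECT trade_id, notional FROM trades LIMIT 100
```

Values, as opposed to identifiers, would go through parameterized queries; the allow-list above covers the part parameters cannot.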

Contact

Continue the conversation

Email or LinkedIn are usually best for notes, context, or continuing a conversation. Calendly is there when a scheduled chat is easier.