Anthony Vila

18 years of nonprofit technology operations. Independent AI production systems on the Anthropic Claude API. This portfolio documents methodology, production artifacts, quality assurance systems, and live output.

[email protected] · Bloomfield, NJ · LinkedIn · Medium

Case Study

AI Content Production System

In early 2026 I independently designed and built a daily AI content production system on the Anthropic Claude API. The system generates essays in ten distinct character voices, converts them to multi-speaker podcast audio via VibeVoice on Kaggle GPUs, verifies citations through a three-stage adversarial process, runs a four-pass editorial review, and publishes finished episodes overnight with no manual intervention. It has run in daily production since launch across nine publications.

Orchestrated across four platforms: Make.com automation scenarios handle the primary essay pipeline, Power Automate integrates the Claude API with Microsoft 365, Python scripts via Windows Task Scheduler handle batch submission and news aggregation, and Kaggle T4 GPU notebooks run VibeVoice for text-to-speech.
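The batch submit/poll/retrieve cycle described above reduces to a polling loop. A minimal sketch: `fetch_status` is an injected stand-in for the real status call (in the Anthropic SDK, something like `client.messages.batches.retrieve`), and the `"in_progress"`/`"ended"` states mirror the Message Batches API's `processing_status` field; details here are illustrative, not the production configuration.

```python
import time

def poll_batch(fetch_status, batch_id, interval_s=60, max_attempts=480):
    """Poll a submitted batch until it reaches a terminal state.

    fetch_status: any callable returning an object with a
    `processing_status` attribute ("in_progress" or "ended").
    Injected so the loop can be tested without network access.
    """
    for _ in range(max_attempts):
        batch = fetch_status(batch_id)
        if batch.processing_status == "ended":
            return batch  # caller then retrieves results and archives
        time.sleep(interval_s)
    raise TimeoutError(f"batch {batch_id} still running after {max_attempts} polls")
```

In production the same loop lives inside a Make.com scenario; the Python form makes the retry/timeout logic explicit.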

Make.com overnight batch scenario

Make.com overnight batch scenario: 24 modules from Gmail news fetch through Claude API Batch submission, poll/retrieve cycle, to email delivery and OneDrive storage.

Batch cleanup scenario

Batch cleanup: polls for completed batches, extracts text, delivers via email, archives to OneDrive.

Power Automate flow

Power Automate daily digest: Claude API + Microsoft 365. The voice guide is a plain Word document on OneDrive that non-technical program staff can edit on their own — no code, no pipeline access. The pipeline reads it at runtime.

Methodology Artifact

Voice Guide Development Framework

A reusable methodology for defining and testing AI voice guides — structured prompt documents that control how the Claude API generates text in a specific character voice. Codified in a meta-document and successfully taught to a non-technical user who produced working voice guides independently.

Full methodology document →

Core Framework (Excerpt)

Thinking Style vs. Costume

Every voice has two layers. Thinking Style is the cognitive pattern — how the character processes the world, what tensions they carry. Costume is the surface layer — sentence rhythm, tone, verbal habits. A voice defined at the thinking level produces original output. A voice defined at the surface level produces mimicry.

Balance Rule

If the costume is more detailed than the thinking style, the output sounds right but thinks blandly — mannerisms draped over default Claude reasoning. This is the most common failure mode.

Audience Alignment

Every voice needs a tailored audience model that pulls in the same direction as the costume. A fast, combative voice needs a bored or resistant audience. Mismatched audiences cause the voice to spend generation energy managing the conflict instead of thinking.

Grounding Rule

Every voice needs an explicit instruction for moving from abstraction to concreteness. Without one, the default is the vague anecdote. A voice should be either fully specific or fully mythic; the middle ground is where every voice sounds the same.

Separation of Generation and Editing

Including editorial rules during generation causes the model to self-edit while writing, producing safer but flatter output. The editing checklist belongs in a separate pass only.

Production Artifact

Sample Voice Guide (Excerpt)

One of twenty voice guides in production. The passthrough section below is the compact, token-efficient definition sent to the Claude API. Full guides include Part 1 (development rationale, failure mode analysis) and a separate editing-pass checklist.

Full voice guide document →

Voice: “The Commissioner”

Thinking Style: Sees every problem as an implementation problem. Not “what should we do” but “what happens when we try to do it.” Two tensions: genuine belief that institutions improve the world AND willingness to abandon any institution the moment the math says it’s failing. Deep care for individuals AND utilitarian commitment to collective outcomes.

Costume: Clean, measured, direct prose. Medium-length sentences building arguments in visible steps. Key move: the parenthetical admission, where the person arguing is briefly visible inside the argument. Second move: the composure crack — one short, plain sentence that lets something real through.

Audience: Someone who cares about the same problem and has a worse plan. Firm, specific, not unkind.

Grounding: Specific institutional detail: the exact policy mechanism, the exact budget number. Never vague. Either the specific study and its specific finding, or nothing.

Production System

Audio Production Pipeline

Multi-speaker podcast episodes up to 90 minutes, produced using VibeVoice on Kaggle T4 GPUs. 2–4 speakers with consistent voice identity and natural conversational delivery.

Kaggle notebook

Kaggle GPU notebook: VibeVoice pipeline pulling scripts from OneDrive, generating multi-speaker audio on T4 accelerators.

Audacity waveform

Audacity: engineering voice reference samples by analyzing pitch contours and selecting segments with the right prosodic characteristics for TTS cloning.

Technical Details

Voice Reference Engineering

Custom voice references built by isolating speakers from source recordings using pitch-based clustering and MFCC analysis. Speaker-order enforcement, voice prompt trimming, and chunking systems handle voice reassignment at boundaries.
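The pitch-based separation step can be illustrated with a stdlib-only sketch: a crude zero-crossing pitch estimate per frame, then a one-dimensional two-means split. The production pipeline clusters on MFCC features; raw pitch is substituted here only to keep the example self-contained.

```python
import math

def pitch_per_frame(signal, sr, frame_len=2048):
    """Rough per-frame pitch (Hz) from zero-crossing counts.

    Illustrative stand-in for the MFCC features used in production.
    """
    pitches = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        # count sign changes; a sine at f Hz crosses zero 2f times/second
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        pitches.append(crossings * sr / (2 * frame_len))
    return pitches

def split_two_speakers(pitches, iters=20):
    """1-D two-means: label each frame by the nearer of two pitch centers."""
    lo, hi = min(pitches), max(pitches)
    labels = []
    for _ in range(iters):
        labels = [abs(p - lo) > abs(p - hi) for p in pitches]
        lo = sum(p for p, h in zip(pitches, labels) if not h) / max(1, labels.count(False))
        hi = sum(p for p, h in zip(pitches, labels) if h) / max(1, labels.count(True))
    return labels  # True = frames belonging to the higher-pitched speaker
```

Frames labeled for one speaker are then concatenated into a clean reference sample for TTS cloning.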

Script Engineering for TTS

Punctuation as prosody control. Energy continuity across speaker switches. The monotone test: if a line doesn’t communicate its meaning when read with zero inflection, rewrite it.

The Chunking Solution

Multi-speaker TTS breaks at long durations. Solved by designing script-level commercial breaks in a 1930s radio format that re-introduce all speakers at each boundary — turning a technical constraint into a format feature.
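The break-and-reintroduce chunking can be sketched as a script splitter. The `[BREAK]` marker and the re-introduction line are illustrative placeholders, not the production script format:

```python
def chunk_script(lines, speakers, marker="[BREAK]"):
    """Split a multi-speaker script at commercial-break markers.

    Every chunk after the first opens with one line per speaker so the
    TTS model re-anchors each voice identity at the boundary — the
    "1930s radio" re-introduction turned into a structural guarantee.
    """
    reintro = [f"{name}: And we're back." for name in speakers]
    chunks, current = [], []
    for line in lines:
        if line.strip() == marker:
            if current:
                chunks.append(current)
            current = list(reintro)  # every speaker speaks before new content
        else:
            current.append(line)
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is then synthesized independently, keeping every segment inside the duration range where multi-speaker voice assignment stays stable.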

QA System

Multi-Pass Editorial Review

Every generated piece passes through four sequential editorial stages before publication. Each stage has a dedicated AI editor with a specific jurisdiction, a defined scoring methodology, and an explicit constraint preventing it from evaluating outside its domain. The separation of concerns is the point.

Pass 1

AI Detection Check

The 12-category phrase detection checklist. Binary gate: three or more hits in the same passage is diagnostic of machine-generated text. Catches tells invisible to quality-focused editors.

Pass 2

Conceptual Craft

Is the piece thinking, or performing? Evaluates originality, conceptual architecture, voice presence, humor, and thermal range. Swap test: strip the voice and see what survives.

Originality (2x), Architecture (2x), Voice, Humor, Thermal Range → 1–100

Pass 3

Structural Craft

Is the piece built correctly? Proportion, pacing, transitions, register control, architectural integrity. Watches for structural AI tells: uniform section length, repeated openings, symmetric structure without justification.

Proportion (2x), Pacing (2x), Transitions, Register, Integrity → 1–100

Pass 4

Prose Craft

Are the sentences working? Requires quantitative evidence before scoring: sentence length bands, syntactic opening patterns, verb counts. The count overrides impressions.

Rhythm (2x), Economy, Vitality, Architecture, Control (2x) → 1–100
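The quantitative evidence this pass requires can be sketched as a small analyzer. The band edges (8 and 20 words) are illustrative, not the production calibration:

```python
import re
from collections import Counter

def sentence_metrics(text):
    """Count-based evidence for the prose pass: length bands and openings.

    The counts, not impressions, drive the score — monotonous openings
    or a single dominant length band show up here directly.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    bands, openings = Counter(), Counter()
    for s in sentences:
        n = len(s.split())
        bands["short" if n <= 8 else "medium" if n <= 20 else "long"] += 1
        openings[s.split()[0].lower().strip('"\'')] += 1
    return {"sentences": len(sentences),
            "bands": dict(bands),
            "openings": openings.most_common(3)}
```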

Prose craft judge feedback

Sample prose judge output: quantitative sentence analysis (word count, length band, syntactic opening type) with diagnostic. The count overrides impressions — if the data shows monotonous openings, the score reflects it regardless of how the prose felt on first read.

Each editor has calibrated scoring (40 = functional, 55 = competent, 75 = sustained craft, 85+ = doing meaning-work), explicit failure modes for the judge itself, and a swap test to prevent scoring outside jurisdiction. Full editor documents: Conceptual · Structural · Prose
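The "2x" criteria in the pass descriptions above reduce to a weighted mean; the exact production aggregation may differ, but as a sketch:

```python
def weighted_score(scores, weights):
    """Collapse per-criterion 1-100 scores into a single pass score.

    Double-weighted criteria (the "2x" items) simply carry weight 2.
    """
    total = sum(scores[k] * weights[k] for k in scores)
    return round(total / sum(weights.values()))
```

For the prose pass, for example: `weighted_score(scores, {"rhythm": 2, "economy": 1, "vitality": 1, "architecture": 1, "control": 2})`.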

QA Artifact

AI Phrase Detection Checklist

12-category detection system based on frequency-ratio analysis. One flagged phrase is normal English. Three or more in the same passage is diagnostic. Developed through analysis of hundreds of generated pieces across multiple model versions.

Full checklist document →

Categories

1. Importance Announcements: “it’s important to note” · “cannot be overstated”
2. Role-Playing Constructions: “plays a crucial/pivotal role in shaping”
3. False-Depth: “delving into the intricacies” · “rich tapestry”
4. Zombie-Passive-Emotional: “left an indelible mark” · “paving the way for”
5. Throat-Clearing Openers: “In today’s rapidly evolving landscape”
6. Empowerment Closers: “By understanding these X, you can...”
7. Trailing Gerunds: factual sentence + “, highlighting the importance of”. The strongest single tell.
8. Extreme-Frequency Compounds: 500x+ AI-vs-human frequency ratio, e.g. “provide valuable insights”
9. Era Clusters: 3+ phrases from the same model generation date the text to that generation
10. Reframe Pivot: “It’s not X. It’s Y.” Diagnostic at 5+ per 2,500 words.
11. Meta-Commentary on Language: “that word is doing a lot of work” · “sit with that”
12. Figurative Density: 3+ unrelated metaphors in 150 words. The model optimizes locally, not globally.
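The binary gate can be sketched as a regex scan over an abbreviated phrase table. The entries below are a few stand-ins for the full 12-category list:

```python
import re

# Abbreviated phrase table: one or two patterns per checklist category.
PHRASES = {
    "importance": [r"it'?s important to note", r"cannot be overstated"],
    "role_playing": [r"plays a (crucial|pivotal) role"],
    "false_depth": [r"delv\w+ into", r"rich tapestry"],
    "trailing_gerund": [r", highlighting the importance of"],
    "reframe_pivot": [r"it'?s not \w+[^.]*\. it'?s \w+"],
}

def scan_passage(passage):
    """Return every checklist hit in a passage as (category, phrase) pairs."""
    text = passage.lower()
    return [(cat, m.group(0))
            for cat, patterns in PHRASES.items()
            for pat in patterns
            for m in re.finditer(pat, text)]

def is_flagged(passage, threshold=3):
    """Binary gate: one hit is normal English; 3+ in one passage is diagnostic."""
    return len(scan_passage(passage)) >= threshold
```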
QA Artifact

AI Output Degradation Patterns

Seven specific patterns identified through daily operation of ten concurrent voices. Each diagnosed from live output, resolved through prompt architecture or batch configuration changes.

Documented Patterns

Signature Calcification

The model locks onto favorite phrases and repeats them across pieces. Resolved by adding variety constraints and rotating opener patterns.

Convergent Invention

Independent pieces fabricate identical details — same fictional expert, same invented statistic. Resolved through citation verification pipelines.

Exploitation-Default Drift

One analytical lens crowds out all others over repeated runs. Resolved by strengthening secondary tensions in voice guides and varying subjects.

Cross-Contamination

In multi-voice batches, characteristics bleed between voices. Resolved through batch isolation and staggered generation timing.

Structural Pivot Dependency

The “It’s not X, it’s Y” reframe is the most persistent structural crutch. Dressing pivots in more complex language doesn’t fix the problem — if the output relies on 9–12 pivots per piece, the essay lacks something to say. Each pivot looks sharp in isolation; the pattern is diagnostic at scale. Consistently the most challenging check to pass.

Metaphor Homogeneity

Over repeated runs, the model collapses to a narrow set of structural metaphors — particularly “walls,” “load-bearing,” and “angles.” Resolved by seeding a default metaphor domain per voice (plants, weather, music, etc.) to keep figurative language distinct across the roster.

Contextless Superlatives

Essays default to sweeping evaluative claims — “the best in the history of television” — that sound human individually but become repetitive across pieces. The model reaches for superlatives as a compression shortcut instead of building the specific case.

Pipeline Output

Automated Production Samples

These pieces were generated by the overnight pipeline, reviewed by the four-pass editorial system, and published without manual editing.

Blogspot essay with footnotes

Auto-generated essay with inline footnotes. The citation system requires 50+ sources per piece, recorded during research per Stage 1 of the verification process.

Citation apparatus

Footnote apparatus from the same essay. Each citation verified by blind summary (Stage 2) and human spreadsheet review (Stage 3).

Published Writing

Methodology Essays (Medium)

Essays about the production system, written to explain the methodology to a general audience.

Background

Career Summary

Three eras: political campaign data infrastructure, nonprofit database management, AI production systems. The through-line is making complex technical systems work for people who are not engineers.

2025–2026: AI Production Systems, Independent. Claude API, Make.com, Power Automate, Python, Kaggle. Daily automated content across 9 publications, 20 voices.

2019–2024: Senior Database Manager, Children’s Health Fund. Raiser’s Edge NXT primary admin. RE 7→NXT migration. Finance/Development bridge.

2016–2019: Database Manager, Covenant House International. SQL, cross-database integration (Team Approach, Luminate, DonorDrive, SalesForce). Org-wide CRM vendor selection.

2010–2015: Data Systems Manager, SEIU 775. 43,000-member union. Full SalesForce migration and buildout. Elections Committee Chair.

2006–2008: Campaign Data. VAN, Catalist, fundraising management. Reed Award-winning website data structure.

Harvard University, BA Government and Economics, 2004.

Publication: T. Vila, R. Greenstadt, D. Molnar. “Why we can’t be bothered to read privacy policies.” WEIS ’03, 2003.