pro-ai-captions-architecture
FotoSwipe Pro: AI captions/alt-text + SEO schema (implementation plan)
Purpose: ship the “Aha” Pro feature: provider‑agnostic AI captioning that generates accurate alt text, captions, and ImageObject JSON‑LD while respecting privacy and .cursorrules.
Scope
- Inputs: image URL (default) or opt‑in image bytes; optional product title/category/context.
- Outputs: alt text, caption, and normalized ImageObject schema fields.
- Delivery: plugin API for FotoSwipe (Pro), server endpoint for AI calls, demo wiring on /pro.
Architecture fit (SOLID, DRY)
- Pro gating stays in src/pro/license.js. Remote validation via withRemoteAwareGate from src/pro/license-remote.js.
- Introduce a small interface (CaptionProvider) and adapters (OpenRouter first). All provider logic lives in src/pro/ai/providers/ and is injected.
- Server owns AI secrets and network calls. Client talks to /api/ai/caption only. No secrets in the browser.
File layout
- Client (Pro package):
  - src/pro/ai/CaptionProvider.js → interface shape, small helpers (prompt assembly, truncation).
  - src/pro/ai/providers/OpenRouterProvider.js → calls backend proxy (not the OpenRouter API directly from browser).
  - src/pro/ai/schema/ImageObject.js → pure functions to build JSON‑LD from slide + AI result.
  - src/pro/ai/plugin.js → FotoSwipe Pro plugin that orchestrates: read key → license gate → fetch captions → emit schema → update UI.
  - src/pro/ai/ui/announcer.js → optional ARIA/live region announcements; accessibility first.
- Server:
  - server/ai/router.js (Express) or api/ai/caption.ts (serverless) → POST /api/ai/caption.
  - Validates payload; composes provider request; enforces privacy (no image bytes unless enabled); rate limits; logs minimally.
- Docs demo:
  - demo-docs-website/src/components/ProDemo/index.js → call /api/ai/caption when user toggles “Generate captions”; render alt/caption and a <script type="application/ld+json"> block.
  - UI clearly labels Mock vs Live.
Server API
- Endpoint: POST /api/ai/caption
- Body: { url: string, context?: { title?: string, category?: string }, options?: { maxTokens?: number }, licenseKey?: string }
- Response: { alt: string, caption: string, confidence?: number }
- Errors: 400 invalid_input, 402 license_invalid, 429 rate_limited, 502 provider_error.
- Behavior:
  - If licenseKey present, validate via withLicenseGate on server (or reuse existing LS proxy validation); otherwise allow demo in mocked mode (configurable).
  - Never send image bytes by default. If ALLOW_IMAGE_BYTES=true, download/resize and hash; redact URLs from logs.
  - Timeout budget ≤ 6s; retries with backoff x1.
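The request body described above could be checked with a small pure function before any provider call is made. This is a minimal sketch, not the project's validator: the name validateCaptionRequest, the result shape, and the rule that url must be an absolute http(s) URL are all assumptions made for illustration.

```javascript
// Hypothetical helper (not part of the plan): validates the POST /api/ai/caption body.
// Returns { ok: true, value } or { ok: false, status: 400, error: 'invalid_input', message }.
function validateCaptionRequest(body) {
  const fail = (message) => ({ ok: false, status: 400, error: 'invalid_input', message });
  if (body === null || typeof body !== 'object' || Array.isArray(body)) {
    return fail('body must be a JSON object');
  }

  // url is the only required field; assumed here to be an absolute http(s) URL.
  if (typeof body.url !== 'string') return fail('url is required');
  let parsed;
  try {
    parsed = new URL(body.url);
  } catch {
    return fail('url must be an absolute URL');
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return fail('url must use http or https');
  }

  // context is optional; its known fields must be strings when present.
  const context = body.context ?? {};
  if (typeof context !== 'object' || Array.isArray(context)) return fail('context must be an object');
  for (const key of ['title', 'category']) {
    if (context[key] !== undefined && typeof context[key] !== 'string') {
      return fail(`context.${key} must be a string`);
    }
  }

  const maxTokens = body.options?.maxTokens;
  if (maxTokens !== undefined && (!Number.isInteger(maxTokens) || maxTokens <= 0)) {
    return fail('options.maxTokens must be a positive integer');
  }
  if (body.licenseKey !== undefined && typeof body.licenseKey !== 'string') {
    return fail('licenseKey must be a string');
  }

  return {
    ok: true,
    value: {
      url: parsed.href,
      context: { title: context.title, category: context.category },
      options: { maxTokens },
      licenseKey: body.licenseKey
    }
  };
}
```

A route handler would call this first and respond with the returned status and error code when ok is false, so the 400 invalid_input path never reaches the provider.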
Env/config (server)
- OPENROUTER_API_KEY (or provider‑specific key)
- AI_PROVIDER=openrouter (extensible)
- ALLOW_IMAGE_BYTES=false (default)
- AI_MAX_TOKENS=256, AI_MODEL (e.g., captioning model), AI_TIMEOUT_MS=6000
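One way to read these variables is a single loader that applies the defaults listed above. The sketch below is an assumption about how the server might do it: loadAiConfig and the returned field names are hypothetical, and only the variable names and default values come from this plan.

```javascript
// Hypothetical loader: maps the env vars listed above onto one frozen config object.
function loadAiConfig(env = process.env) {
  // Parse a positive integer, falling back to the documented default when unset or invalid.
  const toInt = (raw, fallback) => {
    const n = Number.parseInt(raw ?? '', 10);
    return Number.isFinite(n) && n > 0 ? n : fallback;
  };

  const provider = env.AI_PROVIDER || 'openrouter';
  return Object.freeze({
    provider,
    // Only the OpenRouter key is named in the plan; other providers would add their own.
    apiKey: provider === 'openrouter' ? env.OPENROUTER_API_KEY : undefined,
    model: env.AI_MODEL,
    // Anything other than the exact string "true" keeps URL-only mode.
    allowImageBytes: env.ALLOW_IMAGE_BYTES === 'true',
    maxTokens: toInt(env.AI_MAX_TOKENS, 256),
    timeoutMs: toInt(env.AI_TIMEOUT_MS, 6000)
  });
}
```

Treating only the literal string "true" as opt-in keeps the privacy default intact when the variable is missing or misspelled.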
Client interfaces
// src/pro/ai/CaptionProvider.js
export class CaptionProvider {
  /** @param {{ baseUrl: string }} opts */
  constructor(opts) {
    // Strip one trailing slash so `${baseUrl}/caption` never contains "//".
    this.baseUrl = opts.baseUrl.replace(/\/$/, '');
  }

  /** @param {{ url: string, context?: any, licenseKey?: string }} input */
  async generate(input) {
    const r = await fetch(`${this.baseUrl}/caption`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ url: input.url, context: input.context, licenseKey: input.licenseKey })
    });
    if (!r.ok) throw new Error('caption_failed');
    return await r.json(); // { alt, caption, confidence? }
  }
}
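The provider above collapses every failure into caption_failed, while the Server API section names four distinct error codes. A possible way to surface them on the client is sketched here; CaptionError, toCaptionError, and the retryable flag are hypothetical names and choices, not part of the plan.

```javascript
// Status codes and error names as listed in the Server API section.
const ERROR_CODES = {
  400: 'invalid_input',
  402: 'license_invalid',
  429: 'rate_limited',
  502: 'provider_error'
};

// Hypothetical error type carrying the documented code alongside the HTTP status.
class CaptionError extends Error {
  constructor(code, status) {
    super(code);
    this.name = 'CaptionError';
    this.code = code;
    this.status = status;
    // Assumption: only rate limiting and upstream failures are worth retrying.
    this.retryable = status === 429 || status === 502;
  }
}

// Unknown statuses fall back to the generic code the provider already throws.
function toCaptionError(status) {
  return new CaptionError(ERROR_CODES[status] ?? 'caption_failed', status);
}
```

Inside generate, the failure branch could then become `if (!r.ok) throw toCaptionError(r.status);`, which lets the UI distinguish a license problem from a transient provider error.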
Schema builder
// src/pro/ai/schema/ImageObject.js
export function toImageObject({ slide, result }) {
  return {
    '@context': 'https://schema.org',
    '@type': 'ImageObject',
    contentUrl: slide.src,
    caption: result.caption,
    description: result.alt
  };
}
Pro plugin orchestration
// src/pro/ai/plugin.js
import { withRemoteAwareGate } from '../license-remote.js';
import { CaptionProvider } from './CaptionProvider.js';
import { toImageObject } from './schema/ImageObject.js';

export function createAiSeoPlugin({ baseUrl = '/api/ai', onSchema } = {}) {
  const provider = new CaptionProvider({ baseUrl });

  // Gated work: fetch captions for one slide, emit its JSON‑LD, return the text for the UI.
  const run = async ({ slide, licenseKey }) => {
    const result = await provider.generate({ url: slide.src, context: { title: slide.title }, licenseKey });
    const schema = toImageObject({ slide, result });
    if (onSchema) onSchema(schema);
    return { alt: result.alt, caption: result.caption };
  };

  // Stub validator that always reports a valid license; presumably a placeholder for real remote validation.
  return withRemoteAwareGate(run, { provider: { validate: async () => ({ valid: true }) } });
}
Docs demo wiring
- Add a toggle “Generate AI captions” in ProDemo and call createAiSeoPlugin for each slide.
- Inject the resulting JSON‑LD via a <script type="application/ld+json"> tag (client‑side append; no Helmet).
- Display alt/caption next to each image; add a badge for “AI (live)” vs “Mocked”.
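The client-side append mentioned above might look like the following sketch. Both function names, the per-slide id parameter, and the choice to pass the document in as an argument are assumptions; the only DOM calls used are getElementById, createElement, and appendChild.

```javascript
// Serialize a schema object for embedding in a <script> tag.
function serializeJsonLd(schema) {
  // Escape "<" so a caption containing "</script>" cannot end the tag early
  // if this string is ever written into HTML rather than set via textContent.
  return JSON.stringify(schema).replace(/</g, '\\u003c');
}

// Hypothetical helper: create or update one JSON-LD script element per slide id.
// `doc` is the page's document, passed in so the function can be tested with a stub.
function injectJsonLd(doc, schema, id) {
  let el = doc.getElementById(id);
  if (!el) {
    el = doc.createElement('script');
    el.type = 'application/ld+json';
    el.id = id;
    doc.head.appendChild(el);
  }
  // Reusing the element avoids duplicate blocks when captions are regenerated.
  el.textContent = serializeJsonLd(schema);
  return el;
}
```

In the demo this could be wired as the onSchema callback, for example `onSchema: (schema) => injectJsonLd(document, schema, 'ld-' + slideIndex)`, where slideIndex is whatever stable identifier the demo has for a slide.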
Privacy & compliance
- Off by default; requires explicit user action (toggle) or config to enable.
- Never log raw URLs or PII; if logging is enabled, log only hashed URLs and redact user data.
- Provide an option to run URL‑only mode (no bytes) for strict privacy.
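The logging rule above can be sketched as a function that builds a log entry from a request without ever copying the raw URL, license key, or context. The names hashUrl and toLogEntry, the logged fields, and the 16-character truncation are assumptions for illustration; createHash is Node's built-in crypto API.

```javascript
const { createHash } = require('crypto');

// Hash a URL so log lines can be correlated without storing the URL itself.
function hashUrl(url) {
  return createHash('sha256').update(url).digest('hex').slice(0, 16);
}

// Hypothetical log shape: only the hash and non-identifying metrics are kept.
// licenseKey, context, and the raw URL are deliberately never copied.
function toLogEntry({ url, status, durationMs }) {
  return { urlHash: hashUrl(url), status, durationMs };
}
```

An unsalted hash still lets anyone who already knows a URL confirm it appears in the logs; if that matters, a keyed HMAC with a server-side secret is a possible substitute.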
Performance
- Caption generation is async; never block initial LCP image render.
- Cache by URL hash on the server for 24h; ETag responses where possible.
- Budget: ≤ 6s P95 for caption responses; UI timeout with retry suggestion.
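The 24h server cache keyed by URL hash could start as an in-memory map with expiry timestamps. This is a sketch under several assumptions: CaptionCache is a hypothetical name, the cache is per-process and unbounded, and the key ignores context, so two requests for the same URL with different titles would share one entry.

```javascript
const crypto = require('crypto');

// Hypothetical in-memory cache: entries expire ttlMs after they are written.
class CaptionCache {
  constructor({ ttlMs = 24 * 60 * 60 * 1000, now = Date.now } = {}) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, which keeps expiry testable
    this.entries = new Map();
  }

  // Key by SHA-256 of the URL so raw URLs are not held as map keys.
  static keyFor(url) {
    return crypto.createHash('sha256').update(url).digest('hex');
  }

  get(url) {
    const key = CaptionCache.keyFor(url);
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (this.now() >= hit.expiresAt) {
      this.entries.delete(key); // drop stale entries lazily on read
      return undefined;
    }
    return hit.value;
  }

  set(url, value) {
    const key = CaptionCache.keyFor(url);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

A handler would check `cache.get(url)` before calling the provider and `cache.set(url, result)` after a successful response; a shared store would be needed once more than one server instance is running.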
Acceptance criteria
- Given a valid license and AI key, /api/ai/caption returns alt/caption within 6s for a public image URL.
- Demo /pro shows correct alt/caption and injects ImageObject JSON‑LD per image.
- License off → endpoint rejects with 402 or demo runs in mocked mode (clearly labeled).
- Offline → UI shows fallback and retains prior captions from cache if present.
Testing
- Unit: schema builder, provider error mapping, prompt assembly.
- Integration: endpoint happy path, invalid URL, provider timeout, rate limit.
- E2E: demo toggles, JSON‑LD present, accessibility audit for alt text.
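The "provider timeout" integration case is easier to exercise if the timeout and single retry live in one wrapper that takes the provider call as an argument. The sketch below assumes the 6s budget covers all attempts combined, which the plan does not state; callWithinBudget, the option names, and the backoff delay are hypothetical.

```javascript
// Hypothetical wrapper: run `call(signal)` within an overall time budget, retrying once by default.
async function callWithinBudget(call, { budgetMs = 6000, retries = 1, backoffMs = 250 } = {}) {
  const deadline = Date.now() + budgetMs;
  let lastError = new Error('provider_timeout');

  for (let attempt = 0; attempt <= retries; attempt += 1) {
    // Wait before a retry, never before the first attempt.
    if (attempt > 0) await new Promise((resolve) => setTimeout(resolve, backoffMs));

    const remaining = deadline - Date.now();
    if (remaining <= 0) break; // budget spent; do not start another attempt

    const controller = new AbortController();
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(() => {
        controller.abort(); // lets a fetch-based call cancel its request
        reject(new Error('provider_timeout'));
      }, remaining);
    });

    try {
      return await Promise.race([call(controller.signal), timeout]);
    } catch (err) {
      lastError = err;
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastError;
}
```

In a test, the real provider can be replaced by a function that never resolves to simulate a hang, or by one that fails once and then succeeds to cover the retry path.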
Rollout steps
1) Implement server /api/ai/caption in server/ai/router.js; mount under /api/ai in server/index.js.
2) Add client files under src/pro/ai/* and export createAiSeoPlugin in Pro build.
3) Wire demo UI toggle; label Mock vs Live.
4) Add docs: usage snippet, env setup, privacy notes.
5) Ship acceptance tests; tag release; update docs/fotoswipe-dual-license-e2e.md checklists.