FotoSwipe Pro: AI captions/alt-text + SEO schema (implementation plan)

Purpose: ship the “Aha” Pro feature: provider‑agnostic AI captioning that generates accurate alt text and captions plus ImageObject JSON‑LD, while respecting privacy and .cursorrules.

Scope

  • Inputs: image URL (default) or opt‑in image bytes; optional product title/category/context.
  • Outputs: alt text, caption, and normalized ImageObject schema fields.
  • Delivery: plugin API for FotoSwipe (Pro), server endpoint for AI calls, demo wiring on /pro.

Architecture fit (SOLID, DRY)

  • Pro gating stays in src/pro/license.js. Remote validation via withRemoteAwareGate from src/pro/license-remote.js.
  • Introduce a small interface (CaptionProvider) and adapters (OpenRouter first). All provider logic lives in src/pro/ai/providers/ and is injected.
  • Server owns AI secrets and network calls. Client talks to /api/ai/caption only. No secrets in the browser.

File layout

  • Client (Pro package):
    • src/pro/ai/CaptionProvider.js → interface shape, small helpers (prompt assembly, truncation).
    • src/pro/ai/providers/OpenRouterProvider.js → calls backend proxy (not the OpenRouter API directly from browser).
    • src/pro/ai/schema/ImageObject.js → pure functions to build JSON‑LD from slide + AI result.
    • src/pro/ai/plugin.js → FotoSwipe Pro plugin that orchestrates: read key → license gate → fetch captions → emit schema → update UI.
    • src/pro/ai/ui/announcer.js → optional ARIA/live region announcements; accessibility first.
  • Server:
    • server/ai/router.js (Express) or api/ai/caption.ts (serverless) → POST /api/ai/caption.
    • Validates payload; composes provider request; enforces privacy (no image bytes unless enabled); rate limits; logs minimally.
  • Docs demo:
    • demo-docs-website/src/components/ProDemo/index.js → call /api/ai/caption when user toggles “Generate captions”; render alt/caption and a <script type="application/ld+json"> block.
    • UI clearly labels Mock vs Live.

Server API

  • Endpoint: POST /api/ai/caption
    • Body: { url: string, context?: { title?: string, category?: string }, options?: { maxTokens?: number }, licenseKey?: string }
    • Response: { alt: string, caption: string, confidence?: number }
    • Errors: 400 invalid_input, 402 license_invalid, 429 rate_limited, 502 provider_error.
    • Behavior:
      • If licenseKey is present, validate it via withLicenseGate on the server (or reuse the existing LS proxy validation); otherwise allow the demo to run in mocked mode (configurable).
      • Never send image bytes by default. If ALLOW_IMAGE_BYTES=true, download/resize and hash; redact URLs from logs.
      • Timeout budget ≤ 6s; at most one retry, with backoff.
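
The request contract above can be checked before any provider call is made. The following is a minimal sketch, not the project's actual validator: the function name validateCaptionRequest, the detail field, and the http/https-only rule are assumptions made for illustration, while the field names and the 400 invalid_input error come from the endpoint description.

```javascript
// Hypothetical helper: validates a POST /api/ai/caption body.
// Returns { ok: true, value } on success or { ok: false, status, error, detail } on failure.
function validateCaptionRequest(body) {
  const fail = (detail) => ({ ok: false, status: 400, error: 'invalid_input', detail });

  if (body === null || typeof body !== 'object' || Array.isArray(body)) {
    return fail('body must be a JSON object');
  }
  if (typeof body.url !== 'string') return fail('url must be a string');

  let parsed;
  try {
    parsed = new URL(body.url); // throws on relative or malformed URLs
  } catch {
    return fail('url is not a valid absolute URL');
  }
  // Assumption: only http(s) image URLs are accepted.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return fail('url must use http or https');
  }

  const context = body.context ?? {};
  if (typeof context !== 'object' || Array.isArray(context)) {
    return fail('context must be an object');
  }
  for (const key of ['title', 'category']) {
    if (context[key] !== undefined && typeof context[key] !== 'string') {
      return fail(`context.${key} must be a string`);
    }
  }

  const maxTokens = body.options?.maxTokens;
  if (maxTokens !== undefined && (!Number.isInteger(maxTokens) || maxTokens <= 0)) {
    return fail('options.maxTokens must be a positive integer');
  }
  if (body.licenseKey !== undefined && typeof body.licenseKey !== 'string') {
    return fail('licenseKey must be a string');
  }

  return {
    ok: true,
    value: {
      url: parsed.href,
      context: { title: context.title, category: context.category },
      maxTokens,
      licenseKey: body.licenseKey
    }
  };
}
```

A route handler could call this first and, on failure, respond with result.status and a body of { error: result.error }, so the 402, 429, and 502 paths are only reached for well-formed input.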

Env/config (server)

  • OPENROUTER_API_KEY (or provider‑specific key)
  • AI_PROVIDER=openrouter (extensible)
  • ALLOW_IMAGE_BYTES=false (default)
  • AI_MAX_TOKENS=256, AI_MODEL (e.g., captioning model), AI_TIMEOUT_MS=6000
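
One way to read these settings with the stated defaults is sketched below. The function name loadAiConfig and the shape of the returned object are assumptions; only the variable names and default values come from the list above.

```javascript
// Hypothetical loader: reads the server settings from an env-like object
// (for example process.env) and applies the documented defaults.
function loadAiConfig(env) {
  // Parse a positive integer, falling back when the value is missing or invalid.
  const toPositiveInt = (raw, fallback) => {
    const n = Number.parseInt(raw ?? '', 10);
    return Number.isInteger(n) && n > 0 ? n : fallback;
  };

  return {
    provider: env.AI_PROVIDER || 'openrouter',
    apiKey: env.OPENROUTER_API_KEY || null,
    model: env.AI_MODEL || null,
    // Only the exact string "true" enables image bytes; anything else keeps them off.
    allowImageBytes: env.ALLOW_IMAGE_BYTES === 'true',
    maxTokens: toPositiveInt(env.AI_MAX_TOKENS, 256),
    timeoutMs: toPositiveInt(env.AI_TIMEOUT_MS, 6000)
  };
}
```

Treating anything other than the literal string "true" as false keeps the privacy-sensitive option off when the variable is unset or mistyped.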

Client interfaces

// src/pro/ai/CaptionProvider.js
export class CaptionProvider {
  /** @param {{ baseUrl: string }} opts */
  constructor(opts) {
    // Strip one trailing slash so `${baseUrl}/caption` never contains "//".
    this.baseUrl = opts.baseUrl.replace(/\/$/, '');
  }

  /** @param {{ url: string, context?: any, licenseKey?: string }} input */
  async generate(input) {
    const r = await fetch(`${this.baseUrl}/caption`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ url: input.url, context: input.context, licenseKey: input.licenseKey })
    });
    if (!r.ok) throw new Error('caption_failed');
    return await r.json(); // { alt, caption, confidence? }
  }
}

Schema builder

// src/pro/ai/schema/ImageObject.js
export function toImageObject({ slide, result }) {
  return {
    '@context': 'https://schema.org',
    '@type': 'ImageObject',
    contentUrl: slide.src,
    caption: result.caption,
    description: result.alt
  };
}
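
If the AI result comes back with blank or whitespace-padded strings, the schema would carry empty caption or description values. The sketch below shows one possible normalization step; compactImageObject is a hypothetical helper and not part of the planned file, and whether empty fields should be dropped at all is an assumption.

```javascript
// Hypothetical helper: trims string fields and drops empty, null, or undefined
// ones, so the emitted JSON-LD never contains blank values.
function compactImageObject(schema) {
  const out = {};
  for (const [key, value] of Object.entries(schema)) {
    if (typeof value === 'string') {
      const trimmed = value.trim();
      if (trimmed !== '') out[key] = trimmed;
    } else if (value !== undefined && value !== null) {
      out[key] = value;
    }
  }
  return out;
}
```

It could be applied to the return value of toImageObject before the schema is handed to onSchema.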

Pro plugin orchestration

// src/pro/ai/plugin.js
import { withRemoteAwareGate } from '../license-remote.js';
import { CaptionProvider } from './CaptionProvider.js';
import { toImageObject } from './schema/ImageObject.js';

// The `= {}` default lets callers invoke createAiSeoPlugin() with no arguments.
export function createAiSeoPlugin({ baseUrl = '/api/ai', onSchema } = {}) {
  const provider = new CaptionProvider({ baseUrl });

  // Fetch alt/caption for one slide, emit its JSON-LD, and return the text for the UI.
  const run = async ({ slide, licenseKey }) => {
    const result = await provider.generate({ url: slide.src, context: { title: slide.title }, licenseKey });
    const schema = toImageObject({ slide, result });
    if (onSchema) onSchema(schema);
    return { alt: result.alt, caption: result.caption };
  };

  // Placeholder validator: always reports the license as valid.
  return withRemoteAwareGate(run, { provider: { validate: async () => ({ valid: true }) } });
}

Docs demo wiring

  • Add a toggle “Generate AI captions” in ProDemo and call createAiSeoPlugin for each slide.
  • Inject the resulting JSON‑LD via a <script type="application/ld+json"> tag (client‑side append; no Helmet).
  • Display alt/caption next to each image; add a badge for “AI (live)” vs “Mocked”.
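
The client-side append described above might look like the following sketch. The names serializeJsonLd and injectJsonLd, and the idea of reusing a script element by id so that re-running the toggle does not duplicate tags, are assumptions for illustration.

```javascript
// Serializes a schema object for embedding in a <script> element.
// Escaping "<" keeps a caption containing "</script>" from closing the tag early
// if the string is ever written into HTML markup (for example when server-rendered).
function serializeJsonLd(schema) {
  return JSON.stringify(schema).replace(/</g, '\\u003c');
}

// Hypothetical helper: appends (or updates) one JSON-LD script in <head>.
// `doc` is normally the browser `document`; passing it in keeps the function testable.
function injectJsonLd(doc, schema, id) {
  const existing = id ? doc.getElementById(id) : null;
  const script = existing || doc.createElement('script');
  script.type = 'application/ld+json';
  if (id) script.id = id;
  script.textContent = serializeJsonLd(schema);
  if (!existing) doc.head.appendChild(script);
  return script;
}
```

In the demo this could be passed as the onSchema callback, for example onSchema: (schema) => injectJsonLd(document, schema, 'ld-' + slideIndex), where slideIndex is whatever per-slide identifier the component already has.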

Privacy & compliance

  • Off by default; requires explicit user action (toggle) or config to enable.
  • Do not log raw URLs or PII; if logging is enabled, log only hashed URLs and redact user data.
  • Provide an option to run URL‑only mode (no bytes) for strict privacy.
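
As an illustration of the logging rule, the sketch below replaces the raw URL with a truncated SHA-256 digest before anything is written to a log. The helper names, the 16-character truncation, and the log entry fields are assumptions. CommonJS require is used so the snippet runs standalone in Node; an ES module would import createHash from 'node:crypto' instead.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: a short, stable digest that stands in for the raw URL in logs.
function hashUrlForLog(url) {
  return createHash('sha256').update(url, 'utf8').digest('hex').slice(0, 16);
}

// Builds a minimal log entry that never carries the URL itself.
function toLogEntry({ url, status, durationMs }) {
  return { urlHash: hashUrlForLog(url), status, durationMs };
}
```

An unkeyed hash of a guessable URL can still be matched by anyone who already knows that URL; a keyed digest via crypto.createHmac with a server-side secret is a stricter option if that matters.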

Performance

  • Caption generation is async; never block initial LCP image render.
  • Cache by URL hash on the server for 24h; ETag responses where possible.
  • Budget: ≤ 6s P95 for caption responses; UI timeout with retry suggestion.
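
The 24h server cache could start as something like the in-memory sketch below. createCaptionCache and its options are hypothetical names, and the key is assumed to be the URL hash mentioned above. An in-memory Map is not shared across serverless instances and grows without bound, so a real deployment would likely need a shared store and an eviction policy.

```javascript
// Hypothetical in-memory cache with a 24h default TTL.
// `now` is injectable so expiry can be tested without waiting.
function createCaptionCache({ ttlMs = 24 * 60 * 60 * 1000, now = Date.now } = {}) {
  const entries = new Map();
  return {
    // Returns the cached value, or undefined if missing or expired.
    get(key) {
      const hit = entries.get(key);
      if (!hit) return undefined;
      if (now() >= hit.expiresAt) {
        entries.delete(key); // drop the expired entry on read
        return undefined;
      }
      return hit.value;
    },
    set(key, value) {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
    get size() {
      return entries.size;
    }
  };
}
```

A handler would check cache.get(urlHash) before calling the provider and call cache.set(urlHash, { alt, caption }) after a successful response.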

Acceptance criteria

  • Given a valid license and AI key, /api/ai/caption returns alt/caption within 6s for a public image URL.
  • Demo /pro shows correct alt/caption and injects ImageObject JSON‑LD per image.
  • License off → endpoint rejects with 402 or demo runs in mocked mode (clearly labeled).
  • Offline → UI shows fallback and retains prior captions from cache if present.

Testing

  • Unit: schema builder, provider error mapping, prompt assembly.
  • Integration: endpoint happy path, invalid URL, provider timeout, rate limit.
  • E2E: demo toggles, JSON‑LD present, accessibility audit for alt text.

Rollout steps

  1) Implement server /api/ai/caption in server/ai/router.js; mount under /api/ai in server/index.js.
  2) Add client files under src/pro/ai/* and export createAiSeoPlugin in Pro build.
  3) Wire demo UI toggle; label Mock vs Live.
  4) Add docs: usage snippet, env setup, privacy notes.
  5) Ship acceptance tests; tag release; update docs/fotoswipe-dual-license-e2e.md checklists.