FotoSwipe Pro: AI captions/alt-text + SEO schema (implementation plan)

Purpose: ship the “Aha” Pro feature: provider‑agnostic AI captioning that generates accurate alt text and captions plus ImageObject JSON‑LD, while respecting privacy and .cursorrules.

Scope

Inputs: image URL (default) or opt‑in image bytes; optional product title/category/context.
Outputs: alt text, caption, and normalized ImageObject schema fields.
Delivery: plugin API for FotoSwipe (Pro), server endpoint for AI calls, demo wiring on /pro.

Architecture fit (SOLID, DRY)

Pro gating stays in src/pro/license.js. Remote validation via withRemoteAwareGate from src/pro/license-remote.js.
Introduce a small interface (CaptionProvider) and adapters (OpenRouter first). All provider logic lives in src/pro/ai/providers/ and is injected.
Server owns AI secrets and network calls. Client talks to /api/ai/caption only. No secrets in the browser.

File layout

Client (Pro package):
- src/pro/ai/CaptionProvider.js → interface shape, small helpers (prompt assembly, truncation).
- src/pro/ai/providers/OpenRouterProvider.js → calls backend proxy (not the OpenRouter API directly from browser).
- src/pro/ai/schema/ImageObject.js → pure functions to build JSON‑LD from slide + AI result.
- src/pro/ai/plugin.js → FotoSwipe Pro plugin that orchestrates: read key → license gate → fetch captions → emit schema → update UI.
- src/pro/ai/ui/announcer.js → optional ARIA/live region announcements; accessibility first.
Server:
- server/ai/router.js (Express) or api/ai/caption.ts (serverless) → POST /api/ai/caption.
- Validates payload; composes provider request; enforces privacy (no image bytes unless enabled); rate limits; logs minimally.
Docs demo:
- demo-docs-website/src/components/ProDemo/index.js → call /api/ai/caption when user toggles “Generate captions”; render alt/caption and a <script type="application/ld+json"> block.
- UI clearly labels Mock vs Live.

Server API

Endpoint: POST /api/ai/caption
- Body: { url: string, context?: { title?: string, category?: string }, options?: { maxTokens?: number }, licenseKey?: string }
- Response: { alt: string, caption: string, confidence?: number }
- Errors: 400 invalid_input, 402 license_invalid, 429 rate_limited, 502 provider_error.
- Behavior:
  - If licenseKey present, validate via withLicenseGate on server (or reuse existing LS proxy validation); otherwise allow demo in mocked mode (configurable).
  - Never send image bytes by default. Only when ALLOW_IMAGE_BYTES=true does the server download, resize, and hash the image. Redact URLs from logs in either mode.
  - Timeout budget ≤ 6 s; at most one retry, with backoff.
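
The payload rules above can be sketched as a pure validator that the endpoint runs before any license or provider work. This is a minimal sketch, not the planned implementation: the name validateCaptionRequest, its return shape, and the restriction to http/https URLs are assumptions made for illustration.

```javascript
// Hypothetical helper: name and return shape are assumptions, not part of the plan.
// Returns { ok: true, value } for a usable body, or a 400 invalid_input descriptor.
function validateCaptionRequest(body) {
  const fail = (message) => ({ ok: false, status: 400, error: 'invalid_input', message });

  if (body === null || typeof body !== 'object' || Array.isArray(body)) {
    return fail('body must be a JSON object');
  }
  if (typeof body.url !== 'string') return fail('url must be a string');

  let parsed;
  try {
    parsed = new URL(body.url); // throws on relative or malformed URLs
  } catch {
    return fail('url is not a valid absolute URL');
  }
  // Assumption: only web URLs are accepted.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return fail('url must use http or https');
  }

  // Copy only the documented context fields so nothing unexpected reaches the provider.
  const context = {};
  if (body.context !== undefined) {
    if (body.context === null || typeof body.context !== 'object') {
      return fail('context must be an object');
    }
    for (const key of ['title', 'category']) {
      const value = body.context[key];
      if (value === undefined) continue;
      if (typeof value !== 'string') return fail(`context.${key} must be a string`);
      context[key] = value;
    }
  }

  const maxTokens = body.options?.maxTokens;
  if (maxTokens !== undefined && !(Number.isInteger(maxTokens) && maxTokens > 0)) {
    return fail('options.maxTokens must be a positive integer');
  }
  if (body.licenseKey !== undefined && typeof body.licenseKey !== 'string') {
    return fail('licenseKey must be a string');
  }

  return { ok: true, value: { url: parsed.href, context, maxTokens, licenseKey: body.licenseKey } };
}
```

A route handler could respond with `status` and `error` directly when `ok` is false, which keeps the 400 path free of provider calls.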

Env/config (server)

OPENROUTER_API_KEY (or provider‑specific key)
AI_PROVIDER=openrouter (extensible)
ALLOW_IMAGE_BYTES=false (default)
AI_MAX_TOKENS=256, AI_MODEL (e.g., captioning model), AI_TIMEOUT_MS=6000
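
A minimal sketch of how the server might read these variables with the defaults listed above. The function name loadAiConfig and the shape of the returned object are assumptions; the variable names and defaults come from this section.

```javascript
// Hypothetical config loader: takes an env-like object (e.g. process.env) so it stays testable.
function loadAiConfig(env) {
  // Fall back when the variable is missing, non-numeric, or not positive.
  const toPositiveInt = (raw, fallback) => {
    const n = Number.parseInt(raw ?? '', 10);
    return Number.isInteger(n) && n > 0 ? n : fallback;
  };

  return Object.freeze({
    provider: env.AI_PROVIDER || 'openrouter',
    apiKey: env.OPENROUTER_API_KEY || '',
    model: env.AI_MODEL || '',
    // Privacy default: bytes are sent only on the exact string "true".
    allowImageBytes: env.ALLOW_IMAGE_BYTES === 'true',
    maxTokens: toPositiveInt(env.AI_MAX_TOKENS, 256),
    timeoutMs: toPositiveInt(env.AI_TIMEOUT_MS, 6000),
  });
}
```

Treating anything other than the literal string "true" as false keeps a typo in ALLOW_IMAGE_BYTES from silently enabling byte uploads.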

Client interfaces

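// src/pro/ai/CaptionProvider.js
export class CaptionProvider {
  /** @param {{ baseUrl: string }} opts */
  constructor(opts) {
    // Strip one trailing slash so `${baseUrl}/caption` never contains "//".
    this.baseUrl = opts.baseUrl.replace(/\/$/, '');
  }

  /**
   * Request alt text and a caption from the backend proxy.
   * @param {{ url: string, context?: any, licenseKey?: string }} input
   * @returns {Promise<{ alt: string, caption: string, confidence?: number }>}
   */
  async generate(input) {
    const r = await fetch(`${this.baseUrl}/caption`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ url: input.url, context: input.context, licenseKey: input.licenseKey })
    });
    if (!r.ok) {
      // Attach the HTTP status so callers can tell 402, 429, and 502 apart.
      const err = new Error('caption_failed');
      err.status = r.status;
      throw err;
    }
    return await r.json(); // { alt, caption, confidence? }
  }
}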

Schema builder

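// src/pro/ai/schema/ImageObject.js
/**
 * Build ImageObject JSON-LD from a slide and an AI caption result. Pure: no I/O, no mutation.
 * @param {{ slide: { src: string }, result: { alt: string, caption: string } }} args
 */
export function toImageObject({ slide, result }) {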
  return {
    '@context': 'https://schema.org',
    '@type': 'ImageObject',
    contentUrl: slide.src,
    caption: result.caption,
    description: result.alt
  };
}

Pro plugin orchestration

// src/pro/ai/plugin.js
import { withRemoteAwareGate } from '../license-remote.js';
import { CaptionProvider } from './CaptionProvider.js';
import { toImageObject } from './schema/ImageObject.js';

export function createAiSeoPlugin({ baseUrl = '/api/ai', onSchema } = {}) {
  const provider = new CaptionProvider({ baseUrl });
  // Fetch a caption for one slide, emit its JSON-LD, and return the text for the UI.
  const run = async ({ slide, licenseKey }) => {
    const result = await provider.generate({ url: slide.src, context: { title: slide.title }, licenseKey });
    const schema = toImageObject({ slide, result });
    if (onSchema) onSchema(schema);
    return { alt: result.alt, caption: result.caption };
  };
  // NOTE: this validator accepts every key. It reads as a placeholder; supply a real remote validator before relying on the gate.
  return withRemoteAwareGate(run, { provider: { validate: async () => ({ valid: true }) } });
}

Docs demo wiring

Add a toggle “Generate AI captions” in ProDemo and call createAiSeoPlugin for each slide.
Inject the resulting JSON‑LD via a <script type="application/ld+json"> tag (client‑side append; no Helmet).
Display alt/caption next to each image; add a badge for “AI (live)” vs “Mocked”.
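
The client-side injection step could look like the sketch below. Both function names (serializeJsonLd, injectJsonLd) are hypothetical, and passing the document in as a parameter is an assumption made so the helper can be exercised outside a browser.

```javascript
// Serialize a schema object for a JSON-LD script tag.
// Escaping "<" keeps a caption containing "</script>" from ending the tag early
// if the string is ever inlined into HTML; JSON parsers decode \u003c back to "<".
function serializeJsonLd(schema) {
  return JSON.stringify(schema).replace(/</g, '\\u003c');
}

// Append one <script type="application/ld+json"> element to the document head.
// `doc` is normally window.document.
function injectJsonLd(doc, schema) {
  const script = doc.createElement('script');
  script.type = 'application/ld+json';
  script.textContent = serializeJsonLd(schema);
  doc.head.appendChild(script);
  return script; // returned so the caller can remove it when the toggle is switched off
}
```

In the demo this would be wired as the plugin's onSchema callback, for example `onSchema: (schema) => injectJsonLd(document, schema)`.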

Privacy & compliance

Off by default; requires explicit user action (toggle) or config to enable.
Never log raw URLs or PII; if logging is enabled, log only hashed URLs and redact user data.
Provide an option to run URL‑only mode (no bytes) for strict privacy.
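
One way to satisfy the hashed-URL logging rule on the server is sketched below, using Node's built-in crypto module. The helper names, the 16-character truncation, and the log entry fields are assumptions. The snippet uses CommonJS `require`; an ESM server would import `createHash` from 'node:crypto' instead.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: a short SHA-256 digest stands in for the URL in logs.
// 16 hex characters (64 bits) is an arbitrary choice for log correlation.
function hashUrlForLog(url) {
  return createHash('sha256').update(url, 'utf8').digest('hex').slice(0, 16);
}

// Build a log entry that carries no raw URL and no user-supplied text.
function toLogEntry({ url, status, durationMs }) {
  return { urlHash: hashUrlForLog(url), status, durationMs };
}
```

A plain hash of a guessable URL can be reversed by hashing candidates, so a keyed hash (HMAC with a server-side secret) may be the better fit where that matters.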

Performance

Caption generation is async; never block initial LCP image render.
Cache by URL hash on the server for 24h; ETag responses where possible.
Budget: ≤ 6s P95 for caption responses; UI timeout with retry suggestion.
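
The 24h server cache could start as a small in-memory map with expiry, as in this sketch. The factory name createTtlCache and the injectable clock are assumptions; the key is expected to be the URL hash described above. The map is unbounded, so a real deployment would likely need size limits or an external store.

```javascript
// Hypothetical in-memory TTL cache. `now` is injectable so expiry can be tested without waiting.
function createTtlCache({ ttlMs = 24 * 60 * 60 * 1000, now = Date.now } = {}) {
  const entries = new Map(); // key -> { value, expiresAt }

  return {
    // Return the cached value, or undefined when missing or expired.
    get(key) {
      const hit = entries.get(key);
      if (!hit) return undefined;
      if (now() >= hit.expiresAt) {
        entries.delete(key); // evict lazily on read
        return undefined;
      }
      return hit.value;
    },
    // Store a value; writing the same key again restarts its TTL.
    set(key, value) {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
    get size() {
      return entries.size;
    },
  };
}
```

The endpoint would check the cache before calling the provider and store only successful results, so errors and timeouts are retried on the next request.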

Acceptance criteria

Given a valid license and AI key, /api/ai/caption returns alt/caption within 6s for a public image URL.
Demo /pro shows correct alt/caption and injects ImageObject JSON‑LD per image.
License off → endpoint rejects with 402 or demo runs in mocked mode (clearly labeled).
Offline → UI shows fallback and retains prior captions from cache if present.

Testing

Unit: schema builder, provider error mapping, prompt assembly.
Integration: endpoint happy path, invalid URL, provider timeout, rate limit.
E2E: demo toggles, JSON‑LD present, accessibility audit for alt text.

Rollout steps

1) Implement server /api/ai/caption in server/ai/router.js; mount under /api/ai in server/index.js.
2) Add client files under src/pro/ai/* and export createAiSeoPlugin in Pro build.
3) Wire demo UI toggle; label Mock vs Live.
4) Add docs: usage snippet, env setup, privacy notes.
5) Ship acceptance tests; tag release; update docs/fotoswipe-dual-license-e2e.md checklists.
