MODULE_ID: PROJECT_BRAIN
KNOWLEDGE SYSTEM

PROJECT BRAIN

Unified RAG over docs, ClickUp, Figma, Miro, Excel & sprint history

STACK

Mastra + Vectorize + D1

BUILD TIME

Ongoing

STATUS

LIVE

ROLE

Solo build

OUTCOME

Unified RAG over docs, ClickUp, Figma, Miro, Excel & sprint history

Performance Metric

Time-to-context

10x faster grounded answers
PROJECTED GAIN

System Overview

STATUS: ACTIVE

Problem Statement

PMs and stakeholders waste hours hunting answers across PDFs, decks, Figma boards, Miro maps, ClickUp tasks and sprint comments. Context lives everywhere except where the next decision is being made, and blocked tasks rot for days because nobody writes the missing requirement down.

Technical Solution

Project Brain ingests every project source (text/PDF/PowerPoint, Excel/CSV spreadsheets, websites, Figma nodes, Miro boards, ClickUp tasks/lists, images) and turns each into structured artifacts (documentation, requirements, todos, meeting minutes, sprint meetings) inside ClickUp Docs. Each artifact enters a full lifecycle: version history with LCS line-level diff, inline editing with preview toggle, ClickUp drift detection, and one-click export to Markdown, HTML, or PDF, all without leaving the page.

When the same source URL is re-submitted, Smart Update detects the duplicate, computes an LLM-powered semantic diff, zeroes out stale Vectorize vectors, and applies a diff-aware merge to the artifacts (struck-through removals, in-place modifications, appended additions).

A daily cron snapshots the active sprint, diffs status/due/assignee changes, parses [area] task prefixes, analyses blocker comments, and emails reminders to the right designer, developer, or PM.

A retrieval-augmented chat answers questions over everything with live token streaming, a tool-call visibility panel, and citations, plus a /all bypass when filters get in the way.

Sprint Intelligence extends the daily cron with heuristic risk scoring (0–3 per task, shown as colored dots), a Friday Mastra-composed weekly digest emailed to the PM/lead, automatic sprint-retro pre-fill as a sprint_meeting artifact, and an Insights tab with burndown, status-distribution, blocker-trend, and area-workload charts plus a most-changed-tasks list. All of it is computed from the existing snapshot history with zero new infra.
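The daily snapshot diff of status, due date, and assignee can be pictured as a field-by-field comparison of two snapshots keyed by task id. The following is a minimal sketch under stated assumptions, not the project's code: the `TaskSnapshot` shape, its field names, and the `diffSnapshots` helper are all hypothetical.

```typescript
// Hypothetical snapshot shape; the real snapshot schema is not shown in the source.
interface TaskSnapshot {
  taskId: string;
  status: string;
  dueDate: string | null;
  assignee: string | null;
}

type TrackedField = "status" | "dueDate" | "assignee";

interface FieldChange {
  taskId: string;
  field: TrackedField;
  before: string | null;
  after: string | null;
}

const TRACKED_FIELDS: TrackedField[] = ["status", "dueDate", "assignee"];

// Compare the previous and current snapshots and list every tracked field that changed.
function diffSnapshots(prev: TaskSnapshot[], curr: TaskSnapshot[]): FieldChange[] {
  const prevById = new Map<string, TaskSnapshot>();
  for (const task of prev) prevById.set(task.taskId, task);

  const changes: FieldChange[] = [];
  for (const task of curr) {
    const old = prevById.get(task.taskId);
    if (!old) continue; // task is new in this snapshot, nothing to diff against
    for (const field of TRACKED_FIELDS) {
      if (old[field] !== task[field]) {
        changes.push({ taskId: task.taskId, field, before: old[field], after: task[field] });
      }
    }
  }
  return changes;
}
```

Because the comparison only needs two stored snapshots, charts such as most-changed tasks could be derived by counting `FieldChange` entries per task over time, which is consistent with the "zero new infra" claim.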

AI Implementation Strategy

Mastra agents drive each LLM stage with per-stage model routing (cheap models for classify/enrich/preprocess, full model for artifacts/chat). Smart Update adds an LLM semantic diff stage (the diff-document prompt) that compares old and new raw_text to produce structured added/removed/modified arrays; the diff drives a diff-aware ClickUp merge that strikes through removed acceptance criteria (ACs) and annotates modified ones in place. Figma nodes are vision-processed on every ingest via Claude Haiku (same pipeline as image sources), which makes the diff semantically meaningful for design changes.

Retrieval embeds chunks with bge-m3 (batch size 64) into Cloudflare Vectorize with 10 indexed metadata fields. Queries go through smarter HyDE (keyword vs question detection), a bge-reranker-base cross-encoder rerank with a low-confidence UX hint, recency decay for task snapshots (90d) and document artifacts (180d), and diversity caps.

Anthropic prompt caching is applied to the system prompt. Langfuse traces cover every chat and artifact run, and every chat run is logged to brain_chat_traces for offline eval.
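To make the recency decay concrete, here is a small sketch of exponential half-life weighting applied to a reranker score. It assumes that both the 90-day and 180-day figures are half-lives and that the weight multiplies the score; the source does not state the exact formula, and the names `HALF_LIFE_DAYS`, `recencyWeight`, and `decayedScore` are hypothetical.

```typescript
// Assumed half-lives in days, taken from the 90d / 180d figures in the text.
const HALF_LIFE_DAYS = {
  task_snapshot: 90,
  document_artifact: 180,
} as const;

type SourceKind = keyof typeof HALF_LIFE_DAYS;

// Weight halves every half-life: 1.0 at age 0, 0.5 at one half-life, 0.25 at two.
function recencyWeight(ageDays: number, kind: SourceKind): number {
  const age = Math.max(0, ageDays); // clamp future-dated items to "just created"
  return Math.pow(0.5, age / HALF_LIFE_DAYS[kind]);
}

// Assumption: the decay is applied multiplicatively to the relevance score.
function decayedScore(score: number, ageDays: number, kind: SourceKind): number {
  return score * recencyWeight(ageDays, kind);
}
```

Under these assumptions a 90-day-old task snapshot keeps half of its score, while a document artifact of the same age keeps about 71%, so long-lived requirements fade more slowly than sprint status.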

Core Capabilities

SYS.CAPABILITIES

Ingest & Smart Update

  • Multi-source ingest (PDF, txt, PowerPoint, Excel/CSV, website, Figma, Miro, ClickUp, image)
  • Excel/CSV ingestion: SheetJS parses multi-sheet workbooks; heuristic schema detector classifies each sheet as requirements / tasks / gantt / meeting / table; each sheet rendered as a markdown table with column and row-count headers before entering the standard pipeline
  • Smart duplicate detection: trigger-ingest checks (project_id, source_url) in D1 before dispatch — re-uploading the same URL routes to project-brain-smart-update instead of a blind re-ingest
  • LLM design diff: smart-update computes a structured DocumentDiff { added, removed, modified, summary } by comparing old and new raw_text via a Mastra agent with the diff-document prompt
  • Figma vision on ingest: extractFigma() now exports the node as PNG via the Figma API and runs Claude vision (same as extractImage()), making raw_text semantically rich and diffable; falls back to metadata stub on export failure
  • Stale vector zeroing: after re-embedding new chunks, smart-update calls zero-stale-vectors to re-upsert old Vectorize entries with version_latest:0 so retrieval queries ignore superseded content
  • Diff-aware artifact merge: mergeDocContentWithDiff() applies removed-element strikethrough (~~text~~ Removed [date]) and in-place modification annotations on top of the standard additive merge
  • Smart Update UI: RealtimeStatus shows an amber 'Smart Update' badge during smart-update runs and a +N −N ~N diff summary on completion; DiffPanel renders the full structured diff below the status block
  • Document lineage: old brain_documents row gains superseded_by = new doc id; new row stores diff_json and update_mode = 'smart_update' — full audit trail preserved, no rows deleted
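The diff-aware merge can be illustrated with a short sketch. It reuses the `DocumentDiff { added, removed, modified, summary }` shape named above, but the element types, the `mergeWithDiffSketch` helper, and the "(modified …)" annotation format are assumptions for illustration; this is not the project's `mergeDocContentWithDiff()`.

```typescript
// Element shapes are assumed; only the four top-level keys come from the text.
interface DocumentDiff {
  added: string[];
  removed: string[];
  modified: { before: string; after: string }[];
  summary: string;
}

// Optional markdown bullet prefix ("- " or "* "), then the line text.
const BULLET = /^(\s*[-*]\s+)?(.*)$/;

function mergeWithDiffSketch(content: string, diff: DocumentDiff, date: string): string {
  const removed = new Set(diff.removed.map((s) => s.trim()));
  const modified = new Map<string, string>();
  for (const m of diff.modified) modified.set(m.before.trim(), m.after.trim());

  const seen = new Set<string>();
  const lines = content.split("\n").map((line) => {
    const match = BULLET.exec(line);
    const prefix = match && match[1] ? match[1] : "";
    const text = (match?.[2] ?? line).trim();
    seen.add(text);
    // Removed items stay visible but struck through, with the removal date.
    if (removed.has(text)) return `${prefix}~~${text}~~ Removed [${date}]`;
    // Modified items are rewritten in place and annotated (format is an assumption).
    const updated = modified.get(text);
    if (updated !== undefined) {
      seen.add(updated);
      return `${prefix}${updated} (modified ${date})`;
    }
    return line;
  });

  // Standard additive merge: append additions that are not already present.
  for (const item of diff.added) {
    const text = item.trim();
    if (!seen.has(text)) {
      lines.push(`- ${text}`);
      seen.add(text);
    }
  }
  return lines.join("\n");
}
```

Keeping removed lines in the document, rather than deleting them, mirrors the audit-trail approach used for document lineage: nothing disappears silently from the artifact.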

Artifacts & Documentation

  • Mastra-driven artifact generation: documentation, requirements, todos, meeting minutes, sprint meetings
  • Artifacts tab: browse and filter all generated artifacts by type with inline markdown expansion
  • ClickUp Docs v3 page creation under per-area sub-docs with version supersede
  • Version history & diff view: click 'history' on any artifact to see all prior versions with LCS line-level diff (green additions, red removals)
  • ClickUp live-sync before update: before overwriting an artifact, generate-artifact fetches the live ClickUp page to preserve manual PM edits
  • In-place artifact editing: edit markdown inline with preview toggle, optionally push back to ClickUp Doc on save
  • ClickUp drift detection: editor surfaces a banner when the live ClickUp page differs from the stored version
  • Artifact export: download as Markdown (.md), HTML (.html via marked), or open Print/PDF dialog
  • Sprint retro pre-fill: auto-generated sprint_meeting artifact on sprint-end detection
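The LCS line-level diff behind the version history view can be sketched with the textbook dynamic-programming approach. This is a generic illustration of the technique, not the project's code; `DiffOp` and `lcsLineDiff` are hypothetical names.

```typescript
type DiffOp = { kind: "same" | "add" | "remove"; line: string };

// Line-level diff based on the longest common subsequence (LCS) of two texts.
function lcsLineDiff(oldText: string, newText: string): DiffOp[] {
  const a = oldText.split("\n");
  const b = newText.split("\n");

  // lcs[i][j] = length of the LCS of a[i..] and b[j..]
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk the table from the start, emitting same / remove / add operations.
  const ops: DiffOp[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      ops.push({ kind: "same", line: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ kind: "remove", line: a[i] });
      i++;
    } else {
      ops.push({ kind: "add", line: b[j] });
      j++;
    }
  }
  while (i < a.length) ops.push({ kind: "remove", line: a[i++] });
  while (j < b.length) ops.push({ kind: "add", line: b[j++] });
  return ops;
}
```

A UI would then render `add` operations in green and `remove` operations in red. The table costs O(n·m) time and memory in the number of lines, which is usually acceptable for document-sized artifacts.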

Chat & RAG

  • RAG chat with live token streaming, collapsible tool-call visibility panel, and suggestion cards
  • RAG pipeline: parent-child chunks, bge-m3 (batch 64) + bge-reranker-base, /all bypass, citations
  • Smarter HyDE: keyword vs question detection (interrogatives + ? suffix) — natural language queries skip HyDE even when short
  • Recency decay extended to document artifacts (requirements, docs, meeting minutes) with 180-day half-life, in addition to 90-day task-snapshot decay
  • Low-confidence retrieval chip: when reranker score < 0.05, an amber ⚠ hint appears in chat with /all suggestion
  • Langfuse tracing on brain-chat, generate-artifact, and daily-sprint-snapshot runs — set LANGFUSE_PUBLIC_KEY/SECRET to enable
  • Per-stage model routing: MASTRA_MODEL_CLASSIFY / ENRICH / PREPROCESS / ARTIFACT / BLOCKER / TITLE — use cheap models for utility stages, full model for generation
  • Anthropic prompt caching on brain-chat system prompt (ephemeral cache_control) for long-context cost reduction
  • Cursor pagination on get-conversations (50/page), get-artifacts (100/page), get-snapshots (200/page), get-todos (200/page) — returns next_cursor
  • Vocabulary governance for subarea/theme via cosine 0.85 dedup
  • brain_chat_traces logging as a seam for offline evaluation
  • Rename conversations: pencil button on each chat sidebar row opens an inline editor; Enter saves via /update-conversation, Esc cancels, server validates 1–200 chars
  • Search across lists: free-text input on Artifacts (title/content), Todos (title/description/assignee), and Conversations (title) — AND-combined with the existing type/status filters
  • compare_sprints chat tool: diffs two sprint date windows from snapshot history to answer velocity questions
  • Semantic doc merging: bge-m3 cosine similarity (≥0.78) picks the right ClickUp Doc to merge into when titles drift; keyword fuzzy scorer kept as a safety fallback
  • Centralised constants & models registry: all RAG thresholds and per-stage model defaults in one configurable place
  • Schema-validated API boundaries: Zod parse-or-400 on all internal route payloads with structured error issues
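The keyword-vs-question check that gates HyDE could look roughly like the sketch below. It follows the rule stated in the list (interrogatives plus a trailing "?"), but the exact word list and the names `INTERROGATIVES`, `looksLikeQuestion`, and `shouldUseHyde` are assumptions.

```typescript
// Assumed interrogative list; the project's actual list is not given in the source.
const INTERROGATIVES = new Set(["who", "what", "when", "where", "why", "which", "how"]);

// A query counts as a natural-language question if it ends with "?"
// or starts with an interrogative word.
function looksLikeQuestion(query: string): boolean {
  const trimmed = query.trim().toLowerCase();
  if (trimmed.endsWith("?")) return true;
  const firstWord = trimmed.split(/\s+/)[0] ?? "";
  return INTERROGATIVES.has(firstWord);
}

// Keyword-style queries get HyDE expansion; questions skip it, even when short.
function shouldUseHyde(query: string): boolean {
  return query.trim().length > 0 && !looksLikeQuestion(query);
}
```

The design point is that query length alone is a poor signal: "why blocked" is short but already phrased as a question, so it would skip HyDE, while a terse keyword string such as "checkout acceptance criteria" would still be expanded.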

Sprint Monitoring

  • Daily Mon–Fri 18:00 Europe/Rome sprint snapshot + change diff with [area] prefix parsing
  • Blocker-comment analysis → brain_todos with delayed Mailtrap reminders
  • Sprint Intelligence tab: burndown, status-distribution stacked area, blocker trend, area workload, assignee load, top-N most-changed tasks (recharts, snapshot history)
  • Heuristic risk scoring (0–3) per task snapshot — colored dot in Sprint tab (yellow / orange / red pulsing)
  • Friday weekly digest: Mastra-composed email (shipped / blocked / slipped / risks / Monday agenda) via Mailtrap
  • Cross-sprint trends in Insights: velocity, slip rate, and blocker-day charts over the last 4 / 6 / 12 sprints with a built-in window selector — zero new infra, derived from existing snapshot history
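As an illustration of the [area] prefix parsing and the area-workload view, the sketch below extracts a bracketed prefix from a task name and counts tasks per area. The regular expression, the lowercase normalisation, the "unassigned" bucket, and the helper names are all assumptions rather than the project's implementation.

```typescript
interface ParsedTaskName {
  area: string | null;
  title: string;
}

// Matches a leading "[area]" prefix followed by the rest of the task name.
const AREA_PREFIX = /^\s*\[([^\]]+)\]\s*(.*)$/;

function parseAreaPrefix(name: string): ParsedTaskName {
  const match = AREA_PREFIX.exec(name);
  if (!match) return { area: null, title: name.trim() };
  // Assumption: areas are compared case-insensitively, so normalise to lowercase.
  return { area: match[1].trim().toLowerCase(), title: match[2].trim() };
}

// Count tasks per area; tasks without a prefix fall into a hypothetical "unassigned" bucket.
function areaWorkload(taskNames: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const name of taskNames) {
    const area = parseAreaPrefix(name).area ?? "unassigned";
    counts.set(area, (counts.get(area) ?? 0) + 1);
  }
  return counts;
}
```

Since the prefix lives in the task name itself, this kind of grouping needs no extra ClickUp fields and can be recomputed from any stored snapshot.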