AI / AGENTS

Repolens

What you're committing to.

Five AI agents inspect a GitHub repository in parallel. One verdict in under a minute.

Role

Solo · Product, Backend, Frontend, Infra

Timeline

14 days · May 2026

Stack

Python · FastAPI · LangGraph · MCP · Docker · Next.js · TypeScript · Tailwind · shadcn/ui · Railway · Vercel

Status

● Production live · active development


The problem.

When I'm building, I want to know what an industry-standard repo looks like before modeling my own work on it. When I'm reading code, I want a single screen that tells me whether this repo is worth adopting. Both reduce to the same question: structured assessment of an entire codebase, surfaced in seconds, not hours.

Off-the-shelf tools fragment this work. Snyk for CVEs, SonarQube for quality, Depfu for dependencies, manual reading for architecture and documentation. Five tools, five accounts, five tabs. The signal is buried in tooling overhead.

Repolens collapses this into one workflow. Paste a GitHub URL. Five LLM agents inspect the repo in parallel across documentation, architecture, maintenance, testing, and security. Result: a structured report in 30 seconds with evidence-cited findings and recommendations.

Five agents, one verdict.

The audit dimensions weren't a marketing decision. Documentation, architecture, maintenance, testing, and security cover what a senior engineer would actually inspect during code review: can I read it, is it structured, is it maintained, is it tested, is it safe.

Each dimension is an independent LangGraph node returning a Pydantic schema: score, severity, summary, findings, recommendations. Nodes run in parallel via LangGraph fan-out, then aggregate into a weighted overall verdict. Each agent uses with_structured_output so the LLM output is typed end-to-end, not parsed from prose.
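
A minimal sketch of the fan-out and aggregation shape described above. It is not the Repolens source: plain asyncio stands in for LangGraph fan-out, a dataclass stands in for the Pydantic schema, the agent body is a canned placeholder, and the weights are hypothetical because the real weighting is not stated here.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class DimensionResult:
    """Mirrors the fields named above; a dataclass stands in for the Pydantic schema."""
    dimension: str
    score: int          # 0-100
    severity: str       # e.g. "low", "medium", "high"
    summary: str
    findings: list[str] = field(default_factory=list)
    recommendations: list[str] = field(default_factory=list)


# Hypothetical weights, chosen only so the example sums to 1.0.
WEIGHTS = {
    "documentation": 0.15,
    "architecture": 0.20,
    "maintenance": 0.20,
    "testing": 0.20,
    "security": 0.25,
}


async def run_agent(dimension: str, repo_url: str) -> DimensionResult:
    """Placeholder for one LLM-backed agent; returns a canned result."""
    await asyncio.sleep(0)  # stands in for the model call
    return DimensionResult(
        dimension=dimension,
        score=70,
        severity="medium",
        summary=f"{dimension} looks acceptable for {repo_url}",
    )


def aggregate(results: list[DimensionResult]) -> float:
    """Weighted average of the dimension scores."""
    total_weight = sum(WEIGHTS[r.dimension] for r in results)
    return sum(r.score * WEIGHTS[r.dimension] for r in results) / total_weight


async def audit(repo_url: str) -> tuple[list[DimensionResult], float]:
    # Fan out: all agents start together; gather preserves input order.
    results = list(await asyncio.gather(*(run_agent(d, repo_url) for d in WEIGHTS)))
    return results, aggregate(results)
```

Calling asyncio.run(audit("github.com/owner/repo")) returns the five dimension results plus one weighted score. Dividing by the total weight keeps the verdict meaningful if a dimension is ever skipped.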

The security agent calls out to a custom MCP server. Two tools are exposed via JSON-RPC over stdio: lookup_cves(dep) queries the CVE database, and lookup_package(dep) queries the PyPI/npm registries. The MCP pattern matters here because it demonstrates Anthropic's late-2024 protocol in production code, not as a demo.
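
For readers unfamiliar with the wire format, here is a rough sketch of what one tool call looks like as JSON-RPC 2.0 over stdio. The tools/call method and the newline-delimited framing come from the MCP specification; the argument key "dep" is an assumption taken from the signatures above, and a real session would first perform the initialize handshake, which is omitted here.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps without indent emits no embedded newlines,
    # which the stdio transport requires.
    return json.dumps(message) + "\n"


def parse_response(line: str) -> dict:
    """Return the result payload, or raise if the server reported an error."""
    message = json.loads(line)
    if "error" in message:
        raise RuntimeError(message["error"].get("message", "unknown MCP error"))
    return message["result"]


# Example: the line the security agent's client would write to the server's stdin.
request_line = tool_call_request("lookup_cves", {"dep": "requests"})
```

In practice an MCP client library handles this framing; the sketch only shows how little is on the wire.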

Making agentic visible.

V1 shipped on Day 8. It worked. Users got their audit in 30 seconds, complete report at the end.

Something felt off in my own use. I opened the site, pasted a URL, clicked run. Blank screen for 30 seconds. Then everything appeared at once.

The whole product positioning was “agentic” — five agents working in parallel. None of that was visible. Users couldn't see the parallelism that was the differentiating insight. The UX was indistinguishable from a slow API call.

On Day 10 I refactored the orchestration: LangGraph .invoke() became .astream(), FastAPI returned a StreamingResponse in proper SSE format (text/event-stream, Cache-Control: no-cache, and X-Accel-Buffering: no so that proxies do not buffer the stream), and the frontend replaced fetch with EventSource. As each agent completes, the corresponding dimension row populates with its score and severity color.
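
The streaming shape can be sketched with the standard library alone. This is an illustration, not the production handler: the agents are fake timers, the event names "dimension" and "done" are hypothetical, and asyncio.as_completed stands in for LangGraph's .astream().

```python
import asyncio
import json
from typing import AsyncIterator

# The response headers named above.
SSE_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "X-Accel-Buffering": "no",  # asks nginx-style proxies not to buffer the stream
}


def format_sse(event: str, data: dict) -> str:
    """One SSE frame: event name, JSON payload, terminated by a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def fake_agent(dimension: str, delay: float) -> dict:
    """Stand-in for an agent; sleeps instead of calling a model."""
    await asyncio.sleep(delay)
    return {"dimension": dimension, "score": 70, "severity": "medium"}


async def audit_events() -> AsyncIterator[str]:
    """Yield a frame as each agent finishes, then a final 'done' frame."""
    tasks = [
        asyncio.create_task(fake_agent("documentation", 0.02)),
        asyncio.create_task(fake_agent("security", 0.0)),
    ]
    for finished in asyncio.as_completed(tasks):
        yield format_sse("dimension", await finished)
    yield format_sse("done", {})


async def collect() -> list[str]:
    """Drain the stream into a list, for inspection outside a web server."""
    return [frame async for frame in audit_events()]
```

In a FastAPI route, a generator like audit_events() would be passed to StreamingResponse with media_type="text/event-stream" and the extra headers. The explicit final frame gives the client an unambiguous signal to close the connection.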

One trap caught during design review: EventSource auto-reconnects on disconnect by default. On a transient network blip, the frontend would silently re-trigger a full 30-second audit on the backend. I disabled auto-retry and exposed a “Connection lost” state instead.

Same product, completely different feel. The kind of UX gap that doesn't surface in MVP testing — and is easy to ignore post-launch — except it's the whole reason “agentic” sells.

Docker shaped two architectural decisions worth flagging. First, the MCP server runs as a subprocess inside the backend container, isolated but co-located. The command=sys.executable portability fix surfaced because Docker's python3 symlink behaviour differs from macOS's. Second, the multi-stage Dockerfile keeps the production image lean (Python slim base, no build tools in the final layer) and makes Railway deploys reproducible. The same image runs locally, in CI, and in production; the same image will be the basis for V3's self-hosted Docker Compose stack for enterprise customers who need on-prem deployment.
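
The sys.executable point is easy to show in isolation. The sketch below spawns a child process with the interpreter that is already running instead of looking up "python3" on PATH, then exchanges one JSON line over stdio. The inline child script is a hypothetical stand-in for the MCP server, not its real code.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the MCP server: answers each request with its id.
CHILD_SOURCE = r'''
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    reply = {"jsonrpc": "2.0", "id": request["id"], "result": "ok"}
    print(json.dumps(reply), flush=True)
'''


def call_child(request: dict) -> dict:
    """Spawn the child with the current interpreter and exchange one JSON line."""
    proc = subprocess.Popen(
        # sys.executable is the absolute path of the running interpreter,
        # so the child uses the same Python inside and outside a container.
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    # communicate() writes the request, closes stdin, and waits for exit.
    out, _ = proc.communicate(json.dumps(request) + "\n", timeout=10)
    return json.loads(out.splitlines()[0])
```

A long-lived server would keep the pipes open rather than use communicate(), but the command choice is the same either way.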

Three channels, one product.

Day 13 I sketched a “built-in terminal” feature. Users could fix flagged issues right inside Repolens. WebContainer as runtime, mockups looked impressive.

By Day 14 morning I'd cut it. The realization was uncomfortable but obvious: users already have Cursor, Claude Code, and Codex. Building a worse version of their IDE inside Repolens reinvents the wheel. The right move was the opposite: make Repolens callable FROM their IDE.

I shipped Repolens as a Claude Skill instead. skills/repolens/SKILL.md is just markdown with YAML frontmatter, following Anthropic's Skill spec. A user installs it once, then typing audit github.com/owner/repo inside Cursor triggers Claude to call the Repolens API, parse the result, and present findings inline.

Three channels emerged: web UI for demo and discovery, REST API for programmatic users, Claude Skill for daily-driver use inside IDEs. Same product, three points of contact. The Skill is the most novel: in the AI-tooling era, distribution-as-instruction is a real shipping artifact. Code isn't the only deliverable; instructions to other AIs are.

What it actually catches.

I audited Repolens with Repolens.

V1 score: 67/100. Documentation: missing CONTRIBUTING.md, no Quick Start guide. Testing: 7.4% test ratio (2 test files for 30 source files), no CI configured. Security: 10 of 11 dependencies unpinned. Architecture and maintenance scored well; the gaps were exactly what I'd skipped in the build sprint.

I fixed everything it flagged. Pinned all dependencies to exact versions in requirements.txt. Added GitHub Actions CI with parallel backend imports + frontend lint/typecheck/build jobs. Wrote LICENSE, CONTRIBUTING.md, expanded README with Quick Start. Re-audited: 80+.
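
To make the "unpinned" finding concrete, here is one way such a check could be written. It is an illustrative sketch, not the security agent's actual logic, and the package names in the sample are placeholders.

```python
import re

# "name==1.2.3" with optional extras counts as pinned; ranges, bare names,
# and wildcard pins such as "==1.*" do not.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==[^=\s*]+$")


def unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    flagged = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options like -r
            continue
        if not PINNED.match(line):
            flagged.append(line)
    return flagged


SAMPLE = """\
alpha==1.0.0
beta>=2.0
gamma
# a comment line
delta[extra]==3.1.4  # trailing comment
epsilon==1.*
"""
```

Here unpinned(SAMPLE) flags beta, gamma, and epsilon. Exact pins are what make a rebuild reproducible, which is why a range specifier counts as a finding.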

The dogfooding loop closed. The tool worked on the tool's own repo. The findings were specific, actionable, and correct — not vague platitudes like “improve documentation” but cited gaps like “no CONTRIBUTING.md, no Quick Start guide, README lacks installation instructions for non-developers.”

What I optimized for, what I traded.

Optimized for — Time-to-signal

From URL paste to actionable findings in under a minute. Senior-engineer voice in the output — evidence-cited, opinionated, citing specific dimension scores rather than generic encouragement.

Traded — Ecosystem breadth

Repolens audits Python and JavaScript/TypeScript dependencies; Go, Rust, Java are V3. Depth on PyPI/npm was the trade for the breadth I didn't ship.

Traded — Real-time UI for memo and due diligence

The memo and due diligence endpoints are blocking (no SSE): they take 10-45 seconds, and the engineering cost of streaming didn't pencil out at this stage. The same SSE pattern can extend there in V3.

Did not yet build — Enterprise compliance layer

The current build is open-source and intended for individual / small-team use. Enterprise deployment requires the layer described in V3 below: data residency, model provider switching, audit trails, self-hosted Docker Compose. The MVP focuses on demonstrating the product thesis; compliance follows when there's a procurement conversation that requires it.

Optimized for — Distribution shape

Skill > standalone product. Repolens lives inside users' existing tools, not as a new destination.

What this actually solves.

Three workflows, today, that Repolens compresses from hours to under a minute:

Workflow 01 — The adoption decision

“Should we use LangChain or LlamaIndex for our RAG project?” Engineers spend 30-90 minutes reading two READMEs, scanning issue trackers, checking last commit dates, eyeballing test coverage, debating in Slack. Repolens compresses this: paste both URLs, get a structured verdict with strengths, concerns, next steps, and red flags in under a minute. The decision still belongs to the team, but evidence is structured and time-to-judgment collapses.

Workflow 02 — The supply chain review

Before production deploy, security needs to know which dependencies are abandoned, which licenses are commercially incompatible, which are maintained by a single person. Manually: click through each dependency on PyPI/npm, check last release dates, parse license fields. For 30 dependencies, 1-2 hours. Repolens: 45 seconds with structured per-dependency risk levels, license compatibility flags, and alternative suggestions where high-risk.

Workflow 03 — The portfolio check

Before submitting code to a manager or mentor, you want a final quality pass. Does the README hold up, is the architecture coherent, did I forget tests for the new module? Currently I push to GitHub and audit via Repolens; V3 will close the loop with local repo upload so the pre-push audit becomes part of the commit ritual.

The common thread: structured assessment of an entire codebase, in seconds. Not a chatbot, not a code search, not a CI step. A compressed senior-engineer review surfaced in a single screen.

V3: what comes next.

Personalized audit rubrics

Industry-standard scoring is the default, but enterprises have their own standards: required folder structure, naming conventions, mandatory files (SECURITY.md, ADR templates), specific dependency policies. V3 will let teams define a custom rubric (YAML config or web UI) and audit against that. Repolens becomes the layer that operationalizes “our team's quality standards” without humans manually reviewing every repo.
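
Since this feature is not built yet, the following is only a guess at what a rubric check might look like. The Rubric fields and the violation messages are hypothetical; a YAML config could be loaded into the same shape.

```python
from dataclasses import dataclass


@dataclass
class Rubric:
    """Hypothetical rubric shape covering mandatory files and folder structure."""
    required_files: list[str]
    required_dirs: list[str]


def check_rubric(rubric: Rubric, repo_paths: list[str]) -> list[str]:
    """Return human-readable violations for a list of repo-relative paths."""
    present = set(repo_paths)
    # Top-level directories are inferred from the first path segment.
    top_level_dirs = {p.split("/", 1)[0] for p in repo_paths if "/" in p}

    violations = []
    for name in rubric.required_files:
        if name not in present:
            violations.append(f"missing required file: {name}")
    for directory in rubric.required_dirs:
        if directory not in top_level_dirs:
            violations.append(f"missing required directory: {directory}/")
    return violations


team_rubric = Rubric(
    required_files=["SECURITY.md", "README.md"],
    required_dirs=["docs", "tests"],
)
repo = ["README.md", "src/app.py", "tests/test_app.py"]
```

With these inputs, check_rubric(team_rubric, repo) reports the missing SECURITY.md and the missing docs/ directory. Deterministic checks like these could run before any LLM call, leaving the agents for judgments that need reading.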

Enterprise compliance and data residency

Enterprise deployment requires guarantees the open-source build cannot make alone:

EU AI Act readiness. Under the AI Act, AI systems used for automated decision-making in commercial contexts require risk classification, documentation of model providers, and transparency about training data. V3 ships with a compliance pack: model card disclosures, decision rationale logs, and configurable human-in-the-loop checkpoints.
Data residency. Source code is sensitive corporate data. Under GDPR Article 44 (transfers to third countries), EU customers may be unable to send code to US-hosted LLM endpoints without additional safeguards. V3 supports EU-region Anthropic endpoints (Frankfurt) and configurable LLM provider switching for customers requiring EU-hosted inference (Azure OpenAI EU, AWS Bedrock EU).
No training on customer code. Default Anthropic API terms exclude API inputs from training; V3 makes this explicit in the deployment config and surfaces it to compliance teams as part of the audit trail.
Self-hosted deployment. For customers with classification levels above what SaaS can satisfy, Repolens ships as a Docker Compose stack: backend, frontend, MCP server, no external dependencies beyond the LLM endpoint (which can be on-prem).

Local repo audit

Pre-push quality check without GitHub push. Upload a folder or zip, run the same 5-dimensional audit on local files. Architectural shift: current pipeline is GitHub-API-driven, V3 adds local file traversal as a parallel input path.

Audit history and score trajectory

Track repos over time as users iterate. Score curves, “+8 points since last week,” regression alerts. The user behavior surfaced from my own dogfooding loop — I kept re-auditing Repolens after polish sessions to see the score move.
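
A small sketch of the delta and regression logic this implies. The record shape, the five-point regression threshold, and the dates in the example are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AuditRecord:
    day: date
    score: int  # overall score, 0-100


def trajectory(history: list[AuditRecord], regression_threshold: int = 5) -> dict:
    """Compare the two most recent audits and flag a regression if the drop is large."""
    ordered = sorted(history, key=lambda record: record.day)
    if len(ordered) < 2:
        return {"delta": 0, "label": "no prior audit", "regression": False}
    delta = ordered[-1].score - ordered[-2].score
    return {
        "delta": delta,
        "label": f"{delta:+d} points since last audit",
        "regression": delta <= -regression_threshold,
    }


history = [
    AuditRecord(date(2026, 5, 7), 72),
    AuditRecord(date(2026, 5, 14), 80),
]
```

For the sample history the label reads "+8 points since last audit". Sorting by date first means records can be appended in any order without changing the result.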

Multi-ecosystem dependencies

Go, Rust, Java registry clients with license and popularity heuristics matching the PyPI/npm depth.

Streaming for memo and due diligence

Same SSE pattern that transformed the audit UX, extended to the slower endpoints.

Live

repolens-audit.vercel.app

Code

github.com/Haichennn/repolens

API docs

repolens-production-61e0.up.railway.app/docs

Stack

Python · FastAPI · LangGraph · MCP · Pydantic · Docker · Next.js · TypeScript · Tailwind · shadcn/ui · Railway · Vercel

— Haichen Duan, May 2026
