Askly | Automata

// section 01 · discovery

$ cat ./discovery.md

▸ overview

Upload a PDF, Word doc or spreadsheet and chat with it. The frontend is a three-panel Next.js app: chat history, messages + input, document library. Uploads are parsed (unpdf / mammoth / xlsx), chunked (1000 chars, 200 overlap) and embedded via HuggingFace Inference (384-dim). Embeddings live in Supabase with the pgvector extension and an HNSW index. On a question, the query is embedded and matched against the HNSW index, the top chunks are stitched into a MiniMax M2 prompt, and the response streams back via the Vercel AI SDK. Sessions are browser-isolated (localStorage UUID, x-session-id header); there are no accounts.

▸ problem

Keyword search over uploaded documents fails as soon as the question wording differs from the text. The proper fix is semantic search, but that usually means a managed vector DB (Pinecone, Weaviate). pgvector inside Supabase is good enough for the use case and removes one vendor.

▸ audience

Developers, researchers and students who want to chat with their own documents privately, without the materials leaving an account they own.

// section 03 · architecture

$ cat ./architecture.md

c4 / container · c4-level-2

flowchart TD
    user(((User)))

    subgraph vercel["Vercel"]
        ui["Next.js 16 UI<br/><i>3-panel chat</i>"]
        api["API routes<br/><i>6 handlers</i>"]
    end

    subgraph supa["Supabase"]
        pg[(Postgres 16<br/>+ pgvector + HNSW)]
    end

    hf["HuggingFace Inference<br/><i>384-dim embeddings</i>"]
    mm["MiniMax M2<br/><i>streaming LLM</i>"]

    user -->|x-session-id| ui
    ui --> api
    api -->|embed query| hf
    api -->|match_document_chunks RPC| pg
    api -->|prompt + chunks| mm
    mm -->|stream| ui
    api -->|store chunks + embeddings| pg

    classDef person fill:#0D1117,stroke:#4ADE80,color:#E5E7EB
    classDef container fill:#0D1117,stroke:#06B6D4,color:#E5E7EB
    classDef store fill:#111827,stroke:#FBBF24,color:#E5E7EB
    classDef ext fill:#111827,stroke:#6B7280,color:#9CA3AF

    class user person
    class ui,api container
    class pg store
    class hf,mm ext

    style vercel fill:transparent,stroke:#4ADE80,stroke-dasharray:4 4,color:#E5E7EB
    style supa fill:transparent,stroke:#4ADE80,stroke-dasharray:4 4,color:#E5E7EB

// section 03b · sequences

$ cat ./sequences.md

// runtime flows — who talks to whom, in what order

// From a typed question to a streamed answer that cites the document chunks it used.

RAG query — question to grounded answer · flow-01

sequenceDiagram
    autonumber
    actor User
    participant UI as Next.js UI
    participant API as API /chat
    participant HF as HuggingFace
    participant PG as Supabase pgvector
    participant LLM as MiniMax M2

    User->>UI: types question
    UI->>API: POST /api/chat<br/>(query, x-session-id)
    API->>HF: embed(query)
    HF-->>API: 384-dim vector
    API->>API: classifyQueryIntent(query)
    API->>PG: rpc match_document_chunks<br/>(vector, threshold, session_id)
    PG-->>API: top N chunks + doc names
    API->>LLM: stream prompt (query + chunks)
    LLM-->>API: token stream
    API-->>UI: SSE stream
    UI-->>User: progressive answer + sources

// section 04 · infrastructure

$ cat ./infrastructure.md

▸ services

provider: Supabase (Postgres + pgvector) + HuggingFace Inference + MiniMax API

▸ Next.js 16 App Router on Vercel
▸ Supabase Postgres with pgvector extension
▸ HNSW index on document_chunks.embedding
▸ HuggingFace Inference API (384-dim embeddings)
▸ MiniMax M2 (LLM, streaming responses)
▸ Vercel AI SDK for the streaming UI

// section 05 · implementation

$ cat ./implementation.md

▸ frontend

· Next.js 16
· React 19
· Tailwind CSS 4
· Vercel AI SDK (useChat)
· react-markdown + GFM

▸ backend

· Next.js API routes (6 handlers)
· Supabase JS client (service role for admin writes)
· LangChain RecursiveCharacterTextSplitter
· unpdf / mammoth / xlsx parsers
· Vercel AI SDK for streaming

▸ data

· Postgres 16 (Supabase managed)
· pgvector with HNSW index
· 7 SQL migrations (initial schema, embedding dims, doc names in match results, index tuning, sessions/chats)

▸ devops

· Vercel hosting
· NODE_OPTIONS=--max-old-space-size=4096 for large-file uploads

// section 06 · technical challenges

$ cat ./challenges/*.md

// 3 technical problems solved

challenge-01.md · rag · chunking · embeddings

▸ problem

Chunking PDFs without breaking sentences and without exhausting the embedding rate limit.

constraint: HuggingFace Inference rate-limits aggressively, and a 200-page PDF can produce 800+ chunks, so every retried call counts against the limit. Chunks that are too small lose semantic coherence; chunks that are too large overflow the LLM context window once stitched together.
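
A rough estimate shows where a figure like 800+ chunks can come from. The sketch below assumes a plain sliding window (the recursive splitter breaks on separators, so real counts will differ) and a hypothetical average of 3,200 characters per page. `estimateChunkCount` is an illustrative name, not something from the Askly codebase.

```typescript
// Rough estimate of how many chunks a text yields with a fixed-size
// sliding window. Assumption: the real splitter breaks on separators,
// so actual counts will differ somewhat.
function estimateChunkCount(
  textLength: number,
  chunkSize = 1000,
  chunkOverlap = 200,
): number {
  if (textLength <= 0) return 0;
  if (textLength <= chunkSize) return 1;
  // Each chunk after the first covers (chunkSize - chunkOverlap) new characters.
  const stride = chunkSize - chunkOverlap;
  return Math.ceil((textLength - chunkOverlap) / stride);
}

// Assumption: ~3,200 characters per page (hypothetical average).
const pages = 200;
const charsPerPage = 3200;
console.log(estimateChunkCount(pages * charsPerPage)); // 800
```

With an overlap of 200, every chunk only advances 800 characters, so the overlap itself adds roughly 25% more embedding calls than a non-overlapping split would.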

▸ approach

RecursiveCharacterTextSplitter with chunkSize=1000 and chunkOverlap=200. Embeddings are batched (10-20 at a time) with exponential backoff on 429. Chunking and embedding happen on the API route side, never in the browser.

lib/chunking.ts typescript

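// Import path is an assumption; it depends on the installed LangChain version.
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// chunkText is an illustrative wrapper name so the excerpt runs on its own.
export async function chunkText(cleanText: string) {
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,   // max characters per chunk
    chunkOverlap: 200, // characters shared between neighbouring chunks
  });

  const docs = await splitter.createDocuments([cleanText]);

  // Keep only the text and its position in the document.
  return docs.map((doc, index) => ({
    content: doc.pageContent,
    index,
  }));
}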
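
The batching and backoff half of the approach is not shown in the excerpt above. A minimal sketch of how it could look follows, assuming the embedding call is passed in as a function and signals a rate limit through a `status` of 429 on the thrown error. All names, the batch size of 16 and the 500 ms base delay are assumptions for illustration, not the project's actual values.

```typescript
// Hypothetical sketch: names, batch size and delays are assumptions.
type EmbedBatch = (texts: string[]) => Promise<number[][]>;

// Split an array into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ...
function backoffDelayMs(attempt: number, baseDelayMs = 500): number {
  return baseDelayMs * Math.pow(2, attempt);
}

// Assumption: the embedding client throws an object carrying the HTTP status.
function isRateLimited(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { status?: number }).status === 429
  );
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function embedAllWithBackoff(
  chunks: string[],
  embedBatch: EmbedBatch,
  batchSize = 16,
  maxRetries = 5,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of toBatches(chunks, batchSize)) {
    for (let attempt = 0; ; attempt++) {
      try {
        vectors.push(...(await embedBatch(batch)));
        break; // batch succeeded, move on to the next one
      } catch (err) {
        // Retry only on rate limits, and only up to maxRetries times.
        if (!isRateLimited(err) || attempt >= maxRetries) throw err;
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  return vectors;
}
```

Batches run sequentially on purpose: firing them in parallel would hit the rate limit sooner and turn one 429 into many.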

challenge-02.md · rag · intent · search

▸ problem

A fixed similarity threshold gives bad RAG quality: general questions ("what is this about?") need lots of low-similarity chunks; specific questions need few high-similarity ones.

constraint: pgvector HNSW does not let you change the threshold per query without re-issuing the RPC. The split between "general" and "specific" cannot be inferred from embedding similarity alone — by then it is too late.
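
To make the two knobs concrete, here is a small in-memory sketch of what a threshold and a match count do to a ranked result list. It assumes cosine similarity (the write-up does not state the metric), and `cosineSimilarity` and `topMatches` are illustrative names. The real filtering happens inside the `match_document_chunks` RPC.

```typescript
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Chunk {
  content: string;
  embedding: number[];
}

// Keep chunks above the threshold, best first, capped at matchCount.
function topMatches(
  query: number[],
  chunks: Chunk[],
  threshold: number,
  matchCount: number,
) {
  return chunks
    .map((c) => ({
      content: c.content,
      similarity: cosineSimilarity(query, c.embedding),
    }))
    .filter((m) => m.similarity > threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, matchCount);
}

// Toy 2-dim vectors stand in for the 384-dim embeddings.
const query = [1, 0];
const chunks: Chunk[] = [
  { content: "exact", embedding: [1, 0] },     // similarity 1.00
  { content: "close", embedding: [1, 1] },     // similarity ~0.71
  { content: "loose", embedding: [1, 3] },     // similarity ~0.32
  { content: "faint", embedding: [1, 4] },     // similarity ~0.24
  { content: "unrelated", embedding: [0, 1] }, // similarity 0.00
];

console.log(topMatches(query, chunks, 0.3, 5).length);  // 3
console.log(topMatches(query, chunks, 0.2, 30).length); // 4
```

The "faint" chunk only appears at the lower threshold, which is the kind of loosely related context a "what is this about?" question benefits from.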

▸ approach

A simple regex-based intent classifier looks at the raw query (Spanish + English patterns: "de que trata", "summary", "overview"). General intent → match_count = 30, threshold = 0.2. Specific intent → match_count = 5, threshold = 0.3. The matched document names are injected into the prompt so the model can cite sources.

lib/rag.ts typescript

const GENERAL_PATTERNS = [
  /\bde\s+qu[eé]\s+(va|trata|habla)\b/i, // ES: "de que trata"
  /\bsummar(y|ize|ise)\b/i,                  // EN
  /\boverview\b/i,
];

function classifyQueryIntent(query: string) {
  return GENERAL_PATTERNS.some((p) => p.test(query))
    ? { type: "general" as const }
    : { type: "specific" as const };
}

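// Branch the retrieval parameters on the classified intent.
const isGeneral = classifyQueryIntent(query).type === "general";

const { data } = await supabaseAdmin.rpc("match_document_chunks", {
  query_embedding:    embedding,
  match_threshold:    isGeneral ? 0.2 : 0.3, // looser cut-off for broad questions
  match_count:        isGeneral ? 30  : 5,   // more context for broad questions
  filter_session_id:  sessionId,
});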
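
The last step of the approach, injecting document names so the model can cite sources, could look roughly like this. It is a sketch only: the `MatchedChunk` shape, the `buildPrompt` name and the instruction wording are assumptions, not the prompt Askly actually sends.

```typescript
// Hypothetical shape of one row returned by match_document_chunks.
interface MatchedChunk {
  content: string;
  documentName: string;
  similarity: number;
}

// Stitch the retrieved chunks into a prompt, numbering each one and
// labelling it with its document name so answers can reference sources.
function buildPrompt(query: string, chunks: MatchedChunk[]): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.documentName})\n${c.content}`)
    .join("\n\n");

  return [
    "Answer using only the context below. Cite sources by their [number].",
    "",
    "Context:",
    context || "(no matching chunks)",
    "",
    `Question: ${query}`,
  ].join("\n");
}

const prompt = buildPrompt("What is the deadline?", [
  { content: "Submissions close on 1 March.", documentName: "call.pdf", similarity: 0.62 },
  { content: "Late entries are not accepted.", documentName: "rules.docx", similarity: 0.41 },
]);
console.log(prompt);
```

Numbering the chunks gives the model a short handle to cite, and the same numbers can be mapped back to document names when the UI renders the sources.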

challenge-03.md · session · privacy · tradeoffs

▸ problem

Isolating one user's documents from another without an auth system.

constraint: No accounts, no JWT. Anything that lives only in localStorage is gone if the user clears it. The UUID in the header is the only credential: guessing or colliding with a v4 UUID is astronomically unlikely, but the API has no proof of ownership, so anyone who obtains the value can read that session's data.

▸ approach

Honest tradeoff: a UUID v4 is generated on first load, persisted in localStorage, and sent as x-session-id on every request. Every Postgres row carries session_id, every query filters by it. Acceptable for an MVP; a real launch needs Supabase Row-Level Security with auth.

lib/client-session.ts typescript

// Client side
export function getOrCreateSessionId(): string {
  let id = localStorage.getItem("askly-session-id");
  if (!id) {
    id = crypto.randomUUID();
    localStorage.setItem("askly-session-id", id);
  }
  return id;
}

// Server side (every API route)
const sessionId = req.headers.get("x-session-id");
if (!sessionId) return new Response("Unauthorized", { status: 401 });

const { data } = await supabase
  .from("chats")
  .select("*")
  .eq("session_id", sessionId);
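
One cheap hardening step, sketched here as an assumption rather than something the project does, is to reject headers that are not well-formed v4 UUIDs before they reach a query. `parseSessionId` is a hypothetical helper. It does not prove ownership, it only filters malformed input.

```typescript
// Matches RFC 4122 version-4 UUIDs, the format crypto.randomUUID() produces.
const UUID_V4 =
  /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

// Returns the normalised session id, or null if the header is missing
// or is not a well-formed v4 UUID.
function parseSessionId(header: string | null): string | null {
  if (header === null) return null;
  const value = header.trim().toLowerCase();
  return UUID_V4.test(value) ? value : null;
}

console.log(parseSessionId("3f2b8c1e-9d4a-4b6f-8a1c-2e5d7f9b0c3a")); // the same id
console.log(parseSessionId("not-a-uuid")); // null
```

In a route handler this would replace the bare `if (!sessionId)` check, returning 401 whenever `parseSessionId` yields null.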

// section 07 · testing & ci

$ cat ./testing.md

▸ strategy

Component tests on the chat UI and upload flow. API integration tests mock Supabase + HuggingFace. RAG logic tested with sample PDFs (chunking, embedding, retrieval). No Playwright e2e yet.

▸ tools

Jest · @testing-library/react

// section 09 · results

$ cat ./results.md

01 / 6 · Next.js API route handlers
02 / 15 · React components
03 / 7 · SQL migrations
04 / 7 · document parsers (pdf/docx/xlsx/txt/csv/html/md)
05 / 384 · embedding dimensions (HuggingFace)

▸ outcomes

MVP running. Users upload docs, ask questions, get streamed answers grounded in the source. Hybrid intent-based retrieval beats a fixed threshold. Rate limits on the embedding API are the current ceiling for large-file uploads — batching + backoff is in place but a paid tier would unblock real workloads.

// section 10 · lessons learned

$ cat ./lessons.md

// if I did it again

01 /
pgvector + HNSW is enough for an MVP RAG

The first instinct was to grab Pinecone. pgvector inside Supabase delivered query latencies low enough that the UI never noticed them, with zero extra vendor cost. The day a real workload arrives, the index swap is a single migration away.

02 /
Query intent classification > tuning a single threshold

Trying to find the one threshold value that works for both "what is this about?" and "what is the deadline on page 47?" is a dead end. Branching on intent (general vs specific) at the RPC call site beat every threshold I tried.
