Askly | Automata
automata@latam: ~/projects/askly
PRODUCT · slug: askly · 2026 · 4 min read · 826 words

> Askly

// Session-isolated RAG chatbot: upload PDF/DOCX/XLSX, ask questions, get streamed answers grounded in your documents — pgvector + HuggingFace embeddings + MiniMax LLM.

Next.js 16 · React 19 · Supabase · pgvector · HuggingFace · MiniMax M2 · Vercel AI SDK
▸ role
Full-stack · RAG
▸ team
Solo
▸ status
online
// section 01 · discovery

$ cat ./discovery.md

▸ overview

Upload a PDF, Word doc or spreadsheet and chat with it. Frontend is a three-panel Next.js app: chat history, messages + input, document library. Uploads are parsed (unpdf / mammoth / xlsx), chunked (1000 chars, 200 overlap) and embedded via HuggingFace Inference (384-dim). Embeddings live in Supabase with the pgvector extension and a HNSW index. On a question, the query is embedded, matched against the HNSW index, the top chunks are stitched into a MiniMax M2 prompt, and the response streams back via the Vercel AI SDK. Sessions are browser-isolated (localStorage UUID, x-session-id header) — no accounts.
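The upload path described above (parse, chunk, embed, store) can be sketched as one function with its dependencies injected. This is a minimal sketch, not Askly's code: `IngestDeps`, `UploadedFile`, `ingestDocument` and the column names `document_name` and `chunk_index` are hypothetical stand-ins for the unpdf/mammoth/xlsx parsers, the LangChain splitter, HuggingFace Inference and the Supabase client.

```typescript
// Hypothetical stand-ins for the real parsers, splitter, embedding API and Supabase client.
interface UploadedFile {
  name: string;
  bytes: Uint8Array;
}

interface ChunkRow {
  session_id: string;    // from the x-session-id header
  document_name: string; // hypothetical column name
  chunk_index: number;   // hypothetical column name
  content: string;
  embedding: number[];   // 384 floats in the real app
}

interface IngestDeps {
  parse: (file: UploadedFile) => Promise<string>;
  chunk: (text: string) => Promise<string[]>;
  embed: (texts: string[]) => Promise<number[][]>;
  insertChunks: (rows: ChunkRow[]) => Promise<void>;
}

// Runs the upload path in order and returns how many chunks were stored.
async function ingestDocument(
  deps: IngestDeps,
  sessionId: string,
  file: UploadedFile,
): Promise<number> {
  const text = await deps.parse(file);         // 1. extract plain text
  const chunks = await deps.chunk(text);       // 2. split into overlapping chunks
  const embeddings = await deps.embed(chunks); // 3. one vector per chunk
  if (embeddings.length !== chunks.length) {
    throw new Error("embedding count does not match chunk count");
  }
  // 4. tag every row with the session id so later queries can filter on it
  const rows: ChunkRow[] = chunks.map((content, i) => ({
    session_id: sessionId,
    document_name: file.name,
    chunk_index: i,
    content,
    embedding: embeddings[i],
  }));
  await deps.insertChunks(rows);
  return rows.length;
}
```

Injecting the dependencies keeps the ordering testable without network calls, which matches the mocked Supabase + HuggingFace integration tests mentioned in the testing section.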

▸ problem

Keyword search over uploaded documents fails as soon as the question wording differs from the text. The proper fix is semantic search, but that usually means a managed vector DB (Pinecone, Weaviate). pgvector inside Supabase is good enough for the use case and removes one vendor.
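Semantic search compares meaning rather than wording: the question and each chunk become vectors, and closeness is measured by the angle between them, so "when is it due?" can match a chunk that says "deadline". The sketch below computes cosine similarity, assuming that `match_document_chunks` ranks by cosine (pgvector's `<=>` operator returns cosine distance, so similarity is `1 - distance`). The function names and the strict `>` comparison are illustrative.

```typescript
// Cosine similarity between two embeddings of equal length, in [-1, 1].
// Assumption: the match RPC filters on this value (1 - (a <=> b) in pgvector).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// A chunk is kept when its similarity to the query clears the threshold.
function passesThreshold(query: number[], chunk: number[], threshold: number): boolean {
  return cosineSimilarity(query, chunk) > threshold;
}
```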

▸ audience

Developers, researchers and students who want to chat with their own documents privately, without the materials leaving an account they own.

// section 03 · architecture

$ cat ./architecture.md

// section 03b · sequences

$ cat ./sequences.md

// runtime flows — who talks to whom, in what order

// From a typed question to a streamed answer that cites the document chunks it used.

RAG query — question to grounded answer · flow-01
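The flow named above, from a typed question to a streamed answer, can be written as an ordered sketch: embed the question, run the session-scoped vector match, stitch the chunks together with their document names, then forward the model's tokens as they arrive. `QueryDeps`, `MatchedChunk`, `answerQuestion` and the prompt template are hypothetical; in the app these steps are backed by HuggingFace, the `match_document_chunks` RPC, MiniMax M2 and the Vercel AI SDK.

```typescript
// Hypothetical shapes standing in for the real embedding, search and LLM clients.
interface MatchedChunk {
  document_name: string;
  content: string;
  similarity: number;
}

interface QueryDeps {
  embedQuery: (question: string) => Promise<number[]>;
  matchChunks: (embedding: number[], sessionId: string) => Promise<MatchedChunk[]>;
  streamAnswer: (prompt: string) => AsyncIterable<string>;
}

async function* answerQuestion(
  deps: QueryDeps,
  sessionId: string,
  question: string,
): AsyncGenerator<string> {
  const embedding = await deps.embedQuery(question);           // 1. embed the question
  const chunks = await deps.matchChunks(embedding, sessionId); // 2. vector search, session-scoped
  const context = chunks
    .map((c) => `[${c.document_name}] ${c.content}`)           // 3. label chunks with their source
    .join("\n\n");
  const prompt = `Context:\n${context}\n\nQuestion: ${question}`;
  yield* deps.streamAnswer(prompt);                            // 4. pass tokens through as they stream
}
```

Keeping the generator lazy means the first token reaches the UI as soon as the model emits it, instead of after the whole answer is assembled.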
// section 04 · infrastructure

$ cat ./infrastructure.md

▸ services
provider: Supabase (Postgres + pgvector) + HuggingFace Inference + MiniMax API
  • Next.js 16 App Router on Vercel
  • Supabase Postgres with pgvector extension
  • HNSW index on document_chunks.embedding
  • HuggingFace Inference API (384-dim embeddings)
  • MiniMax M2 (LLM, streaming responses)
  • Vercel AI SDK for the streaming UI
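Because the HNSW index sits on a fixed-dimension vector column and the HuggingFace model emits 384-dimensional vectors, a vector of the wrong size (or one containing NaN) only surfaces as a Postgres error at insert time. A cheap guard before writing catches it earlier. This is a minimal sketch assuming the column is declared as `vector(384)`; `EMBEDDING_DIMS`, `assertEmbeddingDims` and `toPgVectorLiteral` are hypothetical helpers, and `[x,y,z]` is pgvector's text input format.

```typescript
// Assumption: document_chunks.embedding is vector(384).
const EMBEDDING_DIMS = 384;

// Rejects vectors that pgvector would refuse: wrong length, NaN or Infinity.
function assertEmbeddingDims(embedding: number[], dims = EMBEDDING_DIMS): number[] {
  if (embedding.length !== dims) {
    throw new Error(`expected ${dims} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("embedding contains NaN or Infinity");
  }
  return embedding;
}

// Serializes a vector in pgvector's text form, e.g. "[0.5,-1,2]".
function toPgVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}
```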
// section 05 · implementation

$ cat ./implementation.md

▸ frontend
  • Next.js 16
  • React 19
  • Tailwind CSS 4
  • Vercel AI SDK (useChat)
  • react-markdown + GFM
▸ backend
  • Next.js API routes (6 handlers)
  • Supabase JS client (service role for admin writes)
  • LangChain RecursiveCharacterTextSplitter
  • unpdf / mammoth / xlsx parsers
  • Vercel AI SDK for streaming
▸ data
  • Postgres 16 (Supabase managed)
  • pgvector with HNSW index
  • 7 SQL migrations (initial schema, embedding dims, doc names in match results, index tuning, sessions/chats)
▸ devops
  • Vercel hosting
  • NODE_OPTIONS=--max-old-space-size=4096 for large-file uploads
// section 06 · technical challenges

$ cat ./challenges/*.md

// 3 technical problems solved

challenge-01.md · rag · chunking · embeddings
▸ problem

Chunking PDFs without breaking sentences and without exhausting the embedding rate limit.

constraint: HuggingFace Inference rate-limits aggressively, and a 200-page PDF can produce 800+ chunks. Every retried call also counts against the limit. Chunks that are too small lose semantic coherence; chunks that are too large overflow the LLM context window when stitched together.
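The constraint is easier to see with numbers. With chunkSize 1000 and chunkOverlap 200, each new window advances 800 characters, so a plain sliding window over N characters yields about ceil((N - 200) / 800) chunks. RecursiveCharacterTextSplitter only approximates this because it prefers to cut at paragraph, line and word boundaries, so treat the result as a rough estimate; both helper names are hypothetical.

```typescript
// Rough chunk count for a fixed sliding window (approximates the recursive splitter).
function estimateChunkCount(totalChars: number, chunkSize = 1000, chunkOverlap = 200): number {
  if (totalChars <= 0) return 0;
  if (totalChars <= chunkSize) return 1;
  const stride = chunkSize - chunkOverlap; // 800 characters with the defaults
  return Math.ceil((totalChars - chunkOverlap) / stride);
}

// Embedding requests needed when chunks are sent in batches.
function estimateEmbeddingCalls(chunkCount: number, batchSize: number): number {
  return Math.ceil(chunkCount / batchSize);
}
```

For a 200-page PDF at an assumed 3,000 characters per page (600,000 characters), `estimateChunkCount(600_000)` returns 750 and `estimateEmbeddingCalls(750, 16)` returns 47, so batching turns 750 rate-limited requests into 47. Denser pages push the chunk count past 800.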

▸ approach

RecursiveCharacterTextSplitter with chunkSize=1000 and chunkOverlap=200. Embeddings are batched (10-20 at a time) with exponential backoff on 429. Chunking and embedding happen on the API route side, never in the browser.

lib/chunking.ts typescript
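// Older LangChain releases export this class from "langchain/text_splitter".
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

export async function chunkText(cleanText: string) {
  // Tries paragraph breaks, then line breaks, then spaces, before cutting mid-word.
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,   // max characters per chunk
    chunkOverlap: 200, // up to 200 characters shared with the previous chunk
  });

  const docs = await splitter.createDocuments([cleanText]);

  // index = position of the chunk within the document
  return docs.map((doc, index) => ({
    content: doc.pageContent,
    index,
  }));
}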
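The approach above batches embedding calls (10 to 20 texts per request) and backs off exponentially on HTTP 429. A minimal sketch of that loop, assuming the embedding client rejects with an error object carrying a numeric `status` field; `embedInBatches`, `EmbedBatch`, `isRateLimited`, the default batch size of 16 and the delay values are illustrative choices, not the project's code.

```typescript
type EmbedBatch = (texts: string[]) => Promise<number[][]>;

// Assumption: the embedding client rejects with an error that has a numeric `status`.
function isRateLimited(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: unknown }).status === 429;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Embeds texts in fixed-size batches, retrying a batch with exponential backoff on 429.
async function embedInBatches(
  texts: string[],
  embedBatch: EmbedBatch,
  { batchSize = 16, maxRetries = 5, baseDelayMs = 1000 } = {},
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (let start = 0; start < texts.length; start += batchSize) {
    const batch = texts.slice(start, start + batchSize);
    for (let attempt = 0; ; attempt++) {
      try {
        const result = await embedBatch(batch);
        if (result.length !== batch.length) throw new Error("batch size mismatch");
        vectors.push(...result);
        break; // batch done, move on to the next one
      } catch (err) {
        // Only rate-limit errors are retried; anything else fails fast.
        if (!isRateLimited(err) || attempt >= maxRetries) throw err;
        // Waits about 1s, 2s, 4s, ... (with the default base) plus random jitter.
        await sleep(baseDelayMs * 2 ** attempt + Math.random() * baseDelayMs);
      }
    }
  }
  return vectors;
}
```

Retrying only on 429 matters because every retried call counts against the limit: other failures surface immediately instead of burning quota. The jitter keeps concurrent uploads from retrying in lockstep.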
challenge-02.md · rag · intent · search
▸ problem

A single fixed similarity threshold hurts RAG quality: general questions ("what is this about?") need many low-similarity chunks, while specific questions need a few high-similarity ones.

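constraint: The threshold and match count are arguments to the match_document_chunks RPC, so the HNSW search cannot adjust them for a given query without re-issuing the call. The split between "general" and "specific" also cannot be inferred from embedding similarity alone: similarity scores only exist after the search has already run with some threshold.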

▸ approach

A simple regex-based intent classifier inspects the raw query for Spanish and English patterns ("de qué trata", i.e. "what is it about", plus "summary" and "overview"). General intent → match_count = 30, threshold = 0.2. Specific intent → match_count = 5, threshold = 0.3. The matched document names are injected into the prompt so the model can cite sources.

lib/rag.ts typescript
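import type { SupabaseClient } from "@supabase/supabase-js";

// Patterns that signal a "general" question about the whole document.
const GENERAL_PATTERNS = [
  /\bde\s+qu[eé]\s+(va|trata|habla)\b/i, // ES: "de qué trata" (what is it about)
  /\bsummar(y|ize|ise)\b/i,              // EN: summary / summarize
  /\boverview\b/i,                       // EN: overview
];

export function classifyQueryIntent(query: string) {
  return GENERAL_PATTERNS.some((p) => p.test(query))
    ? { type: "general" as const }
    : { type: "specific" as const };
}

export async function retrieveChunks(
  supabaseAdmin: SupabaseClient,
  query: string,
  embedding: number[],
  sessionId: string,
) {
  const isGeneral = classifyQueryIntent(query).type === "general";

  // General: many loosely related chunks. Specific: a few close matches.
  const { data, error } = await supabaseAdmin.rpc("match_document_chunks", {
    query_embedding:   embedding,
    match_threshold:   isGeneral ? 0.2 : 0.3,
    match_count:       isGeneral ? 30  : 5,
    filter_session_id: sessionId,
  });
  if (error) throw error;
  return data;
}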
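The approach injects the matched document names into the prompt so the model can cite sources. One way to do that, sketched under assumptions: the RPC returns rows with `document_name`, `content` and `similarity` fields (a migration adds document names to match results, but these exact field names are hypothetical), and each distinct document gets a number the model can cite as [1], [2]. `MatchRow` and `buildContext` are illustrative names.

```typescript
// Assumed row shape returned by match_document_chunks; field names are hypothetical.
interface MatchRow {
  document_name: string;
  content: string;
  similarity: number;
}

// Gives each distinct document a number, tags every excerpt with it,
// and lists the sources so the model can cite them as [1], [2], ...
function buildContext(rows: MatchRow[]): string {
  const ids = new Map<string, number>();
  for (const row of rows) {
    if (!ids.has(row.document_name)) ids.set(row.document_name, ids.size + 1);
  }
  const sources = Array.from(ids, ([name, id]) => `[${id}] ${name}`).join("\n");
  const excerpts = rows
    .map((row) => `[${ids.get(row.document_name)}] ${row.content}`)
    .join("\n\n");
  return `Sources:\n${sources}\n\nExcerpts:\n${excerpts}`;
}
```

Numbering per document rather than per chunk keeps citations short when a general question pulls 30 chunks from the same file.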
challenge-03.md · session · privacy · tradeoffs
▸ problem

Isolating one user's documents from another without an auth system.

constraint: No accounts, no JWT. Anything that lives only in localStorage is gone if the user clears it. In theory the UUID in the header can be guessed or leaked (v4 collisions are astronomically unlikely), and the API has no proof of ownership beyond possession of the ID.

▸ approach

Honest tradeoff: a UUID v4 is generated on first load, persisted in localStorage and sent as x-session-id on every request. Every Postgres row carries session_id, and every query filters by it. Acceptable for an MVP; a real launch needs Supabase Row-Level Security with auth.

lib/client-session.ts typescript
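// Client side (browser only)
const SESSION_KEY = "askly-session-id";

export function getOrCreateSessionId(): string {
  // localStorage does not exist during server rendering.
  if (typeof window === "undefined") {
    throw new Error("getOrCreateSessionId() must run in the browser");
  }
  let id = localStorage.getItem(SESSION_KEY);
  if (!id) {
    id = crypto.randomUUID(); // random UUID v4
    localStorage.setItem(SESSION_KEY, id);
  }
  return id;
}

// Server side (every API route), shown here as a GET route handler
import { createClient } from "@supabase/supabase-js";

// Env var names are illustrative; the service-role key must stay server-only.
const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_ROLE_KEY!);

export async function GET(req: Request) {
  const sessionId = req.headers.get("x-session-id");
  if (!sessionId) return new Response("Unauthorized", { status: 401 });

  // Scope every query to the caller's session.
  const { data, error } = await supabase
    .from("chats")
    .select("*")
    .eq("session_id", sessionId);

  if (error) return new Response(error.message, { status: 500 });
  return Response.json(data);
}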
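On the client, the header has to be attached to every request. A small wrapper around fetch is one way to do that; `withSessionHeader` and `FetchLike` are hypothetical names, and in the app the `getSessionId` argument would be `getOrCreateSessionId` from the file above.

```typescript
type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

// Returns a fetch-compatible function that adds x-session-id to every request,
// keeping any headers the caller already set.
function withSessionHeader(fetchImpl: FetchLike, getSessionId: () => string): FetchLike {
  return (input, init = {}) => {
    const headers = new Headers(init.headers);
    headers.set("x-session-id", getSessionId());
    return fetchImpl(input, { ...init, headers });
  };
}
```

Usage: `const apiFetch = withSessionHeader(fetch, getOrCreateSessionId);` and then call `apiFetch("/api/...")` everywhere instead of `fetch`, so no route can be reached without the header.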
// section 07 · testing & ci

$ cat ./testing.md

▸ strategy

Component tests on the chat UI and upload flow. API integration tests mock Supabase + HuggingFace. RAG logic tested with sample PDFs (chunking, embedding, retrieval). No Playwright e2e yet.

▸ tools
  • Jest
  • @testing-library/react
// section 09 · results

$ cat ./results.md

  • 6 Next.js API route handlers
  • 15 React components
  • 7 SQL migrations
  • 7 document parsers (pdf/docx/xlsx/txt/csv/html/md)
  • 384 embedding dimensions (HuggingFace)
▸ outcomes

MVP running. Users upload documents, ask questions and get streamed answers grounded in the source. Intent-based retrieval (general vs specific) beats a single fixed threshold. Rate limits on the embedding API are the current ceiling for large-file uploads: batching and backoff are in place, but a paid tier would unblock real workloads.

// section 10 · lessons learned

$ cat ./lessons.md

// if I did it again

  • 01 /

    pgvector + HNSW is enough for an MVP RAG

    The first instinct was to grab Pinecone. pgvector inside Supabase delivered query latencies that were not noticeable in the UI, with zero extra vendor cost. The day a real workload arrives, swapping or retuning the index is one migration away, not a vendor change.

  • 02 /

    Query intent classification > tuning a single threshold

    Trying to find the one threshold value that works for both "what is this about?" and "what is the deadline on page 47?" is a dead end. Branching on intent (general vs specific) at the RPC call site beat every threshold I tried.
