📚 Agent Knowledge

Search documents and scraped web pages using semantic similarity for RAG-powered AI workflows.

Overview

The Agent Knowledge node enables RAG (Retrieval-Augmented Generation) in your workflows. Add content to your agent's knowledge base — either by uploading files or by pointing it at a URL — then use this node to search it with semantic similarity. Feed the results into an AI Prompt node so your AI can answer questions grounded in your own data.

How It Works

  1. Add content — upload PDF, DOCX, TXT, or Markdown files, or use Add URL to ingest a single web page or crawl a whole website. Both sources flow into the same knowledge base.
  2. Automatic processing — content is chunked (with markdown-aware section splitting for web pages) and embedded using OpenAI text-embedding-3-small.
  3. Search with the node — the Agent Knowledge node converts your query into a vector embedding and performs a cosine similarity search against the stored chunks using pgvector.
  4. Feed results to AI — pass the search results to an AI Prompt node as context, enabling the AI to answer questions based on your documents.

[Figure: Knowledge Base settings showing uploaded documents and URL sources side by side]
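
To make the chunking in step 2 more concrete, here is a minimal sketch of markdown-aware section splitting. It is not the product's actual chunker: the function name `split_markdown_sections`, the `max_chars` limit, and the choice to split at heading lines are all assumptions made for illustration.

```python
import re


def split_markdown_sections(text: str, max_chars: int = 1000) -> list[str]:
    """Split markdown at headings, then cap each piece at max_chars.

    Hypothetical helper: the real chunk size and splitting rules are not documented here.
    """
    # Split just before every line that starts with 1-6 '#' characters and a space.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks: list[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        # Sections longer than max_chars are cut into fixed-size pieces.
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks


page = "# Pricing\nFree tier.\n## Pro\nPaid tier."
chunks = split_markdown_sections(page)
```

Splitting at headings keeps each chunk on a single topic, which tends to make the embedded vectors easier to match against a focused query.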

Configuration

  • Query — the search text to find relevant documents. Supports {{variables}} for dynamic queries (e.g. {{$json.message}} or {{trigger.body}}).
  • Top K — number of results to return (default: 5). Higher values return more context but use more tokens.
  • Minimum Score — similarity threshold from 0.0 to 1.0 (default: 0.0). Set higher to filter out less relevant results.
  • Output Variable — variable name to store the search results (default: knowledge_results).
  • Include Metadata — whether to include source file name and chunk metadata in results.
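
The interaction between Top K and Minimum Score can be sketched in plain Python. This is an illustration only: the node itself runs the search in pgvector, and the names `cosine_similarity`, `search`, and the toy two-dimensional embeddings are assumptions, not part of the product.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, chunks, top_k=5, min_score=0.0):
    """Score every chunk, drop those below min_score, return the best top_k."""
    scored = [
        {"content": c["content"], "score": cosine_similarity(query_vec, c["embedding"])}
        for c in chunks
    ]
    kept = [r for r in scored if r["score"] >= min_score]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:top_k]


# Toy data: real embeddings have far more dimensions than two.
chunks = [
    {"content": "Refund policy", "embedding": [1.0, 0.0]},
    {"content": "Office hours", "embedding": [0.0, 1.0]},
    {"content": "Refunds and hours", "embedding": [1.0, 1.0]},
]
hits = search([1.0, 0.0], chunks, top_k=2, min_score=0.5)
```

With the default Minimum Score of 0.0, only Top K limits the output; raising the threshold can return fewer than Top K results when few chunks are a close match.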

Output Format

Results are stored in the output variable as an object with a results array. Each result includes:

  • {{knowledge_results.results[0].content}} — the matched text chunk
  • {{knowledge_results.results[0].score}} — similarity score (0.0 to 1.0)
  • {{knowledge_results.results[0].source}} — source file name, or the page URL for web sources (when Include Metadata is enabled)
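
As a rough picture of that shape, the sketch below builds a sample results object and flattens it into a context string of the kind you might pass to an AI Prompt node. The sample values and the `build_context` helper are hypothetical; only the `results`, `content`, `score`, and `source` field names come from the list above.

```python
# Hypothetical sample of what the output variable might hold.
knowledge_results = {
    "results": [
        {
            "content": "Refunds are accepted within 30 days.",
            "score": 0.82,
            "source": "refund-policy.pdf",
        },
        # No "source" key here, as when metadata is not included.
        {"content": "Contact support by email.", "score": 0.64},
    ]
}


def build_context(knowledge_results: dict, include_source: bool = True) -> str:
    """Join matched chunks into one numbered block of text for a prompt."""
    parts = []
    for i, result in enumerate(knowledge_results.get("results", []), start=1):
        header = f"[{i}]"
        source = result.get("source")
        if include_source and source:
            header += f" ({source})"
        parts.append(f"{header} {result['content']}")
    return "\n\n".join(parts)


context = build_context(knowledge_results)
```

Numbering the chunks and keeping the source next to each one makes it easier to ask the AI to cite which passage an answer came from.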

Common Use Cases

  • Customer support chatbot — upload product docs, FAQs, and policies so the AI can answer customer questions accurately
  • Internal knowledge assistant — upload company handbooks, SOPs, and training materials for employee self-service
  • Document Q&A — upload contracts, reports, or research papers and ask questions about their content

Adding content from a URL

Click Add URL on the Knowledge Base tab to pull content directly from the web. There are two modes:

  • Single URL — ingest one page. Best when you know the exact URLs you want to add (a pricing page, a specific FAQ, a help article).
  • Whole Website — start from one URL and follow links across the site. Configure how deep the crawl goes, how many pages to ingest, and an optional URL pattern (e.g. */blog/*) to limit which pages are included. There's a hard cap of 500 pages per crawl to protect against runaway ingestion.

[Figure: Add URL modal in Single URL mode. Single URL: one page, optional auto refresh.]
[Figure: Add URL modal in Whole Website mode. Whole Website: depth, page cap, pattern filter, domain scope.]
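
To illustrate how a URL pattern and a page cap could narrow a crawl, here is a small sketch. It assumes the pattern behaves like a shell-style glob in which `*` matches any run of characters, which may differ from the crawler's real matching rules; `select_pages` and the example URLs are hypothetical.

```python
from fnmatch import fnmatchcase

MAX_PAGES_PER_CRAWL = 500  # hard cap stated in this guide


def select_pages(urls, pattern=None, max_pages=MAX_PAGES_PER_CRAWL):
    """Keep URLs matching the optional glob pattern, up to the page limit."""
    # A user-chosen limit can lower the cap but never raise it.
    limit = min(max_pages, MAX_PAGES_PER_CRAWL)
    selected = []
    for url in urls:
        if len(selected) >= limit:
            break
        if pattern and not fnmatchcase(url, pattern):
            continue
        selected.append(url)
    return selected


discovered = [
    "https://example.com/blog/launch",
    "https://example.com/pricing",
    "https://example.com/blog/roadmap",
]
blog_pages = select_pages(discovered, pattern="*/blog/*")
```

Under this glob assumption, `*/blog/*` keeps the two blog posts and skips the pricing page.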

Each ingested page becomes its own entry in the knowledge base, with the page URL stored as the source. You can delete a URL source at any time, and all pages it produced are removed with it.

Keeping content fresh

Web content changes. When adding a URL source, you can enable Auto refresh to re-scrape on a schedule (every 7, 14, or 30 days). The knowledge base re-fetches the page or re-crawls the site and replaces the previous content automatically.
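
The scheduling idea can be expressed as a simple date check. This is a sketch under assumptions, not the scheduler the product uses: `next_refresh`, `is_due`, and the timestamps are hypothetical, and only the 7, 14, and 30 day intervals come from the text above.

```python
from datetime import datetime, timedelta, timezone

ALLOWED_INTERVALS_DAYS = (7, 14, 30)  # intervals offered by Auto refresh


def next_refresh(last_fetched: datetime, interval_days: int) -> datetime:
    """Return when a source becomes due for its next re-scrape."""
    if interval_days not in ALLOWED_INTERVALS_DAYS:
        raise ValueError(f"interval must be one of {ALLOWED_INTERVALS_DAYS}")
    return last_fetched + timedelta(days=interval_days)


def is_due(last_fetched: datetime, interval_days: int, now: datetime) -> bool:
    """True once the refresh interval has fully elapsed."""
    return now >= next_refresh(last_fetched, interval_days)


last = datetime(2024, 1, 1, tzinfo=timezone.utc)
due_at = next_refresh(last, 7)
```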

To force an immediate refresh outside the scheduled interval, delete the URL source and add it again. A dedicated "Re-run now" control is on the roadmap.

Supported sources

  • PDF (.pdf)
  • Microsoft Word (.docx, .doc)
  • Plain text (.txt)
  • Markdown (.md)
  • Web pages — single URL or full-site crawl (up to 500 pages per source)

Storage Options

Uploaded files and scraped web pages are stored in Supabase Storage (default) or your own AWS S3 bucket. Configure S3 credentials in the agent settings to use your own storage.