
Chat API

The Chat API allows you to easily create a chat interface using OpenAI’s chat models. It supports:

  • Querying and using embeddings for Retrieval Augmented Generation (RAG)
  • Storing and sharing chat history
  • Custom tools
  • Vision, web crawling, image generation


A default config file is provided for Chat endpoints at config/prompts/chat.yml. This provides a decent base prompt for a helpful chatbot with access to some external tools.

There you can define:

  • The model and fallback models used by chat, e.g. GPT-4-Turbo
  • System prompt
  • Default user prompt
  • Context prompt (see below)
  • Options to pass to OpenAI, e.g. temperature or max_tokens

Providing context for RAG

The Chat APIs will automatically search your Pinecone embeddings if a context ID is provided as part of the request. The extra context from Pinecone will be added to the request and used by the AI.
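
As a rough illustration, a request that carries a context ID might be built like this. This is only a sketch: the field name contextId and the helper buildChatRequest are assumptions made for the example, so check the API reference for the real parameter name. The /api/chat URL and the text field are taken from the streaming example later on this page.

```javascript
// Hypothetical helper: builds the fetch options for a chat request.
// ASSUMPTION: the context ID is sent as `contextId`; the real field name may differ.
function buildChatRequest(text, contextId) {
  const body = { text };
  if (contextId) {
    // Only include the context ID when one was provided
    body.contextId = contextId;
  }
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Accept: 'text/event-stream'
    },
    body: JSON.stringify(body)
  };
}

// Usage (inside an async function):
// const response = await fetch('/api/chat', buildChatRequest('Summarise the document', 'my-context-id'));
```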

If you want additional text added to the system prompt whenever embeddings are included, edit the with_context property in the chat.yml config file.

Tools & Function calling

All of our add-ons to Chat work via an OpenAI feature called function calling. Essentially, this is a mechanism that lets the AI call arbitrary functions and understand their results.

In StartKit.AI by default this includes:

  • Embeddings + RAG
  • Crawling external URLs
  • Creating images within Chat
  • Parsing YouTube transcripts
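
To make the mechanism a little more concrete, here is a minimal sketch of how a tool call from the model could be routed to a function. It is not StartKit.AI's actual implementation. The helpers parseToolCall, toToolMessage and runToolCall are hypothetical names, and the registry argument stands in for an object of functions keyed by name. The tool call and "tool" message shapes follow OpenAI's Chat Completions format.

```javascript
// Extracts the function name and parsed arguments from an OpenAI tool call:
// { id, type: 'function', function: { name, arguments: '<JSON string>' } }
function parseToolCall(toolCall) {
  const { name, arguments: rawArgs } = toolCall.function;
  return { id: toolCall.id, name, args: JSON.parse(rawArgs || '{}') };
}

// Builds the message that reports a tool's output back to the model
function toToolMessage(id, result) {
  return {
    role: 'tool',
    tool_call_id: id,
    // Message content must be a string, so objects are serialised
    content: typeof result === 'string' ? result : JSON.stringify(result)
  };
}

// Looks up the requested function in a registry and runs it (hypothetical helper)
async function runToolCall(toolCall, registry) {
  const { id, name, args } = parseToolCall(toolCall);
  const fn = registry[name];
  if (!fn) {
    return toToolMessage(id, `Unknown function: ${name}`);
  }
  const { result } = await fn(args);
  return toToolMessage(id, result);
}
```

The returned message is appended to the conversation and sent back to the model, which then uses the result to write its reply.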

You can easily implement your own callable functions by editing two files:

  • server/services/ai/chat/tools.js
  • server/services/ai/chat/functions.js

The tools.js file contains the definitions of the functions that you want the chat endpoint to know about. These are formatted as JSON Schema.

For example, here is a tool definition that will allow the AI to send an email:

{
  "type": "function",
  "function": {
    "name": "sendEmail",
    "description": "Use to send an email with specific text to a provided email address. The email will be sent from",
    "parameters": {
      "type": "object",
      "properties": {
        "toEmail": {
          "type": "string",
          "title": "toEmail",
          "description": "The email address to send the email to"
        },
        "body": {
          "type": "string",
          "title": "body",
          "description": "The body of the email in plaintext"
        },
        "subject": {
          "type": "string",
          "title": "subject",
          "description": "The subject of the email"
        }
      },
      "required": ["toEmail", "body", "subject"]
    }
  }
}

You then need to write the function that matches the definition in server/services/ai/chat/functions.js:

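sendEmail: async function ({ toEmail, subject, body }) {
  // Parameter names must match the "properties" in the tool definition
  const { error } = await resend.emails.send({
    from: sendingAddress,
    to: toEmail,
    subject,
    text: body
  });
  let result = '';
  if (error) {
    result = `The email failed to send. The error was: ${error.message}`;
  } else {
    result = 'The email sent successfully';
  }
  // The result property is required and is passed back to the AI
  return {
    result,
    usage: {}
  };
}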

Your tool must return an object with the result property or the call will fail. The result can be in whatever format you like, and it will be passed directly to the AI.

I find that if the result is in natural language (even when there’s an error), the AI is able to translate it more nicely into a reply. However, it usually copes well if you return a JSON object too.
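
One way to get natural-language errors consistently is to wrap each tool so that thrown exceptions are turned into a sentence instead of failing the call. This is only a sketch of the idea. The helpers describeFailure and withNaturalLanguageErrors are hypothetical and not part of StartKit.AI.

```javascript
// Hypothetical helper: turns an error into a sentence the model can relay
function describeFailure(toolName, error) {
  const reason = error instanceof Error ? error.message : String(error);
  return `The ${toolName} tool failed. The error was: ${reason}`;
}

// Hypothetical wrapper: the wrapped tool always resolves with { result, usage },
// even when the original function throws
function withNaturalLanguageErrors(toolName, fn) {
  return async function (args) {
    try {
      return await fn(args);
    } catch (error) {
      return { result: describeFailure(toolName, error), usage: {} };
    }
  };
}

// Usage: wrap a tool before registering it, for example
// const safeSendEmail = withNaturalLanguageErrors('sendEmail', sendEmail);
```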

By default, the model will decide for itself if it should call a tool based on the message and context. If you want to force the chat to call a certain tool then you can set force_tool_choice in the chat.yml config file to the name of your function.

force_tool_choice: createImage
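
For reference, OpenAI's Chat Completions API exposes a tool_choice request option that either lets the model decide ('auto') or names a specific function. The sketch below shows how a setting like force_tool_choice could be translated into that option. The mapping is an assumption for illustration, and toToolChoice is a hypothetical helper rather than StartKit.AI code.

```javascript
// Hypothetical helper: converts a force_tool_choice value into OpenAI's tool_choice option.
// ASSUMPTION: an empty or missing value means "let the model decide".
function toToolChoice(forceToolChoice) {
  if (!forceToolChoice) {
    return 'auto';
  }
  // Naming a function forces the model to call it
  return { type: 'function', function: { name: forceToolChoice } };
}

// Usage:
// toToolChoice('createImage') returns { type: 'function', function: { name: 'createImage' } }
```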

We also provide force_tool_rag as an option. If you set this to true then the model will always query your embeddings database on every chat request.

force_tool_rag: true

This can be useful because sometimes, when you ask Chat a question about the context of a document, it won’t bother to query the document if it thinks it already knows the answer. Setting force_tool_rag solves this by forcing it to always make the query.


We use event streams heavily for streaming responses back to the client. This is what creates the feeling that the AI is typing its reply to you in real time, and it means we can send partial responses to the user very quickly. It’s essentially a simpler, one-way alternative to WebSockets.

The response is returned as text/event-stream.


Here’s an example calling the Chat API for the first time and streaming a response:

const response = await fetch('/api/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Accept: 'text/event-stream'
  },
  body: JSON.stringify({
    text: 'What is the prime directive?'
  })
});

for await (let message of parseMessages(response)) {
  const { event, data } = message;
  switch (event) {
    case 'content':
      // received AI response chunk 🎉
      break;
    case 'metadata':
    case 'result':
      // metadata contains any functions called as a JSON object
      // result contains the usage data as a JSON object
      console.log(`${event}:`, JSON.parse(decode(data)));
      break;
    case 'end':
      // the stream is finished
      break;
    default:
      console.warn('received unknown event');
  }
}

We provide a helper function for the frontend to make this process easier: getStreamedResponse(url).
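
The example above relies on a parseMessages function that isn’t shown on this page. If you want to read the stream yourself, a minimal sketch could look like the following. It assumes the standard server-sent events framing ("event:" and "data:" lines, with events separated by a blank line and "\n" line endings). It is not the bundled helper, and parseEventBlock is a hypothetical name.

```javascript
// Parses one event block, e.g. "event: content\ndata: aGk=", into { event, data }
function parseEventBlock(block) {
  let event = 'message'; // default event name in the SSE format
  const dataLines = [];
  for (const line of block.split('\n')) {
    if (line.startsWith('event:')) {
      event = line.slice('event:'.length).trim();
    } else if (line.startsWith('data:')) {
      dataLines.push(line.slice('data:'.length).trim());
    }
  }
  return { event, data: dataLines.join('\n') };
}

// Async generator that yields each event from a fetch Response as it arrives
async function* parseMessages(response) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder('utf-8');
  let buffer = '';
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const blocks = buffer.split('\n\n');
    buffer = blocks.pop(); // keep any incomplete trailing block for the next read
    for (const block of blocks) {
      if (block.trim()) yield parseEventBlock(block);
    }
  }
  if (buffer.trim()) yield parseEventBlock(buffer);
}
```

Buffering matters here because a network chunk can end in the middle of an event, so only complete blocks are parsed and the remainder is carried over.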


Each chunk of content in the stream is UTF-8 text encoded as base64 (this means you won’t have problems with special characters and newlines).

We decode the stream in our demos like this:

function decode(base64) {
  // atob gives a binary string; convert it to bytes before decoding as UTF-8
  const bytes = Uint8Array.from(atob(base64), (c) => c.charCodeAt(0));
  return new TextDecoder('utf-8').decode(bytes);
}
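
For completeness, here is a sketch of what the encoding side could look like on a Node.js server. This is an assumption about the wire format, based on the standard server-sent events framing, rather than the actual server code. encodeChunk and formatEvent are hypothetical names.

```javascript
// Encodes text as UTF-8 bytes, then as base64 (the inverse of the browser-side decode)
function encodeChunk(text) {
  return Buffer.from(text, 'utf-8').toString('base64');
}

// Formats one server-sent event; the blank line marks the end of the event.
// ASSUMPTION: events are framed as "event: <name>\ndata: <base64>\n\n".
function formatEvent(event, text) {
  return `event: ${event}\ndata: ${encodeChunk(text)}\n\n`;
}

// Usage: res.write(formatEvent('content', 'Hello\nworld'));
```

Because base64 output never contains newlines, multi-line text always fits on a single data line.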