Chat API
The Chat API allows you to easily create a chat interface using OpenAI’s chat models. It also supports:
- Querying and using embeddings for Retrieval Augmented Generation (RAG)
- Storing and sharing chat history
- Custom tools
- Vision, web crawling, and image generation
Config
A default config file is provided for Chat endpoints at `config/prompts/chat.yml`. This provides a decent base prompt for a helpful chatbot with access to some external tools.
There you can define:
- The model and fallback models used by chat, e.g. `GPT-4-Turbo`
- System prompt
- Default user prompt
- Context prompt (see below)
- Options to pass to OpenAI, e.g. `temperature` or `max_tokens`
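For illustration, here is a rough sketch of what those settings could look like. The key names below are illustrative assumptions, not the exact schema; check the shipped `config/prompts/chat.yml` for the real structure:

```yaml
# Illustrative sketch only – the real keys in config/prompts/chat.yml may differ
model: gpt-4-turbo
fallback_models:
  - gpt-3.5-turbo
system_prompt: You are a helpful assistant with access to external tools.
user_prompt: ''
options:
  temperature: 0.7
  max_tokens: 1024
```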
Providing context for RAG
The Chat APIs will automatically search your Pinecone embeddings if a context ID is provided as part of the request. The extra context from Pinecone will be added to the request and used by the AI.
If you want additional text added to the system prompt when embeddings are included, you can edit the `with_context` property in the `config/prompts/chat.yml` file.
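As a rough sketch (the actual default wording in the config will differ), the property might look something like this:

```yaml
# Hypothetical example – this text is appended to the system prompt
# whenever matching embeddings are found for the request
with_context: |
  Use the following context from the user's documents to answer their question.
```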
Tools & Function calling
All of our add-ons to Chat work via an OpenAI feature called function calling. Essentially, this is a mechanism where the AI can call arbitrary functions and understand their results.
In StartKit.AI by default this includes:
- Embeddings + RAG
- Crawling external URLs
- Creating images within Chat
- Parsing YouTube transcripts
You can implement your own tool functions easily by editing two files:
- `server/api/modules/chat/tools.js`
- `server/api/modules/chat/functions.js`
The `tools.js` file contains the definitions of the functions that you want the chat endpoint to know about. This is formatted as JSON Schema.
For example, here is a tool definition that will allow the AI to send an email:
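The block below is a sketch of what such a definition could look like, following OpenAI’s function-calling schema; the `send_email` name and its parameters are illustrative, and the exact shape used in `tools.js` may differ:

```js
// Sketch of a tool definition in tools.js (OpenAI function-calling schema).
// The tool name and parameters here are illustrative assumptions.
export const tools = [
  {
    type: 'function',
    function: {
      name: 'send_email',
      description: 'Send an email to a recipient on behalf of the user',
      parameters: {
        type: 'object',
        properties: {
          to: { type: 'string', description: 'Recipient email address' },
          subject: { type: 'string', description: 'Email subject line' },
          body: { type: 'string', description: 'Plain-text email body' }
        },
        required: ['to', 'subject', 'body']
      }
    }
  }
];
```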
You then need to write the function that matches the definition in `server/api/modules/chat/functions.js`:
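A hypothetical implementation might look like this; `sendEmailViaProvider` is a stand-in for whatever email client you actually use, not a real StartKit.AI helper:

```js
// functions.js – hypothetical handler matching the send_email definition above.
// Replace sendEmailViaProvider with your real email client; this stub just logs.
async function sendEmailViaProvider({ to, subject, body }) {
  console.log(`(stub) sending email to ${to}: ${subject}`);
}

export async function send_email({ to, subject, body }) {
  try {
    await sendEmailViaProvider({ to, subject, body });
    // A natural-language result tends to read well when passed back to the AI
    return { result: `Email successfully sent to ${to}.` };
  } catch (err) {
    // Natural-language errors also help the AI explain what went wrong
    return { result: `Failed to send email: ${err.message}` };
  }
}
```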
Your tool must return an object with a `result` property or the call will fail. The result can be in whatever format you like, and it will be passed directly to the AI.
Sometimes I find that if the result is in natural language (even if it’s an error) then the AI will translate it more nicely into a reply. However, it usually also deals well with a returned JSON object.
By default, the model will decide for itself if it should call a tool based on the message and context. If you want to force the chat to call a certain tool then you can set `force_tool_choice` in the `chat.yml` config file to the name of your function.
We also provide `force_tool_rag` as an option. If you set this to true then the model will always query your embeddings database on every chat request.
This can be useful because sometimes, when you ask Chat a question about the context of a document, it won’t bother to query the document if it thinks it already knows the answer. Setting `force_tool_rag` solves this by forcing it to always make the query.
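For example, in `chat.yml` (the placement of these keys within the file is an assumption):

```yaml
# Hypothetical excerpt – exact key placement in chat.yml may differ
force_tool_choice: send_email  # always call this named tool
force_tool_rag: true           # always query the embeddings database
```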
Streaming
We use event-streams a lot for streaming responses back to the client. This is what creates the feeling that the AI is typing its reply back to you in real-time, and means we can send very fast partial responses to the user. It’s essentially web-sockets, but simpler.
The response is returned as `text/event-stream`.
Examples
Here’s an example calling the Chat API for the first time and streaming a response:
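A minimal sketch, assuming a POST endpoint at `/api/chat` that returns `text/event-stream`; the real route, auth headers, and request body shape may differ:

```js
// Sketch: call the chat endpoint and log streamed chunks as they arrive.
// The URL and body fields here are assumptions, not the exact StartKit.AI API.
const response = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message: 'Hello!' })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Each chunk contains one or more "data: ..." event-stream lines
  console.log(decoder.decode(value, { stream: true }));
}
```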
We provide a helper function for the frontend to make this process easier: `getStreamedResponse(url)`.
Encoding
Each chunk of content in the stream is UTF-8 text encoded as base64 (this means you won’t have problems with special characters and newlines).
We decode the stream in our demos like this:
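Something along these lines, using standard browser APIs, will do it (a sketch, not the exact demo code):

```js
// Decode a base64 chunk back into a UTF-8 string using standard browser APIs
function decodeChunk(base64) {
  const bytes = Uint8Array.from(atob(base64), (c) => c.charCodeAt(0));
  return new TextDecoder('utf-8').decode(bytes);
}
```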