Moderation
The Moderation API lets you use OpenAI's harmful text API alongside some of our own custom moderation functions:
- Checking whether text is harmful and which harm category it falls into
- Analyzing sentiment
- Redacting personal information
Config
Default config files are provided for Moderation endpoints at config/prompts/moderation/*.
For each you can define:
- The model and fallback models used by chat, eg gpt-4-turbo
- Options to pass to OpenAI, eg size or quality
Most moderation requests are quite simple so don’t need a lot of configuration.
Harmful text
Given some input text, outputs whether the model classifies it as potentially harmful, and in which of several categories.
Chunks
OpenAI recommends that you split long text into chunks for moderation. This is handled automatically for you: input is chunked into sensible snippets.
If the results are not accurate enough, you can tweak the chunk_size parameter in the moderation/harmful.yml file.
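As a rough illustration of what chunking involves, here is a minimal Python sketch. It is not the library's implementation. It assumes chunk_size is measured in characters and that chunks should break at sentence boundaries where possible; the function name split_into_chunks and the default value are hypothetical.

```python
import re


def split_into_chunks(text: str, chunk_size: int = 2000) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Assumption: chunk_size counts characters. The real setting may use
    a different unit (for example tokens).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than chunk_size is cut at hard boundaries.
        while len(sentence) > chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:chunk_size])
            sentence = sentence[chunk_size:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

In a sketch like this, a smaller chunk_size gives the classifier shorter, more focused snippets at the cost of more requests, while a larger value keeps more context together in each request.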
Redaction
Given some input text, checks if any Personally Identifiable Information is present and redacts it.
By default the following PII will be replaced by a placeholder such as [phone_redacted]:
- Phone numbers
- Email addresses
- Physical addresses
For example:
Hey everyone, I'm selling my xbox, call me at 01789284776 if you're interested. It's available for collection at 21, Arden Close, Wilmcote or email me at hi@example.com.
Becomes:
Hey everyone, I'm selling my xbox, call me at [phone_redacted] if you're interested. It's available for collection at [address_redacted] or email me at [email_redacted].
The endpoint will also output the details of the items that have been redacted.
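To make the idea of redacting text and reporting what was removed concrete, here is a minimal Python sketch. It is an illustration only and does not reflect how the endpoint works internally. The regular expressions, the redact function, and the shape of the returned details are all assumptions. Physical addresses are left out because simple patterns cannot detect them reliably.

```python
import re

# Illustrative patterns only: real phone and email formats vary widely.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}


def redact(text: str) -> tuple[str, list[dict[str, str]]]:
    """Replace matches with typed placeholders and collect what was removed."""
    found: list[dict[str, str]] = []
    for kind, pattern in PATTERNS.items():

        def replace(match: re.Match, kind: str = kind) -> str:
            # Record the original value before swapping in the placeholder.
            found.append({"type": kind, "value": match.group(0)})
            return f"[{kind}_redacted]"

        text = pattern.sub(replace, text)
    return text, found


redacted, details = redact(
    "Call me at 01789284776 if you're interested, or email me at hi@example.com."
)
print(redacted)
print(details)
```

Returning the removed values next to the redacted text mirrors the endpoint's behaviour of reporting the redacted items, although the actual response format may differ.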
Sentiment
Given some text, outputs its sentiment: positive, negative, or neutral.
