Moderation

The Moderation API gives you access to OpenAI's harmful text classifier and some of our own custom moderation functions:

  • Checking whether text is harmful and which category of harm it falls into
  • Analyzing sentiment
  • Redacting personal information

Config

Default config files are provided for Moderation endpoints at config/prompts/moderation/*.

For each endpoint you can define:

  • The model and fallback models to use, eg gpt-4-turbo
  • Options to pass to OpenAI

Most moderation requests are quite simple so don’t need a lot of configuration.

Harmful text

Given some input text, outputs whether the model classifies it as potentially harmful in each of several categories.
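
As a rough illustration of what "harmful across several categories" looks like in practice, the sketch below summarises a single result in the shape OpenAI's moderation endpoint returns (a flagged boolean plus per-category booleans). The helper name summarise_result is hypothetical, and the output of this Moderation API may be shaped differently, so treat it as an assumption rather than the endpoint's actual format.

```python
def summarise_result(result: dict) -> dict:
    """Summarise one OpenAI-style moderation result.

    Assumes `result` has a boolean "flagged" key and a "categories"
    mapping of category name -> bool (hypothetical helper).
    """
    # Collect the names of every category marked True, in a stable order.
    categories = sorted(
        name for name, hit in result.get("categories", {}).items() if hit
    )
    return {"harmful": bool(result.get("flagged", False)), "categories": categories}


example = {
    "flagged": True,
    "categories": {"harassment": True, "hate": False, "violence": True},
}
summary = summarise_result(example)
```

Here summary reports the text as harmful and lists the harassment and violence categories.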

Chunks

OpenAI recommends that you split long text into chunks for moderation. This is handled automatically for you, and input is chunked into sensible snippets.

If the results are not what you expect, you can tweak the chunk_size parameter in the moderation/harmful.yml file.
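
To give a feel for how chunk_size could affect the snippets, here is a minimal sketch of one possible chunking strategy: pack whole sentences into chunks of at most chunk_size characters. This is not necessarily how the endpoint chunks text. The function name chunk_text, the default of 2000, and the choice to measure chunk_size in characters are all assumptions for illustration.

```python
import re


def chunk_text(text: str, chunk_size: int = 2000) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    preferring to break at sentence boundaries (hypothetical helper)."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than chunk_size is cut at hard boundaries.
        while len(sentence) > chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:chunk_size])
            sentence = sentence[chunk_size:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= chunk_size:
            # The sentence still fits, so keep growing the current chunk.
            current = candidate
        else:
            # Close the current chunk and start a new one with this sentence.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


chunks = chunk_text("One. Two. Three.", chunk_size=10)
```

A smaller chunk_size yields more, shorter snippets; a larger one keeps more context together in each snippet.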

Redaction

Given some input text, checks whether any Personally Identifiable Information (PII) is present and redacts it.

By default the following PII will be replaced by a placeholder naming the type of item, such as [phone_redacted]:

  • Phone numbers
  • Email addresses
  • Physical addresses

For example:

Hey everyone, I'm selling my xbox, call me at 01789284776
if you're interested. It's available for collection at
21, Arden Close, Wilmcote or email me at hi@example.com.

Becomes:

Hey everyone, I'm selling my xbox, call me at [phone_redacted]
if you're interested. It's available for collection at
[address_redacted] or email me at [email_redacted].

The endpoint will also output the details of the items that have been redacted.
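
The idea of replacing each item with a typed placeholder and reporting what was removed can be sketched with simple regular expressions. This is only an illustration: the patterns, the redact function, and the shape of the returned details are assumptions, and the sketch does not attempt physical addresses, which are much harder to match with patterns alone.

```python
import re

# Deliberately simple patterns for illustration; real PII detection needs more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def redact(text: str) -> tuple[str, list[dict]]:
    """Replace emails and phone numbers with typed placeholders and
    return the redacted text plus details of each item (hypothetical helper)."""
    found: list[dict] = []

    def replacer(kind: str):
        def _sub(match: re.Match) -> str:
            # Record what was removed so the caller can inspect it.
            found.append({"type": kind, "value": match.group(0)})
            return f"[{kind}_redacted]"
        return _sub

    # Emails are handled first, then phone numbers.
    text = EMAIL_RE.sub(replacer("email"), text)
    text = PHONE_RE.sub(replacer("phone"), text)
    return text, found


redacted, details = redact("Call me at 01789284776 or email me at hi@example.com.")
```

In this sketch redacted holds the text with [phone_redacted] and [email_redacted] in place of the originals, and details lists each removed value with its type.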

Sentiment

Given some text, outputs its sentiment: positive, negative, or neutral.