ModerationCategory
Represents categories for content moderation used to classify potentially harmful or inappropriate content. These categories help identify specific types of violations that content may fall under.
Inheritors
Types
Responses that are both verifiably false and likely to injure a living person’s reputation
Responses that contain factually incorrect information about electoral systems and processes, including the time, place, or manner of voting in civic elections
Represents the "Harassment" moderation category.
Represents the moderation category for harassment that includes threats.
Represents content categorized as hate speech or related material.
Represents the HATE_THREATENING moderation category.
Represents the moderation category for content that may involve illegal or illicit activities. This category is used to identify content that violates legal frameworks or ethical guidelines.
Represents content classified as both illicit and violent in nature.
Responses that may violate the intellectual property rights of any third party
Represents the moderation category for content associated with misconduct.
Responses that contain sensitive, nonpublic personal information that could undermine someone’s physical, digital, or financial security
Represents the moderation category for potential prompt attacks.
Represents the "SELF_HARM" moderation category. This category is used to identify content that pertains to self-harm or related behavior.
Represents the moderation category for instructions or content that encourages or promotes self-harm.
Represents content that explicitly indicates an intent of self-harm.
Represents content categorized as sexual in nature.
Represents content related to sexual material involving minors.
Responses that contain specialized financial, medical, or legal advice, or that indicate dangerous activities or objects are safe
Represents the category of content classified as violent behavior or actions.
Represents the VIOLENCE_GRAPHIC moderation category.
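The categories above form a closed set, which suggests handling them with an exhaustive `when`. The following is a minimal sketch of that idea, not the library's actual declaration: `SketchModerationCategory`, its `name` property, and `needsHumanReview` are hypothetical names introduced for illustration, and only three of the documented categories are modeled.

```kotlin
// Hypothetical stand-in for the documented type; the real class may be declared differently.
sealed class SketchModerationCategory(val name: String) {
    object HateThreatening : SketchModerationCategory("HATE_THREATENING")
    object SelfHarm : SketchModerationCategory("SELF_HARM")
    object ViolenceGraphic : SketchModerationCategory("VIOLENCE_GRAPHIC")
}

// Hypothetical policy (an assumption, not part of the documented API):
// decide which flagged categories are escalated to a human reviewer.
// Because the hierarchy is sealed, the compiler checks that every case is covered.
fun needsHumanReview(category: SketchModerationCategory): Boolean = when (category) {
    SketchModerationCategory.SelfHarm -> true
    SketchModerationCategory.HateThreatening -> true
    SketchModerationCategory.ViolenceGraphic -> false
}

fun main() {
    val flagged = listOf(
        SketchModerationCategory.ViolenceGraphic,
        SketchModerationCategory.SelfHarm,
    )
    // Keep only the categories that the sketch policy escalates.
    val escalated = flagged.filter(::needsHumanReview).map { it.name }
    println(escalated) // prints [SELF_HARM]
}
```

If a new category were added to the sealed hierarchy in this sketch, the `when` expression would fail to compile until the new case was handled, which is the main reason to prefer this shape over string comparisons.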