Research

Perspective API is a collaborative research effort exploring AI as a tool for better discussions online

Perspective API is the product of a collaborative research effort by Jigsaw and Google’s Counter Abuse Technology team exploring machine learning as a tool for better discussions online. The team routinely publishes datasets, academic research, and open source code as part of their commitment to transparency and innovation in natural language processing and machine learning.

Public Datasets
Open Source Code
Research Contributions

The challenges of maintaining healthy conversations online are significant, and we know we cannot solve them alone. To enable academic and industry research in the field, we create public datasets whenever possible.

Jigsaw Unintended Bias in Toxicity Kaggle Competition

A public Kaggle competition, based on ~2 million comments from the Civil Comments platform, which shut down in 2017. This data is annotated for toxicity, toxicity sub-types, and mentions of identities, which enables evaluation of unintended bias with respect to identity mentions. See the Kaggle page as well as our academic paper Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification for detailed description of data source and annotation schema. This dataset is also available on TensorFlow Datasets.

Toxic Comment Classification Kaggle Competition

A public Kaggle competition, based on a crowdsourced dataset that includes 4 toxicity sub-types, and approximately 160k human labelled comments from Wikipedia Talk pages. The labelled annotations are based on asking 5000 crowd-workers to rate Wikipedia comments according to their toxicity. This dataset is also available on Figshare as the Wikipedia Human Annotations of Toxicity on Talk Pages.

Jigsaw Multilingual Toxic Comment Classification Kaggle Competition

A public Kaggle competition that challenges participants to use the data from the previous two Kaggle competitions to build a multilingual toxicity model.

Wikipedia Human Annotations of Personal Attacks on Talk Pages

100k Comments from Wikipedia each with 10 annotations by the 4000 annotators who contributed to the effort. Each comment annotation notes whether the annotator considers the comment to be a personal attack or not.

Wikipedia Machine Annotations of Talk Pages

Machine-labelled annotations for every English Wikipedia talk page comment from 2001 to 2015, approximately 95 million comments to support large scale data analysis.

Constructive Comments Corpus

A collection of 12,000 news comments that have been annotated for positive contributions to online conversations. This is a collaboration between Simon Fraser University and Jigsaw, and is soon to appear in a First Monday special issue on abusive language online.

Unhealthy Comments Corpus

A collection of 44,000 comments that have been annotated for a variety of subtle aspects of unhealthiness, including sarcasm, antagonism, and condescension. This dataset was a collaboration between the University of Oxford and Jigsaw and will be published at the Workshop on Online Abuse and Harms.

Toxicity in Context Dataset

A dataset derived from Unintended Bias Kaggle competition forms the basis for a context-aware dataset that has been annotated by raters who could see the previous comment as part of a study measuring the importance of context for moderation. This collaboration between Athens University of Economics and Business and Jigsaw appeared at ACL 2020.

Our open source repositories provide a range of examples using Perspective, from fully-fledged tools to experimental demos, as well as examples of tools we leverage to build our machine learning models.

Tools built using Perspective

Moderator

A moderation tool to support using machine learning models to assist a human review process (used by the New York Times).

An Authorship Experience

Code to build an authorship experience that gives feedback to people as they type. This is used in our public demo of perspective API, but the code repository includes many additional features and ways to create other authorship experiences.

Tune

An experimental Chrome extension that lets people customize how much toxicity they want to see in comments across the internet. Tune uses Perspective to let people set the “volume” of conversations on a number of popular platforms, including YouTube, Facebook, Twitter, Reddit, and Disqus. The extension is available for download in the Chrome Web Store.

Perspective Hacks Gallery

A collection of concepts and demos built using Perspective API.

Example code for calling Perspective

perspectiveapi-js-client

A simple JavaScript client library for calling the Perspective API.

perspectiveapi-simple-server

A simple Express based proxy server that can hold your API-key and calls the Perspective API.

perspectiveapi-proxy

An Express based simple proxy server that can be used to provide restricted access to your Perspective API cloud project.

perspectiveapi-appscript

Example code using Perspective API with Google Apps Script.

Model building tools

Measuring and Mitigating Unintended Bias

Our repository for tools to measure and mitigated unintended bias in our models.

WikiDetox

Collaborative work with Wikimedia to create a useful corpus of Talk Page conversations on Wikipedia.

Conversation AI Models

Example code to train machine learning models for text

The team behind Perspective API regularly publishes research in academic forums.

Is Your Toxicity My Toxicity? Exploring the Impact of Rater Identity on Toxicity Annotation

Demonstrates that rater identity plays a statistically significant role in how raters annotate toxicity for identity-related annotations, and compares models trained on annotations from several different identity-based rater pools.

CrowdWorkSheets: Accounting for Individual and Collective Identities Underlying Crowdsourced Dataset Annotation

Introduces a novel framework for dataset developers to facilitate transparent documentation of key decision points at various stages of the ML data pipeline: task formulation, selection of annotators, platform and infrastructure choices, dataset analysis and evaluation, and dataset release and maintenance.

Lost in Distillation: A Case Study in Toxicity Modeling

Demonstrates that models distilled from large language models often have hidden performance costs especially in terms of identity-based bias.

"You have to prove the threat is real": Understanding the needs of Female Journalists and Activists to Document and Report Online Harassment

Introduces a research framework to highlight the documentation and reporting needs of female journalists and activists undergoing significant harassment on social media platforms, and validates those needs by designing a prototype tool called Harassment Manager.

A New Generation of Perspective API: Efficient Multilingual Character-level Transformers

Presents the Charformer multilingual text classification model that is used in PerspectiveAPI and the techniques used to minimize bias and maximize the benefits of cross-lingual classification. This model shows across the board improvements, especially for emoji and code-switching data commonly used in user generated content.

From the Detection of Toxic Spans in Online Discussions to the Analysis of Toxic-to-Civil Transfer

Expands upon the work that resulted in the SemEval Toxic Spans evaluation from 2021 to present a range of techniques used to identify spans that are associated with comments receiving toxic ratings and present a method of suggesting alternative content that conveys the same ideas but in a civil fashion, when this is possible.

Whose Ground Truth? Accounting for Individual and Collective Identities Underlying Dataset Annotation

Surveys an array of literature on human computation, with a focus on ethical considerations around crowdsourcing, and lays out challenges associated with who the annotator is, how the annotators’ lived experiences can impact their annotations, and the relationship between the annotators and the crowdsourcing platforms, including putting forth a concrete set of recommendations and considerations for dataset developers at various stages of the ML data pipeline.

Toxicity Detection can be Sensitive to the Conversational Context

Constructs and releases a dataset of posts with two kinds of toxicity labels, depending on whether annotators considered the post with the previous one as additional context or without additional context, and based on this introduces context sensitivity estimation, a new task which aims to identify posts whose perceived toxicity changes of the context is also considered.

Measuring and Improving Model-Moderator Collaboration using Uncertainty Estimation

Introduces new metrics enabling the rigorous study of content moderation as a human-AI collaborative process, and demonstrates that state-of-the-art uncertainty models enable new collaborative review strategies improving the overall collaborative moderator-model system's performance.

A large-scale characterization of online incitements to harassment across platforms

Examines incitements and calls to harass posted by members of certain online communities as a lens through which to holistically measure and understand a broad range of harassment strategies, including developing a taxonomy to categorize the preferred approaches of coordinated attackers and providing suggestions for actions and future research that could be performed by researchers, platforms, authorities, and anti-harassment groups.

Semeval-2021 task 5: Toxic spans detection

Describes the Toxic Spans Detection task of SemEval-2021, which required participants to predict the spans of toxic posts that were responsible for the toxic label of the posts. Summarizes the results of the participants and their major strategies for this competition.

Civil rephrases of toxic texts with self-supervised transformers

Develops a new model, CAE-T5, that can help suggest rephrasings of toxic comments in a more civil manner, inspired by recent progress in unpaired sequence-to-sequence tasks.

Capturing Covertly Toxic Speech via Crowdsourcing

Studies the task of labeling covert or veiled toxicity in online conversations, including introducing a dataset categorizing different types of covert toxicity, and evaluating models on the task.

Six attributes of unhealthy conversation

Presents a new dataset of comments annotated for their impact on the overall health of a conversation, including annotating for a new typology of potentially unhealthy sub-attributes.

Toxicity detection: Does context really matter?

Finds that context can affect human judgments of toxicity, either amplifying or mitigating the perceived toxicity of posts, and that a significant subset of annotations can be flipped if annotators are not provided with context, but that context surprisingly does not appear to improve the performance of toxicity classifiers.

Classifying constructive comments

Introduces the Constructive Comments Corpus, a new dataset intended to help build new tools for online communities to improve the quality of their discussions, including a taxonomy of sub-characteristics of constructiveness. Together with new machine learning models for constructiveness, this paves the way for moderation tools focused on promoting comments that contribute to a discussion rather than only filtering out undesirable content.

Jigsaw@ AMI and HaSpeeDe2: Fine-Tuning a Pre-Trained Comment-Domain BERT Model

Describes our submissions for two of the EVALITA (Evaluation of NLP and Speech Tools for Italian) 2020 shared tasks, based in part on the technology that powers Perspective, and reviews the types of errors our system made in the shared tasks.

ConvAI at SemEval-2019 Task 6: Offensive Language Identification and Categorization with Perspective and BERT

Presents the application of two strong baseline systems for toxicity detection, and evaluates their performance in identifying and categorizing offensive language in social media.

Debiasing Embeddings for Reduced Gender Bias in Text Classification

Demonstrates how traditional techniques for debiasing word embeddings can actually increase model bias on downstream tasks and proposes novel debiasing methods to ameliorate the issue.

Model Cards for Model Reporting

Proposes a framework to encourage transparent reporting of the context, use-cases, and performance characteristics of machine learning models across domains.

Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification

Introduces a suite of threshold-agnostic metrics that provide a nuanced view of unintended bias in text classification, by exploring the various ways that a classifier’s score distribution can vary across designated groups.

Crowdsourcing Subjective Tasks: The Case Study of Understanding Toxicity in Online Discussions

Discusses open questions and research challenges toward the goal of effective crowdsourcing of online toxicity as well as presenting a survey of recent work that addresses these.

WikiDetox Visualization

Presents a novel data visualization and moderation tool for Wikipedia that is built on top of the Perspective API.

Conversations Gone Awry: Detecting Early Signs of Conversational Failure

Introduces the task of predicting whether a given conversation is on the verge of being derailed by the antisocial actions of one of its participants and demonstrates that a simple model using conversational and linguistic features can achieve performance close to that of humans for this task.

Measuring and Mitigating Unintended Bias in Text Classification

Develops methods for measuring the unintended bias in a text classifier according to terms that appear in the text, as well as approaches to help mitigate them. The limitations of these methods are expanded on in the follow up paper Limitations of Pinned AUC for Measuring Unintended Bias.

Correlating Self-Report and Trace Data Measures of Incivility: A Proof of Concept

Connects trace data and machine learning classifiers to self-reported survey information about user’s online behaviour demonstrating the correlation between the two.

WikiConv: A Corpus of the Complete Conversational History of a Large Online Collaborative Community

Presents an unprecedented view of the complete history of conversations between contributors of English Wikipedia by recording the intermediate states of conversations—including not only comments and replies, but also their modifications, deletions and restorations.

Ex Machina: Personal attacks seen at scale

Outlines how crowdsourcing and machine learning can be used to scale our understanding of online personal attacks and applies these methods to the challenge on Wikipedia.

Network Traffic Obfuscation and Automated Internet Censorship

Surveys approaches that use machine learning to obfuscate network traffic to circumvent censorship.

Looking to learn more? Visit our Developers site for more technical information.

Go to developers site