NLPGuard


Figure: The NLPGuard framework.

With AI use expanding across industries, upcoming regulations such as the EU AI Act will require NLP models to avoid relying on sensitive attributes (race, gender, or sexual orientation) in decision-making. Yet NLP models remain black boxes, making it hard to verify whether they depend on such attributes. Traditional methods target demographic fairness in outcomes but do not eliminate the underlying bias.

To address this, Nokia Bell Labs introduces NLPGuard, a framework that reduces NLP models' dependence on sensitive attributes without sacrificing performance. NLPGuard works with existing models to identify and minimize reliance on protected terms in tasks like toxicity detection, sentiment analysis, and occupation classification.


How It Works

NLPGuard takes three inputs: an unlabeled dataset, an NLP classifier, and the classifier's original training data. From these it produces a modified training dataset that, when used to retrain the classifier, reduces its reliance on sensitive attributes. The framework comprises three main components (see the figure above):

  1. An Explainer that identifies which words are most influential in predictions,
  2. An Identifier that assesses if these words relate to protected attributes, and
  3. A Moderator that modifies the training data to retrain the model, reducing its use of these sensitive terms.
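The three stages above can be sketched in a few lines of Python. This is a minimal illustration under assumed simplifications, not the released NLPGuard implementation: the Explainer here uses leave-one-out word ablation as a stand-in for a real explainability method, the Identifier matches against a tiny hypothetical lexicon of protected terms, and the Moderator simply redacts flagged words from the training texts.

```python
# Minimal sketch of NLPGuard's three-stage pipeline.
# Function names, the lexicon, and the toy classifier are all
# illustrative assumptions, not the framework's actual API.

def explain(classifier, texts):
    """Explainer: rank words by influence on predictions,
    approximated here by leave-one-out score drops."""
    influence = {}
    for text in texts:
        words = text.split()
        base = classifier(text)
        for i, word in enumerate(words):
            ablated = " ".join(words[:i] + words[i + 1:])
            drop = abs(base - classifier(ablated))
            influence[word] = influence.get(word, 0.0) + drop
    return sorted(influence, key=influence.get, reverse=True)

# Toy stand-in for a curated lexicon of protected-attribute terms.
PROTECTED_LEXICON = {"gay", "muslim", "black", "woman"}

def identify(ranked_words):
    """Identifier: flag influential words that denote protected attributes."""
    return [w for w in ranked_words if w.lower() in PROTECTED_LEXICON]

def moderate(training_data, protected_words):
    """Moderator: redact protected terms from (text, label) pairs,
    yielding a mitigated dataset for retraining."""
    banned = {w.lower() for w in protected_words}
    return [
        (" ".join(w for w in text.split() if w.lower() not in banned), label)
        for text, label in training_data
    ]
```

A usage pass would chain the stages: run `explain` over the unlabeled corpus with the trained classifier, filter the most influential words through `identify`, then feed the flagged terms and the original training data to `moderate` before retraining. In practice the Explainer would be a proper attribution method and the Identifier would combine a lexicon with annotation, but the data flow is the same.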

Impact and Availability

NLPGuard empowers organizations to meet AI fairness and privacy regulations, including GDPR and U.S. anti-discrimination laws. Nokia Bell Labs has made NLPGuard's code openly accessible, inviting developers to integrate it into NLP systems and further its application across diverse AI fields.


Publications

  • NLPGuard: A Framework for Mitigating the Use of Protected Attributes by NLP Classifiers. CSCW 2024.