Fine-Tuning AI Models to Better Recognize Gender and Race in Stories

News Room · Published 23 April 2025

Authors:

(1) Evan Shieh, Young Data Scientists League ([email protected]);

(2) Faye-Marie Vassel, Stanford University;

(3) Cassidy Sugimoto, School of Public Policy, Georgia Institute of Technology;

(4) Thema Monroe-White, Schar School of Policy and Government & Department of Computer Science, George Mason University ([email protected]).

Table of Links

Abstract and 1 Introduction

1.1 Related Work and Contributions

2 Methods and Data Collection

2.1 Textual Identity Proxies and Socio-Psychological Harms

2.2 Modeling Gender, Sexual Orientation, and Race

3 Analysis

3.1 Harms of Omission

3.2 Harms of Subordination

3.3 Harms of Stereotyping

4 Discussion, Acknowledgements, and References

SUPPLEMENTAL MATERIALS

A OPERATIONALIZING POWER AND INTERSECTIONALITY

B EXTENDED TECHNICAL DETAILS

B.1 Modeling Gender and Sexual Orientation

B.2 Modeling Race

B.3 Automated Data Mining of Textual Cues

B.4 Representation Ratio

B.5 Subordination Ratio

B.6 Median Racialized Subordination Ratio

B.7 Extended Cues for Stereotype Analysis

B.8 Statistical Methods

C ADDITIONAL EXAMPLES

C.1 Most Common Names Generated by LM per Race

C.2 Additional Selected Examples of Full Synthetic Texts

D DATASHEET AND PUBLIC USE DISCLOSURES

D.1 Datasheet for Laissez-Faire Prompts Dataset

B.3 Automated Data Mining of Textual Cues

To measure harms of omission (see Supplement B.4), we collect 1,000 generations per language model per prompt, producing the total sample size needed for modeling “small-N” populations [35]. On the resulting dataset of 500K stories, it is intractable to hand-extract textual cues by reading each story individually. We therefore fine-tune a language model (gpt-3.5-turbo) to automatically extract gender references and names at high precision.
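
As a rough illustration, a fine-tuning job of this kind might be launched with the OpenAI Python SDK (v1.x) as sketched below. The training file name and its contents are hypothetical stand-ins for the hand-labeled examples described here, not artifacts from the paper.

```python
from openai import OpenAI

client = OpenAI()

# Upload hand-labeled extraction examples in OpenAI's chat-format JSONL.
# "extraction_train.jsonl" is an illustrative file name, not from the paper.
training_file = client.files.create(
    file=open("extraction_train.jsonl", "rb"),
    purpose="fine-tune",
)

# Launch a fine-tuning job on gpt-3.5-turbo, the base model named above.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id)  # poll until the job completes, then call the fine-tuned model ID
```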

First, we hand-label inferred gender (based on gender references) and names on an evaluation set of 4,600 story generations, uniformly down-sampled from all five models and balanced so that all three domains and both power conditions are equally represented. This gives us a sample dataset for estimating precision and recall statistics on all 500K stories with high confidence (95% CI half-width of 0.0063).
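
For intuition, the reported half-width is consistent with a standard normal-approximation confidence interval for a proportion; the sketch below assumes that reading (the paper does not spell out its exact derivation) and uses illustrative values.

```python
import math

def ci_half_width(p_hat: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation 95% CI half-width for an estimated proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Illustrative: with 4,600 hand-labeled stories and an estimated precision
# near 0.95, the half-width is about 0.0063, matching the figure above.
print(round(ci_half_width(0.95, 4600), 4))  # 0.0063
```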

Then, we use ChatGPT 3.5 (gpt-3.5-turbo) to perform automated labeling with the prompt templates shown in Table S7, chosen after iterating through candidate prompts and selecting for precision and recall. Based on the scenarios and power conditions of each specific story prompt (see Supplement A, Tables S3, S4, and S5), we adjust the “Character” placeholder variable(s) in the template.
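
In code, the labeling step might look like the sketch below. The template string is a hypothetical stand-in for the actual Table S7 prompts, and {character} mirrors the “Character” placeholder described above.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical stand-in for a Table S7 template; the real prompts are in the paper.
TEMPLATE = (
    "Story: {story}\n\n"
    "List every gender reference and every name used for {character}. "
    'Respond as JSON: {{"gender_references": [...], "names": [...]}}'
)

def label_story(story: str, character: str, model: str = "gpt-3.5-turbo") -> str:
    """Ask the (fine-tuned) model to extract cues for one character in one story."""
    prompt = TEMPLATE.format(story=story, character=character)
    response = client.chat.completions.create(
        model=model,  # in practice, the fine-tuned model ID
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic labeling
    )
    return response.choices[0].message.content
```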

For each labeling response we receive, we attempt to parse the returned JSON and programmatically post-process it to remove hallucinations (such as references or names that do not appear in the story text). We report the results of this initial process in Table S8a.
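
A minimal version of that post-processing step, assuming the JSON schema sketched above, might look like this; the verbatim-substring check is one simple way to filter cues that never appear in the story.

```python
import json

def postprocess(raw_label: str, story_text: str) -> dict:
    """Parse the model's JSON reply and drop hallucinated cues, i.e. any
    gender reference or name that never appears in the story text."""
    try:
        parsed = json.loads(raw_label)
    except json.JSONDecodeError:
        return {"gender_references": [], "names": []}  # unparseable reply
    lowered = story_text.lower()
    return {
        key: [cue for cue in parsed.get(key, []) if cue.lower() in lowered]
        for key in ("gender_references", "names")
    }
```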

Our results are in line with prior studies of co-reference resolution showing that automated systems underperform on minoritized identity groups [58]. For example, the pre-trained gpt-3.5-turbo model performs poorly on non-binary pronouns such as they/them, often failing to distinguish references to individual characters from references to groups.

To address such issues, we hand-label a further 150 stories (outside the evaluation dataset), focusing on cases the initial model struggled with, including non-binary pronouns in the Love domain. This boosts precision to above 98% for both gender references and names, as shown in Table S8b. Final recall reaches 97% for gender references and above 99% for names.

We note that fine-tuning a closed-source model such as ChatGPT has potential drawbacks, including a lack of visibility into changes to the underlying model. Additionally, at the time of this writing, OpenAI has not released detailed information about its fine-tuning algorithms. For future work, the choice of model need not be restricted to ChatGPT; open-source alternatives may work just as well.

Table S7: Prompts Used for Automated Labeling

Table S8: Co-reference Precision and Recall for Autolabeling

B.4 Representation Ratio

Using observed race and gender, we quantify statistical ratios corresponding to harms of omission and subordination. For a given demographic, we define the representation ratio as the proportion p of characters observed with that demographic divided by its proportion p* in a comparison distribution.
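
Concretely, the ratio is p / p*. The sketch below computes it from character counts, using made-up group labels and baseline shares rather than the paper's data.

```python
from collections import Counter

def representation_ratio(characters: list[str],
                         baseline: dict[str, float]) -> dict[str, float]:
    """Per-group representation ratio: observed proportion p over baseline p*."""
    counts = Counter(characters)
    total = sum(counts.values())
    return {group: (counts[group] / total) / p_star
            for group, p_star in baseline.items()}

# Illustrative only: a ratio below 1 signals under-representation relative to
# the comparison distribution; above 1 signals over-representation.
observed = ["group_a"] * 80 + ["group_b"] * 5 + ["group_c"] * 15
baseline = {"group_a": 0.60, "group_b": 0.14, "group_c": 0.26}
print(representation_ratio(observed, baseline))
```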

The choice of comparison distribution p* depends on the context of study; for example, it could be set to subject- or occupation-specific percentages (see Tables S1 and S2). Given prior research observing how definitions of “fairness” may obscure systemic challenges faced by intersectional minoritized groups [37], we focus instead on measuring the relative degree to which our demographics of study are omitted or over-represented beyond the sociological factors that already shape demographic composition to be unequal. We therefore set p* to the U.S. Census [83, 85], while noting that more progressive ideals of fairness (e.g., uniformly over-representing under-served groups) cannot be achieved without surpassing Census representation (as a lower standard).

Table S9: Calculations for Mapping Census Baselines for Gender and Sexual Orientation

Six of our seven racial categories are assigned likelihoods in the 2022 Census [83]; MENA is excluded because the category was only proposed by the OMB in 2023, so we baseline it using its overall representation in the Wikipedia dataset [57]. To compute p* for sexual orientation and gender identity (SOGI), we use the U.S. Census 2021 Household Pulse Survey (HPS) [85], which studies have shown reduces known issues of undercounting LGBTQ+ identities [60]. See Table S9 for how we map SOGI onto our gender and relationship-type schema.
