
Study Finds AI Responses Rated Higher When Context is Limited | HackerNoon

News Room
Last updated: 2025/04/07 at 12:59 PM
News Room Published 7 April 2025

Authors:

(1) Clemencia Siro, University of Amsterdam, Amsterdam, The Netherlands;

(2) Mohammad Aliannejadi, University of Amsterdam, Amsterdam, The Netherlands;

(3) Maarten de Rijke, University of Amsterdam, Amsterdam, The Netherlands.

Table of Links

Abstract and 1 Introduction

2 Methodology and 2.1 Experimental data and tasks

2.2 Automatic generation of diverse dialogue contexts

2.3 Crowdsource experiments

2.4 Experimental conditions

2.5 Participants

3 Results and Analysis and 3.1 Data statistics

3.2 RQ1: Effect of varying amount of dialogue context

3.3 RQ2: Effect of automatically generated dialogue context

4 Discussion and Implications

5 Related Work

6 Conclusion, Limitations, and Ethical Considerations

7 Acknowledgements and References

A. Appendix

3 Results and Analysis

We address (RQ1) and (RQ2) by providing an overview of the results and in-depth analysis of our crowdsourcing experiments. We first describe the key data statistics.

3.1 Data statistics

Phase 1. Figure 1 presents the distributions of relevance and usefulness ratings across the three variations, C0, C3, and C7. Figure 1a indicates a larger number of dialogues rated as relevant when annotators had no prior context (C0), compared to instances of C3 and C7, where a lower number of dialogues received such ratings. This suggests that in the absence of prior context, annotators are more inclined to perceive the system’s response as relevant, as they lack evidence to assert otherwise. This trend is particularly prevalent when user utterances lean towards casual conversations, such as inquiring about a previously mentioned movie or requesting a similar recommendation to their initial query, aspects to which the annotators have no access. Consequently, this suggests that annotators rely on assumptions regarding the user’s previous inquiries, leading to higher ratings for system response relevance.

Figure 1: Distribution of (a) relevance and (b) usefulness labels for dialogue annotations in Phase 1.

We observe a similar trend for usefulness (Figure 1b): C0 has more dialogues rated as useful than C3 and C7. Including the user’s next utterance introduced some ambiguity for annotators. This was evident in instances where the user introduced a new item not mentioned in the system’s response and expressed an intention to watch it, which left the usefulness of the system’s response uncertain. The ambiguity arises particularly when annotators lack access to prior context, making it hard to tell whether the movie was mentioned earlier in the dialogue.

These observations highlight the impact of the amount of dialogue context on the annotators’ perceptions of relevance and usefulness in Phase 1. This emphasizes the significance of taking contextual factors into account when evaluating TDSs.
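As a rough illustration of how per-condition label distributions like those in Figure 1 could be tabulated, the following sketch counts labels per context condition and converts them to shares. The function name `label_distribution`, the label strings, and the toy annotations are assumptions made for this example; they are not the study's data or code.

```python
from collections import Counter


def label_distribution(annotations):
    """Return {condition: {label: share}} from (condition, label) pairs."""
    counts = {}
    for condition, label in annotations:
        counts.setdefault(condition, Counter())[label] += 1
    return {
        condition: {label: n / sum(c.values()) for label, n in c.items()}
        for condition, c in counts.items()
    }


# Toy annotations, invented for illustration only (not the study's data).
toy = [
    ("C0", "relevant"), ("C0", "relevant"), ("C0", "relevant"), ("C0", "not relevant"),
    ("C3", "relevant"), ("C3", "relevant"), ("C3", "not relevant"), ("C3", "not relevant"),
    ("C7", "relevant"), ("C7", "not relevant"), ("C7", "not relevant"), ("C7", "not relevant"),
]

dist = label_distribution(toy)
for condition in ("C0", "C3", "C7"):
    print(condition, dist[condition])
```

Reporting shares rather than raw counts keeps conditions comparable if they contain different numbers of annotated dialogues.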

Phase 2. In Phase 2, we present findings on how different types of dialogue contexts influence the annotation of relevance and usefulness labels. When the dialogue summary is included as supplementary information for the turn under evaluation (C0-sum), a higher proportion of dialogues is annotated as relevant than in C0-llm (60% vs. 52.5%, respectively); see Figure 2a.

In contrast to the observations made for relevance, we see in Figure 2b that a higher percentage of dialogues are predominantly labeled as not useful when additional information is provided to the annotators. This accounts for 60% in C0-heu, 47.5% in C0-llm, and 45% in C0-sum. This trend is consistent with our observations from Phase 1, highlighting that while system responses may be relevant, they do not always align with the user’s actual information need. We find that C0-sum exhibits the highest number of dialogues rated as useful, indicating its effectiveness in providing pertinent information to aid annotators in making informed judgments regarding usefulness.
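The "not useful" percentages quoted above can be compared directly. The sketch below only ranks the three reported shares; the variable and function names are hypothetical. It also assumes nothing about the remaining labels, so the lowest "not useful" share does not by itself establish the highest "useful" share, which the text reports separately.

```python
# Shares of dialogues labeled "not useful" in Phase 2, as reported in the text.
not_useful_pct = {"C0-heu": 60.0, "C0-llm": 47.5, "C0-sum": 45.0}


def rank_by_not_useful(shares):
    """Sort conditions from the lowest to the highest 'not useful' share."""
    return sorted(shares, key=shares.get)


ranking = rank_by_not_useful(not_useful_pct)
lowest = ranking[0]
# Gap in percentage points between the heuristic context and the lowest share.
gap = not_useful_pct["C0-heu"] - not_useful_pct[lowest]
print(ranking, lowest, gap)
```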

Figure 2: Distribution of (a) relevance and (b) usefulness ratings when annotators have access to additional context in C0 Phase 2.
