How AI Learns to Summarize Movies Like a Human | HackerNoon

News Room
Last updated: 2025/04/09 at 11:45 PM
News Room Published 9 April 2025

Authors:

(1) Rohit Saxena, Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh;

(2) Frank Keller, Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh.

Table of Links

Part 1 · Part 2 · Part 3 · Part 4 · Part 5 · Part 6
Abstract

Abstractive summarization for long-form narrative texts such as movie scripts is challenging due to the computational and memory constraints of current language models. A movie script typically comprises a large number of scenes; however, only a fraction of these scenes are salient, i.e., important for understanding the overall narrative. The salience of a scene can be operationalized by considering it as salient if it is mentioned in the summary. Automatically identifying salient scenes is difficult due to the lack of suitable datasets. In this work, we introduce a scene saliency dataset that consists of human-annotated salient scenes for 100 movies. We propose a two-stage abstractive summarization approach which first identifies the salient scenes in a script and then generates a summary using only those scenes. Using QA-based evaluation, we show that our model outperforms previous state-of-the-art summarization methods and reflects the information content of a movie more accurately than a model that takes the whole movie script as input.[1]

1. Introduction

Abstractive summarization is the process of reducing an information source to its most important content by generating a coherent summary. Previous work has primarily focused on news (Cheng and Lapata, 2016; Gehrmann et al., 2018), meetings (Zhong et al., 2021), and dialogues (Zhong et al., 2022; Zhu et al., 2021a), but there is limited prior work on summarizing long-form narrative texts such as movie scripts (Gorinski and Lapata, 2015; Chen et al., 2022).

Long-form narrative summarization poses challenges to large language models (Beltagy et al., 2020; Zhang et al., 2020a; Huang et al., 2021) both in terms of memory complexity and in terms of attending to salient information in the text. Large language models perform poorly for long sequence lengths in zero-shot settings compared to finetuned models (Shaham et al., 2023). Recently, Liu et al. (2024) showed that the performance of these models degrades when the relevant information is present in the middle of a long document. With an average length of 110 pages, movie scripts are therefore challenging to summarize.

Several methods have previously relied on content selection for summarization to reduce the input size by either performing content selection implicitly using neural network attention (Chen and Bansal, 2018; You et al., 2019; Zhong et al., 2021) or explicitly (Ladhak et al., 2020; Manakul and Gales, 2021; Zhang et al., 2022) by aligning the source document with the summary using metrics such as ROUGE (Lin, 2004). Unlike for news articles, the implicit attention-based method is problematic for movie scripts, as current methods cannot reliably process text of such length. On the other hand, current explicit methods are neither optimized nor evaluated for content selection using gold-standard labels. In addition, considering the large number of sentences in movies that contain repeated mentions of characters and locations, a method based on a lexical overlap metric such as ROUGE creates many false positives. Crucially, all these methods use source–summary alignment as an auxiliary task without actually optimizing or evaluating this task.
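As a concrete illustration, explicit source–summary alignment can be approximated with a greedy lexical-overlap match. The sketch below uses unigram-overlap F1 as a rough stand-in for ROUGE-1; the function names and the threshold are illustrative, not the implementation used in any of the cited papers.

```python
# Hypothetical sketch of greedy source-summary alignment via lexical overlap.
from collections import Counter

def overlap_f1(a: str, b: str) -> float:
    """Unigram-overlap F1 between two sentences (rough ROUGE-1 proxy)."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    common = sum((ta & tb).values())
    if common == 0:
        return 0.0
    p, r = common / sum(tb.values()), common / sum(ta.values())
    return 2 * p * r / (p + r)

def greedy_align(source_sents, summary_sents, threshold=0.3):
    """Map each summary sentence to its best-overlapping source sentence."""
    alignment = {}
    for j, summ in enumerate(summary_sents):
        scores = [overlap_f1(src, summ) for src in source_sents]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            alignment[j] = best
    return alignment
```

Because character and location names recur across many scenes of a script, an overlap score like this rewards unrelated sentences that merely mention the same names, which is exactly the false-positive problem noted above.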

For news summarization, Ernst et al. (2021) created crowd-sourced development and test sets for the evaluation of proposition-level alignment. However, news texts differ from movie scripts both in length and in terms of the rigid inverted pyramid structure that is typical for news articles. For movie scripts, Mirza et al. (2021) proposed a specialized alignment method which they evaluated on a set of 10 movies. However, they do not perform movie script summarization.

Movie scripts are structured in terms of scenes, where each scene describes a distinct plot element, happening at a fixed place and time and involving a fixed set of characters. It therefore makes sense to formalize movie summarization as the identification of the most salient scenes from a movie, followed by the generation of an abstractive summary of those scenes (Gorinski and Lapata, 2015). Hence we define movie scene saliency based on whether the scene is mentioned in the summary, i.e., if the scene is mentioned in the summary, it is considered salient. Using scene saliency for summarization is therefore a method of explicit content selection.
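Under this definition, deriving per-scene saliency labels from a summary-to-scene alignment is a single labeling pass. A minimal sketch, assuming an alignment mapping is already available (the mapping shown here is hypothetical, e.g. produced by annotators or an alignment method):

```python
def scene_saliency_labels(num_scenes: int, summary_to_scene: dict) -> list:
    """Label a scene salient (1) iff at least one summary sentence aligns to it.

    summary_to_scene maps a summary-sentence index to the index of the
    scene it was aligned with.
    """
    labels = [0] * num_scenes
    for scene_idx in summary_to_scene.values():
        labels[scene_idx] = 1
    return labels
```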

In this paper, we first introduce MENSA, a Movie ScENe SAliency dataset that includes human annotation of salient scenes in movie scripts. Our annotators manually align Wikipedia summary sentences with movie scenes for 100 movies. We use these gold-standard annotations to evaluate existing explicit alignment methods. We then propose a supervised scene saliency classification model to identify salient scenes given a movie script. Specifically, we use the alignment method that performs best on the gold-standard data to generate silver-standard labels on a larger dataset, on which we then train a sequence classification model using scene embeddings to identify salient scenes. We then fine-tune a pre-trained language model using only the salient scenes to generate movie summaries. This model achieves new state-of-the-art summarization results as measured by ROUGE and BERTScore (Zhang et al., 2020b). In addition, we evaluate the generated summaries using a question-answer-based metric (Deutsch et al., 2021) and show that summaries generated using only the salient scenes outperform those generated using the entire movie script or baseline models.
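The two-stage approach described above can be sketched as a select-then-summarize pipeline. The saliency model and summarizer below are placeholders (any scene classifier and any fine-tuned seq2seq model could fill these roles), and `top_k` is an illustrative cutoff, not a value from the paper:

```python
# Hypothetical sketch of a two-stage select-then-summarize pipeline:
# (1) score scenes for saliency, (2) summarize only the top-scoring scenes.
def two_stage_summarize(scenes, saliency_model, summarizer, top_k=25):
    scores = saliency_model(scenes)        # one saliency score per scene
    ranked = sorted(range(len(scenes)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:top_k])          # restore script order for coherence
    selected = "\n\n".join(scenes[i] for i in keep)
    return summarizer(selected)
```

Re-sorting the selected scene indices keeps the input to the summarizer in narrative order, which matters because an abstractive model generates a more coherent summary from a chronologically ordered script.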

2. Related Work

2.1 Long-form Summarization

Summarization of long-form documents has been studied across various domains, such as news articles (Zhu et al., 2021b), books (Kryscinski et al., 2022), dialogues (Zhong et al., 2022), meetings (Zhong et al., 2021), and scientific publications (Cohan et al., 2018). To handle and process the long documents, many efficient transformer variants have been proposed (Zaheer et al., 2020; Zhang et al., 2020a; Huang et al., 2021). Similarly, work such as Longformer (Beltagy et al., 2020) uses local and global attention in transformers (Vaswani et al., 2017) to process long inputs. However, given that movie scripts are particularly long (see Table 1), these models still have a limited capacity due to memory and time complexity, and need to truncate movie scripts based on the maximum sequence length supported by the model.

Over the past decade, numerous approaches to movie summarization have been proposed. Gorinski and Lapata (2018, 2015) generate movie overviews using a graph-based model and create movie script summaries based on progression, diversity, and importance. In contrast, the aim of our work is to find salient scenes and use these for summarization. Papalampidi et al. (2019, 2021) summarize movie scripts by identifying turning points, i.e., important narrative events. In contrast, our approach is based on salient scenes and does not assume a rigid narrative structure. Recently, Agarwal et al. (2022) proposed a shared task for script summarization; the best model (Pu et al., 2022) used a heuristic approach to truncate the script.

2.2 Summarization based on Content Selection

Several methods (Ladhak et al., 2020; Manakul and Gales, 2021; Liu et al., 2022) have leveraged content selection for summarization. Chen and Bansal (2018) and Zhang et al. (2022) generate silver-standard labels through greedy alignment of the source document sentences with summary sentences. However, these methods do not explicitly evaluate alignments. Moreover, movie scripts consist of a large number of sentences with the same character and location names, which can generate many false positives in greedy alignment. We collect gold-standard saliency labels to compare and evaluate alignment methods. Mirza et al. (2021) proposed a method for aligning movie scripts with summaries but did not propose a summarization model. Recent work (Dou et al., 2021; Wang et al., 2022) has employed neural network attention for the summarization of short documents. However, movie scripts are challenging for attention-based methods, given their length.


[1] Our dataset and code are released at https://github.com/saxenarohit/select_summ.
