This Model Knows Which Movie Scenes Matter Most | HackerNoon

Authors:

(1) Rohit Saxena, Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh;

(2) Frank Keller, Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh.

Table of Links

  • Part 1
  • Part 2
  • Part 3
  • Part 4
  • Part 5
  • Part 6

3. MENSA: Movie Scene Saliency Dataset

We define the saliency of a movie scene based on whether the scene is mentioned in a user-written summary of the movie: if a scene appears in the summary, it is considered salient for understanding the movie's narrative. By aligning summary sentences to movie scenes, we identify salient scenes and later use them for movie summarization.

The MENSA dataset consists of the scripts of 100 movies and their respective Wikipedia plot summaries, annotated with gold-standard sentence-to-scene alignments. We selected 80 movies randomly from ScriptBase (Gorinski and Lapata, 2015) and added 20 recently released movies with manually corrected scripts, all of which had Wikipedia summaries.

Both MENSA and ScriptBase are movie script datasets and differ from other dialogue/narrative datasets such as SummScreenFD (Chen et al., 2022), the ForeverDreaming subset of the SummScreen dataset as used in the SCROLLS benchmark (Shaham et al., 2022). SummScreenFD is a dataset of TV show episodes consisting of crowd-sourced transcripts and recaps. In contrast, the movie scripts in our dataset were written by screenwriters and the summaries were curated on Wikipedia. It is important to note that movies and TV shows differ in storytelling structure, number of acts, and length. As shown in Table 2, SummScreenFD has shorter input texts and summaries than movie scripts.

Table 1: Statistics of the MENSA dataset.

Table 2: Statistics of the length of the script and summary in the SummScreenFD and MENSA datasets.

3.1 Annotation Scheme

Formally, let M denote a movie script consisting of a sequence of scenes M = {S_1, S_2, …, S_N} and let D denote the Wikipedia plot summary consisting of a sequence of sentences D = {s_1, s_2, …, s_T}. The aim is to annotate and select a subset of salient scenes M′ such that M′ ⊂ M and |M′| ≪ |M|, where for every scene in M′ there exist one or more aligned sentences in D.

To manually align the summary sentences for 100 movies, we recruited five in-house annotators. They received detailed annotation instructions and were trained by the authors until they were able to perform the alignment task reliably. To analyze inter-annotator agreement, 15 movies were selected randomly and triple-annotated; the remaining 85 movies were single-annotated, similar to the annotation process of Papalampidi et al. (2019), to reduce annotation cost. As annotating and aligning a full-length movie script with its summary is a difficult task, we provided annotators with a default alignment generated by the alignment model of Mirza et al. (2021). For every summary sentence, annotators first verified the default alignment with movie script scenes. If the alignment was only partially correct or missing, they corrected it by adding or removing scenes for the given sentence using a web-based tool. We assume that each sentence can be aligned to one or more scenes and vice versa. In Table 1, we present statistics of the scripts and summaries in the MENSA dataset.

To evaluate the quality of the collected annotations, we computed inter-annotator agreement on the triple-annotated movies using three metrics: (a) Exact Match Agreement (EMA), (b) Partial Agreement (PA), and (c) Mean Annotation Distance (D). These measures were used for a similar annotation task by Papalampidi et al. (2019).[2] EMA is the Jaccard similarity between the scene sets the three annotators aligned to a given summary sentence (the ratio of the intersection to the union of those sets), averaged over all sentences in the summary, and is computed as follows:
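
The equation itself did not survive extraction in this copy; a plausible reconstruction from the description above, writing A_k(s_t) for the set of scenes annotator k aligned to summary sentence s_t (notation assumed for this sketch), is:

    \mathrm{EMA} = \frac{1}{T} \sum_{t=1}^{T} \frac{\lvert A_1(s_t) \cap A_2(s_t) \cap A_3(s_t) \rvert}{\lvert A_1(s_t) \cup A_2(s_t) \cup A_3(s_t) \rvert}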

Partial agreement (PA) is the fraction of summary sentences for which the annotators overlap in at least one scene, and is given as follows:
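
The formula is likewise missing from this copy; one plausible reading of the description, using the same assumed notation as above (and assuming the overlap is required among all three annotators rather than only a pair), is:

    \mathrm{PA} = \frac{1}{T} \sum_{t=1}^{T} \mathbb{1}\bigl[ A_1(s_t) \cap A_2(s_t) \cap A_3(s_t) \neq \emptyset \bigr]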

Annotation distance (d) for a summary sentence s between two annotators is defined as the minimum overlap distance and is computed as follows:
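
The original formula is also missing here; a hedged reconstruction, treating scenes as integer indices into the script and taking the smallest index gap between the two annotators' scene sets for sentence s_t, would be:

    d(s_t) = \min_{m \in A_i(s_t),\; n \in A_j(s_t)} \lvert m - n \rvert

with the mean annotation distance D obtained by averaging d over all summary sentences and annotator pairs (this averaging step is our reading of the description, not a formula taken verbatim from the paper).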

EMA and PA between our annotators were 52.80% and 81.63%, respectively. The high PA indicates that for most summary sentences the annotators overlap in at least one scene. This is consistent with the low mean annotation distance of 1.21, which indicates that on average the annotations are about one scene apart. The EMA shows that for more than half of the sentences there is an exact match in scene-to-sentence alignment among the annotators.

Table 3: Comparing alignment performance for different alignment methods on the gold-standard set.

3.2 Evaluation of Automatic Alignment Methods

Since it is too expensive and time-consuming to collect gold-standard scene saliency labels for the whole of ScriptBase (Gorinski and Lapata, 2015), we generate silver-standard labels to train a model for scene saliency classification. Based on our definition of scene saliency above, silver-standard labels for scene saliency can be generated by aligning movie scenes with summary sentences.
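
To make this labelling step concrete, the short Python sketch below marks a scene as salient whenever at least one summary sentence is aligned to it. The data structures and function name are hypothetical, chosen only for illustration; the paper does not specify an implementation at this level.

    # Sketch: derive silver-standard scene-saliency labels from a
    # sentence-to-scene alignment. `alignment` maps each summary-sentence
    # index to the set of scene indices aligned to it; `num_scenes` is the
    # number of scenes in the script. (Hypothetical structures.)
    def saliency_labels(alignment: dict[int, set[int]], num_scenes: int) -> list[int]:
        salient = set()
        for scene_ids in alignment.values():
            salient.update(scene_ids)
        # A scene is labelled 1 (salient) if any summary sentence aligns to it.
        return [1 if i in salient else 0 for i in range(num_scenes)]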

Alignment between the source document segments and the summary sentences has been previously proposed for news summarization (Chen and Bansal, 2018; Zhang et al., 2022) and narrative text (Mirza et al., 2021). Using our gold-standard labels, we investigate which of these approaches yields better alignment between movie scripts and summaries and therefore should be used to generate silver-standard labels for scene saliency.

Chen and Bansal (2018) used ROUGE-L to align each summary sentence to the most similar source document sentence. In our case, we lifted these sentence-level alignments on the source document (the movie script) to the scene level: if a scene contains the aligned script sentence, that scene is aligned to the summary sentence. Zhang et al. (2022) used a greedy algorithm for aligning document segments and summary sentences: for each segment, sentences are aligned based on the gain in ROUGE-1 score. In our case, movie scenes are treated as source document segments. Mirza et al. (2021) proposed an alignment method specifically for movie scripts that uses semantic similarity combined with Integer Linear Programming (ILP) to align movie script scenes to summary sentences.
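
As an illustration of the first of these methods, the sketch below aligns each summary sentence to the scene containing its most similar script sentence under ROUGE-L. It assumes the rouge_score package and a representation of scenes as lists of sentence strings; it is a minimal sketch in the spirit of Chen and Bansal (2018), not the implementation used in the paper.

    # Minimal sketch of ROUGE-L alignment lifted from sentence level to scene
    # level. `scenes` is a list of scenes, each a list of sentence strings;
    # `summary` is a list of summary sentences. Illustration only.
    from rouge_score import rouge_scorer

    def align_by_rouge_l(scenes, summary):
        scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
        alignment = {}
        for t, summ_sent in enumerate(summary):
            best_scene, best_f1 = None, -1.0
            for scene_id, scene in enumerate(scenes):
                for script_sent in scene:
                    f1 = scorer.score(summ_sent, script_sent)["rougeL"].fmeasure
                    if f1 > best_f1:
                        best_f1, best_scene = f1, scene_id
            # Align the summary sentence to the scene containing its most
            # similar script sentence.
            alignment[t] = {best_scene}
        return alignment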

We present the results of applying these three approaches to our gold-standard MENSA dataset in Table 3. We report macro-averaged precision (P), recall (R), and F1 score. The method of Mirza et al. (2021) performs significantly better than the ROUGE-based methods, possibly because it was specifically designed to align movie scenes with summary sentences.[3] We therefore used this alignment method to generate silver-standard scene saliency labels for the complete ScriptBase corpus.

Our dataset can be used in the future to evaluate content selection strategies in long documents. The gold-standard salient scenes can also be used to evaluate extractive summarization methods.

We now introduce our Select and Summarize (SELECT & SUMM) model, which first uses a classification model (Section 4) to predict the salient scenes and then utilizes only the salient scenes to generate a movie summary using a pre-trained abstractive summarization model (Section 5). These models are trained in a two-stage pipeline.
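
A minimal sketch of what this two-stage pipeline looks like at inference time is given below; the classifier and summarizer interfaces are assumptions standing in for the components described in Sections 4 and 5, not the paper's actual code.

    # Sketch of the SELECT & SUMM pipeline at inference time.
    # `saliency_classifier` returns one 0/1 label per scene (Section 4);
    # `abstractive_summarizer` maps text to a summary (Section 5).
    # Both interfaces are hypothetical.
    def select_and_summarize(scenes, saliency_classifier, abstractive_summarizer):
        labels = saliency_classifier(scenes)  # Stage 1: predict salient scenes
        salient_scenes = [s for s, y in zip(scenes, labels) if y == 1]
        # Stage 2: summarize using only the salient scenes.
        return abstractive_summarizer("\n".join(salient_scenes))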


[2] We renamed total agreement in Papalampidi et al. (2019) to EMA for clarity.

[3] It was also used to generate the default alignment that our human annotators had to correct, which biases our evaluation towards the method of Mirza et al. (2021). However, our results are still a good measure of how many errors human annotators find in the alignment generated by this method.
