Turning Movie Scripts into Short Summaries—Smarter and Faster | HackerNoon

News Room · Published 10 April 2025 (last updated 10 April 2025, 12:23 AM)

Authors:

(1) Rohit Saxena, Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh;

(2) Frank Keller, Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh.

Table of Links

Part 1

Part 2

Part 3

Part 4

Part 5

Part 6

5. Summarization Using Salient Scenes

We now investigate the benefit of using only salient scenes for the abstractive summarization of movie scripts. We formulate this task as a sequence-to-sequence generation problem: given a movie represented by its set of salient scenes M = {S1, S2, …, SK}, the goal is to generate a target summary S = {s1, s2, …, sm}. As the input length of the salient scenes is still quite large (see Figure 2), we use the Longformer Encoder-Decoder (LED) architecture (Beltagy et al., 2020). To handle long input sequences, the LED encoder combines efficient local (sliding-window) attention with global attention on selected tokens. The decoder then applies full self-attention over the encoded tokens and the previously decoded positions to generate the summary.
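The local-plus-global attention pattern described above can be illustrated with a small sketch. This is not LED's actual implementation (which uses chunked matrix operations); the window size and the choice of global positions below are illustrative only.

```python
# Sketch of the sparse attention pattern used by an LED-style encoder:
# each token attends to a fixed local window around itself, while a few
# designated "global" tokens attend to, and are attended by, every position.
# This reduces the encoder's attention cost from O(n^2) to roughly O(n * w).

def led_style_attention_mask(seq_len, window, global_positions):
    """Return a seq_len x seq_len boolean mask where True = attention allowed."""
    mask = [[False] * seq_len for _ in range(seq_len)]
    half = window // 2
    for i in range(seq_len):
        # Local sliding window: token i sees positions i-half .. i+half.
        for j in range(max(0, i - half), min(seq_len, i + half + 1)):
            mask[i][j] = True
    for g in global_positions:
        for j in range(seq_len):
            mask[g][j] = True  # global token attends to every position
            mask[j][g] = True  # every position attends to the global token
    return mask

mask = led_style_attention_mask(seq_len=8, window=3, global_positions=[0])
# Row 4 can see its local neighbours 3..5 plus the global token at 0,
# but not a distant token like 7.
```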

5.1 Dataset

We used the same dataset and split as in Section 4.1, now with Wikipedia plot summaries as the output for movie script summarization. However, instead of using the whole movie script as input, we use the output of our scene saliency model and feed only the salient scenes to the summarizer.
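The glue between the two stages can be sketched as follows. The function names, the probability threshold, and the scene separator are hypothetical; the paper does not specify this interface.

```python
# Hypothetical pipeline step: keep only the scenes the saliency model
# marked as salient, preserving script order, then join them into the
# input text for the summarization model.

def select_salient_scenes(scenes, saliency_scores, threshold=0.5):
    """scenes: list of scene texts; saliency_scores: parallel list of model
    probabilities. Returns the salient subset in original script order."""
    return [s for s, p in zip(scenes, saliency_scores) if p >= threshold]

def build_summarizer_input(scenes, saliency_scores, sep="\n\n"):
    return sep.join(select_salient_scenes(scenes, saliency_scores))

scenes = ["INT. LAB - NIGHT ...", "EXT. STREET - DAY ...", "INT. OFFICE ..."]
scores = [0.9, 0.2, 0.7]
print(build_summarizer_input(scenes, scores))  # drops the low-scoring scene
```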

5.2 Baselines

We compare the proposed model with various baselines. Lead-N simply outputs the first N tokens of the movie script as the summary. We varied N to understand the impact of summary length on performance and report results for Lead-512 and Lead-1024. FLAN-T5-XXL (Chung et al., 2022), FLAN-UL2 (Wei et al., 2022), Vicuna-13b-1.5 (Zheng et al., 2023), which is fine-tuned from Llama-2 (Touvron et al., 2023), and GPT-3.5-Turbo[4] (Brown et al., 2020) are instruction-tuned large language models (LLMs), used here in a zero-shot setting. SUMM^N (Zhang et al., 2022) is a multi-stage summarization framework for long input dialogues and documents. Unlimiformer (Bertsch et al., 2023) uses a retrieval-based attention mechanism for long-document summarization. Two-Stage Heuristic (Pu et al., 2022) is a two-stage movie script summarization model that first selects essential sentences based on heuristics and then summarizes the text using LED with efficient fine-tuning. Random Selection randomly selects salient scenes for summarization. Full Text takes the full movie script as input (no content selection), truncated to the model's input length.
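The Lead-N baseline is simple enough to state in a few lines. Whitespace tokenization is a simplifying assumption here; the evaluation may count tokens differently.

```python
# Lead-N baseline: the "summary" is just the first N tokens of the script.
# A surprisingly strong baseline for news, much weaker for narratives,
# since a movie's plot is not front-loaded the way a news article is.

def lead_n(script_text, n):
    """Return the first n whitespace-separated tokens of the script."""
    return " ".join(script_text.split()[:n])

# e.g. lead_n(movie_script, 512) and lead_n(movie_script, 1024)
```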

Table 5: Results of our model Select and Summarize (SELECT & SUMM) compared with other summarization models. *Denotes model results from the paper of the shared task.

5.3 Implementation Details

We experimented with two pre-trained models, LED and Pegasus-X, as base models for summarization, fine-tuned on the Scriptbase corpus (see Section 4.1). Each movie's input sequence is truncated to 16,384 tokens (including special tokens) to fit the model's maximum input length. We experimented with both the base and large variants of these models; the large variants performed better, so we used them in all experiments. We used the AdamW optimizer (β1 = 0.9, β2 = 0.99) with a learning rate of 5e-5 and a linear warmup strategy with 512 warmup steps. We trained the models for 60 epochs and used the checkpoint with the best validation score. We used a beam size of five for decoding. We also created a random selection baseline by selecting a random k% of scenes and using those to generate a summary; we report the best result for random selection, obtained with k = 25 and LED. All baseline models are fully trained on our dataset using the best configuration from their respective papers.
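The linear warmup described above can be written out explicitly. One assumption is made loud here: the text does not say what happens to the learning rate after warmup, so this sketch simply holds it constant at the peak.

```python
# Linear warmup schedule from the setup above: the learning rate ramps
# from 0 to the peak (5e-5) over the first 512 optimizer steps.
# Post-warmup behaviour (decay vs constant) is unspecified in the text,
# so a constant rate is assumed here as a neutral choice.

PEAK_LR = 5e-5
WARMUP_STEPS = 512

def lr_at_step(step):
    """Learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR
```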

5.4 Results

Table 5 shows our evaluation results using ROUGE (F1) scores and BERTScore on the Scriptbase corpus. Compared with the baseline models and previous work, our model achieves state-of-the-art results on all metrics. Specifically, our Select and Summarize model, which selects salient scenes, achieves 49.98, 12.11, and 47.95 ROUGE-1/2/L scores, and also improves on BERTScore. Compared to a model that uses the full text of the movies, our model improves performance by 3.83, 1.49, and 3.49 ROUGE-1/2/L points, respectively. The Lead-N baseline achieves better results than Agarwal et al. (2022), with a ROUGE-1 of 17.69 for Lead-1024. Our model outperforms SUMM^N (Zhang et al., 2022), which can be attributed to better content selection using salient scenes compared to greedy content selection based on ROUGE: as named entities and places are repeated across a movie script, the greedy alignment used in SUMM^N can produce false positives. Unlimiformer's performance is low compared to our model and the two-stage model, possibly because it does not include explicit content selection. The Pu et al. (2022) model performs slightly better than Full Text, as removing sentences based on heuristics allows it to include movie script text that would otherwise be truncated. FLAN-UL2 performs better than GPT-3.5-Turbo and FLAN-T5-XXL in the zero-shot setting, but our fine-tuned model outperforms all three.
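As a quick arithmetic check, the Full Text scores implied by the reported gains can be recovered by subtraction:

```python
# Subtracting the stated ROUGE-1/2/L improvements from the SELECT & SUMM
# scores gives the implied Full Text baseline scores.

select_summ = {"R1": 49.98, "R2": 12.11, "RL": 47.95}
gains = {"R1": 3.83, "R2": 1.49, "RL": 3.49}
full_text = {k: round(select_summ[k] - gains[k], 2) for k in select_summ}
print(full_text)  # {'R1': 46.15, 'R2': 10.62, 'RL': 44.46}
```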

We also experimented with Pegasus-X (Phang et al., 2023) instead of LED as the base summarization model for SELECT & SUMM. We found both models perform better when using our approach of selecting salient scenes compared to the full text, with LED demonstrating superior performance.

Figure 2 also shows that our model yields these improvements even though it uses only half the length of the original script (the salient scenes only). This demonstrates the effectiveness of salient scene selection for movie script summarization. Appendix E shows generated summaries for two movies.

Table 6: Results of QAEval on summaries generated by Select and Summarize and baseline models.


[4] We used the model gpt-3.5-turbo-1106, which has a context length of 16K tokens.
