How Neighborhood Data Improves Legal Document Classification | HackerNoon

News Room
Last updated: 2025/04/02 at 1:02 AM
News Room Published 2 April 2025

Table of Links

Abstract and 1. Introduction

  2. Related Work

  3. Task, Datasets, Baseline

  4. RQ 1: Leveraging the Neighbourhood at Inference

    4.1. Methods

    4.2. Experiments

  5. RQ 2: Leveraging the Neighbourhood at Training

    5.1. Methods

    5.2. Experiments

  6. RQ 3: Cross-Domain Generalizability

  7. Conclusion

  8. Limitations

  9. Ethics Statement

  10. Bibliographical References

4.2. Experiments

4.2.1. Implementation Details

We follow the baseline hyperparameters described in Kalamkar et al. (2022). We use the BERT base model to obtain the token encodings. We employ a dropout of 0.5, a maximum sequence length of 128, an LSTM dimension of 768, and an attention context dimension of 200. We sweep over learning rates {1e-5, 3e-5, 5e-5, 1e-4, 3e-4} for 40 epochs with the Adam optimizer (Kingma and Ba, 2014) to derive the best model based on validation set performance. For all our inference variants, we carry out a grid search over the interpolation factor (λ) in increments of 0.1 in the range [0, 1] to choose the best model based on macro-F1 on the validation set. For kNN and multiple prototypes, we vary k over powers of 2 from 8 to 256.
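The grid search over λ and k described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the function names, the `p_neigh_by_k` dictionary of precomputed neighbourhood probabilities, and the choice to score labels with no support as 0 in macro-F1 are all hypothetical. The mixing rule follows the convention used in this section, where λ = 1 means baseline only and λ = 0 means interpolation only.

```python
import numpy as np

# Grids stated in the text: lambda in [0, 1] in steps of 0.1,
# k over powers of 2 from 8 to 256.
LAMBDAS = [round(0.1 * i, 1) for i in range(11)]
KS = [2 ** e for e in range(3, 9)]  # 8, 16, 32, 64, 128, 256


def macro_f1(y_true, y_pred, num_labels):
    """Unweighted mean of per-label F1.

    Assumption: a label with no true or predicted instances scores 0.
    """
    scores = []
    for c in range(num_labels):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


def select_lambda_and_k(p_model, p_neigh_by_k, y_val):
    """Return (best macro-F1, lambda, k) on the validation set.

    p_model: (n, C) baseline probabilities for the validation sentences.
    p_neigh_by_k: hypothetical dict mapping k -> (n, C) neighbourhood
        probabilities computed from the training set.
    """
    num_labels = p_model.shape[1]
    best = (-1.0, None, None)
    for k, p_neigh in p_neigh_by_k.items():
        for lam in LAMBDAS:
            # lam = 1 -> baseline only, lam = 0 -> neighbourhood only
            mixed = lam * p_model + (1.0 - lam) * p_neigh
            score = macro_f1(y_val, mixed.argmax(axis=1), num_labels)
            if score > best[0]:
                best = (score, lam, k)
    return best
```

Ties are resolved in favour of the first configuration encountered, which is an arbitrary choice made for this sketch.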

4.2.2. Results

In Table 1, we present the macro-F1 and micro-F1 scores for both the baseline and the interpolation variants. We observe a significant improvement when using kNN interpolation across all datasets, particularly in the more challenging macro-F1 metric, which accounts for label imbalance. Single-prototype interpolation, on the other hand, mitigates the memory footprint of kNN by storing one representation per rhetorical role, but it degrades performance relative to kNN. This decline results from oversimplification: a single prototype may struggle to capture the diverse aspects within each rhetorical role, particularly when instances of the same label are dispersed across the embedding space. This is evident in the Paheli dataset, where no improvement over the baseline is observed. Interpolation with multiple prototypes balances memory efficiency against the ability to capture label variation. While it slightly underperforms kNN interpolation on the Paheli and M-CL datasets, it outperforms kNN on Build and M-IT. This can be attributed to a smoothing effect that reduces the noise or human label variation present in the kNN-based approach, which is particularly evident in datasets with low inter-annotator agreement (Build and M-IT). These results affirm our hypothesis that straightforward interpolation using training set examples during inference can boost the performance of rhetorical role classifiers.
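To make the comparison between kNN and single-prototype interpolation concrete, the following is a minimal sketch. It rests on assumptions that this section does not confirm: squared L2 distance, a softmax over negative distances with a hypothetical `temperature` parameter, and mean-pooled prototypes. All function names are illustrative. What the section does fix is the mixing convention (λ = 1 is baseline only, λ = 0 is interpolation only) and the idea of one stored representation per rhetorical role.

```python
import numpy as np


def knn_label_distribution(query, train_emb, train_labels, num_labels,
                           k, temperature=1.0):
    """Distance-weighted label distribution over the k nearest neighbours.

    Assumptions: squared L2 distance and a softmax over negative distances.
    """
    dists = np.sum((train_emb - query) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    logits = -dists[nearest] / temperature
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    probs = np.zeros(num_labels)
    # Accumulate neighbour weights per label (handles repeated labels).
    np.add.at(probs, train_labels[nearest], weights)
    return probs


def class_prototypes(train_emb, train_labels, num_labels):
    """One prototype per label: the mean of that label's embeddings.

    Assumes every label has at least one training example.
    """
    return np.stack([train_emb[train_labels == c].mean(axis=0)
                     for c in range(num_labels)])


def interpolate(p_model, p_neigh, lam):
    """lam = 1 keeps the baseline only; lam = 0 uses the neighbourhood only."""
    return lam * p_model + (1.0 - lam) * p_neigh
```

Under these assumptions, the single-prototype variant can reuse the same lookup by treating the prototypes as a tiny datastore, e.g. `knn_label_distribution(query, protos, np.arange(num_labels), num_labels, k=num_labels)`. This stores one vector per label instead of one per training sentence, which is the memory saving discussed above.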

Table 1: Performance of interpolation methods on four datasets. mac.F1: macro-F1, mic.F1: micro-F1.

Sensitivity of interpolation. In Figure 1, we present the macro-F1 score for the M-CL dataset using kNN interpolation, while varying the interpolation coefficient λ and the number of neighbours k. Here, λ values of 0 and 1 correspond to predictions solely from interpolation and solely from the baseline model, respectively. We observe that performance initially improves as k increases, signifying that incorporating more neighbours boosts confidence by including closely similar examples. However, performance starts to decline at higher k, which can be attributed to a large number of neighbours introducing noise, given the low inter-annotator agreement, suggesting a need to address this task as multilabel classification. On the other hand, reducing λ consistently enhances performance, particularly for lower k, showcasing the model's capacity to rely solely on semantically similar instances for label prediction. With higher k, we notice a decline in performance at lower λ values beyond a certain optimal point, which is related to the label variation problem exacerbated by a larger number of neighbours. Similar trends are observed with the other interpolation methods.

Figure 1: Sensitivity to hyperparameters, kNN (M-CL). λ = 0: interpolation only; λ = 1: baseline only.

Authors:

(1) Santosh T.Y.S.S, School of Computation, Information, and Technology; Technical University of Munich, Germany;

(2) Hassan Sarwat, School of Computation, Information, and Technology; Technical University of Munich, Germany;

(3) Ahmed Abdou, School of Computation, Information, and Technology; Technical University of Munich, Germany;

(4) Matthias Grabmair, School of Computation, Information, and Technology; Technical University of Munich, Germany.


This paper is available on arXiv under a CC BY 4.0 Deed (Attribution 4.0 International) license.
