Copyright © All Rights Reserved. World of Software.

Can Less Data Make Better Predictions? This Study Says Yes | HackerNoon

By News Room. Published 13 May 2025; last updated 2025/05/13 at 12:41 AM.

Authors:

(1) Mahdi Goldani;

(2) Soraya Asadi Tirvan.

Table of Links

Abstract and Introduction

Methodology

Dataset

Similarity methods

Feature selection methods

Measure the performance of methods

Result

Discussion

Conclusion and References

Methodology

This section describes the methodology adopted in this research. The complete methodology is depicted in Fig. 1 and consists of the following steps:

• Historical financial data for the 100 largest companies are collected.

• Appropriate features are selected using feature selection methods and similarity methods.

• The feature selection methods are applied in 80 steps; each step reduces the dataset size by 1%, until only 20% of the original dataset remains.

• A linear regression model is trained on the selected features to forecast Apple's (AAPL) closing price 10 days ahead.

• In the last step, the performance of the linear regression model is evaluated with cross-validation, and the results are documented.
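The forecasting and evaluation steps above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the excerpt only says "cross-validation", so the expanding-window (walk-forward) split, the gap of 10 rows between training and test data, RMSE as the error metric, and all function names (`make_supervised`, `fit_ols`, `walk_forward_rmse`) are hypothetical choices made here. Ordinary least squares is solved through the normal equations so the example needs no third-party libraries.

```python
from typing import List, Sequence, Tuple


def make_supervised(features: Sequence[Sequence[float]], close: Sequence[float],
                    horizon: int = 10) -> Tuple[List[List[float]], List[float]]:
    """Pair the features observed on day t with the close on day t + horizon."""
    if horizon < 1 or len(features) != len(close):
        raise ValueError("need horizon >= 1 and aligned series")
    return [list(r) for r in features[:-horizon]], list(close[horizon:])


def _solve(M: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [v] for row, v in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Linear regression with an intercept: solve (A^T A) beta = A^T y."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    ata = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    aty = [sum(a[i] * t for a, t in zip(A, y)) for i in range(p)]
    return _solve(ata, aty)


def predict(beta: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]


def walk_forward_rmse(X, y, n_splits: int = 5, gap: int = 10) -> float:
    """Mean RMSE over expanding-window folds (an assumed CV scheme).

    The last `gap` rows before each test fold are left out of training so that
    no training target (a close 10 days ahead) lies beyond the test fold's start.
    """
    fold = len(X) // (n_splits + 1)
    scores = []
    for k in range(1, n_splits + 1):
        train_end = fold * k
        test_end = len(X) if k == n_splits else train_end + fold
        if train_end - gap < 1:
            raise ValueError("not enough rows for this split/gap")
        beta = fit_ols(X[:train_end - gap], y[:train_end - gap])
        preds = predict(beta, X[train_end:test_end])
        sq = [(p - t) ** 2 for p, t in zip(preds, y[train_end:test_end])]
        scores.append((sum(sq) / len(sq)) ** 0.5)
    return sum(scores) / len(scores)
```

Ordered splits are used here because shuffled k-fold would let the model train on prices that come after the ones it is tested on; whether the authors made the same choice is not stated in this excerpt.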

Fig. 1. The complete methodology

Dataset

In line with the aim of this paper, which is to examine the sensitivity and performance of feature selection methods and similarity methods at high and low sample sizes, a financial dataset was chosen: the large volume of available financial data makes it well suited to examining how the methods perform as the data shrink from large to small. The dataset consists of secondary data, namely the open, high, low, and close prices and the trading volume of the 100 largest companies by consolidated revenue according to the Fortune Global 500 2023 rankings. The target variable is Apple's closing price; it is predicted at each data size, and the best model is selected from among these datasets. The data were collected from Yahoo Finance and span January 1, 2016, to January 28, 2024.
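One way to lay out this dataset is a wide table with one row per trading day and one column per company and price field, giving 100 × 5 = 500 candidate columns. The sketch below assumes a hypothetical in-memory layout `prices[ticker][iso_date][field]` (the Yahoo Finance download itself is not shown), keeps only dates reported by every company, and names columns such as `AAPL_close`; none of these choices come from the paper, and whether Apple's own columns stay among the candidate features is not stated in this excerpt.

```python
from typing import Dict, List, Sequence, Tuple

# The five fields listed in the paper; the lowercase names are an assumption.
FIELDS = ("open", "high", "low", "close", "volume")


def build_wide_table(
    prices: Dict[str, Dict[str, Dict[str, float]]],
    start: str = "2016-01-01",
    end: str = "2024-01-28",
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Flatten prices[ticker][iso_date][field] into one row per trading day."""
    tickers = sorted(prices)
    # Keep only dates every company reports (an inner join on date), limited to
    # the study window; ISO date strings sort chronologically.
    common = set.intersection(*(set(prices[t]) for t in tickers))
    dates = sorted(d for d in common if start <= d <= end)
    columns = [f"{t}_{f}" for t in tickers for f in FIELDS]
    rows = [[prices[t][d][f] for t in tickers for f in FIELDS] for d in dates]
    return dates, columns, rows


def target_series(columns: Sequence[str], rows: Sequence[Sequence[float]],
                  name: str = "AAPL_close") -> List[float]:
    """Extract the target column (Apple's close) from the wide table."""
    idx = list(columns).index(name)
    return [row[idx] for row in rows]
```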

The main approach of this research is to measure how sensitive feature selection algorithms are to sample size. For this purpose, the feature selection methods were applied in 80 steps, each reducing the dataset size by 1% until only 20% of the original dataset remained.
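The shrinking schedule can be expressed as a list of row counts. This sketch assumes that each step removes 1% of the original row count (so the 80th step leaves exactly 20%) and that the oldest rows are dropped first; the excerpt specifies neither, and the helper names `sample_size_schedule` and `shrink` are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_size_schedule(n_rows: int, steps: int = 80, step_pct: int = 1) -> List[int]:
    """Row counts after each reduction: 99%, 98%, ..., 20% of n_rows by default."""
    if step_pct * steps >= 100:
        raise ValueError("schedule would remove the whole dataset")
    # Integer percentages avoid float artifacts such as 0.8 becoming 0.7999...
    return [n_rows * (100 - step_pct * k) // 100 for k in range(1, steps + 1)]


def shrink(rows: Sequence[T], size: int) -> List[T]:
    """Keep the most recent `size` rows (assumption: oldest rows are dropped)."""
    if not 0 <= size <= len(rows):
        raise ValueError("size out of range")
    return list(rows[len(rows) - size:])
```

Each size in the schedule would then be used to slice the wide table before running the feature selection methods again.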

Similarity methods

As Table 1 shows, each feature selection (FS) method has its own limitations and weaknesses. Time series similarity methods can therefore be a good choice for feature selection. Measuring similarity between time series forms the basis for clustering and classifying such data; its task is to measure the distance between two time series. Similarity in time series plays a vital role in analyzing temporal patterns. Similarity between time series has been used as an absolute measure for statistical inference about the relationship between time series from different data sets [16]. In recent years, the growth in data collection has made large volumes of time series data available, and tasks such as regression, classification, clustering, and segmentation are now routinely applied to them. In many cases, these tasks require a distance measure that indicates how similar two time series are, so studying the various methods for measuring that distance is essential. Similarity measures for time series can be divided into three categories: step-by-step measures, distribution-based measures, and geometric methods. Table 2 describes the advantages and disadvantages of the similarity methods.

Table 2. Similarity methods
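The contents of Table 2 are not reproduced in this excerpt, so the sketch below uses two widely known measures purely as illustrations: Euclidean distance, which compares two series point by point, and dynamic time warping (DTW), which allows an elastic alignment. It assumes that similarity-based feature selection means ranking each candidate series by its distance to the target series (Apple's close) after z-normalization and keeping the k closest; `select_by_similarity` and the normalization step are hypothetical choices, not the authors' procedure.

```python
import math
from typing import Callable, Dict, List, Sequence


def znorm(x: Sequence[float]) -> List[float]:
    """Z-normalize so distances compare shape rather than price level."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sd for v in x] if sd > 0 else [0.0] * len(x)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Point-by-point distance; both series must have the same length."""
    if len(a) != len(b):
        raise ValueError("euclidean distance needs equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW with absolute-difference cost, O(len(a) * len(b)) time."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def select_by_similarity(
    candidates: Dict[str, Sequence[float]],
    target: Sequence[float],
    distance: Callable[[Sequence[float], Sequence[float]], float],
    k: int,
) -> List[str]:
    """Return the names of the k candidate series closest to the target."""
    t = znorm(target)
    ranked = sorted(candidates, key=lambda name: distance(znorm(candidates[name]), t))
    return ranked[:k]
```

For example, with the target [1, 2, 3, 4, 5], a candidate [10, 20, 30, 40, 50] has the same shape after z-normalization and ranks first under either distance, while the reversed series [5, 4, 3, 2, 1] ranks last under Euclidean distance.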

This paper is available on arXiv under the CC BY-SA 4.0 DEED (Attribution-ShareAlike 4.0 International) license.
