As a software engineer who works extensively with financial data, I recently hit a wall with traditional stock market APIs. After getting frustrated with rate limits and expensive subscriptions, I decided to build my own solution using web scraping. Here’s how I did it, and what I learned along the way.
Introduction: Why I Needed a Different Approach
My breaking point came during a personal project where I was trying to analyze market trends. Yahoo Finance’s API kept hitting rate limits, and Bloomberg Terminal’s pricing made me laugh out loud – there was no way I could justify that cost for a side project. I needed something that would let me:
- Fetch data without arbitrary limits
- Get real-time prices and trading volumes
- Access historical data without paying premium fees
- Scale up my analysis as needed
The Web Scraping Solution
After some research and experimentation, I settled on scraping data from two main sources: CNN Money for trending stocks and Yahoo Finance for detailed metrics. Here’s how I built it:
Setting Up the Basic Infrastructure
First, I installed the essential tools:
pip install requests beautifulsoup4
Then I created a basic scraper that could handle network issues gracefully:
import requests
from bs4 import BeautifulSoup
import time
import logging

def make_request(url, max_retries=3):
    # A browser-like User-Agent avoids the instant blocks some sites
    # apply to the default requests user agent
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    }
    for attempt in range(max_retries):
        try:
            return requests.get(url, headers=headers, timeout=10)
        except requests.RequestException:
            if attempt == max_retries - 1:
                raise
            logging.warning("Request to %s failed (attempt %d), retrying", url, attempt + 1)
            time.sleep(attempt + 1)  # Back off a little longer on each retry
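With the helper in place, every fetch below goes through the same retry path. A one-line sanity check (AAPL is just an example ticker here):

response = make_request('https://finance.yahoo.com/quote/AAPL')
print(response.status_code)  # Anything but 200 usually means the site is blocking you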
Grabbing Trending Stocks
I started with CNN Money’s hot stocks list, which gives me three categories of stocks to track:
def get_trending_stocks():
    url = 'https://money.cnn.com/data/hotstocks/index.html'
    response = make_request(url)
    soup = BeautifulSoup(response.text, "html.parser")
    tables = soup.find_all("table", {"class": "wsod_dataTable wsod_dataTableBigAlt"})
    categories = ["Most Actives", "Gainers", "Losers"]
    stocks = []
    for i, table in enumerate(tables):
        for row in table.find_all("tr")[1:]:  # Skip the header row
            cells = row.find_all("td")
            if cells:
                stocks.append({
                    'category': categories[i],
                    'symbol': cells[0].find(string=True).strip(),
                    'company': cells[0].span.text.strip()
                })
    return stocks
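Running it on its own returns a flat list of dicts that’s easy to eyeball (this assumes the CNN Money markup still matches the selectors above):

for stock in get_trending_stocks()[:3]:
    print(stock['category'], stock['symbol'], '-', stock['company'])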
Getting the Financial Details
For each trending stock, I fetch additional data from Yahoo Finance:
def get_stock_details(symbol):
    url = f"https://finance.yahoo.com/quote/{symbol}"
    response = make_request(url)
    soup = BeautifulSoup(response.text, "html.parser")
    data = {}
    # Find the main quote table (Yahoo's generated class names are fragile,
    # which is exactly one of the gotchas covered below)
    table = soup.find("table", {"class": "W(100%)"})
    if table:
        for row in table.find_all("tr"):
            cells = row.find_all("td")
            if len(cells) > 1:
                key = cells[0].text.strip()
                value = cells[1].text.strip()
                data[key] = value
    return data
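The returned dict is keyed by whatever labels Yahoo happens to render, so the exact field names are a moving target. A quick look at what comes back (the labels below are what I’d expect, not a guarantee):

details = get_stock_details('AAPL')
print(details.get('Volume'))          # Field labels depend on Yahoo's current layout
print(details.get('Previous Close'))  # .get() avoids KeyErrors when a label shifts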
The Gotchas I Encountered
Building this wasn’t all smooth sailing. Here are some real issues I hit and how I solved them:
- Rate Limiting: Yahoo Finance started blocking me after too many rapid requests. I added random delays between requests:
import random
time.sleep(random.uniform(1, 3))  # Random delay between 1-3 seconds
- Data Inconsistencies: Sometimes the scraped data would be malformed. I added validation:
def validate_price(price_str):
    try:
        return float(price_str.replace('$', '').replace(',', ''))
    except (AttributeError, ValueError):  # Not a string, or not a parseable number
        return None
- Website Changes: The sites occasionally update their HTML structure. I made my selectors more robust:
# Instead of exact class matches, use partial matches
table = soup.find("table", class_=lambda x: x and 'dataTable' in x)
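Putting those last two fixes together, here’s a minimal sketch of how I’d pull a price out of a page defensively. The helper name and the cell positions are illustrative, not lifted verbatim from my scraper:

def extract_first_price(soup):
    # Match any table whose class mentions 'dataTable', then validate
    # whatever lands in the price column instead of trusting it
    table = soup.find("table", class_=lambda x: x and 'dataTable' in x)
    if table is None:
        return None
    for row in table.find_all("tr")[1:]:
        cells = row.find_all("td")
        if len(cells) > 1:
            price = validate_price(cells[1].text)
            if price is not None:
                return price
    return None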
Storing and Using the Data
I keep things simple with CSV storage – it’s easy to work with and perfect for my needs:
import csv
import os
from datetime import datetime

def save_stock_data(stocks):
    # Each stock dict is expected to carry 'price' and 'volume',
    # merged in from the Yahoo Finance details above
    timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    write_header = not os.path.exists('stock_data.csv')
    with open('stock_data.csv', 'a', newline='') as file:
        writer = csv.writer(file)
        if write_header:
            writer.writerow(['timestamp', 'symbol', 'price', 'volume'])
        for stock in stocks:
            writer.writerow([timestamp, stock['symbol'],
                             stock['price'], stock['volume']])
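Wired together, the whole pipeline fits in about a dozen lines. A sketch of my main loop, with the same caveat that Yahoo’s field labels move around:

import random

if __name__ == '__main__':
    stocks = []
    for stock in get_trending_stocks():
        details = get_stock_details(stock['symbol'])
        stock['price'] = details.get('Previous Close')  # Label is an assumption; adjust to the live page
        stock['volume'] = details.get('Volume')
        stocks.append(stock)
        time.sleep(random.uniform(1, 3))  # Spread out the Yahoo requests
    save_stock_data(stocks)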
What I Learned
After running this scraper for several weeks, here are my key takeaways:
- Web scraping isn’t just a hack – it’s a viable alternative to expensive APIs when done right.
- Building in error handling and logging from the start saves huge headaches later.
- Stock data is messy – so always validate what you scrape.
- Starting simple and iterating works better than trying to build everything at once!
What’s Next?
I’m currently working on adding:
- News sentiment analysis
- Basic pattern recognition
- A simple dashboard for visualization
Would you be interested in seeing this scraper hooked up to machine learning models to predict stock trends? Let me know in the comments!