Catch Secrets In Real Time On GitHub With EnvScanner 2.0 And AI

Every developer has been there. You’re pushing code at 2 AM, eyes half-shut, and before you realize it, that API key you hardcoded “just for testing” is sitting in your public GitHub repo.

And just like that, it’s out there.

Leaked secrets are one of the biggest sources of breaches today. From AWS keys to database credentials, once they hit the public GitHub firehose, attackers with automated scanners can pick them up within minutes.

That’s where EnvScanner 2.0 comes in.

The Idea Behind EnvScanner 2.0

When I looked at existing secret scanners, I saw two problems:

False positives everywhere. Regex-only scanners scream about anything that looks like a key (even harmless fake values).
Performance issues. Many scanners are too heavy for small servers or can’t keep up with real-time GitHub events.

So I set out to build a proof-of-concept DevSecOps tool that could:

Ingest real-time GitHub events
Detect possible secrets intelligently
Validate them with AI (Google Gemini) to minimize false alarms
Run efficiently, even on low-resource machines

The result: EnvScanner 2.0 (https://envscanner.vercel.app).

How It Works (Architecture)

EnvScanner 2.0 follows a multi-stage pipeline:

Event Ingestion → Continuously polls the public GitHub /events API.
Memory-Efficient Queuing → Transforms heavy GitHub events into lightweight objects stored in a capped in-memory queue (so it won’t blow up RAM).
File Filtering & Fetching → Ignores non-source files (images, binaries, docs) and fetches only the remaining files for scanning.
Secret Detection → Uses regex + entropy analysis + keyword matching + negative lookaheads to avoid obvious junk.
AI Validation → Potential secrets go to the Gemini API, which looks at the surrounding context and discards fake, local, or test credentials.
Storage & Display → Validated secrets are stored in MongoDB and shown on a live React dashboard via WebSockets.
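
To make the queuing, filtering, and detection stages concrete, here is a minimal TypeScript sketch. It is not EnvScanner's actual code: every name in it (SlimEvent, slim, CappedQueue, shouldScan, findCandidates), the extension list, the keyword regex, and the 3.5-bit entropy threshold are assumptions chosen for illustration. The only GitHub fields it relies on are repo.name and payload.head from a push event.

```typescript
// Illustrative sketch only; all names and thresholds are hypothetical.

// Queuing: keep just the fields the scanner needs from a bulky GitHub event.
interface SlimEvent {
  repo: string; // "owner/name"
  head: string; // SHA of the pushed commit
}

function slim(event: { repo: { name: string }; payload: { head: string } }): SlimEvent {
  return { repo: event.repo.name, head: event.payload.head };
}

// Queuing: a capped FIFO queue that evicts the oldest entry when full.
class CappedQueue<T> {
  private items: T[] = [];
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  push(item: T): void {
    if (this.items.length >= this.capacity) this.items.shift();
    this.items.push(item);
  }

  shift(): T | undefined {
    return this.items.shift();
  }

  get size(): number {
    return this.items.length;
  }
}

// Filtering: skip files that are unlikely to hold source or config text.
const SKIPPED_EXTENSIONS = new Set([".png", ".jpg", ".gif", ".pdf", ".zip", ".exe", ".md"]);

function shouldScan(path: string): boolean {
  const dot = path.lastIndexOf(".");
  const ext = dot === -1 ? "" : path.slice(dot).toLowerCase();
  return !SKIPPED_EXTENSIONS.has(ext);
}

// Detection: Shannon entropy in bits per character (random keys score high).
function shannonEntropy(text: string): number {
  const counts: Record<string, number> = {};
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }
  let entropy = 0;
  for (const ch of Object.keys(counts)) {
    const p = counts[ch] / text.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

// Keyword, then = or :, then a 16+ character value. The negative lookahead
// rejects values that start like obvious placeholders.
const CANDIDATE =
  /(?:api[_-]?key|secret|token|password)\w*\s*[:=]\s*["']?(?!your[_-]|example|changeme|dummy|x{4,})([A-Za-z0-9_\/+=-]{16,})/i;

interface Candidate {
  line: number; // 1-based line number in the file
  value: string; // the suspicious value, to be passed on for validation
}

function findCandidates(content: string, minEntropy = 3.5): Candidate[] {
  const found: Candidate[] = [];
  content.split("\n").forEach((text, index) => {
    const match = CANDIDATE.exec(text);
    if (match && shannonEntropy(match[1]) >= minEntropy) {
      found.push({ line: index + 1, value: match[1] });
    }
  });
  return found;
}
```

In a pipeline like the one described, only the candidates returned by findCandidates would be forwarded to the AI validation stage, so the regex, lookahead, and entropy checks act as a cheap pre-filter before any model call.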

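💻 Tech Stack

The pieces named throughout this post: the public GitHub /events API for ingestion, p-limit for concurrency control, the Google Gemini API for validation, MongoDB for storage, and React with WebSockets for the live dashboard.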

Performance Optimizations

I wanted this to run even on a 512 MB server. Here’s how:

Concurrency Limiting with p-limit → prevents CPU/memory spikes.
Aggressive Timeouts → no getting stuck on slow file fetches.
Memory-Efficient Queuing → lightweight event objects only.
Blacklist Filtering → skips “junk” files like images and binaries.
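
The concurrency and timeout ideas above can be sketched without any dependencies. This is a rough illustration, not the project's code: createLimiter is a hand-rolled stand-in for the kind of limiter p-limit provides, and withTimeout, fetchAll, the fetchText callback, and the default values (3 concurrent tasks, 5 seconds) are all assumptions.

```typescript
// Illustrative sketch only; all names and defaults are hypothetical.

// At most `concurrency` tasks run at once; the rest wait in FIFO order.
function createLimiter(concurrency: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  const release = (): void => {
    const resume = waiting.shift();
    if (resume) resume(); // hand this slot straight to the next waiter
    else active--; // nobody waiting, so free the slot
  };

  return async function limit<T>(task: () => Promise<T>): Promise<T> {
    if (active >= concurrency) {
      await new Promise<void>((resolve) => waiting.push(resolve));
    } else {
      active++;
    }
    try {
      return await task();
    } finally {
      release();
    }
  };
}

// Reject if `work` has not settled within `ms` milliseconds.
// Note: this stops waiting, it does not cancel the underlying work.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Fetch many files with bounded concurrency; slow or failing files become null.
async function fetchAll(
  urls: string[],
  fetchText: (url: string) => Promise<string>,
  maxConcurrent = 3,
  timeoutMs = 5000,
): Promise<Array<string | null>> {
  const limit = createLimiter(maxConcurrent);
  return Promise.all(
    urls.map((url) =>
      limit(() => withTimeout(fetchText(url), timeoutMs)).catch(() => null),
    ),
  );
}
```

Because withTimeout only stops waiting, a real file fetch would likely also want an AbortController signal so the request itself is torn down when the budget runs out.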

The Frontend Dashboard

The React + WebSocket frontend shows:

Live event stream
Current scan activity
API rate limit status
Newly discovered + AI-validated secrets

It’s responsive, so you can monitor leaks in real time from your laptop or phone.

Why This Matters

This project isn’t meant to replace enterprise-grade tools like GitGuardian or CloudSEK (they’re doing amazing work in this space). Instead, EnvScanner 2.0 is a proof of concept that shows how lightweight engineering plus AI validation can make DevSecOps tools both smarter and more resource-friendly.

Secret leaks on GitHub aren’t slowing down anytime soon. My hope is that experiments like this push the ecosystem toward fewer false positives, more automation, and smarter validation.

Final Thoughts

EnvScanner 2.0 started as a late-night experiment and turned into a full-stack project that I’m genuinely proud of.

If you’re curious about the project or want to collaborate on improving it, feel free to reach out. I’m always open to feedback from the security and dev communities.

Because at the end of the day, keeping secrets safe is everyone’s responsibility.
