As a backend developer working with microservices for the past few years, one truth has become painfully obvious: debugging production issues across distributed systems can feel like detective work in the dark.
You’ve got services calling services, sometimes dozens deep. A user clicks a button on the UI, and 15 microservices spin into action. If something breaks — or worse, just slows down — figuring out where and why can chew up hours.
This is exactly why observability matters. And if you’re building or maintaining microservices in 2024, OpenTelemetry is the tool you want in your corner.
What Even Is Observability, Really?
Observability is more than just logs. It’s about understanding why your system is behaving a certain way, not just what it’s doing. At the core, we’re talking about three pillars:
- Logs – Raw events, helpful for debugging.
- Metrics – Numbers you can track over time (e.g. request count, CPU).
- Traces – End-to-end request flows across services (aka your distributed “call stack”).
Traditional monitoring tools mostly focus on metrics and logs, but tracing is the real game-changer for microservices.
Why We Picked OpenTelemetry
We experimented with several observability stacks (Datadog, New Relic, Prometheus, Jaeger, Zipkin), but each fell short on one of two counts: vendor lock-in, or inconsistent support across the languages we run.
OpenTelemetry (OTel) checked all our boxes:
- Open-source, under CNCF
- Works across languages (we use Node.js, Go, and Python)
- Vendor-neutral — export to Grafana, Jaeger, New Relic, etc.
- Supported by everyone in the industry (literally: AWS, GCP, Microsoft, etc.)
How We Use OpenTelemetry in Node.js Microservices
Let me walk you through how we actually instrumented a real service. Let’s say we’ve got a simple User Service built in Node.js using Express. It exposes an endpoint `/users` that fetches user data. Below are the steps.
Step 1: Install Dependencies
npm install @opentelemetry/api \
  @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/resources \
  @opentelemetry/semantic-conventions
We’re going to export traces via OTLP to a local Jaeger instance.
Step 2: Create tracing.js to Initialize OpenTelemetry
JavaScript: tracing.js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');
const { Resource } = require('@opentelemetry/resources');

const traceExporter = new OTLPTraceExporter({
  url: 'http://localhost:4318/v1/traces',
});

const sdk = new NodeSDK({
  traceExporter,
  instrumentations: [getNodeAutoInstrumentations()],
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'user-service',
  }),
});

sdk.start();
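One thing the snippet above doesn’t cover: shutting the SDK down cleanly so buffered spans get flushed before the process exits. A minimal sketch using the SDK’s shutdown() method (the signal hook and logging are just how we happen to wire it):
JavaScript
// At the bottom of tracing.js: flush any pending spans on shutdown
process.on('SIGTERM', () => {
  sdk.shutdown()
    .then(() => console.log('Tracing shut down cleanly'))
    .catch((err) => console.error('Error shutting down tracing', err))
    .finally(() => process.exit(0));
});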
Step 3: Add It to Your Entry File
JavaScript: index.js
require('./tracing'); // Always load this first

const express = require('express');
const app = express();

app.get('/users', (req, res) => {
  res.json([{ id: 1, name: "Alice" }, { id: 2, name: "Bob" }]);
});

app.listen(3000, () => console.log("User service running on port 3000"));
Our service is now exporting traces.
Step 4: Spin Up Jaeger Locally (or Use Grafana Tempo)
Here’s how we test locally:
docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 4318:4318 -p 16686:16686 \
  jaegertracing/all-in-one:latest
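With Jaeger up, start the service and hit the endpoint a few times:
node index.js
curl http://localhost:3000/users
The Jaeger UI is at http://localhost:16686; pick user-service from the service dropdown and the traces should be there.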
Chaining Traces Across Services
Now say you have another service — order-service — that calls user-service. If both are instrumented with OpenTelemetry, you’ll get a full trace of the user request hopping between them.
And the best part? OpenTelemetry handles trace context propagation via HTTP headers automatically. You don’t have to manually pass trace IDs between services.
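To make that concrete, here’s roughly what the second service looks like. The names, port, and the axios call are illustrative; the point is that no code here touches trace headers:
JavaScript
// order-service (hypothetical): same tracing.js pattern, with SERVICE_NAME set to 'order-service'
require('./tracing');

const express = require('express');
const axios = require('axios');
const app = express();

app.get('/orders/:id', async (req, res) => {
  // The auto-instrumented HTTP client injects the W3C traceparent header into this call,
  // so user-service's spans show up as children of this request's trace
  const { data: users } = await axios.get('http://localhost:3000/users');
  res.json({ orderId: req.params.id, customer: users[0] });
});

app.listen(3001, () => console.log('Order service running on port 3001'));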
Adding Custom Spans for Business Logic
Sometimes auto-instrumentation isn’t enough. For example, if you want to trace a DB query or external API call:
JavaScript
const { trace } = require('@opentelemetry/api');
const tracer = trace.getTracer('user-service');

app.get('/users', async (req, res) => {
  const span = tracer.startSpan('fetch-user-data');
  try {
    const users = await fetchUsersFromDB();
    res.json(users);
  } catch (err) {
    span.recordException(err);
    throw err;
  } finally {
    span.end();
  }
});
This is super helpful when you want to track performance of specific business logic.
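One refinement: startSpan creates a span but doesn’t make it the active one, so spans created inside the handler (by a DB driver’s auto-instrumentation, say) won’t be nested under it. If you want that nesting, the API’s tracer.startActiveSpan handles it. A sketch of the same handler, still assuming the hypothetical fetchUsersFromDB helper:
JavaScript
app.get('/users', async (req, res) => {
  // startActiveSpan makes the span current for everything inside the callback,
  // so child spans from instrumented libraries attach to it automatically
  await tracer.startActiveSpan('fetch-user-data', async (span) => {
    try {
      const users = await fetchUsersFromDB();
      res.json(users);
    } catch (err) {
      span.recordException(err);
      throw err;
    } finally {
      span.end();
    }
  });
});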
Best Practices We’ve Learned the Hard Way
1. Use Semantic Conventions
Instead of inventing your own attribute names, stick with the OpenTelemetry semantic conventions. These make your traces easier to understand and compatible with tools like Grafana, Tempo, etc.
Example:
JavaScript
span.setAttribute("http.method", req.method);
span.setAttribute("http.route", req.path);
2. Sample Wisely
If you trace every single request, your system will drown in data. Use trace sampling (e.g. 10%, or only errors).
JavaScript
const { TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');

const sdk = new NodeSDK({
  sampler: new TraceIdRatioBasedSampler(0.1), // 10% sampling
  // ...traceExporter, instrumentations, and resource as in Step 2
});
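For services that sit behind other instrumented services, we’d wrap that in a ParentBasedSampler, so a downstream hop respects the sampling decision the caller already made instead of re-rolling the dice:
JavaScript
const { ParentBasedSampler, TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');

const sdk = new NodeSDK({
  // Follow the caller's decision when a parent trace exists; otherwise sample 10% of new traces
  sampler: new ParentBasedSampler({ root: new TraceIdRatioBasedSampler(0.1) }),
});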
3. Use OpenTelemetry Collector in Production
Don’t export telemetry data directly from your services to your backend. Route it through the OpenTelemetry Collector — it gives you buffering, batching, retries, and format conversion.
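On the service side the only change is where the exporter points: at the Collector’s OTLP endpoint instead of Jaeger (the hostname below is a placeholder for wherever you run the Collector), and the Collector’s own pipeline decides where traces ultimately go:
JavaScript
const traceExporter = new OTLPTraceExporter({
  // Send to the Collector's OTLP/HTTP receiver, not directly to the tracing backend
  url: 'http://otel-collector:4318/v1/traces',
});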
4. Don’t Log PII in Spans
This one’s critical. Be super careful not to store user names, emails, credit card info, etc. in span attributes or logs. Stick to metadata and identifiers.
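A small example of the distinction (the attribute names and user object are illustrative):
JavaScript
// Fine: an opaque identifier you can correlate elsewhere
span.setAttribute('app.user.id', user.id);

// Not fine: personal data stored in the trace backend
// span.setAttribute('app.user.email', user.email);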
Where This Has Helped Us Most
- Debugging latency issues: Seeing full traces across 4–5 microservices helped us identify bottlenecks in minutes.
- Identifying retry storms: We spotted a service calling another in a loop with retries, something we wouldn’t have caught via logs.
- Deployment regressions: Comparing traces from one version to the next showed us exactly what changed.
Bonus: Tracing in a Multi-Language Stack
We’re using Node.js for some services, Go for others. OpenTelemetry made it easy to instrument both and send all data to a single place — Jaeger for dev, Grafana Cloud in staging/prod.
No vendor lock-in. No mismatch in trace formats. Just pure visibility.
Conclusion: If You’re Building Microservices, Start with Observability
Microservices give us scale and flexibility, but they also bring complexity. Without proper observability, you’re flying blind.
OpenTelemetry has become a core part of our architecture, not just for debugging, but for optimizing performance, reliability, and ultimately the user experience.
If you’re not using it yet, I strongly recommend giving it a shot. Even a basic setup with Jaeger and a couple of services will make you wonder how you ever lived without it.