Custom Agent for your application observability needs
Modern software architectures often embrace microservices and distributed systems to enhance scalability, resilience, and modularity. However, this architectural shift brings unique challenges to maintaining observability, a critical aspect of system reliability and performance. The main challenges in implementing observability in microservices and distributed systems are:
- Distributed Nature of Microservices
In a distributed system, a single transaction often spans multiple services and network hops. Tracing the flow of data and understanding inter-service communication can be complex, especially when failures occur. Observability solutions need to provide end-to-end visibility without introducing excessive overhead.
- Volume of Metrics, Logs, and Traces
Microservices generate a vast amount of data in the form of logs, metrics, and traces. Managing and storing this data at scale while ensuring quick access for troubleshooting and analysis can strain resources.
- Diverse Technology Stack
Microservices ecosystems often involve a heterogeneous mix of technologies, languages, and frameworks. This diversity complicates standardizing observability practices across services, making it harder to unify data collection and analysis.
- Dynamic Environments
Modern systems often run in dynamic environments like Kubernetes, where containers are ephemeral and infrastructure changes frequently. Observability tools must adapt to these dynamic changes to provide meaningful insights without manual intervention.
- Latency in Data Collection and Visualization
Achieving near-real-time observability is essential for detecting issues promptly. However, collecting, processing, and visualizing large amounts of data in real-time can lead to latency, reducing the effectiveness of monitoring systems.
- Overhead and Performance Impact
Instrumentation for observability can introduce additional load to the application, potentially impacting its performance. Striking a balance between comprehensive observability and minimal overhead is a constant challenge.
- Correlating Data Across Dimensions
Correlating metrics, logs, and traces across services to identify the root cause of issues is challenging. Observability tools need robust correlation mechanisms to unify disparate data sources for meaningful insights.
- Security and Privacy Concerns
Sensitive data may be logged or traced inadvertently, leading to privacy concerns. Observability implementations must ensure compliance with security and data protection regulations.
To address these challenges, organizations need observability solutions that are scalable, interoperable, and adaptive to their specific architectures. By leveraging tools like Micrometer for metrics collection, developers can mitigate many of these complexities and build resilient, observable systems.
What Is Micrometer? A Brief Overview
Micrometer is a lightweight metrics instrumentation library designed for JVM-based applications. It simplifies the process of collecting and publishing metrics across diverse monitoring systems without locking developers into a single vendor or tool.
Key Features
- Multi-Backend Support: Export metrics to multiple monitoring backends simultaneously, such as Prometheus, Datadog, or New Relic, by combining their registries in a composite registry.
- Simple API: Provides a straightforward API to create custom metrics, including counters, timers, gauges, and distribution summaries.
- Vendor-Neutral: Abstracts metrics collection so developers can change monitoring systems with minimal effort.
Role as a Metrics Instrumentation Library for JVM-Based Applications
Micrometer acts as a bridge between application code and monitoring platforms. It enables developers to:
- Instrument Application Code: Add metrics collection to critical parts of the application, such as database queries or HTTP endpoints.
- Provide Consistent Metrics: Standardize metrics format across services in a microservices ecosystem.
- Leverage JVM Insights: Out-of-the-box JVM metrics (e.g., memory usage, garbage collection) ensure holistic monitoring.
In essence, Micrometer empowers JVM developers to build observable applications by offering a seamless and efficient way to collect, aggregate, and export metrics.
Observability Pillars
Modern observability is built on three foundational pillars — Metrics, Logging, and Tracing — each serving a distinct purpose in monitoring and diagnosing applications.
1. Metrics
- What They Are: Quantitative measures that provide insights into application performance and resource usage. Examples include CPU usage, request latency, and database connection counts.
- Micrometer’s Role: Micrometer is primarily a metrics instrumentation library, enabling developers to collect, transform, and export application metrics to various monitoring systems like Prometheus, Datadog, and AWS CloudWatch.
- Strength: Best for monitoring trends and identifying performance bottlenecks.
2. Logging
- What It Is: Captures discrete events or errors as textual records. Logs provide details about what happened, where, and when.
- Micrometer’s Role: While Micrometer itself focuses on metrics, it complements logging by letting metrics and logs be correlated through shared contextual information such as the trace id and span id.
- Strength: Ideal for debugging and forensic analysis.
3. Tracing
- What It Is: Tracks the flow of requests across distributed services, offering a timeline of events.
- Micrometer’s Role: Micrometer integrates with tracing tools like OpenTelemetry or Zipkin to add contextual metadata to traces like trace id and span id, enabling better correlation between traces and metrics.
- Strength: Essential for diagnosing issues in distributed architectures.
Micrometer’s primary strength lies in metrics collection, but its compatibility with logging and tracing tools ensures a cohesive observability strategy as well.
Core Metric Types in Micrometer: Counters, Timers, Gauges, and Distribution Summaries
Micrometer provides a variety of metric types to help developers capture specific performance data and behavioral insights about their applications. Here’s a breakdown of the key metric types and their use cases:
1. Counters
A counter is a monotonically increasing value that represents the number of times an event has occurred.
Use Case: Best suited for counting discrete occurrences, such as:
- The total number of HTTP requests received.
- The count of errors or exceptions thrown.
Characteristics:
- Value only increases (or resets, depending on the backend).
- No negative values.
Example:
Counter requestCounter = Counter.builder("http.requests.total")
    .description("Total HTTP requests")
    .register(meterRegistry);
requestCounter.increment(); // Increment by 1
2. Timers
A timer measures both the frequency and duration of events.
Use Case: Ideal for monitoring latency and throughput, such as:
- Tracking the execution time of a database query.
- Measuring the response time of an API endpoint.
Characteristics:
- Tracks the count, total time, and maximum duration of the events; the average duration is derived from the count and total time.
- Provides percentile distribution (e.g., p95, p99) when supported by the backend.
Example:
Timer timer = Timer.builder("db.query.time")
    .description("Time taken for database queries")
    .register(meterRegistry);
timer.record(() -> {
    // Code to be timed
    fetchFromDatabase();
});
3. Gauges
A gauge represents a single numerical value that can go up or down.
Use Case: Best for monitoring real-time state or levels, such as:
- The current number of active threads.
- The size of a queue or cache.
Characteristics:
- Value can increase or decrease.
- Reads directly from the observed object.
Example:
List<String> cache = new ArrayList<>();
Gauge.builder("cache.size", cache, List::size)
    .description("Current size of the cache")
    .register(meterRegistry);
4. Distribution Summaries
A distribution summary tracks the distribution of a set of values, providing statistical insights like count, sum, mean, and percentiles.
Use Case: Useful for measuring the distribution of discrete values, such as:
- The payload size of HTTP requests.
- The number of items processed in a batch.
Characteristics:
- Tracks the total number of events, the sum of their values, and their statistical distribution.
- Offers percentile-based insights when supported by the backend.
Example:
DistributionSummary summary = DistributionSummary.builder("http.request.size")
    .description("Size of HTTP request payloads")
    .baseUnit("bytes")
    .register(meterRegistry);
summary.record(512); // Record a 512-byte request payload
In this post, we’ll be using Gauge to create custom metrics.
With JVM applications, one can use Micrometer to instrument observability with standardized naming and tagging (watch the demo here). But what if the components of your application are not all Spring-based, and you also want to observe the application as a whole from an operations standpoint? In a retail application, for example: how many orders were received today, how many orders were delivered today, how many new users joined, and how many left? Even though the create-order, sell-order, and user services are implemented as distributed services, they may all persist their data in the same database or in a handful of different databases.
So we can rely on the application database or multiple application databases to collect the metrics and report them to a single pane. This is what we will try to solve today. This is what our application design will look like —
Let’s bring up the application stack using docker-compose up
version: '3.8'
services:
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - ./grafana:/var/lib/grafana
    networks:
      - monitoring
    restart: always
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    command:
      - --config.file=/etc/prometheus/prometheus.yml
    volumes:
      - ./prometheus-local.yaml:/etc/prometheus/prometheus.yml
    networks:
      - monitoring
    restart: always
  postgres:
    image: postgres:latest
    container_name: postgres
    ports:
      - "5432:5432"
    volumes:
      - ./postgres:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_DB=postgres
    networks:
      - monitoring
    restart: always
networks:
  monitoring:
    driver: bridge
prometheus-local.yaml (the file mounted into the Prometheus container above) holds the scrape configuration, which looks like this:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'docker'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['host.docker.internal:8081']
In our case, the agent will be running on port 8081.
Now that our application stack is up, let’s create some data using the data creator application (see References). It creates books in the database using Spring Data JPA:
@Entity
@Table(name = "books")
public class Book {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;
    private String title;
    private BigDecimal price;
    private LocalDate publishDate;
}
And the CRUD repository —
public interface BookRepository extends JpaRepository<Book, Long> {
    List<Book> findByTitle(String title);

    // Custom query
    @Query("SELECT b FROM Book b WHERE b.publishDate > :date")
    List<Book> findByPublishedDateAfter(@Param("date") LocalDate date);
}
Use the curl command below to create some books:
curl --location 'http://localhost:8080/books/all' \
--header 'Content-Type: application/json' \
--data '[
  {
    "title": "Book A",
    "price": 9.99,
    "publishDate": "2023-08-31"
  },
  {
    "title": "Book B",
    "price": 19.99,
    "publishDate": "2023-07-31"
  },
  {
    "title": "Book C",
    "price": 29.99,
    "publishDate": "2023-06-10"
  },
  {
    "title": "Book D",
    "price": 39.99,
    "publishDate": "2023-05-05"
  }
]'
Now that we have data in our database, let’s move on to the metrics agent application. We’ll target two metrics: the count of all books and the count of books by title. Here is how to fetch the data using Spring Data JPA:
public int countBooks() {
    return (int) bookRepository.count();
}

public List<Book> findAll() {
    return bookRepository.findAll();
}
Of course, this is a very crude way of doing it; for large applications we need to think about performance. One option is to create materialized views on top of your tables and then run custom aggregation queries against them to extract the metrics you want.
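As a rough illustration of pushing the aggregation into the database instead of loading every row, the sketch below runs a GROUP BY query over plain JDBC. It assumes the table and column names implied by the Book entity above (books, title); the class and method names are hypothetical and not part of the agent, and the same query could just as well target a materialized view.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original agent.
class BookCountQueries {

    // Assumes the "books" table and "title" column implied by the Book entity.
    static final String COUNT_BY_TITLE_SQL =
            "SELECT title, COUNT(*) FROM books GROUP BY title ORDER BY title";

    /** Lets the database do the counting and returns title -> count. */
    static Map<String, Long> countByTitle(Connection connection) {
        try (PreparedStatement statement = connection.prepareStatement(COUNT_BY_TITLE_SQL);
             ResultSet rows = statement.executeQuery()) {
            return readCounts(rows);
        } catch (SQLException e) {
            throw new IllegalStateException("Could not count books by title", e);
        }
    }

    /** Copies (title, count) rows into a map that keeps the order of the rows. */
    static Map<String, Long> readCounts(ResultSet rows) {
        Map<String, Long> counts = new LinkedHashMap<>();
        try {
            while (rows.next()) {
                counts.put(rows.getString(1), rows.getLong(2));
            }
        } catch (SQLException e) {
            throw new IllegalStateException("Could not read aggregated rows", e);
        }
        return counts;
    }
}
```

With this shape, only one row per title crosses the network, however many books the table holds.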
Use a Gauge to publish a custom metric:
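private void publishCustomMetric(String metricName, String description, Set<Tag> tags, double value) {
    // Caution: Micrometer keeps the first gauge registered for a given name and tag set,
    // so the value captured by this supplier is fixed at the first registration.
    Gauge.builder(metricName, () -> value)
        .strongReference(true)
        .description(description)
        .tags(tags)
        .register(io.micrometer.core.instrument.Metrics.globalRegistry);
}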
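Because a gauge that is registered once keeps its original supplier, refreshing the number on every scheduled run needs a mutable holder that the gauge reads from. The sketch below shows only that holder bookkeeping in plain Java, so it runs without Micrometer on the classpath; GaugeValueStore, put, and reader are hypothetical names, and the metric identity string (name plus tags) is an assumed convention.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.DoubleSupplier;

// Hypothetical helper: keeps one mutable value holder per metric identity.
class GaugeValueStore {

    private final Map<String, AtomicReference<Double>> holders = new ConcurrentHashMap<>();

    /**
     * Stores the latest value for the given metric identity.
     * Returns true only the first time the identity is seen, which is the
     * moment a gauge backed by reader(metricId) would be registered.
     */
    boolean put(String metricId, double value) {
        AtomicReference<Double> existing =
                holders.putIfAbsent(metricId, new AtomicReference<Double>(value));
        if (existing == null) {
            return true;
        }
        existing.set(value);
        return false;
    }

    /** Returns a supplier that always reads the holder's current value. */
    DoubleSupplier reader(String metricId) {
        AtomicReference<Double> holder = holders.get(metricId);
        if (holder == null) {
            throw new IllegalArgumentException("Unknown metric: " + metricId);
        }
        return holder::get;
    }
}
```

In the agent, put would be called on each scheduled run, and the gauge would be registered only when it returns true, with a supplier that delegates to reader. Later runs then just update the holder.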
One can publish the book count —
public void publishBookCountMetric() {
    int bookCount = countBooks();
    Set<Tag> tags = convertMapToTags(java.util.Map.of("env", "dev"));
    publishCustomMetric("book.count", "Count of books", tags, bookCount);
}
And book count by title —
public void publishBookCountByTitleMetric() {
    List<Book> books = findAll();
    // Using Java streams to count occurrences
    Map<String, Long> titleCountMap = books.stream()
        .collect(Collectors.groupingBy(
            Book::getTitle,
            Collectors.counting()
        ));
    titleCountMap.forEach((title, count) -> {
        Set<Tag> tags = convertMapToTags(java.util.Map.of("env", "dev", "title", title));
        publishCustomMetric("book.count.by.title", "Count of books by title", tags, count);
    });
}
And then schedule this metric collector —
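// Runs again 5 seconds after the previous run finishes.
// Requires @EnableScheduling on a configuration class.
@Scheduled(fixedDelay = 5000)
public void publishMetrics() {
    publishBookCountMetric();
    publishBookCountByTitleMetric();
}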
Let’s start the application and build the rest of the stack.
Open a browser and let’s view the metrics published by our agent.
- To view all the metrics, type in — http://localhost:8081/actuator/metrics
As you can see, the agent exposes all of its JVM, JDBC, HikariCP, and other built-in metrics, with our custom book metrics listed third and fourth.
- To view book count, type in — http://localhost:8081/actuator/metrics/book.count
{
  "name": "book.count",
  "description": "Count of books",
  "measurements": [
    {
      "statistic": "VALUE",
      "value": 18
    }
  ],
  "availableTags": [
    {
      "tag": "application",
      "values": [
        "spring-boot-observability-agent"
      ]
    },
    {
      "tag": "env",
      "values": [
        "dev"
      ]
    }
  ]
}
So we can see a total of 18 books in the database published by our agent.
- To view book count by title, type in — http://localhost:8081/actuator/metrics/book.count.by.title
{
  "name": "book.count.by.title",
  "description": "Count of books by title",
  "measurements": [
    {
      "statistic": "VALUE",
      "value": 18
    }
  ],
  "availableTags": [
    {
      "tag": "application",
      "values": [
        "spring-boot-observability-agent"
      ]
    },
    {
      "tag": "title",
      "values": [
        "Book A",
        "Book B",
        "Book C",
        "Book D"
      ]
    },
    {
      "tag": "env",
      "values": [
        "dev"
      ]
    }
  ]
}
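There are four values for the tag named title here. Each value is a separate dimension of the metric, and without a tag filter the endpoint reports the sum across all of them, which is 18. To look at a single title, add a tag filter to the URL, for example ?tag=title:Book%20A.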
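The relationship between the per-title series and the single number in the response can be sketched in plain Java. This is an illustration with hypothetical names (TagAggregationSketch, total, forTitle), not the actuator’s own code, and it uses the per-title values that the Prometheus output later in this post reports.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of how per-tag gauge values relate to the reported value.
class TagAggregationSketch {

    /** Per-title gauge values, one entry per value of the title tag. */
    static Map<String, Double> seriesByTitle() {
        Map<String, Double> series = new LinkedHashMap<>();
        series.put("Book A", 5.0);
        series.put("Book B", 3.0);
        series.put("Book C", 6.0);
        series.put("Book D", 4.0);
        return series;
    }

    /** Without a tag filter, the values of all matching series are added up. */
    static double total(Map<String, Double> series) {
        return series.values().stream().mapToDouble(Double::doubleValue).sum();
    }

    /** With a filter on the title tag, only that one series contributes. */
    static double forTitle(Map<String, Double> series, String title) {
        return series.getOrDefault(title, 0.0);
    }
}
```

Here total(seriesByTitle()) gives 18.0, matching the unfiltered response, while forTitle(seriesByTitle(), "Book C") gives 6.0.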
- Now, we configured our application to expose metrics in a Prometheus-supported format by including io.micrometer:micrometer-registry-prometheus in build.gradle.
To access Prometheus endpoint of our agent, type in — http://localhost:8081/actuator/prometheus
# TYPE book_count gauge
book_count{application="spring-boot-observability-agent",env="dev"} 18.0
# HELP book_count_by_title Count of books by title
# TYPE book_count_by_title gauge
book_count_by_title{application="spring-boot-observability-agent",env="dev",title="Book A"} 5.0
book_count_by_title{application="spring-boot-observability-agent",env="dev",title="Book B"} 3.0
book_count_by_title{application="spring-boot-observability-agent",env="dev",title="Book C"} 6.0
book_count_by_title{application="spring-boot-observability-agent",env="dev",title="Book D"} 4.0
This is what our two custom metrics look like.
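The dotted Micrometer names (book.count) appear here with underscores (book_count) because the Prometheus registry applies its own naming convention before exposing a meter. The sketch below is a simplified approximation of that mapping for a gauge without a base unit. It is not Micrometer’s actual convention class, which also handles unit and type suffixes, and it skips label-value escaping. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical, simplified rendering of one Prometheus sample line.
class PrometheusLineSketch {

    /** Replaces every character that is not a letter, digit, or underscore with an underscore. */
    static String toPrometheusName(String name) {
        return name.replaceAll("[^a-zA-Z0-9_]", "_");
    }

    /** Renders name{label="value",...} value, with labels sorted by key. */
    static String sampleLine(String meterName, Map<String, String> tags, double value) {
        String labels = new TreeMap<String, String>(tags).entrySet().stream()
                .map(tag -> toPrometheusName(tag.getKey()) + "=\"" + tag.getValue() + "\"")
                .collect(Collectors.joining(","));
        return toPrometheusName(meterName) + "{" + labels + "} " + value;
    }
}
```

Calling sampleLine with the name book.count, the tags application and env, and the value 18 reproduces the book_count line shown above.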
- Now let’s access and see Prometheus itself. Type in — http://localhost:9090/targets to access the configured targets and see if they look healthy —
- Now click query and try to query one of our metrics —
Voila, this works. Prometheus is able to scrape our endpoint and collect the metrics. Because it is a time series database, it stores every sample it scrapes.
- Finally let’s go to Grafana and create a connection for prometheus and build the dashboard —
That’s pretty much it. You should be able to use this example stack to publish metrics according to your needs. Feel free to contact me with any questions; your feedback is appreciated.
References —
- Springboot JPA data creator repo — https://github.com/paras301/spring-boot-data-jpa
- Observability stack repo — https://github.com/paras301/spring-boot-observability-agent
- Springone micrometer 2024 — https://www.youtube.com/watch?v=X7rODR2m63c