Kafka metrics
Overview
Metrics for a Kafka instance provide details about the performance of the Kafka brokers and topics that can be helpful in troubleshooting issues or fine-tuning the performance of the Kafka topics.
Context and Layout
Context
In the managed Kafka service, the Kafka metrics display on the first tab of the resource-level view.
An open question remains about this tab label: whether “Dashboard” is appropriate or whether “Overview” would be better suited.
The example above is meant to show where this UI displays within the application. However, this example is outdated, and the latest version of this UI is provided in the sections below.
Layout
Information alert about lag
Because the API that provides metrics is separate from the API that manages topics, the user could define new topics that are not yet reflected on this metrics page. This means the number of topics and topic partitions shown on this tab might not match the number shown on the Topics tab.
Counts
Some metrics display as a total count in the first row of cards.
Charts
Metrics that display as charts are divided into 2 separate columns. The first column shows charts related to Kafka brokers. The second column shows charts related to Kafka topics. Each column includes a toolbar that affects what data shows in the charts in that column.
Refer to the last section, Metric popover contents, to see help contents for each chart.
Counts
The three counts that display in the first row are:
- Topics
- Topic partitions
- Consumer groups
One of the limits associated with a Kafka instance is the total number of partitions the instance can have. This limit displays in the card that shows the number of topic partitions. As this value approaches the limit, a warning indicator displays next to the value and a collapsed inline alert displays at the bottom of the card. Once the limit is reached, the warning indicator is replaced with an error indicator, and the inline alert title changes.
Figures: Warning collapsed | Warning expanded | Error expanded
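The card states described above could be derived with a small helper like the following. This is a minimal sketch, not the service's implementation: the names `LimitStatus` and `partitionLimitStatus` are hypothetical, and the `warningRatio` default of 0.9 is an assumption, since this spec does not state at what point the value counts as approaching the limit.

```typescript
// Hypothetical status levels for the Topic partitions card.
type LimitStatus = "ok" | "warning" | "error";

// warningRatio is an assumed threshold; the spec only says the warning
// appears "as this value approaches the limit".
function partitionLimitStatus(
  partitionCount: number,
  partitionLimit: number,
  warningRatio: number = 0.9,
): LimitStatus {
  if (partitionLimit <= 0) {
    throw new RangeError("partitionLimit must be positive");
  }
  // Limit reached (or exceeded): show the error indicator.
  if (partitionCount >= partitionLimit) {
    return "error";
  }
  // Close to the limit: show the warning indicator and inline alert.
  if (partitionCount >= partitionLimit * warningRatio) {
    return "warning";
  }
  return "ok";
}
```

With the assumed default ratio, a card showing 950 of 1000 partitions would be in the warning state, and 1000 of 1000 would be in the error state.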
Kafka instance metrics — toolbar
The column that displays metrics for the Kafka instance includes a toolbar that affects all charts in this column.
The toolbar in the Kafka instance metrics column provides the following elements:
- Filter by broker - A Kafka instance is made up of 1 or more brokers. The standard sizes offered for the managed Kafka service provide 3 brokers for size 1 and 6 brokers for size 2. This menu lets the user filter all charts to show data for either all brokers or a specific broker.
- Relative time range - Allows the user to see metrics for a specified time range. Time range options are relative to the present time.
- Refresh - Metrics are not refreshed automatically. They are retrieved on page load and again when the user clicks Refresh.
- Last update - Since metrics are refreshed on demand, these contents communicate how long ago the metrics were last retrieved.
Figures: Brokers filter menu | Relative time range menu | Refresh and last update
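To illustrate the relative time range and last update elements, the sketch below resolves a relative range into an absolute window and computes how long ago metrics were retrieved. All names here (`TimeWindow`, `resolveRelativeRange`, `minutesSinceLastUpdate`) are hypothetical, and expressing ranges in minutes is an assumption; the spec does not list the available range options.

```typescript
// Hypothetical absolute window derived from a relative time range.
interface TimeWindow {
  start: Date;
  end: Date;
}

// Resolves "last N minutes" against the present time.
function resolveRelativeRange(durationMinutes: number, now: Date): TimeWindow {
  if (durationMinutes <= 0) {
    throw new RangeError("durationMinutes must be positive");
  }
  const start = new Date(now.getTime() - durationMinutes * 60000);
  return { start, end: now };
}

// Whole minutes elapsed since metrics were last retrieved, never negative.
function minutesSinceLastUpdate(lastRetrieved: Date, now: Date): number {
  const elapsedMs = now.getTime() - lastRetrieved.getTime();
  return Math.max(0, Math.floor(elapsedMs / 60000));
}
```

Because metrics are only retrieved on page load and on Refresh, the window would be resolved once per retrieval rather than continuously, while the elapsed-minutes value can be recomputed for display without fetching new data.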
Kafka instance metrics — charts
Used disk space
This is one of the charts associated with a limit on the service. The chart shows both the used disk space and the current limit.
A toggle group lets the user show metrics per broker. When Per broker is selected, the limit should also reflect the per-broker limit rather than the total limit.
When the column is filtered to show metrics for a specific broker, the toggle group is disabled and “Per broker” is selected automatically.
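The interaction between the broker filter and the toggle group can be sketched as a pure function. The names `BrokerFilter`, `ToggleState`, `usedDiskToggleState`, and `displayedDiskLimit` are hypothetical. The limit calculation also assumes the total limit is split evenly across brokers, which this spec does not confirm.

```typescript
// Hypothetical representation of the "Filter by broker" selection.
type BrokerFilter = { kind: "all" } | { kind: "broker"; brokerId: number };

interface ToggleState {
  perBrokerSelected: boolean;
  disabled: boolean;
}

// When a specific broker is selected, the toggle is disabled and
// "Per broker" is selected automatically; otherwise the user's choice applies.
function usedDiskToggleState(
  filter: BrokerFilter,
  userSelectedPerBroker: boolean,
): ToggleState {
  if (filter.kind === "broker") {
    return { perBrokerSelected: true, disabled: true };
  }
  return { perBrokerSelected: userSelectedPerBroker, disabled: false };
}

// ASSUMPTION: the per-broker limit is the total limit divided evenly
// by the number of brokers.
function displayedDiskLimit(
  totalLimitBytes: number,
  brokerCount: number,
  perBroker: boolean,
): number {
  if (brokerCount <= 0) {
    throw new RangeError("brokerCount must be positive");
  }
  return perBroker ? totalLimitBytes / brokerCount : totalLimitBytes;
}
```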
Partition size
In the Kafka instance column, the Partition size chart shows the top 10 or 20 partitions for either all brokers or the selected broker.
There are a couple of use cases that this chart is meant to support:
- One partition is overloaded - This can happen when the same key field is specified for all records (a common misconfiguration) or when the number of partitions specified in a topic is 1.
- One broker is getting full - This can happen when the largest partitions are assigned to the same broker. The user can filter this column by the broker that's near capacity to identify which partitions in which topics are contributing to the increased disk space.
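Selecting the partitions to chart could look roughly like the sketch below, which filters by broker when one is selected and then keeps the largest 10 or 20 partitions. The `PartitionSize` shape and the `topPartitions` function are hypothetical and do not reflect the actual metrics API response.

```typescript
// Hypothetical shape of one partition-size data point.
interface PartitionSize {
  topic: string;
  partition: number;
  brokerId: number;
  bytes: number;
}

// Returns the largest partitions, optionally limited to one broker.
// Omitting brokerId corresponds to the "all brokers" filter.
function topPartitions(
  sizes: PartitionSize[],
  limit: 10 | 20,
  brokerId?: number,
): PartitionSize[] {
  const visible =
    brokerId === undefined
      ? sizes
      : sizes.filter((p) => p.brokerId === brokerId);
  // Copy before sorting so the caller's array is not mutated.
  return [...visible].sort((a, b) => b.bytes - a.bytes).slice(0, limit);
}
```

Filtering before ranking matters for the "one broker is getting full" case: it surfaces the largest partitions on that broker even when they are not among the largest in the instance as a whole.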
Client connections
This is one of the charts associated with a limit on the service. The chart shows both the total connections and the current limit.
Connection attempt rate
This is one of the charts associated with a limit on the service. The chart shows both the number of attempted client connections per second and the current limit.
Topic metrics — toolbar
The column that displays metrics for the Kafka topics also includes a toolbar that affects all charts in this column.
The toolbar in the Topic metrics column provides the following elements:
- Filter by topic - This menu lets the user filter all charts to show data for either all topics or a specific topic.
The other elements in the toolbar have the same behavior as the elements in the Kafka instance toolbar.
- Relative time range - Allows the user to see metrics for a specified time range. Time range options are relative to the present time.
- Refresh - Metrics are not refreshed automatically. They are retrieved on page load and again when the user clicks Refresh.
- Last update - Since metrics are refreshed on demand, these contents communicate how long ago the metrics were last retrieved.
Topic filter menu
Topic metrics — charts
Bytes incoming and outgoing
While ingress and egress rates are limits associated with a Kafka instance, this chart does not show a rate, and therefore the limits are not visible.
Incoming message rate
Partition size
By default, the topic charts are filtered to show metrics for all topics. However, the Partition size chart in this column only shows metrics per topic. Therefore, the initial view of this chart is an empty state that prompts the user to filter the column by a specific topic.
To see top partitions across the Kafka instance, the user would use the Partition size chart in the first column.
Otherwise, the behavior and contents in the partition size chart are similar to that of the first column.
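The empty-state behavior for this chart can be sketched as a simple view selection based on the topic filter. The names `TopicPartitionView` and `topicPartitionSizeView` are hypothetical, and the prompt wording in the code is a placeholder, not the text specified for the UI.

```typescript
// Hypothetical view states for the topic-column Partition size chart.
type TopicPartitionView =
  | { kind: "empty"; prompt: string }
  | { kind: "chart"; topic: string };

// null represents the default "all topics" filter.
function topicPartitionSizeView(
  selectedTopic: string | null,
): TopicPartitionView {
  if (selectedTopic === null) {
    // Placeholder wording; the real empty-state copy is not given in this spec.
    return { kind: "empty", prompt: "Filter by a topic to see partition sizes." };
  }
  return { kind: "chart", topic: selectedTopic };
}
```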
Metric popover contents
The following descriptions display in the popovers for the metrics on this page.
Metric label | Description text |
Topics | Topics are event logs in a Kafka instance. This metric shows the total number of topics in the Kafka instance. |
Topic partitions | Topic partitions are divisions in a topic that are used for data sharding and replication. This metric shows the total number of partitions in the Kafka instance. |
Consumer groups | Consumer groups are sets of consumers that share a data stream generated by producers. This metric shows the total number of consumer groups in the Kafka instance. |
Kafka instance metrics | |
Used disk space | Used disk space is the amount of disk space used by the Kafka broker in the instance. This metric enables you to assess available disk space relative to the limit. To reduce used disk space, you can adjust topic retention time or other topic properties as needed. |
Partition size | Partition size is the log size of a partition in a topic. This metric enables you to assess the amount of data used by partitions in your Kafka instance as a whole, or in a selected broker. To reduce partition sizes in a topic, you can decrease the retention time or the retention size, or modify the cleanup policy. You can also increase the number of partitions for the topic. |
Client connections | Client connections are the total connections across all clients over time. Clients can make multiple connections to multiple brokers in the Kafka instance. This metric enables you to review client activity in the instance and to assess available connections relative to the limit. |
Connection attempt rate | Connection attempt rate is the number of attempted client connections per second for the Kafka instance over time. This metric enables you to review client activity spikes in the instance and to assess available connections per second relative to the limit. |
Topic metrics | |
Bytes incoming and outgoing | Bytes incoming and outgoing are the total bytes for all topics or total bytes for a selected topic in the Kafka instance. This metric enables you to assess data transfer in and out of your Kafka instance. To modify incoming and outgoing bytes, you can adjust topic message size or other topic properties as needed. |
Incoming message rate | Incoming message rate is the number of messages per second received by one or more topics in the Kafka instance over time. This metric enables you to verify that messages are produced to topics and that producers are functioning properly. To modify the incoming messages per topic, you can adjust the corresponding producer as needed. |
Partition size | Partition size is the log size of a partition in a topic. This metric enables you to assess the amount of data used by partitions in the selected topic. To reduce partition sizes in a topic, you can decrease the retention time or the retention size, or modify the cleanup policy. You can also increase the number of partitions for the topic. |