Service Instance Metrics
This page describes the metrics used to monitor the state of an a9s MongoDB Service Instance. For further information on how to monitor an a9s Service Instance, see the Set up Monitoring section of the Application Developer's documentation.
Metric Groups
Group Name | Description |
memory | Memory Information |
asserts | Error Handling Information |
network | Network Information |
replicas | Replication Information |
replica sets | Node-specific replication information |
commands | Commands information |
errors | Errors information |
connections | Connection information |
locks | Concurrency Information |
opcounters | Database operations counters |
oplatencies | Database operation latencies |
The metric groups listed above are available both in a9s MongoDB 5.0 and a9s MongoDB 7.0. However, small differences exist between the two versions. We highlight those differences when applicable.
Metric Group Patterns
Each metric group has a set of patterns that describe the metrics contained in each group.
A document that reports on the system architecture of the mongod and current memory use.
A document that reports on the number of assertions raised since the MongoDB process started. While assert errors are typically uncommon, if there are non-zero values for the asserts, you should examine the log file for more information. In many cases, these errors are trivial, but are worth investigating.
A document that reports data on MongoDB's network use. These statistics measure ingress connections only, specifically the traffic seen by the mongod or mongos over network connections initiated by clients or other mongod or mongos instances. Traffic from network connections initiated by this instance (i.e. egress connections) is not measured in these statistics.
In addition, in a9s MongoDB 7.0 we have the following network metrics from the metrics
A document that reports on the replica set configuration.
For the *.*.mongodb.*.*.*.*.metrics.repl.stateTransition.lastStateTransition metric we use the following encoding:
MongoDB Value | a9s Value |
stepUp | 1 |
stepDown | 2 |
rollback | 3 |
"" | 0 |
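When reading this metric from a dashboard or script, it can help to translate the numeric value back into the MongoDB value. The following is a minimal sketch of the encoding table above as a Python lookup. The names `STATE_TRANSITION_ENCODING` and `decode_last_state_transition` are hypothetical and not part of a9s MongoDB, and raising an error for values outside the table is an assumption.

```python
# Hypothetical helper: the mapping mirrors the encoding table above.
STATE_TRANSITION_ENCODING = {
    "stepUp": 1,
    "stepDown": 2,
    "rollback": 3,
    "": 0,
}

# Reverse lookup, used to decode a reported metric value.
STATE_TRANSITION_DECODING = {
    code: name for name, code in STATE_TRANSITION_ENCODING.items()
}


def decode_last_state_transition(value: float) -> str:
    """Translate a reported metric value back into the MongoDB value.

    Raises ValueError for values that are not in the table (assumption).
    """
    code = int(value)
    if code != value or code not in STATE_TRANSITION_DECODING:
        raise ValueError(f"unknown lastStateTransition value: {value!r}")
    return STATE_TRANSITION_DECODING[code]
```

For instance, a reported value of 2 decodes to `stepDown`, and 0 decodes to the empty string, meaning no state transition has been recorded.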
Replica Sets
In addition to these global replication metrics, we also provide metrics for each node in a cluster. The format for these metrics is slightly different as it accounts for the names of each specific node and the names of the replica sets, as assigned by MongoDB. Note that these metrics are made available only when the current host is part of a replica set.
The overall format is:
<service_guid>.mongodb.<host>.<node>.<space>.<domain>.replSet.<replica_set>.<target_node>.<metric> <metric_value> <metric_timestamp>
As an example, metrics could look like this: 1.0 1709920512
The <node> part indicates the node on which the metrics are collected. The <target_node> part indicates the node for which the metric is collected. This distinction is important when interpreting the metric names, because on a specific node we can only obtain certain metrics with respect to other nodes.
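To make the distinction between <node> and <target_node> concrete, the following is a minimal sketch of a parser for the format shown above. It assumes that no individual component contains a dot and that the timestamp is an integer. The names `ReplSetMetric` and `parse_replset_metric` are hypothetical, and every component in the sample line is an invented placeholder rather than real a9s output.

```python
from typing import NamedTuple


class ReplSetMetric(NamedTuple):
    service_guid: str
    host: str
    node: str          # node on which the metric was collected
    space: str
    domain: str
    replica_set: str
    target_node: str   # node the metric refers to
    metric: str
    value: float
    timestamp: int


def parse_replset_metric(line: str) -> ReplSetMetric:
    """Parse one '<name> <metric_value> <metric_timestamp>' line.

    Assumes the name has exactly ten dot-separated components, as in the
    format above, with 'mongodb' second and 'replSet' seventh.
    """
    name, raw_value, raw_timestamp = line.split()
    parts = name.split(".")
    if len(parts) != 10 or parts[1] != "mongodb" or parts[6] != "replSet":
        raise ValueError(f"unexpected metric name: {name!r}")
    guid, _, host, node, space, domain, _, replica_set, target, metric = parts
    return ReplSetMetric(
        guid, host, node, space, domain, replica_set, target, metric,
        float(raw_value), int(raw_timestamp),
    )


# Invented placeholder line, for illustration only.
SAMPLE_LINE = "guid1.mongodb.host1.node0.space1.domain1.replSet.rs0.node1.lag 1.0 1709920512"
```

Parsing `SAMPLE_LINE` yields `node == "node0"` (where the value was collected) and `target_node == "node1"` (the node the value describes). Metrics that carry no <target_node> component are not handled by this sketch.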
The metrics listed below are obtained in each node, for each of the nodes in a replica set.
The first metrics in the list above are not followed by any <target_node> value, because these provide information about the current node only. Specifically, the primary and secondary metrics take boolean values to indicate whether the current node has a Primary or Secondary status in the replica set. In case the node has neither status, the myState metric is informative. The values of this latter metric correspond to MongoDB's Replica States.
Below, we list the collected metrics. Note that some of them behave differently depending on the role of the node (i.e., Primary or Secondary) in which the collection takes place.
The lag metric denotes the replication lag between the current node and the <target_node>. This metric is reported for all other nodes in the replica set if the current node is a Primary. However, if the current node is a Secondary, it will report this metric only for itself in relation to the other nodes in the set. Note that some caveats may apply for the lag metric. In particular, if there is a partition in the cluster, the nodes might not be able to communicate with each other, to the point that they cannot infer the replication lag.
From a Primary's perspective, the lag will be reported as -1 for each node that is not reachable/healthy.
From a Secondary's perspective, the lag that is reported will be -1 only if this node is isolated, i.e. none of the other nodes are reachable/healthy. If any of the other nodes is reachable, the lag will be calculated based on the state of the current node and the other healthy node(s).
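Because -1 is a sentinel rather than a real measurement, it should be filtered out before lag values are averaged or compared against thresholds. The following is a minimal sketch of that handling. The function names and the shape of the input dictionary (target node name mapped to reported lag) are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Sentinel described above: the lag could not be inferred.
UNREACHABLE_LAG = -1


def interpret_lag(lag: float) -> Optional[float]:
    """Return the lag, or None when the sentinel -1 was reported."""
    return None if lag == UNREACHABLE_LAG else lag


def unreachable_targets(lags_by_target: Dict[str, float]) -> List[str]:
    """Return the target nodes whose lag is reported as -1, sorted by name."""
    return sorted(
        target for target, lag in lags_by_target.items()
        if lag == UNREACHABLE_LAG
    )
```

On a Primary, a non-empty result from `unreachable_targets` names the nodes that are not reachable/healthy. On a Secondary, a reported lag of -1 indicates that the node itself is isolated.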
Similarly, the seconds_since_last_contact metric is reported for all other nodes if the current node is a Primary. This is also the case if the current node is a Secondary. In cases of partitions, the value of this metric will keep increasing.
Note that our definition of healthy coincides with a state value of 1 (primary) or 2 (secondary). A node in any other state, such as recovering or starting up, is considered non-healthy. For more information, see Replica States.
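The definition of healthy given above can be written as a one-line check. This is a minimal sketch; the names `HEALTHY_STATES` and `is_healthy` are hypothetical.

```python
# State values treated as healthy, per the definition above.
HEALTHY_STATES = {1: "primary", 2: "secondary"}


def is_healthy(state: float) -> bool:
    """Return True when the reported state is 1 (primary) or 2 (secondary)."""
    return state in HEALTHY_STATES
```

Any other state value, including those reported while a node is recovering or starting up, makes `is_healthy` return False.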
A document that reports on the use of database commands. The fields in metrics.commands are the names of database commands. For each command, the serverStatus reports the total number of executions and the number of failed executions.
*.*.mongodb.*.*.*.*.metrics.commands.create.validator.jsonSchema 0
*.*.mongodb.*.*.*.*.metrics.commands.collMod.validator.jsonSchema 0
*.*.mongodb.*.*.*.*.metrics.commands.aggregate.allowDiskUseTrue 0
Note that the list above is based on the metrics offered by a9s MongoDB 5.0. In a9s MongoDB 7.0 the following metrics are not offered anymore:
A document that reports on getLastError
A document that reports on the status of the connections. Use these values to assess the current load and capacity requirements of the server.
In a9s MongoDB 7.0, we also provide the following:
Operation Counters
A document that reports on database operations by type since the mongod instance last started. These numbers will grow over time until the next restart. Analyze these values over time to track database utilization.
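Since these counters only grow until the next restart, utilization is easier to read as a rate between two samples than as an absolute value. The following is a minimal sketch of that calculation. It assumes each sample is a (timestamp, value) pair with the timestamp in Unix seconds, and it treats a decreasing counter as a sign that a restart happened between the samples. The name `ops_per_second` is hypothetical.

```python
from typing import Optional, Tuple

# (Unix timestamp in seconds, counter value); the unit is an assumption.
Sample = Tuple[int, float]


def ops_per_second(previous: Sample, current: Sample) -> Optional[float]:
    """Average operations per second between two counter samples.

    Returns None when no rate can be derived: the samples are not in
    chronological order, or the counter decreased (restart in between).
    """
    prev_ts, prev_value = previous
    cur_ts, cur_value = current
    elapsed = cur_ts - prev_ts
    if elapsed <= 0:
        return None
    if cur_value < prev_value:
        # The counter was reset, so the difference is meaningless.
        return None
    return (cur_value - prev_value) / elapsed
```

For example, a counter that moves from 50 to 350 over 60 seconds corresponds to an average of 5 operations per second.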
Operation Latencies
A document that reports on the latencies of certain database operations. The operation types listed here should be interpreted alongside the ones in the previous section.
The queryableEncryptionLatencyMicros metric is only available in a9s MongoDB 7.0.
A document that reports on concurrency operations such as the acquiring of locks. These metrics can be useful to see the modes of operations and whether there is high contention for certain operations. These metrics are only available in a9s MongoDB 7.0.
The lock modes reported by the metrics above are described below:
Lock Mode | Lock Type | Description |
w | Intent Exclusive Lock | A write lock on a coarse-grained resource (e.g. a collection) which enables exclusive write locks to be used for that resource. |
W | Exclusive Lock | A write lock on a fine-grained resource (e.g. a record) which allows only one transaction to modify the resource and none to read. |
r | Intent Shared Lock | A read lock on a coarse-grained resource (e.g. a collection) which enables shared read locks to be used for that resource. |
R | Shared Lock | A read lock on a fine-grained resource (e.g. a record) which allows transactions to read the resource, but not write to it. |
As you might have noticed, the metrics document contains subdocuments that provide numerous metrics. However, it also provides some simpler metrics that reflect the current use and state of a running mongod instance.
In a9s MongoDB 7.0, the following are also available:
Note that as part of the metrics document, there are many subdocuments that provide numerous other metrics. We expand on some of these documents in the following sections.
Uncategorized collection of metrics.
In a9s MongoDB 7.0, we also provide *.*.mongodb.*.*.*.*.extra_info.threads