Checking OKD Cluster Operation

In this article:

Viewing System Resource Usage by Containers

Checking Pod Operation on Each Cluster Node and Auditing Their Logs

Checking Access to the Data Source

Checking Cluster Events

To perform diagnostics and eliminate system errors, open the OKD application.

The application provides the following capabilities:

Viewing System Resource Usage by Containers

To view CPU and RAM usage by containers, open the Monitoring > Dashboards subsection and select the Kubernetes/Compute Resources/Namespace (Pods) dashboard in the Dashboards drop-down list:

By default, the chart displays system resource usage statistics for all pods. To view statistics for a specific pod, hover the cursor over a point on the chart. A tooltip is displayed showing the pod name and the corresponding resource usage value.

Chart examples:
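As an alternative to the dashboards, current CPU and RAM usage can be checked from the command line. The following is a minimal sketch, assuming the cluster metrics API (metrics-server) is available and the namespace placeholder is replaced with the actual value:

kubectl top pods -n <mobile platform server namespace> --containers

The --containers option shows usage per container instead of aggregating it per pod.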

Checking Pod Operation on Each Cluster Node and Auditing Their Logs

To check pod operation on each cluster node:

  1. Open the Compute > Nodes subsection:

The table displays a list of cluster nodes with system characteristics.

  2. Click the cluster node name in the Name column to view detailed data about a specific cluster node.

  3. Go to the Pods tab:

The tab displays a table with a list of pods.

  4. Analyze the operation of pods by the parameters (columns):

Pod operation is considered incorrect in the following cases:

  5. Restart the pod in case of incorrect operation (see the example commands after this list).
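The same checks can be performed from the command line. The following is a minimal sketch with hypothetical placeholders for the node, pod, and namespace names:

kubectl get pods -n <mobile platform server namespace> -o wide --field-selector spec.nodeName=<node name>

kubectl delete pod <pod name> -n <mobile platform server namespace>

The first command lists the pods scheduled on the specified node together with their status and restart count. The second command deletes a pod so that its controller (Deployment, StatefulSet, and so on) recreates it, which is equivalent to a restart.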

If incorrect behavior persists after the pod restart, audit pod logs.

To audit pod logs on the Pods tab:

  1. Click the pod name in the Name column to view detailed data about a specific pod.

TIP. It is recommended to check the following pods: fmp-api, fmp-auth, fmp-dashboard, fmp-rpc, fmp-web.

  2. Go to the Logs tab:

  3. Sort pod logs by execution time using the command:

kubectl logs <pod name> -n <mobile platform server namespace> | sort -k 23

After that, the list of log entries is displayed sorted by execution time, from the fastest to the slowest.
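If the log output is too large to review, it can be limited before sorting. The following is a minimal sketch with hypothetical placeholders, using standard kubectl logs options:

kubectl logs <pod name> -n <mobile platform server namespace> --since=1h --tail=200

The --since option restricts the output to recent entries, and --tail limits the number of returned lines.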

  4. Analyze pod logs. An example of log entries that indicate correct operation of the fmp-api pod containers (the same applies to fmp-auth, fmp-dashboard, fmp-rpc, fmp-web):

{"request_id"="-","ip": "10.129.2.28", "timestamp": "[08/Sep/2022:10:54:06 +0000]", "http_method: "GET", "URN": "/metrics", "query_string": "", "response_status": 200, "response_length": 1948, "response_time": 0.045690}

Explanation of the example:

{"request_id"="<request identifier>", "ip": "<Client IP address>", "timestamp": "<time of sending response to client>", "http_method: "<HTTP method of request>", "URN": "<URN>", "query_string": "values of GET parameters", "response_status": <response status>, "response_length": <response length>, "response_time": <response processing time>}

Log entries that differ from those in the example may indicate incorrect pod operation. If analyzing the pod logs does not reveal the cause of the incorrect operation, check access to the data source.
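To quickly find entries with error responses in the log format shown above, the output can be filtered by HTTP status code. The following is a minimal sketch with hypothetical placeholders:

kubectl logs <pod name> -n <mobile platform server namespace> | grep -E '"response_status": (4|5)[0-9]{2}'

The command prints only entries whose response_status field contains a 4xx or 5xx code.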

If the mobile platform server returns a 502 error, analyze the logs of the fmp-nginx pod, which characterize the operation of the nginx proxy service.
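For example, the fmp-nginx logs can be filtered by the 502 status code. The following is a minimal sketch with hypothetical placeholders, assuming the pod writes standard nginx access log entries that include the response status code:

kubectl logs <fmp-nginx pod name> -n <mobile platform server namespace> | grep ' 502 '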

Checking Access to the Data Source

To check access to the data source, send a test request to the mobile platform server:

If the test request to the mobile platform server is not executed, send a request to the data source:

NOTE. If the request is executed successfully only from inside the container, the proxy services (nginx, the cluster Ingress, and so on) may be working incorrectly. To detect such a service, send a request to each proxy level. If the request is not executed from inside the container either, check the timeouts.
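A request from inside a container can be sent with kubectl exec. The following is a minimal sketch with hypothetical placeholders, assuming the curl utility is available in the container image:

kubectl exec <pod name> -n <mobile platform server namespace> -- curl -sv <data source URL>

The -v option prints connection details, which helps to distinguish network-level failures from slow responses.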

The timeouts specified on the proxy server and in the framework should correspond to the actual request execution time. Check the timeouts set on the proxy server in front of the cluster, on the cluster (Ingress Controller), and in the data source. If the timeouts are specified correctly, check cluster events.

Checking Cluster Events

To check and analyze cluster events, execute one of the following operations:

kubectl get events -n <mobile platform server namespace>

After executing the operation, information about cluster events is displayed. For details about cluster events, see the OKD documentation.
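Events can also be sorted by time to make the most recent ones easier to find. The following is a minimal sketch with a hypothetical placeholder for the namespace:

kubectl get events -n <mobile platform server namespace> --sort-by='.lastTimestamp'

The most recent events are printed at the end of the output.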

See also:

System Error Monitoring