PART 1 — Red Hat OpenStack Platform (OSP) Service Telemetry Framework (STF) with OpenShift Container Platform (OCP4)

Kerem Çeliker
Mar 13, 2022

--

INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

In RHOSP 16, the traditional telemetry services are disabled by default and the Service Telemetry Framework is the recommended replacement. The monitoring system is one of the central pieces of operating a Red Hat OpenStack environment. You can use the centralized information in your monitoring system as the source for alerts and visualization, or as the source of truth for orchestration frameworks.

Service Telemetry Framework (STF) is an application that runs on Red Hat OpenShift Container Platform (OCP). It provides metrics and events data collection from OpenStack infrastructure, fast and reliable transport of that data, and built-in data storage and alerting capabilities.

What is the Service Telemetry Framework?

The Service Telemetry Framework runs on Red Hat OpenShift Container Platform (OCP). This OCP cluster must be independent of the RHOSP environment that it monitors.

This is the core of the RHOSP telemetry architecture: operators want to manage multiple cloud environments centrally through the Service Telemetry Framework, and OCP can run a scalable monitoring environment for that purpose.

Service Telemetry Framework architecture

The RHOSP and OCP environments, and the components deployed in each, are listed below. Red Hat supports the data collection components of the Service Telemetry Framework (collectd and Ceilometer) and the transport components (AMQ Interconnect and Smart Gateway). Prometheus, ElasticSearch, and the visualization component Grafana are community-supported. We use both collectd and Ceilometer for data collection because collectd alone cannot collect OpenStack metrics.

  • Red Hat OpenShift Container Platform 4.7
      • AMQ Certificate Manager Operator
      • Elastic Cloud on Kubernetes (ECK) Operator
      • Service Telemetry Operator (Service Telemetry Framework 1.3 or newer)
      • Grafana Operator
  • Red Hat OpenStack Platform 16.x
      • AMQ Interconnect
      • Ceilometer
      • collectd

For a clear, entry-level overview, be sure to read the reference link below.

Link: https://red.ht/3w4iPc2

To make STF easier to understand, this article walks through a short demo installation at a minimal, simple level. It should give you a better grasp of this technology, which is often used in the enterprise banking, e-commerce, telco, and cloud sectors.

We will deploy two core platforms:

  • Red Hat OpenStack 16.x
  • Red Hat OpenShift 4.7 or newer

STF is installed as an OCP application. It uses the following components:

  • collectd to collect metrics
  • Prometheus as time-series data storage
  • ElasticSearch as events data storage
  • An AMQP 1.x compatible messaging bus to shuttle the metrics to STF for storage in Prometheus
  • Smart Gateway to pick up metrics and events from the AMQP 1.x bus, deliver events to ElasticSearch, and provide metrics to Prometheus

This setup has three instances:

  • 1 x Bastion/Jump Host
  • 1 x RHOSP standalone 16.x (All-In-One)
  • 1 x CodeReady Containers (CRC) instance where the STF workload will be installed. (If you have OCP in your lab, follow the same steps there.)

We will be using the 192.168.47.0 private management network. (You can of course use a different IP range.)

In this series we will stress the system so that CPU and memory alerts are triggered. To make this easier, make sure the “stress-ng” command is installed on the RHOSP 16 host.

I will briefly demonstrate the stress tests in Part 3, with examples.

Installing the “stress-ng” tool on the RHOSP host

  1. Open a terminal and connect to the OSP all-in-one host.
  2. Enable the EPEL repository, then install stress-ng:

$ sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
$ sudo dnf install -y stress-ng
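
Once stress-ng is installed, a quick way to confirm it works is to put together a short combined CPU and memory load. The sketch below only builds and prints the command line so that you can review it before running it on the host. The helper name stress_cmd and the default worker counts, memory size, and duration are illustrative assumptions, not values from the STF documentation.

```shell
#!/usr/bin/env bash
# Sketch: build a stress-ng command line for a combined CPU and memory load.
# stress_cmd is a hypothetical helper; all defaults are illustrative only.

stress_cmd() {
  cpu_workers="${1:-4}"   # number of CPU stressor workers
  vm_workers="${2:-2}"    # number of memory (vm) stressor workers
  vm_bytes="${3:-1G}"     # memory allocated per vm worker
  duration="${4:-300s}"   # how long stress-ng runs before it stops
  echo "stress-ng --cpu ${cpu_workers} --vm ${vm_workers} --vm-bytes ${vm_bytes} --timeout ${duration} --metrics-brief"
}

# Print the command with the default values so it can be reviewed first.
stress_cmd
```

To actually generate load, copy the printed command into the terminal on the RHOSP host and adjust the values to the size of your machine.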

The underlying system is an OSP 16 deployment that collects data with collectd and Ceilometer. That data is transported over the AMQ Interconnect message bus and stored in the storage backend, which consists of Prometheus and ElasticSearch.

The following table describes the application of the client and server components:

1. Infrastructure to be deployed, step by step:

  • Deploy Service Telemetry Framework in the OCP environment
  • Create a Service Telemetry object in OCP
  • Configure RHOSP to use the Service Telemetry Framework
  • Verify operation

2. Deploy Service Telemetry Framework in the OCP environment

  • First, create the service-telemetry namespace for STF.

$ oc new-project service-telemetry

  • Create an OperatorGroup.

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: service-telemetry-operator-group
  namespace: service-telemetry
spec:
  targetNamespaces:
  - service-telemetry
EOF

  • Enable OperatorHub.io Community Catalog Source to use community operators such as ElasticSearch and Grafana.

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-operators
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/operator-framework/upstream-community-operators:latest
  displayName: OperatorHub.io Operators
  publisher: OperatorHub.io
EOF

  • Enable Red Hat STF Operators Catalog Source to take advantage of the Service Telemetry Framework.

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: redhat-operators-stf
  namespace: openshift-marketplace
spec:
  displayName: Red Hat STF Operators
  image: quay.io/redhat-operators-stf/stf-catalog:v4.7
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 30m
EOF
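
Before creating subscriptions, it can help to confirm that both catalog sources are actually being served, because a subscription against an unreachable catalog simply stays pending. The following is a minimal sketch: the helper names catalog_state and catalogs_ready are hypothetical, and it assumes the CatalogSource status exposes the connection state under status.connectionState.lastObservedState, as Operator Lifecycle Manager does for grpc sources.

```shell
#!/usr/bin/env bash
# Sketch: check that CatalogSources in openshift-marketplace report READY.
# catalog_state and catalogs_ready are hypothetical helper names.

catalog_state() {
  # Print the last observed gRPC connection state of one CatalogSource
  oc get catalogsource "$1" --namespace openshift-marketplace \
    -o jsonpath='{.status.connectionState.lastObservedState}'
}

catalogs_ready() {
  # Report the state of every named catalog; stop at the first one not READY
  for name in "$@"; do
    state=$(catalog_state "$name")
    echo "$name: ${state:-unknown}"
    [ "$state" = "READY" ] || return 1
  done
}
```

For the two sources created above, you would call catalogs_ready operatorhubio-operators redhat-operators-stf and continue only when it returns successfully.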

  • Deploy the AMQ Certificate Manager Operator.

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq7-cert-manager-operator
  namespace: openshift-operators
spec:
  channel: alpha
  installPlanApproval: Automatic
  name: amq7-cert-manager-operator
  source: redhat-operators-stf
  sourceNamespace: openshift-marketplace
  targetNamespaces: global
EOF

  • Check the ClusterServiceVersion and make sure its phase is “Succeeded”.

$ oc get --namespace openshift-operators csv

  • Deploy the Elastic Cloud on Kubernetes (ECK) Operator.
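
Operator installation takes a little while, so the phase may not be “Succeeded” the first time you look. As a rough sketch, the check can be wrapped in a polling loop. The function name wait_for_csv, the number of attempts, and the delay are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: poll until every ClusterServiceVersion in a namespace reports the
# phase "Succeeded". wait_for_csv and its defaults are hypothetical.

wait_for_csv() {
  ns="$1"               # namespace to inspect
  attempts="${2:-30}"   # number of polls before giving up
  delay="${3:-10}"      # seconds to sleep between polls
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # One phase per line, for example "Succeeded" or "Installing"
    phases=$(oc get csv --namespace "$ns" \
      -o jsonpath='{range .items[*]}{.status.phase}{"\n"}{end}')
    # Done when at least one CSV exists and no line differs from "Succeeded"
    if [ -n "$phases" ] && ! printf '%s\n' "$phases" | grep -qv '^Succeeded$'; then
      echo "all CSVs in $ns succeeded"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for CSVs in $ns" >&2
  return 1
}
```

Calling wait_for_csv openshift-operators blocks until the operator is ready or the attempts run out. The same helper works for any other namespace that contains operators.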

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elastic-cloud-eck
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: elastic-cloud-eck
  source: operatorhubio-operators
  sourceNamespace: openshift-marketplace
EOF

  • Deploy the Service Telemetry Operator.

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: service-telemetry-operator
  namespace: service-telemetry
spec:
  channel: stable-1.3
  installPlanApproval: Automatic
  name: service-telemetry-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF

  • The official documentation does this in a later step, but we deploy the Grafana Operator now so that the dashboards are available.

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: grafana-operator
  namespace: service-telemetry
spec:
  channel: alpha
  installPlanApproval: Automatic
  name: grafana-operator
  source: operatorhubio-operators
  sourceNamespace: openshift-marketplace
EOF

  • Make sure all required components are successfully deployed.

$ oc get csv --namespace service-telemetry

  • Open the OpenShift console and check that the operators are listed under “Installed Operators” in the “service-telemetry” project.

Create a Service Telemetry object in OCP

The main parameters of the ServiceTelemetry object are:

  • Alerting

Creates alert rules in Prometheus and sends alerts to Alertmanager.

  • Backends

Enables and configures the storage for metrics and events. Currently the metrics backend is Prometheus and the events backend is ElasticSearch.

  • Clouds

Defines each cloud that connects to STF and specifies its metric and event collectors.

  • Graphing

Enables Grafana to visualize the metrics collected by collectd.

  • High-Availability

Sets the redundancy of STF components. Currently STF is not a fully fault-tolerant system, and recovery of metrics and events is not guaranteed.

  • Transport

Enables and configures the STF message bus. Currently only AMQ Interconnect is supported.

Create a Service Telemetry object by setting enabled: true for the service you want to use, such as alerting or graphing. Specify the clouds parameter for OpenStack metrics and event collection.

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
spec:
  alerting:
    enabled: true
    alertmanager:
      storage:
        strategy: persistent
        persistent:
          storageSelector: {}
          pvcStorageRequest: 30G
  backends:
    metrics:
      prometheus:
        enabled: true
        scrapeInterval: 10s
        storage:
          strategy: persistent
          retention: 24h
          persistent:
            storageSelector: {}
            pvcStorageRequest: 30G
    events:
      elasticsearch:
        enabled: true
        storage:
          strategy: persistent
          persistent:
            pvcStorageRequest: 30Gi
  graphing:
    enabled: true
    grafana:
      ingressEnabled: false
      adminPassword: secret
      adminUser: root
      disableSignoutMenu: false
  transport:
    qdr:
      enabled: true
      web:
        enabled: false
  highAvailability:
    enabled: false
  clouds:
    - name: cloud1
      metrics:
        collectors:
          - collectorType: collectd
            subscriptionAddress: collectd/telemetry
          - collectorType: ceilometer
            subscriptionAddress: anycast/ceilometer/metering.sample
      events:
        collectors:
          - collectorType: collectd
            subscriptionAddress: collectd/notify
          - collectorType: ceilometer
            subscriptionAddress: anycast/ceilometer/event.sample
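
The manifest above still has to be submitted to the cluster. One cautious way to do that, sketched below, is to save it to a file and run a client-side dry run before the real apply, so that indentation mistakes are reported before anything changes. The helper name apply_manifest and the file name service-telemetry.yaml are assumptions for illustration. The namespace defaults to service-telemetry because the manifest itself does not set one.

```shell
#!/usr/bin/env bash
# Sketch: validate a manifest with a client-side dry run, then apply it.
# apply_manifest is a hypothetical helper; the file name is up to you.

apply_manifest() {
  file="$1"                       # path to the saved manifest
  ns="${2:-service-telemetry}"    # namespace in which to create the object
  # The dry run parses the file without persisting anything on the cluster
  oc apply --dry-run=client --namespace "$ns" -f "$file" || return 1
  # Only reached when the dry run succeeded
  oc apply --namespace "$ns" -f "$file"
}
```

With the manifest saved as service-telemetry.yaml, calling apply_manifest service-telemetry.yaml creates the ServiceTelemetry object in the service-telemetry project.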

Once the ServiceTelemetry object is created, the STF pods start according to these settings.

$ oc get pods
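
With several operators and backends starting at once, the full pod list can be long. As a small sketch, the listing can be narrowed to the pods that are not yet in the Running or Succeeded phase, which is usually what you want to watch while STF starts up. The helper name pending_pods is hypothetical. Note that the Running phase does not by itself mean that every container is ready.

```shell
#!/usr/bin/env bash
# Sketch: list only pods whose phase is neither Running nor Succeeded.
# pending_pods is a hypothetical helper name.

pending_pods() {
  ns="${1:-service-telemetry}"   # namespace to inspect
  oc get pods --namespace "$ns" --no-headers \
    --field-selector=status.phase!=Running,status.phase!=Succeeded \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase
}
```

An empty result from pending_pods means that every pod in the project has at least reached the Running phase or has completed.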

You can check this not only from the CLI but also under Developer => Topology in the OpenShift console.

Note:

You are responsible for how you apply this article in live and test environments. This article only describes STF and provides the most basic installation steps and tests. Some of the content was created with help from the sources listed below.

References:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html/service_telemetry_framework_release_notes_1.3/index

https://docs.openshift.com/container-platform/4.7/monitoring/configuring-the-monitoring-stack.html

https://github.com/infrawatch/service-telemetry-operator/blob/master/deploy/alerts/alerts.yaml

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2

https://developers.redhat.com/products/codeready-containers/overview

https://youtu.be/xh-GEJMpaQk

https://www.youtube.com/watch?v=yw0MJQh643s

--

--

Kerem Çeliker

Red Hat Accelerator Awarded 2021/2024 | IBM Champion 2021/2023 | HashiCorp Ambassador | VMware vExpertPRO | Amazon AWS Cloud SAA | VxRail & Nutanix SE Champ.