Building a Highly Available Prometheus Monitoring Cluster (Part 3)

Kubernetes Prometheus
Deploying a highly available Prometheus monitoring cluster - This article is part of a series.
Part 3: This Article

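Prometheus federation supports both hierarchical scaling and cross-service scaling. Hierarchical federation lets Prometheus scale to multiple data centers and large host fleets by arranging servers in a tree topology. In cross-service federation, different categories of metrics are collected by different Prometheus servers. In a multi-cluster Kubernetes setup, each cluster runs its own Prometheus server to collect that cluster's metrics, and federation is then used to collect, display, and alert on the monitoring data in one place.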
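As a rough illustration of the multi-cluster case, a federating server could list one Prometheus server per cluster as scrape targets. This is only a sketch: the two hostnames are hypothetical placeholders, and the `match[]` selector is borrowed from the configuration used later in this article.

```yaml
# Sketch only: the target hostnames are hypothetical, not taken from the article.
scrape_configs:
  - job_name: 'federate'
    honor_labels: true            # keep the labels already set by the source servers
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~"kubernetes.*"}' # only pull series whose job label matches
    static_configs:
      - targets:
          - 'prometheus.cluster-01.example.com:9090'   # hypothetical cluster 01 server
          - 'prometheus.cluster-02.example.com:9090'   # hypothetical cluster 02 server
```

With `honor_labels: true`, labels coming from each source server take precedence over labels the federating server would otherwise attach, so series from different clusters stay distinguishable as long as each source server labels its own data.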

Federation deployment configuration
Create the prometheus-federate data directory
# Run on both hosts, 192.168.1.51 and 192.168.1.52
$ groupadd -g 65534 nfsnobody
$ useradd -g 65534 -u 65534 -s /sbin/nologin  nfsnobody
$ mkdir -p /data/prometheus-federate
$ chown nfsnobody:nfsnobody /data/prometheus-federate/
Create the StorageClass
$ cat storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prometheus-federate-lpv
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
Create the local volumes
$ cat pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-federate-lpv-0
spec:
  capacity:
    storage: 50Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: prometheus-federate-lpv
  local:
    path: /data/prometheus-federate
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 192.168.1.51
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-federate-lpv-1
spec:
  capacity:
    storage: 50Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: prometheus-federate-lpv
  local:
    path: /data/prometheus-federate
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 192.168.1.52
Create the ConfigMap for prometheus.yml
$ cat configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-federate-config
  namespace: kube-system
data:
  prometheus.yml: |
    global:
      scrape_interval:     30s
      evaluation_interval: 30s
    scrape_configs:
      - job_name: 'federate'
        scrape_interval: 30s
        honor_labels: true
        metrics_path: '/federate'
        params:
          'match[]':
            - '{job=~"kubernetes.*"}'
            - '{job="prometheus"}'
        static_configs:
          - targets:
            - 'prometheus-0.prometheus:9090'
            - 'prometheus-1.prometheus:9090'    
Create the headless service
$ cat service-statefulset.yaml 
apiVersion: v1
kind: Service
metadata:
  name: prometheus-federate
  namespace: kube-system
spec:
  clusterIP: None  # makes the Service headless, as the StatefulSet's serviceName expects
  ports:
    - name: prometheus-federate
      port: 9091
      targetPort: 9091
  selector:
    k8s-app: prometheus-federate
Create the StatefulSet
$ cat prometheus-federate-statefulset.yaml 
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus-federate
  namespace: kube-system
  labels:
    k8s-app: prometheus-federate
    kubernetes.io/cluster-service: "true"
spec:
  serviceName: "prometheus-federate"
  podManagementPolicy: "Parallel"
  replicas: 2
  selector:
    matchLabels:
      k8s-app: prometheus-federate
  template:
    metadata:
      labels:
        k8s-app: prometheus-federate
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values:
                - prometheus-federate
            topologyKey: "kubernetes.io/hostname"
      priorityClassName: system-cluster-critical
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: prometheus-federate-configmap-reload
        image: "jimmidyson/configmap-reload:v0.1"
        imagePullPolicy: "IfNotPresent"
        args:
          - --volume-dir=/etc/config
          - --webhook-url=http://localhost:9091/-/reload
        volumeMounts:
          - name: config-volume
            mountPath: /etc/config
            readOnly: true
        resources:
          limits:
            cpu: 10m
            memory: 10Mi
          requests:
            cpu: 10m
            memory: 10Mi
      - image: prom/prometheus:v2.11.0
        imagePullPolicy: IfNotPresent
        name: prometheus
        command:
          - "/bin/prometheus"
        args:
          - "--web.listen-address=0.0.0.0:9091"
          - "--config.file=/etc/prometheus/prometheus.yml"
          - "--storage.tsdb.path=/prometheus"
          - "--storage.tsdb.retention.time=24h"
          - "--web.console.libraries=/etc/prometheus/console_libraries"
          - "--web.console.templates=/etc/prometheus/consoles"
          - "--web.enable-lifecycle"
        ports:
          - containerPort: 9091
            protocol: TCP
        volumeMounts:
          - mountPath: "/prometheus"
            name: prometheus-federate-data
          - mountPath: "/etc/prometheus"
            name: config-volume
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9091
          initialDelaySeconds: 30
          timeoutSeconds: 30
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9091
          initialDelaySeconds: 30
          timeoutSeconds: 30
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 1000m
            memory: 2500Mi
      serviceAccountName: prometheus
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-federate-config
  volumeClaimTemplates:
    - metadata:
        name: prometheus-federate-data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "prometheus-federate-lpv"
        resources:
          requests:
            storage: 20Gi
Access the Prometheus web UI

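All jobs on the Prometheus server are being scraped successfully. Querying the up metric returns the expected series, all of which carry the label cluster="01". This label can be used to tell the metrics of different clusters apart, and it is set through external_labels in the Prometheus server's configuration file.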
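For reference, a label like this is normally declared under `global.external_labels` in the `prometheus.yml` of the server being federated. The fragment below is a minimal sketch and assumes the per-cluster server uses the value "01" mentioned above; the rest of that server's configuration is not shown here.

```yaml
# Fragment of prometheus.yml on the per-cluster (scraped) server; layout assumed.
global:
  external_labels:
    cluster: "01"   # value mentioned in this article; use a different value per cluster
```

The `/federate` endpoint attaches these external labels to the series it exposes, which is why they appear on the federating server when `honor_labels: true` is set.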

prometheus-federation-targets.jpg

prometheus-federation-up.jpg
