HAProxy plugin for Collectd
Introduction
I finally found some time to write. The topics that concern me these days revolve around SRE (Site Reliability Engineering), and within this multidisciplinary field there is a surprising number of areas that do not appear directly in its definition yet make it possible.
One of its pillars is monitoring the components of the architecture that supports the applications, including the software products in use. To keep track of the day-to-day behavior and performance of each piece (its statistics), the metric collectors need to be instrumented.
I work with Collectd because it makes my life much easier when integrating agents that collect metrics: they can be set up quickly, they consume very few resources, and the project is stable.
Integrating HAProxy with Collectd
As of today, the collectd.org site lists at least 3 plugins; however, none of them worked for me. It may have been the HAProxy version, or simply that I was not able to make the adjustments needed to complete the implementation successfully.
The only viable option (also in terms of learning) was to adapt a plugin to my needs (thanks to free software), so I took the bash plugin and modified it to cover more backend metrics. Why bash? To be honest, it is the language I feel most comfortable with. I want to point out that adapting the script was the easy part; the harder part was understanding that there is a correlation between the RRDB (Graphite [Whisper] in this case) and the script. So please, dear reader, be careful with the metrics you collect (their type), the order in which you emit them, and how many there are.
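The correlation between the script and the type definition can be checked mechanically: the number of values in a PUTVAL value list has to equal the number of data sources declared for the type in types.db. The following is a minimal sketch of such a check; the function names `count_sources` and `count_values` are hypothetical helpers, and the sample uses the stock `load` type with made-up values rather than the HAProxy one.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Collectd): compare how many data
# sources a types.db line declares with how many values a PUTVAL
# value list carries.

# $1: one types.db line, e.g. "name ds1:GAUGE:0:U, ds2:COUNTER:0:U"
count_sources() {
    # Blank out the type name, then count the comma-separated definitions.
    printf '%s\n' "$1" | awk '{ $1 = ""; print split($0, parts, ",") }'
}

# $1: a PUTVAL value list, e.g. "1500000000:1:2:3" (timestamp first)
count_values() {
    # Every colon-separated field except the timestamp is a value.
    printf '%s\n' "$1" | awk -F: '{ print NF - 1 }'
}

type_line='load shortterm:GAUGE:0:5000, midterm:GAUGE:0:5000, longterm:GAUGE:0:5000'
value_list='1500000000:0.5:0.4:0.3'

if [ "$(count_sources "$type_line")" -eq "$(count_values "$value_list")" ]; then
    echo "OK: value count matches the type definition"
else
    echo "MISMATCH: value count differs from the type definition"
fi
```

The check only covers the count. The order of the values and the type of each data source (GAUGE, COUNTER, DERIVE) still have to be reviewed by hand.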
Script
The script ended up as follows.
#!/bin/sh
# Collectd exec-plugin script: reads the HAProxy statistics from the admin
# socket and emits one PUTVAL line per backend.

sock='/run/haproxy/admin.sock'
host="${COLLECTD_HOSTNAME}"
pause="${COLLECTD_INTERVAL:-10}"

while getopts "h:p:s:" c; do
  case $c in
    h) host=$OPTARG;;
    p) pause=$OPTARG;;
    s) sock=$OPTARG;;
    *) echo "Usage: $0 [-h host] [-p seconds] [-s socket]" >&2; exit 1;;
  esac
done

# Keep polling for as long as the previous command (the sleep) succeeds.
while [ $? -eq 0 ]; do
  time="$(date +%s)"
  # Query the socket selected with -s, strip the "# " prefix of the CSV
  # header, and split every row on commas. The variable names follow the
  # column order of "show stat"; "rest" absorbs any columns beyond ttime.
  echo 'show stat' | socat - "UNIX-CLIENT:$sock" | sed -e 's/^# //' |
  while IFS="," read -r pxname svname qcur qmax scur smax slim stot bin bout \
      dreq dresp ereq econ eresp wretr wredis status weight act bck chkfail \
      chkdown lastchg downtime qlimit pid iid sid throttle lbtot tracked type \
      rate rate_lim rate_max check_status check_code check_duration hrsp_1xx \
      hrsp_2xx hrsp_3xx hrsp_4xx hrsp_5xx hrsp_other hanafail req_rate \
      req_rate_max req_tot cli_abrt srv_abrt comp_in comp_out comp_byp \
      comp_rsp lastsess last_chk last_agt qtime ctime rtime ttime rest; do
    # Only the aggregated BACKEND rows are reported.
    [ "$svname" = 'BACKEND' ] || continue
    # The order of the values must match the haproxy_backend entry in types.db.
    values="${stot:-0}:${econ:-0}:${eresp:-0}:${hrsp_2xx:-0}:${hrsp_5xx:-0}:${dresp:-0}:${qcur:-0}:${qtime:-0}:${wredis:-0}:${wretr:-0}"
    values="$values:${rtime:-0}:${req_rate:-0}:${req_rate_max:-0}:${req_tot:-0}:${cli_abrt:-0}:${srv_abrt:-0}:${comp_in:-0}:${comp_out:-0}:${comp_byp:-0}:${comp_rsp:-0}"
    values="$values:${lastsess:-0}:${last_chk:-0}:${last_agt:-0}:${ctime:-0}:${ttime:-0}:${hrsp_1xx:-0}:${hrsp_3xx:-0}:${hrsp_4xx:-0}:${hrsp_other:-0}:${qmax:-0}"
    values="$values:${scur:-0}:${smax:-0}:${slim:-0}:${bin:-0}:${bout:-0}:${dreq:-0}:${ereq:-0}:${weight:-0}:${act:-0}:${bck:-0}"
    values="$values:${chkfail:-0}:${chkdown:-0}:${lastchg:-0}:${downtime:-0}:${qlimit:-0}:${pid:-0}:${iid:-0}:${sid:-0}:${throttle:-0}:${lbtot:-0}"
    values="$values:${tracked:-0}:${type:-0}:${rate:-0}:${rate_lim:-0}:${rate_max:-0}:${check_status:-0}:${check_code:-0}:${check_duration:-0}:${hanafail:-0}"
    echo "PUTVAL $host/haproxy/haproxy_backend-$pxname $time:$values"
  done
  sleep "$pause"
done
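The parsing step of the script relies on `read` with `IFS=","`, which keeps empty CSV fields as empty variables; that is why every value is emitted with a `:-0` default. A minimal sketch of that behavior follows, using a hypothetical, shortened row with only five columns (real `show stat` rows have many more) and a hypothetical helper named `parse_row`.

```shell
#!/bin/sh
# Hypothetical helper: split a shortened CSV row the same way the
# script does, substituting 0 for empty fields.
# Assumed column layout for this sketch: pxname,svname,qcur,qmax,scur
parse_row() {
    printf '%s\n' "$1" | {
        # A comma is a non-whitespace separator, so two adjacent commas
        # yield an empty field instead of being collapsed into one.
        IFS=',' read -r pxname svname qcur qmax scur
        echo "$pxname $svname ${qcur:-0} ${qmax:-0} ${scur:-0}"
    }
}

# The fourth field (qmax) is empty in this sample row.
parse_row 'web,BACKEND,0,,12'
```

Without the defaults, an empty field would produce two consecutive colons in the value list, and the number of values would no longer match the type definition.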
types.db
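This file belongs to Collectd rather than to Graphite: for each type it declares the data sources that the values of a PUTVAL line are mapped to, and that is what ends up stored in Graphite. The metric definitions therefore have to be modified as well, as follows.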
absolute value:ABSOLUTE:0:U
apache_bytes value:DERIVE:0:U
apache_connections value:GAUGE:0:65535
apache_idle_workers value:GAUGE:0:65535
apache_requests value:DERIVE:0:U
apache_scoreboard value:GAUGE:0:65535
ath_nodes value:GAUGE:0:65535
ath_stat value:DERIVE:0:U
backends value:GAUGE:0:65535
bitrate value:GAUGE:0:4294967295
blocked_clients value:GAUGE:0:U
bytes value:GAUGE:0:U
cache_eviction value:DERIVE:0:U
cache_operation value:DERIVE:0:U
cache_ratio value:GAUGE:0:100
cache_result value:DERIVE:0:U
cache_size value:GAUGE:0:U
capacity value:GAUGE:0:U
ceph_bytes value:GAUGE:U:U
ceph_latency value:GAUGE:U:U
ceph_rate value:DERIVE:0:U
changes_since_last_save value:GAUGE:0:U
charge value:GAUGE:0:U
compression_ratio value:GAUGE:0:2
compression uncompressed:DERIVE:0:U, compressed:DERIVE:0:U
connections value:DERIVE:0:U
conntrack value:GAUGE:0:4294967295
contextswitch value:DERIVE:0:U
count value:GAUGE:0:U
counter value:COUNTER:U:U
cpufreq value:GAUGE:0:U
cpu value:DERIVE:0:U
current_connections value:GAUGE:0:U
current_sessions value:GAUGE:0:U
current value:GAUGE:U:U
delay value:GAUGE:-1000000:1000000
derive value:DERIVE:0:U
df_complex value:GAUGE:0:U
df_inodes value:GAUGE:0:U
df used:GAUGE:0:1125899906842623, free:GAUGE:0:1125899906842623
disk_latency read:GAUGE:0:U, write:GAUGE:0:U
disk_merged read:DERIVE:0:U, write:DERIVE:0:U
disk_octets read:DERIVE:0:U, write:DERIVE:0:U
disk_ops_complex value:DERIVE:0:U
disk_ops read:DERIVE:0:U, write:DERIVE:0:U
disk_time read:DERIVE:0:U, write:DERIVE:0:U
disk_io_time io_time:DERIVE:0:U, weighted_io_time:DERIVE:0:U
dns_answer value:DERIVE:0:U
dns_notify value:DERIVE:0:U
dns_octets queries:DERIVE:0:U, responses:DERIVE:0:U
dns_opcode value:DERIVE:0:U
dns_qtype_cached value:GAUGE:0:4294967295
dns_qtype value:DERIVE:0:U
dns_query value:DERIVE:0:U
dns_question value:DERIVE:0:U
dns_rcode value:DERIVE:0:U
dns_reject value:DERIVE:0:U
dns_request value:DERIVE:0:U
dns_resolver value:DERIVE:0:U
dns_response value:DERIVE:0:U
dns_transfer value:DERIVE:0:U
dns_update value:DERIVE:0:U
dns_zops value:DERIVE:0:U
drbd_resource value:DERIVE:0:U
duration seconds:GAUGE:0:U
email_check value:GAUGE:0:U
email_count value:GAUGE:0:U
email_size value:GAUGE:0:U
entropy value:GAUGE:0:4294967295
expired_keys value:DERIVE:0:U
fanspeed value:GAUGE:0:U
file_handles value:GAUGE:0:U
file_size value:GAUGE:0:U
files value:GAUGE:0:U
flow value:GAUGE:0:U
fork_rate value:DERIVE:0:U
frequency_offset value:GAUGE:-1000000:1000000
frequency value:GAUGE:0:U
fscache_stat value:DERIVE:0:U
gauge value:GAUGE:U:U
haproxy_backend stot:COUNTER:0:U, econ:COUNTER:0:U, eresp:COUNTER:0:U, hrsp_2xx:DERIVE:0:U, hrsp_5xx:DERIVE:0:U, dresp:COUNTER:0:U, qcur:GAUGE:0:U, qtime:GAUGE:0:U, wredis:GAUGE:0:U, wretr:GAUGE:0:U, rtime:GAUGE:0:U, req_rate:GAUGE:0:U, req_rate_max:GAUGE:0:U, req_tot:GAUGE:0:U, cli_abrt:GAUGE:0:U, srv_abrt:GAUGE:0:U, comp_in:GAUGE:0:U, comp_out:GAUGE:0:U, comp_byp:GAUGE:0:U, comp_rsp:GAUGE:0:U, lastsess:GAUGE:0:U, last_chk:GAUGE:0:U, last_agt:GAUGE:0:U, ctime:GAUGE:0:U, ttime:GAUGE:0:U, hrsp_1xx:DERIVE:0:U, hrsp_3xx:DERIVE:0:U, hrsp_4xx:DERIVE:0:U, hrsp_other:GAUGE:0:U, qmax:COUNTER:0:U, scur:GAUGE:0:U, smax:GAUGE:0:U, slim:GAUGE:0:U, bin:GAUGE:0:U, bout:GAUGE:0:U, dreq:GAUGE:0:U, ereq:GAUGE:0:U, weight:GAUGE:0:U, act:GAUGE:0:U, bck:GAUGE:0:U, chkfail:GAUGE:0:U, chkdown:GAUGE:0:U, lastchg:GAUGE:0:U, downtime:GAUGE:0:U, qlimit:GAUGE:0:U, pid:GAUGE:0:U, iid:GAUGE:0:U, sid:GAUGE:0:U, throttle:GAUGE:0:U, lbtot:GAUGE:0:U, tracked:GAUGE:0:U, type:GAUGE:0:U, rate:GAUGE:0:U, rate_lim:GAUGE:0:U, rate_max:GAUGE:0:U, check_status:GAUGE:0:U, check_code:GAUGE:0:U, check_duration:GAUGE:0:U, hanafail:GAUGE:0:U
hash_collisions value:DERIVE:0:U
http_request_methods value:DERIVE:0:U
http_requests value:DERIVE:0:U
http_response_codes value:DERIVE:0:U
humidity value:GAUGE:0:100
if_collisions value:DERIVE:0:U
if_dropped rx:DERIVE:0:U, tx:DERIVE:0:U
if_errors rx:DERIVE:0:U, tx:DERIVE:0:U
if_multicast value:DERIVE:0:U
if_octets rx:DERIVE:0:U, tx:DERIVE:0:U
if_packets rx:DERIVE:0:U, tx:DERIVE:0:U
if_rx_errors value:DERIVE:0:U
if_rx_octets value:DERIVE:0:U
if_tx_errors value:DERIVE:0:U
if_tx_octets value:DERIVE:0:U
invocations value:DERIVE:0:U
io_octets rx:DERIVE:0:U, tx:DERIVE:0:U
io_packets rx:DERIVE:0:U, tx:DERIVE:0:U
ipt_bytes value:DERIVE:0:U
ipt_packets value:DERIVE:0:U
irq value:DERIVE:0:U
latency value:GAUGE:0:U
links value:GAUGE:0:U
load shortterm:GAUGE:0:5000, midterm:GAUGE:0:5000, longterm:GAUGE:0:5000
md_disks value:GAUGE:0:U
memcached_command value:DERIVE:0:U
memcached_connections value:GAUGE:0:U
memcached_items value:GAUGE:0:U
memcached_octets rx:DERIVE:0:U, tx:DERIVE:0:U
memcached_ops value:DERIVE:0:U
memory value:GAUGE:0:281474976710656
memory_lua value:GAUGE:0:281474976710656
multimeter value:GAUGE:U:U
mutex_operations value:DERIVE:0:U
mysql_commands value:DERIVE:0:U
mysql_handler value:DERIVE:0:U
mysql_locks value:DERIVE:0:U
mysql_log_position value:DERIVE:0:U
mysql_octets rx:DERIVE:0:U, tx:DERIVE:0:U
mysql_bpool_pages value:GAUGE:0:U
mysql_bpool_bytes value:GAUGE:0:U
mysql_bpool_counters value:DERIVE:0:U
mysql_innodb_data value:DERIVE:0:U
mysql_innodb_dblwr value:DERIVE:0:U
mysql_innodb_log value:DERIVE:0:U
mysql_innodb_pages value:DERIVE:0:U
mysql_innodb_row_lock value:DERIVE:0:U
mysql_innodb_rows value:DERIVE:0:U
mysql_select value:DERIVE:0:U
mysql_sort value:DERIVE:0:U
nfs_procedure value:DERIVE:0:U
nginx_connections value:GAUGE:0:U
nginx_requests value:DERIVE:0:U
node_octets rx:DERIVE:0:U, tx:DERIVE:0:U
node_rssi value:GAUGE:0:255
node_stat value:DERIVE:0:U
node_tx_rate value:GAUGE:0:127
objects value:GAUGE:0:U
operations value:DERIVE:0:U
packets value:DERIVE:0:U
pending_operations value:GAUGE:0:U
percent value:GAUGE:0:100.1
percent_bytes value:GAUGE:0:100.1
percent_inodes value:GAUGE:0:100.1
pf_counters value:DERIVE:0:U
pf_limits value:DERIVE:0:U
pf_source value:DERIVE:0:U
pf_states value:GAUGE:0:U
pf_state value:DERIVE:0:U
pg_blks value:DERIVE:0:U
pg_db_size value:GAUGE:0:U
pg_n_tup_c value:DERIVE:0:U
pg_n_tup_g value:GAUGE:0:U
pg_numbackends value:GAUGE:0:U
pg_scan value:DERIVE:0:U
pg_xact value:DERIVE:0:U
ping_droprate value:GAUGE:0:100
ping_stddev value:GAUGE:0:65535
ping value:GAUGE:0:65535
players value:GAUGE:0:1000000
power value:GAUGE:0:U
pressure value:GAUGE:0:U
protocol_counter value:DERIVE:0:U
ps_code value:GAUGE:0:9223372036854775807
ps_count processes:GAUGE:0:1000000, threads:GAUGE:0:1000000
ps_cputime user:DERIVE:0:U, syst:DERIVE:0:U
ps_data value:GAUGE:0:9223372036854775807
ps_disk_octets read:DERIVE:0:U, write:DERIVE:0:U
ps_disk_ops read:DERIVE:0:U, write:DERIVE:0:U
ps_pagefaults minflt:DERIVE:0:U, majflt:DERIVE:0:U
ps_rss value:GAUGE:0:9223372036854775807
ps_stacksize value:GAUGE:0:9223372036854775807
ps_state value:GAUGE:0:65535
ps_vm value:GAUGE:0:9223372036854775807
pubsub value:GAUGE:0:U
queue_length value:GAUGE:0:U
records value:GAUGE:0:U
requests value:GAUGE:0:U
response_time value:GAUGE:0:U
response_code value:GAUGE:0:U
route_etx value:GAUGE:0:U
route_metric value:GAUGE:0:U
routes value:GAUGE:0:U
segments value:GAUGE:0:65535
serial_octets rx:DERIVE:0:U, tx:DERIVE:0:U
signal_noise value:GAUGE:U:0
signal_power value:GAUGE:U:0
signal_quality value:GAUGE:0:U
smart_poweron value:GAUGE:0:U
smart_powercycles value:GAUGE:0:U
smart_badsectors value:GAUGE:0:U
smart_temperature value:GAUGE:-300:300
smart_attribute current:GAUGE:0:255, worst:GAUGE:0:255, threshold:GAUGE:0:255, pretty:GAUGE:0:U
snr value:GAUGE:0:U
spam_check value:GAUGE:0:U
spam_score value:GAUGE:U:U
spl value:GAUGE:U:U
swap_io value:DERIVE:0:U
swap value:GAUGE:0:1099511627776
tcp_connections value:GAUGE:0:4294967295
temperature value:GAUGE:U:U
threads value:GAUGE:0:U
time_dispersion value:GAUGE:-1000000:1000000
timeleft value:GAUGE:0:U
time_offset value:GAUGE:-1000000:1000000
total_bytes value:DERIVE:0:U
total_connections value:DERIVE:0:U
total_objects value:DERIVE:0:U
total_operations value:DERIVE:0:U
total_requests value:DERIVE:0:U
total_sessions value:DERIVE:0:U
total_threads value:DERIVE:0:U
total_time_in_ms value:DERIVE:0:U
total_values value:DERIVE:0:U
uptime value:GAUGE:0:4294967295
users value:GAUGE:0:65535
vcl value:GAUGE:0:65535
vcpu value:GAUGE:0:U
virt_cpu_total value:DERIVE:0:U
virt_vcpu value:DERIVE:0:U
vmpage_action value:DERIVE:0:U
vmpage_faults minflt:DERIVE:0:U, majflt:DERIVE:0:U
vmpage_io in:DERIVE:0:U, out:DERIVE:0:U
vmpage_number value:GAUGE:0:4294967295
volatile_changes value:GAUGE:0:U
voltage_threshold value:GAUGE:U:U, threshold:GAUGE:U:U
voltage value:GAUGE:U:U
vs_memory value:GAUGE:0:9223372036854775807
vs_processes value:GAUGE:0:65535
vs_threads value:GAUGE:0:65535
#
# Legacy types
# (required for the v5 upgrade target)
#
arc_counts demand_data:COUNTER:0:U, demand_metadata:COUNTER:0:U, prefetch_data:COUNTER:0:U, prefetch_metadata:COUNTER:0:U
arc_l2_bytes read:COUNTER:0:U, write:COUNTER:0:U
arc_l2_size value:GAUGE:0:U
arc_ratio value:GAUGE:0:U
arc_size current:GAUGE:0:U, target:GAUGE:0:U, minlimit:GAUGE:0:U, maxlimit:GAUGE:0:U
mysql_qcache hits:COUNTER:0:U, inserts:COUNTER:0:U, not_cached:COUNTER:0:U, lowmem_prunes:COUNTER:0:U, queries_in_cache:GAUGE:0:U
mysql_threads running:GAUGE:0:U, connected:GAUGE:0:U, cached:GAUGE:0:U, created:COUNTER:0:U
Plugin
Now that the script and the metric types file contain the necessary information, we configure the plugin that the Collectd agent will run.
LoadPlugin exec
<Plugin exec>
  Exec "haproxy:haproxy" "/opt/collectd/etc/collect.d/haproxy_stats.sh" "-s" "/run/haproxy/admin.sock" "-h" "myhost" "-p" "10"
</Plugin>
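Before letting Collectd run the script, it can help to run it by hand and confirm that what it prints has the shape of a PUTVAL line: the command, a three-part identifier (host/plugin/type-instance), and a timestamp followed by numeric values. The sketch below is a rough format check under those assumptions; `is_putval_line` is a hypothetical helper and its pattern is deliberately simplified, not Collectd's actual parser.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument looks like
# "PUTVAL host/plugin/type-instance [option=value] time:v1:v2:..."
# with purely numeric values (or U for "undefined").
is_putval_line() {
    printf '%s\n' "$1" |
        grep -Eq '^PUTVAL [^ /]+/[^ /]+/[^ /]+ ([a-z]+=[0-9.]+ )?[0-9]+(:[-0-9.eU]+)+$'
}

# Made-up sample line in the format the script emits.
sample='PUTVAL myhost/haproxy/haproxy_backend-web 1500000000:10:0:0'

if is_putval_line "$sample"; then
    echo "line looks well formed"
else
    echo "line is malformed"
fi
```

A check like this flags a non-numeric field, for example a textual status that slipped into the value list, before it reaches the daemon.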
With this in place, each of the metrics can be viewed from whichever graphing tool you use. The files I modified are available on my GitHub account.