HAProxy plugin for Collectd.

Introduction

I finally found some time to write. The topics that occupy me these days revolve around SRE (Site Reliability Engineering), and within this multidisciplinary field there is a surprising number of areas that do not appear directly in its definition yet make it possible.

One of its pillars is monitoring the components of the architecture that supports the applications, including the software products in use. To track the day-to-day behavior and performance of each piece (its statistics), the metric collectors have to be instrumented.

I work with Collectd because it makes my life much easier when integrating agents: metrics collection can be set up quickly, resource consumption is trivial, and the project is stable.

Integrating HAProxy with Collectd


As of today the collectd.org site lists at least three plugins; however, none of them worked for me. It may have been the HAProxy version, or simply that I was not able to make the adjustments needed to complete the implementation successfully.

The only viable option (also in terms of learning) was to adapt a plugin to my needs (thanks to free software), so I took the bash plugin and modified it to cover more backend metrics. Why bash? To be honest, it is the language I feel most comfortable with. Adapting the script was the easy part; the harder part was understanding that there is a correlation between the RRDB (Graphite [Whisper] in this case) and the script. Please, dear reader, be careful with the metrics you collect (their type), the order in which you emit them, and how many there are.
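
The warning about type, order and count can be turned into a quick sanity check. The sketch below is not part of the original plugin: `count_sources` and `count_values` are hypothetical helper names, and it assumes a types.db in which each type sits on a single line with comma-separated data sources.

```shell
#!/bin/sh
# Hypothetical helpers (assumptions, not part of the original plugin).

# Print how many data sources a type declares in a types.db file.
# $1 = type name, $2 = path to the types.db file
count_sources() {
    awk -v t="$1" '$1 == t { $1 = ""; n = split($0, ds, ","); print n; exit }' "$2"
}

# Print how many values a PUTVAL value list carries.
# $1 = value list in the form timestamp:v1:v2:...
count_values() {
    printf '%s\n' "$1" | awk -F: '{ print NF - 1 }'
}
```

For example, `count_sources haproxy_backend /path/to/types.db` should print the same number as `count_values` applied to the value list of one emitted PUTVAL line. If the two differ, Collectd will most likely refuse the values.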

Script


 The script ended up as follows.

 #!/bin/sh
 # Collectd exec plugin: read the HAProxy stats from the admin socket and
 # emit one PUTVAL line per backend.
 
 sock='/run/haproxy/admin.sock'
 host="${COLLECTD_HOSTNAME}"
 pause="${COLLECTD_INTERVAL:-10}"
 
 while getopts "h:p:s:" c; do
         case $c in
                 h)      host=$OPTARG;;
                 p)      pause=$OPTARG;;
                 s)      sock=$OPTARG;;
                 *)      echo "Usage: $0 [-h <host>] [-p <seconds>] [-s <socket>]" >&2; exit 1;;
         esac
 done
 
 while [ $? -eq 0 ]; do
         time="$(date +%s)"
         # The variable order must match the CSV header printed by 'show stat'.
         # _rest absorbs any trailing columns that this list does not name.
         echo 'show stat' | socat - "UNIX-CLIENT:$sock" | sed -e 's/^# //' | while IFS=, read -r pxname svname qcur qmax scur smax slim stot bin bout dreq dresp ereq econ eresp wretr wredis status weight act bck chkfail chkdown lastchg downtime qlimit pid iid sid throttle lbtot tracked type rate rate_lim rate_max check_status check_code check_duration hrsp_1xx hrsp_2xx hrsp_3xx hrsp_4xx hrsp_5xx hrsp_other hanafail req_rate req_rate_max req_tot cli_abrt srv_abrt comp_in comp_out comp_byp comp_rsp lastsess last_chk last_agt qtime ctime rtime ttime _rest; do
                 # Keep only the per-proxy BACKEND summary rows.
                 [ "$svname" = 'BACKEND' ] || continue
                 # The value order must match the haproxy_backend entry in types.db.
                 echo "PUTVAL $host/haproxy/haproxy_backend-$pxname $time:${stot:-0}:${econ:-0}:${eresp:-0}:${hrsp_2xx:-0}:${hrsp_5xx:-0}:${dresp:-0}:${qcur:-0}:${qtime:-0}:${wredis:-0}:${wretr:-0}:${rtime:-0}:${req_rate:-0}:${req_rate_max:-0}:${req_tot:-0}:${cli_abrt:-0}:${srv_abrt:-0}:${comp_in:-0}:${comp_out:-0}:${comp_byp:-0}:${comp_rsp:-0}:${lastsess:-0}:${last_chk:-0}:${last_agt:-0}:${ctime:-0}:${ttime:-0}:${hrsp_1xx:-0}:${hrsp_3xx:-0}:${hrsp_4xx:-0}:${hrsp_other:-0}:${qmax:-0}:${scur:-0}:${smax:-0}:${slim:-0}:${bin:-0}:${bout:-0}:${dreq:-0}:${ereq:-0}:${weight:-0}:${act:-0}:${bck:-0}:${chkfail:-0}:${chkdown:-0}:${lastchg:-0}:${downtime:-0}:${qlimit:-0}:${pid:-0}:${iid:-0}:${sid:-0}:${throttle:-0}:${lbtot:-0}:${tracked:-0}:${type:-0}:${rate:-0}:${rate_lim:-0}:${rate_max:-0}:${check_status:-0}:${check_code:-0}:${check_duration:-0}:${hanafail:-0}"
         done
         sleep "$pause"
 done
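
To check the row filtering without a live HAProxy socket, the same idea can be tried on a canned CSV sample. This is only a sketch: it uses awk rather than the script's read loop, `backend_summary` is a hypothetical name, and the sample rows are made up and truncated to the first eight columns of the order used above (pxname, svname, qcur, qmax, scur, smax, slim, stot).

```shell
#!/bin/sh
# Sketch: print "pxname scur stot" for every BACKEND row of a
# 'show stat' CSV read from stdin (columns 1, 5 and 8).
backend_summary() {
    awk -F, '$2 == "BACKEND" { print $1, $5, $8 }'
}

# Made-up sample data, not real HAProxy output.
sample='# pxname,svname,qcur,qmax,scur,smax,slim,stot
web,FRONTEND,,,3,10,2000,150
web,srv1,0,0,1,4,,70
web,BACKEND,0,0,2,8,200,140'

printf '%s\n' "$sample" | backend_summary   # prints: web 2 140
```

The FRONTEND and per-server rows are dropped, which mirrors what the script does before it emits PUTVAL.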



types.db


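This file is Collectd's data-set specification, and what it declares determines what ends up in Graphite, so the metric definitions must also be modified as follows. The entry that matters here is haproxy_backend.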

absolute                value:ABSOLUTE:0:U
apache_bytes            value:DERIVE:0:U
apache_connections      value:GAUGE:0:65535
apache_idle_workers     value:GAUGE:0:65535
apache_requests         value:DERIVE:0:U
apache_scoreboard       value:GAUGE:0:65535
ath_nodes               value:GAUGE:0:65535
ath_stat                value:DERIVE:0:U
backends                value:GAUGE:0:65535
bitrate                 value:GAUGE:0:4294967295
blocked_clients value:GAUGE:0:U
bytes                   value:GAUGE:0:U
cache_eviction          value:DERIVE:0:U
cache_operation         value:DERIVE:0:U
cache_ratio             value:GAUGE:0:100
cache_result            value:DERIVE:0:U
cache_size              value:GAUGE:0:U
capacity        value:GAUGE:0:U
ceph_bytes              value:GAUGE:U:U
ceph_latency    value:GAUGE:U:U
ceph_rate                       value:DERIVE:0:U
changes_since_last_save   value:GAUGE:0:U
charge                  value:GAUGE:0:U
compression_ratio       value:GAUGE:0:2
compression             uncompressed:DERIVE:0:U, compressed:DERIVE:0:U
connections             value:DERIVE:0:U
conntrack               value:GAUGE:0:4294967295
contextswitch           value:DERIVE:0:U
count                   value:GAUGE:0:U
counter                 value:COUNTER:U:U
cpufreq                 value:GAUGE:0:U
cpu                     value:DERIVE:0:U
current_connections     value:GAUGE:0:U
current_sessions        value:GAUGE:0:U
current                 value:GAUGE:U:U
delay                   value:GAUGE:-1000000:1000000
derive                  value:DERIVE:0:U
df_complex              value:GAUGE:0:U
df_inodes               value:GAUGE:0:U
df                      used:GAUGE:0:1125899906842623, free:GAUGE:0:1125899906842623
disk_latency            read:GAUGE:0:U, write:GAUGE:0:U
disk_merged             read:DERIVE:0:U, write:DERIVE:0:U
disk_octets             read:DERIVE:0:U, write:DERIVE:0:U
disk_ops_complex        value:DERIVE:0:U
disk_ops                read:DERIVE:0:U, write:DERIVE:0:U
disk_time               read:DERIVE:0:U, write:DERIVE:0:U
disk_io_time            io_time:DERIVE:0:U, weighted_io_time:DERIVE:0:U
dns_answer              value:DERIVE:0:U
dns_notify              value:DERIVE:0:U
dns_octets              queries:DERIVE:0:U, responses:DERIVE:0:U
dns_opcode              value:DERIVE:0:U
dns_qtype_cached        value:GAUGE:0:4294967295
dns_qtype               value:DERIVE:0:U
dns_query               value:DERIVE:0:U
dns_question            value:DERIVE:0:U
dns_rcode               value:DERIVE:0:U
dns_reject              value:DERIVE:0:U
dns_request             value:DERIVE:0:U
dns_resolver            value:DERIVE:0:U
dns_response            value:DERIVE:0:U
dns_transfer            value:DERIVE:0:U
dns_update              value:DERIVE:0:U
dns_zops                value:DERIVE:0:U
drbd_resource   value:DERIVE:0:U
duration                seconds:GAUGE:0:U
email_check             value:GAUGE:0:U
email_count             value:GAUGE:0:U
email_size              value:GAUGE:0:U
entropy                 value:GAUGE:0:4294967295
expired_keys    value:DERIVE:0:U
fanspeed                value:GAUGE:0:U
file_handles            value:GAUGE:0:U
file_size               value:GAUGE:0:U
files                   value:GAUGE:0:U
flow                    value:GAUGE:0:U
fork_rate               value:DERIVE:0:U
frequency_offset        value:GAUGE:-1000000:1000000
frequency               value:GAUGE:0:U
fscache_stat            value:DERIVE:0:U
gauge                   value:GAUGE:U:U
haproxy_backend   stot:COUNTER:0:U, econ:COUNTER:0:U, eresp:COUNTER:0:U, hrsp_2xx:DERIVE:0:U, hrsp_5xx:DERIVE:0:U, dresp:COUNTER:0:U, qcur:GAUGE:0:U, qtime:GAUGE:0:U, wredis:GAUGE:0:U, wretr:GAUGE:0:U, rtime:GAUGE:0:U, req_rate:GAUGE:0:U, req_rate_max:GAUGE:0:U, req_tot:GAUGE:0:U, cli_abrt:GAUGE:0:U, srv_abrt:GAUGE:0:U, comp_in:GAUGE:0:U, comp_out:GAUGE:0:U, comp_byp:GAUGE:0:U, comp_rsp:GAUGE:0:U, lastsess:GAUGE:0:U, last_chk:GAUGE:0:U, last_agt:GAUGE:0:U, ctime:GAUGE:0:U, ttime:GAUGE:0:U, hrsp_1xx:DERIVE:0:U, hrsp_3xx:DERIVE:0:U, hrsp_4xx:DERIVE:0:U, hrsp_other:GAUGE:0:U, qmax:COUNTER:0:U, scur:GAUGE:0:U, smax:GAUGE:0:U, slim:GAUGE:0:U, bin:GAUGE:0:U, bout:GAUGE:0:U, dreq:GAUGE:0:U, ereq:GAUGE:0:U, weight:GAUGE:0:U, act:GAUGE:0:U, bck:GAUGE:0:U, chkfail:GAUGE:0:U, chkdown:GAUGE:0:U, lastchg:GAUGE:0:U, downtime:GAUGE:0:U, qlimit:GAUGE:0:U, pid:GAUGE:0:U, iid:GAUGE:0:U, sid:GAUGE:0:U, throttle:GAUGE:0:U, lbtot:GAUGE:0:U, tracked:GAUGE:0:U, type:GAUGE:0:U, rate:GAUGE:0:U, rate_lim:GAUGE:0:U, rate_max:GAUGE:0:U, check_status:GAUGE:0:U, check_code:GAUGE:0:U, check_duration:GAUGE:0:U, hanafail:GAUGE:0:U
hash_collisions         value:DERIVE:0:U
http_request_methods    value:DERIVE:0:U
http_requests           value:DERIVE:0:U
http_response_codes     value:DERIVE:0:U
humidity                value:GAUGE:0:100
if_collisions           value:DERIVE:0:U
if_dropped              rx:DERIVE:0:U, tx:DERIVE:0:U
if_errors               rx:DERIVE:0:U, tx:DERIVE:0:U
if_multicast            value:DERIVE:0:U
if_octets               rx:DERIVE:0:U, tx:DERIVE:0:U
if_packets              rx:DERIVE:0:U, tx:DERIVE:0:U
if_rx_errors            value:DERIVE:0:U
if_rx_octets            value:DERIVE:0:U
if_tx_errors            value:DERIVE:0:U
if_tx_octets            value:DERIVE:0:U
invocations             value:DERIVE:0:U
io_octets               rx:DERIVE:0:U, tx:DERIVE:0:U
io_packets              rx:DERIVE:0:U, tx:DERIVE:0:U
ipt_bytes               value:DERIVE:0:U
ipt_packets             value:DERIVE:0:U
irq                     value:DERIVE:0:U
latency                 value:GAUGE:0:U
links                   value:GAUGE:0:U
load                    shortterm:GAUGE:0:5000, midterm:GAUGE:0:5000, longterm:GAUGE:0:5000
md_disks                value:GAUGE:0:U
memcached_command       value:DERIVE:0:U
memcached_connections   value:GAUGE:0:U
memcached_items         value:GAUGE:0:U
memcached_octets        rx:DERIVE:0:U, tx:DERIVE:0:U
memcached_ops           value:DERIVE:0:U
memory                  value:GAUGE:0:281474976710656
memory_lua              value:GAUGE:0:281474976710656
multimeter              value:GAUGE:U:U
mutex_operations        value:DERIVE:0:U
mysql_commands          value:DERIVE:0:U
mysql_handler           value:DERIVE:0:U
mysql_locks             value:DERIVE:0:U
mysql_log_position      value:DERIVE:0:U
mysql_octets            rx:DERIVE:0:U, tx:DERIVE:0:U
mysql_bpool_pages       value:GAUGE:0:U
mysql_bpool_bytes       value:GAUGE:0:U
mysql_bpool_counters    value:DERIVE:0:U
mysql_innodb_data       value:DERIVE:0:U
mysql_innodb_dblwr      value:DERIVE:0:U
mysql_innodb_log        value:DERIVE:0:U
mysql_innodb_pages      value:DERIVE:0:U
mysql_innodb_row_lock   value:DERIVE:0:U
mysql_innodb_rows       value:DERIVE:0:U
mysql_select            value:DERIVE:0:U
mysql_sort              value:DERIVE:0:U
nfs_procedure           value:DERIVE:0:U
nginx_connections       value:GAUGE:0:U
nginx_requests          value:DERIVE:0:U
node_octets             rx:DERIVE:0:U, tx:DERIVE:0:U
node_rssi               value:GAUGE:0:255
node_stat               value:DERIVE:0:U
node_tx_rate            value:GAUGE:0:127
objects                 value:GAUGE:0:U
operations              value:DERIVE:0:U
packets                 value:DERIVE:0:U
pending_operations      value:GAUGE:0:U
percent                 value:GAUGE:0:100.1
percent_bytes           value:GAUGE:0:100.1
percent_inodes          value:GAUGE:0:100.1
pf_counters             value:DERIVE:0:U
pf_limits               value:DERIVE:0:U
pf_source               value:DERIVE:0:U
pf_states               value:GAUGE:0:U
pf_state                value:DERIVE:0:U
pg_blks                 value:DERIVE:0:U
pg_db_size              value:GAUGE:0:U
pg_n_tup_c              value:DERIVE:0:U
pg_n_tup_g              value:GAUGE:0:U
pg_numbackends          value:GAUGE:0:U
pg_scan                 value:DERIVE:0:U
pg_xact                 value:DERIVE:0:U
ping_droprate           value:GAUGE:0:100
ping_stddev             value:GAUGE:0:65535
ping                    value:GAUGE:0:65535
players                 value:GAUGE:0:1000000
power                   value:GAUGE:0:U
pressure                        value:GAUGE:0:U
protocol_counter        value:DERIVE:0:U
ps_code                 value:GAUGE:0:9223372036854775807
ps_count                processes:GAUGE:0:1000000, threads:GAUGE:0:1000000
ps_cputime              user:DERIVE:0:U, syst:DERIVE:0:U
ps_data                 value:GAUGE:0:9223372036854775807
ps_disk_octets          read:DERIVE:0:U, write:DERIVE:0:U
ps_disk_ops             read:DERIVE:0:U, write:DERIVE:0:U
ps_pagefaults           minflt:DERIVE:0:U, majflt:DERIVE:0:U
ps_rss                  value:GAUGE:0:9223372036854775807
ps_stacksize            value:GAUGE:0:9223372036854775807
ps_state                value:GAUGE:0:65535
ps_vm                   value:GAUGE:0:9223372036854775807
pubsub        value:GAUGE:0:U
queue_length            value:GAUGE:0:U
records                 value:GAUGE:0:U
requests                value:GAUGE:0:U
response_time           value:GAUGE:0:U
response_code           value:GAUGE:0:U
route_etx               value:GAUGE:0:U
route_metric            value:GAUGE:0:U
routes                  value:GAUGE:0:U
segments                value:GAUGE:0:65535
serial_octets           rx:DERIVE:0:U, tx:DERIVE:0:U
signal_noise            value:GAUGE:U:0
signal_power            value:GAUGE:U:0
signal_quality          value:GAUGE:0:U
smart_poweron           value:GAUGE:0:U
smart_powercycles       value:GAUGE:0:U
smart_badsectors        value:GAUGE:0:U
smart_temperature       value:GAUGE:-300:300
smart_attribute         current:GAUGE:0:255, worst:GAUGE:0:255, threshold:GAUGE:0:255, pretty:GAUGE:0:U
snr                     value:GAUGE:0:U
spam_check              value:GAUGE:0:U
spam_score              value:GAUGE:U:U
spl                     value:GAUGE:U:U
swap_io                 value:DERIVE:0:U
swap                    value:GAUGE:0:1099511627776
tcp_connections         value:GAUGE:0:4294967295
temperature             value:GAUGE:U:U
threads                 value:GAUGE:0:U
time_dispersion         value:GAUGE:-1000000:1000000
timeleft                value:GAUGE:0:U
time_offset             value:GAUGE:-1000000:1000000
total_bytes             value:DERIVE:0:U
total_connections       value:DERIVE:0:U
total_objects           value:DERIVE:0:U
total_operations        value:DERIVE:0:U
total_requests          value:DERIVE:0:U
total_sessions          value:DERIVE:0:U
total_threads           value:DERIVE:0:U
total_time_in_ms        value:DERIVE:0:U
total_values            value:DERIVE:0:U
uptime                  value:GAUGE:0:4294967295
users                   value:GAUGE:0:65535
vcl                     value:GAUGE:0:65535
vcpu                    value:GAUGE:0:U
virt_cpu_total          value:DERIVE:0:U
virt_vcpu               value:DERIVE:0:U
vmpage_action           value:DERIVE:0:U
vmpage_faults           minflt:DERIVE:0:U, majflt:DERIVE:0:U
vmpage_io               in:DERIVE:0:U, out:DERIVE:0:U
vmpage_number           value:GAUGE:0:4294967295
volatile_changes        value:GAUGE:0:U
voltage_threshold       value:GAUGE:U:U, threshold:GAUGE:U:U
voltage                 value:GAUGE:U:U
vs_memory               value:GAUGE:0:9223372036854775807
vs_processes            value:GAUGE:0:65535
vs_threads              value:GAUGE:0:65535

#
# Legacy types
# (required for the v5 upgrade target)
#
arc_counts              demand_data:COUNTER:0:U, demand_metadata:COUNTER:0:U, prefetch_data:COUNTER:0:U, prefetch_metadata:COUNTER:0:U
arc_l2_bytes            read:COUNTER:0:U, write:COUNTER:0:U
arc_l2_size             value:GAUGE:0:U
arc_ratio               value:GAUGE:0:U
arc_size                current:GAUGE:0:U, target:GAUGE:0:U, minlimit:GAUGE:0:U, maxlimit:GAUGE:0:U
mysql_qcache            hits:COUNTER:0:U, inserts:COUNTER:0:U, not_cached:COUNTER:0:U, lowmem_prunes:COUNTER:0:U, queries_in_cache:GAUGE:0:U
mysql_threads           running:GAUGE:0:U, connected:GAUGE:0:U, cached:GAUGE:0:U, created:COUNTER:0:U
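
With entries as long as haproxy_backend, a slip such as a doubled colon or a missing bound is easy to overlook. The following sketch flags data sources that do not look like name:TYPE:min:max. `check_types_db` is a hypothetical helper, and the pattern is a deliberately loose assumption about the format, not Collectd's own parser.

```shell
#!/bin/sh
# Sketch: list malformed data-source definitions in a types.db-style file.
# $1 = path to the file. Returns non-zero if anything suspicious is found.
check_types_db() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/ { next }      # skip comments and blank lines
        {
            type = $1
            $1 = ""                            # drop the type name, keep the sources
            n = split($0, ds, ",")
            for (i = 1; i <= n; i++) {
                gsub(/^[ \t]+|[ \t]+$/, "", ds[i])
                if (ds[i] !~ /^[A-Za-z0-9_]+:(GAUGE|COUNTER|DERIVE|ABSOLUTE):[^:]+:[^:]+$/) {
                    print type ": " ds[i]
                    bad = 1
                }
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Running `check_types_db /path/to/types.db` before restarting Collectd prints one line per suspicious definition and stays silent when everything matches the pattern.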


 Plugin

 Now that we have the script and the metric types file with the necessary information, we configure the plugin that the Collectd agent will run.

LoadPlugin exec

<Plugin exec>
  Exec "haproxy:haproxy" "/opt/collectd/etc/collect.d/haproxy_stats.sh" "-s" "/run/haproxy/admin.sock" "-h" "myhost" "-p" "10"
</Plugin>



With this, every metric can now be viewed in whatever graphing tool you use. The files I modified are available from my GitHub account.
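
If the graphs stay empty, a quick pre-flight check can narrow things down. This is a sketch under assumptions: `preflight` is a hypothetical helper, and it only confirms that socat is installed and that the stats socket is reachable for the current user.

```shell
#!/bin/sh
# Sketch of a pre-flight check (hypothetical helper, not part of the plugin).
# $1 = path to the HAProxy stats socket
preflight() {
    sock="$1"
    command -v socat >/dev/null 2>&1 || { echo "socat not found in PATH" >&2; return 1; }
    [ -S "$sock" ] || { echo "$sock is not a UNIX socket" >&2; return 1; }
    [ -r "$sock" ] && [ -w "$sock" ] || { echo "no read/write access to $sock" >&2; return 1; }
    echo "ok"
}
```

It is most informative when run as the same user named in the Exec line (haproxy here), since that is the account Collectd uses to start the script.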




