How to collect Hadoop metrics with Chukwa (Part I)

Chukwa is an Apache project built on top of HDFS and the MapReduce framework. According to the Chukwa site, it is an open source data collection system for monitoring distributed systems, and Hadoop clusters in particular. In this post (the first of two), we’ll install and configure Chukwa in a standalone scheme over HDFS, and later on HBase. Before continuing with this post, it is highly recommended to read the Chukwa Architecture.

For this, we’re going to use:

  • Hadoop 1.0.4
  • HBase 0.94.1
  • Chukwa 0.5.0
  • Java SE 1.6 update 37 or later

Next is a brief description of the Chukwa components:

  • Agents are processes that run on each monitored machine and emit data to Collectors. The data they emit is produced by adaptors, which generally wrap some other data source, such as a file or a Unix command-line tool from which the information is extracted (an example adaptor definition is shown right after this list).
  • Collectors receive data from the agents and write it to stable storage. According to the Chukwa site, rather than have each adaptor write directly to HDFS, data is sent across the network to a collector process that does the HDFS writes. Each collector receives data from up to several hundred hosts.
  • ETL processes parse and archive the data. Collectors can write data directly to HBase or to sequence files in HDFS. Chukwa has a toolbox of MapReduce jobs for organizing and processing incoming data. These jobs come in two kinds: Archiving and Demux.
  • Data analytics scripts aggregate Hadoop cluster health. These scripts provide visualization and interpretation of the health of the Hadoop cluster.
  • HICC, the Hadoop Infrastructure Care Center: a web-portal style interface for displaying data. Data is fetched from HBase, which in turn is populated by the collector or by data analytics scripts that run on the collected data after Demux.
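
For illustration, an adaptor that tails a log file is registered with an add command like the one below (the syntax follows the Chukwa agent documentation; the data type name and file path are just examples). Such commands can be listed in the CHUKWA_HOME/etc/chukwa/initial_adaptors file so they are started together with the agent, or issued at runtime through the agent's control port:

add filetailer.FileTailingAdaptor FooData /tmp/foo.log 0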

This post is based on the Chukwa Administration Guide, with a few additional comments on configuration examples and compatibility options. We are working with Hadoop 1.0.4 and Chukwa 0.5.0, which can be downloaded at this link. In this first part we describe the basic setup of Chukwa, which is made up of three components: agents, collectors and the ETL processes.

A. Agent configuration

  1. Obtain a copy of Chukwa. You can find the latest release on the Chukwa release page.
  2. Un-tar the release, via tar xzf.
  3. We refer to the directory containing Chukwa as CHUKWA_HOME. It may be helpful to set CHUKWA_HOME explicitly in your environment, but Chukwa does not require that you do so.
  4. Make sure that JAVA_HOME is set correctly and points to a Java 1.6 JRE. It’s generally best to set this in etc/chukwa/chukwa-env.sh.
  5. In etc/chukwa/chukwa-env.sh, set CHUKWA_LOG_DIR and CHUKWA_PID_DIR to the directories where Chukwa should store its console logs and pid files. The pid directory must not be shared between different Chukwa instances: it should be local, not NFS-mounted.
  6. Optionally, set CHUKWA_IDENT_STRING. This string is used to name Chukwa’s own console log files. An example chukwa-env.sh file is shown below.
# The java implementation to use. Required.
export JAVA_HOME=/usr/java/jdk1.6.0_37

# Optional
# The location of HBase Configuration directory. For writing data to
# HBase, you need to set environment variable HBASE_CONF to HBase conf
# directory.
export HBASE_CONF_DIR="${HBASE_CONF_DIR}";

# Hadoop Configuration directory
export HADOOP_CONF_DIR="/usr/local/hadoop-1.0.4/conf";

# The location of chukwa data repository (in either HDFS or your local
# file system, whichever you are using)
export chukwaRecordsRepository="/chukwa/repos/"

# The directory where pid files are stored. CHUKWA_HOME/var/run by default.
export CHUKWA_PID_DIR=/tmp/chukwa/pidDir

# The location of chukwa logs, defaults to CHUKWA_HOME/logs
export CHUKWA_LOG_DIR=/tmp/chukwa/log

# The location to store chukwa data, defaults to CHUKWA_HOME/data
#export CHUKWA_DATA_DIR="${CHUKWA_HOME}/data"

# Instance name for chukwa deployment
export CHUKWA_IDENT_STRING=$USER
export JAVA_PLATFORM=Linux-i386-32
export JAVA_LIBRARY_PATH=${HADOOP_HOME}/lib/native/${JAVA_PLATFORM}

# Database driver name for storing Chukwa data.
export JDBC_DRIVER=${TODO_CHUKWA_JDBC_DRIVER}

# Database URL prefix for Database Loader.
export JDBC_URL_PREFIX=${TODO_CHUKWA_JDBC_URL_PREFIX}

# HICC Jetty Server heap memory settings
# Specify min and max size of heap to JVM, e.g. 300M
export CHUKWA_HICC_MIN_MEM=
export CHUKWA_HICC_MAX_MEM=

# HICC Jetty Server port, defaults to 4080
#export CHUKWA_HICC_PORT=
export CLASSPATH=${CLASSPATH}:${HBASE_CONF_DIR}:${HADOOP_CONF_DIR}

Note. It is important to mention that in this first part we are NOT going to use HBase as the repository for the collected data; instead, we are going to store it in HDFS.

  7. Agents send the collected data to a random collector from a list of collectors, so it is necessary to specify that list. The collector list is defined in the $CHUKWA_HOME/etc/chukwa/collectors file, which should look something like this:

http://collector1HostName:collector1Port/
http://collector2HostName:collector2Port/
http://collector3HostName:collector3Port/

Our collectors file only contains localhost:

localhost

  8. Another file that should be modified is $CHUKWA_HOME/etc/chukwa/chukwa-agent-conf.xml. The most important value to modify is the cluster/group name, which identifies the monitored source nodes. This value is stored in each Chunk of collected data and can be used to distinguish data coming from different clusters. Our chukwa-agent-conf.xml looks like this:

  <property>
    <name>chukwaAgent.tags</name>
    <value>cluster="chukwa"</value>
    <description>The cluster's name for this agent</description>
  </property>

  <property>
    <name>chukwaAgent.control.port</name>
    <value>9093</value>
    <description>The socket port number the agent's control interface can be contacted at.</description>
  </property>

  <property>
    <name>chukwaAgent.hostname</name>
    <value>localhost</value>
    <description>The hostname of the agent on this node. Usually localhost, this is used by the chukwa instrumentation agent-control interface library</description>
  </property>

  <property>
    <name>chukwaAgent.checkpoint.name</name>
    <value>chukwa_agent_checkpoint</value>
    <description>the prefix to prepend to the agent's checkpoint file(s)</description>
  </property>

  <property>
    <name>chukwaAgent.checkpoint.dir</name>
    <value>${CHUKWA_LOG_DIR}/</value>
    <description>the location to put the agent's checkpoint file(s)</description>
  </property>

  <property>
    <name>chukwaAgent.checkpoint.interval</name>
    <value>5000</value>
    <description>the frequency interval for the agent to do checkpoints, in milliseconds</description>
  </property>

  <property>
    <name>chukwaAgent.sender.fastRetries</name>
    <value>4</value>
    <description>the number of post attempts to make to a single collector, before marking it failed</description>
  </property>

  <property>
    <name>chukwaAgent.collector.retries</name>
    <value>144000</value>
    <description>the number of attempts to find a working collector</description>
  </property>

  <property>
    <name>chukwaAgent.collector.retryInterval</name>
    <value>20000</value>
    <description>the number of milliseconds to wait between searches for a collector</description>
  </property>

  <property>
    <name>syslog.adaptor.port.9095.facility.LOCAL1</name>
    <value>HADOOP</value>
  </property>

Note. It is necessary to open ports 9093, 9095 and 9097 in our firewall to be able to connect to the agent.

  9. Configuring Hadoop for monitoring. One of the key goals of Chukwa is to collect logs from Hadoop clusters. With Hadoop 1.0.4, the configuration files are located in HADOOP_HOME/conf (our HADOOP_CONF_DIR). To set up Chukwa to collect logs from Hadoop, we need to change some of these configuration files (the corresponding copy commands are shown right after this list):
  • Copy the CHUKWA_HOME/etc/chukwa/hadoop-log4j.properties file to HADOOP_CONF_DIR/log4j.properties
  • Copy the CHUKWA_HOME/etc/chukwa/hadoop-metrics2.properties file to HADOOP_CONF_DIR/hadoop-metrics2.properties
  • Edit the HADOOP_CONF_DIR/log4j.properties file and change “hadoop.log.dir” to your actual Chukwa log directory (i.e., CHUKWA_HOME/var/log)
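
With the paths used in this post (assuming CHUKWA_HOME and HADOOP_CONF_DIR are exported in the shell as configured above), those copies look like this:

cp $CHUKWA_HOME/etc/chukwa/hadoop-log4j.properties $HADOOP_CONF_DIR/log4j.properties
cp $CHUKWA_HOME/etc/chukwa/hadoop-metrics2.properties $HADOOP_CONF_DIR/hadoop-metrics2.properties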

Note. To avoid the following error: log4j:ERROR Could not instantiate class [org.apache.hadoop.chukwa.inputtools.log4j.ChukwaDailyRollingFileAppender], we need to copy the chukwa-client-xx.jar and json-simple-xx.jar files into the Hadoop lib directory. These files must be available on all Hadoop nodes.
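
A sketch of those copies, assuming the jars are found under the Chukwa share directory (the exact file names and locations depend on the release, so verify them before copying):

cp $CHUKWA_HOME/share/chukwa/chukwa-*-client.jar $HADOOP_HOME/lib/
cp $CHUKWA_HOME/share/chukwa/lib/json-simple-*.jar $HADOOP_HOME/lib/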

  10. Collector configuration. Since we are going to use HDFS for data storage in this tutorial, we must disable the HBase options and work only with the HDFS configuration parameters, such as writer.hdfs.filesystem, which should be set to the HDFS root URL where Chukwa will store its data. An example chukwa-collector-conf.xml file is shown below.

  <property>
    <name>chukwaCollector.writerClass</name>
    <value>org.apache.hadoop.chukwa.datacollection.writer.PipelineStageWriter</value>
  </property>

  <property>
    <name>chukwaCollector.pipeline</name>
    <value>org.apache.hadoop.chukwa.datacollection.writer.SocketTeeWriter,org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter</value>
  </property>

  <property>
    <name>chukwaCollector.localOutputDir</name>
    <value>/tmp/chukwa/dataSink/</value>
    <description>Chukwa local data sink directory, see LocalWriter.java</description>
  </property>

  <property>
    <name>chukwaCollector.writerClass</name>
    <value>org.apache.hadoop.chukwa.datacollection.writer.localfs.LocalWriter</value>
    <description>Local chukwa writer, see LocalWriter.java</description>
  </property>

  <property>
    <name>writer.hdfs.filesystem</name>
    <value>hdfs://localhost:9000</value>
    <description>HDFS to dump to</description>
  </property>

  <property>
    <name>chukwaCollector.outputDir</name>
    <value>/chukwa/logs/</value>
    <description>Chukwa data sink directory</description>
  </property>

  <property>
    <name>chukwaCollector.rotateInterval</name>
    <value>300000</value>
    <description>Chukwa rotate interval (ms)</description>
  </property>

  <property>
    <name>chukwaCollector.isFixedTimeRotatorScheme</name>
    <value>false</value>
    <description>A flag to indicate that the collector should close at a fixed
    offset after every rotateInterval. The default value is false, which uses
    the default scheme where collectors close after regular rotateIntervals.
    If set to true then specify chukwaCollector.fixedTimeIntervalOffset value.
    e.g., if isFixedTimeRotatorScheme is true and fixedTimeIntervalOffset is
    set to 10000 and rotateInterval is set to 300000, then the collector will
    close its files at 10 seconds past the 5 minute mark; if
    isFixedTimeRotatorScheme is false, collectors will rotate approximately
    once every 5 minutes
    </description>
  </property>

  <property>
    <name>chukwaCollector.fixedTimeIntervalOffset</name>
    <value>30000</value>
    <description>Chukwa fixed time interval offset value (ms)</description>
  </property>

  <property>
    <name>chukwaCollector.http.port</name>
    <value>8080</value>
    <description>The HTTP port number the collector will listen on</description>
  </property>

Note. Chukwa 0.5.0 includes the Hadoop libraries hadoop-core-1.0.0.jar and hadoop-test-1.0.0.jar. To communicate with our Hadoop 1.0.4 cluster (IPC Server version 4), it is necessary to replace these libraries, located in the chukwa-0.5.0/share/chukwa/lib directory, with the hadoop-core-1.0.4.jar and hadoop-test-1.0.4.jar files.
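
A sketch of that replacement, assuming Hadoop 1.0.4 is installed under /usr/local/hadoop-1.0.4 (in the Hadoop 1.0.4 distribution these two jars sit in the top-level directory; adjust the paths if your layout differs):

rm $CHUKWA_HOME/share/chukwa/lib/hadoop-core-1.0.0.jar $CHUKWA_HOME/share/chukwa/lib/hadoop-test-1.0.0.jar
cp /usr/local/hadoop-1.0.4/hadoop-core-1.0.4.jar /usr/local/hadoop-1.0.4/hadoop-test-1.0.4.jar $CHUKWA_HOME/share/chukwa/lib/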

  11. Once our configuration files have been modified, we can start the services and begin collecting data with Chukwa. For this, we first start the agent and then the collector, as follows:
[bautista@Zen-UnderLinx chukwa-0.5.0]$ bin/chukwa agent
OK chukwaAgent.checkpoint.dir [File] = /tmp/chukwa/log/
OK chukwaAgent.checkpoint.interval [Time] = 5000
WARN: option chukwaAgent.collector.retries may not exist; val = 144000
Guesses:
chukwaAgent.connector.retryRate Time
chukwaAgent.sender.retries Integral
chukwaAgent.control.remote Boolean
WARN: option chukwaAgent.collector.retryInterval may not exist; val = 20000
Guesses:
chukwaAgent.sender.retryInterval Integral
chukwaAgent.connector.retryRate Time
chukwaCollector.rotateInterval Time
OK chukwaAgent.control.port [Portno] = 9093
WARN: option chukwaAgent.hostname may not exist; val = localhost
Guesses:
chukwaAgent.control.remote Boolean
chukwaAgent.checkpoint.enabled Boolean
chukwaAgent.sender.retries Integral
OK chukwaAgent.sender.fastRetries [Integral] = 4
WARN: option syslog.adaptor.port.9095.facility.LOCAL1 may not exist; val = HADOOP
Guesses:
adaptor.dirscan.intervalMs Integral
adaptor.memBufWrapper.size Integral
chukwaAgent.adaptor.context.switch.time Time
No checker rules for: chukwaAgent.checkpoint.name chukwaAgent.tags
[bautista@Zen-UnderLinx chukwa-0.5.0]$
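
At this point the agent should be listening on its control port (9093 in our configuration). As a quick sanity check, you can connect to that port with telnet and issue the list command, which prints the currently registered adaptors one per line (this assumes the agent runs on localhost with the default control port):

telnet localhost 9093
list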
  12. Next, we start the Hadoop services:
[bautista@Zen-UnderLinx hadoop-1.0.4]$ bin/start-all.sh
starting namenode, logging to /usr/local/hadoop-1.0.4/bin/../logs/hadoop-bautista-namenode-Zen-UnderLinx.out
localhost: starting datanode, logging to /usr/local/hadoop-1.0.4/bin/../logs/hadoop-bautista-datanode-Zen-UnderLinx.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop-1.0.4/bin/../logs/hadoop-bautista-secondarynamenode-Zen-UnderLinx.out
starting jobtracker, logging to /usr/local/hadoop-1.0.4/bin/../logs/hadoop-bautista-jobtracker-Zen-UnderLinx.out
localhost: starting tasktracker, logging to /usr/local/hadoop-1.0.4/bin/../logs/hadoop-bautista-tasktracker-Zen-UnderLinx.out
[bautista@Zen-UnderLinx hadoop-1.0.4]$
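
Optionally, the JDK's jps tool can be used to confirm that the daemons are running; on this single-node setup we expect to see NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker among the listed JVMs:

[bautista@Zen-UnderLinx hadoop-1.0.4]$ jps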
  13. Finally, we start the collector with the following command:
[bautista@Zen-UnderLinx chukwa-0.5.0]$ bin/chukwa collector
[bautista@Zen-UnderLinx chukwa-0.5.0]$ WARN: option chukwa.data.dir may not exist; val = /chukwa
Guesses:
chukwaRootDir null
fs.default.name URI
nullWriter.dataRate Time
WARN: option chukwa.tmp.data.dir may not exist; val = /chukwa/temp
Guesses:
chukwaRootDir null
nullWriter.dataRate Time
chukwaCollector.tee.port Integral
WARN: option chukwaCollector.fixedTimeIntervalOffset may not exist; val = 30000
Guesses:
chukwaCollector.minPercentFreeDisk Integral
chukwaCollector.tee.keepalive Boolean
chukwaCollector.http.threads Integral
OK chukwaCollector.http.port [Integral] = 8080
WARN: option chukwaCollector.isFixedTimeRotatorScheme may not exist; val = false
Guesses:
chukwaCollector.writeChunkRetries Integral
chukwaCollector.showLogs.enabled Boolean
chukwaCollector.minPercentFreeDisk Integral
OK chukwaCollector.localOutputDir [File] = /tmp/chukwa/dataSink/
OK chukwaCollector.pipeline [ClassName list] = org.apache.hadoop.chukwa.datacollection.writer.SocketTeeWriter,org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter
OK chukwaCollector.rotateInterval [Time] = 300000
OK chukwaCollector.writerClass [ClassName] = org.apache.hadoop.chukwa.datacollection.writer.localfs.LocalWriter
OK writer.hdfs.filesystem [URI] = hdfs://localhost:9000
No checker rules for: chukwaCollector.outputDir
started Chukwa http collector on port 8080

[bautista@Zen-UnderLinx chukwa-0.5.0]$
  14. In a few minutes, we will see that Chukwa has collected some dataSinkArchives files, which include the different metrics.
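
To inspect what has been collected, we can do a recursive listing of the Chukwa root directory in HDFS (the paths follow the configuration above; fs -lsr is the recursive listing command in Hadoop 1.x):

[bautista@Zen-UnderLinx hadoop-1.0.4]$ bin/hadoop fs -lsr /chukwa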

In the next post, we will modify our configuration files to store the collected metrics in HBase.