Managing Log Files with Splunk
Matthew Sacks
As a systems or network administrator, you probably have to manage and process log files for the various systems that are present on your networks. In a mid-sized data center, you may have 100-200 nodes all dumping large amounts of sometimes repetitive, useless data into abyss-like log files. The log file is the primary source of information and the main starting point when complying with various security standards, during incident handling, and when troubleshooting errors with systems, network devices, and applications.
What if you could simply send syslog, SNMP, and application log data over the network to a central server and process all the logs there? Centralizing the logs certainly simplifies the tedium of scouring them host by host, but something more is needed to get a hold of the unwieldy logs that pervade every crevice of our networks. A database makes extremely large volumes of data far more manageable, letting you filter a single term or event out of all the log file chaos: finding the needle in the haystack. You may be used to logging in to each individual node and grepping or tailing log files until you find what you need, then repeating the same procedure on the next host to track down the problem you are troubleshooting. Searching through log files on hundreds of systems is demanding and time consuming when you are equipped with only a vi editor and a grep command.
Logging data over SNMP can be cumbersome, especially when you are dealing with hundreds of machines in a datacenter. It is also difficult to secure log data stored on the local machine; therefore, it is better to send the data to a centralized log server over syslog-udp or SNMP. But it can take hours upon hours to design a MySQL database and build a GUI front end that can store SNMP and syslog data.
Splunk Server
The free implementation of Splunk server allows you to dump all logs to a central server, pump the data into a database, and then search through it using common Boolean and regular expressions all wrapped up in a pretty Ajax user interface. I came across this product while setting out to design my own MySQL database where syslog-ng would capture network syslog dumps and insert them into my database. The task was cumbersome, to say the least, and let's just say that time doesn't come in abundance in our line of work.
System Requirements
To get the free implementation of Splunk server, go to splunk.com/download and download Splunk for your distribution. Splunk is supported on FreeBSD, Linux 2.4+ kernel, OS X Intel, OS X PPC, Solaris x86, and Solaris Sparc. In this article, I will cover installation on Red Hat Enterprise Linux 3, but you can use any of the above versions that you like. Do not confuse the free version of Splunk with the professional edition. It is free, but you should check splunk.com for more information about licensing and the differences between the free and professional versions.
Splunk recommends that you use a dual-processor server with 2.8 GHz clock speed and 5 GB of memory, and to allow for as much disk space as the raw data you are processing. Personally, I recommend that you use the highest-end server that you can afford, but depending on how much data you are processing, your hardware requirements will vary. The general rule here is that more is better, and when cost is not a factor, get the beefiest machine you can. Depending on how much data you are processing and how your Splunk server is configured, you should be able to get by on a single dual-core Intel processor with 2 GB of memory for a small implementation.
Installation
The installation of Splunk is very straightforward. In this example, we will do an rpm installation on a Red Hat ES3 Linux server.
Download the Splunk installation package for your distribution. The gzip archive will install on all versions of Linux. Read the instructions at http://www.splunk.com/doc/installation for installing from a tarball or Debian package:
rpm -i --force --prefix=/opt/splunk2.1 splunk-2.1-0.i386.rpm
Set the $SPLUNK_HOME environment variable in your .bash_profile, shell profile, or /etc/profile file. I use the bash shell, so I will add the $SPLUNK_HOME environment variable in my .bash_profile file.
- Type SPLUNK_HOME=/opt/splunk2.1 at the command prompt to set the environment variable.
- Type export SPLUNK_HOME to export the environment variable.
- Type $SPLUNK_HOME/bin/splunk start to fire up the Splunk daemon and Web server.
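The three steps above can be made permanent by appending the variable to your shell profile. This is a sketch assuming the bash shell and the rpm --prefix used earlier; adjust the path if yours differs:

```shell
# Lines to append to ~/.bash_profile so SPLUNK_HOME survives new logins.
# /opt/splunk2.1 matches the --prefix passed to rpm above.
SPLUNK_HOME=/opt/splunk2.1
export SPLUNK_HOME
```

After editing the file, run source ~/.bash_profile (or log in again) and start Splunk with $SPLUNK_HOME/bin/splunk start.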
Now you should have a basic implementation of the free version of Splunk server running. You can test the installation by pointing your Web browser to http://your-machine-ip:8000 (see Figure 1).
Splunk Data Input
Splunk has a number of ways in which it can index data. To index and mine logs over the network, the servers must make their logs available; in this article they will export the logs over NFS, though there are many other options for transmitting data over the network into Splunk. You can configure NFS over SSL for enhanced security, but that is beyond the scope of this article and will impede performance. I will assume that the NFS exports live in a secure subnet.
The amount of data transported over NFS is minimal, even when logging 15 different log files, but if you were to log Apache access logs from a busy Web server over NFS, you might experience network latency and create unnecessary congestion, so be careful when logging large volumes of data over the network. How much data should be transferred, and the appropriate network configuration for it, will vary with the amount of data being indexed and the capabilities of your network; you should be fine transporting your logs over NFS on your production network, assuming it is not already congested.
Configuring Splunk to Accept Remote Syslog Data
To set up a data input:
- Point your Web browser to http://localhost:8000.
- Click on the Admin Tab in the top left-hand corner of the Web interface.
- Click on the Data Inputs Tab.
Most network devices, such as routers, switches, and firewalls, can be configured to log to Splunk over syslog, but you will need to consult the vendor's documentation to find out how to configure logging on each device and which remote logging methods it supports. In general, you will only be required to define a host to receive the syslog messages. To start processing log data, we will need to generate some data for Splunk to process.
You can configure the default syslog daemon to log to a remote host. The Linux /var/log/messages file contains most of the syslog data produced by the host, including remote logins, which is critical when doing audit trail analysis after an intrusion or when troubleshooting various other systems-related problems.
Point syslog at a remote host by editing the following line in /etc/syslog.conf, the file used to configure syslog processing. It should look like the following:
*.info;mail.none;news.none;authpriv.none;cron.none /var/log/messages
Modify it so that it looks like this:
*.info;mail.none;news.none;authpriv.none;cron.none @Splunk-server
Splunk-server should be the hostname or IP address of your Splunk server. Syslog messages are now forwarded to the remote host, in this case your Splunk server, so any attack on a given host is logged in a remote location unbeknownst to a potential attacker, and the logs of a machine that is about to fail survive for troubleshooting.
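To see what actually travels to the Splunk server on UDP port 514, it helps to assemble a syslog line by hand. This sketch builds an RFC 3164-style message; the facility/severity pair (user.info) is just an example:

```shell
# PRI = facility * 8 + severity; facility 1 (user), severity 6 (info).
PRI=$(( 1 * 8 + 6 ))
# Timestamp and hostname follow the PRI, then a tag and free-form text.
MSG="<$PRI>$(date '+%b %e %H:%M:%S') $(hostname) test: forwarding check"
echo "$MSG"
```

The syslog daemon emits datagrams of this shape for every line matching the *.info selector above.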
Next, define a data input processor on your Splunk server for receiving this syslog data. Point your browser to http://splunk-server:8000 and log in.
- Navigate to Admin->Data Inputs.
- Click on the Network Ports Tab below Data Inputs.
- Click on the Add Input Button.
For remote-syslog, choose the following configuration:
Protocol: UDP
Port #: 514
Accept connections from all hosts? Yes
Set source type: from list
Source type: linux_messages_syslog
Do some activities that will generate syslog data, such as logging in to the machine over SSH, so you can verify that Splunk is receiving the data over the network. If you then log in to Splunk, you should see events that have been processed and indexed (see Figure 2). If you do not see any data, do a packet capture with wireshark or ngrep on the destination host to verify that packets are being received.
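A repeatable way to generate such test activity is the logger utility, which hands messages to the local syslog daemon; the tag splunktest is arbitrary and makes the events easy to search for afterwards:

```shell
# Send three distinctive test events through the local syslog daemon.
# "|| true" keeps the loop going on hosts without a syslog socket.
TAG=splunktest
COUNT=0
for i in 1 2 3; do
    logger -t "$TAG" "event $i from $(hostname)" 2>/dev/null || true
    COUNT=$((COUNT + 1))
done
echo "sent $COUNT test events tagged $TAG"
```

Searching Splunk for splunktest should then return these events if forwarding works end to end.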
Now you are ready to feed the remote syslog into Splunk. The safest way to add new data inputs to Splunk is through the Splunk command line or the Web user interface; for simplicity's sake, I use the Web interface in these examples. You can add as many hosts as desired in this fashion.
SNMP and NFS Configuration
SNMP is a network management protocol that has been around since 1990 and is embedded in almost all modern network devices and operating systems for the purpose of managing and monitoring network-enabled devices using a standardized protocol. You can retrieve volumes of information using SNMP that you otherwise would not have and graph it over time using a tool such as MRTG.
For Splunk to index the SNMP data, you will have to configure the SNMP daemon to receive SNMP traps on your Splunk server. Configuring an SNMP infrastructure can be cumbersome, especially for someone who has not configured it before, but almost all network devices and operating systems have native support for SNMP. Configuration of the SNMP daemon will be covered briefly here, but for more information about SNMP I recommend the book Essential SNMP by Kevin J. Schmidt and Douglas R. Mauro.
I will cover a quick and dirty configuration of SNMP so that your Splunk server can start writing the SNMP traps it receives to a file, which in turn will be processed and searchable (without complex grepping or formatting of an SNMP trap history log). The beauty of SNMP is that once a host sends a trap to a remote server over the network, regardless of what a potential intruder attempts to do to clear the logs, the S.O.S. message has been bottled, corked, and tossed in the ocean.
When following the audit trail of an intrusion, this setup provides a very secure means of being alerted to the situation, as well as keeping the logs in a secured location. (Splunk also comes with nifty e-mail alerting options, so you can configure alerts based on your search terms instead of writing your own alerting script for each different trap.)
To configure a Linux host to send SNMP traps, download the net-snmp tools, which are free and open source, from:
http://net-snmp.sourceforge.net
To compile the SNMP source, download net-snmp-5.3.1.tar.gz, then:
tar -xvzf net-snmp-5.3.1.tar.gz
cd net-snmp-5.3.1
./configure
Provided the configure script did not encounter any errors, it will prompt you with a series of questions. Below are the answers you will want to use to get the most basic SNMP configuration sending data to our Splunk server. A full explanation and configuration of SNMP is beyond the scope of this article, but there is plenty of documentation on advanced configurations at http://net-snmp.sourceforge.net.
Default Version of SNMP to Use (3): 2
System Contact Information (root@): matthew.sacks@gmail.com
System Location (Unknown): data center
Location to write logfile (/var/log/snmpd.log): /var/log/snmpd.log
Location to write persistent information (/var/net-snmp): /var/net-snmp
Now build and install the software:
shell# make
shell# make install
The minimal configuration necessary for SNMP on the Linux host resides in /etc/snmp/snmp.conf. The values we need to define here are as follows:
defversion 1
defcommunity public
logtimestamp yes
printnumericoids yes
Note: You never want to use the community public in real life because it is the default SNMP community; it is shown here only for testing purposes. Please change your SNMP communities to a more obscure name when you have successfully configured your server.
Now start your snmp daemon. I use the following command-line parameters:
snmpd -a -A -c /etc/snmp/snmpd.conf -C -Lf /var/log/snmpd.log
The -c option defines where to read the configuration, and the -Lf option defines where to write the log file. The -a option logs the source addresses of incoming requests. The -C option tells snmpd not to read any configuration files except those specified with -c; note that this also covers the persistent configuration files, so dynamically assigned values may be reset after an agent restart unless the relevant persistent config files are explicitly loaded with -c.
Now you may simply add the /var/log/snmpd.log file as a data input for Splunk and all traps received by the snmp daemon on that host will now be processed by Splunk as well.
Configuration of NFS
The location of the NFS exports configuration varies for each OS; consult your documentation to locate it. Most operating systems use a configuration similar to the one I describe here, which works for almost all Linux variants.
Configuring a Linux Server for Exporting Its Logs over NFS
In my case I just exported the entire /var/log directory over NFS and mounted it on my Splunk server. I exported the whole directory because if in the future I add an application that I would like to have logged, I can create a symbolic link in /var/log to the application's log directory, and it will be included in the NFS export.
Add the following entry to the /etc/exports configuration file on the host that will be exporting its logs. The /etc/exports file defines which directories are exported and which hosts may mount them.
#/etc/exports
/var/log ip-of-Splunk-server/255.255.255.0(ro,sync)
Now make sure the NFS server is running on the export host by typing the command /etc/init.d/nfs restart. Verify that the desired directory is being exported by typing the command showmount -e.
Next, mount the NFS export on your Splunk server by editing the fstab file, which defines the file systems that are automatically mounted at boot time. The location and name of this file vary by operating system (on Solaris, the fstab is the vfstab), but its configuration is basically the same across the board. First, create a mount point for the NFS mounts:
mkdir /var/log/imports
mkdir /var/log/imports/host1
To keep the structure of the imports organized, we create a directory under imports for each host. Then add the following entry to /etc/fstab on the Splunk server:
#/etc/fstab
export-host-ip:/var/log /var/log/imports/host1 nfs ro 0 0
Now log in to your Splunk server, add the entry above to /etc/fstab (or /etc/vfstab on Sun Solaris), and type mount -a, which should mount all entries in the fstab. If you do not see any errors, the NFS export is most likely mounted. You can confirm this by typing the mount command; you should see your import mounted on the server. Finally, change into the mount directory and verify that the files are present: you should see the /var/log directory of the remote machine.
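Composing the fstab line from variables makes it clear which address goes where; note that the entry names the export host (the machine whose /var/log we are importing), not the Splunk server itself. The address below is a placeholder:

```shell
# Host exporting its /var/log (placeholder address) and the local
# mount point created above on the Splunk server.
EXPORT_HOST=192.168.10.30
MNT=/var/log/imports/host1
LINE="${EXPORT_HOST}:/var/log ${MNT} nfs ro 0 0"
echo "$LINE"   # append this line to /etc/fstab, then run mount -a
```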
Now we can define the data source to be processed by Splunk:
- Navigate to your Splunk Web site http://splunk-server:8000.
- Navigate to Admin->Data Inputs->Files and Directories.
- Click on Add Input.
The tail data access method processes data from the file as if you were running a tail -f command on the host itself. This is as close to real-time log processing and indexing as you can get, but it takes more resources than some of the other methods. You may enter the path to a file or a directory for the tailfile data input method; if you enter the path of a directory, all files within that directory will be processed.
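The mechanics of tail-style following are easy to demonstrate in isolation: remember a byte offset, then read only what was appended after it. This is a self-contained sketch using a throwaway file, not Splunk's actual implementation:

```shell
# Simulate a tail-style follower on a plain file in /tmp.
LOG=/tmp/tailfile-demo.log
printf 'old line\n' > "$LOG"
OFFSET=$(wc -c < "$LOG")                      # bytes already consumed
printf 'new line 1\nnew line 2\n' >> "$LOG"   # the log grows
NEW=$(tail -c +"$((OFFSET + 1))" "$LOG")      # read only the fresh bytes
echo "$NEW"
rm -f "$LOG"
```

A real follower repeats this in a loop, updating OFFSET each pass, which is why tailing many files consumes more resources than a one-shot import.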
For an explanation of all the different data access methods you can go to http://www.splunk.com/support/ and read the data access manual. I prefer to stick strictly to tailfile, but you can tinker and find your personal preferences. Set up your tailfile as follows (see Figure 3):
- Data Access should be set to tailfile.
- Full path on server: /var/log/imports/host1
The set host option defines how you want to process and display the hostname in Splunk. I will set it manually, but when you are dealing with thousands of servers, the regex option can come in handy.
Set host: constant value
DNS Name or IP Address: host1
Set sourcetype: manual
Source type: snmp
You can use tailfile to process any log file on the face of the planet. It is very useful for application server logs and database log files, and I suggest using the NFS/tailfile configuration for these types of logs.
Next we verify that data is being processed by Splunk by logging in to our Splunk server and seeing whether the files we configured for indexing are being indexed.
Using the Splunk Web Interface
Syslog and SNMP data is now being transmitted over the network to a central location, where it is indexed into a database and made human-readable and easily searchable through the Web user interface. We have the data and it is being processed, so now what do we do with it?
Searching through Splunk is fairly straightforward, and a full search query reference is available on the Splunk support site. Searching the data that Splunk has indexed is very similar to using a typical search engine, and Splunk supports Boolean expressions for filtering and refining the results, so you can find a single event in an entire server farm. An example is given below of searching for all RPC events that are not from nfsd on a single host:
host::192.168.10.130 AND rpc NOT nfs
The Boolean expressions are case sensitive, so be careful. Splunk will also auto-complete the query as you are typing so that you may see all available information that has been processed by Splunk that is similar to your query. We can also specify a date range in addition to the query to narrow down a particular event to a given date and time, down to the second.
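A few more query patterns in the same spirit may be useful as starting points; the hostnames and source types here are illustrative, not required values:

```
sourcetype::linux_messages_syslog AND sshd AND "Failed password"
host::host1 AND error NOT cron
```

The first narrows by source type before filtering for failed SSH logins; the second excludes routine cron noise from a single host's error events.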
Conclusion
With this setup, you can import log data from multiple sources (network, database, and files), funnel it into a database over the network, and search through hundreds or thousands of log files in a matter of seconds. Splunk provides a way to get a handle on your log files and then some. The time saved searching through log files can now be dedicated to solving problems rather than finding them.
Matthew Sacks currently works as a systems/network engineer at Reunion.com, where he supports the production and corporate office networks in the datacenter. You can reach Matthew at matthew.sacks@gmail.com or on his forum at systemsnetworkadmin.com.