ELK stands for:
- Elasticsearch - stores all of the logs
- Logstash - processes incoming logs
- Kibana - web interface for searching and visualizing logs, proxied through Nginx
Filebeat - installed on client servers that will send their logs to Logstash. Filebeat is a good fit for smaller log collection; other log-shipper options include Fluentd.
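A minimal filebeat.yml sketch (Filebeat 6.x syntax; older releases use input_type, newer ones use filebeat.inputs), assuming a Beats input on port 5044 of a host called elk-server (both placeholders; point it at your Logstash server):

filebeat.prospectors:
  - type: log
    paths:
      - /var/log/syslog
      - /var/log/auth.log
output.logstash:
  hosts: ["elk-server:5044"]   # placeholder host; use your Logstash server's address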
I installed via apt-get, so I'm running these as systemd services with systemctl, e.g.
sudo systemctl start kibana
sudo systemctl start elasticsearch
sudo systemctl start nginx
If running locally, make sure to set vm.max_map_count to at least 262144:
sudo sysctl -w vm.max_map_count=262144
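To make that persist across reboots, add it to /etc/sysctl.conf (or a drop-in under /etc/sysctl.d/):

echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p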
Elasticsearch is configured through /etc/elasticsearch/elasticsearch.yml
Kibana is configured through /etc/kibana/kibana.yml or /opt/kibana/config/kibana.yml
Set up your /etc/elasticsearch/elasticsearch.yml file so that it has
network.host: localhost
(In the docker-elk setup described below, the same settings live in elasticsearch/config/elasticsearch.yml.)
Set up your /opt/kibana/config/kibana.yml file with
server.host: "localhost"
For Nginx, also install apache2-utils (which provides htpasswd):
sudo apt-get install nginx apache2-utils
# Create an admin user, e.g. 'kibanaadmin' (but use a different name)
sudo htpasswd -c /etc/nginx/htpasswd.users kibanaadmin
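A rough sketch of the Nginx server block that proxies Kibana behind that htpasswd file, assuming Kibana is listening on localhost:5601 and example.com stands in for your server name:

server {
    listen 80;
    server_name example.com;    # placeholder; use your server's hostname or IP

    auth_basic "Restricted Access";
    auth_basic_user_file /etc/nginx/htpasswd.users;

    location / {
        # forward everything to Kibana
        proxy_pass http://localhost:5601;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}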
The Logstash settings are stored in logstash/config/logstash.yml (/etc/logstash/logstash.yml on an apt install).
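logstash.yml only holds process-level settings; the pipelines themselves live in separate .conf files (typically /etc/logstash/conf.d/ on an apt install, or logstash/pipeline/ in the docker-elk repo below). A minimal pipeline sketch that accepts Beats on 5044 and raw TCP on 5000 and writes to a local Elasticsearch:

input {
  beats {
    port => 5044
  }
  tcp {
    port => 5000
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"   # daily indices, matching the logstash-* pattern used later
  }
}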
Kibana’s Getting Started - https://www.elastic.co/guide/en/kibana/current/getting-started.html
Shakespeare Data
{
    "line_id": INT,
    "play_name": "String",
    "speech_number": INT,
    "line_number": "String",
    "speaker": "String",
    "text_entry": "String",
}
Accounts Data
{
    "account_number": INT,
    "balance": INT,
    "firstname": "String",
    "lastname": "String",
    "age": INT,
    "gender": "M or F",
    "address": "String",
    "employer": "String",
    "email": "String",
    "city": "String",
    "state": "String"
}
Schema for Logs Data
{
    "memory": INT,
    "geo.coordinates": "geo_point"
    "@timestamp": "date"
}
Setup mapping for the datasets
Shakespeare Mapping
curl -XPUT 'localhost:9200/shakespeare?pretty' -H 'Content-Type: application/json' -d'
{
 "mappings": {
  "doc": {
   "properties": {
    "speaker": {"type": "keyword"},
    "play_name": {"type": "keyword"},
    "line_id": {"type": "integer"},
    "speech_number": {"type": "integer"}
   }
  }
 }
}
'
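You can sanity-check the result with:

curl -XGET 'localhost:9200/shakespeare/_mapping?pretty'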
The accounts data doesn't require a mapping.
Log Mappings
curl -XPUT 'localhost:9200/logstash-2015.05.18?pretty' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "log": {
      "properties": {
        "geo": {
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
'
Load data sets into Elasticsearch using the Elasticsearch API:
curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @accounts.json
curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/shakespeare/doc/_bulk?pretty' --data-binary @shakespeare_6.0.json
curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/_bulk?pretty' --data-binary @logs.jsonl
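Then verify that everything loaded; each index should show a non-zero docs.count:

curl -XGET 'localhost:9200/_cat/indices?v'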
To see all of these pieces working together, try the docker-compose setup below.
So far the best docker-elk setup I’ve seen is here:
https://github.com/deviantony/docker-elk
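Roughly, assuming Docker and docker-compose are already installed:

git clone https://github.com/deviantony/docker-elk.git
cd docker-elk
docker-compose up -d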
You'll see the following on your localhost:

- 5601 - Kibana
- 9300 - Elasticsearch TCP transport
- 9200 - Elasticsearch HTTP
- 5000 - Logstash TCP input
Elasticsearch
will@xps ~ $ curl localhost:9200
{
  "name" : "vCF3oXg",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "q49MoAYWRzSTgpnV2VU2rw",
  "version" : {
    "number" : "6.2.2",
    "build_hash" : "10b1edd",
    "build_date" : "2018-02-16T19:01:30.685723Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
First we need to inject some log entries, e.g. by piping a log file into the Logstash TCP input with netcat:
$ nc localhost 5000 < /path/to/logfile.log
After getting some logs, we’ll create an index pattern via the Kibana API:
$ curl -XPOST -D- 'http://localhost:5601/api/saved_objects/index-pattern' \
    -H 'Content-Type: application/json' \
    -H 'kbn-version: 6.2.2' \
    -d '{"attributes":{"title":"logstash-*","timeFieldName":"@timestamp"}}'
There are a lot of different logging formats and systems; the two you'll run into most on Linux are syslog and journald.
Syslog is the standard solution for logging on UNIX.
In a normal configuration, log messages are written to plain text files. It's simple, but plain text files lack structure. If there is a lot of unrelated information in one file, it's hard to filter down to the topics you want. If you instead split files up by pre-defined topics, you end up with many smaller files and no easy way to correlate information between them.
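For example, to write a test message through syslog and then dig it back out of the flat file (the path is the Debian/Ubuntu default):

logger -p user.info "test message from logger"
grep "test message" /var/log/syslog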
Advantages:
Disadvantages:
Journald replaces text files with a more structured format while retaining full syslog compatibility (e.g. forwarding plain-text versions of messages to an existing syslog).
Advantages:
Disadvantages:
Journald is queried with journalctl. See https://tools.ietf.org/html/rfc5424 for the structured-logging RFC (the syslog protocol), which defines common fields all logs should have, including hostname, app name, datetime, message, and priority.
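A few journalctl queries that show off that structure (the unit name is just an example):

journalctl -u elasticsearch.service --since "1 hour ago"   # filter by unit and time
journalctl -p err -b                                       # only err priority and above, since last boot
journalctl -n 5 -o json-pretty                             # dump the structured fields as JSON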
In the Kibana query bar, you can search the index pattern using one of:
By default, the query bar uses the Lucene query syntax; you can also filter with the full Elasticsearch Query DSL.
Search for the word ‘foo’ in the ‘title’ field
title:foo
Search for phrase ‘foo bar’ in the ‘title’ field
title:"foo bar"
Search for phrase ‘foo bar’ in the ‘title’ field AND the phrase ‘quick fox’ in the ‘body’ field
title:"foo bar" AND body: "quick fox"
Each set of data loaded to Elasticsearch has an index pattern. An index pattern is a string with optional wildcards that can match multiple indices.
Index names often contain a date in the YYYY.MM.DD format, so an index pattern matching everything from May 2018 might look like logstash-2018.05*.
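The same wildcard works directly against the Elasticsearch API if you want to see which concrete indices a pattern matches:

curl -XGET 'localhost:9200/_cat/indices/logstash-2018.05*?v'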