Storing real-time logs with elasticsearch + logstash [CDN realtime analytics]

I won't cover installation here.

The overall design:

nginx streams its access log to syslog-ng in real time via a named pipe
syslog-ng forwards the log lines to logstash over the internet via UDP
logstash ingests the lines into elasticsearch and indexes them
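The syslog-ng to logstash hop is just plain UDP datagrams, one log line per packet. A minimal Python sketch of that leg, with sender and receiver in one process (port 9999 matches the configs below; the log line itself is made up):

```python
import socket

# receiver: stands in for the logstash udp input
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 9999))

# sender: stands in for syslog-ng's udp() destination
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
line = b"2014-05-01T12:00:00+08:00 example.com 0.123 200 512\n"
tx.sendto(line, ("127.0.0.1", 9999))

data, addr = rx.recvfrom(4096)
print(data.decode())
```

Note that UDP gives no delivery guarantee; for realtime dashboards that is usually an acceptable trade for low overhead.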

Sender side:
nginx.conf

# ...
log_format real_time '- $time_iso8601 $host $request_time $status $bytes_sent';
server {
        listen 80;
        server_name my_test_rt;
        access_log /dev/realtime.pipe real_time;
        location / {
                proxy_pass http://backend.com;
        }
    }
# ...
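A line in the real_time format above looks like the sample below (all values invented). A quick Python sketch of pulling the fields back out, mirroring what the grok filter on the receiver does:

```python
import re

# hypothetical line as nginx's real_time format would emit it
line = "- 2014-05-01T12:00:00+08:00 example.com 0.123 200 512"

# '- $time_iso8601 $host $request_time $status $bytes_sent'
pattern = re.compile(
    r"- (?P<timestamp>\S+) (?P<host>\S+) "
    r"(?P<request_time>[\d.]+) (?P<status>\d+) (?P<bytes_sent>\d+)"
)
fields = pattern.match(line).groupdict()
print(fields["host"], fields["status"], fields["bytes_sent"])
```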

syslog-ng.conf

source s_pipe {
	pipe("/dev/realtime.pipe"); };

destination d_udp { udp("127.0.0.1" port(9999) template ("$MSG\n") ); };

log {source(s_pipe); destination(d_udp); };
# create the named pipe:
mkfifo /dev/realtime.pipe

# start syslog-ng first; a FIFO blocks writers until a reader attaches,
# so nginx would hang at startup otherwise
service syslog-ng start

service nginx start

Receiver side:
/etc/logstash/conf.d/rt.conf

input {
	udp {
		port => 9999
	}
}

filter {
  grok {
    match => [ "message", "%{TIMESTAMP_ISO8601:timestamp} %{IPORHOST:host} %{IPORHOST:domain} %{NUMBER:request_time} %{NUMBER:status} %{NUMBER:bytes_sent}" ]
  }
  mutate {
    remove_field => [ "message", "@version" ]
  }
}
output {
	elasticsearch {
		host => "127.0.0.1"
		flush_size => 1
		index => "rt-%{+YYYY.MM.dd.HH.mm}"
	}
}
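The index => "rt-%{+YYYY.MM.dd.HH.mm}" setting rotates to a new index every minute, which keeps each index small and easy to drop once it is no longer "realtime". The pattern is interpolated from each event's timestamp; in Python strftime terms it expands roughly like this (sample timestamp is illustrative):

```python
from datetime import datetime

# a sample event timestamp
ts = datetime(2014, 5, 1, 12, 30)

# logstash's "rt-%{+YYYY.MM.dd.HH.mm}" expands per event roughly as:
index = ts.strftime("rt-%Y.%m.%d.%H.%M")
print(index)  # rt-2014.05.01.12.30
```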

Start logstash and elasticsearch, and the whole pipeline is up and running.


First look at elasticsearch + logstash + kibana

I spent a little time setting up a basic test environment to analyze apache logs.
Most of the material online is out of date, so I'm logging the process here.

Test environment: CentOS 6.3, 64-bit

/********************************************************/
Installation:

elasticsearch [goto]

#Download and install the Public Signing Key
rpm --import http://packages.elasticsearch.org/GPG-KEY-elasticsearch

#Add the following in your /etc/yum.repos.d/elasticsearch.repo
[elasticsearch-1.1]
name=Elasticsearch repository for 1.1.x packages
baseurl=http://packages.elasticsearch.org/elasticsearch/1.1/centos
gpgcheck=1
gpgkey=http://packages.elasticsearch.org/GPG-KEY-elasticsearch
enabled=1

#Install
yum install elasticsearch

logstash [goto]

#Add the key
rpm --import http://packages.elasticsearch.org/GPG-KEY-elasticsearch

#Add the following in your /etc/yum.repos.d/logstash.repo
[logstash-1.4]
name=logstash repository for 1.4.x packages
baseurl=http://packages.elasticsearch.org/logstash/1.4/centos
gpgcheck=1
gpgkey=http://packages.elasticsearch.org/GPG-KEY-elasticsearch
enabled=1

#Install logstash with:
yum install logstash

kibana [download]
kibana needs no installation: unpack it into a directory served by httpd and it works as-is. It is a pure HTML5 application (essentially a static website). Installed on the same host as elasticsearch it needs no configuration; if elasticsearch runs elsewhere, edit config.js in the kibana directory to point at its URL.
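For the remote-elasticsearch case, the relevant setting in kibana 3's config.js looks roughly like this (the hostname is a placeholder):

```javascript
// config.js (kibana 3): point the browser app at the elasticsearch HTTP port
elasticsearch: "http://es.example.com:9200",
```

Remember that this URL is fetched by the visitor's browser, not by the kibana host, so it must be reachable from the client side.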

/********************************************************/
Configuration:

elasticsearch needs no configuration; just start it:

/etc/init.d/elasticsearch start

P.S. Oddly, this package installs itself under /usr/share by default.

logstash ships without a config file; its config directory is /etc/logstash/conf.d/.
For example, I created the following config file in that directory:

#/etc/logstash/conf.d/seven.conf
input {
  file {
    path => "/var/log/httpd/access_json.log"
    type => "apache"

    # This format tells logstash to expect 'logstash' json events from the file.
    format => json_event
  }
}

output {
  elasticsearch {
    host => "127.0.0.1"
  }
}

To explain: the input here expects the apache log in JSON format, which means apache's logging has to be adapted accordingly. This is simpler than the redis or grok approaches; the apache configuration is given below.

Important:

#/etc/init.d/logstash
...
name=logstash
pidfile="/var/run/$name.pid"

#change the user and group from logstash to root, otherwise logstash has no permission to read the apache logs
LS_USER=root
LS_GROUP=root
LS_HOME=/var/lib/logstash
LS_HEAP_SIZE="500m"
...

kibana needs no configuration either; it runs out of the box.

apache [goto]

# Create a log format called 'logstash_json' that emits, in json, the parts of an http
# request I care about. For more details on the features of the 'LogFormat'
# directive, see the apache docs:
# http://httpd.apache.org/docs/2.2/mod/mod_log_config.html#formats
LogFormat "{ \"@timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \"@fields\": { \"client\": \"%a\", \"duration_usec\": %D, \"status\": %s, \"request\": \"%U%q\", \"method\": \"%m\", \"referrer\": \"%{Referer}i\" } }" logstash_json

LogFormat "{ \"@timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \"@message\": \"%r\", \"@fields\": { \"user-agent\": \"%{User-agent}i\", \"client\": \"%a\", \"duration_usec\": %D, \"duration_sec\": %T, \"status\": %s, \"request_path\": \"%U\", \"request\": \"%U%q\", \"method\": \"%m\", \"referrer\": \"%{Referer}i\" } }" logstash_ext_json

# Write our 'logstash_json' logs to logs/access_json.log
CustomLog logs/access_json.log logstash_ext_json
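Each request then produces one JSON object per line, which logstash can consume without any grok work. A sketch of what a logstash_json line looks like and how it parses (sample values are invented):

```python
import json

# hypothetical line as the logstash_json LogFormat would emit it
line = ('{ "@timestamp": "2014-05-01T12:00:00+0800", '
        '"@fields": { "client": "203.0.113.7", "duration_usec": 1234, '
        '"status": 200, "request": "/index.html?x=1", '
        '"method": "GET", "referrer": "-" } }')

event = json.loads(line)
print(event["@fields"]["status"], event["@fields"]["request"])
```

Note that numeric fields like %D and %s are emitted unquoted, so they arrive as real JSON numbers rather than strings.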

The cookbook [goto] also shows how to make apache emit both traditional raw logs and JSON logs at the same time; I haven't tried it.

(kibana screenshot)


Getting started with Whoosh

Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python.

Programmers can use it to easily add search functionality to their applications and websites.

Every part of how Whoosh works can be extended or replaced to meet your needs exactly.

Some of Whoosh's features include:

  • Pythonic API.
  • Pure-Python. No compilation or binary packages needed, no mysterious crashes.
  • Fielded indexing and search.
  • Fast indexing and retrieval — faster than any other pure-Python search solution I know of.
  • Pluggable scoring algorithm (including BM25F), text analysis, storage, posting format, etc.
  • Powerful query language.
  • Production-quality pure Python spell-checker (as far as I know, the only one).
