In the "Hello World" example we already saw Logstash's run flow and the basics of its configuration syntax. Keep one principle in mind: a Logstash configuration must have at least one input and one output. In the demonstrations that follow, if no input is written, the input/stdin plugin we already demonstrated in "Hello World" is used by default; likewise, a missing output defaults to output/stdout.
If you run into problems, consult this document: http://udn.yyuap.com/doc/logstash-best-practice-cn/input/index.html. The input plugins are explained in detail below.
(1) Standard input. type and tags are special fields on a Logstash event. type marks the event type: we generally know in advance what type an event belongs to. tags, by contrast, are added or removed by individual plugins during processing.
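As a sketch of how these two fields are typically used downstream, a filter or output can branch on type and manipulate tags (the type value "std-lqb" matches the examples below; the "processed" tag is a hypothetical name introduced here for illustration):

```
filter {
    if [type] == "std-lqb" {
        mutate { add_tag => ["processed"] }   # "processed" is a hypothetical tag name
    }
}
output {
    if "processed" in [tags] {
        stdout { codec => rubydebug }
    }
}
```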
```
[root@localhost test]# vim stdin.conf
input {
    stdin {
        add_field => { "key" => "value" }
        codec     => "plain"
        tags      => ["add"]
        type      => "std-lqb"
    }
}
output {
    stdout {
        codec => rubydebug
    }
}

[root@localhost logstash]# /usr/local/logstash/bin/logstash -f test/stdin.conf
Settings: Default pipeline workers: 1
Logstash startup completed
hello world
{
       "message" => "hello world",
      "@version" => "1",
    "@timestamp" => "2017-05-24T08:11:45.852Z",
          "type" => "std-lqb",
           "key" => "value",
          "tags" => [
        [0] "add"
    ],
          "host" => "localhost.localdomain"
}
abclqb
{
       "message" => "abclqb",
      "@version" => "1",
    "@timestamp" => "2017-05-24T08:13:21.192Z",
          "type" => "std-lqb",
           "key" => "value",
          "tags" => [
        [0] "add"
    ],
          "host" => "localhost.localdomain"
}
```

##### Modify stdin.conf, adding more tags:

```
[root@localhost test]# vim stdin.conf
input {
    stdin {
        add_field => { "key" => "value22222222222222222222222222222222222222222222" }
        codec     => "plain"
        tags      => ["add", "xxyy", "abc"]
        type      => "std-lqb"
    }
}
output {
    stdout {
        codec => rubydebug
    }
}

[root@localhost logstash]# /usr/local/logstash/bin/logstash -f test/stdin.conf
Settings: Default pipeline workers: 1
Logstash startup completed
hello world
{
       "message" => "hello world",
      "@version" => "1",
    "@timestamp" => "2017-05-24T09:07:43.228Z",
          "type" => "std-lqb",
           "key" => "value22222222222222222222222222222222222222222222",
          "tags" => [
        [0] "add",
        [1] "xxyy",
        [2] "abc"
    ],
          "host" => "localhost.localdomain"
}
```

##### Branch on tags:

```
[root@localhost test]# vim stdin_2.conf
input {
    stdin {
        add_field => { "key11" => "value22" }
        codec     => "plain"
        tags      => ["add", "xxyy"]
        type      => "std"
    }
}
output {
    if "tttt" in [tags] {
        stdout {
            codec => rubydebug { }
        }
    } else if "add" in [tags] {
        stdout {
            codec => json
        }
    }
}

[root@localhost logstash]# /usr/local/logstash/bin/logstash -f test/stdin_2.conf
Settings: Default pipeline workers: 1
Logstash startup completed
yyxxx
{"message":"yyxxx","@version":"1","@timestamp":"2017-05-24T09:32:25.840Z","type":"std","key11":"value22","tags":["add","xxyy"],"host":"localhost.localdomain"}

{"message":"","@version":"1","@timestamp":"2017-05-24T09:32:32.480Z","type":"std","key11":"value22","tags":["add","xxyy"],"host":"localhost.localdomain"}
xxyy
{"message":"xxyy","@version":"1","@timestamp":"2017-05-24T09:32:42.249Z","type":"std","key11":"value22","tags":["add","xxyy"],"host":"localhost.localdomain"}
```
(2) Reading files. Logstash uses a Ruby gem called FileWatch to watch for file changes. The library supports glob expansion of file paths, and it keeps a database file called .sincedb to track the current read position of every watched log file, so don't worry about Logstash missing your data.
```
[root@localhost test]# cat log.conf
input {
    file {
        path           => "/usr/local/nginx/logs/access.log"
        type           => "system"
        start_position => "beginning"
    }
}
output {
    stdout {
        codec => rubydebug
    }
}

[root@localhost logstash]# /usr/local/logstash/bin/logstash -f test/log.conf
Settings: Default pipeline workers: 1
Logstash startup completed
{
       "message" => "192.168.181.231 - - [24/May/2017:15:04:29 +0800] \"GET / HTTP/1.1\" 502 537 \"-\" \"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36\" \"-\"",
      "@version" => "1",
    "@timestamp" => "2017-05-24T09:39:16.600Z",
          "path" => "/usr/local/nginx/logs/access.log",
          "host" => "localhost.localdomain",
          "type" => "system"
}
{
       "message" => "192.168.181.231 - - [24/May/2017:15:04:32 +0800] \"GET / HTTP/1.1\" 502 537 \"-\" \"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36\" \"-\"",
      "@version" => "1",
    "@timestamp" => "2017-05-24T09:39:16.614Z",
          "path" => "/usr/local/nginx/logs/access.log",
          "host" => "localhost.localdomain",
          "type" => "system"
}
```
Explanation:

Several useful options control the behavior of the FileWatch library:

- discover_interval: how often Logstash checks the watched path for new files. The default is 15 seconds.
- exclude: files you do not want watched can be excluded here; glob expansion is supported, just as for path.
- sincedb_path: if you do not want the default $HOME/.sincedb (on Windows, C:\Windows\System32\config\systemprofile\.sincedb), use this option to place the sincedb file somewhere else.
- sincedb_write_interval: how often Logstash writes the sincedb file. The default is 15 seconds.
- stat_interval: how often Logstash checks each watched file for updates. The default is 1 second.
- start_position: where Logstash starts reading file data. The default is the end of the file, meaning the Logstash process runs much like tail -F. If you want to import existing data, set this to "beginning": Logstash then reads from the top, a bit like cat, except that it does not stop at the last line but keeps following the file, tail -F style.
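Putting several of these options together, a file input might look like the following sketch. The paths, intervals, and sincedb location here are illustrative assumptions, not values from this article:

```
input {
    file {
        path              => ["/var/log/app/app.log", "/var/log/app/api.log"]  # hypothetical absolute paths
        exclude           => "*.gz"                # skip rotated archives
        discover_interval => 30                    # look for new files every 30 s
        stat_interval     => 2                     # poll watched files every 2 s
        sincedb_path      => "/var/lib/logstash/sincedb"  # hypothetical location
        start_position    => "beginning"           # import existing data
    }
}
```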
Notes:

- To import existing data into Elasticsearch, you will usually also need the filter/date plugin to rewrite the default "@timestamp" field value. We will cover this later.
- FileWatch only supports absolute file paths and does not recurse into directories automatically. If you need multiple files, spell them out explicitly as an array.
- LogStash::Inputs::File only initializes one FileWatch object, during the registration phase of the running process. It therefore cannot support a fluentd-style path => "/path/to/%{+yyyy/MM/dd/hh}.log". To achieve the same effect, you have to write path => "/path/to/*/*/*/*.log".
- start_position only takes effect when a file has never been watched before. If the sincedb file already holds a record for that file's inode, Logstash resumes reading from the recorded position. So when testing repeatedly, delete the sincedb file before every run.
- Windows has no concept of inodes, so some Logstash versions are unreliable at watching files on Windows. On that platform, consider using nxlog as the collection agent instead.
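Following the start_position note above, a common trick when re-running tests is to clear the recorded read position first. The filename pattern below assumes the default sincedb location; adjust it if you set sincedb_path:

```
# Remove the recorded read positions so start_position => "beginning" applies again:
rm -f $HOME/.sincedb*
# Alternatively, inside the file input, discard positions entirely (useful for test-only pipelines):
#   sincedb_path => "/dev/null"
```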
(3) TCP input. In the future you will probably use a Redis server or some other message queue system in the Logstash broker role. But Logstash also ships with its own TCP/UDP plugins, which are serviceable for ad-hoc jobs, especially in test environments.
```
[root@localhost test]# cat tcp.conf
input {
    tcp {
        port       => 8888
        mode       => "server"
        ssl_enable => false
    }
}
output {
    stdout {
        codec => rubydebug
    }
}

[root@localhost logstash]# /usr/local/logstash/bin/logstash -f test/tcp.conf
Settings: Default pipeline workers: 1
Logstash startup completed
{
       "message" => "GET /jenkins/ HTTP/1.1\r",
      "@version" => "1",
    "@timestamp" => "2017-05-24T10:09:53.980Z",
          "host" => "192.168.181.231",
          "port" => 59426
}
{
       "message" => "Host: 192.168.180.9:8888\r",
      "@version" => "1",
    "@timestamp" => "2017-05-24T10:09:54.175Z",
          "host" => "192.168.181.231",
          "port" => 59426
}
{
       "message" => "Connection: keep-alive\r",
      "@version" => "1",
    "@timestamp" => "2017-05-24T10:09:54.180Z",
          "host" => "192.168.181.231",
          "port" => 59426
}
```
Note: first stop whatever application is occupying port 8888, then start the pipeline; the log output shown above is then produced.
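To exercise the TCP input without a browser, you can push a test line at it with nc (the address matches the session above; any TCP client works):

```
echo "hello tcp" | nc 192.168.180.9 8888
```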
(4) Codec plugins.
Codec is a concept Logstash introduced in version 1.3.0 (the name contracts coder/decoder). Before that, Logstash only accepted plain-text input and then processed it with filters; now, thanks to the codec setting, different data types can be handled already at the input stage. We in fact used a codec in the first "Hello World" example: rubydebug is one, although it normally appears only in the stdout plugin, as a configuration-testing or debugging tool.
(4.1) JSON encoding: feed in predefined JSON data directly, and you can skip the filter/grok configuration entirely!
The example configuration uses nginx; the steps are:
a. Edit the nginx configuration file nginx.conf: comment out the original log_format, replace it with a JSON format, then restart nginx.
```
[root@localhost test]# vim /usr/local/nginx/conf/nginx.conf
user ftp;
worker_processes 2;
worker_rlimit_nofile 65535;
events {
    use epoll;
    worker_connections 1024;
}
http {
    include       mime.types;
    default_type  application/octet-stream;
    include       proxy.conf;
    #log_format main '$remote_addr - $remote_user [$time_local] "$request" '
    #                '$status $body_bytes_sent "$http_referer" '
    #                '"$http_user_agent" "$http_x_forwarded_for"';
    log_format json '{"@timestamp":"$time_iso8601",'
                    '"@version":"1",'
                    '"host":"$server_addr",'
                    '"client":"$remote_addr",'
                    '"size":$body_bytes_sent,'
                    '"responsetime":$request_time,'
                    '"domain":"$host",'
                    '"url":"$uri",'
                    '"status":"$status"}';
    access_log logs/nginx_access.log json;
    # access_log logs/access.log main;
```
Note: there are no double quotes around the $request_time and $body_bytes_sent variables, because these two values should be numeric in the JSON.
b. Edit your Logstash configuration file json.conf:
```
[root@localhost test]# vim json.conf
input {
    file {
        path           => "/usr/local/nginx/logs/nginx_access.log"
        type           => "nginx"
        start_position => "beginning"
        add_field      => { "key" => "value" }
        codec          => "json"
    }
}
output {
    stdout {
        codec => rubydebug { }
    }
}
```
c. Start Logstash with this configuration and test:
```
[root@localhost logstash]# /usr/local/logstash/bin/logstash -f test/json.conf
Settings: Default pipeline workers: 1
Logstash startup completed
{
      "@timestamp" => "2017-05-25T03:26:19.000Z",
        "@version" => "1",
            "host" => "192.168.180.9",
          "client" => "192.168.181.231",
            "size" => 8250,
    "responsetime" => 0.157,
          "domain" => "192.168.180.9",
             "url" => "/",
          "status" => "200",
            "path" => "/usr/local/nginx/logs/nginx_access.log",
            "type" => "nginx",
             "key" => "value"
}
{
      "@timestamp" => "2017-05-25T03:26:19.000Z",
        "@version" => "1",
            "host" => "192.168.180.9",
          "client" => "192.168.181.231",
            "size" => 450,
    "responsetime" => 0.017,
          "domain" => "192.168.180.9",
             "url" => "/sc.do",
          "status" => "200",
            "path" => "/usr/local/nginx/logs/nginx_access.log",
            "type" => "nginx",
             "key" => "value"
}
{
      "@timestamp" => "2017-05-25T03:26:19.000Z",
        "@version" => "1",
            "host" => "192.168.180.9",
          "client" => "192.168.181.231",
            "size" => 16,
    "responsetime" => 0.083,
          "domain" => "192.168.180.9",
             "url" => "/logger/catch.do",
          "status" => "200",
            "path" => "/usr/local/nginx/logs/nginx_access.log",
            "type" => "nginx",
             "key" => "value"
}
{
      "@timestamp" => "2017-05-25T03:26:19.000Z",
        "@version" => "1",
            "host" => "192.168.180.9",
          "client" => "192.168.181.231",
            "size" => 41153,
    "responsetime" => 0.362,
          "domain" => "192.168.180.9",
             "url" => "/getPageData.do",
          "status" => "200",
            "path" => "/usr/local/nginx/logs/nginx_access.log",
            "type" => "nginx",
             "key" => "value"
}
{
      "@timestamp" => "2017-05-25T03:26:20.000Z",
        "@version" => "1",
            "host" => "192.168.180.9",
          "client" => "192.168.181.231",
            "size" => 51042,
    "responsetime" => 0.565,
          "domain" => "192.168.180.9",
             "url" => "/getPageData.do",
          "status" => "200",
            "path" => "/usr/local/nginx/logs/nginx_access.log",
            "type" => "nginx",
             "key" => "value"
}
```
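Once events arrive pre-parsed as JSON like this, shipping them onward is just a matter of swapping the output, for instance to Elasticsearch. A minimal sketch; the host address and index name here are illustrative assumptions:

```
output {
    elasticsearch {
        hosts => ["127.0.0.1:9200"]              # hypothetical ES address
        index => "nginx-access-%{+YYYY.MM.dd}"   # one index per day
    }
}
```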
(4.2) Merging multi-line data (multiline). Application debug logs are sometimes very verbose, printing many lines for a single event. Such logs are hard to analyze with command-line parsing. For exactly this case Logstash provides the codec/multiline plugin. The multiline plugin also works for other stack-style output, such as the Linux kernel log.
When you start Logstash with the configuration below, it keeps accumulating the lines you type into a single event until a line starting with [ arrives, as follows:
```
[root@localhost test]# vim multiline.conf
input {
    stdin {
        codec => multiline {
            pattern => "^\["
            negate  => true
            what    => "previous"
        }
    }
}
output {
    stdout {
        codec => rubydebug { }
    }
}

[root@localhost logstash]# /usr/local/logstash/bin/logstash -f test/multiline.conf
Settings: Default pipeline workers: 1
Logstash startup completed
hello
hello world
how are you
abc2345
[
{
    "@timestamp" => "2017-05-25T03:44:35.604Z",
       "message" => "[\nhello\nhello world\nhow are you \nabc2345",
      "@version" => "1",
          "tags" => [
        [0] "multiline"
    ],
          "host" => "localhost.localdomain"
}
```
In short, the plugin's principle is simple: it keeps appending the current line to the previous ones until a newly arriving line matches the "^\[" regex. The pattern can also be a grok expression.
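For example, for logs whose events begin with a timestamp (a common case for application or Java logs), the pattern can be a grok expression instead of a literal bracket. A sketch, assuming timestamped log lines:

```
input {
    stdin {
        codec => multiline {
            pattern => "^%{TIMESTAMP_ISO8601}"   # grok: a new event starts at a timestamp
            negate  => true
            what    => "previous"
        }
    }
}
```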