About fluentd

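fluentd is used mainly as a log collector.
That is, it collects logs on each server and sends them to a log server.
The log server aggregates and manages the logs it receives from each server.
Both roles (collecting and aggregating) are selected through configuration, so the installation package is the same for both.

fluentd is built mainly from log input and output plugins; fluentd itself relays log data from the input plugins to the output plugins.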

Terminology

pos

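pos is short for position. It is the data that records how far a particular log file has been processed.
Saving pos data is optional, but without it some log lines go unprocessed if fluentd crashes, so configuring fluentd to save pos data is strongly recommended.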
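
The idea behind pos can be sketched in shell as follows. This is only an illustration of resuming from a saved offset, not fluentd's actual implementation; the function name read_from_pos and the single-number pos file it uses are assumptions made for this example.

```shell
# Illustration only: resume reading a log from a saved byte offset.
# read_from_pos and its one-number pos file are hypothetical, not fluentd's format.
read_from_pos() {
    log_file=$1
    pos_file=$2
    # Start from offset 0 when no pos data has been saved yet.
    offset=0
    if [ -s "$pos_file" ]; then
        offset=$(cat "$pos_file")
    fi
    # Print only the bytes after the saved offset (tail -c +N is 1-based).
    tail -c "+$((offset + 1))" "$log_file"
    # Record how far the file has been processed.
    wc -c < "$log_file" | tr -d ' ' > "$pos_file"
}

# Demo on throwaway files: the second call prints only the newly appended line.
demo_dir=$(mktemp -d)
printf 'line1\nline2\n' > "$demo_dir/app.log"
read_from_pos "$demo_dir/app.log" "$demo_dir/app.pos"
printf 'line3\n' >> "$demo_dir/app.log"
read_from_pos "$demo_dir/app.log" "$demo_dir/app.pos"
```

If the pos file were deleted between the two calls, the second call would print the whole file again, which shows why the offset has to be stored somewhere that survives a restart.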

Setup

fluentd

Installation

References
- Standard installation method
```
curl -L http://toolbelt.treasuredata.com/sh/install-redhat.sh | sh
```
- Download the package and install it
  https://td-agent-package-browser.herokuapp.com/3/redhat/7/x86_64

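Enable the service

Use the chkconfig/service form on SysV init hosts and the systemctl form on systemd hosts; the same applies to the start, status, and stop commands below.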

chkconfig --level 3 td-agent on
systemctl enable td-agent

Start

service td-agent start
systemctl start td-agent

Check status

service td-agent status
systemctl status td-agent

Stop

service td-agent stop
systemctl stop td-agent

Restart
```
service td-agent restart
```
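
The service management commands above come in two forms, one per init system. The following sketch picks the matching form automatically. The helper name td_agent_cmd is hypothetical, and it only prints the command rather than running it; the check for /run/systemd/system is the usual way to detect that systemd is the running init system.

```shell
# Pick the command family for the running init system (illustration only).
# td_agent_cmd is a hypothetical helper; it prints the command instead of running it.
td_agent_cmd() {
    action=$1
    # /run/systemd/system exists only when systemd is the running init system.
    if [ -d /run/systemd/system ]; then
        printf 'systemctl %s td-agent\n' "$action"
    else
        printf 'service td-agent %s\n' "$action"
    fi
}

td_agent_cmd restart
```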

Additional setup for clients

Create the pos directory

After installation, create a data directory for pos files on each client.

mkdir /var/log/td-agent/pos
chown td-agent:td-agent /var/log/td-agent/pos

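Disabling JSON parsing

Fluentd parses logs and handles them as JSON, but that processing is unnecessary when the goal is simply to aggregate log files or to preserve the original log format.
It is still advisable to record the hostname (or IP address) of the source so that the origin of each log can be identified.

The steps below modify Fluentd so that it does not parse logs into JSON and writes raw log lines, without the tag or the timestamp that fluentd normally adds.
The plugins to modify are in_tail and out_file.
Existing in_tail and out_file settings can be used as they are.

Modify the in_tail plugin.
The change makes it skip regular-expression parsing.

Locate the file path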

find / -name in_tail.rb | grep lib/fluent

# Example output
# /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb

Create a backup

cp ～～～in_tail.rb ～～～in_tail.rb.bk

# Example
# cp /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb \
     /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb.bk

Modify the existing file

sed -e "s/return @parser.parse(line)/return Engine.now,line/g" ～～～in_tail.rb.bk > ～～～in_tail.rb

# Example
# sed -e "s/return @parser.parse(line)/return Engine.now,line/g" \
   /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb.bk \
   > /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/in_tail.rb
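
Before rewriting the real file, the effect of the substitution can be checked on a sample line. The sample below is a made-up stand-in for the relevant line of in_tail.rb; its exact indentation in the installed file is an assumption.

```shell
# Sample line standing in for the relevant line of in_tail.rb (assumed shape).
sample='      return @parser.parse(line)'

# Apply the same sed expression that is used on the real file.
patched=$(printf '%s\n' "$sample" \
    | sed -e "s/return @parser.parse(line)/return Engine.now,line/g")

printf '%s\n' "$patched"
```

After patching the real file, comparing in_tail.rb with in_tail.rb.bk using diff should show only this one kind of change.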

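Modify the out_file plugin.
The change makes it write the host information taken from the tag followed by the raw record, instead of the time, the tag, and the JSON-serialized record.

Locate the file path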

find / -name out_file.rb | grep lib/fluent

# Example output
# /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/out_file.rb

Create a backup

cp ～～～out_file.rb ～～～out_file.rb.bk

# Example
# cp /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/out_file.rb \
     /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/out_file.rb.bk

Modify the existing file

sed -e "s/time_str = @timef.format(time)/if \/\\\\\/\/ =~ tag then taginfo = tag.gsub(\/.*\\\\\/\/, \"\")\\n      else taginfo = \"no host info\"\\n      end/g" \
  -e "s/\"#{time_str}\\\t#{tag}\\\t#{Yajl.dump(record)}\\\n\"/\"#{taginfo}\\\t#{record}\\\n\"/g" \
  ～～～/out_file.rb.bk > ～～～/out_file.rb

# Example
# sed -e "s/time_str = @timef.format(time)/if \/\\\\\/\/ =~ tag then taginfo = tag.gsub(\/.*\\\\\/\/, \"\")\\n      else taginfo = \"no host info\"\\n      end/g" \
    -e "s/\"#{time_str}\\\t#{tag}\\\t#{Yajl.dump(record)}\\\n\"/\"#{taginfo}\\\t#{record}\\\n\"/g" \
    /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/out_file.rb.bk \
    > /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.45/lib/fluent/plugin/out_file.rb
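
The Ruby code that this sed command inserts into out_file.rb takes everything after the last "/" in the tag as the host information, and falls back to "no host info" when the tag contains no "/". A shell analog of that logic, for illustration only, might look like this. The helper name taginfo_from_tag and the sample tags are hypothetical.

```shell
# Shell analog of the Ruby logic patched into out_file.rb (illustration only).
# taginfo_from_tag is a hypothetical helper name.
taginfo_from_tag() {
    case $1 in
        # Tag contains "/": keep what follows the last "/".
        */*) printf '%s\n' "${1##*/}" ;;
        # No "/" in the tag: no host part was appended.
        *)   printf '%s\n' 'no host info' ;;
    esac
}

taginfo_from_tag 'web.apache.access/host1'
taginfo_from_tag 'web.apache.access'
```

This is why the in_tail tag has to carry the hostname or IP address after a "/": it is the only place the patched out_file looks for the source host.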

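Change the in_tail configuration
1. Look up the host's own hostname or IP address
2. Change the tag line as follows
```
tag <original tag>/<hostname>
```
  or
```
tag <original tag>/<IP address>
```
3. Change the match line as follows
```
<match <original tag>**>
```
Change the out_file configuration
1. Change the match line as follows
```
<match <original tag>**>
```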

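Fluentd configuration

Configuration file

/etc/td-agent/td-agent.conf

Basic structure

The basic structure of the configuration file is shown below.
A client sets the log input source to files and the output destination to forwarding to the server.
The server sets the log input source to reception over the network and the output destination to files written locally.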

<source>
	type <plugin name>
	# settings for the log input source go here
</source>

<match **>
	type <plugin name>
	# settings for how logs are output go here
</match>

Plugins

Fluentd handles log input (reading) and log output through separate, independent plugins.

Input plugins

in_tail

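Reference: http://docs.fluentd.org/articles/in_tail

in_tail supports log rotation.
Wildcards are supported in the path setting, but when a new file is created, its first line is not processed, for reasons not identified here.
If the file name can be predicted in advance, this can be avoided by creating an empty file beforehand.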
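
The workaround of creating the file in advance can be sketched as below. The helper name precreate_log is hypothetical, and the date-based path in the demo is only an assumed example of a predictable file name.

```shell
# Create an empty log file ahead of time so that in_tail already sees it
# when the first line is written (illustration only; precreate_log is hypothetical).
precreate_log() {
    log_path=$1
    # Make sure the parent directories exist.
    mkdir -p "$(dirname "$log_path")"
    # Create the file only if it is missing; never truncate an existing log.
    [ -e "$log_path" ] || : > "$log_path"
}

# Demo in a throwaway directory with an assumed date-based file name.
demo_dir=$(mktemp -d)
precreate_log "$demo_dir/2024/01/02.php"
ls -l "$demo_dir/2024/01"
```

On a real host the file must also be writable by the application that produces the log, so ownership and permissions may need adjusting after creation.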

Configuration parameters

format
Setting this to none forwards each line as is, rather than as parsed JSON.

pos_file
Specifying pos_file is optional but strongly recommended.
The following is written to the pos file:
```
<log file path> <bytes transferred (hex)> <inode (hex)>
```
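
Because the offset and inode are stored in hexadecimal, a small helper makes a pos file easier to read. The sketch below decodes one line into decimal values. The helper name decode_pos_line and the sample values are made up for illustration, and the fields are assumed to be separated by whitespace.

```shell
# Decode one pos_file line into decimal values (illustration only).
# decode_pos_line is a hypothetical helper; fields are assumed whitespace-separated.
decode_pos_line() {
    printf '%s\n' "$1" | {
        # Split the line into path, hex offset, and hex inode.
        read -r path offset inode
        # printf accepts 0x-prefixed hexadecimal for %d conversions.
        printf 'path=%s offset=%d inode=%d\n' "$path" "0x$offset" "0x$inode"
    }
}

decode_pos_line '/etc/httpd/logs/access_log 00000000000004d2 0000000000010e3a'
```

Comparing the decoded offset with the current size of the log file gives a rough idea of how many bytes are still waiting to be transferred.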

Output plugins

out_file

Reference: http://docs.fluentd.org/articles/out_file

Common settings

buffer

 <buffer>
   @type file
   path <path>
   flush_mode interval
   flush_interval 1s
   timekey 1h
 </buffer>

Configuration examples

Apache

Access log

Client configuration

<source>
	type tail
	format none
	path /etc/httpd/logs/access_log
	pos_file /var/log/td-agent/pos/apache.access
	tag web.apache.access
</source>

<source>
	type tail
	format none
	path /etc/httpd/logs/error_log
	pos_file /var/log/td-agent/pos/apache.error
	tag web.apache.error
</source>

<source>
	type tail
	format none
	path /var/venusr/www/fuel/app/logs/*/*/*.php
	pos_file /var/log/td-agent/pos/php.app
	tag web.php.app
</source>

<match **>
	type forward
	<server>
		host 10.3.0.9
	</server>
</match>

Server configuration

<source>
	type forward
</source>

<match web.apache.access**>
	type file
	path /var/log/td-agent/web.apache.access
	time_slice_format %Y%m%d
	time_slice_wait 1m
</match>

<match web.apache.error**>
	type file
	path /var/log/td-agent/web.apache.error
	time_slice_format %Y%m%d
	time_slice_wait 1m
</match>

<match web.php.app**>
	type file
	path /var/log/td-agent/web.php.app
	time_slice_format %Y%m%d
	time_slice_wait 1m
</match>

<match img.apache.access**>
	type file
	path /var/log/td-agent/img.apache.access
	time_slice_format %Y%m%d
	time_slice_wait 1m
</match>

<match img.apache.error**>
	type file
	path /var/log/td-agent/img.apache.error
	time_slice_format %Y%m%d
	time_slice_wait 1m
</match>

AWS

chkconfig td-agent off

wget http://169.254.169.254/latest/meta-data/local-ipv4 -O /var/run/local-ipv4
sed -e "s/^\(\s*tag\s\+.*\)/\1\/`cat /var/run/local-ipv4`/g" /etc/td-agent/td-agent.conf.base > /etc/td-agent/td-agent.conf
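
The sed expression above appends "/" and the instance's local IP address to every tag line of the base configuration. It can be tried safely on a throwaway copy, as sketched below. The sample configuration and the address 192.0.2.10 are made up for this example, and GNU sed is assumed because the expression relies on \s and \+.

```shell
# Try the tag-rewriting sed on a throwaway copy (assumes GNU sed for \s and \+).
work_dir=$(mktemp -d)

# Stand-in for the address fetched from the instance metadata service.
printf '192.0.2.10' > "$work_dir/local-ipv4"

# Minimal made-up base configuration with one tag line.
cat > "$work_dir/td-agent.conf.base" <<'EOF'
<source>
  type tail
  tag web.apache.access
</source>
EOF

# Same substitution as above, pointed at the throwaway files.
sed -e "s/^\(\s*tag\s\+.*\)/\1\/$(cat "$work_dir/local-ipv4")/g" \
    "$work_dir/td-agent.conf.base" > "$work_dir/td-agent.conf"

cat "$work_dir/td-agent.conf"
```

Only lines that begin with optional whitespace followed by tag are changed, so the type line in the sample is left untouched.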