Monitoring Servers with InfluxDB v2 + Telegraf + Grafana

Install the required packages

apt-get update && apt-get install sudo vim curl wget gpg

Install the NTP service and set the time zone

apt-get install systemd-timesyncd

Set the time zone

timedatectl set-timezone Asia/Shanghai

Deploying InfluxDB

Add the official APT repository

# Ubuntu and Debian
wget -q https://repos.influxdata.com/influxdata-archive_compat.key
echo '393e8779c89ac8d958f81f942f9ad7fb82a25e133faddaf92e15b16e6ac9ce4c influxdata-archive_compat.key' | sha256sum -c && cat influxdata-archive_compat.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg > /dev/null
echo 'deb [signed-by=/etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg] https://repos.influxdata.com/debian stable main' | sudo tee /etc/apt/sources.list.d/influxdata.list
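
The one-liner above checks the key's checksum and installs the key in a single chain. If you prefer to check the key as a separate step first, a small helper along these lines may do. This is only a sketch: verify_sha256 is a name made up for this post and is not part of any InfluxData tooling.

```shell
# verify_sha256 FILE EXPECTED_HEX   (hypothetical helper name)
# Returns 0 only when FILE exists and its SHA-256 digest equals EXPECTED_HEX.
verify_sha256() {
    file=$1
    expected=$2
    [ -f "$file" ] || return 1
    # sha256sum prints "<digest>  <file>"; keep only the digest.
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Example use with the key and digest from the step above (not run here):
# verify_sha256 influxdata-archive_compat.key \
#     393e8779c89ac8d958f81f942f9ad7fb82a25e133faddaf92e15b16e6ac9ce4c \
#     && echo "key OK" || echo "key mismatch, do not install it"
```

Keeping the check in its own function makes it easy to stop a provisioning script before the key is ever written to /etc/apt/trusted.gpg.d.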

Requesting certificates

Download acme.sh

curl https://get.acme.sh | sh -s email=my@example.com

Create an alias

alias acme.sh=~/.acme.sh/acme.sh

Enable automatic upgrades

acme.sh --upgrade --auto-upgrade

Import the DNS API credentials. For details, see "How to use DNS API" in the acme.sh wiki.

# Back up ~/.bashrc first, then add the exports below to ~/.bashrc
cp ~/.bashrc ~/.bashrc.backup && cat ~/.bashrc

# Cloudflare DNS API Token
export CF_Token="your_CF_Token"
export CF_Account_ID="CF_Account_ID"
export CF_Zone_ID="CF_Zone_ID"

Load the environment variables

source ~/.bashrc

Check that they took effect

echo $CF_Token
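
Echoing the token prints the secret to the terminal. As an alternative, the sketch below only reports whether each variable is set, without showing its value. require_env is a hypothetical helper name; it relies on printenv, so it only sees exported variables, which matches the export lines written to ~/.bashrc.

```shell
# require_env NAME...   (hypothetical helper name)
# Reports every listed variable that is unset or empty in the environment
# and returns non-zero if any is missing. Values are never printed.
require_env() {
    missing=0
    for name in "$@"; do
        # printenv prints nothing for a variable that is not exported.
        if [ -z "$(printenv "$name")" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use (not run here):
# require_env CF_Token CF_Account_ID CF_Zone_ID && echo "all set"
```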

Delete the ~/.bashrc backup

rm ~/.bashrc.backup

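Issue the certificate. See also the post on this blog, "Using acme.sh to request Google's free SSL certificates".

# Issue a wildcard certificate
acme.sh --issue --dns dns_cf -d '*.example.com'

# or

# Issue a certificate for specific subdomains
acme.sh --issue --dns dns_cf -d 'influxdb.example.com' -d 'grafana.example.com'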

Install influxdb2

apt-get update && apt-get install influxdb2

Delete the key file

rm influxdata-archive_compat.key

Create the data directory and change its owner

Create a dedicated directory for InfluxDB data, /mnt/data/influxdb

mkdir -p /mnt/data/influxdb

Change the owner of /mnt/data/influxdb

chown influxdb /mnt/data/influxdb && chgrp influxdb /mnt/data/influxdb && chmod 750 /mnt/data/influxdb

Install the certificate and private key

First, create a script

mkdir -p /etc/influxdb/tls && nano /etc/influxdb/tls/set-permissions-and-restart.sh

Enter the following

#!/bin/bash

# Set the owner of the certificate and private key
chown influxdb /etc/influxdb/tls/key.pem /etc/influxdb/tls/fullchain.pem

# Set the certificate and private key permissions to 400
chmod 400 /etc/influxdb/tls/key.pem /etc/influxdb/tls/fullchain.pem

# Restart influxdb.service if systemd lists it as a loaded unit
if systemctl list-units | grep -q influxdb.service; then
   systemctl restart influxdb.service
fi

Install the certificate and private key with acme.sh

acme.sh --install-cert -d '*.example.com' \
--key-file /etc/influxdb/tls/key.pem \
--fullchain-file /etc/influxdb/tls/fullchain.pem \
--reloadcmd "bash /etc/influxdb/tls/set-permissions-and-restart.sh"
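
After the install step it can be worth confirming that the reload script left the files in the expected state. The following is a minimal sketch, assuming GNU stat as shipped with Debian and Ubuntu; tls_files_ok is a hypothetical helper name, not something acme.sh provides.

```shell
# tls_files_ok DIR   (hypothetical helper name)
# Returns 0 when DIR/key.pem and DIR/fullchain.pem both exist, are non-empty,
# and have mode 400, which is what set-permissions-and-restart.sh sets.
tls_files_ok() {
    for f in "$1/key.pem" "$1/fullchain.pem"; do
        [ -s "$f" ] || { echo "missing or empty: $f" >&2; return 1; }
        mode=$(stat -c '%a' "$f")
        [ "$mode" = "400" ] || { echo "mode $mode on $f, expected 400" >&2; return 1; }
    done
    return 0
}

# Example use (not run here):
# tls_files_ok /etc/influxdb/tls && echo "TLS files look fine"
```

The same check applies unchanged to /etc/grafana/tls later in this post.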

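Edit the configuration file with vim /etc/influxdb/config.toml and enter the following. For a detailed description of each option, see the official documentation.

bolt-path = "/mnt/data/influxdb/influxd.bolt"      # omit on a single-disk server
engine-path = "/mnt/data/influxdb/engine"          # omit on a single-disk server
sqlite-path = "/mnt/data/influxdb/influxd.sqlite"  # omit on a single-disk server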

tls-cert = "/etc/influxdb/tls/fullchain.pem"
tls-key = "/etc/influxdb/tls/key.pem"
tls-min-version = "1.3"

Set the environment variable

Check whether $XDG_RUNTIME_DIR/bus exists

echo $XDG_RUNTIME_DIR/bus

If it exists, write DBUS_SESSION_BUS_ADDRESS=$XDG_RUNTIME_DIR/bus to /etc/default/influxdb2

echo -e "DBUS_SESSION_BUS_ADDRESS=\$XDG_RUNTIME_DIR/bus" >> /etc/default/influxdb2

If it does not exist, write /dev/null in place of $XDG_RUNTIME_DIR/bus to /etc/default/influxdb2

echo "DBUS_SESSION_BUS_ADDRESS=/dev/null" >> /etc/default/influxdb2
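
The echo above only prints the path; it does not tell you whether the path exists. The two branches can be folded into one helper that tests for the path and prints the matching line. This is a sketch under that reading of the steps, and dbus_env_line is a made-up name.

```shell
# dbus_env_line   (hypothetical helper name)
# Prints the line to append to the service's environment file: the bus path
# when "$XDG_RUNTIME_DIR/bus" exists, otherwise /dev/null.
dbus_env_line() {
    if [ -n "${XDG_RUNTIME_DIR:-}" ] && [ -e "$XDG_RUNTIME_DIR/bus" ]; then
        # Keep the variable reference literal, as the original command does.
        printf '%s\n' 'DBUS_SESSION_BUS_ADDRESS=$XDG_RUNTIME_DIR/bus'
    else
        printf '%s\n' 'DBUS_SESSION_BUS_ADDRESS=/dev/null'
    fi
}

# Example use (not run here):
# dbus_env_line >> /etc/default/influxdb2
```

The same helper would work for /etc/default/telegraf in the Telegraf section.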

Start the service

systemctl start influxdb
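
Before opening the web UI, you can check from the command line that the server answers over TLS. InfluxDB 2.x serves a /health endpoint that returns a small JSON document with a status field. The helper below is a sketch that only looks for "status":"pass" in whatever it reads on stdin; health_is_pass is a hypothetical name.

```shell
# health_is_pass   (hypothetical helper name)
# Reads a JSON document on stdin and returns 0 when it contains
# "status":"pass" (whitespace around the colon is tolerated).
health_is_pass() {
    grep -Eq '"status"[[:space:]]*:[[:space:]]*"pass"'
}

# Example use (not run here; the hostname is the placeholder used in this post):
# curl -fsS https://influxdb.example.com:8086/health | health_is_pass \
#     && echo "InfluxDB is up"
```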

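In a browser, open https://influxdb.example.com:8086

Fill in the initial setup form with whatever values you like, but remember them: you will need them for every later login to the web UI. Click CONTINUE.

Save the generated token, because it will not be shown again. Click CONFIGURE LATER.

Once inside, create a new bucket for the collected monitoring data, to keep it separate from the initial bucket. Path:

Load Data > Buckets > CREATE BUCKET

I named mine Telegraf. Then choose how long data is retained: never delete, a preset duration, or a custom duration.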

Deploying Grafana

Official download page

apt-get update && apt-get install libfontconfig1 musl
wget https://dl.grafana.com/oss/release/grafana_11.2.0_amd64.deb
dpkg -i grafana_11.2.0_amd64.deb
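
The version number appears in three commands (download, install, delete), so it is easy to miss one when updating. A small sketch that builds the URL from a version and an architecture is shown below. It assumes other releases follow the same URL pattern as the 11.2.0 link above, which this post does not verify; grafana_deb_url is a hypothetical name.

```shell
# grafana_deb_url VERSION ARCH   (hypothetical helper name)
# Prints the download URL, following the pattern of the link used above.
grafana_deb_url() {
    printf 'https://dl.grafana.com/oss/release/grafana_%s_%s.deb\n' "$1" "$2"
}

# Example use (not run here). dpkg --print-architecture reports e.g. amd64.
# version=11.2.0
# arch=$(dpkg --print-architecture)
# wget "$(grafana_deb_url "$version" "$arch")"
# dpkg -i "grafana_${version}_${arch}.deb"
```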

Delete the package

rm grafana_11.2.0_amd64.deb

Enable start on boot

systemctl enable grafana-server

Install the certificate

First, create the certificate renewal script

mkdir -p /etc/grafana/tls && nano /etc/grafana/tls/set-permissions-and-restart.sh

Enter the following

#!/bin/bash

# Set the owner of the certificate and private key
chown grafana /etc/grafana/tls/key.pem /etc/grafana/tls/fullchain.pem

# Set the certificate and private key permissions to 400
chmod 400 /etc/grafana/tls/key.pem /etc/grafana/tls/fullchain.pem

# Restart grafana-server.service if systemd lists it as a loaded unit
if systemctl list-units | grep -q grafana-server.service; then
   systemctl restart grafana-server.service
fi

Run acme.sh

acme.sh --install-cert -d '*.example.com' \
--key-file /etc/grafana/tls/key.pem \
--fullchain-file /etc/grafana/tls/fullchain.pem \
--reloadcmd "bash /etc/grafana/tls/set-permissions-and-restart.sh"

Configure HTTPS

nano /etc/grafana/grafana.ini

Change the following settings

[server]
protocol = h2
min_tls_version = "TLS1.3"
http_addr = 0.0.0.0
http_port = 3000
domain = grafana.example.com
enforce_domain = false
root_url = https://grafana.example.com:3000
cert_file = /etc/grafana/tls/fullchain.pem
cert_key = /etc/grafana/tls/key.pem
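
The packaged grafana.ini is long and mostly commented out with ';', so it is easy to edit a commented line by mistake. The sketch below reads back the value that is actually active for a key in a given section. ini_get is a hypothetical helper written for this post; it handles only plain key = value lines.

```shell
# ini_get FILE SECTION KEY   (hypothetical helper name)
# Prints the value of KEY inside [SECTION], ignoring lines that start with
# ';' or '#'. Prints nothing if the key is absent.
ini_get() {
    awk -v section="$2" -v key="$3" '
        /^[ \t]*[;#]/ { next }                  # skip comment lines
        /^[ \t]*\[/ {                           # section header
            name = $0
            gsub(/^[ \t]*\[|\][ \t]*$/, "", name)
            in_section = (name == section)
            next
        }
        in_section {
            eq = index($0, "=")                 # split at the first "="
            if (eq == 0) next
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }
    ' "$1"
}

# Example use (not run here):
# ini_get /etc/grafana/grafana.ini server protocol
# ini_get /etc/grafana/grafana.ini server cert_file
```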

Start the service

systemctl start grafana-server

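Access

In a browser, open https://grafana.example.com:3000. The default username and password are both admin.

Connecting Grafana to InfluxDB

Back in the InfluxDB web UI, create a token that allows Grafana to read data. Path:

InfluxDB > Load Data > API Tokens > GENERATE API TOKEN > Custom API Token

Under the permissions for the Telegraf bucket, tick Read. Click GENERATE and save the generated token, because it will not be shown again. (If you forget to save it, you can still look it up from the command line using the token generated during initial setup.)

Back in the Grafana web UI, add a data source. Path:

Grafana > Home > Connections > Data sources > Add data source > InfluxDB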

Query language

InfluxQL

HTTP

URL: https://localhost:8086

Auth

Skip TLS Verify: enabled

Custom HTTP Headers

Header: Authorization | Value: Token, then a space, then <Token>. Example: Token zA61yxXG_SoS-VeWYTXBi27Dg8RDcCiMKne2kyafXU7jRAFgNzreFVKhazrxTl7W00_CJjG-cKbEzcdqkmKz1w==

InfluxDB Details

Database: Telegraf
HTTP Method: GET
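
The header value is easy to get wrong (the word Token, one space, then the token itself). The sketch below builds the header line so it can be reused, for example to try the same request from the command line. influx_auth_header and INFLUX_READ_TOKEN are hypothetical names, and the commented curl call assumes your InfluxDB build answers InfluxQL on the 1.x-compatible /query endpoint.

```shell
# influx_auth_header TOKEN   (hypothetical helper name)
# Prints the header line for InfluxDB token authentication:
# the word "Token", one space, then the token.
influx_auth_header() {
    printf 'Authorization: Token %s\n' "$1"
}

# Example use (not run here). -k mirrors the "Skip TLS Verify" setting above.
# curl -k -G https://localhost:8086/query \
#     -H "$(influx_auth_header "$INFLUX_READ_TOKEN")" \
#     --data-urlencode "q=SHOW DATABASES"
```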

Deploying Telegraf

Install the required packages

# Not needed when Telegraf runs on the same server as InfluxDB; these were installed earlier
apt-get update && apt-get install sudo wget gpg

Add the official InfluxData APT repository

For RedHat-based systems, see the official site: https://www.influxdata.com/downloads

# Ubuntu and Debian. Not needed when Telegraf runs on the same server as InfluxDB; the repository was added earlier
wget -q https://repos.influxdata.com/influxdata-archive_compat.key
echo '393e8779c89ac8d958f81f942f9ad7fb82a25e133faddaf92e15b16e6ac9ce4c influxdata-archive_compat.key' | sha256sum -c && cat influxdata-archive_compat.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg > /dev/null
echo 'deb [signed-by=/etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg] https://repos.influxdata.com/debian stable main' | sudo tee /etc/apt/sources.list.d/influxdata.list

Install

apt-get update && apt-get install telegraf

Set the Telegraf environment variable

Check whether $XDG_RUNTIME_DIR/bus exists

echo $XDG_RUNTIME_DIR/bus
/run/user/0/bus

If it exists, write DBUS_SESSION_BUS_ADDRESS=$XDG_RUNTIME_DIR/bus to /etc/default/telegraf

echo -e "DBUS_SESSION_BUS_ADDRESS=\$XDG_RUNTIME_DIR/bus" >> /etc/default/telegraf

If it does not exist, write /dev/null in place of $XDG_RUNTIME_DIR/bus to /etc/default/telegraf

echo "DBUS_SESSION_BUS_ADDRESS=/dev/null" >> /etc/default/telegraf

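Create the configuration file

Telegraf can load its configuration in two ways: from a local file or from a remote file. Choose one.

Local configuration file

Edit /etc/telegraf/telegraf.conf directly.

Remote configuration file

On a first deployment there is no configuration yet, so one has to be created. Later deployments to other servers can skip this step. Back in the InfluxDB web UI, create the remote configuration. Path:

InfluxDB > Load Data > Telegraf > CREATE CONFIGURATION

For Bucket, select Telegraf, and pick any template. Click CONTINUE CONFIGURING.

I named the configuration Example. Delete the entire template content, enter your own configuration, and click SAVE AND TEST. For a detailed description of the options, see the official documentation on GitHub.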

You can also use my configuration as a reference:

[global_tags]
  host_info = "$HOST_INFO"

[agent]
  interval = "5s"
  round_interval = true

  metric_batch_size = 200
  metric_buffer_limit = 2000

  collection_jitter = "0s"
  # collection_offset = "0s"
  flush_interval = "6s"
  flush_jitter = "0s"
  precision = "0s"

  debug = false
  quiet = false
  logformat = "text"
  # logfile = "/var/log/telegraf/run.log"
  logfile_rotation_interval = "24h"
  logfile_rotation_max_size = "32MB"
  logfile_rotation_max_archives = 7
  log_with_timezone = "Asia/Shanghai"

  # hostname = ""
  omit_hostname = true
  # snmp_translator = "netsnmp"
  # statefile = ""
  # skip_processors_after_aggregators = false

[[inputs.cpu]]
  ## Plugin configuration
  percpu = false
  totalcpu = true
  collect_cpu_time = false
  report_active = true
  core_tags = false
  ## Modifier filters
  fieldinclude = ["usage_system", "usage_user"]

[[inputs.mem]]
  ## Plugin configuration
  # no configuration
  ## Modifier filters
  fieldinclude = ["total", "used", "used_percent"]

[[inputs.swap]]
  ## Plugin configuration
  # no configuration
  ## Modifier filters
  fieldinclude = ["total", "used", "used_percent"]

[[inputs.disk]]
  ## Plugin configuration
  # mount_points = ["/"]
  ignore_fs = ["tmpfs", "devtmpfs"]
  # ignore_mount_opts = []
  ## Modifier filters
  tagexclude = ["fstype", "mode", "path"]
  fieldinclude = ["total", "used", "used_percent"]

[[inputs.diskio]]
  ## Plugin configuration
  devices = ["sd[a-z]", "vd[a-z]", "xvd[a-z]"]
  skip_serial_number = true
  # device_tags = ["ID_FS_TYPE", "ID_FS_USAGE"]
  # name_templates = ["$ID_FS_LABEL","$DM_VG_NAME/$DM_LV_NAME"]
  ## Modifier filters
  fieldinclude = ["reads", "writes"]

[[inputs.net]]
  ## Plugin configuration
  interfaces = ["enX*", "ens*", "eth*"]
  ignore_protocol_stats = true
  ## Modifier filters
  fieldinclude = ["bytes_sent", "bytes_recv"]

[[inputs.netstat]]
  ## Plugin configuration
  # no configuration
  ## Modifier filters
  fieldinclude = ["tcp_established", "udp_socket"]

[[inputs.system]]
  ## Plugin configuration
  # no configuration
  ## Modifier filters
  fieldinclude = ["load1", "load15", "load5"]

[[inputs.system]]
  interval = "60s"
  ## Plugin configuration
  # no configuration
  ## Modifier filters
  fieldinclude = ["uptime_format"]

[[inputs.ping]]
  interval = "30s"
  tags = {city = "Haikou", company = "Telecom"}
  ## Plugin configuration
  urls = ["124.225.43.220"]
  method = "exec"
  count = 1
  ping_interval = 1.0
  timeout = 1.0
  size = 64
  ## Modifier filters
  fieldinclude = ["average_response_ms", "percent_packet_loss", "result_code"]

[[inputs.ping]]
  interval = "30s"
  tags = {city = "Haikou", company = "Unicom"}
  ## Plugin configuration
  urls = ["153.0.226.35"]
  method = "exec"
  count = 1
  ping_interval = 1.0
  timeout = 1.0
  size = 64
  ## Modifier filters
  fieldinclude = ["average_response_ms", "percent_packet_loss", "result_code"]

[[inputs.ping]]
  interval = "30s"
  tags = {city = "Haikou", company = "Mobile"}
  ## Plugin configuration
  urls = ["111.29.29.219"]
  method = "exec"
  count = 1
  ping_interval = 1.0
  timeout = 1.0
  size = 64
  ## Modifier filters
  fieldinclude = ["average_response_ms", "percent_packet_loss", "result_code"]

[[inputs.ping]]
  interval = "30s"
  tags = {city = "Guangzhou", company = "Telecom"}
  ## Plugin configuration
  urls = ["183.47.126.35"]
  method = "exec"
  count = 1
  ping_interval = 1.0
  timeout = 1.0
  size = 64
  ## Modifier filters
  fieldinclude = ["average_response_ms", "percent_packet_loss", "result_code"]

[[inputs.ping]]
  interval = "30s"
  tags = {city = "Guangzhou", company = "Unicom"}
  ## Plugin configuration
  urls = ["157.148.58.29"]
  method = "exec"
  count = 1
  ping_interval = 1.0
  timeout = 1.0
  size = 64
  ## Modifier filters
  fieldinclude = ["average_response_ms", "percent_packet_loss", "result_code"]

[[inputs.ping]]
  interval = "30s"
  tags = {city = "Guangzhou", company = "Mobile"}
  ## Plugin configuration
  urls = ["120.233.18.250"]
  method = "exec"
  count = 1
  ping_interval = 1.0
  timeout = 1.0
  size = 64
  ## Modifier filters
  fieldinclude = ["average_response_ms", "percent_packet_loss", "result_code"]

[[inputs.ping]]
  interval = "30s"
  tags = {city = "Shanghai", company = "Telecom"}
  ## Plugin configuration
  urls = ["114.80.236.139"]
  method = "exec"
  count = 1
  ping_interval = 1.0
  timeout = 1.0
  size = 64
  ## Modifier filters
  fieldinclude = ["average_response_ms", "percent_packet_loss", "result_code"]

[[inputs.ping]]
  interval = "30s"
  tags = {city = "Shanghai", company = "Unicom"}
  ## Plugin configuration
  urls = ["210.22.97.1"]
  method = "exec"
  count = 1
  ping_interval = 1.0
  timeout = 1.0
  size = 64
  ## Modifier filters
  fieldinclude = ["average_response_ms", "percent_packet_loss", "result_code"]

[[inputs.ping]]
  interval = "30s"
  tags = {city = "Shanghai", company = "Mobile"}
  ## Plugin configuration
  urls = ["221.183.90.237"]
  method = "exec"
  count = 1
  ping_interval = 1.0
  timeout = 1.0
  size = 64
  ## Modifier filters
  fieldinclude = ["average_response_ms", "percent_packet_loss", "result_code"]

[[inputs.ping]]
  interval = "30s"
  tags = {city = "Beijing", company = "Telecom"}
  ## Plugin configuration
  urls = ["49.7.37.74"]
  method = "exec"
  count = 1
  ping_interval = 1.0
  timeout = 1.0
  size = 64
  ## Modifier filters
  fieldinclude = ["average_response_ms", "percent_packet_loss", "result_code"]

[[inputs.ping]]
  interval = "30s"
  tags = {city = "Beijing", company = "Unicom"}
  ## Plugin configuration
  urls = ["111.206.209.44"]
  method = "exec"
  count = 1
  ping_interval = 1.0
  timeout = 1.0
  size = 64
  ## Modifier filters
  fieldinclude = ["average_response_ms", "percent_packet_loss", "result_code"]

[[inputs.ping]]
  interval = "30s"
  tags = {city = "Beijing", company = "Mobile"}
  ## Plugin configuration
  urls = ["112.34.111.194"]
  method = "exec"
  count = 1
  ping_interval = 1.0
  timeout = 1.0
  size = 64
  ## Modifier filters
  fieldinclude = ["average_response_ms", "percent_packet_loss", "result_code"]

[[inputs.internet_speed]]
  interval = "6h"
  ## Plugin configuration
  memory_saving_mode = false
  cache = false
  test_mode = "single"
  ## Modifier filters
  tagexclude = ["server_id", "source", "test_mode"]
  fieldinclude = ["download", "upload"]

[[processors.enum]]
  namepass = ["swap"]
  [[processors.enum.mapping]]
    field = "total"
    dest = "used_percent"
    [processors.enum.mapping.value_mappings]
      0 = -1.0

[[processors.enum]]
  namepass = ["swap"]
  [[processors.enum.mapping]]
    field = "total"
    dest = "status_code"
    default = 0
    [processors.enum.mapping.value_mappings]
      0 = 1

[[processors.enum]]
  namepass = ["ping"]
  [[processors.enum.mapping]]
    field = "result_code"
    dest = "average_response_ms"
    [processors.enum.mapping.value_mappings]
      1 = 0.000
      2 = 0.000

[[processors.split]]
  namepass = ["swap"]
  drop_original = true
  [[processors.split.template]]
    name = "swap"
    tags = ["*"]
    fields = ["total", "used", "used_percent"]
  [[processors.split.template]]
    name = "swap"
    tags = ["*"]
    fields = ["status_code"]

[[processors.filter]]
  default = "pass"
  [[processors.filter.rule]]
    name = ["swap"]
    fields = ["in", "out"]
    action = "drop"
  [[processors.filter.rule]]
    name = ["system"]
    fields = ["uptime"]
    action = "drop"

[[outputs.influxdb_v2]]
  urls = ["https://influxdb.example.com:8086"]
  token = "$INFLUX_TOKEN"
  organization = "Monitor"
  bucket = "Telegraf"
  # bucket_tag = ""
  # exclude_bucket_tag = false
  timeout = "5s"
  # http_headers = {"X-Special-Header" = "Special-Value"}
  # http_proxy = "http://corporate.proxy:3128"
  # user_agent = "telegraf"
  content_encoding = "gzip"
  # influx_uint_support = false
  # influx_omit_timestamp = false
  # ping_timeout = "0s"
  # read_idle_timeout = "0s"
  # tls_ca = "/etc/telegraf/ca.pem"
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"
  insecure_skip_verify = false
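
The twelve [[inputs.ping]] blocks in the configuration differ only in city, company, and IP address. If you maintain a longer target list, generating the blocks from a table may be easier than editing them by hand. The following is a sketch, not how the configuration above was produced; ping_block and ping_blocks are hypothetical names.

```shell
# ping_block CITY COMPANY IP   (hypothetical helper name)
# Prints one [[inputs.ping]] block with the same settings as the blocks above.
ping_block() {
    cat <<EOF
[[inputs.ping]]
  interval = "30s"
  tags = {city = "$1", company = "$2"}
  ## Plugin configuration
  urls = ["$3"]
  method = "exec"
  count = 1
  ping_interval = 1.0
  timeout = 1.0
  size = 64
  ## Modifier filters
  fieldinclude = ["average_response_ms", "percent_packet_loss", "result_code"]

EOF
}

# ping_blocks   (hypothetical helper name)
# Reads "city company ip" triples on stdin, one per line, and prints a block
# for each complete line.
ping_blocks() {
    while read -r city company ip; do
        [ -n "$ip" ] || continue    # skip blank or incomplete lines
        ping_block "$city" "$company" "$ip"
    done
}

# Example use (not run here):
# ping_blocks <<'LIST'
# Haikou Telecom 124.225.43.220
# Haikou Unicom 153.0.226.35
# LIST
```

The output can be pasted into the configuration in place of the hand-written blocks.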

Copy the generated Configuration API Token and Configuration URL and save them.

Edit the configuration again and delete the extra content that InfluxDB automatically added above [global_tags], keeping only your own. Click SAVE CHANGES.

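Load the configuration

Loading a local configuration

cat >> /etc/profile.d/telegraf.sh << EOF
# Telegraf environment
export HOST_INFO="Describe the server here; anything that identifies it will do, e.g. xxCloud_HK-1C1G"
export INFLUX_TOKEN="<Configuration API Token>"
EOF

# Load the environment variables
source /etc/profile

# Check that they took effect
echo -e "$HOST_INFO\n$INFLUX_TOKEN"

# Note: a service started by systemd does not read /etc/profile.d. The packaged
# telegraf unit reads /etc/default/telegraf, so if the service does not see
# these variables, add them there as well (without "export").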

Start the service

systemctl start telegraf

Loading a remote configuration

Set the Telegraf environment variables

cat >> /etc/default/telegraf << EOF
HOST_INFO="Describe the server here; anything that identifies it will do, e.g. xxCloud_HK-1C1G"
TELEGRAF_OPTS="-config <Configuration URL>"
INFLUX_TOKEN="<Configuration API Token>"
EOF
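
Because cat >> appends, running the step above twice leaves duplicate lines in /etc/default/telegraf. A helper that replaces an existing entry before appending avoids that. This is a minimal sketch that assumes simple KEY="VALUE" lines; set_env_var is a hypothetical name.

```shell
# set_env_var FILE KEY VALUE   (hypothetical helper name)
# Ensures FILE contains exactly one KEY="VALUE" line: any existing line for
# KEY is dropped first, then the new line is appended.
set_env_var() {
    file=$1
    key=$2
    value=$3
    [ -f "$file" ] || : > "$file"
    tmp=$(mktemp) || return 1
    # Keep every line that does not start with KEY= (grep exits 1 when
    # nothing is left, which is fine here).
    grep -v "^${key}=" "$file" > "$tmp" || true
    printf '%s="%s"\n' "$key" "$value" >> "$tmp"
    # Overwrite in place so the file keeps its owner and mode.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example use (not run here):
# set_env_var /etc/default/telegraf HOST_INFO "xxCloud_HK-1C1G"
# set_env_var /etc/default/telegraf INFLUX_TOKEN "<Configuration API Token>"
```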

In addition, using a remote configuration requires either emptying /etc/telegraf/telegraf.conf or modifying /lib/systemd/system/telegraf.service

echo "" > /etc/telegraf/telegraf.conf

# or

nano /lib/systemd/system/telegraf.service # change the ExecStart line to the following

ExecStart=/usr/bin/telegraf $TELEGRAF_OPTS
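
Files under /lib/systemd/system belong to the package and may be replaced on upgrade. A systemd drop-in is one way to keep the same ExecStart change in a file of your own. The sketch below only prints the drop-in content; telegraf_dropin is a hypothetical name, and the override path in the comments is the conventional location rather than something this post has tested.

```shell
# telegraf_dropin   (hypothetical helper name)
# Prints a systemd drop-in that replaces the packaged ExecStart line.
# The empty "ExecStart=" clears the original command before the new one is set.
telegraf_dropin() {
    cat <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/telegraf $TELEGRAF_OPTS
EOF
}

# Example use (not run here):
# mkdir -p /etc/systemd/system/telegraf.service.d
# telegraf_dropin > /etc/systemd/system/telegraf.service.d/override.conf
# systemctl daemon-reload && systemctl restart telegraf
```

Whichever way the unit is changed, systemd needs a daemon-reload before the new ExecStart takes effect.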

Visualizing the data

Path: Grafana > Dashboards > New > New dashboard > Add visualization > Select data source