0801-ElasticStack

Preface

★ Learning introduction

★ Technology stack overview

1. ElasticSearch

​ ElasticSearch is an open-source search engine built on Apache Lucene. Lucene is arguably the most advanced, best-performing, and most feature-complete search engine library available to date; however, Lucene is written in Java, is very complex, and demands a deep understanding of how it works. ElasticSearch is also developed in Java and uses Lucene for its core features: distributed indexing and search. ElasticSearch hides Lucene's complexity behind a RESTful API and provides a highly available search engine.

2. ELK

​ ELK is the combination of Elasticsearch, Kibana, and Logstash. Used together, they cover most log analysis and processing work in software development: reliably ingesting data of any format from any source, and searching, analyzing, and visualizing it in real time. Logstash is responsible for data collection, Kibana for visualizing the results, and Elasticsearch, as the core component, for distributed storage and indexing of the data.

3. ElasticStack

​ Elastic Stack is the name of the ES technology stack: it adds Beats on top of ELK, which is why ELK was renamed Elastic Stack. The basic Elastic Stack workflow is shown below:

  1. Beats sends the data it collects to ElasticSearch, or hands it to Logstash for processing
  2. Logstash's main job is data processing
  3. ElasticSearch is mainly responsible for storing the data
  4. Finally, Kibana connects to ElasticSearch to visualize the data

[Figure: Elastic Stack workflow]

  1. ElasticSearch: a search engine framework written in Java on top of Lucene. Key characteristics: distributed, zero configuration, automatic discovery, automatic index sharding, index replica mechanism, RESTful interface, multiple data sources, automatic search load balancing, and so on. Its core technique is the inverted index: in ElasticSearch, data is stored in indices, and ElasticSearch tokenizes the indexed data and stores the terms in a term dictionary. At query time, the search keyword is first looked up in the term dictionary to retrieve the matching document IDs, and those IDs are then used to fetch the corresponding data directly from the index.
    • Lucene: the underlying search engine library itself
    • Solr: for static data, Solr is somewhat faster than ES; if the data changes in real time, Solr's performance degrades significantly. ES clusters are also easier to set up.
    • Distributed: mainly reflected in horizontal scalability
  2. Logstash: a Java-based open-source tool for collecting, analyzing, and storing logs
  3. Kibana: based on Node.js; Kibana provides a friendly web interface for Logstash and ElasticSearch, used to summarize, analyze, and search important data
  4. Beats: an agent for collecting system monitoring data, open-sourced by Elastic; the collective name for the data collectors that run as clients on the monitored servers. Beats can send data directly to ElasticSearch, or send it to ElasticSearch via Logstash, for subsequent analysis. The main Beats members:
    • Packetbeat: a network packet analyzer that monitors and collects network traffic. Packetbeat sniffs the traffic between servers, parses application-layer protocols, and correlates the messages; it supports ICMP (IPv4 and IPv6), DNS, HTTP, MySQL, Redis, PostgreSQL, MongoDB, and other protocols
    • Filebeat: monitors and collects server log files, replacing logstash-forwarder
    • Metricbeat: periodically collects metrics from external systems; it can monitor services such as Apache, HAProxy, MongoDB, MySQL, Nginx, PostgreSQL, Redis, System, and ZooKeeper
    • Winlogbeat: monitors and collects Windows event logs.
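As a concrete illustration of the Beats data flow described above, a minimal Filebeat configuration might look like the sketch below. The log paths and hosts are placeholders, not values from these notes:

```yaml
# filebeat.yml - minimal sketch; paths and hosts are placeholders
filebeat.inputs:
  - type: log                  # collect plain log files
    paths:
      - /var/log/app/*.log
# ship events directly to ElasticSearch ...
output.elasticsearch:
  hosts: ["localhost:9200"]
# ... or, alternatively, to Logstash for processing first:
#output.logstash:
#  hosts: ["localhost:5044"]
```

Only one output section may be active at a time; commenting one in and the other out switches between the two flows in the workflow above.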

Part 1: Elasticsearch

Chapter 1: Installing Elasticsearch

1.1 Standalone - Windows

  • Download ElasticSearch: https://www.elastic.co/cn/downloads/elasticsearch
  • Extract the downloaded archive to a directory of your choice
  • Enter the /bin directory of the ES package and run the elasticsearch.bat startup script
  • Test access: http://localhost:9200/

1.2 Standalone - macOS

1.3 Standalone - Linux

  • Download the Elastic package and extract it into the software directory: https://www.elastic.co/cn/downloads/elasticsearch

  • Environment preparation

    • Remove the Java preinstalled on CentOS

      rpm -qa | grep java
      rpm -e --nodeps xxx
    • Install the JDK required by ElasticSearch and configure the environment variables

      export JAVA_HOME=/opt/jdk
      export PATH=$JAVA_HOME/bin:$PATH
      export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
    • Create an ElasticSearch user: by default ElasticSearch refuses to run as root, so a dedicated user must be created

      useradd elsearch
    • Create the ElasticSearch installation directory (any path works; here it is the search directory under /opt), then upload the ElasticSearch archive and extract it into the search directory

      cd /opt
      mkdir search
      chown elsearch:elsearch search/ -R
  • Modify system configuration

    • Raise the maximum number of memory map areas (VMAs) a process may create; this must be done as root

      vim /etc/sysctl.conf		# edit the sysctl configuration file
      vm.max_map_count=655360 # raise the maximum number of memory map areas
      sysctl -p # apply the settings without a reboot and print them
    • Raise the maximum number of file descriptors to satisfy ElasticSearch: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]; the * column denotes the user name, and * matches all users

      vim /etc/security/limits.conf
      # >>>> append the lines below; * matches all users
      * soft nofile 65536
      * hard nofile 131072
      * soft nproc 2048
      * hard nproc 4096
    • The default per-process thread limit is too low; at least 4096 is required: max number of threads [3756] for user [elsearch] is too low, increase to at least [4096]

      vim /etc/security/limits.d/20-nproc.conf
      # >>>> modify
      * soft nproc 4096
    • Open the ElasticSearch port in the firewall: 9200

      firewall-cmd --zone=public --add-port=9200/tcp --permanent 
      firewall-cmd --reload
    • Change the ownership of the ElasticSearch installation and switch to the elsearch user: the log files ElasticSearch generates at startup are also owned by root, so they too must be re-owned by the new non-root elsearch user

      chown elsearch:elsearch -R /opt/search
      su - elsearch
    • Start the ElasticSearch service as the elsearch (non-root) user

      cd /opt/search/elasticsearch-*/bin # enter the bin directory of the extracted package
      ./elasticsearch # start in the foreground
      ./elasticsearch & # start in the background
  • ElasticSearch configuration files: located in the config/ directory of the installation under /opt/search

    • elasticsearch.yml: the main ElasticSearch configuration file

      # ---------------------------------- Cluster -----------------------------------
      # Cluster name
      #cluster.name: my-application
      # ------------------------------------ Node ------------------------------------
      # Node name:
      node.name: node-1
      # Custom node attributes:
      #node.attr.rack: r1
      # ----------------------------------- Paths ------------------------------------
      # Directory where data is stored (separate multiple paths with commas):
      #path.data: /path/to/data
      # Log file directory:
      #path.logs: /path/to/logs
      # ----------------------------------- Memory -----------------------------------
      # Whether to lock memory at startup:
      #bootstrap.memory_lock: true
      # ---------------------------------- Network -----------------------------------
      # Bind to a specific IP (IPv4 or IPv6); 0.0.0.0 accepts connections from any address
      network.host: 0.0.0.0
      # Custom port for HTTP:
      http.port: 9200
      # --------------------------------- Discovery ----------------------------------
      # Initial list of hosts used for discovery when this node starts; the default is ["127.0.0.1", "[::1]"]
      #discovery.seed_hosts: ["host1", "host2"]
      # Bootstrap the cluster with an initial set of master-eligible nodes:
      cluster.initial_master_nodes: ["node-1"]
      # ---------------------------------- Gateway -----------------------------------
      # After a full cluster restart, block initial recovery until N nodes have started:
      #gateway.recover_after_nodes: 3
      # ---------------------------------- Various -----------------------------------
      # Require explicit index names when deleting indices:
      #action.destructive_requires_name: true
    • jvm.options: ElasticSearch is developed in Java, and jvm.options holds its JVM runtime settings. Note that if network.host is set to anything other than localhost or 127.0.0.1, ElasticSearch assumes a production environment and applies stricter startup requirements

      # Xms: initial heap size
      # Xmx: maximum heap size
      -Xms128m
      -Xmx128m
    • log4j2.properties

  • Start ElasticSearch: run the startup script in the package's /bin directory, then test the ES service

    curl 127.0.0.1:9200

1.4 Standalone - Docker CLI

  • Pull the ElasticSearch image

    docker pull elasticsearch:8.0.1
  • Back up the configuration files from the image into a config directory on the host

  • Install ES with a docker run command: ① prepare the configuration files in the host config directory first; ② adjust the data-volume paths in the command below to match your host

    docker run -d --name es801 \
    -p 9200:9200 -p 9300:9300 \
    -v ~/source_docker/es/data:/usr/share/elasticsearch/data \
    -v ~/source_docker/es/config:/usr/share/elasticsearch/config \
    -v ~/source_docker/es/logs:/usr/share/elasticsearch/logs \
    -v ~/source_docker/es/plugins:/usr/share/elasticsearch/plugins \
    -e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
    -e "discovery.type=single-node" elasticsearch:8.0.1

1.5 Standalone - Dockerfile

1.6 Standalone - Docker Compose

  • Pull the images with Docker

    # ES version 8
    docker pull elasticsearch:8.2.0
    docker pull kibana:8.2.0
    # ES version 7
    docker pull elasticsearch:7.17.3
    docker pull kibana:7.17.3
  • Initialize the Docker Compose directory layout

    ElasticSearch8/
      es8/
        data/
        logs/
        config/
          elasticsearch.yml
          jvm.options
          log4j2.properties
      kibana8/
        config/
          kibana.yml
  • Prepare the configuration file: elasticsearch.yml

    # ---------------------------------- Cluster -----------------------------------
    # Cluster name
    #cluster.name: my-application
    # ------------------------------------ Node ------------------------------------
    # Node name:
    node.name: node-1
    # Custom node attributes:
    #node.attr.rack: r1
    # ----------------------------------- Paths ------------------------------------
    # Directory where data is stored (separate multiple paths with commas):
    #path.data: /path/to/data
    # Log file directory:
    #path.logs: /path/to/logs
    # ----------------------------------- Memory -----------------------------------
    # Whether to lock memory at startup:
    #bootstrap.memory_lock: true
    # ---------------------------------- Network -----------------------------------
    # Bind to a specific IP (IPv4 or IPv6); 0.0.0.0 accepts connections from any address
    network.host: 0.0.0.0
    # Custom port for HTTP:
    http.port: 9200
    # --------------------------------- Discovery ----------------------------------
    # Initial list of hosts used for discovery when this node starts; the default is ["127.0.0.1", "[::1]"]
    #discovery.seed_hosts: ["host1", "host2"]
    # Bootstrap the cluster with an initial set of master-eligible nodes:
    # cluster.initial_master_nodes: ["node-1"]
    # ---------------------------------- Gateway -----------------------------------
    # After a full cluster restart, block initial recovery until N nodes have started:
    #gateway.recover_after_nodes: 3
    # ---------------------------------- Various -----------------------------------
    # Require explicit index names when deleting indices:
    #action.destructive_requires_name: true
    xpack.security.enabled: false
  • JVM configuration file required at startup: jvm.options

    ## GC configuration
    8-13:-XX:+UseConcMarkSweepGC
    8-13:-XX:CMSInitiatingOccupancyFraction=75
    8-13:-XX:+UseCMSInitiatingOccupancyOnly

    ## G1GC Configuration
    # to use G1GC, uncomment the next two lines and update the version on the
    # following three lines to your version of the JDK
    # 8-13:-XX:-UseConcMarkSweepGC
    # 8-13:-XX:-UseCMSInitiatingOccupancyOnly
    14-:-XX:+UseG1GC

    ## JVM temporary directory
    -Djava.io.tmpdir=${ES_TMPDIR}

    ## heap dumps

    # generate a heap dump when an allocation from the Java heap fails; heap dumps
    # are created in the working directory of the JVM unless an alternative path is
    # specified
    -XX:+HeapDumpOnOutOfMemoryError

    # exit right after heap dump on out of memory error. Recommended to also use
    # on java 8 for supported versions (8u92+).
    9-:-XX:+ExitOnOutOfMemoryError

    # specify an alternative path for heap dumps; ensure the directory exists and
    # has sufficient space
    -XX:HeapDumpPath=data

    # specify an alternative path for JVM fatal error logs
    -XX:ErrorFile=logs/hs_err_pid%p.log

    ## GC logging
    -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m
  • log4j2.properties: the default log4j2 configuration shipped with the package

    status = error

    appender.console.type = Console
    appender.console.name = console
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] [%node_name]%marker %m%n

    ######## Server JSON ############################
    appender.rolling.type = RollingFile
    appender.rolling.name = rolling
    appender.rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_server.json
    appender.rolling.layout.type = ECSJsonLayout
    appender.rolling.layout.dataset = elasticsearch.server

    appender.rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}-%d{yyyy-MM-dd}-%i.json.gz
    appender.rolling.policies.type = Policies
    appender.rolling.policies.time.type = TimeBasedTriggeringPolicy
    appender.rolling.policies.time.interval = 1
    appender.rolling.policies.time.modulate = true
    appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
    appender.rolling.policies.size.size = 128MB
    appender.rolling.strategy.type = DefaultRolloverStrategy
    appender.rolling.strategy.fileIndex = nomax
    appender.rolling.strategy.action.type = Delete
    appender.rolling.strategy.action.basepath = ${sys:es.logs.base_path}
    appender.rolling.strategy.action.condition.type = IfFileName
    appender.rolling.strategy.action.condition.glob = ${sys:es.logs.cluster_name}-*
    appender.rolling.strategy.action.condition.nested_condition.type = IfAccumulatedFileSize
    appender.rolling.strategy.action.condition.nested_condition.exceeds = 2GB
    ################################################
    ######## Server - old style pattern ###########
    appender.rolling_old.type = RollingFile
    appender.rolling_old.name = rolling_old
    appender.rolling_old.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}.log
    appender.rolling_old.layout.type = PatternLayout
    appender.rolling_old.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] [%node_name]%marker %m%n

    appender.rolling_old.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}-%d{yyyy-MM-dd}-%i.log.gz
    appender.rolling_old.policies.type = Policies
    appender.rolling_old.policies.time.type = TimeBasedTriggeringPolicy
    appender.rolling_old.policies.time.interval = 1
    appender.rolling_old.policies.time.modulate = true
    appender.rolling_old.policies.size.type = SizeBasedTriggeringPolicy
    appender.rolling_old.policies.size.size = 128MB
    appender.rolling_old.strategy.type = DefaultRolloverStrategy
    appender.rolling_old.strategy.fileIndex = nomax
    appender.rolling_old.strategy.action.type = Delete
    appender.rolling_old.strategy.action.basepath = ${sys:es.logs.base_path}
    appender.rolling_old.strategy.action.condition.type = IfFileName
    appender.rolling_old.strategy.action.condition.glob = ${sys:es.logs.cluster_name}-*
    appender.rolling_old.strategy.action.condition.nested_condition.type = IfAccumulatedFileSize
    appender.rolling_old.strategy.action.condition.nested_condition.exceeds = 2GB
    ################################################

    rootLogger.level = info
    rootLogger.appenderRef.console.ref = console
    rootLogger.appenderRef.rolling.ref = rolling
    rootLogger.appenderRef.rolling_old.ref = rolling_old

    ######## Deprecation JSON #######################
    appender.deprecation_rolling.type = RollingFile
    appender.deprecation_rolling.name = deprecation_rolling
    appender.deprecation_rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_deprecation.json
    appender.deprecation_rolling.layout.type = ECSJsonLayout
    # Intentionally follows a different pattern to above
    appender.deprecation_rolling.layout.dataset = deprecation.elasticsearch
    appender.deprecation_rolling.filter.rate_limit.type = RateLimitingFilter

    appender.deprecation_rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_deprecation-%i.json.gz
    appender.deprecation_rolling.policies.type = Policies
    appender.deprecation_rolling.policies.size.type = SizeBasedTriggeringPolicy
    appender.deprecation_rolling.policies.size.size = 1GB
    appender.deprecation_rolling.strategy.type = DefaultRolloverStrategy
    appender.deprecation_rolling.strategy.max = 4

    appender.header_warning.type = HeaderWarningAppender
    appender.header_warning.name = header_warning
    #################################################

    logger.deprecation.name = org.elasticsearch.deprecation
    logger.deprecation.level = WARN
    logger.deprecation.appenderRef.deprecation_rolling.ref = deprecation_rolling
    logger.deprecation.appenderRef.header_warning.ref = header_warning
    logger.deprecation.additivity = false

    ######## Search slowlog JSON ####################
    appender.index_search_slowlog_rolling.type = RollingFile
    appender.index_search_slowlog_rolling.name = index_search_slowlog_rolling
    appender.index_search_slowlog_rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs\
    .cluster_name}_index_search_slowlog.json
    appender.index_search_slowlog_rolling.layout.type = ECSJsonLayout
    appender.index_search_slowlog_rolling.layout.dataset = elasticsearch.index_search_slowlog

    appender.index_search_slowlog_rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs\
    .cluster_name}_index_search_slowlog-%i.json.gz
    appender.index_search_slowlog_rolling.policies.type = Policies
    appender.index_search_slowlog_rolling.policies.size.type = SizeBasedTriggeringPolicy
    appender.index_search_slowlog_rolling.policies.size.size = 1GB
    appender.index_search_slowlog_rolling.strategy.type = DefaultRolloverStrategy
    appender.index_search_slowlog_rolling.strategy.max = 4
    #################################################

    #################################################
    logger.index_search_slowlog_rolling.name = index.search.slowlog
    logger.index_search_slowlog_rolling.level = trace
    logger.index_search_slowlog_rolling.appenderRef.index_search_slowlog_rolling.ref = index_search_slowlog_rolling
    logger.index_search_slowlog_rolling.additivity = false

    ######## Indexing slowlog JSON ##################
    appender.index_indexing_slowlog_rolling.type = RollingFile
    appender.index_indexing_slowlog_rolling.name = index_indexing_slowlog_rolling
    appender.index_indexing_slowlog_rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}\
    _index_indexing_slowlog.json
    appender.index_indexing_slowlog_rolling.layout.type = ECSJsonLayout
    appender.index_indexing_slowlog_rolling.layout.dataset = elasticsearch.index_indexing_slowlog


    appender.index_indexing_slowlog_rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}\
    _index_indexing_slowlog-%i.json.gz
    appender.index_indexing_slowlog_rolling.policies.type = Policies
    appender.index_indexing_slowlog_rolling.policies.size.type = SizeBasedTriggeringPolicy
    appender.index_indexing_slowlog_rolling.policies.size.size = 1GB
    appender.index_indexing_slowlog_rolling.strategy.type = DefaultRolloverStrategy
    appender.index_indexing_slowlog_rolling.strategy.max = 4
    #################################################


    logger.index_indexing_slowlog.name = index.indexing.slowlog.index
    logger.index_indexing_slowlog.level = trace
    logger.index_indexing_slowlog.appenderRef.index_indexing_slowlog_rolling.ref = index_indexing_slowlog_rolling
    logger.index_indexing_slowlog.additivity = false


    logger.com_amazonaws.name = com.amazonaws
    logger.com_amazonaws.level = warn

    logger.com_amazonaws_jmx_SdkMBeanRegistrySupport.name = com.amazonaws.jmx.SdkMBeanRegistrySupport
    logger.com_amazonaws_jmx_SdkMBeanRegistrySupport.level = error

    logger.com_amazonaws_metrics_AwsSdkMetrics.name = com.amazonaws.metrics.AwsSdkMetrics
    logger.com_amazonaws_metrics_AwsSdkMetrics.level = error

    logger.com_amazonaws_auth_profile_internal_BasicProfileConfigFileLoader.name = com.amazonaws.auth.profile.internal.BasicProfileConfigFileLoader
    logger.com_amazonaws_auth_profile_internal_BasicProfileConfigFileLoader.level = error

    logger.com_amazonaws_services_s3_internal_UseArnRegionResolver.name = com.amazonaws.services.s3.internal.UseArnRegionResolver
    logger.com_amazonaws_services_s3_internal_UseArnRegionResolver.level = error


    appender.audit_rolling.type = RollingFile
    appender.audit_rolling.name = audit_rolling
    appender.audit_rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_audit.json
    appender.audit_rolling.layout.type = PatternLayout
    appender.audit_rolling.layout.pattern = {\
    "type":"audit", \
    "timestamp":"%d{yyyy-MM-dd'T'HH:mm:ss,SSSZ}"\
    %varsNotEmpty{, "cluster.name":"%enc{%map{cluster.name}}{JSON}"}\
    %varsNotEmpty{, "cluster.uuid":"%enc{%map{cluster.uuid}}{JSON}"}\
    %varsNotEmpty{, "node.name":"%enc{%map{node.name}}{JSON}"}\
    %varsNotEmpty{, "node.id":"%enc{%map{node.id}}{JSON}"}\
    %varsNotEmpty{, "host.name":"%enc{%map{host.name}}{JSON}"}\
    %varsNotEmpty{, "host.ip":"%enc{%map{host.ip}}{JSON}"}\
    %varsNotEmpty{, "event.type":"%enc{%map{event.type}}{JSON}"}\
    %varsNotEmpty{, "event.action":"%enc{%map{event.action}}{JSON}"}\
    %varsNotEmpty{, "authentication.type":"%enc{%map{authentication.type}}{JSON}"}\
    %varsNotEmpty{, "user.name":"%enc{%map{user.name}}{JSON}"}\
    %varsNotEmpty{, "user.run_by.name":"%enc{%map{user.run_by.name}}{JSON}"}\
    %varsNotEmpty{, "user.run_as.name":"%enc{%map{user.run_as.name}}{JSON}"}\
    %varsNotEmpty{, "user.realm":"%enc{%map{user.realm}}{JSON}"}\
    %varsNotEmpty{, "user.run_by.realm":"%enc{%map{user.run_by.realm}}{JSON}"}\
    %varsNotEmpty{, "user.run_as.realm":"%enc{%map{user.run_as.realm}}{JSON}"}\
    %varsNotEmpty{, "user.roles":%map{user.roles}}\
    %varsNotEmpty{, "apikey.id":"%enc{%map{apikey.id}}{JSON}"}\
    %varsNotEmpty{, "apikey.name":"%enc{%map{apikey.name}}{JSON}"}\
    %varsNotEmpty{, "authentication.token.name":"%enc{%map{authentication.token.name}}{JSON}"}\
    %varsNotEmpty{, "authentication.token.type":"%enc{%map{authentication.token.type}}{JSON}"}\
    %varsNotEmpty{, "origin.type":"%enc{%map{origin.type}}{JSON}"}\
    %varsNotEmpty{, "origin.address":"%enc{%map{origin.address}}{JSON}"}\
    %varsNotEmpty{, "realm":"%enc{%map{realm}}{JSON}"}\
    %varsNotEmpty{, "url.path":"%enc{%map{url.path}}{JSON}"}\
    %varsNotEmpty{, "url.query":"%enc{%map{url.query}}{JSON}"}\
    %varsNotEmpty{, "request.method":"%enc{%map{request.method}}{JSON}"}\
    %varsNotEmpty{, "request.body":"%enc{%map{request.body}}{JSON}"}\
    %varsNotEmpty{, "request.id":"%enc{%map{request.id}}{JSON}"}\
    %varsNotEmpty{, "action":"%enc{%map{action}}{JSON}"}\
    %varsNotEmpty{, "request.name":"%enc{%map{request.name}}{JSON}"}\
    %varsNotEmpty{, "indices":%map{indices}}\
    %varsNotEmpty{, "opaque_id":"%enc{%map{opaque_id}}{JSON}"}\
    %varsNotEmpty{, "trace.id":"%enc{%map{trace.id}}{JSON}"}\
    %varsNotEmpty{, "x_forwarded_for":"%enc{%map{x_forwarded_for}}{JSON}"}\
    %varsNotEmpty{, "transport.profile":"%enc{%map{transport.profile}}{JSON}"}\
    %varsNotEmpty{, "rule":"%enc{%map{rule}}{JSON}"}\
    %varsNotEmpty{, "put":%map{put}}\
    %varsNotEmpty{, "delete":%map{delete}}\
    %varsNotEmpty{, "change":%map{change}}\
    %varsNotEmpty{, "create":%map{create}}\
    %varsNotEmpty{, "invalidate":%map{invalidate}}\
    }%n
    # "node.name" node name from the `elasticsearch.yml` settings
    # "node.id" node id which should not change between cluster restarts
    # "host.name" unresolved hostname of the local node
    # "host.ip" the local bound ip (i.e. the ip listening for connections)
    # "origin.type" a received REST request is translated into one or more transport requests. This indicates which processing layer generated the event "rest" or "transport" (internal)
    # "event.action" the name of the audited event, eg. "authentication_failed", "access_granted", "run_as_granted", etc.
    # "authentication.type" one of "realm", "api_key", "token", "anonymous" or "internal"
    # "user.name" the subject name as authenticated by a realm
    # "user.run_by.name" the original authenticated subject name that is impersonating another one.
    # "user.run_as.name" if this "event.action" is of a run_as type, this is the subject name to be impersonated as.
    # "user.realm" the name of the realm that authenticated "user.name"
    # "user.run_by.realm" the realm name of the impersonating subject ("user.run_by.name")
    # "user.run_as.realm" if this "event.action" is of a run_as type, this is the realm name the impersonated user is looked up from
    # "user.roles" the roles array of the user; these are the roles that are granting privileges
    # "apikey.id" this field is present if and only if the "authentication.type" is "api_key"
    # "apikey.name" this field is present if and only if the "authentication.type" is "api_key"
    # "authentication.token.name" this field is present if and only if the authenticating credential is a service account token
    # "authentication.token.type" this field is present if and only if the authenticating credential is a service account token
    # "event.type" informs about what internal system generated the event; possible values are "rest", "transport", "ip_filter" and "security_config_change"
    # "origin.address" the remote address and port of the first network hop, i.e. a REST proxy or another cluster node
    # "realm" name of a realm that has generated an "authentication_failed" or an "authentication_successful"; the subject is not yet authenticated
    # "url.path" the URI component between the port and the query string; it is percent (URL) encoded
    # "url.query" the URI component after the path and before the fragment; it is percent (URL) encoded
    # "request.method" the method of the HTTP request, i.e. one of GET, POST, PUT, DELETE, OPTIONS, HEAD, PATCH, TRACE, CONNECT
    # "request.body" the content of the request body entity, JSON escaped
    # "request.id" a synthetic identifier for the incoming request, this is unique per incoming request, and consistent across all audit events generated by that request
    # "action" an action is the most granular operation that is authorized and this identifies it in a namespaced way (internal)
    # "request.name" if the event is in connection to a transport message this is the name of the request class, similar to how rest requests are identified by the url path (internal)
    # "indices" the array of indices that the "action" is acting upon
    # "opaque_id" opaque value conveyed by the "X-Opaque-Id" request header
    # "trace_id" an identifier conveyed by the part of "traceparent" request header
    # "x_forwarded_for" the addresses from the "X-Forwarded-For" request header, as a verbatim string value (not an array)
    # "transport.profile" name of the transport profile in case this is a "connection_granted" or "connection_denied" event
    # "rule" name of the applied rule if the "origin.type" is "ip_filter"
    # the "put", "delete", "change", "create", "invalidate" fields are only present
    # when the "event.type" is "security_config_change" and contain the security config change (as an object) taking effect

    appender.audit_rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_audit-%d{yyyy-MM-dd}-%i.json.gz
    appender.audit_rolling.policies.type = Policies
    appender.audit_rolling.policies.time.type = TimeBasedTriggeringPolicy
    appender.audit_rolling.policies.time.interval = 1
    appender.audit_rolling.policies.time.modulate = true
    appender.audit_rolling.policies.size.type = SizeBasedTriggeringPolicy
    appender.audit_rolling.policies.size.size = 1GB
    appender.audit_rolling.strategy.type = DefaultRolloverStrategy
    appender.audit_rolling.strategy.fileIndex = nomax

    logger.xpack_security_audit_logfile.name = org.elasticsearch.xpack.security.audit.logfile.LoggingAuditTrail
    logger.xpack_security_audit_logfile.level = info
    logger.xpack_security_audit_logfile.appenderRef.audit_rolling.ref = audit_rolling
    logger.xpack_security_audit_logfile.additivity = false

    logger.xmlsig.name = org.apache.xml.security.signature.XMLSignature
    logger.xmlsig.level = error
    logger.samlxml_decrypt.name = org.opensaml.xmlsec.encryption.support.Decrypter
    logger.samlxml_decrypt.level = fatal
    logger.saml2_decrypt.name = org.opensaml.saml.saml2.encryption.Decrypter
    logger.saml2_decrypt.level = fatal

  • docker-compose.yml

    # docker network create elk
    version: "3.8"
    services:
      es8:
        image: elasticsearch:8.2.0
        container_name: es8
        ports:
          - "9200:9200"
          - "9300:9300"
        volumes:
          - ./es8/data:/usr/share/elasticsearch/data
          - ./es8/logs:/usr/share/elasticsearch/logs
          - ./es8/config:/usr/share/elasticsearch/config
          - ./es8/plugins:/usr/share/elasticsearch/plugins
        environment:
          - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
          - discovery.type=single-node
      kibana8:
        image: kibana:8.2.0
        container_name: kibana8
        ports:
          - "5601:5601"
        volumes:
          - ./kibana8/config:/usr/share/kibana/config
        environment:
          ELASTICSEARCH_HOSTS: http://es8:9200
          I18N_LOCALE: zh-CN
        depends_on:
          - es8
    networks:
      default:
        external: true
        ## must match the network name created above
        name: elk
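The layout above mounts a kibana.yml from the host; these notes do not show its contents, but a minimal sketch consistent with the compose file could look like the following (every value here is an assumption matching the es8 service name, not from the original):

```yaml
# kibana.yml - minimal sketch; values assumed to match the compose file
server.host: "0.0.0.0"                    # listen on all interfaces inside the container
elasticsearch.hosts: ["http://es8:9200"]  # compose service name resolves on the elk network
i18n.locale: "zh-CN"                      # Chinese UI, same as I18N_LOCALE above
```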

1.7 Cluster - Linux

1.8 Cluster - Docker Compose

Chapter 2: Elasticsearch Basics

2.1 ES terminology

  1. 集群(Cluster):包含多个节点,每个节点属于哪个集群是通过一个配置(集群名称,默认是elasticsearch)来决定的,集群的目的为了提供高可用和海量数据的存储以及更快的跨节点查询能力。

  2. 节点(Node):集群中的一个节点,节点也有一个名称(默认是随机分配的),节点名称很重要(在执行运维管理操作的时候),默认节点会去加入一个名称为“elasticsearch”的集群,如果直接启动一堆节点,那么它们会自动组成一个elasticsearch集群,当然一个节点也可以组成一个elasticsearch集群

  3. 索引(Index):包含一堆有相似结构的文档数据,一个index包含很多document。在关系型数据库中的概念类似于数据库;

  4. 类型(type):在ElasticSearch版本7以后废除了type概念,但是保留的type代表的意义,ElasticSearch7以后Type默认值是_doc;而Type的这个概念任然保持一致,都是用来表示索引中文档的基本模型,在关系型数据库中的概念类似数据表,意思是你个索引(数据库)中包含多个Type(数据表),但是在版本7以后一个索引中只包含一个Type(_doc)

  5. 文档(Document):是ElasticSearch中最小数据单元,一个document代表一条数据,可以理解为关系型数据库中一行数据;

  6. 字段(field):一个document里面有多个field,每个field就是一个数据字段,可以理解为关系型数据库中一行数据中的每一列;

  7. 映射(mapping):非常类似于静态语言中的数据类型或者关系型数据库的表结构。mapping还有一些其他的含义,mapping不仅告诉ES一个field中是什么类型的值, 它还告诉ES如何索引数据以及数据是否能被搜索到。

  8. 分片(shard):也称 Primary Shard,单台机器无法存储大量数据,es可以将一个索引中的数据切分为多个shard,分布在多台服务器上存储。有了shard就可以横向扩展,存储更多数据,让搜索和分析等操作分布到多台服务器上去执行,提升吞吐量和性能。每个shard都是一个lucene index。primary shard(建立索引时一次设置,不能修改,默认5个)。

  9. 副本(Replica):任何一个服务器随时可能故障或宕机,此时shard可能就会丢失,因此可以为每个shard创建多个replica副本。replica可以在shard故障时提供备用服务,保证数据不丢失,多个replica还可以提升搜索操作的吞吐量和性能。replica shard(随时修改数量,默认1个)。

  10. 正向索引和倒排索引:索引的最终目的都是通过搜索的关键字检索到关键字对应的完整数据;

    • 正向索引:正向索引的检索过程首先将完整数据的关键数据进行分词,然后通过搜索关键字判断哪些分词中包含有搜索关键字,然后将搜索关键字对应的数据唯一值返回,最后通过这个唯一值查找到完整数据;

      document的唯一值 对数据进行分词,并且使用“中国”查询数据
      1001 我爱中国 -》 我、爱、中国
      1002 我是中国人-》我、是、中国、中国人
      1003 发展中的国家-》发展、发展中、国家
    • 倒排索引:倒排索引的索引方式做了改变,也是将数据中的关键数据进行分词,不同的是倒排索引将分词对应的数据的唯一值做映射,如果使用搜索关键字查询时候,如果匹配到分词也就可以得到对应的数据唯一值,然后通过这个唯一值查找到完整数据;比如用上面的三句话用倒排索引生成的结构如下:

      分词信息    数据唯一值
      我          1001、1002
      爱          1001
      中国        1001、1002
      中国人      1002
      发展        1003
      发展中      1003
  11. RESTful风格

    • GET 请求:获取服务器中的资源
    • POST 请求:在服务器上新建资源,非幂等,重复执行可能产生重复数据
    • PUT 请求:在服务器上创建或更新资源,PUT是幂等操作,重复执行结果一致,但有些创建接口重复执行会抛出异常
    • DELETE 请求:删除服务器中的资源
    • HEAD 请求:仅用于获取资源的基础信息
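上面第10点描述的倒排索引构建与检索过程,可以用一段 Python 小示例来演示(分词结果直接取自上文示例,属人工假设,并非真实分词器的输出):

```python
# 三条示例文档的分词结果(取自上文示例)
docs = {
    "1001": ["我", "爱", "中国"],
    "1002": ["我", "是", "中国", "中国人"],
    "1003": ["发展", "发展中", "国家"],
}

# 构建倒排索引:分词 -> 文档唯一值集合
inverted = {}
for doc_id, tokens in docs.items():
    for token in tokens:
        inverted.setdefault(token, set()).add(doc_id)

def search(keyword):
    """先在分词库中检索出文档id,再凭id回查完整数据"""
    return sorted(inverted.get(keyword, set()))

print(search("中国"))  # ['1001', '1002']
```

检索时不再逐条扫描全文,而是按关键字直接定位文档id,这正是倒排索引相对正向索引的优势所在。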

2.2 Elasticsearch数据类型

5aRcLT.png

1. 基本类型-字符串

  • text:需要被全文检索的字段需要使用text类型,text类型的字段会被分析并生成倒排索引,所以一般在mapping映射时候需要给text类型的字段设置分词器,或者text和keyword结合使用

    • 给text类型设置分词器

      {
        "mappings": {
          "properties": {
            "name": {
              "type": "text",
              "analyzer": "ik_smart"
            }
          }
        }
      }
    • text和keyword结合使用

      {
        "mappings": {
          "properties": {
            "name": {
              "type": "text",
              "analyzer": "ik_smart",
              "fields": {
                "keyword": {
                  "type": "keyword"
                }
              }
            }
          }
        }
      }
  • keyword:这种类型适用于结构化的字段,不需要进行分词,可以被用来精确检索过滤、排序和聚合,例如标签、email地址、网页地址、手机号码等;这种字符串也称之为 not-analyzed 字段

2. 基本类型-数值

  • 数值类型:byte、short、integer、long、float、half_float、scaled_float、double,均支持range范围查询

3. 基本类型-date

  • date格式可以在put mapping的时候用 format 参数指定,如果不指定则启用默认格式 "strict_date_optional_time||epoch_millis",表示只接受符合 "strict_date_optional_time" 格式的字符串值,或者long型数字。

    // 设置索引添加字段映射
    PUT /demo_data_date
    {
      "mappings": {
        "properties": {
          "create_date": {
            "type": "date"
          },
          "date_nanos": {
            "type": "date_nanos"
          }
        }
      }
    }

    // 保存 yyyy-MM-dd 格式
    PUT /demo_data_date/_doc/1
    {
      "create_date": "2020-11-11",
      "date_nanos": "2020-11-11"
    }

    // 保存 yyyy-MM-ddTHH:mm:ssZ 格式
    PUT /demo_data_date/_doc/2
    {
      "create_date": "2020-01-11T11:11:11Z",
      "date_nanos": "2020-01-11T11:11:11Z"
    }

    // 保存毫秒级时间戳
    PUT /demo_data_date/_doc/3
    {
      "create_date": "1604172099958",
      "date_nanos": "1604172099958"
    }

    实测,仅支持如下格式:

    • yyyy-MM-dd
    • yyyyMMdd
    • yyyyMMddHHmmss
    • yyyy-MM-ddTHH:mm:ss
    • yyyy-MM-ddTHH:mm:ss.SSS
    • yyyy-MM-ddTHH:mm:ss.SSSZ
    • 时间戳支持毫秒级
  • 通过format参数来显式指定es接受的date格式,多个date格式需用||分隔,解析时会依次匹配

    GET /demo_data_date/_search
    {
      "query": {
        "range": {
          "create_date": {
            "gte": "2020-01-01 00:00:00",
            "lte": "2020-05-01 00:00:00",
            "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
          }
        }
      }
    }
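date 字段接受的毫秒级时间戳(epoch_millis)与 ISO 日期字符串之间的换算,可以用 Python 标准库验证一下(仅演示换算,不依赖 ES):

```python
from datetime import datetime, timezone

# 2020-11-11 00:00:00 UTC
dt = datetime(2020, 11, 11, tzinfo=timezone.utc)

epoch_millis = int(dt.timestamp() * 1000)          # ES 可直接接受的毫秒级时间戳
iso_date = dt.strftime("%Y-%m-%d")                 # 对应 yyyy-MM-dd 格式
iso_datetime = dt.strftime("%Y-%m-%dT%H:%M:%SZ")   # 对应 yyyy-MM-ddTHH:mm:ssZ 格式

print(epoch_millis, iso_date, iso_datetime)
# 1605052800000 2020-11-11 2020-11-11T00:00:00Z
```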

4. 基本类型-date_nanos

  • date类型支持到毫秒,如果特殊情况下用到纳秒得用date_nanos这个类型

    #创建一个index,其date字段是date_nanos类型,支持纳秒
    PUT my_index
    {
      "mappings": {
        "properties": {
          "date": {
            "type": "date_nanos"
          }
        }
      }
    }

    #和普通的date类型一样,可以存strict_date_optional_time||epoch_millis这些格式
    #不过在es内部存的是以纳秒为单位的长整型
    PUT my_index/_doc/1
    {
      "date": "2015-01-01"
    }

    #存一个具体到纳秒的值
    PUT my_index/_doc/2
    {
      "date": "2015-01-01T12:10:30.123456789Z"
    }

    #存的是整型,按秒解释对应日期2015/1/1 8:00
    #但是在es内部,会以纳秒为单位的long类型存储
    PUT my_index/_doc/3
    {
      "date": 1420070400
    }

    GET my_index/_search
    {
      "sort": { "date": "asc" }
    }

5. 基本类型-binary

  • binary 类型接受一个以二进制值的base64编码的字符串。默认这个field是不会被存储,且不可以被搜索的

    PUT demo_data_binary
    {
      "mappings": {
        "properties": {
          "name": {
            "type": "text"
          },
          "bin": {
            "type": "binary"
          }
        }
      }
    }

    // binary字段需要写入base64编码后的值,"5rWL6K+V"是字符串"测试"的base64编码
    PUT /demo_data_binary/_doc/1
    {
      "name": "test",
      "bin": "5rWL6K+V"
    }
  • binary类型不可以被搜索

    GET /demo_data_binary/_search
    {
      "query": {
        "match": {
          "bin": "测试"
        }
      }
    }

    响应结果:error报错

    {
      "error" : {
        "root_cause" : [
          {
            "type" : "query_shard_exception",
            "reason" : "failed to create query: Field [bin] of type [binary] does not support match queries",
            "index_uuid" : "3JDGb0n3TCm4QK0l6Otnfg",
            "index" : "demo_data_binary"
          }
        ],
        "type" : "search_phase_execution_exception",
        "reason" : "all shards failed",
        "phase" : "query",
        "grouped" : true,
        "failed_shards" : [
          {
            "shard" : 0,
            "index" : "demo_data_binary",
            "node" : "h1LzZtdYT7aVoxOzQs2Vog",
            "reason" : {
              "type" : "query_shard_exception",
              "reason" : "failed to create query: Field [bin] of type [binary] does not support match queries",
              "index_uuid" : "3JDGb0n3TCm4QK0l6Otnfg",
              "index" : "demo_data_binary",
              "caused_by" : {
                "type" : "illegal_argument_exception",
                "reason" : "Field [bin] of type [binary] does not support match queries"
              }
            }
          }
        ]
      },
      "status" : 400
    }
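binary 字段只接受 base64 编码后的字符串,写入前需要先对原始内容编码;下面用 Python 标准库演示字符串"测试"的编码与还原:

```python
import base64

raw = "测试".encode("utf-8")                      # 原始字节
encoded = base64.b64encode(raw).decode("ascii")   # 可写入 binary 字段的 base64 字符串
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)  # 5rWL6K+V
```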

6. 基本类型-范围类型

  • 定义范围类型的字段

    PUT demo_data_range
    {
      "mappings": {
        "properties": {
          "expectedAttendees": {
            "type": "integer_range"
          },
          "time": {
            "type": "date_range",
            "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
          }
        }
      }
    }
  • 新增:新增范围类型字段的文档

    PUT demo_data_range/_doc/1
    {
      "expectedAttendees": {
        "gte": 10,
        "lte": 20
      },
      "time": {
        "gte": "2019-12-01 12:00:00",
        "lte": "2019-12-02 17:00:00"
      }
    }
  • 查询:根据value关键字判断查询字段是否在范围类型

    GET demo_data_range/_search
    {
      "query": {
        "term": {
          "expectedAttendees": {
            "value": 12
          }
        }
      }
    }
  • 查询:根据relation关键字判断范围关系:范围类型字段上的range查询支持一个关系参数relation,取值为WITHIN、CONTAINS、INTERSECTS之一

    GET demo_data_range/_search
    {
      "query": {
        "range": {
          "time": {
            "gte": "2019-12-01",
            "lte": "2019-12-02",
            "relation": "within"
          }
        }
      }
    }
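relation 三种取值的语义(文档中的范围字段相对于查询区间的位置关系),可以用闭区间来理解;下面是一个示例性的判断函数(非 ES 实现,仅演示语义):

```python
def range_relation(query, field):
    """返回字段区间 field 相对于查询区间 query 的关系(闭区间 [gte, lte])"""
    if query["gte"] <= field["gte"] and field["lte"] <= query["lte"]:
        return "within"      # 字段区间完全落在查询区间内
    if field["gte"] <= query["gte"] and query["lte"] <= field["lte"]:
        return "contains"    # 字段区间完全包含查询区间
    if field["gte"] <= query["lte"] and query["gte"] <= field["lte"]:
        return "intersects"  # 两个区间存在交集
    return "disjoint"        # 没有交集,三种 relation 都不匹配

doc = {"gte": 10, "lte": 20}   # 文档中的 integer_range 字段
print(range_relation({"gte": 5, "lte": 25}, doc))   # within
```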

7. 复杂类型-object

  • 复杂类型概述:Elasticsearch属于NoSQL文档型数据存储;关系型数据库管理关系型数据,支持类似SQL的连接查询,而文档型数据库则对嵌套的数据结构有特别好的支持。Elasticsearch中每一条数据本质上都是一个JSON格式的字符串,JSON本身就包含嵌套的层次关系,Elasticsearch针对不同的嵌套场景提供了三种嵌套关系的数据类型:①object、②nested、③join

  • object:单对象,即文档中某个属性对应的值也是JSON对象结构;但object类型只适合存储单个对象,不适合存储对象数组,否则Elasticsearch在索引数据时会将对象数组进行扁平化处理,导致query或aggregation时查询结果异常

    // 文档保存到ES时候的数据结构
    {
      "name": "zhang san",
      "skills": [
        {
          "language": "ruby",
          "level": "expert"
        },
        {
          "language": "javascript",
          "level": "beginner"
        }
      ]
    }
    // object类型的JSON数据会被 Lucene 扁平化
    {
      "name": "zhang san",
      "skills.language": ["ruby", "javascript"],
      "skills.level": ["expert", "beginner"]
    }

    object数据查询失效案例:

    1. 新建索引,设置字段类型为object

      PUT demo_data_obj
      {
        "mappings": {
          "properties": {
            "name": {
              "type": "text"
            },
            "skills": {
              "type": "object"
            }
          }
        }
      }
    2. 添加数据

      POST demo_data_obj/_doc/1
      {
        "name": "zhang san",
        "skills": [
          {
            "language": "ruby",
            "level": "expert"
          },
          {
            "language": "javascript",
            "level": "beginner"
          }
        ]
      }

      POST demo_data_obj/_doc/2
      {
        "name": "li si",
        "skills": [
          {
            "language": "ruby",
            "level": "beginner"
          }
        ]
      }
    3. 查询language=ruby并且level=beginner的数据,根据保存进去的数据,应该只能查询到id=2的数据

      GET demo_data_obj/_search
      {
        "query": {
          "bool": {
            "filter": [
              {
                "match": {
                  "skills.language": "ruby"
                }
              },
              {
                "match": {
                  "skills.level": "beginner"
                }
              }
            ]
          }
        }
      }

      查询结果

      {
        "took" : 0,
        "timed_out" : false,
        "_shards" : {
          "total" : 1,
          "successful" : 1,
          "skipped" : 0,
          "failed" : 0
        },
        "hits" : {
          "total" : {
            "value" : 2,
            "relation" : "eq"
          },
          "max_score" : 0.0,
          "hits" : [
            {
              "_index" : "demo_data_obj",
              "_id" : "1",
              "_score" : 0.0,
              "_source" : {
                "name" : "zhang san",
                "skills" : [
                  {
                    "language" : "ruby",
                    "level" : "expert"
                  },
                  {
                    "language" : "javascript",
                    "level" : "beginner"
                  }
                ]
              }
            },
            {
              "_index" : "demo_data_obj",
              "_id" : "2",
              "_score" : 0.0,
              "_source" : {
                "name" : "li si",
                "skills" : [
                  {
                    "language" : "ruby",
                    "level" : "beginner"
                  }
                ]
              }
            }
          ]
        }
      }
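两条文档都被命中的原因,可以用一段 Python 模拟扁平化前后的匹配逻辑来说明(示例模拟,非 Lucene 实现):

```python
# 扁平化后,language 与 level 的配对关系丢失,只剩两个独立的数组
doc1 = {"skills.language": ["ruby", "javascript"],
        "skills.level":    ["expert", "beginner"]}
doc2 = {"skills.language": ["ruby"],
        "skills.level":    ["beginner"]}

def flat_match(doc, language, level):
    """object 类型的匹配:两个条件分别在各自数组里成立即可"""
    return language in doc["skills.language"] and level in doc["skills.level"]

def nested_match(skills, language, level):
    """nested 类型的匹配:条件必须落在同一个子对象上"""
    return any(s["language"] == language and s["level"] == level for s in skills)

# object:doc1 中并没有 "ruby + beginner" 的组合,却同样被命中
print(flat_match(doc1, "ruby", "beginner"))   # True(误命中)
print(flat_match(doc2, "ruby", "beginner"))   # True

# nested:保留配对关系,doc1 不会被误命中
skills1 = [{"language": "ruby", "level": "expert"},
           {"language": "javascript", "level": "beginner"}]
print(nested_match(skills1, "ruby", "beginner"))  # False
```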

8. 复杂类型-nested

  • nested :对象数组,嵌套nested数据类型能够让我们对 object 数组建立索引,并且分别进行查询。如果需要维护数组中每个对象的关系,请使用 nested 数据类型

    nested类型查询案例:修改object案例时候的字段类型

    1. 新建索引,设置字段类型为nested

      PUT demo_data_nes
      {
        "mappings": {
          "properties": {
            "name": {
              "type": "text"
            },
            "skills": {
              "type": "nested"
            }
          }
        }
      }
    2. 添加相同的数据

      POST demo_data_nes/_doc/1
      {
        "name": "zhang san",
        "skills": [
          {
            "language": "ruby",
            "level": "expert"
          },
          {
            "language": "javascript",
            "level": "beginner"
          }
        ]
      }

      POST demo_data_nes/_doc/2
      {
        "name": "li si",
        "skills": [
          {
            "language": "ruby",
            "level": "beginner"
          }
        ]
      }
    3. 查询nested类型的数据:再次查询language=ruby并且level=beginner的数据,根据保存的数据,应该只能查询到id=2的数据

      GET demo_data_nes/_search
      {
        "query": {
          "nested": {
            "path": "skills",
            "query": {
              "bool": {
                "filter": [
                  {
                    "match": {
                      "skills.language": "ruby"
                    }
                  },
                  {
                    "match": {
                      "skills.level": "beginner"
                    }
                  }
                ]
              }
            }
          }
        }
      }

      查询结果,符合预期

      {
        "took" : 0,
        "timed_out" : false,
        "_shards" : {
          "total" : 1,
          "successful" : 1,
          "skipped" : 0,
          "failed" : 0
        },
        "hits" : {
          "total" : {
            "value" : 1,
            "relation" : "eq"
          },
          "max_score" : 0.0,
          "hits" : [
            {
              "_index" : "demo_data_nes",
              "_id" : "2",
              "_score" : 0.0,
              "_source" : {
                "name" : "li si",
                "skills" : [
                  {
                    "language" : "ruby",
                    "level" : "beginner"
                  }
                ]
              }
            }
          ]
        }
      }

9. 地理类型

10. 特殊类型

2.3 分词器

1. 分词器概述

​ ElasticSearch是一个构建于Lucene之上的优秀的分布式全文检索引擎(服务器);构建倒排索引的第一步,是把需要被全文检索的内容拆分为可检索的关键字,不同的分词器会有不同的拆分结果,直接影响ElasticSearch的检索效率和精确度。总结来说:分词就是把全文本转换成一系列单词(term/token)的过程,也叫文本分析(Analysis)。在 ES 中,Analysis 是通过分词器(Analyzer)来实现的,可使用 ES 内置的分析器或者按需定制化分析器。

​ 分词器主要由三部分组成,三个部分是有顺序的,从上到下依次经过 Character Filters、Tokenizer 以及 Token Filters;这个顺序比较好理解:一个文本进来先对文本数据进行处理,再去分词,最后对分词的结果进行过滤。

  • Character Filters:针对原始文本处理,比如去除 html 标签
  • Tokenizer:按照规则切分为单词,比如按照空格切分
  • Token Filters:将切分的单词进行加工,比如大写转小写、删除 stopwords、增加同义词
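上述三个阶段的先后关系,可以用一个极简的 Python 流水线来模拟(toy 实现,各阶段规则均为示例假设,并非 ES 内置分析器的真实逻辑):

```python
import re

def char_filter(text):
    """Character Filters:针对原始文本处理,这里去除 HTML 标签"""
    return re.sub(r"<[^>]+>", "", text)

def tokenizer(text):
    """Tokenizer:按空格切分为单词"""
    return text.split()

def token_filters(tokens, stopwords=frozenset({"a", "the"})):
    """Token Filters:大写转小写,删除 stopwords"""
    return [t.lower() for t in tokens if t.lower() not in stopwords]

def analyze(text):
    # 三个阶段按固定顺序串联
    return token_filters(tokenizer(char_filter(text)))

print(analyze("<b>The</b> Quick Fox"))  # ['quick', 'fox']
```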

2. 常用分词器

  • 分词请求关键字

    • analyzer指定需要的分词器;
    • text指定要拆分的文本
  • 分词结果关键字

    • token 为分词结果;
    • start_offset 为起始偏移;
    • end_offset 为结束偏移;
    • position 为分词位置
  • standard:默认分词器,按词切分,小写处理,默认的 stopwords 是关闭的。

    POST /_analyze
    {
      "analyzer": "standard",
      "text": "I bought a computer,8761元"
    }
    // 分词结果
    {
      "tokens": [
        {
          "token": "i",
          "start_offset": 0,
          "end_offset": 1,
          "type": "<ALPHANUM>",
          "position": 0
        },
        {
          "token": "bought",
          "start_offset": 2,
          "end_offset": 8,
          "type": "<ALPHANUM>",
          "position": 1
        },
        {
          "token": "a",
          "start_offset": 9,
          "end_offset": 10,
          "type": "<ALPHANUM>",
          "position": 2
        },
        {
          "token": "computer",
          "start_offset": 11,
          "end_offset": 19,
          "type": "<ALPHANUM>",
          "position": 3
        },
        {
          "token": "8761",
          "start_offset": 20,
          "end_offset": 24,
          "type": "<NUM>",
          "position": 4
        },
        {
          "token": "元",
          "start_offset": 24,
          "end_offset": 25,
          "type": "<IDEOGRAPHIC>",
          "position": 5
        }
      ]
    }
  • simple:按照非字母切分(符号被过滤),小写处理

    POST /_analyze
    {
      "analyzer": "simple",
      "text": "I bought a computer,8761元"
    }
    // 分词结果
    {
      "tokens": [
        {
          "token": "i",
          "start_offset": 0,
          "end_offset": 1,
          "type": "word",
          "position": 0
        },
        {
          "token": "bought",
          "start_offset": 2,
          "end_offset": 8,
          "type": "word",
          "position": 1
        },
        {
          "token": "a",
          "start_offset": 9,
          "end_offset": 10,
          "type": "word",
          "position": 2
        },
        {
          "token": "computer",
          "start_offset": 11,
          "end_offset": 19,
          "type": "word",
          "position": 3
        },
        {
          "token": "元",
          "start_offset": 24,
          "end_offset": 25,
          "type": "word",
          "position": 4
        }
      ]
    }
  • whitespace:按照空格切分,不转小写

    POST /_analyze
    {
      "analyzer": "whitespace",
      "text": "I bought a computer,8761元"
    }
    // 分词结果
    {
      "tokens": [
        {
          "token": "I",
          "start_offset": 0,
          "end_offset": 1,
          "type": "word",
          "position": 0
        },
        {
          "token": "bought",
          "start_offset": 2,
          "end_offset": 8,
          "type": "word",
          "position": 1
        },
        {
          "token": "a",
          "start_offset": 9,
          "end_offset": 10,
          "type": "word",
          "position": 2
        },
        {
          "token": "computer,8761元",
          "start_offset": 11,
          "end_offset": 25,
          "type": "word",
          "position": 3
        }
      ]
    }
  • ik_smart:会根据词库进行标准分词;IK 分词器为第三方插件,需下载后解压到 plugins 目录并重启 ES

    POST /_analyze
    {
      "analyzer": "ik_smart",
      "text": "我买了一台计算机"
    }
    // 分词结果
    {
      "tokens": [
        {
          "token": "我",
          "start_offset": 0,
          "end_offset": 1,
          "type": "CN_CHAR",
          "position": 0
        },
        {
          "token": "买了",
          "start_offset": 1,
          "end_offset": 3,
          "type": "CN_WORD",
          "position": 1
        },
        {
          "token": "一台",
          "start_offset": 3,
          "end_offset": 5,
          "type": "CN_WORD",
          "position": 2
        },
        {
          "token": "计算机",
          "start_offset": 5,
          "end_offset": 8,
          "type": "CN_WORD",
          "position": 3
        }
      ]
    }
  • ik_max_word:会根据词库列出所有可能的分词结果

    POST /_analyze
    {
      "analyzer": "ik_max_word",
      "text": "我买了一台计算机"
    }
    // 分词结果
    {
      "tokens": [
        {
          "token": "我",
          "start_offset": 0,
          "end_offset": 1,
          "type": "CN_CHAR",
          "position": 0
        },
        {
          "token": "买了",
          "start_offset": 1,
          "end_offset": 3,
          "type": "CN_WORD",
          "position": 1
        },
        {
          "token": "一台",
          "start_offset": 3,
          "end_offset": 5,
          "type": "CN_WORD",
          "position": 2
        },
        {
          "token": "一",
          "start_offset": 3,
          "end_offset": 4,
          "type": "TYPE_CNUM",
          "position": 3
        },
        {
          "token": "台",
          "start_offset": 4,
          "end_offset": 5,
          "type": "COUNT",
          "position": 4
        },
        {
          "token": "计算机",
          "start_offset": 5,
          "end_offset": 8,
          "type": "CN_WORD",
          "position": 5
        },
        {
          "token": "计算",
          "start_offset": 5,
          "end_offset": 7,
          "type": "CN_WORD",
          "position": 6
        },
        {
          "token": "算机",
          "start_offset": 6,
          "end_offset": 8,
          "type": "CN_WORD",
          "position": 7
        }
      ]
    }

3. 分词器使用

  • 分词器有索引时的分词器及搜索时的分词器,可以在mapping中设置,索引分词器采用analyzer进行设置,搜索分词器采用search_analyzer设置,如果没有设置分词器,则索引和搜索分词器都用默认的standard分词器,如果只是设置索引分词器没有设置搜索分词器,则搜索分词器也采用索引分词器,如果analyzer和search_analyzer都设置则使用各自设置的分词器。

    // 设置mapping - 只定义analyzer索引分词器,搜索分词器使用同一个
    {
      "mappings": {
        "properties": {
          "title": {
            "type": "text",
            "analyzer": "ik_smart"
          }
        }
      }
    }
    // 设置mapping - 分别定义analyzer索引分词器 和 search_analyzer搜索分词器
    {
      "mappings": {
        "properties": {
          "title": {
            "type": "text",
            "analyzer": "english",
            "search_analyzer": "standard"
          }
        }
      }
    }
    // 查询时候指定分词器
    {
      "query": {
        "match": {
          "title": {
            "query": "洗衣液",
            "analyzer": "standard"
          }
        }
      }
    }

4. 扩展IK词库

  • 下载并解压ik分词器到ElasticSearch的home下的plugins文件夹:
  • 在IK插件的配置目录下找到custom目录(如果没有就mkdir),在其中创建用户自定义的词典myTest.dic
  • ES词典的配置文件为IKAnalyzer.cfg.xml。编辑该文件,加入我们自定义的词典
  • 重启ES

5. ik分词器

​ IK分词器是ES的一个插件,主要用于把一段中文或者英文划分成一个个的关键字;我们在搜索时会把输入的信息进行分词,也会把数据库中或者索引库中的数据进行分词,然后进行匹配操作。默认的中文分词是将每个字看成一个词,比如"我爱技术"会被分为"我"、"爱"、"技"、"术",这显然不符合要求,所以需要安装中文分词器IK来解决这个问题;IK提供了两个分词算法:ik_smart和ik_max_word

  • ik_smart为最少切分,添加了歧义识别功能

    GET /_analyze
    {
      "analyzer": "ik_smart",
      "text": "买一台笔记本电脑"
    }

    分词结果

    {
      "tokens" : [
        {
          "token" : "买一",
          "start_offset" : 0,
          "end_offset" : 2,
          "type" : "CN_WORD",
          "position" : 0
        },
        {
          "token" : "台",
          "start_offset" : 2,
          "end_offset" : 3,
          "type" : "COUNT",
          "position" : 1
        },
        {
          "token" : "笔记本电脑",
          "start_offset" : 3,
          "end_offset" : 8,
          "type" : "CN_WORD",
          "position" : 2
        }
      ]
    }

  • ik_max_word为最细切分,能切分的都会被切分;

    GET /_analyze
    {
      "analyzer": "ik_max_word",
      "text": "买一台笔记本电脑"
    }

    响应结果

    {
      "tokens" : [
        {
          "token" : "买一",
          "start_offset" : 0,
          "end_offset" : 2,
          "type" : "CN_WORD",
          "position" : 0
        },
        {
          "token" : "一台",
          "start_offset" : 1,
          "end_offset" : 3,
          "type" : "CN_WORD",
          "position" : 1
        },
        {
          "token" : "一",
          "start_offset" : 1,
          "end_offset" : 2,
          "type" : "TYPE_CNUM",
          "position" : 2
        },
        {
          "token" : "台笔",
          "start_offset" : 2,
          "end_offset" : 4,
          "type" : "CN_WORD",
          "position" : 3
        },
        {
          "token" : "台",
          "start_offset" : 2,
          "end_offset" : 3,
          "type" : "COUNT",
          "position" : 4
        },
        {
          "token" : "笔记本电脑",
          "start_offset" : 3,
          "end_offset" : 8,
          "type" : "CN_WORD",
          "position" : 5
        },
        {
          "token" : "笔记本",
          "start_offset" : 3,
          "end_offset" : 6,
          "type" : "CN_WORD",
          "position" : 6
        },
        {
          "token" : "笔记",
          "start_offset" : 3,
          "end_offset" : 5,
          "type" : "CN_WORD",
          "position" : 7
        },
        {
          "token" : "本",
          "start_offset" : 5,
          "end_offset" : 6,
          "type" : "CN_CHAR",
          "position" : 8
        },
        {
          "token" : "电脑",
          "start_offset" : 6,
          "end_offset" : 8,
          "type" : "CN_WORD",
          "position" : 9
        }
      ]
    }

2.4 Elasticsearch映射

1. Mapping概述

​ ElasticSearch中的数据是以JSON文档的格式存储在索引中,ES中文档的格式称为Type;为提高ES检索性能,Type中字段对应的值会处理成特定的数据类型,这些数据类型与文档的对应关系就是Type的Mapping;在早期的版本中一个索引下可以添加多个Type类型的文档;从7.0开始,一个索引只有一个Type,也可以说一个 Type 有一个 Mapping 定义(可以理解为MySQL的表结构,用来约束字段的数据类型);ES中Mapping的作用如下:

  • 定义索引中的字段的名称以及字段对应的数据类型,日期的格式等等;
  • 字段的倒排索引的方式,或者设置是否可以被索引;
  • 自定义规则,用于控制动态添加字段的映射

​ mapping有三种不同的特性,即设置mapping的dynamic属性的三种取值:①当 dynamic 设置为 true 时,带新增字段的文档可以被索引进 ES,新字段也会被索引(可以被搜索),Mapping 同时被更新;②当 dynamic 设置为 false 时,带新增字段的文档仍可以写入并被索引,但新增字段不会被索引(无法被搜索),只保留在 _source 中,Mapping 不更新;③当设置成 strict 模式时,带新增字段的数据写入会直接报错。

                  true    false    strict
    文档可索引      是      是       否
    字段可索引      是      否       否
    mapping可更新   是      否       否
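三种 dynamic 取值的行为差异,可以用一个简化的 Python 模拟来对照理解(示例模拟,非 ES 内部实现):

```python
class SimpleIndex:
    """模拟 dynamic 三种取值对新增字段的处理"""
    def __init__(self, dynamic, mapping):
        self.dynamic = dynamic          # "true" / "false" / "strict"
        self.mapping = set(mapping)     # 已定义(可被搜索)的字段
        self.docs = []

    def index(self, doc):
        unknown = set(doc) - self.mapping
        if unknown and self.dynamic == "strict":
            raise ValueError(f"strict 模式下写入未知字段报错: {unknown}")
        if unknown and self.dynamic == "true":
            self.mapping |= unknown     # mapping 同步更新,新字段可被搜索
        # dynamic == "false":文档照常写入,但新字段不进 mapping、不可被搜索
        self.docs.append(doc)

idx = SimpleIndex("false", ["title"])
idx.index({"title": "es", "tag": "new"})
print(len(idx.docs), "tag" in idx.mapping)  # 1 False
```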

​ ES的Type映射方式有两种:①mapping:类似于数据库schema的定义,mapping会把文档映射成lucene需要的扁平格式,一个mapping属于一个索引的type,一个type中有一个mapping定义;②dynamic mapping:写入文档的时候,如果索引不存在,会自动创建索引,无需手动创建,ES会根据内容推断字段的类型;推断可能不准确,可能造成某些功能无法使用,例如范围查询。

2. 动态映射

  • 类型的自动识别关系

    JSON类型    ElasticSearch dynamic mapping
    字符串      匹配日期格式则设置为date;匹配数字则设置为float或long(该功能默认关闭);否则设置为text,并增加keyword子字段
    布尔值      boolean
    浮点数      float
    整数        long
    对象        object
    数组        由第一个非空数值的类型决定
    空值        忽略
  • 查看索引的映射

    GET http://ip:port/索引名称/_mapping
  • 关闭动态映射

    PUT /索引名称/_mapping
    {
      "dynamic": false
    }
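上面的类型自动识别规则,可以写成一个示例性的推断函数来帮助理解(简化示例;日期/数字字符串的识别在 ES 中默认部分关闭,这里演示日期识别开启后的效果):

```python
import re

def infer_es_type(value):
    """按动态映射规则推断 JSON 值对应的 ES 类型(简化示例)"""
    if isinstance(value, bool):          # bool 是 int 的子类,要先判断
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
            return "date"                # 匹配日期格式
        return "text"                    # 实际还会附带 keyword 子字段
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):          # 数组:由第一个非空值决定
        for item in value:
            if item is not None:
                return infer_es_type(item)
    return None                          # 空值忽略

print(infer_es_type("2020-11-11"), infer_es_type([None, 1.5]))  # date float
```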

3. mapping

  • 每个索引都拥有唯一的 mapping type,用来决定文档将如何被索引。mapping type由下面两部分组成

    • Meta-fields:元字段用于自定义如何处理文档的相关元数据。元字段的示例包括文档的_index、_type、_id、_source字段。
    • Fields or properties:映射类型包含与文档相关的字段或属性的列表。
  • mapping

    • 初始化mapping:在新增索引的时候添加mapping

      PUT /twitter
      {
        "mappings": {
          "properties": {
            "message": {
              "type": "text"
            }
          }
        }
      }
    • 新增字段:为索引增加新的mapping,对fields的映射进行设置

      PUT /twitter/_mapping
      {
        "properties": {
          "name": {
            "type": "keyword"
          }
        }
      }
    • 修改字段:mapping在建好之后不可以更改字段类型,但是可以通过重建索引和索引别名完成索引的字段重建

      # 原索引
      PUT my_index
      {
        "mappings": {
          "properties": {
            "create_date": {
              "type": "date",
              "format": "yyyy-MM-dd||yyyy/MM/dd"
            }
          }
        }
      }
      # 创建新索引,重置原索引的字段
      PUT my_index2
      {
        "mappings": {
          "properties": {
            "create_date": {
              "type": "text"
            }
          }
        }
      }
      # 同步数据
      POST _reindex
      {
        "source": {
          "index": "my_index"
        },
        "dest": {
          "index": "my_index2"
        }
      }
      # 删除原索引
      DELETE my_index
      # 设置新索引别名
      POST /_aliases
      {
        "actions": [
          { "add": { "index": "my_index2", "alias": "my_index" } }
        ]
      }

第三章 Elasticsearch基础操作

3.1 ES服务查询

1. 查询ES服务信息

GET /
{
  "name" : "8eaa5a50b54d",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "cyhnwUkJR7eCj2Hb01r88Q",
  "version" : {
    "number" : "8.0.1",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "801d9ccc7c2ee0f2cb121bbe22ab5af77a902372",
    "build_date" : "2022-02-24T13:55:40.601285296Z",
    "build_snapshot" : false,
    "lucene_version" : "9.0.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

2. _cat指令使用参数

&是参数连接符,可以多个参数一起使用

参数 说明 案例
v(verbose) 显示指令的详细信息 GET _cat/health?v
help 显示指令返回参数的说明 GET _cat/health?help
h(header) 选择要显示的列 GET _cat/count?h=timestamp,count
format 设置返回内容的格式 GET _cat/master?format=json - 支持json,yaml,text,smile,cbor
s(sort) 排序 GET _cat/indices?s=store.size:desc

3. _cat指令结果说明

指令 说明 案例
indices 查看索引信息 GET _cat/indices?v - health:索引的健康状态 - status:索引的开启状态 - index:索引名字 - uuid:索引的uuid - pri:索引的主分片数量 - rep:索引的复制分片数量 - docs.count:索引下的文档总数 - docs.deleted:索引下删除状态的文档数 - store.size:主分片+复制分片的大小 - pri.store.size:主分片的大小
plugins 显示每个运行插件节点的视图 GET _cat/plugins?v - name:节点名称 - component:插件名称 - version:插件版本
shards 查看分片信息 GET _cat/shards?v - index:索引名称 - shard:分片序号 - prirep:分片类型,p主,r复 - state:分片状态 - docs:该分片存放的文档数量 - store:该分片占用的存储空间大小 - ip:该分片所在的服务器ip - node:该分片所在的节点名称
allocation 显示每个节点分片数量、占用空间 GET _cat/allocation?v - shards:节点承载的分片数量 - disk.indices:索引占用的空间大小 - disk.used:已使用的磁盘空间大小 - disk.avail:节点可用空间大小 - disk.total:节点总空间大小 - disk.percent:节点磁盘占用百分比 - host:节点的host地址 - ip:节点的ip地址 - node:节点名称
thread_pool 查看线程池信息 GET _cat/thread_pool?v - node_name:节点名称 - name:线程池名称 - active:活跃线程数量 - queue:当前队列中的任务数 - rejected:被拒绝的任务数
aliases 显示别名、过滤器、路由信息 GET _cat/aliases?v - alias:别名 - index:别名指向的索引 - filter:过滤规则 - routing.index:索引路由 - routing.search:搜索路由
count 显示索引文档数量 GET _cat/count?v - epoch:自标准时间以来的秒数 - timestamp:时间 - count:文档总数
health 查看集群健康状况 GET _cat/health?v - epoch:自标准时间以来的秒数 - timestamp:时间 - cluster:集群名称 - status:集群状态 - node.total:节点总数 - node.data:数据节点总数 - shards:分片总数 - pri:主分片总数 - repo:复制节点的数量 - init: 初始化节点的数量 - unassign:未分配分片的数量 - pending_tasks:待定任务数 - max_task_wait_time:最长任务等待时间 - active_shards_percent:活动分片百分比
master 显示master节点信息 GET _cat/master?v - id:节点ID - host:主机名称 - ip:主机IP - node:节点名称
nodeattrs 显示node节点属性 GET _cat/nodeattrs?v - node:节点名称 - host:主机地址 - ip:主机ip - attr:属性描述 - value:属性值
nodes 显示node节点信息 GET _cat/nodes?v - ip:node节点的IP - heap.percent:堆内存占用百分比 - ram.percent:内存占用百分比 - cpu:CPU占用百分比 - load_1m:1分钟的系统负载 - load_5m:5分钟的系统负载 - load_15m:15分钟的系统负载 - node.role:node节点的角色 - master:是否是master节点 - name:节点名称
pending_tasks 显示正在等待的任务 GET _cat/pending_tasks?v - insertOrder:任务插入顺序 - timeInQueue:任务排队了多长时间 - priority:任务优先级 - source:任务源
recovery 显示索引碎片恢复的视图 GET _cat/recovery?v - index:索引名称 - shard:分片名称 - time:恢复时间 - type:恢复类型 - stage:恢复阶段 - source_host:源主机 - source_node:源节点名称 - target_host:目标主机 - target_node:目标节点名称 - repository:仓库 - snapshot:快照 - files:要恢复的文件数 - files_recovered:已恢复的文件数 - files_percent:恢复文件百分比 - files_total:文件总数 - bytes:要恢复的字节数 - bytes_recovered:已恢复的字节数 - bytes_percent:恢复字节百分比 - bytes_total:字节总数 - translog_ops:要恢复的数 - translog_ops_recovered:已恢复的数 - translog_ops_percent:恢复的百分比
segments 显示碎片中的分段信息 GET _cat/segments?v - index:索引名称 - shard:分片名称 - prirep:主分片还是副本分片 - ip:所在节点IP - segment:segments段名 - generation:分段生成 - docs.count:段中的文档树 - docs.deleted:段中删除的文档数 - size:段大小,以字节为单位 - size.memory:段内存字节大小 - committed:段是否已提交 - searchable:段是否可搜索 - version:版本 - compound:compound模式

3.2 ES索引操作

1. 新建索引

  • 创建索引:使用PUT方式请求,该请求具有幂等性;同样的创建请求只能发送一次,重复创建同名索引会报错

  • 创建空索引:索引配置使用默认的配置值

    PUT /索引名称
  • 禁止自动创建索引:在全局配置文件 elasticsearch.yml 中

    • 如果直接执行新增文档的操作,默认会直接创建这个索引;并且type字段也会自动创建。也就是说,ES并不需要像传统的数据库事先定义表的结构。
    • 每个索引中的类型都有一个mapping映射,这个映射是动态生成的,因此当增加新的字段时,会自动增加mapping的设置。
    • 通过在配置文件中设置action.auto_create_index为false,可以关闭自动创建index这个功能。
    • 也可以设置黑名单或者白名单,比如:设置action.auto_create_index为+aaa*,-bbb*;+号意味着允许创建aaa开头的索引,-号意味着不允许创建bbb开头的索引。
    action.auto_create_index: false
  • 创建索引并设置分片和副本

    PUT /索引名称
    {
      "settings": {
        "number_of_shards": 分片数量,
        "number_of_replicas": 副本数量
      }
    }
  • 创建指定名称的索引并设置索引mapping:详细用法参考mapping

    PUT /索引名称
    {
      "mappings": {
        "properties": {
          "<字段>": {
            "type": "字段的数据类型"
          }
        }
      }
    }

2. 查询索引

  • 索引相关信息查询

    // 查看所有索引
    GET /_cat/indices

    // 查看所有索引完整信息
    GET /_all
    // 查看单个索引完整信息
    GET /user_empty
    // 查看多个索引完整信息
    GET /user_empty,user_setting
    // 查看集群中所有索引的setting信息
    GET /_all/_settings
    // 查看集群中所有索引的mapping信息
    GET /_all/_mapping

    // 查看索引的setting信息
    GET /user_setting/_settings

    // 查看索引的mapping信息
    GET /user_mapping/_mapping

    // 查看多个索引的settings信息
    GET /user_empty,user_mapping/_settings
    // 查看多个索引的mapping信息
    GET /user_empty,user_mapping/_mapping

3. 编辑索引

  • 修改索引的副本数:副本数可以随时修改,分片数量在索引创建后不能修改

    PUT /<索引名称>/_settings
    {
      "index": {
        "number_of_replicas": 2
      }
    }
  • Add a field to the mapping (fields that are already defined cannot be modified)

    PUT /<index_name>/_mapping
    {
      "properties": {
        "<field_name>": {
          "type": "<field_data_type>"
        }
      }
    }
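Because an existing field's type cannot be changed in place, the usual workaround is to create a new index with the desired mapping, copy the data over with the _reindex API, and then repoint an alias at the new index. A sketch (the index names are examples):

```json
POST /_reindex
{
  "source": { "index": "old_index" },
  "dest":   { "index": "new_index" }
}
```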

4. Deleting an Index

  • Delete an index

    DELETE /<index_name>

    // Response
    {
      "acknowledged" : true
    }

5. Opening and Closing an Index

A closed index consumes almost no system resources.

  • Closing and reopening indices

    # Close a single index
    POST /<index_name>/_close

    # Close several indices
    POST /<index_name1>,<index_name2>/_close

    # Close indices with the ignore_unavailable parameter, which controls whether closing a nonexistent index raises an error
    POST /<index_name1>,<index_name2>/_close?ignore_unavailable=true

    # Close all indices in the cluster
    POST /_all/_close

    # Close indices matching a wildcard
    POST /test*/_close

    # Reopen a closed index
    POST /<index_name>/_open

6. Index Aliases

  • Create an index alias

    POST /_aliases
    {
      "actions": [
        {
          "add": {
            "index": "<index_name>",
            "alias": "<alias_name>"
          }
        }
      ]
    }
  • Remove an index alias

    POST /_aliases
    {
      "actions": [
        {
          "remove": {
            "index": "<index_name>",
            "alias": "<alias_name>"
          }
        }
      ]
    }
  • View the aliases of an index

    GET /<index_name>/_alias
  • View all aliases in the cluster

    GET /_all/_alias

    GET /_alias
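add and remove can be combined in a single _aliases call, which moves an alias from an old index to a new one atomically — useful for zero-downtime reindexing. A sketch (index and alias names are examples):

```json
POST /_aliases
{
  "actions": [
    { "remove": { "index": "old_index", "alias": "my_alias" } },
    { "add":    { "index": "new_index", "alias": "my_alias" } }
  ]
}
```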

3.3 ES Document Operations

1. Documents in Detail

  • Core document metadata

    {
      "_index": "music",
      "_type": "children",
      "_id": "1",
      "_version": 1,
      "found": true,
      "_source": {
        // ... ...
      }
    }

    1. _index: which index the document is stored in; the name must be lowercase and cannot start with '_', '-' or '+'.
    2. _type: since ES 6.0.0 an index may contain only one type, and recent ES versions have dropped this field entirely.
    3. _id: the document's unique identifier; together with _index it uniquely identifies and locates a document. It can be set manually or generated by ES.
    4. _version: ES uses optimistic locking to control writes to a document; the version starts at 1 and is incremented by 1 on every successful update.
    5. _source: the actual data that ES stores.
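Building on that optimistic-locking mechanism: in ES 7+ a conditional write is expressed with the if_seq_no and if_primary_term request parameters (older versions used ?version=). A sketch — the values are examples taken from a previous read of the document; the write fails with a version conflict if another write happened in between:

```json
PUT /book/_doc/1?if_seq_no=10&if_primary_term=1
{
  "name": "凡人修仙",
  "author": "天蚕"
}
```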

2. Creating Documents

  • Create: a document is uniquely identified in the ES service by _index + _type + _id (here the _id is generated automatically)

    # Auto-generated _id
    POST /book/_doc
    {
      "name": "盘龙",
      "author": "土豆",
      "count": "652354457",
      "onSale": "2020-12-23",
      "descr": "Why can't a field's type be modified? Because once the type changes, all of that field's data must be re-indexed. Elasticsearch is built on the Lucene library, and changing a field type affects analysis during both indexing and search, so forbidding type changes is consistent with how Lucene works."
    }
  • Create with a manually specified ID

    POST /book/_doc/1
    {
      "name": "凡人修仙",
      "author": "天蚕",
      "count": "652354457",
      "onSale": "2020-12-23",
      "descr": "我欲成仙"
    }

3. Updating Documents

  • Full overwrite: if the ID already exists, the document is replaced in full — fields missing from the request body are dropped

    POST /book/_doc/1
    {
      "name": "凡人修仙",
      "author": "天蚕",
      "count": "652354457",
      "descr": "快乐七天"
    }
  • Partial update: only the fields inside doc are changed (in ES 7+ the endpoint is POST /<index>/_update/<id>)

    POST /book/_update/1
    {
      "doc": {
        "count": "652354457",
        "onSale": "2020-12-23"
      }
    }
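A partial update fails if the document does not exist; the _update API also supports doc_as_upsert, which inserts the doc body as a new document when the ID is missing. A sketch (the ID and field values are examples):

```json
POST /book/_update/2
{
  "doc": {
    "name": "新书",
    "count": "0"
  },
  "doc_as_upsert": true
}
```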

4. Deleting Documents

  • Delete a document by ID

    DELETE /book/_doc/87bgvnMBZnIOh5hMwAW5

3.4 Preparing Query Data

  1. Create the index structure: it covers the common field types (keyword, text, integer, double, object, nested, ranges, date)

    PUT /demo_query01
    {
      "mappings": {
        "properties": {
          "code": { "type": "keyword" },
          "name": {
            "type": "text",
            "analyzer": "ik_smart",
            "fields": {
              "keyword": { "type": "keyword" }
            }
          },
          "title": {
            "type": "text",
            "analyzer": "ik_smart"
          },
          "age":   { "type": "integer" },
          "price": { "type": "double" },
          "card": {
            "type": "object",
            "properties": {
              "code":     { "type": "keyword" },
              "cardType": { "type": "keyword" }
            }
          },
          "address": {
            "type": "nested",
            "properties": {
              "province": { "type": "keyword" },
              "city":     { "type": "keyword" }
            }
          },
          "rangeInt": { "type": "integer_range" },
          "rangeTime": {
            "type": "date_range",
            "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
          },
          "createTime": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
          }
        }
      }
    }
  2. Load the sample documents (the Chinese field values are kept as-is — they are the test data for the ik_smart analyzer)

    PUT http://127.0.0.1:9200/demo_query01/_doc/1
    {
      "code": "hl002",
      "name": "韩立",
      "title": "韩立已经到达天南溪国,梅凝错失姻缘抱憾终生!",
      "age": 65,
      "price": 3411.63,
      "card": {
        "code": "8237923",
        "cardType": "SFZ"
      },
      "address": [
        {
          "province": "玄武国",
          "city": "天南市"
        },
        {
          "province": "乱星海",
          "city": "小环岛"
        }
      ],
      "rangeInt": {
        "gte": 10,
        "lte": 40
      },
      "rangeTime": {
        "gte": "2019-12-01 12:00:00",
        "lte": "2019-12-02 17:00:00"
      },
      "createTime": "2022-02-02 17:00:00"
    }

    PUT http://127.0.0.1:9200/demo_query01/_doc/2
    {
      "code": "lxd001",
      "name": "小刘学生",
      "title": "为了让悟空脱离低级趣味,佛祖究竟花了多少经费?",
      "age": 34,
      "price": 34.63,
      "card": {
        "code": "710008",
        "cardType": "JSZ"
      },
      "address": [
        {
          "province": "陕西省",
          "city": "西安市"
        },
        {
          "province": "陕西省",
          "city": "延安市"
        }
      ],
      "rangeInt": {
        "gte": 10,
        "lte": 40
      },
      "rangeTime": {
        "gte": "2019-12-01 12:00:00",
        "lte": "2019-12-02 17:00:00"
      },
      "createTime": "2022-02-02 17:00:00"
    }

    PUT http://127.0.0.1:9200/demo_query01/_doc/3
    {
      "code": "huz001",
      "name": "黑胡子",
      "title": "黑胡子如何夺取震震果实?恶魔果实力量有什么奥秘?",
      "age": 34,
      "price": 3334.63,
      "card": {
        "code": "745008",
        "cardType": "JSZ"
      },
      "address": [
        {
          "province": "浙江省",
          "city": "杭州市"
        },
        {
          "province": "陕西省",
          "city": "榆林市"
        }
      ],
      "rangeInt": {
        "gte": 340,
        "lte": 400
      },
      "rangeTime": {
        "gte": "2012-12-01 12:00:00",
        "lte": "2015-12-02 17:00:00"
      },
      "createTime": "2022-01-02 17:00:00"
    }

    PUT http://127.0.0.1:9200/demo_query01/_doc/4
    {
      "code": "huz001",
      "name": "二狗子",
      "title": "狗生无憾了!土狗独自跨越10000公里,从长沙到欧洲和两年没见的主人团聚",
      "age": 14,
      "price": 234.63,
      "card": {
        "code": "133008",
        "cardType": "SFZ"
      },
      "address": [
        {
          "province": "甘肃省",
          "city": "天水市"
        },
        {
          "province": "四川省",
          "city": "成都市"
        }
      ],
      "rangeInt": {
        "gte": 670,
        "lte": 7400
      },
      "rangeTime": {
        "gte": "2018-12-01 12:00:00",
        "lte": "2022-12-02 17:00:00"
      },
      "createTime": "2022-06-02 12:00:00"
    }
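Indexing the sample documents one request at a time works, but the _bulk API loads them in a single round trip. A sketch (document sources abbreviated; note the NDJSON format — one action line, then one source line, and the body must end with a newline):

```json
POST /demo_query01/_bulk
{ "index": { "_id": "1" } }
{ "code": "hl002", "name": "韩立", "age": 65 }
{ "index": { "_id": "2" } }
{ "code": "lxd001", "name": "小刘学生", "age": 34 }
```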

Chapter 4 Elasticsearch Queries

4.1 Basic Queries

1. Query-string search

Append the query condition to the request URL:

GET /<index_name>/_search?q=<field>:<value>&...

2. Get by ID

GET /<index_name>/_doc/<id>
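To fetch several documents by ID in one request, the _mget API can be used; a sketch:

```json
GET /<index_name>/_mget
{
  "ids": ["1", "2"]
}
```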

3. match_all: match every document

GET product_order/_search
{
  "query": {
    "match_all": {}
  }
}

Response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2000,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      // matching documents
    ]
  }
}

4. match: full-text search

  • Full-text search: for text fields — the query string is analyzed and matched against the inverted index

    POST http://127.0.0.1:9200/demo_query01/_search
    {
      "query": {
        "match": {
          "name": "小刘"
        }
      }
    }

    Response: the matching documents are inside hits

    {
      "took": 4,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 2,
          "relation": "eq"
        },
        "max_score": 1.2667098,
        "hits": [

        ]
      }
    }
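By default a multi-term match query combines terms with OR; the object form lets you tighten the matching — operator and minimum_should_match are standard match options. A sketch against the title field above:

```json
POST /demo_query01/_search
{
  "query": {
    "match": {
      "title": {
        "query": "恶魔果实",
        "operator": "and"
      }
    }
  }
}
```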

5. term: exact match

  • Exact match: the query value is not analyzed, so use it on keyword fields — a term query against an analyzed text field usually fails to match; here the name.keyword multi-field is used

    POST /demo_query01/_search
    {
      "query": {
        "term": {
          "name.keyword": {
            "value": "黑胡子"
          }
        }
      }
    }

    Response

    {
      "took" : 0,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 1.2039728,
        "hits" : [
        ]
      }
    }

6. terms: exact match against multiple values

  • Multi-value exact match

    POST /demo_query01/_search
    {
      "query": {
        "terms": {
          "code": [
            "lxd001",
            "huz001"
          ]
        }
      }
    }

    Response

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 3,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [

        ]
      }
    }

7. _source: returning only selected fields

  • Return only the fields you need

    POST /demo_query01/_search
    {
      "query": {
        "terms": {
          "code": [
            "lxd001",
            "huz001"
          ]
        }
      },
      "_source": ["code", "name.keyword"]
    }

    Response — note that only code comes back: name.keyword is a multi-field, so it does not exist inside _source

    {
      "took" : 3,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 3,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "demo_query01",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : {
              "code" : "lxd001"
            }
          },
          {
            "_index" : "demo_query01",
            "_id" : "2",
            "_score" : 1.0,
            "_source" : {
              "code" : "lxd001"
            }
          },
          {
            "_index" : "demo_query01",
            "_id" : "3",
            "_score" : 1.0,
            "_source" : {
              "code" : "huz001"
            }
          }
        ]
      }
    }
  • _source filtering with includes: keep only the listed fields

    POST /demo_query01/_search
    {
      "query": {
        "terms": {
          "code": [
            "lxd001",
            "huz001"
          ]
        }
      },
      "_source": {
        "includes": ["createTime", "name"]
      }
    }
  • _source filtering with excludes: drop the listed fields

    POST /demo_query01/_search
    {
      "query": {
        "terms": {
          "code": [
            "lxd001",
            "huz001"
          ]
        }
      },
      "_source": {
        "excludes": ["createTime", "name"]
      }
    }

8. range: range query

  • Range query (on a keyword field the comparison is lexicographic)

    POST /demo_query01/_search
    {
      "query": {
        "range": {
          "code": {
            "gte": "huz000",
            "lte": "huz002"
          }
        }
      }
    }

    Response

    {
      "took" : 0,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "demo_query01",
            "_id" : "3",
            "_score" : 1.0,
            "_source" : {
              "code" : "huz001",
              "name" : "黑胡子",
              "title" : "黑胡子如何夺取震震果实?恶魔果实力量有什么奥秘?",
              "age" : 34,
              "price" : 3334.63,
              "card" : {
                "code" : "745008",
                "cardType" : "JSZ"
              },
              "address" : [
                {
                  "province" : "浙江省",
                  "city" : "杭州市"
                },
                {
                  "province" : "陕西省",
                  "city" : "榆林市"
                }
              ],
              "rangeInt" : {
                "gte" : 340,
                "lte" : 400
              },
              "rangeTime" : {
                "gte" : "2012-12-01 12:00:00",
                "lte" : "2015-12-02 17:00:00"
              },
              "createTime" : "2022-01-02 17:00:00"
            }
          }
        ]
      }
    }
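range also works on date fields, including date math relative to now; a sketch against the createTime field defined above:

```json
POST /demo_query01/_search
{
  "query": {
    "range": {
      "createTime": {
        "gte": "2022-01-01",
        "lte": "now"
      }
    }
  }
}
```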

4.2 Compound Queries

1. bool.must: AND query

  • must: every clause must match — the equivalent of combining conditions with AND in SQL

    POST /demo_query01/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "terms": {
                "code": [
                  "lxd001",
                  "huz001"
                ]
              }
            },
            {
              "nested": {
                "path": "address",
                "query": {
                  "term": {
                    "address.city": {
                      "value": "榆林市"
                    }
                  }
                }
              }
            }
          ]
        }
      }
    }

2. bool.filter

  • filter: clauses inside filter must match, just like must, but they run in filter context — they do not contribute to the relevance score, and their results can be cached
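A sketch of a bool query executed entirely in filter context (field names taken from the demo_query01 index above):

```json
POST /demo_query01/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "code": "huz001" } },
        { "range": { "age": { "gte": 18 } } }
      ]
    }
  }
}
```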

3. bool.should: OR query

  • should: at least one clause should match — the equivalent of OR in SQL

    POST /demo_query01/_search
    {
      "query": {
        "bool": {
          "should": [
            {
              "terms": {
                "code": [
                  "lxd001",
                  "huz001"
                ]
              }
            },
            {
              "nested": {
                "path": "address",
                "query": {
                  "term": {
                    "address.city": {
                      "value": "榆林市"
                    }
                  }
                }
              }
            }
          ]
        }
      }
    }

4. bool.must_not

  • must_not: clauses must not match — the inverse of must; like filter, it is executed in filter context and does not affect scoring

4.3 Complex Queries

1. bool compound queries

2. from & size: pagination

3. sort: sorting results

4. highlight: highlighting matches
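The subsections above are still outline stubs; as a hedged sketch against the demo_query01 index, pagination, sorting and highlighting can be combined in one request:

```json
POST /demo_query01/_search
{
  "query": { "match": { "title": "果实" } },
  "from": 0,
  "size": 10,
  "sort": [
    { "age": { "order": "desc" } }
  ],
  "highlight": {
    "fields": { "title": {} }
  }
}
```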

4.4 Aggregation Queries

1. Average

2. Sum

3. Maximum

4. Top N
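These subsections are also stubs; as a hedged sketch against demo_query01, the metric aggregations look like the following (size: 0 suppresses the hit list so only aggregation results come back):

```json
POST /demo_query01/_search
{
  "size": 0,
  "aggs": {
    "avg_price": { "avg": { "field": "price" } },
    "sum_price": { "sum": { "field": "price" } },
    "max_price": { "max": { "field": "price" } },
    "top_codes": { "terms": { "field": "code", "size": 3 } }
  }
}
```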

Part 2 Kibana

Part 3 LogStash

Part 4 Beats

  • Copyright: unless otherwise stated, all articles on this blog are licensed under the Apache License 2.0. Please credit the source when reposting!
  • © 2020-2022 xiaoliuxuesheng