Elasticsearch Study Notes

Author: 陈进坚
Personal blog: https://jian1098.github.io
CSDN blog: https://blog.csdn.net/c_jian
Jianshu: https://www.jianshu.com/u/8ba9ac5706b6
Contact: jian1098@qq.com

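Introduction

Elasticsearch is a search server built on Lucene that provides a distributed, multi-tenant full-text search engine. Full-text search is one of the most common requirements, and the open-source Elasticsearch is a leading choice for it. It can store, search, and analyze large volumes of data quickly; Wikipedia, Stack Overflow, and GitHub all use it. Because Elasticsearch exposes a RESTful web interface, any programming language that can send HTTP requests can talk to it directly.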

Installation

Follow the official documentation for your operating system:

https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html

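Starting and Stopping

Elasticsearch runs on Java. The 7.x distributions bundle their own OpenJDK, so a separate Java installation is only needed if you want to use your own JVM (via JAVA_HOME); older releases required Java 8 to be installed first.

Taking the zip installation as an example, change into the elasticsearch-7.10.0/ directory and run the start command:

Windows: .\bin\elasticsearch.bat
Linux: ./bin/elasticsearch

Once it has started, querying port 9200 from another terminal returns information about the node: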

$ curl localhost:9200

{
  "name" : "DESKTOP-R3EDI39",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "Ek4sDY1ZTzig_sNpzJnFaA",
  "version" : {
    "number" : "7.10.0",
    "build_flavor" : "default",
    "build_type" : "zip",
    "build_hash" : "51e9d6f22758d0374a0f3f5c6e8f3a7997850f96",
    "build_date" : "2020-11-09T21:30:33.964949Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
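The same check can be done from a program. Below is a minimal Python sketch using only the standard library; the helper names fetch_root_info and summarize are hypothetical, and the URL assumes the default local node on port 9200 shown above.

```python
import json
from urllib.request import urlopen


def fetch_root_info(base_url="http://localhost:9200"):
    """GET the root endpoint and return the decoded JSON body as a dict."""
    with urlopen(base_url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(info):
    """Pick the node name, cluster name, and version number out of the root response."""
    return {
        "node": info["name"],
        "cluster": info["cluster_name"],
        "version": info["version"]["number"],
    }
```

With a node running, summarize(fetch_root_info()) should give a small dict such as the node name, "elasticsearch", and "7.10.0" for the response above.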

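Press Ctrl + C to stop Elasticsearch.

Note: by default, Elasticsearch only accepts connections from the local machine. To allow remote access, edit config/elasticsearch.yml in the installation directory, uncomment network.host, set its value to 0.0.0.0, and restart Elasticsearch. Since 0.0.0.0 listens on every network interface, only do this on a network you trust.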

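Basic Concepts

Nodes and Clusters

Elasticsearch is essentially a distributed database that lets multiple servers work together, and each server can run several Elasticsearch instances. A single instance is called a node. A group of nodes forms a cluster.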

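Index

Elasticsearch indexes every field and, after processing, writes the result to an inverted index. Lookups go straight to that index.

That is why the top-level unit of data management in Elasticsearch is called an Index. It is roughly the equivalent of a single database. The name of each Index (that is, each database) must be lowercase.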
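To make the idea of an inverted index concrete, here is a toy Python sketch. It only splits text on whitespace and lowercases it, so it is an illustration of the data structure and not how Lucene actually analyzes or stores terms; the function name and sample documents are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase, whitespace-separated term to the sorted ids of the docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {
    1: "database administration",
    2: "database engineer",
    3: "software engineer",
}
index = build_inverted_index(docs)
print(index["database"])  # [1, 2]
print(index["engineer"])  # [2, 3]
```

A search for a term then becomes a dictionary lookup instead of a scan over every document.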

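Document

A single record inside an Index is called a Document. Many Documents make up an Index.

Documents are represented as JSON. Here is an example:

{
  "user": "Zhang San",
  "title": "Engineer",
  "desc": "Database administration"
}

Documents in the same Index are not required to share the same structure (schema), but it is best to keep them consistent, because that helps search efficiency.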

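Types

Documents can be grouped. In a weather Index, for example, they could be grouped by city (Beijing and Shanghai) or by climate (sunny and rainy). Such a group is called a Type. It is a virtual, logical grouping used to filter Documents.

Different Types should have similar structures (schemas). For example, an id field cannot be a string in one group and a number in another; this is one difference from tables in a relational database. Data of a completely different nature (such as products and logs) should be stored as two Indexes, not as two Types in one Index (although that is possible). Note that mapping types were deprecated in Elasticsearch 7.x and removed in 8.0, so on current versions each Index holds a single kind of document.

The following command lists the mapping of every Index, which on versions with Type support includes the Types each Index contains.

$ curl 'localhost:9200/_mapping?pretty=true'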

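Index Operations

Creating an Index

Send a PUT request to the server to create an Index. For example, to create the Index dog:

curl -X PUT "localhost:9200/dog"

The server returns a JSON object whose acknowledged field indicates that the operation succeeded:

{"acknowledged":true,"shards_acknowledged":true,"index":"dog"}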

Deleting an Index

Send a DELETE request to the server to delete an Index. For example, to delete the Index dog:

curl -X DELETE "localhost:9200/dog"

Response:

{"acknowledged":true}
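The two index operations above differ only in the HTTP method. As a rough Python sketch (standard library only), the hypothetical helper index_request below builds the request without sending it; BASE_URL assumes the local node used throughout these notes.

```python
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: local node on the default port


def index_request(method, index_name):
    """Build (but do not send) a request that creates (PUT) or deletes (DELETE) an index."""
    if method not in ("PUT", "DELETE"):
        raise ValueError("method must be PUT or DELETE")
    if index_name != index_name.lower():
        raise ValueError("index names must be lowercase")
    return Request(f"{BASE_URL}/{index_name}", method=method)


create = index_request("PUT", "dog")
delete = index_request("DELETE", "dog")
print(create.get_method(), create.full_url)  # PUT http://localhost:9200/dog
print(delete.get_method(), delete.full_url)  # DELETE http://localhost:9200/dog
```

Passing either object to urllib.request.urlopen would actually send it to a running node.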

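Data Operations

Adding a Record

To add a record, send a PUT request with a JSON body to /Index/Type/Id:

curl -H "Content-Type: application/json" -X PUT "localhost:9200/person/student/1" -d "{"""user""": """tom""","""sex""": """m""","""age""":12}"

I ran this on Windows, so the double quotes inside the JSON are escaped for the Windows command prompt. In /person/student/1, person is the Index; if it does not exist yet, no error is raised and it is created automatically when the record is added. student is the Type, and 1 is the record's Id, which does not have to be a number and can be any string. The server returns the following: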

{
    "_index":"person",
    "_type":"student",
    "_id":"1",
    "_version":1,
    "result":"created",
    "_shards":{
        "total":2,
        "successful":1,
        "failed":0
    },
    "_seq_no":0,
    "_primary_term":1
}

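The response includes the Index, Type, Id, Version, and other details.

You can also add a record without specifying an Id, but the request must then be a POST and not a PUT. In the JSON object the server returns, the _id field is a random string such as AV3qGfrC6jMbsbXb6k1p.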
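Building the body in a program avoids the shell quoting problem entirely. The Python sketch below (standard library only) chooses PUT when an Id is given and POST otherwise, as described above; index_doc_request is a hypothetical helper, and the request is only built, not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: local node on the default port


def index_doc_request(index, doc_type, doc, doc_id=None):
    """Build a request that stores `doc`: PUT with an explicit id, POST to let the server pick one."""
    url = f"{BASE_URL}/{index}/{doc_type}"
    method = "POST"
    if doc_id is not None:
        url += f"/{doc_id}"
        method = "PUT"
    body = json.dumps(doc).encode("utf-8")
    return Request(url, data=body, method=method,
                   headers={"Content-Type": "application/json"})


doc = {"user": "tom", "sex": "m", "age": 12}
with_id = index_doc_request("person", "student", doc, doc_id=1)
auto_id = index_doc_request("person", "student", doc)
print(with_id.get_method(), with_id.full_url)  # PUT http://localhost:9200/person/student/1
print(auto_id.get_method(), auto_id.full_url)  # POST http://localhost:9200/person/student
```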

Retrieving a Record

Send a GET request with the Index, Type, and Id. Appending ?pretty=true formats the returned JSON so it is easier to read.

curl "localhost:9200/person/student/1?pretty=true"

Response:

{
  "_index" : "person",
  "_type" : "student",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "user" : "tom",
    "sex" : "m",
    "age" : 12
  }
}

If the Index does not exist, the server returns an error. If the Id does not exist, found is false in the returned JSON:

{
  "_index" : "person",
  "_type" : "student",
  "_id" : "2",
  "found" : false
}
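Client code usually checks the found flag before reading _source. A small Python sketch, with source_or_none as a hypothetical helper name and the two sample bodies abridged from the responses above:

```python
import json


def source_or_none(response_text):
    """Return the stored document from a GET-by-id response, or None when `found` is false."""
    payload = json.loads(response_text)
    if not payload.get("found", False):
        return None
    return payload["_source"]


hit = '{"_index":"person","_type":"student","_id":"1","found":true,"_source":{"user":"tom","sex":"m","age":12}}'
miss = '{"_index":"person","_type":"student","_id":"2","found":false}'
print(source_or_none(hit))   # {'user': 'tom', 'sex': 'm', 'age': 12}
print(source_or_none(miss))  # None
```

When fetching over HTTP, keep in mind that a missing document typically comes back with status 404, which some clients (urllib among them) surface as an exception whose body carries this JSON.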

Deleting a Record

To delete a record, send a DELETE request:

curl -X DELETE "localhost:9200/person/student/1"

Response:

{
    "_index":"person",
    "_type":"student",
    "_id":"1",
    "_version":2,
    "result":"deleted",
    "_shards":{
        "total":2,
        "successful":1,
        "failed":0
    },
    "_seq_no":1,
    "_primary_term":1
}

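Updating a Record

Following the same approach as adding a record, send a PUT request with new data to the same index/type/id to overwrite the existing record. In the response the record's Id is unchanged, but the version (_version) goes from 1 to 2 and the operation type (result) changes from created to updated: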

{
    "_index":"person",
    "_type":"student",
    "_id":"1",
    "_version":2,
    "result":"updated",
    "_shards":{
        "total":2,
        "successful":1,
        "failed":0
    },
    "_seq_no":3,
    "_primary_term":1
}
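A caller can tell a fresh insert from an overwrite by looking at those two fields. A tiny Python sketch, where was_overwrite is a hypothetical helper and the sample dicts are abridged from the responses shown in these notes:

```python
def was_overwrite(write_response):
    """True when a PUT replaced an existing document instead of creating a new one."""
    return write_response["result"] == "updated" and write_response["_version"] > 1


created = {"_id": "1", "_version": 1, "result": "created"}
updated = {"_id": "1", "_version": 2, "result": "updated"}
print(was_overwrite(created))  # False
print(was_overwrite(updated))  # True
```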

Retrieving All Records

Send a GET request directly to /Index/Type/_search to return the records it contains (10 per request by default).

curl "localhost:9200/person/student/_search?pretty=true"

Response:

{
  "took" : 240,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "person",
        "_type" : "student",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "user" : "john",
          "sex" : "f",
          "age" : 15
        }
      },
      {
        "_index" : "person",
        "_type" : "student",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "user" : "tom",
          "sex" : "m",
          "age" : 12
        }
      }
    ]
  }
}
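The useful parts of a search response are nested under hits. The Python sketch below pulls out the total count and the documents; summarize_hits is a hypothetical helper, and the sample dict is an abridged copy of the response above.

```python
def summarize_hits(search_response):
    """Return the total hit count and a list of (_id, _source) pairs."""
    hits = search_response["hits"]
    total = hits["total"]["value"]  # shape shown above: {"value": N, "relation": "eq"}
    rows = [(hit["_id"], hit["_source"]) for hit in hits["hits"]]
    return total, rows


response = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_id": "1", "_source": {"user": "john", "sex": "f", "age": 15}},
            {"_id": "2", "_source": {"user": "tom", "sex": "m", "age": 12}},
        ],
    }
}
total, rows = summarize_hits(response)
print(total)                                   # 2
print([source["user"] for _, source in rows])  # ['john', 'tom']
```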

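Full-Text Search

Send a JSON query body to /Index/Type/_search. Note that curl switches to POST when -d is given, which the _search endpoint accepts as well as GET.

curl -H "Content-Type: application/json" "localhost:9200/person/student/_search" -d "{"""query""" : { """match""" : { """user""" : """tom""" }},"""size""": 1,"""from""":0}"

Here user is the field to search and tom is the search keyword. size is the number of records to return (10 by default), and from is the offset (0 by default), which can be used for pagination.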
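For pagination, from is just the number of records to skip. A minimal Python sketch, assuming 1-based page numbers (the helper name match_page and its parameters are made up for this example):

```python
import json


def match_page(field, keyword, page=1, page_size=10):
    """Build a match query body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {
        "query": {"match": {field: keyword}},
        "size": page_size,
        "from": (page - 1) * page_size,  # records to skip before this page
    }


body = match_page("user", "tom", page=3, page_size=10)
print(json.dumps(body))
# {"query": {"match": {"user": "tom"}}, "size": 10, "from": 20}
```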

Boolean Search

OR

If there are several search keywords separated by spaces, they are combined with OR by default.
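The match query also accepts an expanded form with an operator parameter, which makes the default explicit. The Python sketch below builds that body; match_terms and require_all are hypothetical names chosen for this example.

```python
import json


def match_terms(field, keywords, require_all=False):
    """Build a match query for several keywords: OR by default, AND when require_all is set."""
    return {
        "query": {
            "match": {
                field: {
                    "query": " ".join(keywords),
                    "operator": "and" if require_all else "or",
                }
            }
        }
    }


body = match_terms("user", ["tom", "john"])
print(json.dumps(body))
# {"query": {"match": {"user": {"query": "tom john", "operator": "or"}}}}
```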

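AND

If several search keywords must all match (an AND relationship), put them in a bool query's must clause. Remember to send the Content-Type header, since Elasticsearch 6.0 and later reject request bodies without it. For example:

curl -H 'Content-Type: application/json' 'localhost:9200/person/student/_search'  -d '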
{
  "query": {
    "bool": {
      "must": [
        { "match": { "user": "tom" } },
        { "match": { "user": "john" } }
      ]
    }
  }
}'
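When the keywords come from user input, the must list can be generated. A short Python sketch, with all_of as a hypothetical helper that emits one match clause per keyword:

```python
import json


def all_of(field, keywords):
    """Build a bool query whose must clause holds one match per keyword (AND semantics)."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: keyword}} for keyword in keywords]
            }
        }
    }


body = all_of("user", ["tom", "john"])
print(json.dumps(body, indent=2))
```

The printed body has the same structure as the hand-written query above.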

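Greater Than and Less Than

Use a range query, where gte means greater than or equal to and lte means less than or equal to. For example, to match records whose age is between 18 and 60:

{
  "query": {
    "range": {
      "age": {
        "gte": 18,
        "lte": 60
      }
    }
  }
}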
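Either bound of a range can be left out. The Python sketch below builds the body from optional limits; range_query and its parameter names are hypothetical, and only the gte and lte keys shown in these notes are used.

```python
import json


def range_query(field, minimum=None, maximum=None):
    """Build a range query with inclusive bounds; at least one bound is required."""
    bounds = {}
    if minimum is not None:
        bounds["gte"] = minimum
    if maximum is not None:
        bounds["lte"] = maximum
    if not bounds:
        raise ValueError("give at least one of minimum or maximum")
    return {"query": {"range": {field: bounds}}}


body = range_query("age", minimum=18, maximum=60)
print(json.dumps(body))
# {"query": {"range": {"age": {"gte": 18, "lte": 60}}}}
```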

References

http://www.ruanyifeng.com/blog/2017/08/elasticsearch.html

https://www.elastic.co/guide/en/elasticsearch/reference/6.0/getting-started.html