Elasticsearch元数据

2023-11-16 Lang elastic 评论字数统计: 1.7k(字) 阅读时长: 8(分)

Meta-Fields(元数据)

本文基于 Elasticsearch 6.6.0

_all

_all字段是把其它字段拼接在一起的超级字段，所有的字段用空格分开，_all字段会被解析和索引，但是不存储。当你只想返回包含某个关键字的文档但是不明确地搜某个字段的时候就需要使用_all字段。
例子：

PUT my_index/blog/1 
{
  "title":    "Master Java",
  "content":     "learn java",
  "author": "Tom"
}

_all字段包含:[ “Master”, “Java”, “learn”, “Tom” ]

搜索：

GET my_index/_search
{
  "query": {
    "match": {
      "_all": "Java"
    }
  }
}

返回结果：

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.39063013,
    "hits": [
      {
        "_index": "my_index",
        "_type": "blog",
        "_id": "1",
        "_score": 0.39063013,
        "_source": {
          "title": "Master Java",
          "content": "learn java",
          "author": "Tom"
        }
      }
    ]
  }
}

使用copy_to自定义_all字段：

PUT myindex
{
  "mappings": {
    "mytype": {
      "properties": {
        "title": {
          "type":    "text",
          "copy_to": "full_content" 
        },
        "content": {
          "type":    "text",
          "copy_to": "full_content" 
        },
        "full_content": {
          "type":    "text"
        }
      }
    }
  }
}

PUT myindex/mytype/1
{
  "title": "Master Java",
  "content": "learn Java"
}

GET myindex/_search
{
  "query": {
    "match": {
      "full_content": "java"
    }
  }
}

_field_names

_field_names字段用来存储文档中的所有非空字段的名字，这个字段常用于exists查询。例子如下:

PUT my_index/my_type/1
{
  "title": "This is a document"
}

PUT my_index/my_type/2?refresh=true
{
  "title": "This is another document",
  "body": "This document has a body"
}

GET my_index/_search
{
  "query": {
    "terms": {
      "_field_names": [ "body" ] 
    }
  }
}

结果会返回第二条文档，因为第一条文档没有title字段。
同样，可以使用exists查询：

GET my_index/_search
{
    "query": {
        "exists" : { "field" : "body" }
    }
}

_id

每条被索引的文档都有一个_type和_id字段，_id可以用于term查询、temrs查询、match查询、query_string查询、simple_query_string查询，但是不能用于聚合、脚本和排序。例子如下：

PUT my_index/my_type/1
{
  "text": "Document with ID 1"
}

PUT my_index/my_type/2
{
  "text": "Document with ID 2"
}

GET my_index/_search
{
  "query": {
    "terms": {
      "_id": [ "1", "2" ] 
    }
  }
}

_index

多索引查询时，有时候只需要在特地索引名上进行查询，_index字段提供了便利，也就是说可以对索引名进行term查询、terms查询、聚合分析、使用脚本和排序。

_index是一个虚拟字段，不会真的加到索引中，对_index进行term、terms查询(也包括match、query_string、simple_query_string)，但是不支持prefix、wildcard、regexp和fuzzy查询。

举例，2个索引2条文档

PUT index_1/my_type/1
{
  "text": "Document in index 1"
}

PUT index_2/my_type/2
{
  "text": "Document in index 2"
}

对索引名做查询、聚合、排序并使用脚本新增字段：

GET index_1,index_2/_search
{
  "query": {
    "terms": {
      "_index": ["index_1", "index_2"] 
    }
  },
  "aggs": {
    "indices": {
      "terms": {
        "field": "_index", 
        "size": 10
      }
    }
  },
  "sort": [
    {
      "_index": { 
        "order": "asc"
      }
    }
  ],
  "script_fields": {
    "index_name": {
      "script": {
        "lang": "painless",
        "inline": "doc['_index']" 
      }
    }
  }
}

_meta

_parent

_parent用于指定同一索引中文档的父子关系。下面例子中现在mapping中指定文档的父子关系，然后索引父文档，索引子文档时指定父id，最后根据子文档查询父文档。

PUT my_index
{
  "mappings": {
    "my_parent": {},
    "my_child": {
      "_parent": {
        "type": "my_parent" 
      }
    }
  }
}


PUT my_index/my_parent/1 
{
  "text": "This is a parent document"
}

PUT my_index/my_child/2?parent=1 
{
  "text": "This is a child document"
}

PUT my_index/my_child/3?parent=1&refresh=true 
{
  "text": "This is another child document"
}


GET my_index/my_parent/_search
{
  "query": {
    "has_child": { 
      "type": "my_child",
      "query": {
        "match": {
          "text": "child document"
        }
      }
    }
  }
}

_routing

路由参数，ELasticsearch通过以下公式计算文档应该分到哪个分片上：

1	shard_num = hash(_routing) % num_primary_shards

默认的_routing值是文档的_id或者_parent，通过_routing参数可以设置自定义路由。例如，想把user1发布的博客存储到同一个分片上，索引时指定routing参数，查询时在指定路由上查询：

PUT my_index/my_type/1?routing=user1&refresh=true 
{
  "title": "This is a document"
}

GET my_index/my_type/1?routing=user1

在查询的时候通过routing参数查询：

GET my_index/_search
{
  "query": {
    "terms": {
      "_routing": [ "user1" ] 
    }
  }
}

GET my_index/_search?routing=user1,user2 
{
  "query": {
    "match": {
      "title": "document"
    }
  }
}

在Mapping中指定routing为必须的：

PUT my_index2
{
  "mappings": {
    "my_type": {
      "_routing": {
        "required": true 
      }
    }
  }
}

PUT my_index2/my_type/1 
{
  "text": "No routing value provided"
}

_source

存储的文档的原始值。默认_source字段是开启的，也可以关闭：

PUT tweets
{
  "mappings": {
    "tweet": {
      "_source": {
        "enabled": false
      }
    }
  }
}

但是一般情况下不要关闭，除非你不想做以下操作：

使用update、update_by_query、reindex
使用高亮
数据备份、改变mapping、升级索引
通过原始字段debug查询或者聚合

_type

每条被索引的文档都有一个_type和_id字段，可以根据_type进行查询、聚合、脚本和排序。例子如下：

PUT my_index/type_1/1
{
  "text": "Document with type 1"
}

PUT my_index/type_2/2?refresh=true
{
  "text": "Document with type 2"
}

GET my_index/_search
{
  "query": {
    "terms": {
      "_type": [ "type_1", "type_2" ] 
    }
  },
  "aggs": {
    "types": {
      "terms": {
        "field": "_type", 
        "size": 10
      }
    }
  },
  "sort": [
    {
      "_type": { 
        "order": "desc"
      }
    }
  ],
  "script_fields": {
    "type": {
      "script": {
        "lang": "painless",
        "inline": "doc['_type']" 
      }
    }
  }
}

_uid

_uid 即 _type + _id 的组合，可用于查询、聚合、脚本和排序。例子如下：

PUT my_index/my_type/1
{
  "text": "Document with ID 1"
}

PUT my_index/my_type/2?refresh=true
{
  "text": "Document with ID 2"
}

GET my_index/_search
{
  "query": {
    "terms": {
      "_uid": [ "my_type#1", "my_type#2" ] 
    }
  },
  "aggs": {
    "UIDs": {
      "terms": {
        "field": "_uid", 
        "size": 10
      }
    }
  },
  "sort": [
    {
      "_uid": { 
        "order": "desc"
      }
    }
  ],
  "script_fields": {
    "UID": {
      "script": {
         "lang": "painless",
         "inline": "doc['_uid']" 
      }
    }
  }
}

本文链接： https://lxb.wiki/afde4b9b/

版权声明： 本博客所有文章除特别声明外，均采用 CC BY-NC 4.0 许可协议。转载请注明出处！