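Integrating Elasticsearch with Spring Boot for Full-Text Search

Preface

Until now I built search features directly on MySQL LIKE queries. That works for small data sets, but once a table grows to tens of thousands of rows the queries become painfully slow, not least because a pattern with a leading wildcard such as LIKE '%keyword%' cannot use an ordinary index.

So I decided to move full-text search to Elasticsearch, and this post records the integration process.

I had heard of Elasticsearch long ago but never had a chance to use it; this project finally gave me a real use case.

Environment Setup

Installing Elasticsearch

I installed it with Docker, which is the most convenient option: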

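docker pull elasticsearch:8.8.0
# discovery.type=single-node lets a lone development node start without cluster bootstrap checks
docker run -d \
  --name es \
  -e "discovery.type=single-node" \
  -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
  -e "xpack.security.enabled=false" \
  -p 9200:9200 \
  -p 9300:9300 \
  elasticsearch:8.8.0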

Once the container is running, open http://localhost:9200; if you get a JSON response, the installation succeeded.
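The same check can be scripted instead of done in a browser. Below is a minimal sketch using the JDK's built-in HTTP client; the class name EsPing and both helper methods are made up for illustration, and the "looks like Elasticsearch" test is only a heuristic based on keys that appear in the node's root response.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of the project code in this post.
final class EsPing {

    // Builds a GET request against the node's root endpoint.
    static HttpRequest rootRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
    }

    // Heuristic only: the root response of an Elasticsearch node is a JSON
    // object that contains "cluster_name" and "version" keys.
    static boolean looksLikeElasticsearch(String body) {
        return body != null
                && body.contains("\"cluster_name\"")
                && body.contains("\"version\"");
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        try {
            HttpResponse<String> response = client.send(
                    rootRequest("http://localhost:9200"),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + response.statusCode());
            System.out.println(looksLikeElasticsearch(response.body())
                    ? "Elasticsearch is up"
                    : "Unexpected response: " + response.body());
        } catch (IOException e) {
            // Connection refused, timeout, and similar failures end up here.
            System.out.println("Node not reachable: " + e.getMessage());
        }
    }
}
```

Right after `docker run` the node may need some seconds to start, so a failed first attempt does not necessarily mean the setup is wrong.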

Adding the Dependency

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

Configuration

spring:
  elasticsearch:
    uris: http://localhost:9200
    connection-timeout: 10s
    socket-timeout: 30s

Creating the Entity Class

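import java.util.Date;

import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

@Document(indexName = "product")
@Data
public class Product {
    @Id
    private Long id;

    // Full-text fields, tokenized with the IK analyzer
    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String name;

    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String description;

    @Field(type = FieldType.Double)
    private Double price;

    // Keyword fields are not analyzed and match on the exact value
    @Field(type = FieldType.Keyword)
    private String category;

    @Field(type = FieldType.Date)
    private Date createTime;
}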

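The @Document annotation marks the class as an Elasticsearch document, and indexName names the index it is stored in.

The @Field annotation sets each field's type and analyzer. I used the IK analyzer (ik_max_word), which handles Chinese much better than the default. IK is a plugin, so it has to be installed on the ES node before this mapping works (see the IK section later in this post).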
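To make the effect of the analyzer choice more concrete, here is a toy illustration in plain Java. It is not how IK is implemented: the tiny dictionary, the class name ToyAnalyzers, and both methods are invented for this sketch. It only contrasts per-character tokens (which is what the standard analyzer produces for Chinese text) with dictionary-based, fine-grained word tokens in the spirit of ik_max_word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy illustration only; real analyzers are far more sophisticated.
final class ToyAnalyzers {

    // One token per character, skipping whitespace.
    static List<String> perCharacter(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // Emits every dictionary word found at every start position, so the
    // resulting tokens may overlap (fine-grained segmentation).
    static List<String> dictionaryWords(String text, Set<String> dictionary, int maxWordLength) {
        List<String> tokens = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            int limit = Math.min(text.length(), start + maxWordLength);
            for (int end = start + 1; end <= limit; end++) {
                String candidate = text.substring(start, end);
                if (dictionary.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical mini dictionary, just for the demo.
        Set<String> dictionary = Set.of("苹果", "手机", "苹果手机");
        System.out.println(perCharacter("苹果手机"));                   // [苹, 果, 手, 机]
        System.out.println(dictionaryWords("苹果手机", dictionary, 4)); // [苹果, 苹果手机, 手机]
    }
}
```

With per-character tokens, a query for 手机 is matched against the separate characters 手 and 机, so unrelated documents that merely contain those characters can show up. Word-level tokens avoid much of that noise, which is the main reason to use IK for Chinese fields.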

Creating the Repository

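import java.util.List;

import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

public interface ProductRepository extends ElasticsearchRepository<Product, Long> {

    // Derived queries: Spring Data builds the query from the method name
    List<Product> findByNameContaining(String name);

    List<Product> findByPriceBetween(Double min, Double max);

    List<Product> findByCategory(String category);
}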

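Extending ElasticsearchRepository gives you the basic CRUD methods out of the box, and you can declare custom queries by following the method-naming conventions.

Basic CRUD Operations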

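import java.util.ArrayList;
import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class ProductService {

    @Autowired
    ProductRepository productRepository;

    // Create or update (a document with the same ID is overwritten)
    public Product save(Product product) {
        return productRepository.save(product);
    }

    // Bulk save; saveAll returns an Iterable, so copy it into a List
    public List<Product> saveAll(List<Product> products) {
        List<Product> saved = new ArrayList<>();
        productRepository.saveAll(products).forEach(saved::add);
        return saved;
    }

    // Find by ID; returns null when no document exists
    public Product findById(Long id) {
        return productRepository.findById(id).orElse(null);
    }

    // Find all; findAll returns an Iterable that is not guaranteed to be a
    // List, so copy the elements instead of casting
    public List<Product> findAll() {
        List<Product> all = new ArrayList<>();
        productRepository.findAll().forEach(all::add);
        return all;
    }

    // Delete by ID
    public void deleteById(Long id) {
        productRepository.deleteById(id);
    }
}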

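Complex Queries

For more complex queries you can use ElasticsearchRestTemplate. Note that this class, together with NativeSearchQuery and the QueryBuilders API, belongs to the RestHighLevelClient-based stack of Spring Data Elasticsearch 4.x (Spring Boot 2.x). Newer releases are built on a different client, so the code would need adapting there.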

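import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageImpl;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.core.SearchHit;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.stereotype.Service;
import org.springframework.util.StringUtils;

@Service
public class ProductSearchService {

    @Autowired
    ElasticsearchRestTemplate elasticsearchRestTemplate;

    public List<Product> search(String keyword, Double minPrice, Double maxPrice) {
        BoolQueryBuilder query = QueryBuilders.boolQuery();

        // Keyword query across name and description (scored)
        if (StringUtils.hasText(keyword)) {
            query.must(QueryBuilders.multiMatchQuery(keyword, "name", "description")
                    .analyzer("ik_max_word"));
        }

        // Price range as a filter (no effect on scoring); either bound may be omitted
        if (minPrice != null || maxPrice != null) {
            RangeQueryBuilder rangeQuery = QueryBuilders.rangeQuery("price");
            if (minPrice != null) {
                rangeQuery.gte(minPrice);
            }
            if (maxPrice != null) {
                rangeQuery.lte(maxPrice);
            }
            query.filter(rangeQuery);
        }

        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(query)
                .build();

        SearchHits<Product> searchHits = elasticsearchRestTemplate.search(searchQuery, Product.class);

        return searchHits.getSearchHits().stream()
                .map(SearchHit::getContent)
                .collect(Collectors.toList());
    }

    // Paged query (page is zero-based)
    public Page<Product> searchPage(String keyword, int page, int size) {
        PageRequest pageRequest = PageRequest.of(page, size);
        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.multiMatchQuery(keyword, "name", "description"))
                .withPageable(pageRequest)
                .build();

        SearchHits<Product> searchHits = elasticsearchRestTemplate.search(searchQuery, Product.class);

        List<Product> products = searchHits.getSearchHits().stream()
                .map(SearchHit::getContent)
                .collect(Collectors.toList());

        return new PageImpl<>(products, pageRequest, searchHits.getTotalHits());
    }

    // Query with highlighting
    public List<Map<String, Object>> searchWithHighlight(String keyword) {
        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.multiMatchQuery(keyword, "name", "description"))
                .withHighlightFields(
                        new HighlightBuilder.Field("name"),
                        new HighlightBuilder.Field("description")
                )
                .build();

        SearchHits<Product> searchHits = elasticsearchRestTemplate.search(searchQuery, Product.class);

        List<Map<String, Object>> results = new ArrayList<>();
        for (SearchHit<Product> hit : searchHits) {
            Map<String, Object> map = new HashMap<>();
            Product product = hit.getContent();
            map.put("product", product);

            // Pick the first highlighted fragment of each field, if any
            Map<String, List<String>> highlightFields = hit.getHighlightFields();
            if (highlightFields.containsKey("name")) {
                map.put("highlightName", highlightFields.get("name").get(0));
            }
            if (highlightFields.containsKey("description")) {
                map.put("highlightDescription", highlightFields.get("description").get(0));
            }

            results.add(map);
        }

        return results;
    }
}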
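One thing worth knowing about searchPage: the zero-based page and size end up as an offset (from = page * size) in the Elasticsearch request, and by default an index rejects requests where from + size exceeds 10,000 (the index.max_result_window setting). A small guard like the following sketch can turn that into a clear validation error before the query is sent. The class and method names are hypothetical.

```java
// Hypothetical helper for validating paging parameters before querying.
final class PageWindow {

    // Elasticsearch's default value for index.max_result_window.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // Offset of the first hit on the given zero-based page.
    static long offset(int page, int size) {
        if (page < 0) {
            throw new IllegalArgumentException("page must be >= 0");
        }
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        return (long) page * size;
    }

    // True when the last hit of the page still lies inside the result window.
    static boolean withinWindow(int page, int size, int maxResultWindow) {
        return offset(page, size) + size <= maxResultWindow;
    }

    public static void main(String[] args) {
        System.out.println(offset(3, 20));                                    // 60
        System.out.println(withinWindow(999, 10, DEFAULT_MAX_RESULT_WINDOW));  // true
        System.out.println(withinWindow(1000, 10, DEFAULT_MAX_RESULT_WINDOW)); // false
    }
}
```

In the controller this could be checked before calling searchPage. For a typical product search, users rarely page that deep, so rejecting such requests is usually acceptable.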

Controller

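import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.Page;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/product")
public class ProductController {

    @Autowired
    ProductService productService;

    @Autowired
    ProductSearchService productSearchService;

    // Index (or overwrite) a product
    @PostMapping
    public Product save(@RequestBody Product product) {
        return productService.save(product);
    }

    // Keyword search with an optional price range
    @GetMapping("/search")
    public List<Product> search(
            @RequestParam(required = false) String keyword,
            @RequestParam(required = false) Double minPrice,
            @RequestParam(required = false) Double maxPrice) {
        return productSearchService.search(keyword, minPrice, maxPrice);
    }

    // Paged keyword search; page is zero-based
    @GetMapping("/search/page")
    public Page<Product> searchPage(
            @RequestParam String keyword,
            @RequestParam(defaultValue = "0") int page,
            @RequestParam(defaultValue = "10") int size) {
        return productSearchService.searchPage(keyword, page, size);
    }
}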

Syncing MySQL Data to ES

The data usually still lives in MySQL and ES is only used for search, so the MySQL data has to be synced to ES.

Option 1: Scheduled Sync

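import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Requires @EnableScheduling on a configuration class
@Component
public class DataSyncScheduler {

    @Autowired
    ProductMapper productMapper;

    @Autowired
    ProductRepository productRepository;

    // Full sync once a day at 02:00
    @Scheduled(cron = "0 0 2 * * ?")
    public void syncData() {
        // Assuming ProductMapper is a MyBatis-Plus mapper: a null wrapper selects all rows
        List<Product> products = productMapper.selectList(null);
        productRepository.saveAll(products);
    }
}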
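The job above sends every row to ES in a single saveAll call, which can produce a very large bulk request once the table grows. One common mitigation is to index in fixed-size batches. The helper below is a minimal plain-Java sketch of the splitting step; the class name Batches and the batch size in the usage note are assumptions, not something the project prescribes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a list into consecutive batches.
final class Batches {

    // Returns sublists of at most batchSize elements, preserving order.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the view so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(partition(List.of(1, 2, 3, 4, 5), 2)); // [[1, 2], [3, 4], [5]]
    }
}
```

In syncData this could be used as `for (List<Product> batch : Batches.partition(products, 1000)) productRepository.saveAll(batch);`. This only bounds the size of each request to ES; the rows are still loaded from MySQL in one go, so very large tables would also need a paged database query.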

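Option 2: Real-Time Sync (Recommended)

Update ES in the same service methods that insert, update, or delete rows. Keep in mind that the two writes are not atomic: if the ES call fails after the MySQL write has succeeded, the two stores diverge until the document is written again.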

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class ProductService {

    @Autowired
    ProductMapper productMapper;

    @Autowired
    ProductRepository productRepository;

    public void save(Product product) {
        // Save to MySQL
        productMapper.insert(product);

        // Sync to ES
        productRepository.save(product);
    }

    public void updateById(Product product) {
        // Update MySQL
        productMapper.updateById(product);

        // Update ES (save overwrites the document with the same ID)
        productRepository.save(product);
    }

    public void deleteById(Long id) {
        // Delete from MySQL
        productMapper.deleteById(id);

        // Delete from ES
        productRepository.deleteById(id);
    }
}
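Because the MySQL write and the ES write in this approach are two separate steps, it helps to decide up front what happens when only the second one fails. The sketch below shows one possible policy in plain Java: treat the database as the source of truth, and remember failed index writes so they can be retried later. The DualWriter class is hypothetical, the two Consumer arguments stand in for calls such as productMapper.insert and productRepository.save, and the in-memory queue is for illustration only (it is lost on restart).

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical sketch of "write to the database first, retry the index later".
final class DualWriter<T> {

    private final Consumer<T> databaseWrite; // stand-in for the MySQL write
    private final Consumer<T> indexWrite;    // stand-in for the ES write
    private final Queue<T> pendingIndexWrites = new ArrayDeque<>();

    DualWriter(Consumer<T> databaseWrite, Consumer<T> indexWrite) {
        this.databaseWrite = databaseWrite;
        this.indexWrite = indexWrite;
    }

    void save(T item) {
        // The database is the source of truth: let its failures propagate.
        databaseWrite.accept(item);
        try {
            indexWrite.accept(item);
        } catch (RuntimeException e) {
            // The search index is allowed to lag; remember the item for a retry.
            pendingIndexWrites.add(item);
        }
    }

    // Retries each queued item once and returns how many succeeded.
    // Items that fail again stay in the queue.
    int retryPending() {
        int succeeded = 0;
        int attempts = pendingIndexWrites.size();
        for (int i = 0; i < attempts; i++) {
            T item = pendingIndexWrites.poll();
            try {
                indexWrite.accept(item);
                succeeded++;
            } catch (RuntimeException e) {
                pendingIndexWrites.add(item);
            }
        }
        return succeeded;
    }

    int pendingCount() {
        return pendingIndexWrites.size();
    }
}
```

In a real service retryPending could be driven by a scheduled job, and the pending items would more likely live in a database table or a message queue than in memory. Whether a lagging index is acceptable depends on the application.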

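Option 3: Using Canal

For larger data volumes, consider using Canal to listen to the MySQL binlog and sync changes to ES in real time.

The setup is fairly involved, so I won't go into it here; see Canal's official documentation if you are interested.

IK Analyzer Setup

The default analyzer handles Chinese poorly, so I recommend installing the IK analyzer. The plugin version must match the Elasticsearch version (8.8.0 here).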

# Open a shell in the ES container
docker exec -it es /bin/bash

# Install the IK analyzer plugin
elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v8.8.0/elasticsearch-analysis-ik-8.8.0.zip

# Leave the container and restart it so the plugin is loaded
exit
docker restart es
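After the restart, one way to confirm that the plugin is active is to call Elasticsearch's _analyze API with the ik_max_word analyzer and look at the tokens it returns. Here is a minimal sketch using the JDK HTTP client; the class name AnalyzeCheck and its helpers are made up, the JSON body is built by hand to keep the example dependency-free, and the sample text is arbitrary.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for checking an analyzer via the _analyze API.
final class AnalyzeCheck {

    // Minimal JSON string escaping (backslash and double quote only).
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Request body of the form {"analyzer":"...","text":"..."}.
    static String analyzeBody(String analyzer, String text) {
        return "{\"analyzer\":\"" + escape(analyzer) + "\",\"text\":\"" + escape(text) + "\"}";
    }

    static HttpRequest analyzeRequest(String baseUrl, String analyzer, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        analyzeBody(analyzer, text), StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        try {
            HttpResponse<String> response = client.send(
                    analyzeRequest("http://localhost:9200", "ik_max_word", "苹果手机"),
                    HttpResponse.BodyHandlers.ofString());
            // A 200 response lists the tokens; an error status suggests the
            // analyzer is not available on the node.
            System.out.println("HTTP " + response.statusCode());
            System.out.println(response.body());
        } catch (IOException e) {
            System.out.println("Node not reachable: " + e.getMessage());
        }
    }
}
```

If the response contains word-level tokens rather than single characters, the mapping in the Product class should work as intended.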

Testing

import java.util.List;

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
class ProductServiceTest {

    @Autowired
    ProductRepository productRepository;

    @Test
    void testSearch() {
        // Save some test data first
        Product product1 = new Product();
        product1.setId(1L);
        product1.setName("苹果手机");
        product1.setDescription("新款苹果手机,性能强劲");
        product1.setPrice(5999.0);
        productRepository.save(product1);

        // Search by name
        List<Product> products = productRepository.findByNameContaining("手机");
        System.out.println(products);
    }
}

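Summary

Elasticsearch is far better than MySQL at full-text search, especially for Chinese text.

It does have drawbacks, though:

  1. You have to maintain an ES cluster in addition to the database
  2. Keeping the data in sync takes extra work
  3. It uses a lot of memory

Choose based on your project's actual needs; if the data set is small, MySQL can cope on its own.

That's all for now.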