目标:
一、搭建准确的千万级数据库的准实时搜索(见详情)
二、实现词语高亮(客户端JS渲染,服务器端渲染,详见7.3)
三、实现搜索联想(输入框onchange,ajax请求搜索,取10条在层上展示方可)
四、实现词库管理(仅需管理scws下的自定义词库dd.txt即可)
五、实现全文搜索(提供了两种方案,详见8)
案例:
本文第五部分,针对实际应用场景,典型案例分析。
软件:
sphinx: sphinx-2.0.2-beta
scws: scws-1.2.0
===========================================================================
一、Sphinx安装
1、安装
# ./configure --prefix=/opt/server/sphinx --with-mysql=/opt/server/mysql # make # make install
1 2 3 | # ./configure --prefix=/opt/server/sphinx --with-mysql=/opt/server/mysql # make # make install |
2、配置
见sphinx.conf
详见下文,多索引增量索引方案
3、php 扩展
性能方面,扩展和直接使用API文件,差别不大;可以做选择;都在源码API中;
个人建议使用API文件,系统更稳定
3.1 sphinx客户端libsphinxclient
# ./configure --prefix=/opt/server/libsphinxclient # make # make install
1 2 3 | # ./configure --prefix=/opt/server/libsphinxclient # make # make install |
3.2 扩展
下载 http://pecl.php.net/package/sphinx
# /opt/server/php/bin/phpize./configure --with-sphinx=/opt/server/libsphinxclient --with-php-config=/opt/server/php/bin/php-config # make # make install 查看 # /opt/server/php/bin/php -m |grep sphinx
1 2 3 4 5 | # /opt/server/php/bin/phpize./configure --with-sphinx=/opt/server/libsphinxclient --with-php-config=/opt/server/php/bin/php-config # make # make install 查看 # /opt/server/php/bin/php -m |grep sphinx |
使用手册
http://docs.php.net/manual/zh/book.sphinx.php
4、索引 启动服务
# /opt/server/sphinx/bin/indexer --all # /opt/server/sphinx/bin/searchd
1 2 | # /opt/server/sphinx/bin/indexer --all # /opt/server/sphinx/bin/searchd |
二、php 分词 scws
官网 http://www.ftphp.com/scws/
1、 安装
# ./configure --prefix=/opt/server/scws # make # make install
1 2 3 | # ./configure --prefix=/opt/server/scws # make # make install |
2、 词库
scws-dict-chs-utf8.tar.bz2 解压放入 /opt/server/scws/etc
词库 dict.utf-8.xdb
规则 rules.utf-8.ini
3、 php 扩展
源码在phpext下
# /opt/server/php/bin/phpize./configure --with-scws=/opt/server/scws --with-php-config=/opt/server/php/bin/php-config # make # make install
1 2 3 | # /opt/server/php/bin/phpize./configure --with-scws=/opt/server/scws --with-php-config=/opt/server/php/bin/php-config # make # make install |
# vi php.ini [scws] extension = scws.so scws.default.charset = utf-8 scws.default.fpath = /opt/server/scws/etc 查看 # /opt/server/php/bin/php -m |grep scws
1 2 3 4 5 6 7 | # vi php.ini [scws] extension = scws.so scws.default.charset = utf-8 scws.default.fpath = /opt/server/scws/etc 查看 # /opt/server/php/bin/php -m |grep scws |
4、 分词测试
http://www.ftphp.com/scws/docs.php
详见测试文件 test_all.php
三、 索引
//索引某个索引 # /opt/server/sphinx/bin/indexer test1 //searchd 索引某个索引 # /opt/server/sphinx/bin/indexer test1 --rotate //指定索引搜索 # /opt/server/sphinx/bin/indexer -i test1 '逗她男'
1 2 3 4 5 6 | //索引某个索引 # /opt/server/sphinx/bin/indexer test1 //searchd 索引某个索引 # /opt/server/sphinx/bin/indexer test1 --rotate //指定索引搜索 # /opt/server/sphinx/bin/indexer -i test1 '逗她男' |
1、 增量索引方案
//创建表记录偏移 CREATE TABLE IF NOT EXISTS `search_counter` ( `counterid` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT '统计标示', `max_doc_id` int(11) unsigned NOT NULL COMMENT '已统计数', PRIMARY KEY (`counterid`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ; //增量索引 # /opt/server/sphinx/bin/indexer test1stemmed --rotate //合并索引 # /opt/server/sphinx/bin/indexer --merge test1 test1stemmed --rotate
1 2 3 4 5 6 7 8 9 10 | //创建表记录偏移 CREATE TABLE IF NOT EXISTS `search_counter` ( `counterid` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT '统计标示', `max_doc_id` int(11) unsigned NOT NULL COMMENT '已统计数', PRIMARY KEY (`counterid`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ; //增量索引 # /opt/server/sphinx/bin/indexer test1stemmed --rotate //合并索引 # /opt/server/sphinx/bin/indexer --merge test1 test1stemmed --rotate |
索引策略
1、搜索时,同时从主索引和增量索引取数据
2、每5分钟,运行一次增量索引;满足新数据搜索需求
3、每晚,运行一次主索引,同时会更新索引标示;再运行增量索引,实质为清空增量索引,避免与主索引重复索引
4、好处:避免开合并索引,合并索引效率较差
5、如数据量特别大,可考虑合并索引的方案
索引策略shell
//add.sh #!/bin/sh /opt/server/sphinx/bin/indexer test1stemmed --rotate >> /opt/server/sphinx/var/log/add.sh.log //all.sh #!/bin/sh /opt/server/sphinx/bin/indexer test1 --rotate >> /opt/server/sphinx/var/log/all.sh.log /opt/server/sphinx/bin/indexer test1stemmed --rotate >> /opt/server/sphinx/var/log/add.sh.log
1 2 3 4 5 6 7 | //add.sh #!/bin/sh /opt/server/sphinx/bin/indexer test1stemmed --rotate >> /opt/server/sphinx/var/log/add.sh.log //all.sh #!/bin/sh /opt/server/sphinx/bin/indexer test1 --rotate >> /opt/server/sphinx/var/log/all.sh.log /opt/server/sphinx/bin/indexer test1stemmed --rotate >> /opt/server/sphinx/var/log/add.sh.log |
四、 多个表独立索引方案
场景:如有用户搜索、商品搜索等多个索引需求
策略:配置一个多索引方案,每个表单独建立索引
前端根据不同类型选择不同的查询索引;全部,即选择所有索引
===========================================================================