Full-Text Search is a database technique which will retrieve records even if they don’t exactly match your search criteria. This enables rich natural language searching that feels more familiar.
Searching for “database engine” using a full-text search will return results containing the phrases “database,” “engine” or “database engine”. This resolves possible ambiguities in your search criteria, so rows like “I have a database and an engine” still show up.
Simple searches in MySQL can be made using the
LIKE operator. This is inefficient and limited in functionality. MySQL usually needs to perform a full-table scan to identify records matching your query.
Full-text queries use a specially created index to improve performance. This also enables MySQL to keep track of the words within your dataset, facilitating the natural language search.
Setting Up Full-Text Search
You can only use full-text search with columns that have a full-text index. Use the
FULLTEXT clause in
CREATE TABLE statements to set this up when you seed your database schema. You can also use it with
ALTER TABLE to add a search index to existing columns.
CREATE TABLE articles(content TEXT, FULLTEXT (content)); ALTER TABLE articles ADD FULLTEXT (content);
With the index in-place, you’re ready to start querying your database.
Using Full-Text Searches
Full-text searches start with a
WHERE clause. You use
MATCH AGAINST instead of
LIKE. You need to indicate the indexed columns to match, as well as the query to search for.
SELECT * FROM articles WHERE MATCH (content) AGAINST ('database engine' IN NATURAL LANGUAGE MODE);
This will perform a full-text search of the articles using the query
database engine. By specifying
IN NATURAL LANGUAGE MODE, MySQL is instructed to interpret the query literally, without processing it in any way. This is the default search mode if none is specified.
Ordering By Result Relevance
One of the advantages of full-text search is that it lets you order the returned records by relevance. This is not possible with a regular
LIKE query. You can use a
MATCH ... AGAINST clause as part of a SQL
SELECT. The returned virtual column will contain a relevance score, from 0 to 1, indicating how closely the record matched the search query.
SELECT content, MATCH (content) AGAINST ('database engine') AS relevance FROM articles ORDER BY relevance DESC
This query would return the most relevant results first. This helps your application meet user expectations of how natural language search systems should perform.
MySQL computes search relevant scores by assessing several different factors. These include the number of records which match the query, as well as the number of times the query occurs within each record. A result with several exact matches for the query will rank higher than one which only contains part of the query.
MATCH ... AGAINST in a
SELECT statement, you don’t need to repeat it in the
WHERE clause. You could manually filter the results to include only records with a non-zero relevance score.
SELECT content, MATCH (content) AGAINST ('database engine') AS relevance FROM articles WHERE relevance > 0 ORDER BY relevance DESC
Query Expansion Mode
Natural language isn’t the only supported full-text search mode. Query expansion mode is an alternative which helps widen the range of the search results. It automatically weights the search query against the most relevant terms.
An expanded search begins by finding the records that contain a match for the query. Those records are then checked to identify the most relevant words. The database then runs another search, this time basing it on the relevant words instead of the original query. This usually results in more records being returned while maintaining an acceptable degree of relevance.
Here’s how you enable query expansion mode:
SELECT * FROM articles WHERE MATCH (content) AGAINST ('database engine' WITH QUERY EXPANSION)
The final full-text search mode is boolean mode. This lets you include boolean modifiers in your query. You can use this mode when you need advanced control over the matching logic.
You can require a word to be present in each result by prefixing it with
- to exclude results which contain the word. Other operators can be used to match parts of words, create sub-expressions and make words reduce the relevance score. The latter can be helpful when masking “noise” terms. If you don’t specify an operator, it’s implied the word will be referenced in the query with
SELECT * FROM articles WHERE MATCH (content) AGAINST ('+data* engine -sqlite' IN BOOLEAN MODE)
This query would surface articles with at least one word starting with
sqlite is not mentioned. Each result may or may not include the word
Boolean mode lets you construct powerful searches using your own logic. One caveat is it doesn’t support search relevance scoring. This is the tradeoff in giving users the ability to make boolean searches. The result ranking may not align with human expectations.
Configuring Full-Text Search
MySQL comes with several full-text configuration options which let you finetune how searches are conducted. Here’s a few of the most important.
innodb_ft_min_token_size– Sets the minimum word length for indexed terms. Words with fewer characters than this value won’t be added to the index so you won’t be able to search them. The default value is
3, which excludes extremely common words such as
I. This setting applies to InnoDB tables; use
innodb_ft_max_token_size– Similarly to
innodb_ft_min_token_size, this sets the maximum length of indexed words. Longer words won’t be searchable. Use
ft_max_word_lenfor MyISAM tables.
innodb_ft_enable_stopword– This setting, on by default, lets you control whether MySQL should filter out “stopwords”. Stopwords are very commonly used words which could unduly influence search results. The default stopword list contains 36 frequent phrases.
innodb_ft_user_stopword_table– You can set this value to the name of a database table which MySQL should source the stopword list from. This table must have a single
value. Add your stopwords to exclude as records in this table.
These settings are usually set in your MySQL server configuration file. The location varies by platform; it can often be found at
/etc/mysql/my.cnf. You’ll need to restart the MySQL service after changing the settings.
Once the server is back up, rebuild the full-text indexes for your table. You must do this so your data is reindexed using the current configuration. Otherwise, the previously indexed data will continue to be used.
To reindex an InnoDB table, run
OPTIMIZE TABLE my_table. For MyISAM tables, use
REPAIR TABLE my_table QUICK. The search indexes will then be rebuilt so your configuration changes take effect.
MySQL Full-Text Search is enabled by adding a
FULLTEXT index to your searchable fields. You then use
MATCH ... AGAINST with one of three searching modes to get your results. Natural language queries return a search relevance score which you can use to rank your results.
Full-text search offers more powerful search behaviour than a
LIKE statement. It’s also much more performant, particularly on large datasets, as all the text is indexed in advance.