Note: If your don't need to store keyword frequency, then go with Marmik Bhatt's LIKE
suggestion.
If you have large amount of data and you want to do a keyword search only (i.e. you are not going to be searching for phrases or use concepts like "near"), then you can simply create a keyword table:
CREATE TABLE address
(
id INT(10) PRIMARY KEY,
/* ... */
);
CREATE TABLE keyword
(
word VARCHAR(255),
address_id INT(10),
frequency INT(10),
PRIMARY KEY(word, article_id)
);
You then scan through the text that you are "indexing" and count each word that you find there.
If you want to do several keywords:
SELECT address.*, SUM(frequency) frequency_sum
FROM address
INNER JOIN keyword ON keyword.address_id = address.id
WHERE keyword.word IN ('keyword1', 'keyword2', /*...*/)
GROUP BY address.id;
Here i've done a frequency sum, which can be a dirty way to compare the usefulness of the result, when many are given.
Things to think about:
- Do you want to insert all keywords into the database, or only those, that have a frequency higher than a specific value? If you insert all your table may become huge, if you insert only higher frequency ones, then you will not find the only article that mentions a specific word, but does so only once.
- Do you want to insert all the available keywords for the specific article or only "top ones"? In this case the danger is that frequent words that add nothing to the meaning will begin pushing others out. Consider the word "However", it may be in your article many more times than "mysql", buy it is the latter that defines the article, not the former.
- Do you want to exclude words shorter then a specific length of characters?
- Do you want to exclude known "meaningless" words?
Post a Comment
0 comments
Dear readers, after reading the Content please ask for advice and to provide constructive feedback Please Write Relevant Comment with Polite Language.Your comments inspired me to continue blogging. Your opinion much more valuable to me. Thank you.