21 April 2006

SQL Server Full Text Search Language Features

SQL Full-text Search (SQL FTS) is an optional component of SQL Server 7 and later, which allows fast and efficient querying when you have large amounts of unstructured data.

SQL Full-text Search (SQL FTS) is an optional component of SQL Server 7 and later, which allows fast and efficient querying when you have large amounts of unstructured data. This is the first of a two-part article that explores the full-text language features that ship with SQL Server versions 7, 2000, and 2005, with particular focus on the new language features in SQL 2005.

Here, in part 1, we examine the index time language options: how words or tokens are broken out and stored in a full text index. In part 2, we will switch focus to the query time language options.

SQL FTS architecture

SQL FTS builds a full-text index through a process called population, which fills the index with words and the locations in which they occur in your tables and rows. The full text indexes are stored in catalogs. You can define multiple catalogs per database, but the catalog cannot span databases; it is database specific. Similarly a table can be full-text indexed in a single catalog; or to put it another way – a table’s full-text index cannot span catalogs. However, in SQL 2005 you can full-text index views which may reside in different catalogs than the underlying base tables. This provides performance benefits and allows partitioning of tables.

There is a useful depiction of the SQL FTS architecture in BOL. I won’t repeat it here, but I do encourage you to familiarize yourself with it, and how the various FTS components interact. Very briefly, during the population process the indexing engine (MSSearch in SQL 7 & 2000 and MSFTESQL in SQL 2005) connects to SQL Server using an OLE-DB provider (PKM-Publishing and Knowledge Management) and extracts the textual content from your table on a row by row basis. MSSearch uses the services of COM components called iFilters to extract a text stream from the columns you are indexing.

iFilters

The iFilter used depends on the data type of the column in which the data is stored and on how you have configured your full-text index:

  • For columns of the CHAR, NCHAR, VARCHAR, NVARCHAR, TEXT, and NTEXT data types the indexing engine applies the text iFilter. You can’t override this iFilter.
  • For the columns of the XML data type the indexing engine applies the XML iFilter. You can’t override the use of this iFilter.
  • For columns of the IMAGE, and VARBINARY data type, the indexing engine applies the iFilter that corresponds to the document extension this document would have if stored in the file system (i.e. for a Word document, this extension would be doc, for an Excel Spreadsheet this would be xls). Please refer to the later section on Indexing BLOBs for more information on this.

NOTE: It is important to understand that an iFilter can launch a different language word breaker from the one specified as the default in the full-text language setting for your Server, or from the word breaker you set for the column in the table you are full-text indexing. I cover this in more detail in the Word Breaker section.

A complete list of built-in iFilters is accessible from: http://www.indexserverfaq.com/sql2005iFilters.htm. You can also obtain a list of the signed iFilters that are currently accessible on your SQL Server 2005 by issuing the following query in any database:

The iFilters are listed in the path and the document types that are indexed by the iFilters are listed in the document_type column. Note that although there are iFilters for asp and aspx pages, these pages will not be generated by the asp or aspx rendering engines, rather the HTML MetaTags and textual content between the body tags will be indexed and searchable.

NOTE: To see the text (and properties) that the iFilters emit from the documents, you can run them through FiltDump a utility available from the Platform SDK in the binn directory. It will show not only the text and properties the iFilters extract from the documents, but also the language tags applied to the text streams.

A note on iFilters and BLOBs

For SQL 2000, the correct iFilter for the Binary Large Object (BLOB) column you wish to index must be installed on your OS (use the filtreg utility, available from the Platform SDK in the binn folder, to tell which iFilters will be applied for a specific document type).

For SQL 2005, if you wish to index a document type for which a pre-installed iFilter does not exist, you must install the appropriate iFilter on the OS, and then load it using the following commands:

So, for example, if you wish to store and index PDFs in columns of the IMAGE or VARBINARY(MAX) data type you must install the Adobe PDF iFilter and issue the two commands above. Otherwise you will get errors of the following form in the full-text gatherer log (found at C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Log):

SQL FTS Overview

In essence, in order to perform full text searching on a table you need to:

  1. Ensure that the table has a unique, not null column (e.g. primary key)
  2. Create a full text catalog in which to store full text indexes for a given table
  3. Create a full text index on the text column of interest

For a brief tutorial on how to create a full text index on SQL 2005 please click on the CODE DOWNLOAD link in the box to the right of the article title.

You can build full-text indexes on textual data stored in CHAR, NCHAR, VARCHAR, NVARCHAR, TEXT, NTEXT columns as we as on IMAGE (SQL 200x), VARBINARY(MAX) (SQL 2005), and XML (SQL 2005) data type columns that contain formatted binary data (such as this word document, for example).

Indexing Text

The sophisticated language features of full-text search then allow you to perform a range of advanced searches on your textual data, using the CONTAINS and FREETEXT T-SQL predicates (as well as the CONTAINSTABLE, and FREETEXTTABLE functions), returning a list of rows that contain a word or a phrase in the columns that you are searching on. You can perform:

  • Simple searches for specific words or phrases
  • Thesaurus searches for synonymous forms of word – a search on IE might return hits to Internet Explorer and IE (thesaurus based expansion search); a search on Bombay might also return hits to Mumbai (thesaurus based replacement search)
  • Searches that will return all different linguistic forms of a word (called generations) – a search on bank would return hits to banking, banked, banks, banks’ and bank’s, etc. (all declensions and/or conjugations of the search term bank)
  • Accent insensitive searches – a search on café would return hits to cafe and café

NOTE: there are other forms of searches too, which we don’t cover in this article – for example weighted searches and proximity searches. See MSDN2 for further details.

In short, SQL FTS provides a very powerful text searching mechanism. It is many orders of magnitude faster than a LIKE operator; especially for larger tables. This is because when you use the LIKE operator it does a byte-by-byte search through the contents of a row looking for a match to the search phrase. SQL FTS references the full text index to instantly return a list of matches. SQL FTS also supports a large number of different languages and language characters. As well as accents, it also seamlessly handles: compound words (in German and Chinese) and compound characters which occur in Chinese.

You may have noticed in some of the earlier CTPs (Community Technology Previews) for SQL Server 2005, 23 languages were supported, in the RTM version of SQL Server 2005 only 17 languages are supported. Please refer to http://support.microsoft.com/kb/908441/en-us for more information on how to enable the missing 6 languages.

Indexing BLOBs

With SQL 2000 and later, you also have the ability to full-text index BLOBs (Binary Large Objects or binary files) stored in the IMAGE (SQL 200x) and VARBINARY data type column (SQL 2005). SQL 7 did not have the capability to index BLOBs.

For BLOBs to be successfully full-text indexed you must ensure that the document extension is stored in an additional column (referred to as a document type column), and that you have configured your full-text index to use this document type column (i.e. the content of the document type column would be doc for rows containing Word documents, xls for rows containing Excel Spreadsheets, xml for rows containing XML documents, and so on).

Use VARCHAR(3) or VARCHAR(4) as the length of the document type column because, in some of the earlier versions of SQL FTS, if your document extension was less than the length of your document extension column the CHAR data type would pad the row contents with spaces, and consequently MSSearch would not launch the iFilter correctly. For example if your content was a Word document with the doc file extension, stored in a CHAR(4) document type column, MSSearch would attempt to launch the iFilter corresponding to “doc ” instead of “doc” and fail. The VARCHAR data type does not pad the data with white space, and so this problem is circumvented.

You configure your full-text table to reference this document type column by using the sp_fulltext_column stored procedure (in SQL 2000) or by the CREATE FULL TEXT INDEX statement in SQL 2005.

Script 1 in the download code for this article provides a worked example of indexing a BLOB.

Further information on how to index BLOBs in SQL 2000 and SQL 2005 can be found at http://www.indexserverfaq.com/Blobs.htm and at http://www.indexserverfaq.com/BlobsSQL2005.htm.

SQL FTS Language Features

The language features of SQL FTS come into play both at “Indexing Time” and “Query Time”. At index time we are essentially building an index of words or tokens on the textual data. A token is a character string, delineated from other character strings by white space or punctuation, which may or may not have linguistic significance but is not normally construed as a word, e.g. MSFT, XXX, PartNo 1231, 159-23-4364, or even http://www.microsoft.com.

During the indexing process, linguistic analysis is performed. Word boundaries are located by application to the text data of language specific components called word breakers, thus identifying our tokens. Basically, the word breakers will control how the textual content is broken into words and how these words are stored in the full-text index. Word breakers may tag the word as currency, a date, and for some languages (especially German, and Far East languages) will sometimes analyze the word and extract constituent words or characters.

At query time, the user supplies a query, such as the following:

When the query is submitted, a language specific component called a stemmer is used to expand the search phrase, depending on language specific rules that are in place. These rules are controlled by SQL Server settings, or your SQL Full-text query arguments. So, in the above example, the search term, bank, would be expanded to incorporate variants such as banking, banked, banks, banks’ and bank’s, etc (all conjugations of the search term bank). The index will then be searched for these words and the matching rows will be returned.

Word Breakers

The iFilters return a text stream to the indexing engine which then applies language specific word breakers to the text stream to break the stream into words or tokens typically at white space or punctuation boundaries.

Specifying the Word Breaker

Different language word breakers are loaded depending on how you configure SQL Server, or how you configure full-text indexing or, sometimes, even on the content being indexed.

Via Full-Text Index Settings

The language word breaker will be used that corresponds to the LCID (LoCale IDentifier) specified in the sp_addfulltext_column (for SQL 7 and 2000) or the language specified in the CREATE FULL TEXT INDEX statement in SQL 2005, as illustrated below:

See Script 2 in the download code for a full worked example

For SQL 2005, you can get a list of supported full-text languages in SQL 2005 by issuing the following query:

In SQL 2000 you would issue this query to xp_msfulltext which will return the language name and its LCID. If no language word breakers exist for an LCID, the neutral word breaker is used (LCID=0).

Via SQL Server Settings

If no language is specified in the sp_addfulltext_column or CREATE FULLTEXT INDEX statement then the default full-text language setting for your SQL Server is used. You can check/modify the default full-text language setting using:

You may have to set advanced options to “on” to see or use this option.

Via language tags in the content being indexed

Some documents have language tags and, if the iFilter understands these language tags, it might (depending on the iFilter implementation) launch a different word breaker than the one specified by any the Full-Text index or SQL Server settings.

The language-aware document formats that I know of are Word, XML, and HTML. illustrates these features. These language tags will override all of the above settings. For Word, the entire document, or a portion of the document, can be tagged as having a different language than the default for the machine on which they were created, and there can be multiple different language tags in the same document (i.e. French, German, and Chinese). You set the Word language tags in two ways:

  1. At the document level: open up your document in Word and click Format, in the Show: drop down box, select Custom, and click Styles, Custom, and select Modify. Click Format and select Language.
  2. At the text level: highlight a portion of text, and click Tools, click Language and Set Language and select the language with which you wish this portion of text to be tagged.

For HTML documents, you can use the ms.locale meta tag. For example, for Greek the tag would be:

For XML documents, you can use the xml:lang tag. For example, for Ukrainian use:

Note that the use of the Greek and Ukrainian languages is for illustrative purposes only. These languages are not currently supported in SQL Full-Text search.

Script 3 from the code download illustrates the use of language tags to set the word breaker.

How Words are stored in the index

Generally, the words broken at index time are stored in the catalogs exactly as broken by the word breakers. However, there are some exceptions. For example:

  • c# is indexed as c (if c is not in your noise word list, see more on noise word lists later), but C# is indexed as C# (in SQL 2005 and SQL 2000 running on Win2003 regardless if C or c is in your noise word list). It is not only C# that is stored as C#, but any capital letter followed by #. Conversely, c++ ( and any other lower-cased letter followed by a ++) is indexed as c (regardless of whether c is in your noise word list).
  • Dates are stored in multiple formats for example 1/1/06 is stored as 1/1/06 and 2006/1/1 (and also 1906/01/01! – 1/1/2006 is stored correctly as 1/1/2006 and 2006/01/01).
  • For some languages the word is stored more than once in the catalog – the first time as it appears in the content, and then with alternative word forms of that word. For example, using most word breakers, L’University is stored as University and also as L’University.
  • German compound words are stored as a compound word and the constituent words (for example, wanderlust is stored as wanderlust, wander, and lust) when indexed using the German word breaker. Using other word breakers it is stored as wanderlust.
  • Chinese and other Far East language “words” are stored as the word, characters, and constituent characters.

You can determine how the word breaker will break a word in your content by running the word through the lrtest (language resource test) tool. Lrtest ships with the SharePoint Portal Server resource kit, which you can obtain from SharePoint Portal Server CD in the Support\Tools directory.

You can find a description of how to use this tool here: http://support.microsoft.com/kb/890613. For SQL 2000 the language resource location is:

The relevant language resource location for SQL 2005 is:

Regrettably, lrtest does not work for most SQL 2005 word breakers except for Thai, Swedish, and Japanese, so the following examples use the SQL FTS 2000 word breakers. For the time being, you will have to resort to trial and error to determine if the behavior revealed using lrtest on the SQL 2000 work breakers still applies on SQL 2005.

For index time behavior the relevant key for your language is WBreakerClass. So, to see how C# is broken for US English you get the CLSID of the following key:

And run the following:

The result (abbreviated) should look as follows:

Trying c#:

The result (abbreviated) should look as follows:

Adding support for your own language word breaker

In SQL 2005 you can add support for your own language by adding a key to HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSearch\Language. For example in order to add support for Arabic, follow these steps:

  1. Download and install the Arabic word breaker from:
    http://www.microsoft.com/middleeast/arabicdev/beta/search/
  2. Run the installation program
  3. Install Complex Script components – open up Regional Settings, click on the Languages tab, and select Install files for complex script and right to left languages.

 130-image001.jpg

  1. Add the Arabic key: HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSearch\Language\Arabic-EG with the following values:

    Locale (dword) 1025 (decimal)

    NoiseFile (String) c:\temp\Arabic.txt

    StemmerClass (String)

    TsaurusFile (String) c:\temp\ArabicThesaurus.txt

    WordBreakerClass (String) {3E0C67A6-38F8-43b6-BD88-3F3F3DAC9EC1}

  2. Add the following key
    HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSearch\CLSID\{3E0C67A6-38F8-43b6-BD88-3F3F3DAC9EC1}
    with a string value of lrar32.dll.
  3. Copy lrar32.dll and lrar32.lex from C:\Program Files\SharePoint Portal Server\Bin\ to C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn

It is critical that you review your license, or contact your local Microsoft office, with regard to possible licensing issues surrounding use of the Arabic word breaker in your SQL Server search application. The Arabic word breaker seems to be in perpetual beta and it is not clear whether it is licensed for SQL Server 2005. It appears that there are no licensing restrictions on other third-party supplied SQL 2005 word breakers, but they are not supported by Microsoft.

This post on blogs.msdn.com provides more information on how to add custom language support for SQL 2005 FTS.

Token

SQL 2000 on Win2k (SP4)

SQL 2000 on Win2003

SQL 2005

AT&T

AT, T

AT, T

AT&T

C#

C

C#

C#

C++

C

C++

C++

c#

C

C

C

c++

C

C

C

.Net

NET

NET

NET

<Title>rats<Title>

<TITLE>R, ATS, <TITLE>

TITLE, RATS, TITLE

TITLE, RATS, TITLE

Note that the anomalies you see with the HTML tags, or the > and < characters, only occur when you are indexing text; they do not occur if you are using the HTML iFilter. Unless otherwise indicated, these tokens are all searchable using a placeholder in place of the non-alphanumeric character – so a search on AT&T will match with AT&T, AT!T, AT*T, AT T, and so on. To correct this behavior you may need to replace the problem token in your content with a non-breaking version. For example if you want searches on AT&T to be handled properly you need to e. replace AT&T with ATANDT wherever they occur in the tables you are full-text indexing and and then trap for a search phrases which include AT&T, and replace it with ATANDT. The replacement feature in the thesaurus option is ideal for this.

Chinese, Japanese, Korean and Thai

The Far East word breakers (Chinese, Japanese, Korean and Thai) break the text stream into individual characters, or even break the characters into constituent characters which correspond to words. The Chinese writing system is not phonetically based. Most ancient writing systems (Chinese, Sumerian, Egyptian, and Aztec for example) started as pictograms, whereby each character was represented by a picture. Due to the large number of characters required, most pictorial systems collapsed and/or evolved in to phonetic systems. However, Chinese retained a pictogram component and then evolved to incorporate ideograms, whereby the graphical symbols (characters) represent ideas.

There are actually five types of Chinese characters. The simplest are pictograms and ideograms, but these two types comprise only a small percentage of the total number of characters. Most Chinese characters are semantic-phonetic compounds where part of the character indicates the sound of the word, and another part indicates its meaning. The other two types are compound ideograms (where two or more characters are compounded into one, and their individual meanings contribute to the meaning of the whole) and phonetic loans (in which a pre-existing character is borrowed to represent a word whose sound is similar to that of the word the character originally represented).

Consider that the character for tree in Chinese is (130-image003.jpg), forest is (130-image004.jpg), and (130-image005.jpg) is a wood. If you look closely you will see the character for tree is present in the characters for wood and forest; in wood you see two tree characters stuck together, in forest you see three stuck together.

In Japanese or Chinese a character string has to be broken into its component characters to determine what the word is about and to determine what it matches (as the phrase among Natural Language researchers goes – you can tell a word by its neighbors – similarly you can tell a character by its neighbors). So if you were doing a FreeText search on the Chinese character wood you would expect to get hits to rows containing wood and forest.

However to index Chinese characters you need to derive their meaning, which requires further decomposition. Chinese characters consist or one or more strokes, termed radicals. There are 214 radicals used to build characters. By decomposing a character into its constituent characters, or radicals, you can derive its meaning. For more information on radicals consult: http://www.chinese-outpost.com/language/characters/char0035.asp.

The word breakers for Chinese, Japanese, Korean, and Thai will take the text stream, break at white space, and punctuation, and then make multiple passes of the tokens using context to extract the constituent characters. Indexing these languages is a CPU intensive process because of the multiple passes required in the word breaking process for these languages.

UK vs. US English at Index Time

SQL FTS supports two versions of English: UK English and US English. One of the questions I most frequently get asked is “”What is the difference between the UK and US English word breakers?” There are four points to keep in mind here:

  • In actual fact UK English does not refer to the Queen’s English or the English used in the United Kingdom, but International English; the English that is used in all other English speaking countries other than US English.
  • They use the same word breaker.
  • Most frequently the UK word breaker is used when you want to use a different noise word list than the US one to implement a less precise search (we’ll cover this on the section in noise words), or a different thesaurus.
  • Significant differences do occur at query time between the two language stemmers (we’ll cover this in the next article, in the section on UK vs. US English at Query Time), but not in the word breakers. In fact in SQL 2000 FTS the UK English and US English word breakers are the same. In SQL 2005 they have a different CLSID. I did speak with a Microsoft developer who worked on many of the Microsoft Search products and he speculated that the differences are there to handle different word endings in UK English, i.e. Catalog versus catalogue; check versus cheque; bank versus banc, or banque; orientated versus oriented.

German

As mentioned previously, the German language has a large number of compound words. When these words are indexed they are stored as the compound word as well as its constituent words. The German word breaker also handles umlauts (the dots that appear in Mötley Crüe). Umlauts denote plural forms of nouns and sometimes different verb forms (e.g. present tense). Words with umlauts are indexed with the umlauts and the non variant of the word without the umlaut, for example häuser (German for houses) is indexed as häuser and haeuser. Mötley Crüe is stored as Mötley Crüe, and Moetley Cruee.

Noise Words

When computer scientists first started work in the field of information retrieval (the branch of computer science which involves search technology) hard disks were very expensive. Search scientists did anything they could to save on disk space. They recognized that there were a large number of words which occurred very frequently and which were not helpful in resolving searches. Such words are referred to as noise words. For example words which linguists class as articles, such as the, a, and at are for the most part irrelevant in a search. They add nothing to the search and don’t help to resolve searches – for instance a search on University of California at Berkley returns the same results as a search on University of California Berkley. Similarly pronouns like me, my and mine don’t help in a search either. Articles and pronouns (and other similar parts of speech) occur frequently in bodies of text, and by not indexing them search scientists were able to save disk space and improve search speed (very slightly), with only a marginal decrease in accuracy. Google does not index noise words, but MSN Search does to provide greater accuracy in searches!

Another example of a noise word is a word which may not be a part of speech, but occurs frequently in the content and does not help in resolving searches. For example, the word Microsoft occurs on every page of Microsoft.com. A search on Microsoft Windows XP is indistinguishable from a search on Windows XP. By adding such words to your noise word list your catalog size will decrease.

The problem with noise words is that they will break searches which contain them. So a search on the band “the the” will result in the following message on SQL 2005:

Informational: The full-text search condition contained noise word(s).

But results will be returned for matches to other search arguments, so a search on “the the albums” would return rows which contained the word albums in the full-text indexed columns. On SQL 2000 however, the message is:

Execution of a full-text operation failed. A clause of the query contained only ignored words.

And the query will not return any results. BOL states that this behavior can be fixed in SQL 2005 by using:

But as far as I can tell, this command has no effect.

Summary

In this article, we looked at the language options of SQL Full-Text Search which occur at index time. We started off by looking at the SQL full-text architecture with special emphases on iFilters and word breakers. We also looked at how language aware iFilters can override the server language settings or the word breaker you specify to break the text in these documents. We then discussed how the words are stored in the full-text catalogs and had a look at some of the language idiosyncrasies. In the next article in the series we will look at language options at query time; i.e. when you query. Till next time, I hope you find what you are looking for!

Keep up to date with Simple-Talk

For more articles like this delivered fortnightly, sign up to the Simple-Talk newsletter

Downloads

This post has been viewed 379123 times – thanks for reading.

Tags: , , , ,

  • Rate
    [Total: 221    Average: 4/5]
  • Share

Hilary Cotter is a SQL Server MVP specializing in replication, and full-text search. He recently published a book on replication and is writing another about Microsoft search technologies. Hilary has been involved in IT for 21 years and is Director of Text Mining and Database Strategy for RelevantNoise.com - dedicated to mining blogs for business intelligence.

View all articles by Hilary Cotter

  • Anonymous

    Noice word error
    give me the solution ASAP
    my mail id is :
    sirji2k@yahoo.com

  • Anonymous

    correction
    Google stoped using stop-words aka noise words some time ago

  • Anonymous

    full text index
    I have tried all the ways to enable the full text index search. How to do that?
    anbu

  • Anonymous

    Date Column as Text
    Hi,

    very helpful site.

    Can something be done to treat a Date Column as a text so that mixed FTS queries such as ‘Purchase 1-Jan-2005’ or ‘Sold 2/2/2005’ are directly possible?

    Thanks

  • Anonymous

    exec sp_configure ‘transform noise words’, 1
    exec sp_configure ‘transform noise words’, 1 worked for me as advertised. I get the results as if I had not included the noise words, and a message is thrown but not an error.

  • Terry

    Broken Link in above
    Hi

    The following link appears broken:

    For a brief tutorial on how to create a full text index on SQL 2005 please click here.

    I could do with this help!
    Thanks

  • Tony Davis

    re: Broken Link
    Terry,

    Sorry about that. I added the PDF tutorial to the code download file. Just click on the CODE DOWNLOAD link in the box to the right of the article title, and you’ll download a zip containing the three code scripts and the tutorial.

    Cheers,

    Tony (Simple-Talk Ed.)

  • Anonymous

    Finding a good Thesaurus as a starting point
    The documentation says that the installed Thesaurus is empty. Can you confirm this is true, and if so, can you recommend a good baseline Thesaurus to get us started? Are there sources for specialty items like IT terms or healthcare terms?

  • Anonymous

    Multiple Catalogs
    Dear Hilary,
    Under the title “SQL FTS architecture” it is written that “a table can be full-text indexed in a single catalog; or to put it another way – a table’s full-text index cannot span catalogs”. Isn’t there any other way to split the catalog so that a table is full-text indexed in multi catalogs because my search queries often time out with the single huge catalog, so I’m thinking on some other way to handle this.
    Any suggestion in this regard will be highly appreciated.

  • Anonymous

    Multiple Catalogs
    Dear Hilary,
    Under the title “SQL FTS architecture” it is written that “a table can be full-text indexed in a single catalog; or to put it another way – a table’s full-text index cannot span catalogs”. Isn’t there any other way to split the catalog so that a table is full-text indexed in multi catalogs because my search queries often time out with the single huge catalog, so I’m thinking on some other way to handle this.
    Any suggestion in this regard will be highly appreciated.

  • Anonymous

    INDEXED COLUMN AND NOISE WORD KNOWLEDEGE
    This article is fabulous and provides one great insight about how FTS work.

  • Anonymous

    How to get excerpts?
    Unfortunately I see no simple way to get the text that matches, there is no function that returns the index where the match was found when using full text search.

  • Anonymous

    How to use combination of results with RANK value
    I am in dilema to use RANK value from different entities like,
    1. Work (Group of titles)
    2. Product (Title information)
    3. Authors (Author Information)

    Relation between each entities are as follows,
    Work 1 – M Product
    Product 1 – M Authors

    I want to build a single search resultset with RANK value which will return title information in combination of search results in each entities.

    How to use combination of RANK value from different entities and combine them as a single search result?

  • Anonymous

    an easy-to-customize database search
    Please take a look at this

    http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes

    You can create a full-text database search service, return results as HTML/XML/JSON. It uses the Lucene directly in java, but can be easily used with Ruby, PHP, or any existing database web applicatoins.

    You can easily index, re-index, incremental-index. It’s also highly scalable and easily customizable.

    The best thing is, it’s super easy. You can create a production-level search in 3 minutes.

  • Anonymous

    how to use full text search?
    i have load all the iis log into database, make me easier to search for some log,, but the table is kind of large,, wondering how to use full text search…

    i had made the table for full text search..
    then how to use that?? do i need to modifer my sql query??
    or sql server itself will use those indexes??

  • Anonymous

    You had a mistake in understanding Chinese
    Hello,
    There is a mistake in your article:
    The Chinese character for tree and wood is wrong, they should be exchanged.
    The character for forest is correct.
    Nice try though!

  • Bastiaan van Utrecht

    Solution for postfix terms?
    Configured full text indexing on item catalog information.

    Is it possible to search with postfix terms? For example.. in the database there is the word bikecomputer. If I search with the term computer this record should be in the return set.

    Any advice is welcome

  • Anonymous

    Not able to use Full Text
    Hi,

    I used the Full Text, which works for me for certain types of formats. But it does not work for .RTF and .PRC ( SQL Scripts ).

    If you have any soluition for this problem, pls let me know.

    My Email ID is swapnil_gaur at yahoo.com

    Thanks,
    Swapnil G

  • Anonymous

    Search using more than one keyword
    If I want to perform a search with more than 1 word like (joe + bloodflow) on full text index column, what should I do?

  • Jou Yaoo

    International Language Word Breaker
    Hi,

    There are both Simplified Chinese and English in the Full-Text search column. I now use Simplified Chinese word breaker when build search catalog. However, the English search becomes less accruate because thesaurus searches and generations do not work. Anybody has solutions for this issue. Please advise. Thanks!

    Jou

  • Gavin Harriss

    Multi-lingual problem converting nvarchars into varbinarys
    Hi Hilary,

    Thanks for a great article – you bought to my attention a great technique for having multi-lingual entries in a single table. However when I tried to apply the technique all went well until I needed to use NVARCHARs which I convert into VARBINARYs. The catalog seems to skip over these records for some reason. Is this something you’ve encountered and have found a work-around for?

    For more detail and a quick T-SQL script to re-create the problem please take a peek at a forum post I made about the issue…
    http://forums.microsoft.com/TechNet/ShowPost.aspx?PostID=1844786&SiteID=17

    Any tips would be greatly received. Cheers,
    Gavin.

  • Geof

    Key Word Highlighter

    I am very pleased with the free text searching so far. But is there a way for SQL server to create a html page with the keywords high lighted. Like in a google toolbar search. I would think that most people who do a free text search would like to have this feature built-in. I see that SQL server has some built in reporting services.

    Now I have heard the arguement that this should be implemented by the developer… That this is a GUI thing… BUT that requires us to know all the internal workings of the SQL server free text engine to be able to match up results. So far I have not found alot of info on the free text engine. These articles have been some of the best info.

    What I would like to see is a way to pass a keyword with a color (RGB) parameter and get a html page returned with the keywords high lighted.

    I would think that most people that use free text searching have a need for a high lighting feature. It would seem logical to offer this as a type of report feature.

    Maybe they already do and I just don’t know about it.

  • Perla

    How a full text index works in a partition table
    Can you explain how works a full text index in a partition table?

  • Perla

    How a full text index works in a partition table
    Can you explain how works a full text index in a partition table?

  • Diab

    I have a problem with the arabic catalog
    the arabic full text catalog can’t be populated, please give me the solution

  • Anonymous

    2000 & 2005
    I have a problem to understand in SQL 2000 & 2005.
    TEL ME NEW FEATURES OF SQL 2005 WHICH IS NOT IN SQL 2000.

  • Anonymous

    How many languages support FullText in SQL Express.
    Hi there,

    I want to know how many language can support in SQL Express?

    In table i have more than 1 million rows, need to index for full-text search. Table have more than 17 languages and more than 300 000 rows in translate in english.

    If you compare to Apache Lucene or SQL Server Express FullText Search ? which one is best, good ?

    Please, let me know thanks.

  • Matt Johnson

    No love for “the the”
    So there isn’t an answer for finding their music outside of knowing an album/song name?

  • yasser

    sql like condition
    when i use like condition it doesn’t work with arabic keys

  • yasser

    sql like condition
    if any one know the answer please send it at my e-mail yasoo_moaly@hotmail.com

  • Anonymous

    Converting my Data from English to another Language
    Can someone lead me in the right direction? I have a Web Application that i want the data in the DB rendered as a different language per the users desire. I am using SQL Server 2005. Please send any help to greenro12002@yahoo.com

    Thanks

  • Anonymous

    The full-text search condition contained noise word(s). …
    The full-text search condition contained noise word(s). …

  • Anonymous

    Informational: The full-text search condition contained noise word(s).
    Informational: The full-text search condition contained noise word(s).

    I have looked at all the fulltext noise words noiseeng.txt,noiseenu.txt and noiseneu.txt and the word best is not in those files. If I put best in those files I have a c# function that looks at the value and checks to see if it is in the noise files, if it is it strips it (best business would be just business).

    Does anybody know if this is a known bug? Does any body know a work around besides adding the word to the noise lists?

  • peter.soccar

    arabic full text search is not working on sql 2005
    I followed the steps you stated for supporting a new language in sql 2005 full text search, except for step 6 as when i installed sharepoint portal server 2003 to copy the lrar.dll file it wasn’t there.
    you can mail me at peter.soccar@bibalex.org
    any help will be appreciated
    thanks in advance

  • Anonymous

    Wow
    So much people who come in here and just shout their problems, “solution needed ASAP”.. Try reading the manual! If you are messing about with FTS, you must be a programmer, right? Maybe you could figure this out for yourself!

    As for me: a great article, thanks very much!

  • Saraboji Jayaraman

    Camel case
    I am trying to change the all data like “ShIrT Is VeRy NiCe”. Is it possible?

    See the full sentence each and every word should be start with caps.

    this changes will affect on whole database.

  • Sk

    Searching noise words
    Without deleting or updating the noise word files, is there a way to include Noise words in search?

  • Vlad

    Great article
    Thanks a lot, really helpful

  • Sarah Grady

    Anonymous Comments Disabled
    Due to a high volume of spam, anonymous comments have been disabled.

  • bharatdevada

    Full text is not working for pdf files
    Hi,
    good article, i tried the example http://www.indexserverfaq.com/Blobs.htm, it works for text files saved in image data type but does not work (atleast for me) for pdf files. May i am missing something, but I followed the steps you stated in that article. Please suggest what i am missing.My email id is bharatdevada@gmail.com

    Thanks in advance.

  • JungleOlly

    Band Names and noise words
    Hi,
    I actually came across this article as you mentioned band names! I run a website and have a search field where users can type in artists and/or venues to get a list of events. However, I’m finding that many band names contain noise words i.e. “Take That” and “Get Cape”.
    In these situations, is it best to start removing words from the noise file?
    Thanks!

  • Ekta

    Using freetext() on a column which store multilanguage data
    Hi,

    This article is very informative. But I still have a question. We are developing a multilanguage website.

    I have a table Product with following columns
    productid (bigint)
    code nvarchar(20)
    name nvarchar(250)
    sname nvarchar(250)

    Here full text index is applied on code, name and sname column. Name and Sname column can contain data in different languages. Now on my websit we have a search box, in which user can enter search keywords in any language. and I a have a function like follow, which is not returing me proper result
    —————————————-
    Note : @searchstr = ‘ようこそゴールデ’

    CREATE FUNCTION [dbo].[FN_SearchProduct] (@searchstr nvarchar(50))
    RETURNS TABLE
    AS
    RETURN (
    SELECT productid
    FROM product p
    WHERE freetext(*, @searchstr)
    )
    —————————————-

    I can not use N infront of @searchstr, it gives error. Please let me know if there is any way to get proper result.

    I am stuck at this point, and will be very thankful to you, if you could help me out with this…

    Thanx…

  • Ekta

    Using freetext() on a column which store multilanguage data
    Hi,

    This article is very informative. But I still have a question. We are developing a multilanguage website.

    I have a table Product with following columns
    productid (bigint)
    code nvarchar(20)
    name nvarchar(250)
    sname nvarchar(250)

    Here full text index is applied on code, name and sname column. Name and Sname column can contain data in different languages. Now on my websit we have a search box, in which user can enter search keywords in any language. and I a have a function like follow, which is not returing me proper result
    —————————————-
    Note : @searchstr = ‘ようこそゴールデ’

    CREATE FUNCTION [dbo].[FN_SearchProduct] (@searchstr nvarchar(50))
    RETURNS TABLE
    AS
    RETURN (
    SELECT productid
    FROM product p
    WHERE freetext(*, @searchstr)
    )
    —————————————-

    I can not use N infront of @searchstr, it gives error. Please let me know if there is any way to get proper result.

    I am stuck at this point, and will be very thankful to you, if you could help me out with this…

    Thanx…

  • Ekta

    Using freetext() on a column which store multilanguage data
    Hi,

    This article is very informative. But I still have a question. We are developing a multilanguage website.

    I have a table Product with following columns
    productid (bigint)
    code nvarchar(20)
    name nvarchar(250)
    sname nvarchar(250)

    Here full text index is applied on code, name and sname column. Name and Sname column can contain data in different languages. Now on my websit we have a search box, in which user can enter search keywords in any language. and I a have a function like follow, which is not returing me proper result
    —————————————-
    Note : @searchstr = ‘ようこそゴールデ’

    CREATE FUNCTION [dbo].[FN_SearchProduct] (@searchstr nvarchar(50))
    RETURNS TABLE
    AS
    RETURN (
    SELECT productid
    FROM product p
    WHERE freetext(*, @searchstr)
    )
    —————————————-

    I can not use N infront of @searchstr, it gives error. Please let me know if there is any way to get proper result.

    I am stuck at this point, and will be very thankful to you, if you could help me out with this…

    Thanx…

  • vlsivajyothi

    Fulltext search is not working 64-bit machine -windows 2003 server
    Hello Hilary Cotter,
    SQL Server2005 is installed in Windows 2003 Server(64-Bit) and updated with Sql Server SP2 and SP3. I installed FilterPack and registered with SQL server as per the steps given in the following link
    http://support.microsoft.com/kb/945934

    I’m trying to do Fulltext search on varBinary column (contains visio documents).
    When setting up the full-text index, I specified the type column(.vsd), in addition to picking the binary column.
    then I run the following full-text query.
    select * from visDocuments where freetext([Document],’Dell’)
    But its not returning the rows. I’m sure one of the row is having the word Dell.
    Please guide me to solve this problem. Thanks in advance.

    Note: I followed the above steps in both Windows XP(32-bit) machine and windows 2003 server(32-bit), it worked fine. it returned rows.

  • Meysam Javadi

    could not find any row!
    Hi mr Cotter.
    i created a FTS on my DB with your unique article.
    i have a column with type of varbinary(max) that i filled it with MSWord file(.doc) and another column with type of word(filled by .doc).
    when i run this script , the output is empty
    SELECT
    name,
    content
    FROM tblWordFiles
    WHERE FREETEXT(content, ‘1’);
    or
    SELECT
    name,
    content
    FROM tblWordFiles
    WHERE contains(content, ‘1’);
    where is my problem

  • Meysam Javadi

    could not find any row!
    Hi mr Cotter.
    i created a FTS on my DB with your unique article.
    i have a column with type of varbinary(max) that i filled it with MSWord file(.doc) and another column with type of word(filled by .doc).
    when i run this script , the output is empty
    SELECT
    name,
    content
    FROM tblWordFiles
    WHERE FREETEXT(content, ‘1’);
    or
    SELECT
    name,
    content
    FROM tblWordFiles
    WHERE contains(content, ‘1’);
    where is my problem

  • prakash.gupta83@hotmail.com

    Retrieve documents from their containts
    I am uploading word,pdf files and storing these file in a drive(C) within a folder and store path of this folder into database.
    I want to search any word within document(word,pdf files) and it should display all the files(word,pdf files) containing that word.

    any help

    Thanks
    Prakash Gupta
    prakash.gupta83@hotmail.com

  • mehdionline

    How to write Stored Procedure as Full text Search
    hi to all teachers,…
    Friends who write in the query mode Full text Search in Sql Server have experience
    Who make(Convert) this Stored Procedure as a normal Full text Search, which contains and .. Is used, into

    Advance of the tips I have thanked all friends perfection.

    Email : rezaafandi@yahoo.com

    Create PROCEDURE Sp_student
    @fname varchar(50),
    @lname varchar(50),
    @tel varchar(50),
    @code varchar(50),
    @adr varchar(50),
    @search_operation varchar(50),
    @totalRowCount bigint output
    AS
    begin
    if @search_operation = ‘and’
    begin
    SELECT f3,f4,f5,f6,f7 FROM tb_student
    WHERE(
    f5 like ‘%’ + @fname + ‘%’ and
    f4 like ‘%’ + @lname + ‘%’ and
    f6 like ‘%’ + @tel + ‘%’ and
    f7 like ‘%’ + @code + ‘%’ and
    f3 like ‘%’ + @adr +’%’
    )
    select @totalRowCount = @@rowcount
    end

  • Michal Jezek

    Re:How to write Stored Procedure as Full text Search
    You can modify SP for example like this.

    ALTER PROCEDURE [dbo].[Sp_student]
    @fname varchar(50),
    @lname varchar(50),
    @tel varchar(50),
    @code varchar(50),
    @adr varchar(50),
    @search_operation varchar(50),
    @totalRowCount bigint output
    AS

    if @search_operation = ‘and’
    begin
    SELECT * FROM student
    WHERE(
    freetext(fname, @fname) and
    freetext(lname, @lname) and
    freetext(tel, @tel) and
    freetext(code, @code) and
    freetext(adr, @adr )
    )
    select @totalRowCount = @@rowcount
    end

    You can use FREETEXT, FREETEXTTABLE, CONTAINS, CONTAINSTABLE, ISABOUT, NEAR, HEIGHT
    is really complex. Good application for setting advance fulltext search is <a href=”http://www.karsa.eu”>Karsa SQL Search manager</a>