3.7.1.2.1 TextAnalyzer
The TextAnalyzer assumes that text values are “plain text” (i.e., no metadata or mark-up). For each field value it tokenizes, the TextAnalyzer generates zero or more terms. Each term is a lowercased sequence of letters, digits, and/or apostrophes, and terms are delimited by runs of whitespace and/or punctuation. That is, each contiguous sequence of letters, digits, and/or apostrophes becomes a term. A term can contain an apostrophe, but it cannot begin or end with one. For example, the apostrophe is retained in doesn’t, but the outer apostrophes in the sequence ‘tough’ are excluded, yielding the term tough. An apostrophe is any of the following characters:
As an example of how text fields are tokenized by the TextAnalyzer, suppose an email object is created with the following text field values, all indexed with the TextAnalyzer:
From: John Smith
To: Betty Sue
Subject: The Office Move
Body: Hi Betty,
Just a reminder that you’re scheduled to move to your “fancy” new office tomorrow, number B413. If you have any questions, please let me know.
Thanks, John.
The TextAnalyzer indexes these fields to generate the following terms:
 
As shown, terms are extracted in lowercase, and punctuation and whitespace are removed. As part of down-casing, the TextAnalyzer converts any apostrophe retained within a term to the “straight apostrophe” character (0x27). Although a term may appear multiple times within a field, it is indexed only once.
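The tokenization rules described above can be approximated with a short Python sketch. This is not the TextAnalyzer's actual implementation: the function name `tokenize` is hypothetical, the apostrophe set (U+0027, U+2018, U+2019) is an assumption, and returning terms in first-occurrence order is a choice made only for readability.

```python
import re

# Assumed apostrophe set: straight, left curly, and right curly.
# The analyzer's real list may differ.
APOSTROPHES = "'\u2018\u2019"

# A run is one or more letters, digits, or apostrophes.
# [^\W_] matches any Unicode letter or digit (word characters minus "_").
_RUN = re.compile(r"(?:[^\W_]|[" + APOSTROPHES + r"])+")


def tokenize(value):
    """Return the distinct terms of a plain-text field value (illustrative only)."""
    terms = []
    seen = set()
    for run in _RUN.findall(value):
        # A term cannot begin or end with an apostrophe.
        term = run.strip(APOSTROPHES).lower()
        if not term:
            continue  # the run consisted only of apostrophes
        # Normalize any retained apostrophe to the straight apostrophe (0x27).
        for ch in APOSTROPHES:
            term = term.replace(ch, "'")
        # A term is indexed only once per field.
        if term not in seen:
            seen.add(term)
            terms.append(term)
    return terms


print(tokenize("John Smith"))
print(tokenize("you\u2019re scheduled to move to your \u201cfancy\u201d new office, number B413."))
```

Under these assumptions, the second call keeps the apostrophe in you're, drops the double quotes around fancy, lowercases B413, and emits the repeated word to only once.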
Though not shown above, the TextAnalyzer also creates a term equal to the field’s entire value, down-cased and enclosed in single quotes. This term is used as an optimization for equality searches. For example, the whole-field value for the field From is 'john smith'. For text fields with large values, this “whole field” value is created as an MD5 value instead of the literal text.
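A minimal sketch of how such a whole-field term could be derived is shown below. The function name `whole_field_term` and the 64-character threshold are hypothetical, since the text does not say what counts as a “large” value. Encoding the value as UTF-8, rendering the MD5 digest as hexadecimal, and keeping the single quotes around the digest are likewise assumptions.

```python
import hashlib

# Hypothetical cutoff; the actual size at which MD5 is used is not stated.
MAX_LITERAL_LENGTH = 64


def whole_field_term(value, max_literal_length=MAX_LITERAL_LENGTH):
    """Build an illustrative whole-field term for equality searches."""
    lowered = value.lower()
    if len(lowered) > max_literal_length:
        # Assumption: large values are replaced by the hex MD5 digest
        # of their down-cased UTF-8 bytes.
        lowered = hashlib.md5(lowered.encode("utf-8")).hexdigest()
    # Enclose the result in single quotes, as in 'john smith'.
    return "'" + lowered + "'"


print(whole_field_term("John Smith"))
```

Because the same transformation can be applied to the value in an equality query, a match reduces to a single term lookup regardless of how long the field value is.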
The terms generated by the TextAnalyzer allow efficient execution of a wide range of full text queries, including single terms, phrases, wildcard terms, and range clauses. Searches are case-insensitive: for example, the phrase “You’re Scheduled to MOVE” will match the Body field shown above.