FreePascal Information Logo Friend of FreePascal Compiler Title
Articles with Feedback, FPC News Library, PDF Collection, Mail Lists, Books, Newsgroups, IRC Open online discussion areas Research and Tutorials Tools, Compilers and Utilities Blurbs about us, advertising, etc.
Welcome to the FoFPC Research Notes: SoundEx

Data Searching using Soundex

by: G.E. Ozz Nixon Jr.
Published: September 2007
©opyright 2009 by Friends of FPC



     Data Searching is a very complex art in itself. Ignoring the long history of Internet "search engines" growing from Archie in 1992 to today's Google engine. Original techniques to provide users with "full text" search required a solution designed for common mistakes, one of which is called Soundex.

     Soundex is a phonetic algorithm for indexing words by sound as pronounced in American English. The goal of course was to encode homophones to the same representation so they could be matched despite minor differences in spelling. Unfortunately Soundex is probably the most used algorithm used in data searching and is often used incorrectly. Soundex is quite simple:

Code snippet from DXSock 6.0 WLM (Courtesy of Brain Patchwork DX, LLC)
function SoundEx(Const S: string): string;
const
   Table: array[1..26] of Char = '.123.12..22455.12623.1.2.2';

var
   SoundString: string[255];
   I1: Integer;
   I2: Integer;
   isNum: boolean;
   Ch: Char;
   Str:String;
   MaxLoop:Integer;

begin
   Result := S;
   if S = '' then Exit;
   Str:=S;
   isNum := true;
   repeat
      Ch := UpCase(S[1]);
      if Ch > #64 then isNum := false
      else Delete(Str, 1, 1);
   until (isNum = false) or (Str = '');
   Result := Str;
   if Str = '' then Exit;
   SoundString[0] := #255;
   FillChar2(SoundString[1], 255, '0');
// Step 1: ASCII to Soundex
{$IFDEF ASSEMBLY}
   asm // 1090108: By Ozz (FASTER THAN Length()!)
     MOV EAX, S;       // Store Str Address
     MOV EAX, [EAX-$04]; // Move to "Size" Int32
     MOV MaxLoop, EAX;    // Put into Result
   End;
{$ELSE}
   MaxLoop:=Length(S);
{$ENDIF}
   for I1 := 1 to MaxLoop - 1 do begin
      I2 := Ord(UpCase(S[I1 + 1])) - 64;
      if ((I2 < 1) or (I2 > 26)) then I2 := 1;
      SoundString[I1] := Table[I2];
   end;
// Initialize for second pass
   I1 := 1;
   repeat
      while (SoundString[I1] = '.') do
         Delete(SoundString, I1, 1);
      while ((SoundString[I1] = SoundString[I1 + 1]) and (SoundString[I1] <> '0')) do
         Delete(SoundString, I1, 1);
      Inc(I1);
   until (SoundString[I1] = '0');
   Result := Ch + Copy(SoundString, 1, 3);
end;

     Using this algorithm, both "Robert" and "Rupert" return the same string "R163" while "Rubin" and "Robin" yields "R150". "Ashcroft" and "Ashcraft" yields "A226". However, "George" and "Jorge" yields "G620" and "J620" and "Dave" and "David" yields "D100" and "D130".

     This simple example shows a need for other comparison algorithms. Read more on other comparison techniques I used in my search engines.

G.E. Ozz Nixon Jr.
 Links and Products we find useful



ButtonGenerator.com
Valid XHTML 1.0 Transitional Internet Map
Programmer's Heaven
grat-i-fi-ca-tion - noun
the state of being gratified; great satisfaction.


I am very happy to see that FoFPC is alive, we need more developers working on this great language.
Matias Vara
toro.sourceforge.net
Locations of visitors to this page world map hits counter
Copyright 2009 by 3F, LLC. All rights reserved. Worldwide.
Your request was processed by server #3 in 0.001546 secs.

sponsor
This sponsor helps us with our documentation