Empress
Technical News – March 2009
Database Text SearchIndex – Fast Text Data Retrieval – Part 2
Introduction
Empress
Ultra Embedded 10.20 offers many additional features to application developers.
One of those features is a Text Search Index capability for fast text data
retrieval.
Empress Text
Search Index
Empress
Ultra Embedded V10.20 Text Search Index capability empowers application
developers to implement an efficient search for database records using
keywords/tokens/phrases. The most typical usage would be to associate those
keywords with particular character/text based attribute in a database table.
The
search index capability is developed as an additional set of C calls that are
used in conjunction with Empress C/C++ Kernel Level API – mr Routines. The
search index is a user maintained index – an index not maintained via Empress
database engine calls.
Application
would supply the list (array) of tokens/keywords/phrases on insertion in the
Empress database in order to create a text search index.
In “Fast
Text Data Retrieval - Part 1” a Text Search facility was demonstrated on a very
simple retrieval example. In Part 2, we are going to demonstrate how to create
a text search index, how to insert keywords/tokens into this index and how to
perform more complex searches. In Part 1, the example was written for a Windows
Mobile device. This time, in Part 2, the examples are written for Linux
devices.
Create
Search Index
The
following example provides the Empress C/C++ Kernel Level API – mr Routines
program code (make_database.c) to show how to create the search index on
the table songs.
/*The following example provides the actual mr program code to
show how to create the table songs and the search index on the same table.
When translated into SQL the example does something like:
CREATE TABLE songs (id INTEGER, title NLSCHAR (80,1))
CREATE TEXTSEARCH INDEX ON songs (title)
*/
#include              < mscc.h>
#include 
 
 
" tsi_api.hx"
int 
 
 
 
 
msmain (int argc, char* argv[])
{
 
 
 
 
 
 
 
 
 
void*                     
 
 
 
 
 
 
 
 
 
void*                     
                  mscall (" -" , " CREATE DATABASE karaokedb" )
 
 
 
 
 
 
 
 
 
mscall (" karaokedb"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
" CREATE TABLE songs (id INTEGER, title
NLSCHAR(80,1))"
)
 
 
 
 
 
 
 
 
 
songs_tabdesc
= mropen (" karaokedb" songs" , 'u')
 
 
 
 
 
 
 
 
 
title_attrdesc
= mrngeta (songs_tabdesc, " title"
 
 
 
 
 
 
 
 
 
/* create search index on songs(title) */
 
 
 
 
 
 
 
 
 
if (!mstsi_create
(songs_tabdesc , title_attrdesc))
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
printf
("
Failed to create text search index\n" )
 
 
 
 
 
 
 
 
 
else
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
printf
("
Succeeded in creating text search index\n"
 
 
 
 
 
 
 
 
 
mrclose
(songs_tabdesc)
 
 
 
 
 
 
 
 
 
return 0
The
created table songs has the following definition:
CREATE TABLE songs
(id INTEGER, title NLSCHAR(80,1))
Insert IntoSearch Index
The
following example provides the Empress C/C++ Kernel Level API – mr Routines
program code (insert_text.c) to show how to insert records into the
table songs and into the search index on the same table.
/*The following example provides the actual mr program code to
show how to insert records into the table songs and into the search index on
the same table.
When translated into pseudo code the example does something
like:
INSERT INTO songs VALUES (1, “I Want To Hold Your Hand”)
INSERT INTO TEXTSEARCH INDEX ON songs(title) VALUES (“I Want
To Hold Your Hand”, “Want”, “Hold”, “Hand”)
INSERT INTO songs VALUES (2, “You Really Got A Hold On Me”)
INSERT INTO TEXTSEARCH INDEX ON songs(title) VALUES
(“Really”,”Got”,”Hold”,”Me”)
*/
#include 
 
 
< mscc.h>
#include 
 
 
< tsi_api.hx>
#define 
 
 
 
DATABASE   
"
karaokedb"
#define 
 
 
 
LIST_MAX   
20
int 
 
 
 
 
msmain (int argc,
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
char*  argv[])
{
 
 
 
 
 
 
 
void*                         
                                      songs_recdesc
 
 
 
 
 
 
 
void*                         
 
 
 
 
 
 
 
void*                         
 
 
 
 
 
 
 
void*                         
 
 
 
 
 
 
 
char*                        token_list[LIST_MAX]
              =
mropen (DATABASE, " songs" , 'u'
 
 
 
 
 
 
 
id_attrdesc =
mrngeta (songs_tabdesc, " id" )
 
 
 
 
 
 
 
title_attrdesc
= mrngeta (songs_tabdesc, " title"
 
 
 
 
 
 
 
index_handle= mstsi_open
(songs_tabdesc , title_attrdesc,
 
 
 
 
 
 
 
 
 
  if (index_handle == (void*) 0)
 
 
 
 
 
 
 
 
 
  {
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
printf
("
Failed to open text search index\n"
                                    return 1
 
 
 
 
 
 
 
 
 
  }
 
 
 
 
 
 
 
songs_recdesc =
mrmkrec (songs_tabdesc)
 
 
 
 
 
 
 
/* insert the first record */
 
 
 
 
 
 
 
mrputvs
(songs_recdesc, id_attrdesc, " 1"
 
 
 
 
 
 
 
mrputvs
(songs_recdesc, title_attrdesc, I
Want To Hold Your Hand" )
 
 
 
 
 
 
 
mradd
(songs_recdesc)
 
 
 
 
 
 
 
token_list[0] =
"
I Want To Hold Your Hand"
 
 
 
 
 
 
 
token_list[1] =
"
Want"
 
 
 
 
 
 
 
token_list[2] =
"
Hold"
 
 
 
 
 
 
 
token_list[3] =
"
Hand"
 
 
 
 
 
 
 
token_list[4] =
0
 
 
 
 
 
 
 
if (!mstsi_add
(index_handle, token_list, 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
mrgetptr(songs_recdesc)))
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
printf
("
[1] Failed to insert into text index\n" )
 
 
 
 
 
 
 
 
 
  /*
insert the second record */
 
 
 
 
 
 
 
mrputvs
(songs_recdesc, id_attrdesc, " 2"
 
 
          mrputvs (songs_recdesc, title_attrdesc," You Really Got A Hold On Me"
 
 
 
 
 
 
 
mradd
(songs_recdesc)
 
 
 
 
 
 
 
token_list[0] =
"
Really"
 
 
 
 
 
 
 
token_list[1] =
"
Got"
 
 
 
 
 
 
 
token_list[2] =
"
Hold"
 
 
 
 
 
 
 
token_list[3] =
"
Me"
 
 
 
 
 
 
 
token_list[4] =
0
 
 
 
 
 
 
 
if (!mstsi_add
(index_handle, token_list, 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
mrgetptr(songs_recdesc)))
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
printf
("
[2] Failed to insert into text index\n" )
 
 
 
 
 
 
 
mraddend
(songs_recdesc)
 
 
 
 
 
 
 
mrfrrec
(songs_recdesc)
 
 
 
 
 
 
 
mstsi_close
(index_handle)
 
 
 
 
 
 
 
mrclose
(songs_tabdesc)
 
 
 
 
 
 
 
 
 
  printf ( Insert
completed\n"
)
 
 
 
 
 
 
 
return 0
}
Complex
Retrieval Using Search Index With Multiple Tokens
We will
now demonstrate the more complex retrieval example .  The Empress C/C++ Kernel Level API – mr
Routines program code (select2_text.c) shows how to perform a complex
retrieval with multiple tokens from the table songs using the
search index. The example performs the retrieval of all the records from the
song table that contain tokens “Want” and “Hand” and all the records that
contain token “Really”. When translated into SQL the example does something
like:
SELECT
id, title FROM songs WHERE title LIKE “%Want%” AND title LIKE “%Hand%” OR title
LIKE “%Really%”
  /*The following example provides the actual mr program code to
show how to perform a complex retrieval with multiple tokens from the table
songs using the search index. The example performs the retrieval of all the
records from the song table that contain tokens Want and Hand and all the
records that contain token Really. When translated into SQL the example does
something like:
SELECT id, title FROM songs WHERE title LIKE “%Want%”
AND 
title LIKE “%Hand%” OR title LIKE
“%Really%” */
#include      < mscc.h>
#include 
 
 
< tsi_api.hx>
#define 
 
 
 
DATABASE   
"
karaokedb"
int 
 
 
 
 
main (int              argc,
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
char*  argv[])
 
 
 
 
 
 
 
void*                         
 
 
 
 
 
 
 
void*                         
 
 
 
 
 
 
 
void*                         
 
 
 
 
 
 
 
void*                         
 
 
 
 
 
 
 
void*                         
 
 
 
 
 
 
 
void*                         
 
 
 
 
 
 
 
char*                        id_value
 
 
 
 
 
 
  char*                       
 
 
 
 
 
 
 
void*                         
 
 
 
 
 
 
 
long*                         
 
 
 
 
 
 
 
long*                         
 
 
 
 
 
 
 
 
 
  char                   
 
 
 
 
 
 
 
 
 
  if (!msinit ())
 
 
 
 
 
 
 
 
 
  {
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
printf
("
Unable to initialize Empress\n" )
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
return 1
 
 
 
 
 
 
 
 
 
  }
 
 
 
 
 
 
 
songs_tabdesc =
mropen (DATABASE, " songs" , 'r'
 
 
 
 
 
 
 
songs_recdesc
= mrmkrec (songs_tabdesc)
 
 
 
 
 
 
 
id_attrdesc
= mrngeta (songs_tabdesc, " id"
 
 
 
 
 
 
 
title_attrdesc
= mrngeta (songs_tabdesc, " title"
 
 
 
 
 
 
 
index_handle= mstsi_open
(songs_tabdesc , title_attrdesc,
 
 
 
 
 
 
 
id_value =
mrspv (id_attrdesc)
 
 
 
 
 
 
 
title_value =
mrspv (title_attrdesc)
 
 
 
 
 
 
 
printf (" Songs that contain Hold and Want or Really\n\n" )
 
 
 
 
 
 
 
 
 
 
printf (" %-12s
%-22s\n"
, " Id" Title" )        
 
 
 
 
 
 
 
record_list1 =
(index_handle, " Hold" Want" 0)
 
 
 
 
 
 
 
if (record_list1 ==
0)
 
 
 
 
 
 
 
{
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
printf
("
No songs with token Hold or Want\n"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
return 0
 
 
 
 
 
 
 
}   
 
 
 
 
 
 
 
record_list2 =
(index_handle, " Really" , 0)
 
 
 
 
 
 
 
if (record_list1 ==
0)
 
 
 
 
 
 
 
{
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
printf
("
No songs with token Really\n" )
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
return 0
 
 
 
 
 
 
 
}   
 
 
 
 
 
 
 
 
 
  record_list1 = mstsi_union_reclist
(record_list1, record_list2)
 
 
 
 
 
 
 
qual = mrqlst
(songs_tabdesc, record_list1)
 
 
 
 
 
 
 
retrieve_desc =
mrgetbegin (qual, songs_recdesc, (void*) 0)
 
 
 
 
 
 
 
while (mrget
(retrieve_desc))
 
 
 
 
 
 
 
{
 
 
 
 
 
 
 
 
 
 
mrcopyv (songs_recdesc, id_attrdesc, id_value)
 
 
 
 
 
 
 
 
 
 
mrcopyv
(songs_recdesc, title_attrdesc, title_value)
 
 
 
 
 
 
 
 
 
        sprintf (buf, " %-12s %-22s\n"
id_value, title_value)
 
 
 
 
 
 
 
 
 
        printf (" %s" , buf)
 
 
 
 
 
 
 
}
 
 
 
 
 
 
 
mrgetend
(retrieve_desc)
 
 
 
 
 
 
 
mrfree
(id_value)
 
 
 
 
 
 
 
mrfree
(title_value)
 
 
 
 
 
 
 
mrfrrec
(songs_recdesc)
 
 
 
 
 
 
 
mstsi_close
(index_handle)
 
 
 
 
 
 
 
 
 
mrclose
(songs_tabdesc)
 
 
 
 
 
 
 
 
 
 
msend ()
 
 
 
 
 
 
 
 
 
  printf ( select2_text
is done\n"
 
 
 
 
 
 
 
return 0
}
The output of the program is the
same as if you would run the following SQL statement:
SELECT * FROM songs WHERE title LIKE “%Hold%” AND
title LIKE “%Want%” OR title LIKE “%Really%”
Songs that contain Hold and Want or Really
Id                    Title
1 
 
 
 
 
 
 
 
 
 
 
I Want To
Hold Your Hand
2 
 
 
 
 
 
 
 
 
 
 
You Really
Got A Hold On Me
The results from the SQL statementare the same as from the Empress text search index but the difference is in
performance. 
If there are many records,
the search using the Empress text search index could be orders of magnitude
faster than the SQL query producing the same output.
Another advantage of using the
Empress text search index is in flexibility of using tokens/keywords that do
not even have to be present in the title of the text attribute.
Empress
Software Inc.
www.empress.com