Company

Products

Services

Partners

Media

 

 

 

Home

Empress Markets

News & Events


Presentations

Press Releases

Product Profile

Success Stories

Technical News


White Papers

Join Mailing List

For More Info

           

      Empress Technical News – March 2009

    Database Text SearchIndex – Fast Text Data Retrieval – Part 2

    Introduction

    Empress Ultra Embedded 10.20 offers many additional features to application developers. One of those features is a Text Search Index capability for fast text data retrieval.

    Empress Text Search Index

    Empress Ultra Embedded V10.20 Text Search Index capability empowers application developers to implement an efficient search for database records using keywords/tokens/phrases. The most typical usage would be to associate those keywords with particular character/text based attribute in a database table.

    The search index capability is developed as an additional set of C calls that are used in conjunction with Empress C/C++ Kernel Level API – mr Routines. The search index is a user maintained index – an index not maintained via Empress database engine calls.

    Application would supply the list (array) of tokens/keywords/phrases on insertion in the Empress database in order to create a text search index.

    In “Fast Text Data Retrieval - Part 1” a Text Search facility was demonstrated on a very simple retrieval example. In Part 2, we are going to demonstrate how to create a text search index, how to insert keywords/tokens into this index and how to perform more complex searches. In Part 1, the example was written for a Windows Mobile device. This time, in Part 2, the examples are written for Linux devices.

    Create Search Index

    The following example provides the Empress C/C++ Kernel Level API – mr Routines program code (make_database.c) to show how to create the search index on the table songs.

    /*The following example provides the actual mr program code to show how to create the table songs and the search index on the same table.

    When translated into SQL the example does something like:

    CREATE TABLE songs (id INTEGER, title NLSCHAR (80,1))

    CREATE TEXTSEARCH INDEX ON songs (title)

    */

    #include              < mscc.h>
    #include
          " tsi_api.hx"
    int
              msmain (int argc, char* argv[])
    {

                     
    void*                     
                     
    void*                     

                      mscall (" -" , " CREATE DATABASE karaokedb" )
                     
    mscall (" karaokedb"
                                       
    " CREATE TABLE songs (id INTEGER, title NLSCHAR(80,1))" )
                     
    songs_tabdesc = mropen (" karaokedb" songs" , 'u')
                     
    title_attrdesc = mrngeta (songs_tabdesc, " title"
                     
    /* create search index on songs(title) */
                     
    if (!mstsi_create (songs_tabdesc , title_attrdesc))
                                       
    printf (" Failed to create text search index\n" )
                     
    else
                                       
    printf (" Succeeded in creating text search index\n"
                     
    mrclose (songs_tabdesc)
                     
    return 0
    The created table songs has the following definition:

    CREATE TABLE songs (id INTEGER, title NLSCHAR(80,1))

    Insert IntoSearch Index

    The following example provides the Empress C/C++ Kernel Level API – mr Routines program code (insert_text.c) to show how to insert records into the table songs and into the search index on the same table.

    /*The following example provides the actual mr program code to show how to insert records into the table songs and into the search index on the same table.

    When translated into pseudo code the example does something like:
    INSERT INTO songs VALUES (1, “I Want To Hold Your Hand”)
    INSERT INTO TEXTSEARCH INDEX ON songs(title) VALUES (“I Want To Hold Your Hand”, “Want”, “Hold”, “Hand”)
    INSERT INTO songs VALUES (2, “You Really Got A Hold On Me”)
    INSERT INTO TEXTSEARCH INDEX ON songs(title) VALUES (“Really”,”Got”,”Hold”,”Me”)
    */

    #include
          < mscc.h>
    #include
          < tsi_api.hx>
    #define
            DATABASE    " karaokedb"
    #define
            LIST_MAX    20
    int
              msmain (int argc,
                                       
    char*  argv[])
    {

                 
    void*                         
                 
                            songs_recdesc
                 
    void*                         
                 
    void*                         
                 
    void*                         
                 
    char*                        token_list[LIST_MAX]

                  = mropen (DATABASE, " songs" , 'u'
                 
    id_attrdesc = mrngeta (songs_tabdesc, " id" )
                 
    title_attrdesc = mrngeta (songs_tabdesc, " title"
                 
    index_handle= mstsi_open (songs_tabdesc , title_attrdesc,
                     
      if (index_handle == (void*) 0)
                     
      {
                                       
    printf (" Failed to open text search index\n"

                                        return 1
                     
      }

                 
    songs_recdesc = mrmkrec (songs_tabdesc)
                 
    /* insert the first record */
                 
    mrputvs (songs_recdesc, id_attrdesc, " 1"
                 
    mrputvs (songs_recdesc, title_attrdesc, I Want To Hold Your Hand" )
                 
    mradd (songs_recdesc)
                 
    token_list[0] = " I Want To Hold Your Hand"
                 
    token_list[1] = " Want"
                 
    token_list[2] = " Hold"
                 
    token_list[3] = " Hand"
                 
    token_list[4] = 0
                  if (!mstsi_add (index_handle, token_list, 
                                                           
    mrgetptr(songs_recdesc)))
                                       
    printf (" [1] Failed to insert into text index\n" )
                     
      /* insert the second record */
                 
    mrputvs (songs_recdesc, id_attrdesc, " 2"
       
              mrputvs (songs_recdesc, title_attrdesc," You Really Got A Hold On Me"
                 
    mradd (songs_recdesc)
                 
    token_list[0] = " Really"
                 
    token_list[1] = " Got"
                 
    token_list[2] = " Hold"
                 
    token_list[3] = " Me"
                 
    token_list[4] = 0
                 
    if (!mstsi_add (index_handle, token_list, 
                                                           
    mrgetptr(songs_recdesc)))
                                       
    printf (" [2] Failed to insert into text index\n" )
                 
    mraddend (songs_recdesc)
                 
    mrfrrec (songs_recdesc)
                 
    mstsi_close (index_handle)
                 
    mrclose (songs_tabdesc)
                     
      printf ( Insert completed\n" )
                 
    return 0
    }

    Complex Retrieval Using Search Index With Multiple Tokens

    We will now demonstrate the more complex retrieval example .  The Empress C/C++ Kernel Level API – mr Routines program code (select2_text.c) shows how to perform a complex retrieval with multiple tokens from the table songs using the search index. The example performs the retrieval of all the records from the song table that contain tokens “Want” and “Hand” and all the records that contain token “Really”. When translated into SQL the example does something like:

    SELECT id, title FROM songs WHERE title LIKE “%Want%” AND title LIKE “%Hand%” OR title LIKE “%Really%”

      /*The following example provides the actual mr program code to show how to perform a complex retrieval with multiple tokens from the table songs using the search index. The example performs the retrieval of all the records from the song table that contain tokens Want and Hand and all the records that contain token Really. When translated into SQL the example does something like:

    SELECT id, title FROM songs WHERE title LIKE “%Want%” AND  title LIKE “%Hand%” OR title LIKE “%Really%” */

    #include      < mscc.h>
    #include
          < tsi_api.hx>
    #define
            DATABASE    " karaokedb"
    int
              main (int              argc,
                                       
    char*  argv[])
                 
    void*                         
                 
    void*                         
                 
    void*                         
                 
    void*                         
                 
    void*                         
                 
    void*                         
                 
    char*                        id_value
               
      char*                       
                 
    void*                         
                 
    long*                         
                 
    long*                         
                     
      char                   


                     
      if (!msinit ())
                     
      {
                                       
    printf (" Unable to initialize Empress\n" )
                                       
    return 1
                     
      }
                 
    songs_tabdesc = mropen (DATABASE, " songs" , 'r'
                 
    songs_recdesc = mrmkrec (songs_tabdesc)
                 
    id_attrdesc = mrngeta (songs_tabdesc, " id"
                 
    title_attrdesc = mrngeta (songs_tabdesc, " title"
                 
    index_handle= mstsi_open (songs_tabdesc , title_attrdesc,
                 
    id_value = mrspv (id_attrdesc)
                 
    title_value = mrspv (title_attrdesc)
                 
    printf (" Songs that contain Hold and Want or Really\n\n" )
                     
      printf (" %-12s %-22s\n" , " Id" Title" )        
                 
    record_list1 = (index_handle, " Hold" Want" 0)
                 
    if (record_list1 == 0)
                 
    {
                                       
    printf (" No songs with token Hold or Want\n"
                                       
    return 0
                 
    }   
                 
    record_list2 = (index_handle, " Really" , 0)
                 
    if (record_list1 == 0)
                 
    {
                                       
    printf (" No songs with token Really\n" )
                                       
    return 0
                 
    }   
                     
      record_list1 = mstsi_union_reclist (record_list1, record_list2)
                 
    qual = mrqlst (songs_tabdesc, record_list1)
                 
    retrieve_desc = mrgetbegin (qual, songs_recdesc, (void*) 0)
                 
    while (mrget (retrieve_desc))
                 
    {
                       
    mrcopyv (songs_recdesc, id_attrdesc, id_value)
                       
    mrcopyv (songs_recdesc, title_attrdesc, title_value)
                     
            sprintf (buf, " %-12s %-22s\n" id_value, title_value)
                     
            printf (" %s" , buf)
                 
    }
                 
    mrgetend (retrieve_desc)
                 
    mrfree (id_value)
                 
    mrfree (title_value)
                 
    mrfrrec (songs_recdesc)
                 
    mstsi_close (index_handle)    
                 
    mrclose (songs_tabdesc)
                     
      msend ()
                     
      printf ( select2_text is done\n"
                 
    return 0
    }

    The output of the program is the same as if you would run the following SQL statement:

    SELECT * FROM songs WHERE title LIKE “%Hold%” AND
    title LIKE “%Want%” OR title LIKE “%Really%”

    Songs that contain Hold and Want or Really

    Id                    Title
    1
                          I Want To Hold Your Hand
    2
                          You Really Got A Hold On Me

    The results from the SQL statementare the same as from the Empress text search index but the difference is in performance.  If there are many records, the search using the Empress text search index could be orders of magnitude faster than the SQL query producing the same output.

    Another advantage of using the Empress text search index is in flexibility of using tokens/keywords that do not even have to be present in the title of the text attribute.

    Empress Software Inc.
    www.empress.com

                       
                             
                         

Company
Information
GSA Contracts
Careers
Privacy Policy 
Contact US

Products
Empress RDBMS
Empress Servers
Empress API
Free Trial 
Empress iPad Apps
 

Services
Consulting
Mobile Apps Development
Technical Support

Training Courses
Empress Extreme  

Partners
Distributors
Business Alliances
Partner Programs  
GSA Program
Distributors Program

Media 
Empress Markets
News and Events
Press releases
Technical News
Success Stories

        USA: 301-220-1919                 Canada & International:       905-513-8888         
Copyright © 2014           Empress Software Inc.           info@empress.com           Join Empress Mailing List