A CONTEXT Index is one of a class of Domain Indexes. It is used to build text-retrieval application.
Instead of a regular B-Tree index that captures the entire text of a VARCHAR2 column as a key, you can build an index that consists of "Tokens", words extracted from a long-ish text in the database column. Searching the index is based on the CONTAINS operator.
Here is a quick demo in 11.2.0.4 :
First I setup a new user for this demo. I could have used an existing user but my "HEMANT" user has DBA privileges and I want to demonstrate CONTEXT without such privileges.
Here the key grant is "CTXAPP" that is granted to the user.
Next, this user creates the demonstration table and index :
Note the call to CTX_DDL.SYNC_INDEX. Unlike a regular B-Tree index, the CONTEXT (Domain) Index is, by default, *not* updated in real-time when DML occurs against the base table. (Hopefully, in a future post, I may demonstrate real-time updates to a CTXCAT index, different from a CONTEXT Index).
Only remember that this CONTEXT index is not automatically updated. If new rows are added to the table, they are not visible through the index until a SYNC_INDEX operation is performed. The SYNC_INDEX can be scheduled as a job with another call to DBMS_JOB or DBMS_SCHEDULER or itself as part of the CREATE INDEX statement.
Now, let me demonstrate usage of the Index with the CONTAINS operator.
Yes, the CONTAINS operator is a bit awkward. It will be some time before you (or your developers) get used to the syntax !
Besides CTX_DDL, there are a number of other packages in the prebuilt CTXSYS schema that are available :
CTX_CLS
CTX_DDL
CTX_DOC
CTX_OUTPUT
CTX_QUERY
CTX_REPORT
CTX_THES
CTX_ULEXER
Since I have created a separate database user, I can also demonstrate the additional objects that are created when the INDEXTYPE IS CTXSYS.CONTEXT.
Yes, that is a large number of database objects besides MY_TEXT_TABLE and MY_TEXT_INDEX. SYS_C0017472 is, of course, the Primary Key Index on MY_TEXT_TABLE (on the ID_COLUMN column). The others are interesting.
The "Tokens" I mentioned in the first paragraph are, for the purpose of this table MY_TEXT_TABLE, in the DR$MY_TEXT_INDEX$I.
I have been busy in the last few months and have not published much this quarter. Hopefully, I will get more time in the coming weeks to explore CONTEXT, CTXCAT and Domain Indexes.
.
.
.
Instead of a regular B-Tree index that captures the entire text of a VARCHAR2 column as a key, you can build an index that consists of "Tokens", words extracted from a long-ish text in the database column. Searching the index is based on the CONTAINS operator.
Here is a quick demo in 11.2.0.4 :
First I setup a new user for this demo. I could have used an existing user but my "HEMANT" user has DBA privileges and I want to demonstrate CONTEXT without such privileges.
Here the key grant is "CTXAPP" that is granted to the user.
SQL> create user ctxuser identified by ctxuser default tablespace users;
User created.
SQL> grant create session, create table to ctxuser;
Grant succeeded.
SQL> grant ctxapp to ctxuser;
Grant succeeded.
SQL> alter user ctxuser quota unlimited on users;
User altered.
SQL>
Next, this user creates the demonstration table and index :
SQL> connect ctxuser/ctxuser
Connected.
SQL> create table my_text_table
2 (id_column number primary key,
3 my_text varchar2(2000));
Table created.
SQL> create index my_text_index
2 on my_text_table(my_text)
3 indextype is ctxsys.context;
Index created.
SQL>
SQL> insert into my_text_table
2 values (1,'This is a long piece of text written by Hemant');
1 row created.
SQL> insert into my_text_table
2 values (2,'Another long text to be captured by the index');
1 row created.
SQL> commit;
Commit complete.
SQL>
SQL> -- this update to the index would be a regular background job
SQL> exec ctx_ddl.sync_index('MY_TEXT_INDEX');
PL/SQL procedure successfully completed.
SQL>
Note the call to CTX_DDL.SYNC_INDEX. Unlike a regular B-Tree index, the CONTEXT (Domain) Index is, by default, *not* updated in real-time when DML occurs against the base table. (Hopefully, in a future post, I may demonstrate real-time updates to a CTXCAT index, different from a CONTEXT Index).
Only remember that this CONTEXT index is not automatically updated. If new rows are added to the table, they are not visible through the index until a SYNC_INDEX operation is performed. The SYNC_INDEX can be scheduled as a job with another call to DBMS_JOB or DBMS_SCHEDULER or itself as part of the CREATE INDEX statement.
Now, let me demonstrate usage of the Index with the CONTAINS operator.
SQL> select id_column as id,
2 my_text
3 from my_text_table
4 where contains (my_text, 'written by Hemant') > 0
5 /
ID
----------
MY_TEXT
--------------------------------------------------------------------------------
1
This is a long piece of text written by Hemant
SQL> select my_text
2 from my_text_table
3 where contains (my_text, 'Another long') > 0
4 /
MY_TEXT
--------------------------------------------------------------------------------
Another long text to be captured by the index
SQL>
Yes, the CONTAINS operator is a bit awkward. It will be some time before you (or your developers) get used to the syntax !
Besides CTX_DDL, there are a number of other packages in the prebuilt CTXSYS schema that are available :
CTX_CLS
CTX_DDL
CTX_DOC
CTX_OUTPUT
CTX_QUERY
CTX_REPORT
CTX_THES
CTX_ULEXER
Since I have created a separate database user, I can also demonstrate the additional objects that are created when the INDEXTYPE IS CTXSYS.CONTEXT.
SQL> select object_type, object_name, to_char(created,'DD-MON-RR HH24:MI') Crtd
2 from user_objects
3 order by object_type, object_name
4 /
OBJECT_TYPE OBJECT_NAME CRTD
------------------- ------------------------------ ------------------------
INDEX DR$MY_TEXT_INDEX$X 10-DEC-17 16:48
INDEX DRC$MY_TEXT_INDEX$R 10-DEC-17 16:48
INDEX MY_TEXT_INDEX 10-DEC-17 16:48
INDEX SYS_C0017472 10-DEC-17 16:48
INDEX SYS_IL0000045133C00006$$ 10-DEC-17 16:48
INDEX SYS_IL0000045138C00002$$ 10-DEC-17 16:48
INDEX SYS_IOT_TOP_45136 10-DEC-17 16:48
INDEX SYS_IOT_TOP_45142 10-DEC-17 16:48
LOB SYS_LOB0000045133C00006$$ 10-DEC-17 16:48
LOB SYS_LOB0000045138C00002$$ 10-DEC-17 16:48
TABLE DR$MY_TEXT_INDEX$I 10-DEC-17 16:48
TABLE DR$MY_TEXT_INDEX$K 10-DEC-17 16:48
TABLE DR$MY_TEXT_INDEX$N 10-DEC-17 16:48
TABLE DR$MY_TEXT_INDEX$R 10-DEC-17 16:48
TABLE MY_TEXT_TABLE 10-DEC-17 16:48
15 rows selected.
SQL>
SQL> select table_name, constraint_name, index_name
2 from user_constraints
3 where constraint_type = 'P'
4 order by table_name, constraint_name
5 /
TABLE_NAME CONSTRAINT_NAME
------------------------------ ------------------------------
INDEX_NAME
------------------------------
DR$MY_TEXT_INDEX$K SYS_IOT_TOP_45136
SYS_IOT_TOP_45136
DR$MY_TEXT_INDEX$N SYS_IOT_TOP_45142
SYS_IOT_TOP_45142
DR$MY_TEXT_INDEX$R DRC$MY_TEXT_INDEX$R
DRC$MY_TEXT_INDEX$R
MY_TEXT_TABLE SYS_C0017472
SYS_C0017472
SQL>
Yes, that is a large number of database objects besides MY_TEXT_TABLE and MY_TEXT_INDEX. SYS_C0017472 is, of course, the Primary Key Index on MY_TEXT_TABLE (on the ID_COLUMN column). The others are interesting.
The "Tokens" I mentioned in the first paragraph are, for the purpose of this table MY_TEXT_TABLE, in the DR$MY_TEXT_INDEX$I.
SQL> desc DR$MY_TEXT_INDEX$I
Name Null? Type
----------------------------------------- -------- ----------------------------
TOKEN_TEXT NOT NULL VARCHAR2(64)
TOKEN_TYPE NOT NULL NUMBER(10)
TOKEN_FIRST NOT NULL NUMBER(10)
TOKEN_LAST NOT NULL NUMBER(10)
TOKEN_COUNT NOT NULL NUMBER(10)
TOKEN_INFO BLOB
SQL> select token_text, token_count
2 from dr$my_text_index$i
3 /
TOKEN_TEXT TOKEN_COUNT
---------------------------------------------------------------- -----------
ANOTHER 1
CAPTURED 1
HEMANT 1
INDEX 1
LONG 2
PIECE 1
TEXT 2
WRITTEN 1
8 rows selected.
SQL>
I have been busy in the last few months and have not published much this quarter. Hopefully, I will get more time in the coming weeks to explore CONTEXT, CTXCAT and Domain Indexes.
.
.
.