Switch to: V9V8V7V6V5

5. Hash Indexes

[NEW in v5.6]

Hash index is a good feature for indexing long string and binary fields. Long means that they should be few times longer than hash itself, which is 4 bytes. Taking this into account, Valentina DB allows hash index only for String, VarChar, FixedBinary and VarBinary fields.

1) Some of the databases that have Hash Index does not have such limitation, and allow you to specify hash index e.g for LONG type field. Later they suggest in documentation to avoid Hash Index for such cases. It is really meaningless combination, so we just excluded it.

2) We could, in theory, support hash index for BLOB/TEXT types also, but taking into account the extraction of original values for compare, we have decided to avoid this for now. If somebody in future will show us the need in such feature, then we can add it.

When to Use Hash Index

Hash Index can be a good choice in following cases:

  • you have PRIMARY KEY of string type. Since it is PK, we will do only EQUAL '=' and NOT_EQUAL '<>' searches. This is exactly what can do Hash Index.
  • you have to index some strings longer than 255 bytes. Valentina uses up to 255 bytes to index strings. If this is not enough for you and if you do EQUAL searches only, you can use Hash Index also.
  • you have Binary/VarBinary values. Usually they cannot be indexed. But Hash Index allows you to hash them and do EQUAL searches.

How it Works

  • Valentina DB allows for hash index to be both UNIQUE and NOT UNIQUE.
  • Hash of NULL is NULL, so it is not stored in the hash index.
  • Hash Index does not store the original value in the index. In case of hash-collision hash index jumping into an indexed column to extract the corresponding Value.
  • Interesting to note, that in Valentina DB hash index is the most natural and effective when it is UNIQUE. Because for unique values it is rare case when 2 different unique values have same hash and it is required to check the original value from the column. If the index is NOT UNIQUE, then in the worse case we can have N same values, which produce the same N hashes, so index have only 1 same item with the list of N recIDs.

API

Use 'fIndexed' + 'fHash' flags of VField

fld = tbl.CreateVarCharField( "fld_name", 2044, fIndexed + fHash )
fld = tbl.CreateVarCharField( "fld_name", 2044, fIndexed + fHash + fUnique )

SQL

In SQL you can define indexing of a field(s) in few ways.

-- both forms works the same (almost).
-- The only difference is - "INDEXED" forces to create index immediately.
CREATE TABLE T1( 
    f1 VARCHAR(2044) INDEXED HASH 
    f2 VARCHAR(2044) HASH 
)
CREATE INDEX index_name ON table_name USING HASH ( COLUMN, ... )