Table of Contents
5. Hash Indexes
[NEW in v5.6]
Hash index is a good feature for indexing long string and binary fields. Long means that they should be few times longer than hash itself, which is 4 bytes. Taking this into account, Valentina DB allows hash index only for String, VarChar, FixedBinary and VarBinary fields.
1) Some of the databases that have Hash Index does not have such limitation, and allow you to specify hash index e.g for LONG type field. Later they suggest in documentation to avoid Hash Index for such cases. It is really meaningless combination, so we just excluded it.
2) We could, in theory, support hash index for BLOB/TEXT types also, but taking into account the extraction of original values for compare, we have decided to avoid this for now. If somebody in future will show us the need in such feature, then we can add it.
When to Use Hash Index
Hash Index can be a good choice in following cases:
- you have PRIMARY KEY of string type. Since it is PK, we will do only EQUAL '=' and NOT_EQUAL '<>' searches. This is exactly what can do Hash Index.
- you have to index some strings longer than 255 bytes. Valentina uses up to 255 bytes to index strings. If this is not enough for you and if you do EQUAL searches only, you can use Hash Index also.
- you have Binary/VarBinary values. Usually they cannot be indexed. But Hash Index allows you to hash them and do EQUAL searches.
How it Works
- Valentina DB allows for hash index to be both UNIQUE and NOT UNIQUE.
- Hash of NULL is NULL, so it is not stored in the hash index.
- Hash Index does not store the original value in the index. In case of hash-collision hash index jumping into an indexed column to extract the corresponding Value.
- Interesting to note, that in Valentina DB hash index is the most natural and effective when it is UNIQUE. Because for unique values it is rare case when 2 different unique values have same hash and it is required to check the original value from the column. If the index is NOT UNIQUE, then in the worse case we can have N same values, which produce the same N hashes, so index have only 1 same item with the list of N recIDs.
API
Use 'fIndexed' + 'fHash' flags of VField
fld = tbl.CreateVarCharField( "fld_name", 2044, fIndexed + fHash ) fld = tbl.CreateVarCharField( "fld_name", 2044, fIndexed + fHash + fUnique )
SQL
In SQL you can define indexing of a field(s) in few ways.
-- both forms works the same (almost). -- The only difference is - "INDEXED" forces to create index immediately. CREATE TABLE T1( f1 VARCHAR(2044) INDEXED HASH f2 VARCHAR(2044) HASH )
- Using CREATE INDEX command.
CREATE INDEX index_name ON TABLE_NAME USING HASH ( COLUMN, ... )
- Use DROP IDEX to remove hash index.