Switch to: V12V11V10V9V8V7V6V5

2. Index Types

Valentina DB provides few index types:

  • Default - which is kind of inverted list (not B-tree).
  • Hash - calculates hash of a given value.

Each index types uses its own algorithm and storage format.

Besides, there can be modifications for each index type. For example UNIQUE, which additionally apply requirement that values are unique.

Default Index

Default index can handle equality and range queries on data that can be sorted into some ordering. This index can handle such types of comparison operators as: >, >=, <, ⇐, =, <> and SQL operators BETWEEN and IN.

Index also can be used for queries 'LIKE' if the pattern is a constant or is anchored to the beginning of string, for example, WHERE f1 LIKE 'foo%'.

You can imagine indexes like array of following structures sorted by fldValue:

pair {fldValue, ValueCount} + array{ RecID1, RecID2, ...}

For string/varchar values default index uses only first 255 bytes, because it keeps values in the pascal string format with 1-byte length prefix.

Indexes and NULLs

Valentina DB, in contrast to other DBMS, does not use indexes for 'IS NULL' or 'IS NOT NULL' searches, because of its columnar storage format of Tables. Because of this Valentina does not put NULL values into index at all.

Unique Index Description

Unique Index “knows” that each fldValue can exist only once. Thanks to this we can optimize structure of index:

  • no need to have ValueCount
  • we will have not list of RecIDs but just one RecID.

so you can think about Unique Index as array of following structures sorted by fldValue:

pair {fldValue, RecID}

Index By Words Description

Index By Words is the same as default index on strings, except it splits the given string into words using ICU BreakIterator, then adds each word as a separate item of the index with the same RecID.

In general case Index By Words is bigger than the default string Index because:

  • the default index truncates long strings to the first 255 bytes, while IndexByWords saves each word even of very long strings.
  • the default index associates the given string with one RecID, while IndexbyWords stores this RecID many times for each word.

Also it is easy to see that the time of adding/removing of the string to index by words is up to N times longer of the default index, where N is the number of words in the string. Of course some optimizations are possible and Valentina DB does them.

IndexByWords cannot be unique, because there is no sense in this.

IndexByWords can be used for long VarChars and TEXT fields.

Note, that this is not a full-text index.

Hash Index

Hash index can only handle simple equality comparisons. It can be used with = or <> operators only.

Valentina DB allows this type of index only for String, VarChar, FixedBinary and VarBinary field types.