DISTINCTROW in Valentina SQL
Learn about how to select distinct rows from a join result with Valentina SQL.
MS Access offers its own additional keyword - DISTINCTROW to get rid of row-duplicates in the result (see the difference between row-duplicates and value-duplicates below).
Valentina offers another approach based on “GROUP BY” and aggregative functions like FIRST()/LAST().
Let's consider an example: there are two tables where the first one is for persons and the second one is for their phones.
CREATE TABLE Person ( id LONG, fName STRING(20), lName STRING(20) ); CREATE TABLE Phone ( id LONG, person_id LONG, number LONG ); INSERT INTO Person VALUES ( 1, 'John', 'Doe' ); INSERT INTO Person VALUES ( 2, 'Ivan', 'Ivanov' ); INSERT INTO Person VALUES ( 3, 'John', 'Doe' ); INSERT INTO Phone VALUES ( 1, 1, 555111 ); INSERT INTO Phone VALUES ( 2, 1, 555222 ); INSERT INTO Phone VALUES ( 3, 2, 555333 ); INSERT INTO Phone VALUES ( 4, 3, 555444 );
Now we want to select all persons who have phone(s). If we do it as simple as:
SELECT fName, lName FROM Person JOIN Phone ON Person.id=Phone.person_id;
we will get a correct result but there will be some duplicated records because of 2 phones belonged to the first “John Doe”.
We can use DISTINCT to remove identical values from the result of any query:
SELECT DISTINCT fName, lName FROM Person JOIN Phone ON Person.id=Phone.person_id;
One of the problems is the performance - such a DISTINCT cause sorting all the fields in the SELECT list and removing all (lName+fName) duplicates. Note, that it could be the number of fields including lengthy ones, like a TEXT or String(2000) - so sorting out such data may take a lot of time and requires more memory.
Another problem with such a query is the possibility of removing unintended rows - e.g. one of the “John Doe” person was wrongfully removed (pay attention that we have two different persons named “John Doe”). In other words “DISTINCT” is about to remove value-duplicates.
Sure, this problem can be solved this way:
SELECT DISTINCT Person.id, fName, lName FROM Person JOIN Phone ON Person.id=Phone.person_id;
But having an additional field in the select list might be inconvenient.
MS Access offers own additional keyword - DISTINCTROW, which can be used in place of DISTINCT. This keeps resulting rows which have identical values but come from different records from collapsing into a single row. With DISTINCTROW they don't have to include the key in the output.
Valentina offers another way based on grouping by “key”:
SELECT FIRST(fName), FIRST(lName) FROM Person JOIN Phone ON Person.id=Phone.person_id GROUP BY Person.RecID;
Sure, you can use your own primary key like id field:
SELECT FIRST(fName), FIRST(lName) FROM Person JOIN Phone ON Person.id=Phone.person_id GROUP BY Person.id;
but grouping by RecID much more effective and you should prefer this way.