1. Beatrix Willius
  2. Valentina Database ADK
  3. Friday, April 28 2017, 12:54 PM
  4.  Subscribe via email
I'm having a speed problem with SQL and need some ideas.

I have a mail table for the mails. Each mail can have multiple attachments. When searching for attachment names I use an extra table with the following SQL:

dim theSQL as String = "insert into AttachmentIDs select mailid from attachment where attachment.attachmentname REGEX '(?i)" + _
Valentina.EscapeString(SearchString) + "' group by MailID"

to speed the main SQL up. This has worked so far until someone got wayyyyy too many mails and attachments.

A search for another field

SELECT Message.MailboxID,Message.RecID,Message.OriginatorFrom as [From],Message.ReceiverTo as [To],Message.MainSubject as [Subject],Message.MainDateTime as [Date],Message.AttachmentCount as [Att],MailboxName(Mailbox.Mailboxpath) as [Mailbox],lower(Message.OriginatorFrom) as [Order_From],lower(Message.ReceiverTo) as [Order_To],lower(Message.MainSubject) as [Order_Subject] 

FROM ((Message INNER JOIN Mailbox ON Message.MailboxID = Mailbox.RecID) LEFT JOIN AttachmentIDs ON Message.RecID = AttachmentIDs.MailID)

WHERE ( Message.ReceiverTo REGEX '(?i)staudenshop' )

ORDER By [Att] ASC, [Order_Subject] ASC


executed in 29milli seconds. If I search for attachments with


SELECT Message.MailboxID,Message.RecID,Message.OriginatorFrom as [From],Message.ReceiverTo as [To],Message.MainSubject as [Subject],Message.MainDateTime as [Date],Message.AttachmentCount as [Att],MailboxName(Mailbox.Mailboxpath) as [Mailbox],lower(Message.OriginatorFrom) as [Order_From],lower(Message.ReceiverTo) as [Order_To],lower(Message.MainSubject) as [Order_Subject]

FROM ((Message INNER JOIN Mailbox ON Message.MailboxID = Mailbox.RecID) LEFT JOIN AttachmentIDs ON Message.RecID = AttachmentIDs.MailID)

WHERE ( Message.RecID = AttachmentIDs.MailID )

ORDER By [Att] ASC, [Order_Subject] ASC


the search takes 3.5 seconds.

I could get the IDs out of the AttachmentIDs table and do an "in" search. Does anyone have a better idea?
Comment
There are no comments made yet.
Hi Beatrix,

Why do you first
"LEFT JOIN AttachmentIDs ON Message.RecID = AttachmentIDs.MailID)"
and next
"WHERE ( Message.RecID = AttachmentIDs.MailID )" ?
Both operations seem unnecessary to me, no ?

I'd write :

SELECT [FieldList]
FROM Message
JOIN Mailbox ON Message.MailboxID = Mailbox.RecID
JOIN AttachmentIDs ON Message.RecID = AttachmentIDs.MailID
ORDER By [Att] ASC, [Order_Subject] ASC


François
Comment
It must be "LEFT JOIN" because some messages have no attachments
  1. Ivan Smahin
  2. 3 years ago
There are no comments made yet.
  1. more than a month ago
  2. Valentina Database ADK
  3. # 1
Additional comments :
- I suppose Message.MailboxID and AttachmentIDs.MailID are both indexed
- why do you first populate the AttachmentIDs table and next issue de select query ? Isn't if faster to just ask directly in one step ?

SELECT [FieldList]
FROM Message
JOIN Mailbox ON Message.MailboxID = Mailbox.RecID
JOIN Attachment ON Attachment.mailid = Message.RecID
WHERE Attachment.attachmentname REGEX '(?i)" + Valentina.EscapeString(SearchString) + "'
ORDER By [Att] ASC, [Order_Subject] ASC

François
Comment
There are no comments made yet.
  1. more than a month ago
  2. Valentina Database ADK
  3. # 2
Ivan Smahin Accepted Answer

WHERE ( Message.RecID = AttachmentIDs.MailID )


slower than

WHERE ( Message.ReceiverTo REGEX '(?i)staudenshop' )


because it is correlated condition (looks like additional join predicate), and such predicate could not be performed once. It works like a filter for each join-result-candidate record.

but I think here is some error - look at this:


LEFT JOIN AttachmentIDs ON Message.RecID = AttachmentIDs.MailID)
WHERE ( Message.RecID = AttachmentIDs.MailID )


You do outer join first and then, checking each record to be equal on the same condition.
It looks like "overcoded" INNER JOIN:


INNER JOIN AttachmentIDs ON Message.RecID = AttachmentIDs.MailID)
Comment
There are no comments made yet.
  1. more than a month ago
  2. Valentina Database ADK
  3. # 3
Beatrix Willius Accepted Answer
@Ivan: you were correct. The Where clause wasn't needed.

The time went only down from 3.5 seconds to 1.8. An improvement of 100%. At the customer the time was a cringeworthy 50 minutes. Bringing this down a 100% will not be enough.

Changing From and Where to


FROM (Message INNER JOIN Mailbox ON Message.MailboxID = Mailbox.RecID)

WHERE Message.RecID in (select mailid from attachment where attachment.attachmentname REGEX '(?i)crashlog' group by MailID)


makes the query as slow as the original one.

Any other ideas?
Comment
There are no comments made yet.
  1. more than a month ago
  2. Valentina Database ADK
  3. # 4
Beatrix Willius Accepted Answer
Ugly as heck hack for now:

1. Write the result of


select mailid from attachment where attachment.attachmentname REGEX '(?i)" + _
Valentina.EscapeString(SearchString) + "' group by MailID"


into an array.

2. Do a


"Message.RecID in (" + Join(AttachmentIDs, ", ") + ")"


in the main SQL for the Where clause.

I'm pretty sure I asked before: what is the limit regarding length for SQL statements?
Comment
There are no comments made yet.
  1. more than a month ago
  2. Valentina Database ADK
  3. # 5
  • Page :
  • 1


There are no replies made for this post yet.
However, you are not allowed to reply to this post.

Categories

Announcements & News
  1. 0 subcategories
Valentina Studio
  1. 2 subcategories
Valentina Server
  1. 4 subcategories
Valentina Database ADK
  1. 0 subcategories
Valentina Reports ADK
  1. 0 subcategories
Other Discussions
  1. 2 subcategories
BETA Testing
  1. 0 subcategories
Education & Research
  1. 0 subcategories