Salesforce Developer Soql Optimizer

salesforce-developer-soql

https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/langCon_apex_SOQL_VLSQ.htm
https://developer.salesforce.com/page/Best_Practices_for_Deployments_with_Large_Data_Volumes
https://developer.salesforce.com/page/Architect_Core_Resources
https://developer.salesforce.com/blogs/engineering/2014/11/force-com-soql-performance-tips-systemmodstamp-vs-lastmodifieddate-2.html
https://developer.salesforce.com/blogs/engineering/2014/02/the-salesforce-bulk-api-maximizing-parallelism-and-throughput-performance-when-integrating-or-loading-large-data-volumes.html
https://developer.salesforce.com/page/Webinar:_Inside_the_Force.com_Query_Optimizer_(2013-Apr)
https://developer.salesforce.com/blogs/engineering/2013/05/dealing-with-exception-filters-in-force-com.html

The Force.com query optimizer is an engine that sits between your SOQL, reports, 
and list views and the database itself. Because of salesforce.com’s multitenancy, 
the optimizer gathers its own statistics instead of relying on the underlying 
database statistics. Using both these statistics and pre-queries, the optimizer 
generates the most optimized SQL to fetch your data. It looks at each filter in 
your WHERE clause to determine which index, if any, should drive your query.

To determine if an index should be used to drive a query, the Force.com query 
optimizer checks the number of records targeted by the filter against 
selectivity thresholds. For a standard index, the threshold is 30 percent of the 
first million targeted records and 15 percent of all records after that first 
million. In addition, the selectivity threshold for a standard index maxes out 
at 1 million total targeted records, which you could reach only if you had more 
than 5.6 million total records.

So if you had 2.5 million accounts, and your SOQL contained a filter on a 
standard index, that index would drive your query if the filter targeted fewer 
than 525,000 accounts.

SELECT id FROM Account WHERE CreatedDate  > 2013-01-01T00:00:00Z

(30% of 1 to 1 million targeted records) + (15% of 1 million to 2.5 million 
targeted records) = 300,000 + 225,000  =  525,000

For a custom index, the selectivity threshold is 10 percent of the first million 
targeted records and 5 percent all records after that first million. In addition, 
the selectivity threshold for a custom index maxes out at 333,333 targeted 
records, which you could reach only if you had more than 5.6 million records.

If the previous query were changed so that it used a filter on a field with a 
custom index, the threshold for 2.5 million accounts would be 175,000.

SELECT id FROM Account WHERE CustomIndexedDate__c  > 2013-01-01T00:00:00Z

(10% of 1 to 1 million targeted records) + (5% of 1 million to 2.5 million 
targeted records) = 100,000 + 75,000 = 175,000

In these standard index and custom index examples, the Force.com query optimizer 
does use the standard and custom indexes, as each number of targeted records 
falls below the appropriate selectivity threshold. If, on the other hand, the 
number of targeted records exceeds an index’s selectivity threshold, the 
Force.com query optimizer does not use that index to drive the query.

Common Causes of Non-Selective SOQL Queries:

1. Having Too Much Data.  Whether you’re displaying a list of records through a 
   Visualforce page or through a list view, it’s important to consider the user 
   experience. Pagination can help, but will your users really go through a list 
   with thousands of records? You might not have this much data in your current 
   implementation, but if you don’t have enough selective filters, these long 
   lists can easily become an issue as your data grows. Design your SOQL, 
   reports, and list views with large data volumes in mind.

2. Performing Large Data Loads.  Large data loads and deletions can affect query 
   performance. The Force.com query optimizer uses the total number of records 
   as part of the calculation for its selectivity threshold.  This number takes 
   into account your recently deleted records. A deleted record remains in the 
   Recycle Bin for 15 days—or even less time if you exceed your storage limit, 
   and the record has been in the Recycle Bin for at least two hours—and then 
   that record is actually removed from the Recycle Bin or flagged for a 
   physical delete. When the Force.com query optimizer judges returned records 
   against its thresholds, all of the records that appear in the Recycle Bin or 
   are marked for physical delete do still count against your total number of 
   records.

   From our earlier example of accounts and a custom indexed field, the 
   selectivity threshold was 175,000, and the total number of records was 2.5 
   million.  Let’s say that a Bulk API job runs and deletes all records before 
   January 1, 2013, and those records total 2.4 million. That leaves us with 
   100,000 non-deleted account records. If the deleted records are still in the 
   Recycle Bin, the Force.com optimizer mistakenly thinks that the 100,000 
   non-deleted records fall under and meet a 2.5 million-record selectivity 
   threshold, and it generates a query that isn’t optimized. In reality, the 
   threshold is 10,000 targeted records (10 percent of 100,000 targeted 
   records).

   If the deleted records do not need to go to the Recycle Bin, use the hard 
   delete option in the Bulk API or contact salesforce.com Customer Support to 
   physically delete the records.  If your data loads cause the records targeted 
   by your filter to exceed the selectivity threshold, you might need to include 
   additional filters to make your queries selective again.

3. Using Leading % Wildcards. A LIKE condition with a leading % wildcard does 
   not use an index.  

   SELECT id FROM Account WHERE Name LIKE ‘%Acme%’

   This is the type of query that would normally work better with SOSL. However, 
   if you need real-time results, an alternative is to create a custom search 
   page, which restricts leading % wildcards and adds governance on the search 
   string(s).  Within a report/list view, the CONTAINS clause translates into 
   ‘%string%’.

4. Using NOT and !=.  When your filter uses != or NOT—which includes using NOT 
   EQUALS/CONTAINS for reports, even if the field is indexed—the Force.com query 
   optimizer can’t use the index to drive the query. For better performance, 
   filter using = or IN, and the reciprocal values.  Use the reciprocal values 
   instead. 

   SELECT id FROM Case WHERE Status != ‘Closed’
   SELECT id FROM Case WHERE Status IN (‘New’, ‘On Hold’, ‘Pending’, ‘ReOpened’)

   The following query won’t use the index on the Status field because of the 
   NOT. The problem with this query is that even if you change it to use the 
   reciprocal values, the index on the Status field won’t meet the selectivity 
   threshold because of the Closed value. In this case, you do want to add 
   additional filters to reduce the number of records retrieved.

   SELECT id FROM Case WHERE NOT (Status IN (‘On Hold’, ‘Pending’, ‘ReOpened’))
   SELECT id FROM Case WHERE Status IN ('New', 'Closed') AND Priority = ‘High’

   Using a filter on an indexed field such as CreatedDate is always recommended, 
   but this field was not included in the original query so that we could make 
   a point about the selectivity threshold.

5. Using Complex Joins.  Complex AND/OR conditions and sub-queries require the 
   Force.com query optimizer to produce a query that is optimized for the join, 
   but might not perform as well as multiple issued queries would. This is 
   especially true with the OR condition. For Force.com to use an index for an 
   OR condition, all of the fields in the condition must be indexed and meet the 
   selectivity threshold. If the fields in the OR condition are in multiple 
   objects, and one or more of those fields does not meet a selectivity 
   threshold, the query can be expensive.

   Filters on formula fields that are non-deterministic can’t be indexed and 
   result in additional joins. Common formula field practices include 
   transforming a numeric value in a related object into a text string or using 
   a complex transformation involving multiple related objects. In both cases, 
   if you filter on this formula field, the Force.com query optimizer must join 
   the related objects.

   If you have large data volumes and are planning to use this formula field in 
   several queries, creating a separate field to hold the value will perform 
   better than following either of the previous common practices. You’ll need to 
   create a workflow rule or trigger to update this second field, have this new 
   field indexed, and use it in your queries.

If the number of records returned for an index is larger than a threshold, query 
performance wouldn’t improved by the use of that index.
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License