Certain optimizations are applicable to comparisons that use the
        IN operator to test subquery results (or that
        use =ANY, which is equivalent). This section
        discusses these optimizations, particularly with regard to the
        challenges that NULL values present.
        Suggestions on what you can do to help the optimizer are given
        at the end of the discussion.
      
Consider the following subquery comparison:
outer_exprIN (SELECTinner_exprFROM ... WHEREsubquery_where)
        MySQL evaluates queries “from outside to inside.”
        That is, it first obtains the value of the outer expression
        outer_expr, and then runs the
        subquery and captures the rows that it produces.
      
        A very useful optimization is to “inform” the
        subquery that the only rows of interest are those where the
        inner expression inner_expr is equal
        to outer_expr. This is done by
        pushing down an appropriate equality into the subquery's
        WHERE clause. That is, the comparison is
        converted to this:
      
EXISTS (SELECT 1 FROM ... WHEREsubquery_whereANDouter_expr=inner_expr)
After the conversion, MySQL can use the pushed-down equality to limit the number of rows that it must examine when evaluating the subquery.
        More generally, a comparison of N
        values to a subquery that returns
        N-value rows is subject to the same
        conversion. If oe_i and
        ie_i represent corresponding outer
        and inner expression values, this subquery comparison:
      
(oe_1, ...,oe_N) IN (SELECTie_1, ...,ie_NFROM ... WHEREsubquery_where)
Becomes:
EXISTS (SELECT 1 FROM ... WHEREsubquery_whereANDoe_1=ie_1AND ... ANDoe_N=ie_N)
The following discussion assumes a single pair of outer and inner expression values for simplicity.
        The conversion just described has its limitations. It is valid
        only if we ignore possible NULL values. That
        is, the “pushdown” strategy works as long as both
        of these two conditions are true:
      
When either or both of those conditions do not hold, optimization is more complex.
        Suppose that outer_expr is known to
        be a non-NULL value but the subquery does not
        produce a row such that outer_expr =
        inner_expr. Then
        outer_expr IN (SELECT
        ...)
        In this situation, the approach of looking for rows with
        outer_expr =
        inner_exprinner_expr is
        NULL. Roughly speaking, the subquery can be
        converted to:
      
EXISTS (SELECT 1 FROM ... WHEREsubquery_whereAND (outer_expr=inner_exprORinner_exprIS NULL))
        The need to evaluate the extra IS
        NULL condition is why MySQL has the
        ref_or_null access method:
      
mysql>EXPLAIN->SELECT->outer_exprIN (SELECT t2.maybe_null_keyFROM t2, t3 WHERE ...)-> FROM t1; *************************** 1. row *************************** id: 1 select_type: PRIMARY table: t1 ... *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: t2 type: ref_or_null possible_keys: maybe_null_key key: maybe_null_key key_len: 5 ref: func rows: 2 Extra: Using where; Using index ...
        The unique_subquery and
        index_subquery
        subquery-specific access methods also have or-null variants.
        However, they are not visible in
        EXPLAIN output, so you must use
        EXPLAIN
        EXTENDED followed by SHOW
        WARNINGS (note the checking NULL in
        the warning message):
      
mysql>EXPLAIN EXTENDED->SELECT*************************** 1. row *************************** id: 1 select_type: PRIMARY table: t1 ... *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: t2 type: index_subquery possible_keys: maybe_null_key key: maybe_null_key key_len: 5 ref: func rows: 2 Extra: Using index mysql>outer_exprIN (SELECT maybe_null_key FROM t2) FROM t1\GSHOW WARNINGS\G*************************** 1. row *************************** Level: Note Code: 1003 Message: select (`test`.`t1`.`outer_expr`, (((`test`.`t1`.`outer_expr`) in t2 on maybe_null_key checking NULL))) AS `outer_expr IN (SELECT maybe_null_key FROM t2)` from `test`.`t1`
        The additional OR ... IS NULL condition makes
        query execution slightly more complicated (and some
        optimizations within the subquery become inapplicable), but
        generally this is tolerable.
      
        The situation is much worse when
        outer_expr can be
        NULL. According to the SQL interpretation of
        NULL as “unknown value,”
        NULL IN (SELECT  should evaluate to:
      inner_expr
        ...)
        For proper evaluation, it is necessary to be able to check
        whether the SELECT has produced
        any rows at all, so
        outer_expr =
        inner_expr
        Essentially, there must be different ways to execute the
        subquery depending on the value of
        outer_expr.
      
        The optimizer chooses SQL compliance over speed, so it accounts
        for the possibility that outer_expr
        might be NULL.
      
        If outer_expr is
        NULL, to evaluate the following expression,
        it is necessary to run the SELECT
        to determine whether it produces any rows:
      
NULL IN (SELECTinner_exprFROM ... WHEREsubquery_where)
        It is necessary to run the original
        SELECT here, without any
        pushed-down equalities of the kind mentioned earlier.
      
        On the other hand, when outer_expr is
        not NULL, it is absolutely essential that
        this comparison:
      
outer_exprIN (SELECTinner_exprFROM ... WHEREsubquery_where)
be converted to this expression that uses a pushed-down condition:
EXISTS (SELECT 1 FROM ... WHEREsubquery_whereANDouter_expr=inner_expr)
Without this conversion, subqueries will be slow. To solve the dilemma of whether to push down or not push down conditions into the subquery, the conditions are wrapped in “trigger” functions. Thus, an expression of the following form:
outer_exprIN (SELECTinner_exprFROM ... WHEREsubquery_where)
is converted into:
EXISTS (SELECT 1 FROM ... WHEREsubquery_whereAND trigcond(outer_expr=inner_expr))
More generally, if the subquery comparison is based on several pairs of outer and inner expressions, the conversion takes this comparison:
(oe_1, ...,oe_N) IN (SELECTie_1, ...,ie_NFROM ... WHEREsubquery_where)
and converts it to this expression:
EXISTS (SELECT 1 FROM ... WHEREsubquery_whereAND trigcond(oe_1=ie_1) AND ... AND trigcond(oe_N=ie_N) )
        Each trigcond(
        is a special function that evaluates to the following values:
      X)
            X when the “linked”
            outer expression oe_i is not
            NULL
          
            TRUE when the “linked” outer
            expression oe_i is
            NULL
          
        Note that trigger functions are not
        triggers of the kind that you create with
        CREATE TRIGGER.
      
        Equalities that are wrapped into trigcond()
        functions are not first class predicates for the query
        optimizer. Most optimizations cannot deal with predicates that
        may be turned on and off at query execution time, so they assume
        any trigcond( to
        be an unknown function and ignore it. At the moment, triggered
        equalities can be used by those optimizations:
      X)
            Reference optimizations:
            trigcond( can be
            used to construct X=Y
            [OR Y IS NULL])ref,
            eq_ref, or
            ref_or_null table
            accesses.
          
            Index lookup-based subquery execution engines:
            trigcond(
            can be used to construct
            X=Y)unique_subquery or
            index_subquery accesses.
          
Table-condition generator: If the subquery is a join of several tables, the triggered condition will be checked as soon as possible.
        When the optimizer uses a triggered condition to create some
        kind of index lookup-based access (as for the first two items of
        the preceding list), it must have a fallback strategy for the
        case when the condition is turned off. This fallback strategy is
        always the same: Do a full table scan. In
        EXPLAIN output, the fallback
        shows up as Full scan on NULL key in the
        Extra column:
      
mysql>EXPLAIN SELECT t1.col1,->t1.col1 IN (SELECT t2.key1 FROM t2 WHERE t2.col2=t1.col2) FROM t1\G*************************** 1. row *************************** id: 1 select_type: PRIMARY table: t1 ... *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: t2 type: index_subquery possible_keys: key1 key: key1 key_len: 5 ref: func rows: 2 Extra: Using where; Full scan on NULL key
        If you run EXPLAIN
        EXTENDED followed by SHOW
        WARNINGS, you can see the triggered condition:
      
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select `test`.`t1`.`col1` AS `col1`,
         <in_optimizer>(`test`.`t1`.`col1`,
         <exists>(<index_lookup>(<cache>(`test`.`t1`.`col1`) in t2
         on key1 checking NULL
         where (`test`.`t2`.`col2` = `test`.`t1`.`col2`) having
         trigcond(<is_not_null_test>(`test`.`t2`.`key1`))))) AS
         `t1.col1 IN (select t2.key1 from t2 where t2.col2=t1.col2)`
         from `test`.`t1`
        The use of triggered conditions has some performance
        implications. A NULL IN (SELECT ...)
        expression now may cause a full table scan (which is slow) when
        it previously did not. This is the price paid for correct
        results (the goal of the trigger-condition strategy was to
        improve compliance and not speed).
      
        For multiple-table subqueries, execution of NULL IN
        (SELECT ...) will be particularly slow because the
        join optimizer doesn't optimize for the case where the outer
        expression is NULL. It assumes that subquery
        evaluations with NULL on the left side are
        very rare, even if there are statistics that indicate otherwise.
        On the other hand, if the outer expression might be
        NULL but never actually is, there is no
        performance penalty.
      
To help the query optimizer better execute your queries, use these tips:
            A column must be declared as NOT NULL if
            it really is. (This also helps other aspects of the
            optimizer.)
          
            If you don't need to distinguish a NULL
            from FALSE subquery result, you can
            easily avoid the slow execution path. Replace a comparison
            that looks like this:
          
outer_exprIN (SELECTinner_exprFROM ...)
with this expression:
(outer_exprIS NOT NULL) AND (outer_exprIN (SELECTinner_exprFROM ...))
            Then NULL IN (SELECT ...) will never be
            evaluated because MySQL stops evaluating
            AND parts as soon as the
            expression result is clear.
          


User Comments
Yes, "IN" sub-query is really expensive. You can also use a dummy table like
select outer.f1,outer.f2 from outer_table outer, (select inner.f1 as inf1 from inner_table inner WHERE inner.f2="<<expression>>") as dummy where outer.f1= dummy.inf1;
It seems easy and works for me.
Everybody should look closely at Vimal's suggestion. It worked amazingly well (fast) for me.
Add your own comment.