In some cases, MySQL can use an index to satisfy an
        ORDER BY clause without doing any extra
        sorting.
      
        The index can also be used even if the ORDER
        BY does not match the index exactly, as long as all of
        the unused portions of the index and all the extra
        ORDER BY columns are constants in the
        WHERE clause. The following queries use the
        index to resolve the ORDER BY part:
      
SELECT * FROM t1 ORDER BYkey_part1,key_part2,... ; SELECT * FROM t1 WHEREkey_part1=constantORDER BYkey_part2; SELECT * FROM t1 ORDER BYkey_part1DESC,key_part2DESC; SELECT * FROM t1 WHEREkey_part1=1 ORDER BYkey_part1DESC,key_part2DESC;
        In some cases, MySQL cannot use indexes to
        resolve the ORDER BY, although it still uses
        indexes to find the rows that match the WHERE
        clause. These cases include the following:
      
            You use ORDER BY on different keys:
          
SELECT * FROM t1 ORDER BYkey1,key2;
            You use ORDER BY on nonconsecutive parts
            of a key:
          
SELECT * FROM t1 WHEREkey2=constantORDER BYkey_part2;
            You mix ASC and DESC:
          
SELECT * FROM t1 ORDER BYkey_part1DESC,key_part2ASC;
            The key used to fetch the rows is not the same as the one
            used in the ORDER BY:
          
SELECT * FROM t1 WHEREkey2=constantORDER BYkey1;
            You use ORDER BY with an expression that
            includes terms other than the key column name:
          
SELECT * FROM t1 ORDER BY ABS(key); SELECT * FROM t1 ORDER BY -key;
            You are joining many tables, and the columns in the
            ORDER BY are not all from the first
            nonconstant table that is used to retrieve rows. (This is
            the first table in the
            EXPLAIN output that does not
            have a const join type.)
          
            You have different ORDER BY and
            GROUP BY expressions.
          
            You index only a prefix of a column named in the
            ORDER BY clause. In this case, the index
            cannot be used to fully resolve the sort order. For example,
            if you have a CHAR(20)
            column, but index only the first 10 bytes, the index cannot
            distinguish values past the 10th byte and a
            filesort will be needed.
          
            The type of table index used does not store rows in order.
            For example, this is true for a HASH
            index in a MEMORY table.
          
        Availability of an index for sorting may be affected by the use
        of column aliases. Suppose that the column
        t1.a is indexed. In this statement, the name
        of the column in the select list is a. It
        refers to t1.a, so for the reference to
        a in the ORDER BY, the
        index can be used:
      
SELECT a FROM t1 ORDER BY a;
        In this statement, the name of the column in the select list is
        also a, but it is the alias name. It refers
        to ABS(a), so for the reference to
        a in the ORDER BY, the
        index cannot be used:
      
SELECT ABS(a) AS a FROM t1 ORDER BY a;
        In the following statement, the ORDER BY
        refers to a name that is not the name of a column in the select
        list. But there is a column in t1 named
        a, so the ORDER BY uses
        that, and the index can be used. (The resulting sort order may
        be completely different from the order for
        ABS(a), of course.)
      
SELECT ABS(a) AS b FROM t1 ORDER BY a;
        By default, MySQL sorts all GROUP BY
         queries as if you
        specified col1,
        col2, ...ORDER BY  in the query as
        well. If you include an col1,
        col2, ...ORDER BY clause
        explicitly that contains the same column list, MySQL optimizes
        it away without any speed penalty, although the sorting still
        occurs. If a query includes GROUP BY but you
        want to avoid the overhead of sorting the result, you can
        suppress sorting by specifying ORDER BY NULL.
        For example:
      
INSERT INTO foo SELECT a, COUNT(*) FROM bar GROUP BY a ORDER BY NULL;
        With EXPLAIN SELECT ... ORDER BY, you can
        check whether MySQL can use indexes to resolve the query. It
        cannot if you see Using filesort in the
        Extra column. See
        Section 7.2.1, “Optimizing Queries with EXPLAIN”.
      
        MySQL has two filesort algorithms for sorting
        and retrieving results. The original method uses only the
        ORDER BY columns. The modified method uses
        not just the ORDER BY columns, but all the
        columns used in the query.
      
        The optimizer selects which filesort
        algorithm to use. It normally uses the modified algorithm except
        when BLOB or
        TEXT columns are involved, in
        which case it uses the original algorithm.
      
        The original filesort algorithm works as
        follows:
      
            Read all rows according to key or by table scanning. Rows
            that do not match the WHERE clause are
            skipped.
          
            For each row, store a pair of values in a buffer (the sort
            key and the row pointer). The size of the buffer is the
            value of the
            sort_buffer_size system
            variable.
          
When the buffer gets full, run a qsort (quicksort) on it and store the result in a temporary file. Save a pointer to the sorted block. (If all pairs fit into the sort buffer, no temporary file is created.)
Repeat the preceding steps until all rows have been read.
            Do a multi-merge of up to MERGEBUFF (7)
            regions to one block in another temporary file. Repeat until
            all blocks from the first file are in the second file.
          
            Repeat the following until there are fewer than
            MERGEBUFF2 (15) blocks left.
          
On the last multi-merge, only the pointer to the row (the last part of the sort key) is written to a result file.
            Read the rows in sorted order by using the row pointers in
            the result file. To optimize this, we read in a big block of
            row pointers, sort them, and use them to read the rows in
            sorted order into a row buffer. The size of the buffer is
            the value of the
            read_rnd_buffer_size system
            variable. The code for this step is in the
            sql/records.cc source file.
          
        One problem with this approach is that it reads rows twice: One
        time when evaluating the WHERE clause, and
        again after sorting the pair values. And even if the rows were
        accessed successively the first time (for example, if a table
        scan is done), the second time they are accessed randomly. (The
        sort keys are ordered, but the row positions are not.)
      
        The modified filesort algorithm incorporates
        an optimization such that it records not only the sort key value
        and row position, but also the columns required for the query.
        This avoids reading the rows twice. The modified
        filesort algorithm works like this:
      
            Read the rows that match the WHERE
            clause.
          
For each row, record a tuple of values consisting of the sort key value and row position, and also the columns required for the query.
Sort the tuples by sort key value
Retrieve the rows in sorted order, but read the required columns directly from the sorted tuples rather than by accessing the table a second time.
        Using the modified filesort algorithm, the
        tuples are longer than the pairs used in the original method,
        and fewer of them fit in the sort buffer (the size of which is
        given by sort_buffer_size). As
        a result, it is possible for the extra I/O to make the modified
        approach slower, not faster. To avoid a slowdown, the
        optimization is used only if the total size of the extra columns
        in the sort tuple does not exceed the value of the
        max_length_for_sort_data system
        variable. (A symptom of setting the value of this variable too
        high is that you should see high disk activity and low CPU
        activity.)
      
        For slow queries for which filesort is not
        used, you might try lowering
        max_length_for_sort_data to a
        value that is appropriate to trigger a
        filesort.
      
        If you want to increase ORDER BY speed, check
        whether you can get MySQL to use indexes rather than an extra
        sorting phase. If this is not possible, you can try the
        following strategies:
      
            Increase the size of the
            sort_buffer_size variable.
          
            Increase the size of the
            read_rnd_buffer_size
            variable.
          
            Use less RAM per row by declaring columns only as large as
            they need to be to hold the values stored in them. For
            example, CHAR(16) is better than
            CHAR(200) if values never exceed 16
            characters.
          
            Change tmpdir to point to a
            dedicated file system with large amounts of free space.
            Also, this option accepts several paths that are used in
            round-robin fashion, so you can use this feature to spread
            the load across several directories. Paths should be
            separated by colon characters
            (“:”) on Unix and semicolon
            characters (“;”) on Windows,
            NetWare, and OS/2. The paths should be for directories in
            file systems that are located on different
            physical disks, not different
            partitions on the same disk.
          


User Comments
To be certain that this optimization is done you often need to force the correct index to be used with FORCE INDEX.
Another problem is that the optimization is lost when you have an OR on a prefix of the index:
SELECT * FROM t1 FORCE INDEX (key1_key2_key3)
WHERE key1=1 OR key1=2 ORDER BY key2,key3 LIMIT 5, 5;
This can be worked around by using UNION:
(SELECT * FROM t1 FORCE INDEX (key1_key2_key3)
WHERE key1=1 ORDER BY key2,key3 LIMIT 10)
UNION
(SELECT * FROM t1 FORCE INDEX (key1_key2_key3)
WHERE key1=2 ORDER BY key2,key3 LIMIT 10)
ORDER BY key2,key3 LIMIT 5, 5
This rewrite can make the query go ~20000 times faster on a table with ~2M rows.
===========================================================
Posted by Ravenous Bugblatter Beast on August 15 2006
If you are selecting a wide result set and using ORDER BY RAND() with a LIMIT, you can often speed things up by changing a query of the form:
SELECT id, col1, col2, ... , colN FROM tab WHERE conditions ORDER BY RAND() LIMIT m
To a query of the form:
SELECT id, col1, col2, ... , colN FROM tab WHERE id IN (SELECT id FROM tab WHERE conditions ORDER BY RAND() LIMIT m)
Although the second query has to perform an additional select, it only has to sort a result set containing the single id column, rather than the full result set you are returning from the query.
===========================================================
it's true, but if we have a big database with 1000ths of table rows and we must to join them ...... so oooops ;)
Lately I saw the following link http://jan.kneschke.de/projects/mysql/order-by-rand/ .....you can read here how to get more speed from your executing query in some cases. First read the explanation and see bottom of the page:
============================================================
Performance
Now let's see what happends to our performance. We have 3 different queries for solving our problems.
* Q1. ORDER BY RAND()
* Q2. RAND() * MAX(ID)
* Q3. RAND() * MAX(ID) + ORDER BY ID
Q1 is expected to cost N * log2(N), Q2 and Q3 are nearly constant.
The get real values we filled the table with N rows ( one thousand to one million) and executed each query 1000 times.
----------------------------------------------------------
Rows ||100 ||1.000 ||10.000 ||100.000 ||1.000.000
----------------------------------------------------------
Q1||0:00.718s||0:02.092s||0:18.684s||2:59.081s||58:20.000s
Q2||0:00.519s||0:00.607s||0:00.614s||0:00.628s||0:00.637s
Q3||0:00.570s||0:00.607s||0:00.614s|0:00.628s ||0:00.637s
----------------------------------------------------------
As you can see the plain ORDER BY RAND() is already behind the optimized query at only 100 rows in the table.
============================================================
All this is much faster than an ORDER BY RAND(), the problem is, it does not do the same.
Imagine you have a table with two columns (id, name) of thousands of names of people.
If you want to produce a result of only one record, then this is by far the fastest way to do it. You select a random ID from the table, and query it directly.
The problem is, if you want to retrieve two random records from the table.
ORDER BY RAND() will give you, for example, ID 390 and ID 4957 in a result, but the sugested way, will calculate a random index (ex. 390) and return two records based on that index, ID 390 and ID 391...
This means, that the order of the result is not random, only the first "fetched" ID. If you query for more than two records, it will give you ID 390, 391, 392, 393...
In other words, this does not produce an actual random selected result on more than one record.
When performing an ORDER BY col1 ASC - where col1 is an ENUM data type - the ORDER BY clause will not order by the natural language order, but the order of the values in the definition of the ENUM field.
To order by an ENUM field make sure the ENUM values are in alphabetical order or reverse-alphabetical order and then perform the ORDER BY clause.
If you need to "hard" sort a table, i.e. you want the table to be sorted in some predefined order, so that your query doesn't have to do it, use the following syntax:
alter table `yourtablename` order by `yourfieldname` desc;
In this case, when you execute:
select * from `yourtablename` limit 1;
you get the record for which yourfieldname is the largest.
* Q2. RAND() * MAX(ID)
* Q3. RAND() * MAX(ID) + ORDER BY ID
You should NOT use these examples, because it could return invalid data:
With that you receive the highest stored index. Then a random number between 0 and MAX(id) is returned. 0 is a unwanted result. Imagine ID=3 is returned and this row has been deleted. Then this is a unwanted result, too.
If you need only 1 row in case of quotes or similar things then you ever should use the LIMIT 1 at the end of the query.
Add your own comment.