1

UPDATE Here's a sqlfiddle http://sqlfiddle.com/#!2/e0822/1/0

I have a MySQL database of apps (itunes_id), each app id has a comments field. To preserve a history, every time a comment is changed, a new row of data is added. In the query below, I just want a list of the latest entry (highest id) of every app (itunes_id).

Here are the headers of my db:

id  (key and auto increment)
itunes_id
comments 
date

This query is getting the latest entry for a given itunes_id. How can I make this query more efficient?

SELECT * FROM (
    SELECT * FROM (
        SELECT * FROM Apps
        ORDER BY id DESC
    ) AS apps1
    GROUP BY itunes_id
) AS apps2
LIMIT 0 , 25
Dev01
  • 12,199
  • 19
  • 68
  • 116
  • 1
    can you create a sql fiddle for us? – Woot4Moo Feb 25 '13 at 16:48
  • Stop selecting `*`. That's like going to the grocery store and filling your shopping cart with every product in food group you want. – Kermit Feb 25 '13 at 16:48
  • @AarolamaBluenk awful advice, sometimes `*` is necessary – Woot4Moo Feb 25 '13 at 16:49
  • Actually WootWoot, the advise based on your example might be good advise to not use so many `Select *` you don't need that many Selects personally – MethodMan Feb 25 '13 at 16:51
  • @Woot4Moo Give me a sec, I'll build a sql fiddle. Thanks. – Dev01 Feb 25 '13 at 16:51
  • 2
    @Woot4Moo The question is about efficiency. Selecing all is [not efficient](http://stackoverflow.com/questions/65512/which-is-faster-best-select-or-select-column1-colum2-column3-etc). – Kermit Feb 25 '13 at 16:51
  • @DJKRAZE That's what I was thinking. Way too many SELECTS but how do I reduce them. – Dev01 Feb 25 '13 at 16:52
  • @AarolamaBluenk myopic view in my experienced opinion. – Woot4Moo Feb 25 '13 at 16:52
  • In my experience, `select *` isn't lazy, it is sloppy. @Woot4Moo, would you please post an example where `select *` is necessary? I'd love to be proven wrong. – BryanH Feb 25 '13 at 16:55
  • @Woot4Moo Not sure how to respond to that... ***Any*** experienced DBA will tell you it's not efficient. – Kermit Feb 25 '13 at 16:56
  • @BryanH sure. I create a materialized view that requires all of the columns in my table, but also does a very lengthy computation. I then proceed to create a regular view on top of this materialized view to facilitate user queries so they don't know the structure of my table. Let me see if I can make a fiddle to illustrate what I mean (assume a placeholder function that takes an hour to execute). – Woot4Moo Feb 25 '13 at 16:57
  • @Woot4Moo Once you do that, I will create an equal view that uses a column list. Then, we can both add 50 columns to the parent tables and compare how efficient they are. – Kermit Feb 25 '13 at 16:59
  • 1
    @Woot4Moo . . . The order by in the innermost query is not correct coding practice. The results are indeterminate. You can read the MySQL documentation for more information about this . . .http://dev.mysql.com/doc/refman/5.0/en/group-by-extensions.html. – Gordon Linoff Feb 25 '13 at 16:59
  • @AarolamaBluenk and yet my explain plan is THE EXACT SAME in every query I have ever written in Oracle – Woot4Moo Feb 25 '13 at 16:59
  • @Woot4Moo You're moving more data... so how is moving more (unnecessary) data more efficient? – Kermit Feb 25 '13 at 17:04
  • @AarolamaBluenk . . . Thank you. I happen to know that my answer *is* correct. But there is a lot of misinformation and misuse of MySQL. The problem is that the implementation of MySQL supports something which the documentation explicitly warns against. So, the original query is simply unsafe SQL. – Gordon Linoff Feb 25 '13 at 17:07

4 Answers4

1

This query uses a subquery which separately gets the maximum ID for every itunes_ID. The result of the subquery is then join back on the original table provided that it matches on two columns: itunes_ID and ID.

SELECT  a.*
FROM    Apps a
        INNER JOIN
        (
            SELECT  itunes_id, MAX(ID) max_id
            FROM    Apps
            GROUP   BY itunes_id
        ) b ON a.itunes_id = b.itunes_id AND
                a.ID = b.max_ID
LIMIT   0, 25

For faster performance, create a compound column INDEX on columns itunes_ID and ID. EG,

ALTER TABLE Apps ADD INDEX (itunes_ID, ID)
John Woo
  • 249,283
  • 65
  • 481
  • 481
  • Nice. I want to make this more efficient because I'm getting a lot of traffic. I'm willing to make changes to the db as needed to make this as efficient as possible (even willing to change db types). I added the index column, how do I use it? http://sqlfiddle.com/#!2/0ff75/1/0 – Dev01 Feb 25 '13 at 17:36
  • You have nothing to do with it..it's the server that will use it :D see the executiom plan, youll see tht it uses the index – John Woo Feb 25 '13 at 17:42
  • If you are able to make changes to the DB, I would recommend changing your DB structure to keep the history in a separate table. (See my answer below.) – Matthias Wuttke Feb 25 '13 at 18:01
1

For a similar approach, I use a "recent" boolean field to mark records containing the latest version. This requires an UPDATE query on every insert (deactivate the previous recent record), but allows for a quick select query. Alternatively, you could maintain two tables, one with the recent records, the other one with the history for each app.

EDIT: Maybe you can try a table similar to this:

  • id int not null auto_increment primary key
  • version int not null
  • main_id int null
  • recent boolean not null
  • app varchar(32) not null
  • comment varchar(200) null

You can use the column "main_id" to point to the record with version 1.

Matthias Wuttke
  • 1,927
  • 2
  • 21
  • 37
0

I just want the latest entry (highest id) for a given app (itunes_id)

This will do it

SELECT MAX(id), comments FROM Apps WHERE itunes_id = "iid";

or

SELECT id, comments FROM Apps WHERE itunes_id = "iid" ORDER BY id DESC LIMIT 1;

Where iid is the itunes id for which you want the latest comment.

Make sure id and itunes_id are indexed in a composite index for maximum efficiency.

BryanH
  • 5,626
  • 3
  • 35
  • 47
  • Sorry, I meant to write "I just want a list of the latest entry (highest id) of every app (itunes_id)." I updated the text int he question above. – Dev01 Feb 25 '13 at 17:04
0
SELECT * FROM (
    SELECT * FROM (
        SELECT * FROM Apps
        ORDER BY id DESC
    ) AS apps1
    GROUP BY itunes_id
) AS apps2
LIMIT 0 , 25

will not select the oldest record (you cannot assume the generated key will always be the "oldest"). What you want is something like this:

SELECT * FROM (
    SELECT * FROM (
        SELECT * FROM Apps
        where some_date = (select max(some_date) from Apps limit 1)
        ORDER BY id DESC
    ) AS apps1
    GROUP BY itunes_id
) AS apps2
LIMIT 0 , 25
Woot4Moo
  • 23,491
  • 13
  • 90
  • 146
  • Good point. But you also added another select statement. Doesn't that also make the query even slower? – Dev01 Feb 25 '13 at 17:05