
I have a table that isn't very large row-wise (in the 100k range), but whose rows contain a lot of raw data that is very large in size. Despite the relatively small number of rows, the table is around 1.5 GB.

So it's quite important to know: when performing a query, does MySQL load the entire row into memory, or just the columns used in WHERE, ORDER BY, GROUP BY and indexes, fetching the rest of the columns only at the very end?

An example query:

SELECT HugeDataTable.*, Table2.Name
FROM Table1
LEFT JOIN Table2 ON Table1.`ID` = Table2.`Table1ID` 
LEFT JOIN HugeDataTable FORCE INDEX(RowOrder) ON Table2.`ID` = HugeDataTable.`Table2ID` 
WHERE HugeDataTable.Category = 5 AND Table1.ExampleColumn = integerExampleValue
AND HugeDataTable.RowOrder >10000 AND HugeDataTable.ID <> "h4324h534"
ORDER BY HugeDataTable.`RowOrder` DESC LIMIT 18 ;

Using EXPLAIN SELECT I've found that MySQL scans around 70k rows per query. The query is rather fast, but I'm not sure if that's due to row caching, as I can't simulate a heavy load on the server.

So, my question is: will the columns containing the large raw data be loaded after the query limits the result to 18 rows, so that only the little raw data actually needed gets loaded in the end?

Or will they be loaded before the limit, so that 70k rows, around 1 GB worth of data, get loaded first? And if it's the latter, what can be done to prevent that, since the server only has 1 GB of RAM?

Edit: I've added the EXPLAIN.

id  select_type  table          type    possible_keys                       key               key_len  ref                           rows   Extra                     
1   SIMPLE       HugeDataTable  range   Table2ID,Category,RowOrder          RowOrder          9                                      49591  Using where               
1   SIMPLE       Table2         eq_ref  PRIMARY                             PRIMARY           10       const,HugeDataTable.Table2ID  1      Using where; Using index  
1   SIMPLE       Table1         ref     PRIMARY                             PRIMARY           2        Table2.Table1ID               1                       
Pat
  • can you provide the table structure or at least their indexes ? and .. LEFT JOIN HugeDataTable is actually INNER JOIN HugeDataTable if you have in your WHERE HugeDataTable.Category = 5 – aconrad Jan 14 '14 at 16:44
  • The indexes are RowOrder, Category, and the primary key is the ID. Also the tables have the ID as primary key and Table1ID and Table2ID as indexes/foreign keys. I don't have access to the database at the moment so I can't provide the Explain.. sorry.. :( – Pat Jan 14 '14 at 21:33

1 Answer


Next time you ask a question like this, please include the EXPLAIN plan.

Typically with a LEFT JOIN, MySQL will resolve the table before the join before the table after it (although not always), hence it will read the rows from Table1 first, then Table2, then HugeDataTable.

But your question is rather confused - you don't need to include the columns from the WHERE, ORDER BY and GROUP BY clauses in the output of your query. SELECT * is never a good idea. The only caveat is that if you need multiple combinations of columns from the same query, you won't make best use of the query cache.
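If what you actually need is every column of HugeDataTable plus Table2.Name, you can either keep the star qualified to just that table (as the example query already does), or list the columns explicitly. A sketch with hypothetical column names, since the real schema wasn't posted:

```sql
-- Column names here are placeholders; replace with HugeDataTable's actual columns.
SELECT HugeDataTable.ID,
       HugeDataTable.Category,
       HugeDataTable.RowOrder,
       HugeDataTable.RawData,   -- the large BLOB/TEXT column
       Table2.Name
FROM ...
```

An explicit list also keeps the query's output stable if columns are later added to the table.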

But this query is very inefficient in lots of other ways. Why sort and limit by a value in a table which is double left joined? Even if your data integrity is completely knackered there are better solutions than this. As aconrad says, you've implemented a very expensive INNER JOIN - not a LEFT JOIN.
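Since the WHERE clause requires HugeDataTable.Category = 5 (and similar non-NULL conditions on the joined tables), any NULL-extended rows the LEFT JOINs produce get filtered out anyway, so the query can be written as the inner join it really is. A sketch of the same query, minus the index hint:

```sql
SELECT HugeDataTable.*, Table2.Name
FROM Table1
INNER JOIN Table2        ON Table2.Table1ID = Table1.ID
INNER JOIN HugeDataTable ON HugeDataTable.Table2ID = Table2.ID
WHERE Table1.ExampleColumn = integerExampleValue
  AND HugeDataTable.Category = 5
  AND HugeDataTable.RowOrder > 10000
  AND HugeDataTable.ID <> 'h4324h534'
ORDER BY HugeDataTable.RowOrder DESC
LIMIT 18;
```

Written this way, the optimizer is free to choose the join order itself instead of being constrained by outer-join semantics.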

Forcing an index is usually a good way to get a bad execution plan.
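As for keeping the large columns out of the 70k-row range scan: one common pattern is a deferred join, i.e. filter, sort and LIMIT on narrow indexed columns first, then join back by primary key to fetch the wide columns for only the 18 surviving rows. A sketch, assuming HugeDataTable.ID is its primary key:

```sql
SELECT big.*, t2.Name
FROM (
    -- The inner query touches only ID/Category/RowOrder/Table2ID,
    -- not the bulky raw-data columns.
    SELECT h.ID
    FROM Table1
    INNER JOIN Table2           ON Table2.Table1ID = Table1.ID
    INNER JOIN HugeDataTable h  ON h.Table2ID = Table2.ID
    WHERE Table1.ExampleColumn = integerExampleValue
      AND h.Category = 5
      AND h.RowOrder > 10000
      AND h.ID <> 'h4324h534'
    ORDER BY h.RowOrder DESC
    LIMIT 18
) AS picked
INNER JOIN HugeDataTable AS big ON big.ID = picked.ID
INNER JOIN Table2        AS t2  ON t2.ID = big.Table2ID
ORDER BY big.RowOrder DESC;
```

How much I/O this saves depends on the storage engine: with InnoDB, long BLOB/TEXT values are typically stored in overflow pages and only read when the column is actually referenced, so a covering index on the narrow columns would let the inner query avoid touching the big data at all.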

symcbean
  • I know that SELECT * is pretty bad, but in this case, I need all the columns returned by the HugeDataTable, and it's the reason why I'm using it. It's why I was asking if only some of the columns in the where statement are used and the rest are loaded later. Is there a way to avoid SELECT * but get all the columns from the HugeDataTable? – Pat Jan 14 '14 at 21:23
  • As for the ordering and sorting and limit of the data, the first two tables act as filters, while the data is retrieved from the last table. The ordering and limiting is done for the pagination in the app I'm using, to return the top 18 rows from a certain offset. (It's where the RowOrder > xxxx comes in) However, I don't quite understand why it is considered an inner join, and an expensive one at that.. Might you please elaborate? – Pat Jan 14 '14 at 21:25
  • left join = inner join: http://stackoverflow.com/questions/448023/what-is-the-difference-between-left-right-outer-and-inner-joins a LEFT JOIN b matches rows in both a and b, or only in a. If your WHERE requires b.field to equal something other than NULL, the LEFT is pointless – aconrad Jan 15 '14 at 00:56
  • Alright, I read the link. So then, I should use a normal Inner Join rather than a left join? Also any way around using SELECT * but retrieving all the data from HugeDataTable? – Pat Jan 15 '14 at 07:33