0

I started the process of inserting returned results to another table. The query groups the rows in respect of indexed IDs. This causes 149,000,000 rows to be decreased to 460,000 rows.

The query includes 3 table INNER JOINs, with each table having about 20,000,000 rows.

Further information, the process completes in about 12 seconds for a test file which has 1000 input rows, and returns 703 rows.

I started the query earlier ### we don't know when earlier is ###, but it is still running in the state: "Copying to temp table on disk" after 38000 seconds (10 and a half hours).

I think there is a problem during the insertion process. What am I probably doing wrong here? If it helps, the operating system of the computer is Windows 7, it has 3 GB RAM, an Intel Core2Duo 2.27GHz processor. ### you forgot to tell us details on the hard drive. One partition in, one out, same disk, same partitions, etc ###

Here's my query as it currently reads:

INSERT INTO kdd.contents 
            (adid, 
             descriptionwords, 
             purchasedkeywordwords, 
             titlewords) 
SELECT t.adid, 
       dt.tokensid, 
       pkt.tokensid, 
       tt.tokensid 
FROM   kdd.training t
       INNER JOIN kdd.purchasedkeywordid_tokensid pkt
               ON t.keywordid = pkt.purchasedkeywordid 
       INNER JOIN kdd.titleid_tokensid tt
               ON t.titleid = tt.titleid 
       INNER JOIN kdd.descriptionid_tokensid dt
               ON t.descriptionid = dt.descriptionid 
GROUP  BY adid; 
András Váczi
  • 31,278
  • 13
  • 101
  • 147

1 Answers1

3

When I see that expression Copying to temp table on disk, I see a cascading problem.

First of all, what would precede Copying to temp table on disk? It would be Copying to temp table.

Why is this the case?

There are two variables that dictate how big a temp table in RAM can be

Imagine processing a query where mysqld has tmp_table_size configured at 32M. If the query being processed needs more that 32M to populate a temp table, look at what mysqld must do to the query and the temp table:

  1. Suspend the query's execution
  2. Open a temp table on disk
  3. Migrate the 32M data already in the temp table in RAM over to the new temp table on disk
  4. Continue the query's execution from where it left off, populating the temp table on disk

That's when the message Copying to temp table on disk appears (during step 3). Based on this and the step given, my guess is that you have tmp_table_size set way too high. The bigger the temp table is that is not enough to finish any phase of the query, the longer step 3 must take in the interim.

SUGGESTION #1

I would set tmp_table_size and max_heap_table_size as low as possible. That would force a temp table to go to disk faster.

SUGGESTION #2

If you have the ability to setup a RAM disk in Windows, try mapping tmpdir to that RAM disk.

I once suggested these things in StackOverflow.

RolandoMySQLDBA
  • 182,700
  • 33
  • 317
  • 520
  • 3
    I'd add (Suggestion #0) "Make sure that indexes are used and the SELECT query has a sane execution plan" – ypercubeᵀᴹ Oct 09 '12 at 19:10
  • So, how can I change these two variables? As I understood, I have to decrease them. My tmp_table_size=14M and the other is greater than this. – JoshuaJeanThree Oct 09 '12 at 19:18
  • Please look in http://dba.stackexchange.com/a/7817/877 on how to set them. In your case, I would set them very low. In your case, I would set them to 2K (not 2M). – RolandoMySQLDBA Oct 09 '12 at 19:30