3

I'm trying to get an id from a companies table where the id is not yet in the crawlLog table. Then I need to insert that companyId into the crawlLog table.

I need to do this in one call so that parallel crawlers don't pull the same url after some other crawler has selected a url, but hasn't inserted it into the crawl log yet. I don't want to lock tables because of other problems that generates.

I get this error from both queries below:

You can't specify target table 'crawlLog' for update in FROM clause

Here are two queries i've tried to do the same thing.

INSERT INTO crawlLog (companyId, timeStartCrawling)
VALUES
(
    (
        SELECT companies.id FROM companies
        LEFT OUTER JOIN crawlLog
        ON companies.id = crawlLog.companyId
        WHERE crawlLog.companyId IS NULL
        LIMIT 1
    ),
    now()
)

I've also tried this, but get the same error:

INSERT INTO crawlLog (companyId, timeStartCrawling)
VALUES
(
    (
        SELECT id
        FROM companies
        WHERE id NOT IN
        (
            SELECT companyId
            FROM crawlLog
        )
        LIMIT 1
    ),
    now()
)
OMG Ponies
  • 314,254
  • 77
  • 507
  • 490
T. Brian Jones
  • 12,350
  • 22
  • 73
  • 112

4 Answers4

3

Why use a Subselect? INSERT INTO ... SELECT exists:

INSERT INTO crawlLog (companyId, timeStartCrawling)
SELECT companies.id, NOW()
FROM companies
LEFT OUTER JOIN crawlLog
ON companies.id = crawlLog.companyId
WHERE crawlLog.companyId IS NULL
LIMIT 1

And that way it should not complain about using a table both in the INSERT and SELECT part

wonk0
  • 12,881
  • 1
  • 19
  • 14
1

You can't update rows which you are querying. There is a way to force MySQL to use a temporary table implicitly:

INSERT INTO crawlLog (companyId, timeStartCrawling)
VALUES
(
    SELECT id, when FROM
    (
    SELECT companies.id AS id, now() AS when FROM companies
    LEFT OUTER JOIN crawlLog
    ON companies.id = crawlLog.companyId
    WHERE crawlLog.companyId IS NULL
    LIMIT 1
    )
)
Community
  • 1
  • 1
André Laszlo
  • 14,497
  • 3
  • 63
  • 75
1

This works and seems like the simplest solution:

Using the simpler of the two statements in my question, I created an alias for the inner crawlLog table as suggested by @Tocco in the comments, and then removed the necessary encapsulation in VALUES().

INSERT INTO crawlLog (companyId, timeStartCrawling)
SELECT id, now()
FROM companies
WHERE id NOT IN
(
    SELECT companyId
    FROM crawlLog AS crawlLogAlias
)
LIMIT 1
T. Brian Jones
  • 12,350
  • 22
  • 73
  • 112
  • 1
    Not sure if it's a problem or not, but putting a SELECT inside of VALUES seems odd (unnecessary). – Chains Jul 08 '11 at 21:06
0

Do the select into a temp table, then insert selecting from the temp table. You can't insert into a table and select from it in the same statement, so use a temp table and two statements.

antlersoft
  • 14,383
  • 4
  • 31
  • 54