Fastest way to check if products exist

Question

I have an external file containing product details. I loop through this file, and for each product in it I have to check whether the product already exists in Magento.

Whether a product already exists is determined by an attribute name and its value.

So I check this, per product, using:

$collection = Mage::getModel('catalog/product')
                ->getCollection()
                ->addAttributeToSelect('sku')
                ->addAttributeToSelect("$supplier_name")
                ->addAttributeToSelect('production_nr')
                ->addAttributeToSelect('sku_supplier')
                ->addAttributeToFilter($supplier_name,array('eq' => $suppliers_product_nr));

This selection itself doesn't seem to take much time:

            echo 'check: ',round(microtime(true) - $time, 2) , "s<br/>\n";

reports 0.00 s.

However, checking whether the collection is empty (i.e. whether the product exists in my Magento database) takes about 0.34-0.40 s:

$collection->getSize()

Considering I have several hundred thousand products to check, this will add up quickly. Very quickly. I was hoping for something closer to 0.01 s or lower.
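To put "adds up quickly" into numbers, here is a back-of-the-envelope estimate. The 200,000 product count is an assumption standing in for "several hundred thousand"; the per-check times are the ones quoted above.

```php
<?php
// Rough total runtime for one existence check per product.
// ASSUMPTION: 200,000 products; 0.40 s is the measured getSize() time,
// 0.01 s is the hoped-for target.
function totalHours(int $products, float $secondsPerCheck): float
{
    return $products * $secondsPerCheck / 3600;
}

$products = 200000;
printf("getSize() at 0.40 s: %.1f h\n", totalHours($products, 0.40));
printf("target at 0.01 s:    %.1f h\n", totalHours($products, 0.01));
```

That is roughly 22 hours at the measured speed versus about half an hour at the target.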

So I'm looking for the fastest way to check if a product exists. I have quite a bit of freedom in how I change and implement the code, so if there's a completely different way of approaching this problem, I'd like to hear about it.

Update:

I've changed it slightly: instead of asking Magento product by product whether a product exists, I now fetch an array of all products that have the attribute to check against, and I use this array to check whether the attribute value exists.

This is much faster, but I worry about the overhead (primarily RAM or CPU) once the number of products returned becomes too large (we have about 40,000 products in our Magento installation).
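A minimal sketch of the "fetch once, look up in memory" approach described in the update. It assumes the rows come from a single query returning each product's entity_id and the supplier attribute value; the field name supplier_nr and the sample data are hypothetical, and fetching the rows from Magento is left out. Keeping only value/ID pairs in a plain array, rather than full product objects, is what keeps the memory cost modest, and each lookup is a hash-key check instead of a database round trip.

```php
<?php
// Build a map of attribute value => list of entity IDs.
// A list is kept because the same value may match several products
// (the database is not guaranteed to be clean).
function buildIndex(array $rows, string $valueKey): array
{
    $index = [];
    foreach ($rows as $row) {
        $index[(string) $row[$valueKey]][] = (int) $row['entity_id'];
    }
    return $index;
}

// Return the matching entity IDs, or an empty array if the product does not exist.
function findProductIds(array $index, string $value): array
{
    return $index[$value] ?? [];
}

// HYPOTHETICAL sample rows, shaped like a fetchAll() result.
$rows = [
    ['entity_id' => 10, 'supplier_nr' => 'A-100'],
    ['entity_id' => 11, 'supplier_nr' => 'A-101'],
    ['entity_id' => 12, 'supplier_nr' => 'A-100'], // duplicate value
];

$index = buildIndex($rows, 'supplier_nr');
var_dump(findProductIds($index, 'A-100')); // two IDs: 10 and 12
var_dump(findProductIds($index, 'B-999')); // empty array: does not exist
```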

Just to clarify, does that mean there's always ONE product for your selection? Or can there be several products matching that collection selection? — Raphael at Digital Pianism, Mar 16 '17 at 13:47
The script loops through the source file one product at a time (the products are unique in the file), but the collection could return multiple products from our database that might have been added (I can't say our Magento database is very clean). — Tropus, Mar 16 '17 at 13:51
Do I understand your goal correctly: check if a product exists by SKU and has the appropriate attribute value? Which actions would you like to perform if the product exists/does not exist? — Sergii Ivashchenko, Mar 16 '17 at 15:06
If a product doesn't exist, the script does nothing and proceeds to the next product, but if the product does exist, we want to update the existing product with the data we get (several attributes would be updated).
Due to the excessively large file, I loop through it using XMLReader and, once I reach a product node, pass on the XML of that single node; this node is then parsed with SimpleXML to make it easy to use. — Tropus, Mar 16 '17 at 15:58
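The streaming pattern from the comment above can be sketched like this: XMLReader walks the document node by node, and each product element is handed to SimpleXML on its own, so only one product is held in memory at a time. The element names (products, product, sku, price) are hypothetical, and an inline string is used instead of a real file so the example runs as is.

```php
<?php
// Yield one SimpleXMLElement per <product> element.
function streamProducts(XMLReader $reader): Generator
{
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'product') {
            // readOuterXml() returns only this node's markup, so memory use is
            // bounded by one product rather than the whole file.
            yield simplexml_load_string($reader->readOuterXml());
        }
    }
}

// HYPOTHETICAL input; for a real file use $reader->open('/path/to/file.xml').
$xml = '<products>'
     . '<product><sku>A-100</sku><price>9.95</price></product>'
     . '<product><sku>A-101</sku><price>4.50</price></product>'
     . '</products>';

$reader = new XMLReader();
$reader->XML($xml);
foreach (streamProducts($reader) as $product) {
    echo (string) $product->sku, "\n";
}
$reader->close();
```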

score 3 · Answer 1 · answered Mar 17 '17 at 11:41


The fastest way for a single call is to use getIdBySku from catalog/product:

if (Mage::getSingleton('catalog/product')->getIdBySku('yoursku')) {
  // Product exists
} else {
  // Product does not exist
}

Mind the getSingleton instead of getModel for this usage!

Alternative:

If you need to read them all at once, you can use getAllIds() on a collection.

Phoenix128_RiccardoT

How would I change the call to get the ID by something other than the SKU, for example? (Am I correct in assuming it would be getIdByAttributeName('attributevalue')?) – Tropus Mar 17 '17 at 12:28

score 3 · Accepted Answer · edited Jun 15 '20 at 08:30

To my knowledge, the best call is getAllIds().

Explanation

$collection = Mage::getModel('catalog/product')
                ->getCollection()
                ->addAttributeToSelect('sku')
                ->addAttributeToSelect("$supplier_name")
                ->addAttributeToSelect('production_nr')
                ->addAttributeToSelect('sku_supplier')
                ->addAttributeToFilter($supplier_name,array('eq' => $suppliers_product_nr));

This takes almost no time, since it only prepares the collection query. The query actually runs only when the collection is loaded, through a load() call, a foreach loop, a count() call, etc.
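To illustrate why building the collection reports 0.00 s while the first count pays the full cost, here is a toy model of lazy loading. It is NOT Magento's implementation: the class name and the fake query are hypothetical, and it only mirrors the idea that a collection (like Varien_Data_Collection, which implements IteratorAggregate and Countable) defers the query until it is counted or iterated.

```php
<?php
// Toy lazy collection: filters are only recorded; the "query" runs on first use.
class LazyCollection implements IteratorAggregate, Countable
{
    private array $filters = [];
    private ?array $items = null;
    public int $queriesRun = 0;

    public function addFilter(string $field, string $value): self
    {
        $this->filters[$field] = $value; // no database work here
        return $this;
    }

    private function load(): array
    {
        if ($this->items === null) {
            $this->queriesRun++; // the expensive query would happen here
            $this->items = [];   // pretend the database returned no rows
        }
        return $this->items;     // later calls reuse the loaded items
    }

    public function count(): int
    {
        return count($this->load());
    }

    public function getIterator(): ArrayIterator
    {
        return new ArrayIterator($this->load());
    }
}

$collection = (new LazyCollection())->addFilter('supplier_nr', 'A-100');
echo $collection->queriesRun, "\n"; // 0: nothing executed yet
count($collection);
echo $collection->queriesRun, "\n"; // 1: the query ran on the first count
```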

You can check the collection count in several ways. I list them below in order of performance, best first (performance decreases as you go down):

$collection->getAllIds() - Best Option
$collection->getSize()
count($collection)
$collection->count() - Worst option

Why This Order?

count() is defined in Varien_Data_Collection. It loads the collection first and then counts the loaded items. Since it involves loading the whole collection, it takes the most time.

count($collection) is not much different from the above: Varien_Data_Collection implements Countable, so count($collection) simply calls $collection->count().

getSize() is defined in Varien_Data_Collection_Db and avoids loading the collection; instead it runs a separate COUNT query. This is the big advantage here, and it gives good performance when used for checking. See the answer from Marius for more details.

getAllIds() is the best choice. This method is defined in Mage_Eav_Model_Entity_Collection_Abstract, and a subtle difference from getSize() in its definition is what makes it the best choice.

getAllIds() internally calls getAllIdsSql(); here is its definition:

public function getAllIdsSql()
{
    $idsSelect = clone $this->getSelect();
    $idsSelect->reset(Zend_Db_Select::ORDER);
    $idsSelect->reset(Zend_Db_Select::LIMIT_COUNT);
    $idsSelect->reset(Zend_Db_Select::LIMIT_OFFSET);
    $idsSelect->reset(Zend_Db_Select::COLUMNS);
    $idsSelect->reset(Zend_Db_Select::GROUP);
    $idsSelect->columns('e.'.$this->getEntity()->getIdFieldName());
    return $idsSelect;
}

The line $idsSelect->columns('e.'.$this->getEntity()->getIdFieldName()); is what makes this method the best of all the available ones: after the column reset, it selects only the entity ID column. In fact, this line is absent in getSize().

I was using ->getSize() on the collection and found it somewhat slow (about 0.40 s per call). All I need is a true or false on whether the product exists. Are getAllIds() / getSize() truly the fastest ways, or are there other alternatives? — Tropus, Mar 17 '17 at 12:30
I don't think there are other ways to do this check. If you are very concerned about load time, a couple of things you can do are: 1. keep the collection as lean as possible; 2. load the collection in ranges. — Rajeev K Tomy, Mar 17 '17 at 13:23
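Loading the collection in ranges can be sketched as a generator that fetches one page at a time, so only one page of results is in memory. The $fetchPage callback and the sample data are hypothetical; in Magento 1 the callback could wrap a collection configured with setPageSize($pageSize) and setCurPage($page) (calling clear() between pages if the same collection object is reused), with $totalItems taken from getSize(). The loop runs over a fixed number of pages rather than stopping at an empty page, because Magento 1's getCurPage() clamps to the last page, so requesting a page past the end returns the last page again.

```php
<?php
// Yield items page by page; $fetchPage has the shape function (int $page, int $pageSize): array.
function iterateInPages(callable $fetchPage, int $pageSize, int $totalItems): Generator
{
    $lastPage = (int) ceil($totalItems / $pageSize);
    for ($page = 1; $page <= $lastPage; $page++) {
        foreach ($fetchPage($page, $pageSize) as $item) {
            yield $item;
        }
    }
}

// HYPOTHETICAL data source: IDs 1..7 served in pages.
$allIds = range(1, 7);
$fetchPage = function (int $page, int $pageSize) use ($allIds): array {
    return array_slice($allIds, ($page - 1) * $pageSize, $pageSize);
};

foreach (iterateInPages($fetchPage, 3, count($allIds)) as $id) {
    echo $id, ' ';
}
```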

score 1 · Answer 3 · edited Apr 13 '17 at 12:55

As mentioned by Rajeev K Tomy, the query using the getAllIds() method seems to be the fastest in Magento. But you can also get the needed attribute values (such as 'production_nr' or 'sku_supplier') without loading the collection. This method is slightly slower than getAllIds().

//addAttributeToSelect use only for EAV attributes
$collection = Mage::getModel('catalog/product')
    ->getCollection()
    ->addAttributeToSelect('meta_title', true) 
    ->addAttributeToSelect('meta_description', true)
    ->addAttributeToFilter(
        'name',
        array('like' => '%a%')
    );
//columns() use only for static attributes (product entity fields)
$collection->getSelect()->reset('columns')->columns(['entity_id', 'sku']);

/** @var Varien_Db_Select $select */
$select = $collection->getSelect();

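/** @var Mage_Core_Model_Resource $resource */
$resource = Mage::getSingleton('core/resource');

/** @var Magento_Db_Adapter_Pdo_Mysql $conn */
$conn = $resource->getConnection('core_read'); // a read connection is enough for a SELECT
// Store the result in its own variable instead of overwriting $resource
$rows = $conn->fetchAll($select, Zend_Db::FETCH_ASSOC);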

P.S. Do not add addAttributeToSelect('name') to the collection if you use the 'name' attribute for filtering.
P.S.2 See my question/answer about the fastest way to get raw product data without loading the collection here
