1

Need to do it for lower Mysql version like 4.9 to 5.6

I need to copy one table column data to another table but need to apply unique check and in case found duplicate then needs to add suffix to data and continue the update.(Don't want to stop query execution because of duplicate data) .

Let me clarify things:

My first table is tbl_categories:

cat_id  cat_parent_id   cat_active  cat_display_order   cat_suggested_hourly_rate
1       0               1           1                   10
2       1               1           2                   10

And second table is tbl_categories_metadata:

cdata_cat_id    cdata_lang_id       cdata_name
1               1                   A
1               2                   B
1               3                   C
2               1                   A
2               2                   B
3               1                   D
3               2                   E
3               3                   F

So in my second table category name added based on language id.

Now I need to add a unique column in first table with name cat_identifier, so I did :

ALTER TABLE `tbl_categories` ADD `cat_identifier` VARCHAR(100) NOT NULL AFTER `cat_id`;

Which worked fine, Now I have to make it unique, but straight forward it can not be done due to similar value present while running above query, so I did:

UPDATE
        `tbl_categories` a
    INNER JOIN `tbl_categories` b ON `a`.cat_id = `b`.cat_id
    SET
        `a`.cat_identifier = `b`.cat_id;
        

It worked fine and cat_id added to cat_identifier column, now I am able to made this column unique via below query :

ALTER TABLE `tbl_categories`
      ADD UNIQUE KEY `cat_identifier` (`cat_identifier`);
      

Worked fine,and my table now look like this now:

cat_id  cat_identifier  cat_parent_id   cat_active  cat_display_order   cat_suggested_hourly_rate
1           1               0               1           1                   10
2           2               1               1           2                   10  

Where I am stuck:

I need to update cat_identifier values taken from cdata_name column based on language id 1, but in-case language id 1 has same data for 2 categories, then i need to add -cat_id as suffix for that data and needs to update

So I tried below query :

UPDATE
`tbl_categories`
INNER JOIN `tbl_categories_metadata` ON `tbl_categories`.cat_id = `tbl_categories_metadata`.cdata_cat_id
SET
    `tbl_categories`.cat_identifier = `tbl_categories_metadata`.cdata_name
WHERE
    `tbl_categories_metadata`.cdata_lang_id = 1;
        
    

It's working , but at once duplicate found for language 1 it stops.

What I want is In case duplicate found then add -cat_id (category id of column) as suffix and do update. like clean-3,clean-4 etc...

Purpose to do so : Sometime admin/front-end seller not adding language specific names for categories and some time they add same name, so we added cat_identifier which will be unique,language independent as well as mandatory to add. This concept will work straightforward for new installation of our project,but in already working system(previous version of our projects) we have to do it in a way so that with minimal changes system work fine.

Note: Queries to create both table along with data

CREATE TABLE `tbl_categories` (
  `cat_id` int(11) UNSIGNED NOT NULL,
  `cat_identifier` varchar(100) NOT NULL,
  `cat_parent_id` int(11) UNSIGNED NOT NULL COMMENT '0 defaults to parent category',
  `cat_active` tinyint(4) UNSIGNED NOT NULL COMMENT '0 - Inactive, 1 - Active',
  `cat_display_order` decimal(4,2) NOT NULL,
  `cat_suggested_hourly_rate` decimal(10,2) NOT NULL COMMENT 'This will be used as suggestion hourly rate for this category.'
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4;


INSERT INTO `tbl_categories` (`cat_id`, `cat_identifier`, `cat_parent_id`, `cat_active`, `cat_display_order`, `cat_suggested_hourly_rate`) VALUES
(1, '', 0, 1, '1.00', '20.00'),
(2, '', 1, 1, '4.00', '15.00'),
(3, '', 1, 0, '3.00', '12.00'),
(4, '', 1, 1, '1.00', '18.00'),
(5, '', 1, 1, '2.00', '15.00'),
(6, '', 1, 1, '5.00', '10.00'),
(7, '', 0, 1, '2.00', '25.00'),
(8, '', 7, 1, '1.00', '20.00'),
(9, '', 7, 1, '2.00', '20.00'),
(10, '', 7, 1, '3.00', '20.00'),
(11, '', 0, 1, '3.00', '25.00'),
(12, '', 11, 1, '1.00', '20.00'),
(13, '', 11, 1, '2.00', '25.00'),
(14, '', 0, 1, '4.00', '20.00'),
(15, '', 14, 1, '1.00', '18.00'),
(16, '', 14, 1, '2.00', '25.00'),
(17, '', 0, 1, '5.00', '30.00'),
(18, '', 17, 1, '1.00', '0.00'),
(19, '', 17, 1, '2.00', '0.00'),
(20, '', 17, 1, '3.00', '0.00'),
(21, '', 0, 0, '2.00', '20.00'),
(22, '', 0, 0, '4.00', '25.00'),
(23, '', 0, 1, '5.00', '15.00'),
(24, '', 0, 0, '8.00', '22.00'),
(25, '', 0, 0, '9.00', '28.00'),
(26, '', 0, 1, '1.00', '20.00'),
(27, '', 26, 1, '1.00', '20.00'),
(28, '', 26, 1, '2.00', '45.00'),
(29, '', 26, 1, '3.00', '40.00'),
(30, '', 0, 0, '2.00', '15.00'),
(31, '', 0, 1, '3.00', '30.00'),
(32, '', 31, 1, '1.00', '22.00'),
(33, '', 31, 1, '2.00', '0.00'),
(34, '', 0, 0, '4.00', '15.00'),
(35, '', 0, 1, '5.00', '25.00'),
(36, '', 35, 1, '1.00', '25.00'),
(37, '', 35, 1, '2.00', '10.00'),
(38, '', 0, 0, '1.00', '40.00'),
(39, '', 0, 1, '3.00', '25.00'),
(40, '', 39, 1, '1.00', '22.00'),
(41, '', 39, 1, '2.00', '25.00'),
(42, '', 0, 0, '6.00', '35.00'),
(43, '', 0, 1, '7.00', '15.00'),
(44, '', 23, 1, '1.00', '22.00'),
(45, '', 23, 1, '2.00', '20.00'),
(46, '', 7, 1, '4.00', '25.00'),
(47, '', 43, 1, '1.00', '35.00'),
(48, '', 43, 1, '2.00', '18.00'),
(49, '', 43, 1, '3.00', '20.00'),
(50, '', 43, 1, '4.00', '40.00'),
(51, '', 7, 1, '5.00', '28.00'),
(52, '', 0, 1, '1.00', '10.00'),
(53, '', 0, 1, '1.00', '10.00');

ALTER TABLE `tbl_categories`
  ADD PRIMARY KEY (`cat_id`);

ALTER TABLE `tbl_categories`
  MODIFY `cat_id` int(11) UNSIGNED NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=54;


CREATE TABLE `tbl_categories_metadata` (
  `cdata_cat_id` int(11) UNSIGNED NOT NULL COMMENT 'ID of table tbl_categories',
  `cdata_lang_id` int(11) UNSIGNED NOT NULL,
  `cdata_name` varchar(255) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4;


INSERT INTO `tbl_categories_metadata` (`cdata_cat_id`, `cdata_lang_id`, `cdata_name`) VALUES
(1, 3, 'Limpieza'),
(1, 2, 'Nettoyage'),
(1, 1, 'Cleaning'),
(2, 1, 'Bathroom Deep Cleaning'),
(2, 2, 'Nettoyage en profondeur de la salle de bain'),
(2, 3, 'Limpieza profunda de ba?'),
(3, 3, 'Limpieza de alfombras'),
(3, 2, 'Nettoyage de tapis'),
(3, 1, 'Carpet Cleaning'),
(4, 3, 'Limpieza profunda en el hogar'),
(4, 2, 'Nettoyage en profondeur'),
(4, 1, 'Home Deep Cleaning'),
(5, 1, 'Kitchen Deep Cleaning'),
(5, 2, 'Nettoyage en profondeur de la cuisine'),
(5, 3, 'Limpieza profunda de cocina'),
(6, 1, 'Car Cleaning'),
(6, 2, 'Nettoyage de voiture'),
(6, 3, 'Limpieza de coches'),
(7, 3, 'Experto'),
(7, 2, 'Qualifié'),
(7, 1, 'Skilled'),
(8, 1, 'Electricians'),
(8, 2, '?'),
(8, 3, 'Electricistas'),
(9, 1, 'Plumbers'),
(9, 2, 'Plombiers'),
(9, 3, 'Fontaneros'),
(10, 1, 'Carpenters'),
(10, 2, 'Charpentiers'),
(10, 3, 'Carpinteros'),
(11, 1, 'Fitness & Yoga'),
(11, 2, 'Fitness et yoga'),
(11, 3, 'Fitness y yoga'),
(12, 1, 'Fitness Trainer at Home'),
(12, 2, 'Fitness Trainer ?'),
(12, 3, 'Entrenador de fitness en casa'),
(13, 1, 'Yoga Trainer at Home'),
(13, 2, 'Formateur de yoga ?'),
(13, 3, 'Entrenador de yoga en casa'),
(14, 1, 'Salon at Home'),
(14, 2, 'Salon ?'),
(14, 3, 'Salon en casa'),
(15, 3, 'Salon en casa'),
(15, 2, 'Salon à domicile'),
(15, 1, 'Salon at home'),
(16, 1, 'Makeup and Hairstyling'),
(16, 2, 'Maquillage et Coiffure'),
(16, 3, 'Maquillaje y Peluquer?'),
(17, 3, 'Servicios de fotografia'),
(17, 2, 'Services de photographie'),
(17, 1, 'Photography Services'),
(18, 1, 'Wedding Photography & Filming'),
(18, 2, 'Photographie et tournage de mariage'),
(18, 3, 'Fotografía y filmación de bodas'),
(19, 3, 'Fotografía y rodaje de cumpleaños'),
(19, 2, 'Photographie et tournage d\'anniversaire'),
(19, 1, 'Birthday Photography & Filming'),
(20, 1, 'Family Function Shoots'),
(20, 2, 'Prise de vue en famille'),
(20, 3, 'Disparos de funciones familiares'),
(21, 3, 'Pintura mural'),
(21, 2, 'Peinture murale'),
(22, 2, 'Charpenterie'),
(22, 1, 'Carpentry'),
(23, 3, 'Personal de mantenimiento'),
(23, 2, 'Bricoleur'),
(23, 1, 'Handyman'),
(24, 3, 'Actividades de jardinería'),
(24, 2, 'Activités de jardinage'),
(24, 1, 'Gardening Activities'),
(25, 2, 'Déménagement d\'une maison complète / déménagement d\'une maison'),
(25, 3, 'Remoción de casa completa / mudanza de casa'),
(25, 1, 'Full House Removal / House moving'),
(26, 1, 'Performing Arts'),
(26, 2, 'Arts performants'),
(26, 3, 'Las artes escénicas'),
(27, 1, 'Party Host'),
(27, 2, 'Hôte de fête'),
(27, 3, 'Anfitrión de la fiesta'),
(28, 1, 'DJ'),
(28, 2, 'DJ'),
(28, 3, 'DJ'),
(29, 1, 'Choreographer'),
(29, 2, 'Chorégraphe'),
(29, 3, 'Coreógrafo'),
(30, 3, 'Mesas de barman / espera'),
(30, 2, 'Tables de barman / d\'attente'),
(30, 1, 'Bartending / Waiting Tables'),
(31, 2, 'Connectivité réseau'),
(31, 1, 'Network Connectivity'),
(31, 3, 'Conectividad de red'),
(32, 1, 'Broadband Connection installation'),
(32, 2, 'Installation de connexion à large bande'),
(32, 3, 'Instalación de conexión de banda ancha'),
(33, 1, 'Leased Line Connection'),
(33, 2, 'Connexion de ligne louée'),
(33, 3, 'Conexión de línea arrendada'),
(34, 3, 'Vigilancia de los niños'),
(34, 2, 'Baby-sitting'),
(34, 1, 'Baby Sitting'),
(35, 1, 'Pet Services'),
(35, 2, 'Services pour animaux'),
(35, 3, 'Servicios para mascotas'),
(36, 1, 'Pet Bathing & Grooming'),
(36, 2, 'Bain et toilettage d\'animaux'),
(36, 3, 'Baño y aseo de mascotas'),
(37, 1, 'Walking the pet'),
(37, 2, 'Promener l\'animal'),
(37, 3, 'Paseando a la mascota'),
(38, 2, 'Antiparasitaire'),
(39, 1, 'Personal Training'),
(39, 2, 'Formation personnelle'),
(39, 3, 'Entrenamiento personal'),
(40, 1, 'Voice Modulation / Speech'),
(40, 2, 'Modulation vocale / discours'),
(40, 3, 'Modulación de voz / habla'),
(41, 1, 'Personality Trainer'),
(41, 2, 'Entraîneur de personnalité'),
(41, 3, 'Entrenador de personalidad'),
(42, 3, 'Carta de presentación / Redactor'),
(42, 2, 'Lettre d\'accompagnement / Rédacteur de CV'),
(42, 1, 'Cover Letter / Resume Writer'),
(43, 3, 'Otros'),
(43, 2, 'Autres'),
(43, 1, 'Others'),
(21, 1, 'Wall Painting'),
(44, 1, 'Gardening Activities'),
(44, 2, 'Activités de jardinage'),
(44, 3, 'Actividades de jardinería'),
(45, 1, 'House moving'),
(45, 2, 'déménagement'),
(45, 3, 'mudanza'),
(22, 3, 'Carpintería'),
(46, 1, 'Carpentry'),
(46, 2, 'Charpenterie'),
(46, 3, 'Carpintería'),
(47, 1, 'Cover letter/Resume Writer'),
(47, 2, 'Lettre de motivation / Rédacteur de CV'),
(47, 3, 'Carta de presentación / Redactor'),
(48, 1, 'Baby Sitting'),
(48, 2, 'Baby-sitting'),
(48, 3, 'Vigilancia de los niños'),
(49, 1, 'Bartending/ Waiting Tables'),
(49, 2, 'Tables de barman / d\'attente'),
(49, 3, 'Mesas de barman / espera'),
(50, 1, 'Pest Control'),
(50, 2, 'Antiparasitaire'),
(50, 3, 'Control de plagas'),
(38, 1, 'Pest Control'),
(38, 3, 'Control de plagas'),
(51, 1, 'Wall Painting'),
(51, 2, 'Peinture murale'),
(51, 3, 'Pintura mural'),
(52, 1, 'Cat1'),
(53, 1, 'Cleaning');


ALTER TABLE `tbl_categories_metadata`
  ADD UNIQUE KEY `cat_id` (`cdata_cat_id`,`cdata_lang_id`);
Anant Kumar Singh
  • 68,309
  • 10
  • 50
  • 94
  • Is there any reason why, since you're going to do it at all, you can't simply always append the `cat_id`? – outis Nov 02 '21 at 08:06
  • One more recommendation: look into switching the tables to [InnoDB](https://stackoverflow.com/q/1970160/90527), as it supports transactions, foreign keys, row-level locking, is more resistant to corruption and is generally as fast as MyISAM (as outlined in "[When to use MyISAM and InnoDB?](https://stackoverflow.com/a/15678615/90527)"). – outis Nov 03 '21 at 06:35
  • You need to also show us the desired results. – Rick James Nov 03 '21 at 22:24
  • @RickJames I have clearly mentioned it as well as descriobed it too. – Anant Kumar Singh Nov 08 '21 at 07:58

2 Answers2

2

This might work.

UPDATE tbl_categories a
INNER JOIN (
  SELECT a.cat_id, MAX(b.cdata_name) cdata_name, ROW_NUMBER() OVER (PARTITION BY cdata_name ORDER BY cat_id) rn
  FROM tbl_categories a 
  INNER JOIN tbl_categories_metadata b ON a.cat_id = b.cdata_cat_id
  WHERE b.cdata_lang_id = 1
  GROUP BY a.cat_id
) b ON a.cat_id = b.cat_id
SET a.cat_identifier = (CASE WHEN b.rn = 1 THEN b.cdata_name ELSE CONCAT(b.cdata_name, '-', a.cat_id) END)

https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=8f620a00e3d81012a3e1332f13914ed8

Revised version for MySQL 5.6

UPDATE tbl_categories a
INNER JOIN (
  SELECT a.cat_id, MAX(b.cdata_name) cdata_name
  FROM tbl_categories a 
  INNER JOIN tbl_categories_metadata b ON a.cat_id = b.cdata_cat_id
  WHERE b.cdata_lang_id = 1
  GROUP BY a.cat_id
) b ON a.cat_id = b.cat_id
LEFT JOIN (
  SELECT MIN(a.cat_id) cat_id, b.cdata_name
  FROM tbl_categories a 
  INNER JOIN tbl_categories_metadata b ON a.cat_id = b.cdata_cat_id
  WHERE b.cdata_lang_id = 1
  GROUP BY b.cdata_name
) c ON a.cat_id = c.cat_id AND b.cdata_name = c.cdata_name
SET a.cat_identifier = (CASE WHEN c.cat_id IS NOT NULL THEN b.cdata_name ELSE CONCAT(b.cdata_name, '-', a.cat_id) END)
;

https://dbfiddle.uk/?rdbms=mysql_5.6&fiddle=2c433ca4f20af22f7578dfe31e66db7b

ProGu
  • 5,250
  • 1
  • 2
  • 11
2

Aside

cat_identifier and cat_id have the same meaning, which is confusing. A more appropriate name for the new column, given its use, would be default_name (or default_en_name). This answer will use the former.

Answer

First, set the column values to guaranteed unique values as planned, using both tbl_categories_metadata.cdata_name and tbl_categories.cat_id:

UPDATE `tbl_categories` AS tc
    JOIN `tbl_categories_metadata` AS tcm
      ON tc.cat_id = tcm.cdata_cat_id
  SET `default_name` = CONCAT(tcm.cdata_name, '-', tc.cat_id)
  WHERE
    tcm.cdata_lang_id = 1;

The column could simply be left as-is. However, if you don't want the cat_id on some of the fields, remove it. What expression used to remove it depends on the version of MySQL server used (and what UDFs are loaded). If using MySQL 8.0, make use of REGEXP_REPLACE

UPDATE IGNORE `tbl_categories`
  SET `default_name` = REGEXP_REPLACE(`default_name`, 
                         CONCAT('-', cat_id, '$'), 
                         '')
  ORDER BY cat_id

Similarly, if you have a UDF that adds regex functionality, use that. If using a version before 8.0, a combination of SUBSTRING() and CHAR_LENGTH().

  ...
  SET `default_name` =  SUBSTRING(`default_name`, 1, 
                          CHAR_LENGTH(`default_name`) - 1 - CHAR_LENGTH(cat_id)
                        )
  ...

A simpler (though more error-prone) solution would be to use REPLACE()

  ...
  SET `default_name` = REPLACE(`default_name`, 
                         CONCAT('-', cat_id), '')
  ...

Alternate Answer

More as an exercise, it can be done in a single query in a few ways; here's a systematic approach.

Whenever rows might depend on other rows (such as with unique indices, but not only then), a single-query solution can generally be done with an additional join on one of the tables, usually grouped and by the use of aggregate functions, though sometimes with non-equality join conditions (e.g. sometimes you can use something like tbl_alias_0.col < tbl_alias_1.col). To get the related rows, the join goes through tbl_categories_metadata. The table references clause would thus be:

...
  `tbl_categories` AS tc
    JOIN `tbl_categories_metadata` AS tcm
      ON tc.cat_id = tcm.cdata_cat_id
    JOIN `tbl_categories_metadata` AS tcm_groups
      ON     tcm.cdata_name = tcm_groups.cdata_name 
         AND tcm.cdata_lang_id = tcm_groups.cdata_lang_id
...

(Note that tcm is only used to join through in this example, though in some places some of the tcm_groups column references could be replaced with tcm column references.)

For this example, since each row (identified by cat_id or cdata_cat_id) will get assigned a cdata_name, these naturally form groups.

...
GROUP BY tc.cat_id, tcm_groups.cdata_name
...

The cdata_name in each group will potentially come from multiple rows in tbl_categories (via cdata_cat_id). As only one row from tbl_categories in the group won't have a suffix appended to the default name, this must be specified. One simple option is to pick the row with minimal cat_id, but other options (e.g. maximal cat_id, random) could be implemented instead. This is implemented with a CASE using a comparison with tcm_groups.cdata_cat_id to distinguish the cases.

...
   CASE tc.cat_id
     WHEN MIN(tcm_groups.cdata_cat_id) THEN tcm.cdata_name
     ELSE CONCAT(tcm_groups.cdata_name, '-', tc.cat_id)
   END
...

If this were a simple SELECT, the above components are all you'd need. (Starting with a SELECT is useful to check the work.) Combined, they are:

SELECT tc.cat_id, 
       CASE tc.cat_id
         WHEN MIN(tcm_groups.cdata_cat_id) THEN tcm_groups.cdata_name
         ELSE CONCAT(tcm_groups.cdata_name, '-', tc.cat_id)
       END AS default_name
  FROM `tbl_categories` AS tc
        JOIN `tbl_categories_metadata` AS tcm
          ON tc.cat_id = tcm.cdata_cat_id
        JOIN `tbl_categories_metadata` AS tcm_groups
          ON     tcm.cdata_name = tcm_groups.cdata_name 
             AND tcm.cdata_lang_id = tcm_groups.cdata_lang_id
  WHERE tcm_groups.cdata_lang_id = 1
  GROUP BY tc.cat_id, tcm_groups.cdata_name
  ORDER BY tc.cat_id

The one issue with this is that GROUP BY isn't allowed in UPDATE statements. To address this, the joined table, grouping and aggregate functions need to instead take place in a sub-SELECT. The groups therein should be the columns in JOIN conditions and any in the grouping clause. The aggregate functions get used in the result columns. This gives the sub-SELECT:

SELECT MIN(cdata_cat_id) AS cdata_cat_id, cdata_lang_id, cdata_name
  FROM `tbl_categories_metadata`
  GROUP BY cdata_name, cdata_lang_id
        

Rewriting the table references using that gives:

...
`tbl_categories` AS tc
    JOIN `tbl_categories_metadata` AS tcm
      ON tc.cat_id = tcm.cdata_cat_id
    JOIN (
        SELECT MIN(cdata_cat_id) AS cdata_cat_id, cdata_lang_id, cdata_name
          FROM `tbl_categories_metadata`
          GROUP BY cdata_name, cdata_lang_id
    ) AS tcm_groups
      ON     tcm.cdata_name = tcm_groups.cdata_name
         AND tcm.cdata_lang_id = tcm_groups.cdata_lang_id
...

Aggregate functions are replaced with references to the sub-SELECT columns:

...
CASE tc.cat_id
  WHEN tcm_groups.cdata_cat_id THEN tcm_groups.cdata_name
  ELSE CONCAT(tcm_groups.cdata_name, '-', tc.cat_id)
END
...

These parts can be combined into an UPDATE:

UPDATE `tbl_categories` AS tc
    JOIN `tbl_categories_metadata` AS tcm
      ON tc.cat_id = tcm.cdata_cat_id
    JOIN (
        SELECT MIN(cdata_cat_id) AS cdata_cat_id, cdata_lang_id, cdata_name
          FROM `tbl_categories_metadata`
          GROUP BY cdata_name, cdata_lang_id
    ) AS tcm_groups
      ON tcm.cdata_name = tcm_groups.cdata_name AND tcm.cdata_lang_id = tcm_groups.cdata_lang_id
  SET default_name = CASE tc.cat_id
      WHEN tcm_groups.cdata_cat_id THEN tcm.cdata_name
      ELSE CONCAT(tcm.cdata_name, '-', tc.cat_id)
    END
  WHERE tcm.cdata_lang_id = 1

Performance

The sub-SELECT is less performant than the flat join, but can't be avoided in the UPDATE.

A WHERE tcm.cdata_lang_id = 1 could be added to the sub-SELECT. This won't affect correctness, but could result in a more efficient query if there's an index on cdata_lang_id. The most efficient query will result from an index on (cdata_lang_id, cdata_name).

outis
  • 72,188
  • 19
  • 145
  • 210