3

I have user, items and user_items tables, and as usual user has many items. Tables contains: User: ~50K Items: ~3.5M UserItems: ~5M

I am using MySQL and most of my queries are selecting items of a user. So I wonder If it helps to keep data on UserItems table as follows,

user_id item_ids 1 1,22,23,45,66...

Then after getting ids, simply parse to get item ids. Any suggestion/experience?

András Váczi
  • 31,278
  • 13
  • 101
  • 147
tackleberry
  • 133
  • 2

1 Answers1

9

This is a bad idea.

Putting more than one data element into a single field invariably leads to headaches.

You will be much better off making a new table to store this data, like:

Table UserItems

  • UserID int
  • ItemId int

And then one row per item per user. You can query this as needed and update it without any issues.

Think about adding one additional item to your proposed structure - it would be a challenge since you need to concatenate another space and another id to the end of the existing string.

It will also be a pain to check for a single item in that structure since you need to do a string search for every record.

JNK
  • 17,956
  • 5
  • 59
  • 97
  • 1
    I'd give a +2 on this one, if I could. – Aaron Jan 18 '12 at 15:01
  • Yes I know there will be some headaches with this approach, but I thought maybe I can make it a bit faster since I am interested only a small portion of much larger space (i.e 40 items in 5M items) – tackleberry Jan 18 '12 at 15:29
  • 1
    I can't think of a scenario where it would be faster... – JNK Jan 18 '12 at 15:30
  • maybe I'm not thinking through, but in my opinion having 1 user_item in 125K rows should be faster than 40 rows in 5M rows. Of course as you said, that will bring some additional programming and possibly headache. that's why I am trying to see pros and cons. Am I wrong about numbers? – tackleberry Jan 18 '12 at 15:37
  • 1
    Yes, you are wrong :) Int comparisons are orders of magnitude faster than string searches. SQL is optimized for JOIN and relational operations, not string functions. Your 1 record in 125k will be a lot more expensive to check than looking for 40 records in 5m, especially if those 5m rows are all int fields which will take up less space than the equivalent strings. – JNK Jan 18 '12 at 15:40
  • :) thanks, you are right about searching a specific item for a user. Ok you are talking about searching items for a user, but 99% of my queries just like select item_id from user_items where user_id=1 I mean I am interested in whole user_items for a user every time. I think this reduce the searching to integer level. – tackleberry Jan 18 '12 at 15:43
  • @tackleberry you can still do that. There is a CONCAT function in MySQL I think (I'm a SQL Server guy myself) that can give you a string of those values. You are confusing data structure and display. How you want to see the data shouldn't affect the structure of the database. – JNK Jan 18 '12 at 15:53
  • I think I couldn't express right. I wasn't implying displaying the data, by saying I need all user items. What I wanted to say is nearly all my queries, gets data by user_id, so I won't search any specific user_item in that combined text area. I am only querying table by user_id and since it is also integer.. – tackleberry Jan 18 '12 at 16:01
  • Then what do you do with the string that lists all the item ids? surely at some point you want a list of item names or other criteria? – JNK Jan 18 '12 at 17:01
  • Mostly, I fetch item attributes from items table. As I said, I need to parse that string to get item ids. After that the rest of the work same for both cases. – tackleberry Jan 18 '12 at 17:12
  • 1
    Then you definitely should normalize the data as I suggest. I can't conceive of a scenario where it's more efficient to parse a string to get a list of ids to get data from than to just do two JOIN operations. – JNK Jan 18 '12 at 18:11
  • thanks @JNK, after a quick little test, it seems there is not so much performance gain, but more prog. complexity. – tackleberry Jan 18 '12 at 18:25