-2

I have 3 tables:

  1. Customers.

    cust_id cust_name
    1000000001 Village Toys
    1000000002 Kids Place
    1000000003 Fun4All
    1000000004 Fun4All
    1000000005 The Toy Store
  2. Orders.

    cust_id order_num
    1000000001 20005
    1000000003 20006
    1000000005 20008
    1000000001 20009
  3. OrderItems.

    quantity item_price order_num
    100 5.49 20005
    100 10.99 20005
    20 5.99 20006
    10 8.99 20006
    10 11.99 20006
    5 4.99 20008
    5 11.99 20008
    10 3.49 20008
    10 3.49 20008
    10 3.49 20008
    250 2.49 20009
    250 2.49 20009
    250 2.49 20009

And the following code:

SELECT cust_name, Orders.order_num, SUM(item_price * quantity) AS OrderTotal
FROM Customers
INNER JOIN Orders ON Customers.cust_id = Orders.cust_id
INNER JOIN OrderItems ON Orders.order_num = OrderItems.order_num
GROUP BY cust_name, Orders.order_num
ORDER BY cust_name, order_num;

The result is:

cust_name order_num OrderTotal
Fun4All 20006 329.60
The Toy Store 20008 189.60
Village Toys 20005 1648.00
Village Toys 20009 1867.50

If I remove cust_id from GROUP BY statement, I will get an error

The column "Customers.cust_name" is not allowed in the select list because it is not contained in either the aggregate function or the GROUP BY clause.

As I said I can't understand why GROUP BY is used with 2 columns. I can comprehend using only GROUP BY Orders.order_num because it will group OrderTotal by order_num. Also I don't know in which order columns are processed by GROUP BY. Is cust_name processed first or Orders.order_num?

866812
  • 17
  • 7
  • 1
    Where in detail are you 1st stuck applying what published definition of what group by does? Otherwise you are just asking us to write yet another one with no details of what you misunderstand or don't or do understand. – philipxy May 28 '22 at 10:22
  • Did you try removing cust_name? Did it work, or did you get an error telling you the column needs to be either aggregated or in the group-by clause. – Stu May 28 '22 at 10:22
  • Please in code questions give a [mre]--cut & paste & runnable code & example input; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For SQL include DDL & tabular initialization code. For debug that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. [ask] [Help] When you get a result you don't expect, pause your overall goal, chop to the 1st subexpression with unexpected result & say what you expected & why, justified by documentation. (Debugging fundamental.) – philipxy May 28 '22 at 10:23
  • 2
    The error message that MSQL is giving is pretty clear: "Column 'Customers.cust_name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause." (See: [DBFIDDLE](https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=09dc8f0e3c25628ad718180741eea336)). So it is unclear what your question is. – Luuk May 28 '22 at 10:29
  • Please before considering posting read the manual/reference & google any error message & many clear, concise & precise phrasings of your question/problem/goal, with & without your particular names/strings/numbers, 'site:stackoverflow.com' & tags; read many answers. Reflect research in posts. SO/SE search is poor & literal & sometimes limited to titles, but read the help. Google re googling/searching, including in Q&A at [meta] & [meta.se]. [How much research effort is expected of Stack Overflow users?](https://meta.stackoverflow.com/q/261592/3404097) [ask] [Help] – philipxy May 28 '22 at 10:34
  • 1
    Liklewise it's not clear what about the error message you don't understand - I would recommend reading the documentation for aggregations and group by, there's nothing special about this particular query, SQL Server *requires* all non-aggregated columns to be defined in the group by clause. Period. – Stu May 28 '22 at 10:42
  • You still aren't justifying your expectations by reference to (authoritative) documentation. (Nor have you acted on everything in the comments.) PS You know you get an error, if you are trying to ask about why the language is designed the way it is, you haven't asked that. But it would be off-topic. But nevertheless it's a duplicate. PS Ask 1 question. – philipxy May 28 '22 at 10:48
  • Possible duplicate of [GROUP BY / aggregate function confusion in SQL](https://stackoverflow.com/q/4611897/3404097) – philipxy May 28 '22 at 10:48
  • 2
    As to "in which order columns are processed by GROUP BY": There is no order. `GROUP BY ____` means "I want a result row per ____". It makes no difference whether you want a result row per A and B or per B and A. Same, same. An example: You want the total salary per shop and year. This means you want the total salary per year and shop. Exactly the same result rows with three columns (shop, year, total) and the same values. (You can of course change the result visually by changint the `ORDER BY` clause and/or change the sequence of the columns in the `SELECT` clause.) – Thorsten Kettner May 28 '22 at 11:49

2 Answers2

1

When you are grouping, the server needs to know which function to apply to the remaining fields of the aggregate query. If you are grouping only by order_num, it needs to know which function to apply to summarize the cust_name field. One simple approach is to use the min() function for cust_name in select (like this SELECT min(cust_name) ...). Which simply means, take the smallest of cust_name (alphabetic sorting) for the records in each order_num group.

From the table structure (data given above), it is clear that you don't issue the same order_num for more than one customer (which should be normal practice). As such, you are right, you don't need to group by customer (cust_name) if you are grouping by order_num.

Group by keys are always processed from left to right and results group hierarchically in that order.

btw: if index fields (id fields) are available, use the index fields for grouping instead of string fields (cust_id vs cust_name) - working on indexed columns is always speedier, esp. when aggregating results.

nazim
  • 1,386
  • 16
  • 25
1

Your question is justified. If you group by order alone, and there is exactly one customer per order, why should you not be able to select the customer's name related to the order?

The answer is: you should be able to do so. (Provided unique constraints and foreign key references are properly set up in your database.) The customer name in this example is functionally dependent on the order number, so according to the SQL standard there should be no problem with this.

You are getting this error message only, because SQL Server doesn't comply with the SQL standard here. They require you to only select columns of the GROUP BY clause and aggregates (SUM(...), COUNT(...), etc.).

So, either keep the customer in the GROUP BY clause or select MIN(cust_name) or MAX(cust_name) instead of the mere cust_name to make the DBMS happy.

Thorsten Kettner
  • 80,418
  • 7
  • 43
  • 64