5

I have to resort to raw SQL where the ORM is falling short (using Django 1.7). The problem is that most of the queries end up being 80-90% similar. I cannot figure out a robust & secure way to build queries without violating re-usability.

Is string concatenation the only way out, i.e. build parameter-less query strings using if-else conditions, then safely include the parameters using prepared statements (to avoid SQL injection). I want to follow a simple approach for templating SQL for my project instead of re-inventing a mini ORM.

For example, consider this query:

SELECT id, name, team, rank_score
FROM
  ( SELECT id, name, team
    ROW_NUMBER() OVER (PARTITION BY team
                       ORDER BY count_score DESC) AS rank_score
    FROM 
      (SELECT id, name, team
       COUNT(score) AS count_score
       FROM people
       INNER JOIN scores on (scores.people_id = people.id)
       GROUP BY id, name, team
      ) AS count_table
  ) AS rank_table
WHERE rank_score < 3

How can I:

a) add optional WHERE constraint on people or
b) change INNER JOIN to LEFT OUTER or
c) change COUNT to SUM or
d) completely skip the OVER / PARTITION clause?

Erwin Brandstetter
  • 539,169
  • 125
  • 977
  • 1,137
click
  • 2,033
  • 2
  • 13
  • 20
  • Not with Django - but I ended up creating a dictionary based approach, where the various dictionaries defined the data, and the relationships between tables. Is there a way that you can add data to the Django models - or even better create a mixin which create your queries. – Tony Suffolk 66 Aug 07 '14 at 19:36

1 Answers1

8

Better query

For starters you can fix the syntax, simplify and clarify quite a bit:

SELECT *
FROM  (
   SELECT p.person_id, p.name, p.team, sum(s.score)::int AS score
         ,rank() OVER (PARTITION BY p.team
                       ORDER BY sum(s.score) DESC)::int AS rnk
    FROM  person p
    JOIN  score  s USING (person_id)
    GROUP BY 1
   ) sub
WHERE  rnk < 3;
  • Building on my updated table layout. See fiddle below.

  • You do not need the additional subquery. Window functions are executed after aggregate functions, so you can nest it like demonstrated.

  • While talking about "rank", you probably want to use rank(), not row_number().

  • Assuming people.people_id is the PK, you can simplify GROUP BY.

  • Be sure to table-qualify all column names that might be ambiguous

PL/pgSQL function

Then I would write a plpgsql function that takes parameters for your variable parts. Implementing a - c of your points. d is unclear, leaving that for you to add.

CREATE OR REPLACE FUNCTION f_demo(_agg text       DEFAULT 'sum'
                               , _left_join bool  DEFAULT FALSE
                               , _where_name text DEFAULT NULL)
  RETURNS TABLE(person_id int, name text, team text, score int, rnk int) AS
$func$
DECLARE
   _agg_op  CONSTANT text[] := '{count, sum, avg}';  -- allowed functions
   _sql     text;
BEGIN

-- assert --
IF _agg ILIKE ANY (_agg_op) THEN
   -- all good
ELSE
   RAISE EXCEPTION '_agg must be one of %', _agg_op;
END IF;

-- query --
_sql := format('
SELECT *
FROM  (
   SELECT p.person_id, p.name, p.team, %1$s(s.score)::int AS score
         ,rank() OVER (PARTITION BY p.team
                       ORDER BY %1$s(s.score) DESC)::int AS rnk
    FROM  person p
    %2$s  score  s USING (person_id)
    %3$s
    GROUP BY 1
   ) sub
WHERE  rnk < 3
ORDER  BY team, rnk'
   , _agg
   , CASE WHEN _left_join THEN 'LEFT JOIN' ELSE 'JOIN' END
   , CASE WHEN _where_name <> '' THEN 'WHERE p.name LIKE $1' ELSE '' END
);

-- debug   -- quote when tested ok
-- RAISE NOTICE '%', _sql;

-- execute -- unquote when tested ok
RETURN QUERY EXECUTE _sql
USING  _where_name;   -- $1

END
$func$  LANGUAGE plpgsql;

Call:

SELECT * FROM f_demo();
SELECT * FROM f_demo('sum', TRUE, '%2');    
SELECT * FROM f_demo('avg', FALSE);
SELECT * FROM f_demo(_where_name := '%1_'); -- named param

SQL Fiddle

Community
  • 1
  • 1
Erwin Brandstetter
  • 539,169
  • 125
  • 977
  • 1,137
  • Your query cleanup looks great for maintainability. Thank you for the tip. I'll read more to understand the `PL/pgSQL function` function part. Admittedly, it'll be easier for me to have the query builder in Python, since I understand that better. – click Aug 08 '14 at 01:57
  • @click: I added some more explanation and links. – Erwin Brandstetter Aug 11 '14 at 00:25