docs/content/stable/api/ysql/user-defined-subprograms-and-anon-blocks/language-sql-subprograms.md
{{< tip title="Regard the PostgreSQL documentation as the canonical definitional reference on this topic." >}} Make sure that you read the section Query Language (SQL) Functions in the PostgreSQL documentation. {{< /tip >}}
You can define both functions and procedures using language sql. Each of these lets you encapsulate two or more SQL statements. Notice that a function, or a procedure with an inout formal argument, returns the result of only the last SQL statement. See the section The result is delivered by the last "select" (or "values") statement below. The encapsulation lets you do no more than list independent SQL statements in the order that they are to be executed. This means that there's no way within the language sql encapsulation to inform the next statement that's executed about the outcome of the previous statements. However, using two or more SQL statements in concert typically implies the need for a control structure—and this is what language plpgsql subprograms provide.
Yugabyte recommends that you respect the convention that the introductory section User-defined subprograms outlines:
The upshot is that the body of a language sql function will normally be a single select (or values) statement and that the body of a language sql procedure will normally be a single insert, update_, or delete_ statement.
Because the body of a language sql subprogram is nothing more than a list of SQL statements, the analysis that's done at create time is different from what's done at create time for a language plpgsql subprogram. (See the section PL/pgSQL's execution model.)
The difference between the two models means that a language plpgsql subprogram can use objects that it creates just before use—while a language sql subprogram can use only objects that already exist at the moment that it's created. Try the following comparison. First, using language plpgsql:
create schema s;
create procedure s.p(v1 in int, v2 in int, r inout int[])
set search_path = pg_catalog, s, pg_temp
language plpgsql
as $body$
begin
create table if not exists pg_temp.t(v int not null) on commit delete rows;
with c as (
insert into pg_temp.t(v) values (v1), (v2) returning v)
select array_agg(v) into r from c;
end;
$body$;
call s.p(17, 19, null::int[]);
The create procedure succeeds and the call produces this result:
r
---------
{17,19}
Now attempt to rewrite it using language sql:
drop table if exists pg_temp.t;
drop procedure if exists s.p(int, int, int[]);
create procedure s.p(v1 in int, v2 in int, r inout int[])
set search_path = pg_catalog, s, pg_temp
language sql
as $body$
create table if not exists pg_temp.t(v int not null) on commit delete rows;
with c as (
insert into pg_temp.t(v) values (v1), (v2) returning v)
select array_agg(v) from c;
$body$;
The create procedure fails with the 42P01 error:
relation "pg_temp.t" does not exist
You can make the error go away by moving the create table statement to top-level and by executing it before you execute the create procedure:
drop table if exists pg_temp.t;
create table pg_temp.t(v int not null) on commit delete rows;
drop procedure if exists s.p(int, int, int[]);
create procedure s.p(v1 in int, v2 in int, r inout int[])
set search_path = pg_catalog, s, pg_temp
language sql
as $body$
with c as (
insert into pg_temp.t(v) values (v1), (v2) returning v)
select array_agg(v) from c;
$body$;
The create procedure now runs without error and the call produces the same result as it did for the language plpgsql procedure.
The difference between how language plpgsql subprograms and language sql subprograms are created and executed brings some functional disadvantage to the latter kind of subprogram with respect to the former kind. With the former kind, a common pattern for using a temporary table is to include the create table if not exists statement within the encapsulation of the procedure itself so that it's guaranteed that the table exists when it's first accessed. (A more carefully designed pattern is needed if the temporary table's purpose is to represent session-duration state that several different subprograms share. Here, it will be created using on commit preserve rows. And specific initialization code must ensure that the table is created at session start—or at least before it's first needed.) If a language sql subprogram is to be used, then it inevitably must rely on specific initialization code outside of the subprogram's encapsulation to create it before the subprogram uses it.
{{< tip title="A 'language sql' subprogram might need to typecast values where its 'language plpgsql' counterpart does not." >}} Try this:
create schema s;
create function s.f()
returns int
set search_path = pg_catalog, pg_temp
language sql
as $body$
select count(*) from pg_class;
$body$;
The create fails with the 42P13 error thus:
return type mismatch in function declared to return integer
and the "detail" says Actual return type is bigint. The \df count meta-command shows that its return data type is indeed bigint. So you must either create the function with returns bigint or use a typecast thus:
drop function if exists s.f() cascade;
create function s.f()
returns int
set search_path = pg_catalog, pg_temp
language sql
as $body$
select count(*)::int from pg_class;
$body$;
Now the create succeeds without error (and select s.f() produces the expected result).
The language plpgsql counterpart is more forgiving. Try this:
drop function if exists s.f() cascade;
create function s.f()
returns int
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
begin
return (select count(*) from pg_class);
end;
$body$;
Here, the create succeeds without error (and select s.f() again produces the expected result). {{< /tip >}}
You can use a language sql function to return:
Connect as an ordinary user that can create objects in a test database and do this:
create schema s;
create table s.t(k serial primary key, c1 int not null, c2 int not null);
insert into s.t(c1, c2) values (1, 17), (2, 42), (3, 57), (1, 19);
create function s.f(c1_in in int)
returns int
set search_path = pg_catalog, s, pg_temp
security definer
language sql
as $body$
select c2 from t where c1 = c1_in order by c2 desc;
$body$;
The example is contrived to make a point: the function is specified to return a single scalar value. Yet, by looking at the table's content, you can see that the select that the function encapsulates will produce two rows when c1_in = 2. However, the function does not cause an error when invoked with this actual argument—and the outcome is well-defined. The function's result is simply the first result that the select statement defined. (The order by clause brings a well-defined result in the usual way.) Now execute it:
select s.f(1);
This is the result"
f
----
19
Compare this with how the corresponding language plpgsql function behaves:
create function s.f_plpgsql(c1_in in int)
returns int
set search_path = pg_catalog, s, pg_temp
security definer
language plpgsql
as $body$
begin
return (select c2 from t where c1 = c1_in order by c2 desc);
end;
$body$;
Now execute it:
select s.f_plpgsql(1);
It causes the 21000 error:
more than one row returned by a subquery used as an expression
The difference in outcome for the language sql function compared to the language plpgsql function that each encapsulates the same select statement is explained thus:
This implies that you can provoke the same error in ordinary top-level SQL, thus:
select (select c2 from s.t where c1 = 1 order by c2 desc) as result;
The key question that informs the choice between a language sql and a language plpgsql subprogram when the returns argument is to be a single value is your confidence that the code is guaranteed to produce a single value. If the code does no more than evaluate an arithmetic expression, then choosing language sql can be completely safe. But if the result is produced by a SQL statement, then the safety must rely on the existence of a suitable unique index (which is not the case in the present example).
The syntax here depends on using the keyword record as the operand of the returns keyword in the function's header together with listing the names and data types of the record's fields as out formal arguments. Try this:
create schema s;
create table s.t(k int primary key, c1 int not null, c2 int not null);
insert into s.t(k, c1, c2) values (1, 3, 17), (2, 5, 42), (3, 7, 57), (4, 9, 19);
create function s.f(k_in in int, c1 out int, c2 out int)
returns record
set search_path = pg_catalog, s, pg_temp
security definer
language sql
as $body$
select c1, c2 from t where k = k_in;
$body$;
Test it thus:
select c1, c2 from s.f(3);
This is the result:
c1 | c2
----+----
7 | 57
Compare this approach with defining a function that returns a single composite type value:
create type s.ct as (c1 int, c2 int);
create function s.f2(k_in in int)
returns s.ct
set search_path = pg_catalog, s, pg_temp
security definer
language sql
as $body$
select (c1, c2)::s.ct from t where k = k_in;
$body$;
The syntax to test this is identical to what you used to test s.f():
select c1, c2 from s.f2(3);
And it produces the identical result. The choice between the two approaches will be dictated by how the result is to be consumed.
The syntax here is a trivial extension of the syntax for returning a single scalar value. Try this:
create schema s0;
create function s0.my_subprogram_names(kinds_in text[] = null)
returns setof name
set search_path = pg_catalog, pg_temp
language sql
as $body$
with s(name, kind) as (
select p.proname, p.prokind::text
from
pg_proc as p
inner join pg_roles as r
on p.proowner = r.oid
where r.rolname = current_role
and p.prokind = any (array['f'::"char", 'p'::"char"]))
select name
from s
where kind = any(
case kinds_in
when array['f'] then array['f']
when array['p'] then array['p']
else array['f', 'p']
end
)
order by name;
$body$;
Now create another function and a procedure simply to give the function s.my_subprograms() more records to return:
create schema s1;
create function s1.f()
returns int
set search_path = pg_catalog, pg_temp
language sql
as $body$
select 1;
$body$;
create schema s2;
create table s2.t(k serial primary key, v int not null);
create procedure s2.p(v_in in int)
set search_path = pg_catalog, pg_temp
security definer
language sql
as $body$
insert into s2.t(v) values (v_in);
$body$;
Now test s.my_subprogram_names(). First, list the current role's functions:
select t.v as func_name
from s0.my_subprogram_names(array['f']) as t(v);
This is the result:
func_name
---------------------
f
my_subprogram_names
Next, list the current role's procedures:
select t.v as proc_name
from s0.my_subprogram_names(array['p']) as t(v);
This is the result:
proc_name
-----------
p
Finally, list the current role's functions and procedures:
select t.v as subprogram_name
from s0.my_subprogram_names() as t(v);
This is the result:
subprogram_name
---------------------
f
my_subprogram_names
p
The syntax here is a trivial extension of the syntax for returning a single record value. Try this:
create function s0.my_subprograms(
kinds_in text[] = null,
schema out name, name out name, kind out text)
returns setof record
set search_path = pg_catalog, pg_temp
language sql
as $body$
with s(schema, name, kind) as (
select n.nspname, p.proname, p.prokind::text
from
pg_proc as p
inner join
pg_namespace as n
on p.pronamespace = n.oid
inner join pg_roles as r
on p.proowner = r.oid
where r.rolname = current_role
and p.prokind = any (array['f'::"char", 'p'::"char"]))
select schema, name, case kind
when 'f' then 'function'
when 'p' then 'procedure'
end
from s
where kind = any(
case kinds_in
when array['f'] then array['f']
when array['p'] then array['p']
else array['f', 'p']
end
)
order by schema, name;
$body$;
Now test s.my_subprograms(). First, list the current role's functions:
select schema, name as func_name
from s0.my_subprograms(array['f']);
This is the result:
schema | func_name
--------+---------------------
s0 | my_subprogram_names
s0 | my_subprograms
s1 | f
Next, list the current role's procedures:
select schema, name as proc_name
from s0.my_subprograms(array['p']);
This is the result:
schema | proc_name
--------+-----------
s2 | p
Finally, list the current role's functions and procedures:
select schema, name as subprogram_name, kind
from s0.my_subprograms();
This is the result:
schema | subprogram_name | kind
--------+---------------------+-----------
s0 | my_subprogram_names | function
s0 | my_subprograms | function
s1 | f | function
s2 | p | procedure
Try this:
create function s0.my_subprograms_2(kinds_in text[] = null)
returns table(schema name, name name, kind text)
set search_path = pg_catalog, pg_temp
language sql
as $body$
with s(schema, name, kind) as (
select n.nspname, p.proname, p.prokind::text
from
pg_proc as p
inner join
pg_namespace as n
on p.pronamespace = n.oid
inner join pg_roles as r
on p.proowner = r.oid
where r.rolname = current_role
and p.prokind = any (array['f'::"char", 'p'::"char"]))
select schema, name, case kind
when 'f' then 'function'
when 'p' then 'procedure'
end
from s
where kind = any(
case kinds_in
when array['f'] then array['f']
when array['p'] then array['p']
else array['f', 'p']
end
)
order by schema, name;
$body$;
Notice that the defining select statement enquoted by $body$ ... $body$ is identical to the one that implemented the function s0.my_subprograms(). The critical differences are in the function's header:
This syntax feels more intuitive than the syntax that returns setof record uses. (It is newer, in the history of PostgreSQL, than the returns setof record syntax.) The function s0.my_subprograms_2() is invoked using the identical syntax that is used to invoke s0.my_subprograms():
select schema, name as func_name
from s0.my_subprograms_2(array['f']);
select schema, name as proc_name
from s0.my_subprograms_2(array['p']);
select schema, name as subprogram_name, kind
from s0.my_subprograms_2();
And the results are identical too, with the obvious trivial difference that my_subprograms_2() itself now shows up in the output. A further benefit of the returns table() syntax is that you use it to return either a set of scalar values or a set of record values.
A language sql function is allowed to encapsulate several SQL statements—but there are few cases where this might be useful. Try this:
drop function if exists s0.f();
create function s0.f()
returns table(k int, v text)
set search_path = pg_catalog, pg_temp
language sql
as $body$
set local search_path = pg_catalog, s0, pg_temp;
values (1, 'dog'), (2, 'cat'), (3, 'frog');
$body$;
select k, v from s0.f();
It runs without error and produces this result:
k | v
---+------
1 | dog
2 | cat
3 | frog
However, the effect of set local search_path in the function's statement list is the same as making the same setting in the function's header. It's therefore poor practice to write the function as presented here.
Now try this counter-example:
drop function if exists s0.f();
create function s0.f()
returns table(k int, v text)
set search_path = pg_catalog, pg_temp
language sql
as $body$
values ('apple'), ('orange'), ('pear');
values (1, 'dog'), (2, 'cat'), (3, 'frog');
$body$;
select k, v from s0.f();
It, too, runs without error and produces the same result as did the first version of s0.f(). The first values statement, though not illegal, has no effect. Notice that its result set has the wrong shape. But this doesn't cause an error because a language sql function's result is delivered by its final statement. This final statement must, therefore, produce results with the shape that the function's header specifies. Deliberately break this rule to see what error this causes:
drop function if exists s0.f();
create function s0.f()
returns table(k int, v text)
set search_path = pg_catalog, pg_temp
language sql
as $body$
values (1, 'dog'), (2, 'cat'), (3, 'frog');
values ('apple'), ('orange'), ('pear');
$body$;
It fails at create time thus:
ERROR: 42P13: return type mismatch in function declared to return record
DETAIL: Final statement returns text instead of integer at column 1.
You'd see the same error if you made set local search_path = pg_catalog, s0, pg_temp the final statement.
The with ordinality clause can be used by any function that returns a set: a built-in function; a language sql user-defined function; or a language plpgsql user-defined function. Try this first:
select s.k, s.v
from generate_series(13, 40, 7) with ordinality as s(v, k);
This is the result:
k | v
---+----
1 | 13
2 | 20
3 | 27
4 | 34
Now try this:
drop function if exists s0.f();
create function s0.f()
returns table(k int, v text)
set search_path = pg_catalog, pg_temp
language sql
as $body$
with c(k, v) as (
values (13, 'dog'), (20, 'cat'), (27, 'frog'), (34, 'mouse'))
select k, v from c order by k;
$body$;
select a.r, a.k, a.v
from s0.f() with ordinality as a(v, k, r);
This is the result:
r | k | v
---+-------+----
1 | dog | 13
2 | cat | 20
3 | frog | 27
4 | mouse | 34
The with ordinality clause can have a well-defined meaning only because a set-returning function (a.k.a. table function) is guaranteed to deliver its results in the exact order in which it computes them. This fact is most useful when a language plpgsql table function is used to format a report. Here's a stylized example:
drop function if exists s0.f(numeric, numeric, numeric);
create function s0.f(len in numeric, wid in numeric, hght in numeric)
returns table(line text)
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
begin
line := 'Computed Dimensions'; return next;
line := '———————————————————'; return next;
line := ''; return next;
line := ' length: '||to_char(len, '99D9'); return next;
line := ' width: '||to_char(wid, '99D9'); return next;
line := ' height: '||to_char(hght, '99D9'); return next;
end;
$body$;
\t on
select s0.f(5.7, 3.3, 1.9);
\t off
The select statement has no order by clause—and nor is this needed. (Notice that it's legal to elide the select list here.) This is the result:
Computed Dimensions
———————————————————
length: 5.7
width: 3.3
height: 1.9
This example gives you a hint about how you might choose between language sql and language plpgsql. It's simple to rewrite it as language sql by using this implementation:
$body$
values
('COMPUTED DIMENSIONS'),
('———————————————————'),
(''),
(' length: '||to_char(len, '99D9')),
(' width: '||to_char(wid, '99D9')),
(' height: '||to_char(hght, '99D9'));
$body$;
In this case, then, its terseness makes language sql the winner. But, in general, a report interleaves headings and other annotations with the results from several set-returning select statements. Depending on the ambition level, the implementation of such a report becomes at best unwieldy or, in the limit, impossible with language sql but straightforward with language plpgsql.
The following examples demonstrate that there are very few use cases that can be best implemented using a language sql procedure rather than a language plpgsql procedure.
{{< note title="Procedures cannot have 'out' formal arguments." >}} Try this:
create schema s;
create table s.t(k serial primary key, v int not null);
create procedure s.p(n inout bigint)
set search_path = pg_catalog, s, pg_class
language sql
as $body$
insert into t(v) select g.v from generate_series(-50, 50, 17) as g(v);
select count(*) from t;
$body$;
call s.p(null::bigint);
This runs without error and produces the expected result:
n
---
6
Now rewrite it thus:
drop procedure s.p(int) cascade;
create procedure s.p(n out int)
set search_path = pg_catalog, s, pg_class
language sql
as $body$
insert into t(v) select g.v from generate_series(-50, 50, 17) as g(v);
select count(*)::int from t;
$body$;
The create fails with the 0A000 error:
procedures cannot have OUT arguments
and the hint says INOUT arguments are permitted.
This restriction is inherited from PostgreSQL Version 11. Current PostgreSQL does not suffer from this limitation. Therefore, it will be lifted in a later version of YugabyteDB that uses a PostgreSQL version where the limitation has been lifted. GitHub Issue #12348 tracks this. {{< /note >}}
You might find this example convincing. It certainly runs without error and produces the expected outcome:
create schema s;
create table s.t(k serial primary key, v int not null);
insert into s.t(v) select g.v from generate_series(17, 50, 11) as g(v);
create procedure s.p(lb in int, ub in int, outcome inout int[])
set search_path = pg_catalog, s, pg_temp
security definer
language sql
as $body$
with c(v) as (delete from s.t where v between lb and ub returning v)
select array_agg(v) from c;
$body$;
call s.p(20, 45, null);
This is the result:
outcome
---------
{28,39}
This example, too, runs without error and produces the expected outcome:
create table s.t1(k serial primary key, v int not null);
create table s.t2(k serial primary key, v int not null);
create procedure s.p1(t1_v in int, t2_v in int)
set search_path = pg_catalog, s, pg_temp
security definer
language sql
as $body$
insert into t1(v) values(t1_v);
insert into t2(v) values(t2_v);
$body$;
call s.p1(17, 42);
select 't1' as "table", k, v from s.t1
union all
select 't2' as "table", k, v from s.t2;
Here is the result of the final status query:
table | k | v
-------+---+----
t1 | 1 | 17
t2 | 1 | 42
However, it's very rare that making two or several mutually independent inserts into different tables in a single transaction meets a useful business purpose. Realistic examples of inserting into two tables, or reading from one table and inserting into another table, occur, for example, when one table has a foreign key constraint to the other, when each table has an autogenerated surrogate primary key, and when the masters table has a unique business key.
This is straightforward if you use language plpgsql procedures. Try this:
<a name="insert-master-and-details"></a>
create table s.masters(
mk serial primary key,
mv text not null unique);
create table s.details(
dk serial primary key,
mk int not null references s.masters(mk),
dv text not null);
create procedure s.insert_new_master_and_details(new_mv in text, dvs in text[])
set search_path = pg_catalog, s, pg_temp
language plpgsql
as $body$
declare
new_mk int not null := 0;
begin
insert into s.masters(mv) values(new_mv) returning mk into new_mk;
insert into s.details(mk, dv)
select new_mk, u.v
from unnest(dvs) as u(v);
end;
$body$;
call s.insert_new_master_and_details('mary', array['skiing', 'running', 'cycling']);
create procedure s.add_details_to_existing_master(old_mv in text, dvs in text[])
set search_path = pg_catalog, s, pg_temp
language plpgsql
as $body$
declare
old_mk int not null := 0;
begin
select mk into old_mk
from masters
where mv = old_mv;
insert into details(mk, dv)
select old_mk, u.v
from unnest(dvs) as u(v);
end;
$body$;
call s.add_details_to_existing_master('mary', array['tennis', 'swimming']);
select mv, dv
from s.masters inner join s.details using(mk)
order by mk, dk;
This is the result of the final query, just as is expected:
mv | dv
------+----------
mary | skiing
mary | running
mary | cycling
mary | tennis
mary | swimming
Each procedure depends, critically, on holding a value that the first SQL statement returns in a local variable so that the second SQL statement can use it. But a language sql subprogram as no notion that corresponds to a local variable to hold state for the duration of the subprogram call. A temporary table (created using on commit delete rows seems at first to be promising—except, of course, that it would distribute the functionality across too many moving parts. (See, too, the section The execution model for 'language sql' subprograms at the start of this page.) However, it turns out anyway to be impractical for another reason. This construct produces the required value:
with c(v) as (
insert into s.masters(mv) values('dick') returning mk)
select v from c;
And this analogous construct to consume it works fine:
create table pg_temp.t(v int);
insert into pg_temp.t(v)
with c(v) as (
values(17))
select v from c;
But combining the two ideas fails:
insert into pg_temp.t(v)
with c(v) as (
insert into s.masters(mv) values('dick') returning mk)
select v from c;
It causes the 0A000 error:
WITH clause containing a data-modifying statement must be at the top level
This example demonstrates vividly that a language sql procedure has severely limited capability. You might like, therefore, to adopt a simple practice rule: always implement user-defined procedures using language plpgsql.
Each of these approaches lets you set up a parameterized SQL statement for reuse so that each successive invocation can use different actual arguments. Connect as an ordinary user that has all privileges on a sandbox database and do this set-up:
create schema s;
create table s.t(k serial primary key, v int not null);
prepare stmt_1(int, int) as
insert into s.t(v) select g.v*2 from generate_series($1, $2) as g(v);
create procedure s.p(lb in int, ub in int)
set search_path = pg_catalog, s, pg_temp
security definer
language sql
as $body$
insert into t(v) select g.v*3 from generate_series(lb, ub) as g(v);
$body$;
prepare stmt_2(int[]) as
select k, v from s.t where k = any($1) order by k;
create function s.f(in_list in int[])
returns table(k int, v int)
set search_path = pg_catalog, s, pg_temp
security definer
language sql
as $body$
select k, v from t where k = any(in_list) order by k;
$body$;
Now populate the table:
execute stmt_1(1, 5);
call s.p(lb=>6, ub=>10);
Query it first using the prepared statement:
execute stmt_2(array[2, 4, 6]);
This is the result:
k | v
---+----
2 | 4
4 | 8
6 | 18
And now query it using the function:
select k, v from s.f(in_list=>array[3, 5, 7]);
This is the result:
k | v
---+----
3 | 6
5 | 10
7 | 21
The two approaches are very similar in what they achieve:
The prepare-execute approach uniquely supports a useful explain thus:
explain execute stmt_2(array[2, 4, 6]);
It produces this result:
Sort (cost=4.12..4.13 rows=1 width=8)
Sort Key: k
-> Index Scan using t_pkey on t (cost=0.00..4.11 rows=1 width=8)
Index Cond: (k = ANY ('{2,4,6}'::integer[]))
But the explain for the create-function-select-from-function approach provides no useful information:
explain select k, v from s.f(in_list=>array[3, 5, 7]);
It produces this result:
Function Scan on f (cost=0.25..10.25 rows=1000 width=8)
In all other ways, the create-subprogram-select-from-function approach is better than the prepare-execute approach.
This fact is of no consequence when the table function is used "bare" like in the example that writes the "Computed Dimensions" report in the section Annotating the rows produced by a table function using "with ordinality". But it can have a noticeable negative performance effect if you use a table function in a surrounding SQL statement that imposes a restriction (or even a projection) or especially that uses limit N. In other words, there is no scheme analogous to the "pipelined" feature for table functions in Oracle Database; nor is there a mechanism for pushing down predicates into the function.
Consider a security definer language sql subprogram with no formal parameters whose returns argument is setof record. It's tempting to see it as the functional equivalent of a view. (A view non-negotiably behaves as if it were defined with security definer.) Each is a schema-object and, provided that appropriate privileges are granted, can be used by any role.
There are, however, critical mutually exclusive differences. A view wins in these ways:
And a function wins in this way: