docs/RFCS/20200501_user_defined_schemas.md
Schemas form a level of namespacing between tables and databases and currently have limited support within CockroachDB. This RFC proposes support for user-defined schemas: users will be able to create, alter, and drop arbitrary schemas within a database.
This makes it possible to have two tables with the same name, say orders, in
the same database, by making them live in separate schemas: for example,
mydb.testschema.orders and mydb.prodschema.orders. This is useful for
compatibility and to ease hosting of multiple users/apps on a single cluster.
There are two motivations: compatibility with Postgres and its ecosystem of tools and ORMs, and easing the hosting of multiple users/apps on a single cluster. The scenario for the second point goes as follows: without user-defined schemas, each user or app must be given its own database to keep their tables apart, and anything shared must then be referenced across databases. The proposed solution, which incidentally is both industry-standard and already expected by users, is to give each user or app its own schema within a shared database.
CockroachDB already provides partial support for schemas. Schemas have entries
in the namespace table, all tables have a schema parent ID, and the name
resolution rules take schemas into account. All this is sufficient to support
the virtual schemas such as pg_catalog, the per-session pg_temp schemas for
temporary tables, and the only other schema, public.
What's new about user-defined schemas is supporting additional schemas that can
be created and modified. Syntax-wise, we'll start supporting CREATE SCHEMA,
ALTER SCHEMA, and DROP SCHEMA. The main problem involved in user-defined
schemas is how to perform efficient name resolution for arbitrary, changing
schema namespace entries.
As a starting point, this requires the ability to lease databases and schemas
the way tables are currently leased, which is also necessary for the ongoing
work on supporting enums and other user-defined types (see
#47040). Leasing
enables transactionally renaming and dropping objects such that names are
correctly resolved for the duration of the change. With the introduction of
generalized leasing (see
#48250), we'll
introduce a SchemaDescriptor to the sqlbase.Descriptor union to store
schema metadata. These schema descriptors undergo schema changes similarly to
online schema changes for tables.
A problem of name resolution specific to CockroachDB's behavior is that 2-part
names of the form x.y are ambiguous, because x can refer to either a
database (which is non-standard) or a schema. We must first attempt to resolve
x as a schema name, and then as a database name. It's necessary to cache
enough information to minimize lookups both in the schema case and in the
database case, and caching a subset of the schemas in the database is
insufficient in general: if x were actually a database, resulting in a cache
miss, an attempted lookup for a schema named x would be required before
proceeding to trying to look up a database. Our solution is to store a snapshot
of the name-to-ID mapping for all schemas in the database on every database
descriptor, and use the cached version of this for schema name resolution. This
ensures that x is a schema if and only if it's present on the database
descriptor.
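This resolution order can be sketched as follows. The types and function below (`DatabaseDesc`, `SchemasByName`, `resolvePrefix`) are illustrative names, not CockroachDB's actual API:

```go
package main

import "fmt"

// DatabaseDesc is a hypothetical leased database descriptor carrying a
// snapshot of the name-to-ID mapping for all schemas in the database.
type DatabaseDesc struct {
	Name          string
	ID            uint32
	SchemasByName map[string]uint32 // complete snapshot kept on the descriptor
}

// resolvePrefix resolves the "x" in a two-part name x.y: first as a schema
// in the current database, then as a database. Because the descriptor holds
// a complete schema snapshot, a miss in the map proves x is not a schema,
// so no extra namespace lookup is needed before trying databases.
func resolvePrefix(x string, curDB *DatabaseDesc, dbsByName map[string]*DatabaseDesc) (kind string, id uint32, ok bool) {
	if schemaID, found := curDB.SchemasByName[x]; found {
		return "schema", schemaID, true
	}
	if db, found := dbsByName[x]; found {
		return "database", db.ID, true
	}
	return "", 0, false
}

func main() {
	mydb := &DatabaseDesc{
		Name: "mydb", ID: 52,
		SchemasByName: map[string]uint32{"public": 29, "testschema": 54},
	}
	dbs := map[string]*DatabaseDesc{
		"mydb":    mydb,
		"otherdb": {Name: "otherdb", ID: 55},
	}

	kind, id, _ := resolvePrefix("testschema", mydb, dbs)
	fmt.Println(kind, id) // schema 54: schemas win the ambiguity
	kind, id, _ = resolvePrefix("otherdb", mydb, dbs)
	fmt.Println(kind, id) // database 55: cache miss falls through to databases
}
```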
Many of the remaining open questions are around backward compatibility, and about cross-database references in particular, which are allowed in Cockroach but not in Postgres. The problem is basically how to transition users from a world in which databases were like schemas to a world in which schemas are fully supported.
Technically the changes impact:
- system.descriptor: we need to start writing schema metadata to the descriptor table.
- Name resolution (sqlbase/resolver.go and its dependencies): this needs to resolve schemas beyond public, the virtual schemas, and the pg_temp temporary schemas.
- cockroach dump: this must be extended to support dumping a single schema inside a database. It must also be extended to properly qualify schema names inside the dump data.

Additionally, we need to verify that our existing schema-related functionality, including search_path and the metadata virtual tables pg_catalog.pg_namespace and information_schema.schemata, continues to work.
Physical schemas (i.e., public and the pg_temp schemas) have an ID in the
namespace table and can be resolved by name, like tables and databases. Unlike
tables and databases, schemas currently have no associated metadata stored in
the descriptor table. To enable lease-based caching and transactional updates
(create, rename, drop) with correct name resolution, we'll need to start
writing descriptors for schemas.
Currently, table descriptors have a table-specific leasing mechanism. Database
name-to-ID mappings have a per-node incoherent cache, and schema IDs have a
cache that is never cleared because the name-to-ID mapping is immutable for the
public and pg_temp schemas, which clearly won't be true in general. The
rest of the RFC assumes that we'll implement
#48250 to extract
leasing-related metadata from the table descriptor to somewhere common to all
sqlbase.Descriptors and generalize the existing table leasing system,
enabling every type of Descriptor to be leased.
First, we'll need to add a SchemaDescriptor to the Descriptor union:
```proto
message SchemaDescriptor {
  optional string name;
  optional uint32 id;
  optional uint32 parent_id;
  optional PrivilegeDescriptor privileges;
  optional []??? draining_names; // For renaming/dropping schemas
  optional ??? state; // Enum with PUBLIC and DROP values
}
```
This is similar to a DatabaseDescriptor, with some fields for schema changes
that will eventually be common across all objects that need to be
transactionally renamed and dropped. It's possible that we'll extract those
fields to the Descriptor itself or somewhere else.
Next, we need to generalize our leasing and caching capabilities (this list is mostly not schema-specific, and overlaps heavily with #48250):

- Generalize the `LeaseManager` to handle leasing all descriptor types.
- Generalize the `TableCollection` and establish a consistent interface for getting leased and uncommitted objects.
- Update the `CachedSchemaAccessor` to start using the updated `LeaseManager` and `TableCollection`, and clean up and expand the `SchemaAccessor` interface to support returning schema descriptors and not just IDs.

Note that the schema-related groundwork that's been completed for temporary tables, together with the extension proposed in this RFC, corresponds to Alternative B in the original user-defined schemas RFC (#30916). Alternative C, which involves storing all schema metadata on database descriptors, would also be possible in our current state, and is discussed later in the context of caching and name resolution.
Additionally, even though schema IDs existed prior to this RFC, we'll want to
update pg_catalog.pg_namespace to use the schema descriptor's ID as the OID,
instead of using a hash of the database ID and the schema name. This is
consistent with the use of table IDs as OIDs in pg_class, and makes the OID
stable across renames, which is required for Postgres compatibility.
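A minimal sketch of why this matters follows. FNV is used purely for illustration here; it is not the hash CockroachDB actually uses for the legacy OIDs:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// legacyNamespaceOID mimics the old scheme: an OID derived by hashing the
// database ID together with the schema name. Any rename changes the OID.
func legacyNamespaceOID(dbID uint32, schemaName string) uint32 {
	h := fnv.New32a()
	fmt.Fprintf(h, "%d.%s", dbID, schemaName)
	return h.Sum32()
}

func main() {
	// Under the hash-based scheme, renaming a schema changes its OID...
	before := legacyNamespaceOID(52, "prodschema")
	after := legacyNamespaceOID(52, "prodschema2")
	fmt.Println(before != after) // true: not stable across renames

	// ...whereas using the schema descriptor's ID as the OID (as pg_class
	// does with table IDs) keeps it fixed no matter how the schema is renamed.
	const schemaDescID uint32 = 54
	fmt.Println(schemaDescID)
}
```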
These are the categories of schema changes on schemas to consider:

1. Changes that affect the schema's namespace entry: creating schemas (CREATE SCHEMA), renaming schemas (ALTER SCHEMA ... RENAME TO ...), and dropping schemas (DROP SCHEMA ... [CASCADE], or as a result of DROP DATABASE CASCADE)
2. Changes to privileges (ALTER SCHEMA ... OWNER TO ..., GRANT/REVOKE)
3. Changes to zone configurations (ALTER SCHEMA ... CONFIGURE ZONE USING ...) and comments (COMMENT ON SCHEMA ...)

Schema changes in categories (2) and (3) are relatively straightforward and will be implemented like their equivalents for tables. Schema changes in category (1) are more complex, and will be revisited in the later section about caching and name resolution, but their implementations will also be essentially the same as for tables. In particular, when renaming or dropping a schema, we'll need to use the same two-transaction protocol where we wait for old leases to drain before proceeding to the second transaction to clear the old namespace entry. (User-defined types and databases will need the same treatment.) To review:
To rename a schema a with ID 51 to b:

- Initially, the namespace entry is (name, id) = (a, 51).
- In the first transaction, to rename a to b: update the descriptor's name to b, insert a namespace entry for (b, 51), and add a to the draining names list.
- After all leases on the old version have drained, in the second transaction: delete the namespace entry (a, 51), and delete a from the draining names list.

To drop a schema a with ID 51:

- In the first transaction: set the descriptor's state to DROP.
- After all leases have drained, in the second transaction: delete the namespace entry (a, 51), and delete the descriptor.

DROP SCHEMA ... CASCADE may drop tables or user-defined types, and DROP DATABASE ... CASCADE may drop schemas. The implementation of these statements
will be similar to how DROP DATABASE ... CASCADE currently acts on tables.
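The two-transaction rename protocol can be sketched as follows; `nsKey`, `schemaDesc`, and the transaction helpers are hypothetical stand-ins for the real namespace and descriptor machinery:

```go
package main

import "fmt"

// nsKey identifies a namespace entry; the map models system.namespace,
// mapping (parentID, name) to a descriptor ID.
type nsKey struct {
	parentID uint32
	name     string
}

// schemaDesc is a minimal stand-in for a schema descriptor with only the
// fields the rename protocol touches.
type schemaDesc struct {
	name          string
	id            uint32
	drainingNames []string
}

// renameTxn1 is the first transaction: point the descriptor at the new name,
// insert a namespace entry for it, and record the old name as draining, so
// that both names resolve while leases on the old version are still live.
func renameTxn1(ns map[nsKey]uint32, d *schemaDesc, parentID uint32, newName string) {
	old := d.name
	d.name = newName
	ns[nsKey{parentID, newName}] = d.id
	d.drainingNames = append(d.drainingNames, old)
}

// renameTxn2 is the second transaction, run only after all old leases have
// drained: remove the old namespace entry and clear the draining name.
func renameTxn2(ns map[nsKey]uint32, d *schemaDesc, parentID uint32) {
	for _, old := range d.drainingNames {
		delete(ns, nsKey{parentID, old})
	}
	d.drainingNames = nil
}

func main() {
	const dbID = 52
	ns := map[nsKey]uint32{{dbID, "a"}: 51}
	d := &schemaDesc{name: "a", id: 51}

	renameTxn1(ns, d, dbID, "b")
	fmt.Println(len(ns)) // 2: both (a, 51) and (b, 51) resolve mid-rename

	// ...wait for leases on the old descriptor version to drain...

	renameTxn2(ns, d, dbID)
	fmt.Println(len(ns)) // 1: only (b, 51) remains
}
```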
The actual schema changes will be implemented as jobs, as table descriptor schema changes are. There's a question of how much of the existing schema changer implementation we should extract and generalize, which we'll also want to consider in the context of user-defined types.
As described above, when resolving names of the form x.y, we face the problem
of caching enough information to minimize lookups both when x is a schema and
when it is a database. At the time of writing, to solve this problem for a
restricted set of possible schemas, we have a cache for schema name-to-ID
lookups on the TableCollection, and a workaround for the negative case where
we avoid attempting a schema lookup if the potential schema name is not
public and does not begin with pg_temp.
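The workaround amounts to a simple predicate; the function name below is illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// maybeSchema reflects the current workaround: before user-defined schemas,
// a name can only be a schema if it is "public" or a pg_temp session schema,
// so any other prefix can skip the schema lookup entirely.
func maybeSchema(name string) bool {
	return name == "public" || strings.HasPrefix(name, "pg_temp")
}

func main() {
	fmt.Println(maybeSchema("pg_temp_0_1")) // true: could be a temp schema
	fmt.Println(maybeSchema("mydb"))        // false: skip straight to databases
}
```

With arbitrary user-defined schemas, no such static predicate exists, which is why the complete snapshot on the database descriptor is needed.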
Our solution is to store a complete schema name-to-ID map on every database
descriptor along with each schema's PUBLIC/DROP state, updated whenever a
schema in the database is added, renamed, or dropped. This snapshot must be
updated in the same transaction whenever a namespace entry is updated or a
schema transitions to the DROP state, as detailed above. (Note that each
update corresponds to a new database descriptor version.)
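A sketch of how the snapshot might be maintained follows. The types are hypothetical, and draining names and the actual transactional machinery are omitted for brevity:

```go
package main

import "fmt"

// schemaEntry is one entry in the database descriptor's schema snapshot.
type schemaEntry struct {
	ID      uint32
	Dropped bool // corresponds to the DROP state
}

// dbDesc carries the complete schema name-to-ID snapshot; bumping Version
// models publishing a new database descriptor version, which happens in the
// same transaction as the corresponding namespace update.
type dbDesc struct {
	Version uint32
	Schemas map[string]schemaEntry
}

func (d *dbDesc) addSchema(name string, id uint32) {
	d.Schemas[name] = schemaEntry{ID: id}
	d.Version++
}

func (d *dbDesc) renameSchema(oldName, newName string) {
	e := d.Schemas[oldName]
	delete(d.Schemas, oldName)
	d.Schemas[newName] = e
	d.Version++
}

func (d *dbDesc) dropSchema(name string) {
	e := d.Schemas[name]
	e.Dropped = true // schema enters the DROP state but keeps its entry
	d.Schemas[name] = e
	d.Version++
}

func main() {
	db := &dbDesc{Version: 1, Schemas: map[string]schemaEntry{"public": {ID: 29}}}
	db.addSchema("prodschema", 54)
	db.renameSchema("prodschema", "archive")
	db.dropSchema("archive")
	// Three schema changes produced three new database descriptor versions.
	fmt.Println(db.Version, db.Schemas["archive"].Dropped) // 4 true
}
```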
Then when a schema is leased, we will always acquire a lease on the database if we don't have one already, and we'll always use the namespace mapping on the leased database descriptor for cached name resolution. (TODO: Would we do this even when resolving a 3-part name, or would we just look at the cached schemas directly? The only reason why this might matter is that it's possible for a leased database and one of its schemas to be offset by one "version" in the cache when both the database and schema versions were bumped in the same transaction. I think this is fine for correctness, but maybe for the sake of a consistent experience it's always better to use the database namespace mapping.)
This means that, at this point, the only data stored on a schema descriptor that isn't duplicated on the database descriptor would be the privileges. An alternative approach considered was to store all schema metadata on database descriptors, without giving them individually leasable descriptors. But we expect memory savings from keeping the privileges off the database descriptor: Assuming we impose a length limit on identifier names of 63 characters (#48443), each additional ID-name pair will require at most 68 bytes (though usually less in practice), whereas privileges (consisting of a list of name-to-permissions mappings) are unbounded in size. Storing privileges separately also reduces potential contention on the database descriptor.
To bound the memory footprint of caching schema names, we can additionally impose a limit on the number of schemas per database.
One other consequence of maintaining all schema names on the database descriptor is that the rate at which schema changes can be performed on schemas within the same database is limited by how quickly schema changes can happen in succession for a single database descriptor. This would probably mainly affect users' migrations that run schema changes in quick succession when starting up, as well as ORM tests. (Frameworks and tests generally don't do multiple schema updates concurrently, so we don't necessarily expect contention in the general case, just waiting for leases to drain.)
In general, maintaining a coherent cache of all schema names inherently requires coordination across all nodes, and contention when updating multiple schemas is unavoidable. When publishing new database descriptor versions in the presence of contention, if it turns out that we spend too much time waiting in the retry loop and lowering the backoff isn't viable, we could build a more exact waiting mechanism for retries.
## The public schema, cross-database references, and backward compatibility

Our current implementation of schemas has areas of incompatibility with Postgres, and requires special handling for backward compatibility in this proposal:
- The public schema has an "ID" of 29, which is the parentSchemaID for all new tables, but this is essentially a placeholder. If all schemas are to have their own descriptors, it's impossible for each database's public schema to have the same ID. In Postgres, the public schema is a schema like any other.
- Currently, all (permanent) tables involved in cross-database references are necessarily in the public schema of their respective databases. We propose disallowing cross-database references in the general case, with a temporary exception for tables in public for the sake of backward compatibility. Allowing cross-database references introduces complexity while providing no real advantages if schemas are fully supported, and users may misuse the "feature" since it violates Postgres-based expectations of databases being self-contained.
The proposed deprecation cycle:

- Initially, existing databases keep their public pseudo-schema with a parentSchemaID of 29, and all existing cross-database references are preserved.
  - New tables created in public should probably go into the unupgraded public schema, to avoid complications from creating a new public schema and having two public schemas existing simultaneously.
  - New databases will get a proper public schema from the beginning.
  - Renaming an unupgraded public schema should either induce a migration to a new schema descriptor or be disallowed. Dropping an unupgraded public schema should involve dropping the relevant tables.
  - We could add a cluster setting that upgrades all public schemas, turned off by default, to make this transition easier.
- In some version v when long-running migrations become available (see #39182), we'll want to upgrade any remaining databases with an old-style public schema to have a proper public schema with a unique ID and a descriptor. We'll need this migration to only start running when it's guaranteed that no more nodes at version v-1 can join the cluster, and we'll need to prevent nodes at version v+1 from joining until the migration is complete.
  - Users should also be able to manually upgrade the public schema in a specific database before the cluster-wide migration occurs.
- Starting in version v+1, all schemas will have a proper ID and descriptor, and cross-database references will be disallowed in general.

When converting a database to a schema or upgrading the public schema to give
it a new ID and descriptor, what's required is changing the parentID or
parentSchemaID of each table in the database/schema. This process is the same
as the process for renaming a table; in fact, we already support changing the
database of a table with ALTER TABLE RENAME.
Some open questions:
During the interval of time when unupgraded public schemas can coexist with
proper schemas (including upgraded public schemas), there will be
differences in behavior between the two types of schemas. How do we make this
clear in the UI and avoid surprises?
I think the solution is to explain that:
- the public schema cannot get zone configs or permissions at all
- however, just as any other schema which wasn't assigned zone configs or permissions explicitly, it also inherits the settings from the database
Since unupgraded public schemas have no separate descriptor for metadata
and no unique ID, how should we handle updating zone configs and privileges
for the public schema? We could identify those configuration settings with
the settings for the database itself, but that could lead to confusion.
Should we allow users to "upgrade" a public schema manually (thus possibly
unlocking, e.g., the ability to set schema-specific zone configs and
privileges)? What would the UI for this look like?
CREATE SCHEMA public is an option. (This is slightly inconsistent with us already pretending to have a public schema in the database.)

Note that the pg_temp schemas also exist without schema descriptors. Since
they cannot be renamed or dropped or have their privileges changed, it would
not be useful to make them leasable, and they can remain as descriptor-less
schemas indefinitely.
To preserve references between tables in different databases, we will need to expose a way for users to convert their existing databases into schemas. Only databases with no user-defined schemas would be eligible to be converted.
The conversion consists of:

1. Creating a new schema in the target database whose name is the old database's name.
2. Re-parenting each table in the old database under the new schema, updating its parentID and parentSchemaID.
3. Dropping the (now empty) old database.

To provide an all-in-one migration, we can just do these steps sequentially.
For example, suppose database old_db (ID 52) contains a single table tbl
(ID 51), and we would like to convert old_db into a schema in database
new_db (ID 53). Both tbl and new_db are on version 1. Before the schema
change, system.namespace contains the following entries:
| parentID | parentSchemaID | name | id |
|---|---|---|---|
| 0 | 0 | new_db | 53 |
| 0 | 0 | old_db | 52 |
| 52 | 29 | tbl | 51 |
At time t1, create a new schema in new_db with name old_db (ID 54), which
bumps new_db to version 2, and wait for leases on new_db to drain.
At time t2, insert a new namespace entry for tbl with the new parentID
and parentSchemaID, and update the descriptor with the new IDs. tbl is now
on version 2. Wait for leases on tbl to drain. At this point, the namespace
table contains the following entries:
| parentID | parentSchemaID | name | id |
|---|---|---|---|
| 0 | 0 | new_db | 53 |
| 0 | 0 | old_db | 52 |
| 52 | 29 | tbl | 51 |
| 53 | 0 | old_db | 54 |
| 53 | 54 | tbl | 51 |
At time t3, delete the old namespace entry for tbl, and start dropping the
database old_db. The dropped database needs to go through another draining
interval before being removed from the namespace table at some time t4,
assuming that transactional database dropping is implemented. After that, the
namespace table contains the following entries:
| parentID | parentSchemaID | name | id |
|---|---|---|---|
| 0 | 0 | new_db | 53 |
| 53 | 0 | old_db | 54 |
| 53 | 54 | tbl | 51 |
In the interval between t2 and t3, both the old_db.public.tbl and
new_db.old_db.tbl names can be used, and it is forbidden for any other table
named old_db.public.tbl to exist. Regardless of the cached version of the
table, using the name old_db.tbl always causes old_db to be resolved as a
schema.
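The transitions above can be simulated end to end. The types and helper below are illustrative, and we assume tbl keeps its descriptor ID (51) throughout the conversion:

```go
package main

import "fmt"

// nsEntry mirrors one row of system.namespace.
type nsEntry struct {
	parentID, parentSchemaID uint32
	name                     string
	id                       uint32
}

// convert simulates the namespace-table transitions when turning database
// old_db (ID 52) into a schema (ID 54) inside new_db (ID 53), matching the
// worked example above. Lease-draining waits between steps are elided.
func convert(ns []nsEntry) []nsEntry {
	// t1: create schema old_db inside new_db; wait for leases on new_db.
	ns = append(ns, nsEntry{53, 0, "old_db", 54})
	// t2: insert tbl's new entry under the schema; both names now resolve
	// until tbl's old leases drain.
	ns = append(ns, nsEntry{53, 54, "tbl", 51})
	// t3/t4: delete tbl's old entry, then drop old_db after its own drain.
	stale := map[nsEntry]bool{
		{52, 29, "tbl", 51}:  true,
		{0, 0, "old_db", 52}: true,
	}
	var out []nsEntry
	for _, e := range ns {
		if !stale[e] {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	before := []nsEntry{
		{0, 0, "new_db", 53},
		{0, 0, "old_db", 52},
		{52, 29, "tbl", 51},
	}
	after := convert(before)
	fmt.Println(len(after)) // 3: new_db, the old_db schema, and tbl remain
}
```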
Syntax-wise, we could treat this as part of ALTER SCHEMA RENAME with a
special-case implementation, so the above would be a result of ALTER SCHEMA old_db.public RENAME TO new_db.old_db.
To turn a set of databases with cross-database references into a set of schemas with cross-schema references, we'd need a way to migrate multiple databases in the same transaction, and defer the check that no cross-database references persist until when the transaction is ready to commit.
- Storing additional descriptors in system.descriptor will make full-table scans worse on that table. This was brought up in the context of user-defined types, and the decision was that (somehow) building an index would be better than trying to split types off in their own table; the same thing applies here. (#47534 tracks support for a native protobuf column type, which is a first step.)

On schema metadata:

On backward compatibility and cross-database references:
- We considered keeping cross-database references for tables in public in perpetuity (or even allowing them in general), but decided this wasn't worth the complexity. From a feature point of view, there's no good reason to allow cross-database references in the first place, since they violate user expectations and solve a problem that should have been solved by schemas. Backward-compatibility concerns are alleviated by giving users ways to rename/move their existing tables and schemas to avoid cross-database references.
- Instead of keeping public as a special case for cross-database references, there was a possibility of keeping it as a special descriptorless schema for tables whose parent (conceptually speaking) was the database itself. We decided against this along similar lines.
- It remains to be determined exactly how the migration of unupgraded public schemas will work, and how those unupgraded public schemas will coexist with other schemas, new tables, etc. The first question should be answered once more progress is made on long-running migrations.
- There's a question of how much of the existing schema change infrastructure we should generalize across object types (MutableTableDescriptor, the SchemaChanger or specific parts of it, etc.). It seems tempting to do this for renaming and dropping objects, since the interactions with system.namespace across all object types are very similar. This is a question for user-defined types as well as schemas and databases.