Feature Name: User IDs For Internal User Information
Status: draft
Start Date: 2022-03-07
Authors: Fenil Patel
RFC PR: #77453
Cockroach Issue: #76079

Summary

A user and all of their privileges in CockroachDB are identified by their username. To implement ALTER ROLE ... RENAME TO ... statement is difficult as you need to create a new role with the same privileges and then drop the old one (mentioned in #50821).

We will support stable IDs to key internal user information instead of usernames. It requires a long-running migration to give existing users IDs. The type of the user IDs is yet to be decided.

Motivation

Currently, all internal per-user information is keyed by the username. It's not ideal because of the complications to implement ALTER ROLE ... RENAME TO ... statement and usernames can be considered personal data, but we need to use them in many places in logging. Implementing this in CockroachDB will enhance compatibility with the privilege system of Postgres, where pg_user has users have IDs in usesysid column.

High-level Summary

The idea is to convert code dependencies of username and lookups to use the IDs instead in 3 steps. First adding the ID columns to system tables and update related syntax statements. Second, updating the user privileges objects and lookup based on usernames. Finally, migrating the existing instances of Cockroach.

Technical design

Adding ID columns to tables with username columns

The ids will be randomly generated OID when creating a new user during insert into system.users. SQL statements like insert, lookups and delete for all other tables will be updated such that the IDs from the users table are used.

System Tables Currently:

[email protected]:26257/defaultdb> \d system.users
column_name   | data_type | is_nullable | column_default | generation_expression |  indices  | is_hidden
-----------------+-----------+-------------+----------------+-----------------------+-----------+------------
username       | STRING    |    false    | NULL           |                       | {primary} |   false
hashedPassword | BYTES     |    true     | NULL           |                       | {primary} |   false
isRole         | BOOL      |    false    | false          |                       | {primary} |   false
(3 rows)

[email protected]:26257/defaultdb> \d system.role_options
column_name | data_type | is_nullable | column_default | generation_expression |  indices  | is_hidden
--------------+-----------+-------------+----------------+-----------------------+-----------+------------
username    | STRING    |    false    | NULL           |                       | {primary} |   false
option      | STRING    |    false    | NULL           |                       | {primary} |   false
value       | STRING    |    true     | NULL           |                       | {primary} |   false
(3 rows)

[email protected]:26257/defaultdb> \d system.role_members
column_name | data_type | is_nullable | column_default | generation_expression |                         indices                         | is_hidden
--------------+-----------+-------------+----------------+-----------------------+---------------------------------------------------------+------------
role        | STRING    |    false    | NULL           |                       | {primary,role_members_member_idx,role_members_role_idx} |   false
member      | STRING    |    false    | NULL           |                       | {primary,role_members_member_idx,role_members_role_idx} |   false
isAdmin     | BOOL      |    false    | NULL           |                       | {primary}                                               |   false
(3 rows)

[email protected]:26257/defaultdb> \d system.database_role_settings
column_name | data_type | is_nullable | column_default | generation_expression |  indices  | is_hidden
--------------+-----------+-------------+----------------+-----------------------+-----------+------------
database_id | OID       |    false    | NULL           |                       | {primary} |   false
role_name   | STRING    |    false    | NULL           |                       | {primary} |   false
settings    | STRING[]  |    false    | NULL           |                       | {primary} |   false
(3 rows)

After Change:

[email protected]:26257/movr> \d system.users
column_name   | data_type | is_nullable |  column_default   | generation_expression |             indices             | is_hidden
-----------------+-----------+-------------+-------------------+-----------------------+---------------------------------+------------
username       | STRING    |    false    | NULL              |                       | {primary,users_username_id_idx} |   false
hashedPassword | BYTES     |    true     | NULL              |                       | {primary}                       |   false
isRole         | BOOL      |    false    | false             |                       | {primary}                       |   false
user_id        | OID       |    false    | nextval('id_seq') |                       | {primary,users_username_id_idx} |   false
(4 rows)

[email protected]:26257/movr> \d system.role_options
column_name | data_type | is_nullable | column_default | generation_expression |  indices  | is_hidden
--------------+-----------+-------------+----------------+-----------------------+-----------+------------
username    | STRING    |    false    | NULL           |                       | {primary} |   false
option      | STRING    |    false    | NULL           |                       | {primary} |   false
value       | STRING    |    true     | NULL           |                       | {primary} |   false
user_id     | OID       |    false    | NULL           |                       | {primary} |   false
(4 rows)

[email protected]:26257/movr> \d system.role_members
column_name | data_type | is_nullable | column_default | generation_expression |                         indices                         | is_hidden
--------------+-----------+-------------+----------------+-----------------------+---------------------------------------------------------+------------
role        | STRING    |    false    | NULL           |                       | {primary,role_members_member_idx,role_members_role_idx} |   false
member      | STRING    |    false    | NULL           |                       | {primary,role_members_member_idx,role_members_role_idx} |   false
isAdmin     | BOOL      |    false    | NULL           |                       | {primary}                                               |   false
role_id     | OID       |    false    | NULL           |                       | {primary,role_members_member_idx,role_members_role_idx} |   false
member_id   | OID       |    false    | NULL           |                       | {primary,role_members_member_idx,role_members_role_idx} |   false
(5 rows)

[email protected]:26257/movr> \d system.database_role_settings
column_name | data_type | is_nullable | column_default | generation_expression |  indices  | is_hidden
--------------+-----------+-------------+----------------+-----------------------+-----------+------------
database_id | OID       |    false    | NULL           |                       | {primary} |   false
role_name   | STRING    |    false    | NULL           |                       | {primary} |   false
settings    | STRING[]  |    false    | NULL           |                       | {primary} |   false
role_id     | OID       |    false    | NULL           |                       | {primary} |   false
(4 rows)

Type of User ID

UUID allows for randomly generated IDs during insert using gen_random_uuid() as default value for the column.
- Pros:
  - It requires no collision handling as it is handled by gen_random_uuid().
  - Adding a new column with a default value makes it easy to migrate older versions.
- Cons:
  - Users can't make use of ids themselves in other statements as they are hard to remember. Ideally, they should access it programmatically from the pg_user table but postgres allows users to specify uid when creating new users
OID type that CRDB already has and mimics what postgres does with uid when creating a user (the type of usesysid is OID). This is useful when allowing a user to specify the uid in statements for future updates. For existing users, a hash function could be used to generate their ID. We can also take an approach similar to postgres and have the new ID be the highest ID + 1.
- Pros:
  - Matches postgres use of UID (see example below) and migration from postrges to CRDB will be easier by generating a pg-compatible pg_user virtual table, with a useful usesysid column.
  - User can refer to them and possibly use them in other statements when specifying uid.
  - Currently, OID is uint64 and a column can be added easily during long running migration: alter table t1 add column y oid default oid(unique_rowid());
- Cons:
  - Due to #41904, OID might be converted to uint32 to match postgres and this means unique_rowid() can no longer be used as it generates uint64 values. So without resolving this issue, we can migrate from postgres to CRDB, but it may not be possible vice versa.
  - To atomically add a NOT NULL column during long running migration after issue is resolved, we have to come up with another function to generate the ID here instead of unique_rowid. This will most likely be less efficient.
INT - it would be similar to OID type, and we can migrate from postgres to CRDB but not vice versa. Adding columns with a default value of unique_rowid() also simplifies long running migration between node versions. The only difference is that there will be no changes to INT in the near future.

postgres=# create user a;
CREATE ROLE
postgres=# SELECT * FROM pg_user;
usename  | usesysid | usecreatedb | usesuper | userepl | usebypassrls |  passwd  | valuntil | useconfig
----------+----------+-------------+----------+---------+--------------+----------+----------+-----------
postgres |       10 | t           | t        | t       | t            | ******** |          |
a        |    16385 | f           | f        | f       | f            | ******** |          |
(2 rows)

postgres=# create user b;
CREATE ROLE
postgres=# create user aa;
CREATE ROLE
postgres=# create user ba;
CREATE ROLE
postgres=# create user ab;
CREATE ROLE
postgres=# SELECT * FROM pg_user;
usename  | usesysid | usecreatedb | usesuper | userepl | usebypassrls |  passwd  | valuntil | useconfig
----------+----------+-------------+----------+---------+--------------+----------+----------+-----------
postgres |       10 | t           | t        | t       | t            | ******** |          |
a        |    16385 | f           | f        | f       | f            | ******** |          |
b        |    16386 | f           | f        | f       | f            | ******** |          |
aa       |    16387 | f           | f        | f       | f            | ******** |          |
ba       |    16388 | f           | f        | f       | f            | ******** |          |
ab       |    16389 | f           | f        | f       | f            | ******** |          |
(6 rows)

*Require more discussion on this

Updating User Privileges and Lookups

An ID field will be added to UserPrivileges and any objects that contain a username. The Privilege Descriptors store user privilege objects in an array in sorted order based on usernames currently. In 23.1, we can remove the username field from privileges. It will allow us to alter usernames without having to update privilege objects.

To support user lookups by IDs, we have two options:

Refactor the code such that the privileges array is sorted by IDs instead. Or have an additional array in privilege descriptors sorted by IDs.
Use a map to lookup user privileges by IDs. Not sure why this wasn't used for lookups by username. Possibly because it may not have been supported by protobuf initially.

Code in privilege.pb.go:

// PrivilegeDescriptor describes a list of users and attached
// privileges. The list should be sorted by user for fast access.
type PrivilegeDescriptor struct {
Users      []UserPrivileges                                               `protobuf:"bytes,1,rep,name=users" json:"users"`
OwnerProto github_com_cockroachdb_cockroach_pkg_security.SQLUsernameProto `protobuf:"bytes,2,opt,name=owner_proto,json=ownerProto,casttype=github.com/cockroachdb/cockroach/pkg/security.SQLUsernameProto" json:"owner_proto"`
Version    PrivilegeDescVersion                                           `protobuf:"varint,3,opt,name=version,casttype=PrivilegeDescVersion" json:"version"`
}

Since most SQL syntax takes in username/role, we also require a lookup function to get the user ID from system.users by a username. Additionally, add caching the lookups to reduce the overall runtime of statements and tests.

Long-running migration

We need to add the new system table columns to existing instances and migrate all the existing users and generate their corresponding IDs. The migration will need to update the UserPriviliges to contain the new user IDs.

During the migration, we will need to update every field where a username is stored to also store an ID. The only places right now are the various system tables, user privileges and default privileges. Update the existing Privilege descriptors using one of the lookup implementation options mentioned above in order to support the lookups by user IDs.

Mixed Version Deployments

We’ll need to version gate all lookups to ensure the use of username until all the nodes have been migrated to 22.1. It ensures that if the ID field does not exist in UserPrivileges of all nodes, it doesn’t result in conflicts during lookups. Once all nodes are upgraded from 21.2 to 22.1, the version gate will allow privilege lookups using IDs.

Differences:

22.1 & Mixed Cluster	22.2
User privilege lookups are done using username. ID field in 22.2 version nodes is ignored.	ID is acquired from system.user and user privilege lookups are done using IDs instead.

Summary

Summary

Motivation

High-level Summary

Technical design

Adding ID columns to tables with username columns

System Tables Currently:

After Change:

Type of User ID

Pros:

Cons:

Pros:

Cons:

Updating User Privileges and Lookups

To support user lookups by IDs, we have two options:

Long-running migration

Mixed Version Deployments

Differences: