docs/content/stable/develop/data-modeling/primary-keys-ycql.md
{{<api-tabs>}}
The Primary key is a column or a set of columns that uniquely identifies a row, such as a user ID or order number. You should choose the primary key based on the most common access pattern. Columns of data type string, number, serial, or UUID make good choices for primary keys.
The best way to uniquely identify a record is to allow the database to assign a unique identifier to the row. YugabyteDB supports multiple schemes for generating identifiers that you can choose based on the needs of your application.
A UUID is a 128-bit number represented as a string of 36 characters, including hyphens. For example, 4b6aa2ff-53e6-44f5-8bd0-ef9de90a8095. YugabyteDB natively supports UUID generation as per RFC 4122 via the uuid-ossp extension. UUIDs have several advantages:
You can add a UUID to your schema as follows:
create keyspace if not exists yugabyte;
use yugabyte;
drop table if exists users;
create table users (
id uuid,
name text,
primary key (id)
);
Insert some rows into the table:
insert into users (id, name) values (uuid(), 'John Wick');
insert into users (id, name) values (uuid(), 'Iron Man');
insert into users (id, name) values (uuid(), 'Harry Potter');
insert into users (id, name) values (uuid(), 'Kack Sparrow');
Select all the rows from the table:
select * from users;
Notice how the generated IDs are totally random.
id | name
--------------------------------------+--------------
85a17586-317f-4ef1-b5dd-582a13ccc832 | Harry Potter
b431bb80-b20d-42fe-900d-fed295de507a | Kack Sparrow
7abae478-532c-40aa-9f81-42e85750fe01 | John Wick
2a151214-272d-4448-af3e-a343f434fa68 | Iron Man
TimeUUID is a special type of UUID that has a time factor integrated into it so the generated UUIDs have an order associated with them. They can be generated using the now() function. To do this, first create a table as follows:
create keyspace if not exists yugabyte;
use yugabyte;
drop table if exists users;
create table users (
id timeuuid,
name text,
primary key(id)
);
Insert some rows into the table:
insert into users (id, name) values (now(), 'John Wick');
insert into users (id, name) values (now(), 'Iron Man');
insert into users (id, name) values (now(), 'Harry Potter');
insert into users (id, name) values (now(), 'Kack Sparrow');
Select all the rows from the table:
select * from users;
The generated ids are not very random:
id | name
--------------------------------------+--------------
19f24446-2904-11ef-917b-6bf61abbc06e | Iron Man
13dc1460-2904-11ef-917b-6bf61abbc06e | John Wick
1a3f8ac6-2904-11ef-917b-6bf61abbc06e | Kack Sparrow
19f29720-2904-11ef-917b-6bf61abbc06e | Harry Potter
To illustrate how to choose existing columns as primary keys, first create a sample census schema.
<!-- begin: nav tabs -->{{<nav/tabs list="local,anywhere" active="local" repeatedTabs="true"/>}}
{{<nav/panels>}} {{<nav/panel name="local" active="true">}}
<!-- local cluster setup instructions -->{{<setup/local>}} {{</nav/panel>}}
{{<nav/panel name="anywhere">}} {{<setup/anywhere>}} {{</nav/panel>}} {{</nav/panels>}}
<!-- end: nav tabs -->Create a census table as follows:
CREATE TABLE census(
id int,
name varchar,
age int,
zipcode int,
employed boolean,
PRIMARY KEY(id)
);
INSERT INTO census (id,name,age,zipcode,employed) VALUES (1,'Zachary',55,94085,True);
INSERT INTO census (id,name,age,zipcode,employed) VALUES (2,'James',56,94085,False);
INSERT INTO census (id,name,age,zipcode,employed) VALUES (3,'Kimberly',50,94084,False);
INSERT INTO census (id,name,age,zipcode,employed) VALUES (4,'Edward',56,94085,True);
INSERT INTO census (id,name,age,zipcode,employed) VALUES (5,'Barry',56,94084,False);
INSERT INTO census (id,name,age,zipcode,employed) VALUES (6,'Tyler',45,94084,False);
INSERT INTO census (id,name,age,zipcode,employed) VALUES (7,'James',47,94085,False);
INSERT INTO census (id,name,age,zipcode,employed) VALUES (8,'Sarah',52,94084,True);
INSERT INTO census (id,name,age,zipcode,employed) VALUES (9,'James',59,94084,False);
INSERT INTO census (id,name,age,zipcode,employed) VALUES (10,'Diane',51,94083,False);
In the census table, the most likely way to look up a person is by their id, so the primary key has been set to id. This means that the data is distributed based on ID. This works well for point lookups on ID. For example, to look up ID 9, you can do the following:
select * from census where id=9;
You will see output similar to the following:
id | name | age | zipcode | employed
----+-------+-----+---------+----------
9 | Nancy | 59 | 94084 | False
One row matching ID 9 was quickly fetched with just one request.
Suppose your most common lookup is based on the name. In this case you would make the name column part of the primary key. Because the name alone may not be unique enough to be the primary key (the primary key has to be unique), you can choose a primary key with both name and ID as follows:
CREATE TABLE census2(
id int,
name varchar,
age int,
zipcode int,
employed boolean,
PRIMARY KEY(name, id)
) WITH CLUSTERING ORDER BY id ASC;
INSERT INTO census2 (id,name,age,zipcode,employed) VALUES (1,'Zachary',55,94085,True);
INSERT INTO census2 (id,name,age,zipcode,employed) VALUES (2,'James',56,94085,False);
INSERT INTO census2 (id,name,age,zipcode,employed) VALUES (3,'Kimberly',50,94084,False);
INSERT INTO census2 (id,name,age,zipcode,employed) VALUES (4,'Edward',56,94085,True);
INSERT INTO census2 (id,name,age,zipcode,employed) VALUES (5,'Barry',56,94084,False);
INSERT INTO census2 (id,name,age,zipcode,employed) VALUES (6,'Tyler',45,94084,False);
INSERT INTO census2 (id,name,age,zipcode,employed) VALUES (7,'James',47,94085,False);
INSERT INTO census2 (id,name,age,zipcode,employed) VALUES (8,'Sarah',52,94084,True);
INSERT INTO census2 (id,name,age,zipcode,employed) VALUES (9,'James',59,94084,False);
INSERT INTO census2 (id,name,age,zipcode,employed) VALUES (10,'Diane',51,94083,False);
When specifying the primary key, the name column is specified first, and id second. This ensures that the data is stored sorted based on name first, and for all matching names, the id is stored sorted in ascending order, ensuring all people with the same name will be stored in the same tablet. This allows you to do a fast lookup on name even though (name, id) is the primary key.
Retrieve all the people with the name James as follows:
select * from census2 where name = 'James';
You will see output similar to the following:
name | id | age | zipcode | employed
-------+----+-----+---------+----------
James | 2 | 56 | 94085 | False
James | 7 | 47 | 94085 | False
James | 9 | 59 | 94084 | False
(5 rows)
There are 3 people named James, and all of them can be quickly looked up as the data has been distributed by name.
Notice that the rows are ordered by id. This is because you specified CLUSTERING ORDER BY id ASC to ensure that the rows with the same name will be stored ordered in the order of the id column.