src/postgres/third-party-extensions/postgresql_anonymizer/docs/how-to/2-dynamic_masking.md
With Dynamic Masking, the database owner can hide personnal data for some users, while other users are still allowed to read and write the authentic data.
Paul has 2 employees:
In this section, we will learn:
DROP TABLE IF EXISTS supplier CASCADE;
DROP TABLE IF EXISTS company CASCADE;
CREATE TABLE company (
id SERIAL PRIMARY KEY,
name TEXT,
vat_id TEXT UNIQUE
);
INSERT INTO company
VALUES
(952,'Shadrach', 'FR62684255667'),
(194,E'Johnny\'s Shoe Store','CHE670945644'),
(346,'Capitol Records','GB663829617823')
;
SELECT * FROM company;
CREATE TABLE supplier (
id SERIAL PRIMARY KEY,
fk_company_id INT REFERENCES company(id),
contact TEXT,
phone TEXT,
job_title TEXT
);
INSERT INTO supplier
VALUES
(299,194,'Johnny Ryall','597-500-569','CEO'),
(157,346,'George Clinton', '131-002-530','Sales manager')
;
SELECT * FROM supplier;
CREATE EXTENSION IF NOT EXISTS anon CASCADE;
SELECT anon.init();
SELECT setseed(0);
SELECT anon.start_dynamic_masking();
SECURITY LABEL FOR anon ON ROLE pierre IS 'MASKED';
GRANT SELECT ON supplier TO pierre;
GRANT ALL ON SCHEMA public TO jack;
GRANT ALL ON ALL TABLES IN SCHEMA public TO jack;
Now connect as Pierre and try to read the supplier table:
SELECT * FROM supplier;
For the moment, there is no masking rule so Pierre can see the original data in each table.
Connect as Paul and define a masking rule on the supplier table:
SECURITY LABEL FOR anon ON COLUMN supplier.contact
IS 'MASKED WITH VALUE $$CONFIDENTIAL$$';
Now connect as Pierre and try to read the supplier table again:
SELECT * FROM supplier;
Now connect as Jack and try to read the real data:
SELECT * FROM supplier;
Masking the supplier name is clearly not enough to provide anonymity.
Connect as Pierre and write a simple SQL query that would reindentify some suppliers based on their job and their company.
Company names and job positions are available in many public datasets. A simple search on Linkedin or Google, would give you the names of the top executives of most companies..
This is called Linkability: the ability to connect multiple records concerning the same data subject.
We need to anonymize the "company" table, too. Even if they don't contain personal information, some fields can be used to infer the identity of their employees...
Write 2 masking rules for the company table. The first one will replace the "name" field with a fake name. The second will replace the "vat_id" with a random sequence of 10 characters
HINT: Go to the documentation and look at the faking functions and random functions!
Connect as Pierre and check that he cannot view the real company info:
Because of dynamic masking, the fake values will be different everytime Pierre tries to read the table.
Pierre would like to have always the same fake values for a given company. This is called pseudonymization.
Write a new masking rule over the "vat_id" field by generating 10 random characters using the md5() function.
Write a new masking rule over the "name" field by using a pseudonymizing function.
SELECT s.id, s.contact, s.job_title, c.name
FROM supplier s
JOIN company c ON s.fk_company_id = c.id;
SECURITY LABEL FOR anon ON COLUMN company.name
IS 'MASKED WITH FUNCTION anon.fake_company()';
SECURITY LABEL FOR anon ON COLUMN company.vat_id
IS 'MASKED WITH FUNCTION anon.random_string(10)';
Now connect as Pierre and read the table again:
SELECT * FROM company;
Pierre will see different "fake data" everytime he reads the table:
SELECT * FROM company;
ALTER FUNCTION anon.pseudo_company SECURITY DEFINER;
SECURITY LABEL FOR anon ON COLUMN company.name
IS 'MASKED WITH FUNCTION anon.pseudo_company(id)';
Connect as Pierre and read the table multiple times:
SELECT * FROM company;
SELECT * FROM company;
Now the fake company name is always the same.