manage.py delete_non_public
Delete Non-Public Data Command
The delete_non_public management command removes non-public data from the
DataWarehouse database. It is primarily used for data sanitization, for setting up
testing environments, or for preparing data for public distribution.
DESTRUCTIVE OPERATION: This command permanently deletes data from the database.
There is no undo functionality.
Usage
usage: manage.py delete_non_public [-h] [--dry-run] [--force]

Delete non-public data from the DataWarehouse (KCIDBCheckouts, Issues, and Users)

options:
  -h, --help  show this help message and exit
  --dry-run   Show what would be deleted without actually deleting anything
  --force     Force deletion without confirmation
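How the two flags relate to the confirmation prompt can be sketched in plain Python. This is only an illustration, not the command's actual code: the `decide` helper and its return values are hypothetical, and it assumes that `--dry-run` takes precedence over `--force` and that only a `y` answer (case-insensitive) confirms the deletion.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented flags (the real command is a Django management command)."""
    parser = argparse.ArgumentParser(prog="manage.py delete_non_public")
    parser.add_argument("--dry-run", action="store_true",
                        help="Show what would be deleted without actually deleting anything")
    parser.add_argument("--force", action="store_true",
                        help="Force deletion without confirmation")
    return parser


def decide(dry_run: bool, force: bool, answer: str = "") -> str:
    """Return 'report', 'delete' or 'abort' (hypothetical names)."""
    if dry_run:
        # Assumption: a dry run never deletes, even when --force is also given.
        return "report"
    if force:
        # --force skips the interactive prompt entirely.
        return "delete"
    # The prompt is shown as [y/N], so anything other than "y" aborts.
    return "delete" if answer.strip().lower() == "y" else "abort"


args = build_parser().parse_args(["--dry-run"])
print(decide(args.dry_run, args.force))  # report
```

Pressing Enter at the prompt corresponds to `decide(False, False, "")`, which aborts, matching the `N` default shown in the prompt.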
Example output
$ python manage.py delete_non_public
Items to be deleted:
- Non-public KCIDBCheckouts: 88319
- Non-public Issues: 182
- All Users: 475
WARNING: This will permanently delete 88976 items from the database.
Are you sure you want to continue? [y/N]: y
Deleted 25427 objects related to Issues (cascaded:
{'datawarehouse.IssueRegex': 157, 'datawarehouse.IssueOccurrence': 25088, 'datawarehouse.Issue': 182})
2025-07-02T20:11:17.897000 - [DEBUG] - cki.datawarehouse.management.commands.delete_non_public - Deleting orphaned artifacts...
Deleted 47553 objects related to orphaned Artifacts
Using checkout batch size: 245 based on max iid: 245653
2025-07-02T20:13:38.488000 - [DEBUG] - cki.datawarehouse.management.commands.delete_non_public - Batch 1: Deleting KCIDBCheckouts from 0 to 245...
Batch 1: Deleted 54066 objects related to KCIDBCheckouts (cascaded:
{'datawarehouse.IssueOccurrence': 28, 'datawarehouse.KCIDBTest_maintainers': 17504, 'datawarehouse.KCIDBTest_provenance': 22840, 'datawarehouse.KCIDBBuild_provenance': 403, 'datawarehouse.Report_addr_to': 263, 'datawarehouse.Report_addr_cc': 203, 'datawarehouse.KCIDBCheckout_patches': 220, 'datawarehouse.KCIDBCheckout_contacts': 20, 'datawarehouse.KCIDBCheckout_provenance': 141, 'datawarehouse.KCIDBTest': 11753, 'datawarehouse.Report': 139, 'datawarehouse.KCIDBBuild': 410, 'datawarehouse.KCIDBCheckout': 142})
2025-07-02T20:13:43.633000 - [DEBUG] - cki.datawarehouse.management.commands.delete_non_public - Batch 2: Deleting KCIDBCheckouts from 245 to 490...
Batch 2: Deleted 46484 objects related to KCIDBCheckouts (cascaded: {'datawarehouse.IssueOccurrence': 20, 'datawarehouse.KCIDBTest_maintainers': 14495, 'datawarehouse.KCIDBTest_provenance': 19820, 'datawarehouse.KCIDBBuild_provenance': 392, 'datawarehouse.Report_addr_to': 244, 'datawarehouse.Report_addr_cc': 130, 'datawarehouse.KCIDBCheckout_patches': 505, 'datawarehouse.KCIDBCheckout_contacts': 9, 'datawarehouse.KCIDBCheckout_provenance': 141, 'datawarehouse.KCIDBTest': 10054, 'datawarehouse.Report': 138, 'datawarehouse.KCIDBBuild': 392, 'datawarehouse.KCIDBCheckout': 144})
...1h later...
Batch 1001: ...
Total deleted KCIDBCheckout objects: 62497358
2025-07-02T22:15:47.444000 - [DEBUG] - cki.datawarehouse.management.commands.delete_non_public - Deleting orphaned artifacts...
Deleted 5972906 objects related to orphaned Artifacts
Deleted 475 User objects
Successfully deleted all non-public items.
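The batch lines in the output above suggest that checkouts are deleted in fixed-size iid windows. The following is a minimal sketch of that idea, not the command's implementation: it assumes the batch size is the maximum iid integer-divided by 1000 (which reproduces the `245` reported for max iid `245653`) and that each window is half-open. The function names and the `target_batches` parameter are hypothetical.

```python
from typing import Iterator, Tuple


def batch_size_for(max_iid: int, target_batches: int = 1000) -> int:
    """Assumed heuristic: aim for roughly `target_batches` batches, at least 1 row wide."""
    return max(1, max_iid // target_batches)


def iid_ranges(max_iid: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open [start, end) iid windows until max_iid is covered."""
    start = 0
    while start <= max_iid:
        yield start, start + batch_size
        start += batch_size


size = batch_size_for(245653)
print(size)  # 245
ranges = iid_ranges(245653, size)
print(next(ranges))  # (0, 245)
print(next(ranges))  # (245, 490)
```

Deleting in windows like this keeps each transaction and its cascade small, at the cost of a longer total run time.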
What Gets Deleted
The command removes the following types of data:
1. Non-Public KCIDBCheckouts
- Checkouts where public != True (includes False and None values)
- All related objects cascade automatically:
  - KCIDBBuild, KCIDBTest, and KCIDBTestResult child objects
  - Related IssueOccurrence objects
  - Associated Report, ProvenanceComponent and Patch objects
  - Artifact objects
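The selection rule for checkouts treats everything that is not explicitly public as non-public. A small plain-Python sketch of that rule follows; the `Checkout` dataclass and `is_non_public` helper are stand-ins for illustration and are not the DataWarehouse models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Checkout:
    """Stand-in for a KCIDBCheckout row; only the fields needed here."""
    iid: int
    public: Optional[bool]


def is_non_public(checkout: Checkout) -> bool:
    # False and None both count as non-public; only an explicit True is kept.
    return checkout.public is not True


checkouts = [Checkout(1, True), Checkout(2, False), Checkout(3, None)]
to_delete = [c.iid for c in checkouts if is_non_public(c)]
print(to_delete)  # [2, 3]
```

The None case is worth calling out because in plain SQL a comparison such as `public != TRUE` does not match NULL rows; a query needs `public IS NOT TRUE` or an explicit NULL check to get the behaviour described here.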
2. Non-Public Issues
- Issues where policy.name != "public" (includes internal, retrigger, and None policies)
- All related IssueOccurrence objects cascade automatically
- All related IssueRegex objects cascade automatically
3. User Models
- All LDAPGroupLink records are deleted
- ALL User objects in the database are deleted, including service accounts and superusers.
- All one-to-one related objects cascade automatically
- Weak references, like created_by fields, are set to NULL
4. Orphaned Artifacts
- Artifact objects that are not referenced by any other objects
- The command checks for relationships at:
- KCIDBCheckout.log
- KCIDBBuild.log
- KCIDBBuild.input_files
- KCIDBBuild.output_files
- KCIDBTest.log
- KCIDBTest.output_files
- KCIDBTestResult.output_files
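The orphan check amounts to a set difference: every artifact id minus the ids referenced through any of the relationships listed above. The sketch below illustrates this with in-memory data. The relationship names are copied from the list, while `find_orphans`, the input shapes, and the sample ids are hypothetical; the real command presumably expresses this as database queries.

```python
from typing import Dict, Iterable, Set

# Relationship names copied from the list above.
REFERENCE_FIELDS = (
    "KCIDBCheckout.log",
    "KCIDBBuild.log",
    "KCIDBBuild.input_files",
    "KCIDBBuild.output_files",
    "KCIDBTest.log",
    "KCIDBTest.output_files",
    "KCIDBTestResult.output_files",
)


def find_orphans(artifact_ids: Iterable[int],
                 references: Dict[str, Iterable[int]]) -> Set[int]:
    """Return artifact ids that no listed relationship points at."""
    referenced: Set[int] = set()
    for field in REFERENCE_FIELDS:
        referenced.update(references.get(field, ()))
    return set(artifact_ids) - referenced


# Made-up data: artifacts 1-3 are still referenced, 4 and 5 are not.
refs = {"KCIDBCheckout.log": [1], "KCIDBTest.output_files": [2, 3]}
print(sorted(find_orphans([1, 2, 3, 4, 5], refs)))  # [4, 5]
```

This also explains why the example output reports orphaned artifacts twice: deleting checkouts removes references, so artifacts that were in use before the run can become orphans afterwards.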
Recommendations
- Use dry-run when in doubt:
  python manage.py delete_non_public --dry-run
- Create a database backup before running the command.
  If you are using the recommended development containers:
  podman-compose exec db pg_dump -U datawarehouse datawarehouse > backup_$(date +%Y%m%d_%H%M%S).sql
  Alternatively, you can create a copy of the database from inside psql -U datawarehouse
  (PostgreSQL requires that no other sessions are connected to the source database while it is copied):
  CREATE DATABASE backup WITH TEMPLATE datawarehouse OWNER datawarehouse;
  Restoring from that copy would then look like:
  \c backup
  -- You are now connected to database "backup" as user "datawarehouse".
  DROP DATABASE datawarehouse;
  CREATE DATABASE datawarehouse WITH TEMPLATE backup OWNER datawarehouse;
Related Commands
- python manage.py clearsessions - Clean expired sessions
- python manage.py flush - Remove all data from database
- python manage.py loaddata - Load fixture data