manage.py delete_non_public

Delete Non-Public Data Command

The delete_non_public management command permanently removes non-public data from the DataWarehouse database. It is primarily used for data sanitization, for setting up testing environments, or for preparing data for public distribution.

Usage

usage: manage.py delete_non_public [-h] [--dry-run] [--force]

Delete non-public data from the DataWarehouse (KCIDBCheckouts, Issues, and Users)

options:
  -h, --help            show this help message and exit
  --dry-run             Show what would be deleted without actually deleting anything
  --force               Force deletion without confirmation
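
The interaction between the two flags and the confirmation prompt can be sketched with plain argparse. This is an illustration only, not the command's actual source: the helper names build_parser and plan are hypothetical, and the assumption that --dry-run takes precedence over --force is not confirmed by the help text above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented options with plain argparse (not Django's BaseCommand)."""
    parser = argparse.ArgumentParser(prog="manage.py delete_non_public")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Show what would be deleted without actually deleting anything",
    )
    parser.add_argument(
        "--force",
        action="store_true",
        help="Force deletion without confirmation",
    )
    return parser


def plan(argv, answer="n"):
    """Return 'report', 'delete' or 'abort' for the given arguments.

    `answer` stands in for what the user types at the [y/N] prompt.
    Assumption: --dry-run wins if both flags are given.
    """
    args = build_parser().parse_args(argv)
    if args.dry_run:
        return "report"  # only list the counts, touch nothing
    if args.force or answer.strip().lower() == "y":
        return "delete"  # confirmed, or confirmation skipped
    return "abort"  # [y/N] defaults to "no"
```

For example, plan(["--force"]) skips the prompt entirely, while plan([]) aborts unless the answer is "y".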

Example output

$ python manage.py delete_non_public

Items to be deleted:
- Non-public KCIDBCheckouts: 88319
- Non-public Issues: 182
- All Users: 475

WARNING: This will permanently delete 88976 items from the database.
Are you sure you want to continue? [y/N]: y

Deleted 25427 objects related to Issues (cascaded:
  {'datawarehouse.IssueRegex': 157, 'datawarehouse.IssueOccurrence': 25088, 'datawarehouse.Issue': 182})

2025-07-02T20:11:17.897000 - [DEBUG] - cki.datawarehouse.management.commands.delete_non_public - Deleting orphaned artifacts...
Deleted 47553 objects related to orphaned Artifacts

Using checkout batch size: 245 based on max iid: 245653

2025-07-02T20:13:38.488000 - [DEBUG] - cki.datawarehouse.management.commands.delete_non_public - Batch 1: Deleting KCIDBCheckouts from 0 to 245...
Batch 1: Deleted 54066 objects related to KCIDBCheckouts (cascaded:
  {'datawarehouse.IssueOccurrence': 28, 'datawarehouse.KCIDBTest_maintainers': 17504, 'datawarehouse.KCIDBTest_provenance': 22840, 'datawarehouse.KCIDBBuild_provenance': 403, 'datawarehouse.Report_addr_to': 263, 'datawarehouse.Report_addr_cc': 203, 'datawarehouse.KCIDBCheckout_patches': 220, 'datawarehouse.KCIDBCheckout_contacts': 20, 'datawarehouse.KCIDBCheckout_provenance': 141, 'datawarehouse.KCIDBTest': 11753, 'datawarehouse.Report': 139, 'datawarehouse.KCIDBBuild': 410, 'datawarehouse.KCIDBCheckout': 142})
2025-07-02T20:13:43.633000 - [DEBUG] - cki.datawarehouse.management.commands.delete_non_public - Batch 2: Deleting KCIDBCheckouts from 245 to 490...
Batch 2: Deleted 46484 objects related to KCIDBCheckouts (cascaded: {'datawarehouse.IssueOccurrence': 20, 'datawarehouse.KCIDBTest_maintainers': 14495, 'datawarehouse.KCIDBTest_provenance': 19820, 'datawarehouse.KCIDBBuild_provenance': 392, 'datawarehouse.Report_addr_to': 244, 'datawarehouse.Report_addr_cc': 130, 'datawarehouse.KCIDBCheckout_patches': 505, 'datawarehouse.KCIDBCheckout_contacts': 9, 'datawarehouse.KCIDBCheckout_provenance': 141, 'datawarehouse.KCIDBTest': 10054, 'datawarehouse.Report': 138, 'datawarehouse.KCIDBBuild': 392, 'datawarehouse.KCIDBCheckout': 144})
...1h later...
Batch 1001: ...

Total deleted KCIDBCheckout objects: 62497358
2025-07-02T22:15:47.444000 - [DEBUG] - cki.datawarehouse.management.commands.delete_non_public - Deleting orphaned artifacts...
Deleted 5972906 objects related to orphaned Artifacts
Deleted 475 User objects
Successfully deleted all non-public items.
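
The batch lines in the output above walk the checkout iid space in fixed-size windows. A minimal sketch of that windowing follows. The function names are hypothetical, and the "max iid divided by 1000" rule is only an inference from the single log line shown (245653 gives 245), not a documented formula. Whether the real command treats the upper bound as inclusive or exclusive is also not stated.

```python
from typing import Iterator, Tuple


def guess_batch_size(max_iid: int, target_batches: int = 1000) -> int:
    """Assumed rule: aim for roughly `target_batches` windows, at least size 1."""
    return max(1, max_iid // target_batches)


def batch_ranges(max_iid: int, batch_size: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (batch_number, start, end) windows until max_iid is covered."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    number, start = 1, 0
    while start <= max_iid:
        yield number, start, start + batch_size
        start += batch_size
        number += 1


if __name__ == "__main__":
    size = guess_batch_size(245653)
    for number, start, end in list(batch_ranges(245653, size))[:2]:
        print(f"Batch {number}: Deleting KCIDBCheckouts from {start} to {end}...")
```

Deleting in windows like this keeps each transaction and its cascade small, which is presumably why the run reports progress per batch instead of issuing one large delete.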

What Gets Deleted

The command removes the following types of data:

1. Non-Public KCIDBCheckouts

  • Checkouts where public != True (includes False and None values)
  • All related objects cascade automatically:
    • KCIDBBuild, KCIDBTest, and KCIDBTestResult child objects
    • Related IssueOccurrence objects
    • Associated Report, ProvenanceComponent and Patch objects
    • Artifact objects

2. Non-Public Issues

  • Issues where policy.name != "public" (includes internal, retrigger, and None policies)
  • All related IssueOccurrence objects cascade automatically
  • All related IssueRegex objects cascade automatically
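
The two "non-public" rules from sections 1 and 2 both treat a missing value as non-public. The sketch below restates them as plain Python predicates. The dataclasses are simplified stand-ins for illustration, not the real DataWarehouse models, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Checkout:
    public: Optional[bool]  # True, False, or None (unset)


@dataclass
class Policy:
    name: str


@dataclass
class Issue:
    policy: Optional[Policy]  # None when no policy is assigned


def checkout_is_non_public(checkout: Checkout) -> bool:
    """public != True: both False and None count as non-public."""
    return checkout.public is not True


def issue_is_non_public(issue: Issue) -> bool:
    """policy.name != "public": a missing policy also counts as non-public."""
    return issue.policy is None or issue.policy.name != "public"
```

Only Checkout(public=True) and issues whose policy is named "public" survive. Everything else, including rows where the field was never set, is selected for deletion.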

3. User Models

  • All LDAPGroupLink records are deleted
  • ALL User objects in the database are deleted, including service accounts and superusers
  • All one-to-one related objects cascade automatically
  • Weak references, like created_by fields, are set to NULL

4. Orphaned Artifacts

  • Artifact objects that are not referenced by any other objects
  • The command checks for relationships at:
    • KCIDBCheckout.log
    • KCIDBBuild.log
    • KCIDBBuild.input_files
    • KCIDBBuild.output_files
    • KCIDBTest.log
    • KCIDBTest.output_files
    • KCIDBTestResult.output_files
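
Conceptually, the orphan check is a set difference: every artifact id that appears in none of the relationships listed above is orphaned. The following sketch shows that idea on plain Python data. It is not the command's actual query, and find_orphans and the shape of its references argument are assumptions made for illustration.

```python
from typing import Dict, Iterable, Set


def find_orphans(
    artifact_ids: Iterable[int],
    references: Dict[str, Iterable[int]],
) -> Set[int]:
    """Return artifact ids not referenced by any relationship.

    `references` maps a relationship name (for example "KCIDBBuild.log")
    to the artifact ids it points at.
    """
    referenced: Set[int] = set()
    for ids in references.values():
        referenced.update(ids)
    return set(artifact_ids) - referenced


if __name__ == "__main__":
    refs = {
        "KCIDBCheckout.log": [1],
        "KCIDBBuild.log": [2],
        "KCIDBBuild.input_files": [],
        "KCIDBBuild.output_files": [3, 4],
        "KCIDBTest.log": [],
        "KCIDBTest.output_files": [4],
        "KCIDBTestResult.output_files": [],
    }
    print(sorted(find_orphans(range(1, 8), refs)))
```

This also suggests why the example output runs the orphan cleanup twice: deleting checkouts removes references, so artifacts that were still in use before the batches ran can become orphaned afterwards.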

Recommendations

  1. Use --dry-run when in doubt:

    python manage.py delete_non_public --dry-run
    
  2. Create a database backup before running the command:

    If you are using the recommended development containers:

    podman-compose exec db pg_dump -U datawarehouse datawarehouse > backup_$(date +%Y%m%d_%H%M%S).sql
    

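    Alternatively, you can create a copy of the database from within a psql -U datawarehouse session (PostgreSQL requires that no other sessions are connected to datawarehouse while it is being copied):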

    CREATE DATABASE backup WITH TEMPLATE datawarehouse OWNER datawarehouse;
    

    Restoring from that copy would then look like:

    \c backup
    -- You are now connected to database "backup" as user "datawarehouse".
    DROP DATABASE datawarehouse;
    CREATE DATABASE datawarehouse WITH TEMPLATE backup OWNER datawarehouse;
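
The pg_dump backup from recommendation 2 could also be scripted, for instance to take a timestamped dump right before each run. The sketch below is a hedged example: build_backup and run_backup are hypothetical helpers, and the container name db and the datawarehouse user and database are taken from the podman-compose command above, so they only apply to the development containers.

```python
import datetime
import subprocess
from typing import List, Optional, Tuple


def build_backup(now: datetime.datetime) -> Tuple[List[str], str]:
    """Return the pg_dump command and a backup_YYYYmmdd_HHMMSS.sql filename."""
    filename = f"backup_{now:%Y%m%d_%H%M%S}.sql"
    command = [
        "podman-compose", "exec", "db",
        "pg_dump", "-U", "datawarehouse", "datawarehouse",
    ]
    return command, filename


def run_backup(now: Optional[datetime.datetime] = None) -> str:
    """Run pg_dump and write its stdout to the timestamped file."""
    command, filename = build_backup(now or datetime.datetime.now())
    with open(filename, "wb") as out:
        # check=True raises CalledProcessError if pg_dump fails,
        # so a broken backup is not silently accepted.
        subprocess.run(command, stdout=out, check=True)
    return filename
```

Calling run_backup() and only then invoking delete_non_public ensures a dump exists before anything is removed.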
    
Related Commands

  • python manage.py clearsessions - Clean expired sessions
  • python manage.py flush - Remove all data from the database
  • python manage.py loaddata - Load fixture data