
Configure anonymisation rules

DocuDesk's anonymisation pipeline detects entities (PERSON, ADDRESS, BSN, PHONE_NUMBER, EMAIL, …) through the configured backend (Presidio by default). The default rule set fits most Dutch government use cases, but you'll usually want to tune it for your domain — adding custom entity types, raising the confidence threshold for noisy fields, or excluding entities that aren't sensitive in your context.
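The tuning described above, excluding entity types and raising thresholds for noisy ones, amounts to a filter applied to the backend's detections. The sketch below illustrates that idea. It is not DocuDesk's code: the `Detection` shape, the `apply_rules` helper and the 0.5 fallback threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical detection row: entity type, character offsets, backend confidence score."""
    entity_type: str
    start: int
    end: int
    score: float


def apply_rules(detections, enabled, thresholds, default_threshold=0.5):
    """Keep detections whose type is enabled and whose score meets that type's threshold.

    `enabled` is a set of entity types; `thresholds` maps entity type -> minimum score.
    Types missing from `thresholds` fall back to `default_threshold` (an assumed value).
    """
    kept = []
    for d in detections:
        if d.entity_type not in enabled:
            continue  # entity type excluded because it is not sensitive in this domain
        if d.score < thresholds.get(d.entity_type, default_threshold):
            continue  # too uncertain for this entity type
        kept.append(d)
    return kept


detections = [
    Detection("PERSON", 0, 10, 0.85),
    Detection("PHONE_NUMBER", 20, 30, 0.40),
    Detection("EMAIL", 40, 58, 1.0),
    Detection("ADDRESS", 60, 69, 0.9),
]
kept = apply_rules(
    detections,
    enabled={"PERSON", "PHONE_NUMBER", "EMAIL"},  # ADDRESS excluded for this domain
    thresholds={"PHONE_NUMBER": 0.6},  # phone numbers are noisy in this corpus
)
print([d.entity_type for d in kept])  # ['PERSON', 'EMAIL']
```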

Goal

By the end you will have reviewed the active anonymisation rule set, added one custom rule, and verified that the rule fires when you re-run an anonymisation.

Prerequisites

  • You are an administrator on the Nextcloud instance, or your user has been granted the DocuDesk Anonymisation curator role.
  • The Presidio backend is reachable from Nextcloud and the connection has been tested at least once (Settings → DocuDesk → Anonymisation → Test connection).
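To check the second prerequisite from the Nextcloud host itself, you can probe the analyzer directly. This sketch assumes you run Presidio's analyzer REST service, which exposes a `/health` route. The base URL you pass in is your own deployment's address, and `presidio_reachable` is a hypothetical helper. A successful probe only shows that the analyzer answers; DocuDesk's Test connection button may check more than this.

```python
import urllib.error
import urllib.request


def presidio_reachable(base_url, timeout=5.0):
    """Return True if GET <base_url>/health answers with HTTP 200, otherwise False."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Refused connections, DNS failures, timeouts and HTTP error statuses
        # (HTTPError is a subclass of URLError) all count as "not reachable".
        return False
```

Call it with the analyzer URL configured in DocuDesk, for example `presidio_reachable("http://<analyzer-host>:<port>")`, from a shell on the Nextcloud server so the network path matches the one DocuDesk uses.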

Steps

  1. Open Settings → DocuDesk → Anonymisation. The active rule set is listed in three sections: Entities to detect, Confidence thresholds and Custom recognisers.

    Anonymisation settings overview

  2. Review Entities to detect. Each entity type has an Enabled toggle and a Replacement strategy ([ENTITY_TYPE], ***, or a custom string). Disable the types that are not sensitive in your domain and choose a replacement strategy for each type you keep.

    Entity-type toggles

  3. Under Custom recognisers, click Add custom recogniser. Declare the entity type name, the regex pattern that matches it (Presidio uses Python re-style regex), and the replacement strategy. Save.

    Adding a custom recogniser

  4. Verify by running anonymisation against a document that contains a string matching your new regex. The detection table should list a row of your custom entity type at the position of the match.

    Custom recogniser firing
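Steps 3 and 4 can be rehearsed offline before you touch the settings page. The sketch uses Python's standard `re` module as a stand-in for Presidio's own regex handling, so a pass here is a strong hint rather than proof. It first compiles the pattern, which catches the invalid-regex errors the backend would reject. It then lists matches the way the detection table would. The `CASE_NUMBER` entity type and the `ZAAK-` pattern are hypothetical examples.

```python
import re


def compile_recogniser(pattern):
    """Compile the pattern, raising ValueError with the regex error if it is invalid."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        raise ValueError(f"invalid regex {pattern!r}: {exc}") from exc


def detect(text, entity_type, pattern):
    """Return detection rows (entity_type, start, end, matched text) for every match.

    finditer scans the whole text, so the pattern matches substrings without
    needing a leading or trailing .*
    """
    regex = compile_recogniser(pattern)
    return [(entity_type, m.start(), m.end(), m.group()) for m in regex.finditer(text)]


# Hypothetical custom entity: a case reference such as ZAAK-123456.
text = "Regarding case ZAAK-123456, see also ZAAK-654321."
rows = detect(text, "CASE_NUMBER", r"\bZAAK-\d{6}\b")
for row in rows:
    print(row)
# ('CASE_NUMBER', 15, 26, 'ZAAK-123456')
# ('CASE_NUMBER', 37, 48, 'ZAAK-654321')
```

If `detect` returns no rows for a sample that should match, fix the pattern here before saving it as a recogniser.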

Verification

You are done when: the custom recogniser appears in the Custom recognisers list, an anonymisation run against a test document produces a detection row of your custom entity type, and the resulting redacted document has the match replaced with your declared replacement.
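The last criterion, that the match is replaced with your declared replacement, can be expressed as a check on the redacted text. This sketch applies the three replacement strategies from step 2 to detected spans and then tests the result. The strategy keys (`entity`, `mask`, `custom`), the helper names and the `CASE_NUMBER` pattern are hypothetical; DocuDesk performs the real redaction itself.

```python
import re


def redact(text, spans, strategy, entity_type, custom=""):
    """Replace each (start, end) span using one replacement strategy.

    strategy: "entity" -> [ENTITY_TYPE], "mask" -> ***, "custom" -> the custom string.
    Spans are replaced right to left so earlier offsets stay valid.
    """
    replacements = {"entity": f"[{entity_type}]", "mask": "***", "custom": custom}
    replacement = replacements[strategy]
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text


def check_redaction(redacted, pattern, replacement, expected_count):
    """True if no match survives and every match became the declared replacement."""
    return pattern.search(redacted) is None and redacted.count(replacement) == expected_count


original = "Regarding case ZAAK-123456, see also ZAAK-654321."
pattern = re.compile(r"\bZAAK-\d{6}\b")  # hypothetical custom recogniser
spans = [m.span() for m in pattern.finditer(original)]
redacted = redact(original, spans, "entity", "CASE_NUMBER")
print(redacted)  # Regarding case [CASE_NUMBER], see also [CASE_NUMBER].
print(check_redaction(redacted, pattern, "[CASE_NUMBER]", len(spans)))  # True
```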

Common issues

Symptom | Fix
Custom recogniser saves but never fires | The regex pattern doesn't match the text. Test it against a sample string first. Presidio searches for matches anywhere in the text, so you don't need to wrap the pattern in .*; doing so widens the redacted span to the surrounding text.
The detection table lists the same span twice (one default row, one custom row) | Two recognisers overlap. Either disable the overlapping default entity type or raise its confidence threshold so your custom one wins.
Connection test fails after the rule change | The Presidio backend rejected the rule payload, most likely because of an invalid regex. Check Settings → DocuDesk → Logs for the Presidio response.
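To see why a higher score or a higher threshold on the default entity resolves a double detection, consider a simple highest-score-wins pass over overlapping rows. This is an illustration only. The `Detection` shape, the scores and the tie-break on span length are hypothetical, and DocuDesk or Presidio may resolve conflicts differently.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    entity_type: str
    start: int
    end: int
    score: float


def overlaps(a, b):
    """True if the half-open spans [start, end) of a and b share at least one character."""
    return a.start < b.end and b.start < a.end


def resolve_overlaps(detections):
    """Keep the highest-scoring detection among overlapping ones (ties: longer span wins)."""
    ranked = sorted(detections, key=lambda d: (d.score, d.end - d.start), reverse=True)
    kept = []
    for d in ranked:
        if not any(overlaps(d, k) for k in kept):
            kept.append(d)
    return sorted(kept, key=lambda d: d.start)


rows = [
    Detection("PHONE_NUMBER", 15, 26, 0.4),  # default recogniser, low score on this span
    Detection("CASE_NUMBER", 15, 26, 0.85),  # custom recogniser on the same span
    Detection("EMAIL", 40, 60, 1.0),
]
print([d.entity_type for d in resolve_overlaps(rows)])  # ['CASE_NUMBER', 'EMAIL']
```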

Reference