Frequently asked questions about our approach to PHI de-identification.
We offer enterprise-grade de-identification accuracy (95% F1 score) at a developer-friendly price point, plus features that others charge extra for or don't offer at all:
Private AI is excellent: it supports 50+ entity types and 52 languages, with strong accuracy. It is a great choice for enterprises with complex requirements.
We differentiate on:

| Feature | Private AI | RedactiPHI |
|---|---|---|
| Setup Time | Hours to days (on-prem containers) | 5 minutes (managed API) |
| Pricing Model | $10k+ (reports of $50k+ at scale) | Free tier + transparent pricing |
| Developer Dashboard | No | Yes |
| Joinable Tokens | Unclear | Yes - deterministic per subject |
| Re-identification API | Limited | Full API |
| Webhooks | No | Built-in |
| Cryptographic Receipts | No | Yes |
| Languages | 52 languages | English (multi-language planned) |

Choose Private AI if: You need 50+ languages, have dedicated DevOps for container deployment, or have an enterprise procurement budget.

Choose RedactiPHI if: You want to start immediately, need joinable tokens for analytics, want transparent pricing, or need re-identification for LLM workflows.
John Snow Labs leads the industry on accuracy (96-98% F1), and its Healthcare NLP library is powerful and highly customizable.
The tradeoffs:

| Feature | John Snow Labs | RedactiPHI |
|---|---|---|
| Accuracy (F1) | 96-98% | 95% |
| Setup Complexity | Spark cluster required | pip install + API key |
| Pricing | $1.86-$253/hr (AWS), per-server license | Free tier, $0.04/doc |
| Interface | Python library (Spark) | REST API + SDK + Dashboard |
| Re-identification | Build your own | Built-in API |
| Customization | Highly flexible | Policy-based |
| Bulk Processing | Spark-native (massive scale) | API batch endpoint |

Choose John Snow Labs if: You need the absolute highest accuracy, have Spark infrastructure, need deep customization, or are processing billions of records.

Choose RedactiPHI if: You want competitive accuracy without managing Spark, need a simple REST API, want built-in re-identification, or have moderate volume (<1M docs/month).
Cloud provider APIs are convenient but have limitations:
We're cloud-agnostic, offer re-identification, and provide joinable tokens out of the box.
HIPAA note: AWS and Azure services are "HIPAA-eligible", but you still have to sign a BAA and configure everything correctly. We're HIPAA-compliant out of the box, with a BAA available.
Microsoft Presidio is a solid framework, but it's exactly that: a framework you need to build on.
Open source is great for learning or highly custom needs. For production healthcare use, you are likely to spend more on DevOps than our subscription costs.
When we de-identify "John Smith" in document A, we create a token like [NAM_abc123]. When "John Smith" appears in document B (for the same subject), we emit the same token, so records for one subject can be joined across documents without exposing the name.
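One common way to get tokens that stay stable per subject is a keyed hash. The sketch below is an illustration only, not a description of how RedactiPHI generates tokens: the `joinable_token` function, the `SECRET_KEY` value, the normalization rule, and the 6-character suffix are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical secret; a real deployment would load this from a key manager.
SECRET_KEY = b"example-key-do-not-use"


def joinable_token(entity_type: str, value: str, subject_id: str) -> str:
    """Return a deterministic token for a (subject, entity type, value) triple."""
    # Normalize case and whitespace so "John Smith" and "john  smith" match.
    normalized = " ".join(value.lower().split())
    message = f"{subject_id}|{entity_type}|{normalized}".encode("utf-8")
    # HMAC keeps the token unguessable without the key, yet repeatable with it.
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return f"[{entity_type}_{digest[:6]}]"


# Same subject, two documents: the token is identical, so rows can be joined.
token_doc_a = joinable_token("NAM", "John Smith", subject_id="subject-1")
token_doc_b = joinable_token("NAM", "john  smith", subject_id="subject-1")
# A different subject with the same name gets a different token.
token_other = joinable_token("NAM", "John Smith", subject_id="subject-2")
```

Including the subject ID in the hashed message is what scopes the token to one subject; dropping it would make every "John Smith" collide.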
This is critical for:
Re-identification restores the original values from tokens. The primary use case is LLM workflows:
The LLM never sees real PHI, but the final output is human-readable.
Access control: Re-identification requires the document ID and is audit-logged. You control who can re-identify.
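The round trip described above can be sketched in a few lines. This is a toy, in-memory illustration under stated assumptions: `deidentify`, `reidentify`, and `fake_llm` are hypothetical helpers (not the RedactiPHI SDK), PHI detection is stubbed out as a lookup table, and no access control or audit logging is modeled.

```python
import re

# Matches tokens shaped like [NAM_abc123].
TOKEN_PATTERN = re.compile(r"\[[A-Z]+_[0-9a-z]+\]")


def deidentify(text, phi_to_token):
    """Swap known PHI strings for tokens; return the text and a token->PHI map."""
    token_to_phi = {}
    for phi, token in phi_to_token.items():
        if phi in text:
            text = text.replace(phi, token)
            token_to_phi[token] = phi
    return text, token_to_phi


def reidentify(text, token_to_phi):
    """Restore original values; tokens missing from the map are left untouched."""
    return TOKEN_PATTERN.sub(
        lambda match: token_to_phi.get(match.group(0), match.group(0)), text
    )


def fake_llm(prompt):
    """Stand-in for a model call; it only ever receives tokenized text."""
    return f"Summary: {prompt}"


note = "John Smith reports chest pain."
safe_note, mapping = deidentify(note, {"John Smith": "[NAM_abc123]"})
llm_reply = fake_llm(safe_note)           # contains the token, not the name
final = reidentify(llm_reply, mapping)    # "Summary: John Smith reports chest pain."
```

The token-to-value map never leaves the trusted side, which is why the model output can be made human-readable again without the model ever seeing the name.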
Yes! Common use cases include:
Every de-identification produces a signed receipt containing:
This gives you tamper-evident proof for compliance audits: you can show what was processed, when, and that the output hasn't been modified since.
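To show how a signed receipt can make later modification detectable, here is a minimal sketch. The field names, the `make_receipt` and `verify_receipt` helpers, and the use of HMAC-SHA256 with a shared `SIGNING_KEY` are assumptions for illustration; the actual receipt format may differ, and a production service would more likely use an asymmetric signature so auditors can verify without holding the secret.

```python
import hashlib
import hmac
import json

# Hypothetical signing key, for illustration only.
SIGNING_KEY = b"example-signing-key"


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _sign(body):
    # Canonical JSON (sorted keys, fixed separators) so the signature is stable.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def make_receipt(document_id, original, redacted, timestamp):
    """Build a receipt that binds the input, the output, and the time together."""
    body = {
        "document_id": document_id,
        "timestamp": timestamp,
        "input_sha256": sha256_hex(original),
        "output_sha256": sha256_hex(redacted),
    }
    return {**body, "signature": _sign(body)}


def verify_receipt(receipt, redacted):
    """Check the signature and that the output still matches the recorded hash."""
    body = {key: value for key, value in receipt.items() if key != "signature"}
    signature_ok = hmac.compare_digest(_sign(body), receipt["signature"])
    return signature_ok and body["output_sha256"] == sha256_hex(redacted)


receipt = make_receipt(
    "doc-1", "John Smith, MRN 123", "[NAM_abc123], [MRN_def456]", "2024-01-01T00:00:00Z"
)
```

Storing only hashes in the receipt means the receipt itself carries no PHI, yet any change to the output or to the receipt fields causes verification to fail.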
We detect all 18 HIPAA Safe Harbor identifiers plus clinical extensions:
On our internal benchmark of real clinical notes:
We use a multi-engine approach: pattern matching, transformer-based NER, medical terminology filtering, and name detection heuristics. Results are reconciled with confidence scoring.
Note: The demo on our homepage uses only the pattern engine for speed. The full API uses all engines.
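As a rough picture of what reconciling results from several engines can look like, the sketch below keeps the highest-confidence detection wherever spans overlap. It is a simplified greedy strategy chosen for illustration, not the reconciliation logic the API actually uses; the `Detection` type, the `reconcile` function, the engine names, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    start: int         # inclusive character offset
    end: int           # exclusive character offset
    label: str
    confidence: float
    engine: str


def reconcile(detections, threshold=0.5):
    """Greedy merge: prefer higher confidence, drop overlapping weaker spans."""
    kept = []
    for candidate in sorted(detections, key=lambda d: d.confidence, reverse=True):
        if candidate.confidence < threshold:
            continue  # too uncertain to act on
        overlaps = any(
            candidate.start < other.end and other.start < candidate.end
            for other in kept
        )
        if not overlaps:
            kept.append(candidate)
    # Return in document order for downstream replacement.
    return sorted(kept, key=lambda d: d.start)


candidates = [
    Detection(0, 10, "NAME", 0.90, "ner"),
    Detection(0, 4, "NAME", 0.60, "heuristic"),   # overlaps the NER span
    Detection(10, 21, "SSN", 0.99, "pattern"),    # adjacent, not overlapping
    Detection(30, 35, "DATE", 0.30, "ner"),       # below the threshold
]
merged = reconcile(candidates)
```

Here `merged` contains the NER name span and the pattern-engine SSN span; the weaker overlapping heuristic hit and the low-confidence date are dropped.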
Currently supported:
Coming soon: