gdpr

Pipeline-Centric UI: All operations moved to Pipeline tab
Domain: Legal
Backend: Elasticsearch, triplets-gdpr
LLM: gpt-4o, text-embedding-3-small
Features: Incremental


No structured data sources configured

Unstructured Sources 0
No unstructured data sources configured

Structured Data Sources 0

🚪 Pipeline Stage Gates

🎮 Pipeline Controls

⚙️ Advanced Options

Systematic Foundation Discovery (Step 0)

RootFinder analyzes your document corpus to discover foundational types, tier assignments, and protected documents before extraction begins. This ensures proper ontology alignment.

Status
Not Started
No foundation analysis performed
Outputs
No outputs generated
GenClair Zone Discovery (Step 1)

GenClair analyzes your document corpus to discover knowledge zones using embedding clustering and domain-fit validation. This creates zone boundaries for focused extraction.
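The zone discovery described above can be illustrated with a minimal sketch. It assumes a simple greedy cosine-similarity clustering followed by a centroid-versus-domain check; GenClair's actual algorithm is not documented here, and the function names, thresholds, file names, and toy 2-D vectors are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def discover_zones(embeddings, threshold=0.8):
    """Greedy clustering (an assumption, not GenClair's documented method):
    each document joins the most similar existing zone if the similarity
    reaches the threshold, otherwise it starts a new zone."""
    zones = []  # each zone: {"centroid": [...], "members": [doc_id, ...]}
    for doc_id, vec in embeddings.items():
        best, best_sim = None, threshold
        for zone in zones:
            sim = cosine(vec, zone["centroid"])
            if sim >= best_sim:
                best, best_sim = zone, sim
        if best is None:
            zones.append({"centroid": list(vec), "members": [doc_id]})
        else:
            # Update the running mean of the zone's member vectors.
            n = len(best["members"])
            best["centroid"] = [(c * n + v) / (n + 1)
                                for c, v in zip(best["centroid"], vec)]
            best["members"].append(doc_id)
    return zones

def domain_fit(zone, domain_vec, min_fit=0.3):
    """Hypothetical domain-fit check: is the zone centroid close enough
    to a vector representing the domain profile?"""
    return cosine(zone["centroid"], domain_vec) >= min_fit

# Toy 2-D embeddings with hypothetical file names.
embeddings = {
    "art_5.txt": [1.0, 0.0],
    "art_6.txt": [0.9, 0.1],
    "cookie_banner.txt": [0.0, 1.0],
}
zones = discover_zones(embeddings)
legal_domain = [1.0, 0.2]  # hypothetical domain vector
fits = [domain_fit(z, legal_domain) for z in zones]
```

In this toy run the two article files share a zone that passes the domain-fit check, while the third file forms its own zone and fails it. Real embeddings from text-embedding-3-small have far more dimensions, so the thresholds would need tuning.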

Status
Not Started
No zone discovery performed
Zones Discovered
0
No documents selected
Domain Profile: legal
Entity Types (8)
Article: A numbered article or section of a regulation or statute
Definition: A legally defined term from a regulation
Right: A right granted to individuals by a regulation
Obligation: A duty or requirement imposed by a regulation
PolicySection: A section of a company privacy policy or terms of service
Penalty: A fine, sanction, or enforcement action
Exception: An exemption, derogation, or exception to a rule
Organization: A company, agency, or regulatory body
Relationships (11)
REFERENCES: Article or clause references another article
DEFINES: Article defines a term used elsewhere
GRANTS_RIGHT: Article grants a right to data subjects or individ
IMPOSES_OBLIGATION: Article imposes an obligation on controllers, proc
PROVIDES_EXCEPTION: Article provides an exception to a requirement
PENALIZES: Violation of this article results in a penalty
APPLIES_TO: Requirement applies to an entity type (controller,
SUPERSEDES: New regulation supersedes an old one
REQUIRES_CONSENT: Processing requires consent under this article
COMPLIES_WITH: Policy section addresses or complies with a regula
CONFLICTS_WITH: Policy section conflicts with or does not meet a r
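The entity types and relationships listed above can be treated as a closed vocabulary for extracted triplets. The sketch below shows one way to check a triplet against that vocabulary. The dictionary layout and the `validate_triplet` helper are hypothetical; the profile does not state which entity types each relationship may connect, so only membership is checked.

```python
# Names copied from the domain profile above.
ENTITY_TYPES = {
    "Article", "Definition", "Right", "Obligation",
    "PolicySection", "Penalty", "Exception", "Organization",
}
RELATIONSHIPS = {
    "REFERENCES", "DEFINES", "GRANTS_RIGHT", "IMPOSES_OBLIGATION",
    "PROVIDES_EXCEPTION", "PENALIZES", "APPLIES_TO", "SUPERSEDES",
    "REQUIRES_CONSENT", "COMPLIES_WITH", "CONFLICTS_WITH",
}

def validate_triplet(triplet):
    """Return a list of problems; an empty list means the triplet
    uses only types and relationships known to the profile."""
    problems = []
    if triplet["subject_type"] not in ENTITY_TYPES:
        problems.append(f"unknown subject type: {triplet['subject_type']}")
    if triplet["object_type"] not in ENTITY_TYPES:
        problems.append(f"unknown object type: {triplet['object_type']}")
    if triplet["relation"] not in RELATIONSHIPS:
        problems.append(f"unknown relation: {triplet['relation']}")
    return problems

# Hypothetical triplet layout, for illustration only.
good = {
    "subject": "Article 17", "subject_type": "Article",
    "relation": "GRANTS_RIGHT",
    "object": "Right to erasure", "object_type": "Right",
}
bad = dict(good, relation="MENTIONS")

good_problems = validate_triplet(good)
bad_problems = validate_triplet(bad)
```

Here `good_problems` is empty and `bad_problems` reports the unknown `MENTIONS` relation.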
Enrichment
Generate Comparison Chunks
Generate Eligibility Matrix
Generate Restriction Summary
Generate Policy Chains
Safety Critical
5 question templates
Guardrails
1 input check
3 output checks
1 data check
Step 0: Scrape (Pending)
Site Scrape

No scraped pages yet.

Input Dir: /Users/hollyhaughney/Downloads/RIDBFullExport_V1_JSON/output_gdpr/docs
Class Scrape

No class data yet.

Step 1: Extract (Pending)

No extraction data yet. Run pipeline Step 1 (extract).

Step 2: Build (Pending)

No build data yet. Run pipeline Step 2 (build).

Step 3: Enrich (Pending)

No enrichment data yet. Run pipeline Step 3 (enrich).

Step 4: Policies (Pending)

No policy data yet. Run pipeline Step 4 (policies).

Step 5: Lookups (Pending)

No lookup data yet. Run pipeline Step 5 (lookups).

Steps 6-8: Upload / Parse / Configure (Pending)

Not uploaded yet. Run pipeline Steps 6-8 (upload/parse/configure).


Document Registry

Total Tracked: -
New: -
Changed: -
File | Status | Processed | Steps
Entities & Relationships

Entity Roles

Core: -
Supporting: -
Noise: -
Uncertain: -
Domain: -
Core Types: -
Entity | Type | Role | Score | Cluster | Compare | QA | Heal | Reason
Suppression Log

Entities blocked from one or more synthesis steps. All remain searchable via retrieval.

Entity | Type | Role | Blocked From | Reason
Test Questions
Test History
Latest Results

Pipeline Control Center

Manage pipeline execution, monitor progress, and control operations

Ready
Never
Pipeline Operations
Data Sources Configuration
Structured Data
No structured data sources configured
Unstructured Data
No unstructured data sources configured
RootFinder Setup
Ready
Foundation discovery
393 total documents
1 structured (classes.json)
392 unstructured (.txt files)
Est. ~30 seconds
Outputs: 393 doc analysis
GenClair Discovery
Ready
Zone boundaries
RootFinder outputs ready
Est. ~2-3 minutes
Cost: ~$0.15 embeddings
Outputs: Zone manifest
Extract Data
Waiting
Extract entities & relationships
Needs GenClair zones
Est. ~5-10 minutes
Cost: ~$4.20 LLM calls
Outputs: Entity triplets
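The per-stage estimates above can be added up to get a rough budget for the first three stages. This is only a sketch: the figures are copied from the cards above, the dictionary layout is hypothetical, and RootFinder Setup lists no cost, so it is treated as zero here.

```python
from decimal import Decimal

# (min_seconds, max_seconds) and cost in USD, taken from the stage cards.
# ASSUMPTION: RootFinder Setup shows no cost, so it is counted as 0.
STAGES = {
    "RootFinder Setup": {"seconds": (30, 30), "cost": Decimal("0.00")},
    "GenClair Discovery": {"seconds": (120, 180), "cost": Decimal("0.15")},
    "Extract Data": {"seconds": (300, 600), "cost": Decimal("4.20")},
}

min_seconds = sum(s["seconds"][0] for s in STAGES.values())
max_seconds = sum(s["seconds"][1] for s in STAGES.values())
total_cost = sum((s["cost"] for s in STAGES.values()), Decimal("0"))

print(f"Estimated time: {min_seconds / 60:.1f}-{max_seconds / 60:.1f} minutes")
print(f"Estimated cost: ${total_cost}")
```

Under these assumptions the three stages take roughly 7.5 to 13.5 minutes and cost about $4.35 in embedding and LLM calls.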
Pipeline Flow & Status
Overall Progress 0% Complete
Foundation & Discovery
RootFinder (Steps 0.10-0.55):
Source Intake
Pending
Doc Identity
Pending
Ontology Input
Pending
Tier Assignment
Pending
Schema Discovery
Pending
Roots Lock
Pending
GenClair
Pending
Extract
Pending
Promote
Pending
Build & Validate
Canonical
Pending
Build
Pending
Validate Facts
Pending
Enrich
Pending
Domain & Policies
Domain Enrich
Pending
Policies
Pending
Lookups
Pending
Upload
Pending
Index & Test
Index ES
Pending
Auto Tests
Pending
Test
Pending
Evaluate
Pending
Healing
Grader
Pending
Diagnosis
Pending
Heal
Pending
Execution Console
[System] Pipeline console ready. Select an operation to begin.
Run Eval & Patch
ES backend: no Chat ID or Dataset ID needed. Eval uses config settings automatically.
Admin Review Queue

No pending review items.

Scheduled Operations
ES backend: IDs not needed
Cron Setup

Add this line to your crontab (crontab -e):

0 */12 * * * cd /path/to/project && python pipeline/scheduled_ops.py --config clients/gdpr/config.json
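The schedule `0 */12 * * *` fires at minute 0 of every twelfth hour, which means twice a day. The small helper below expands the minute and hour fields to show the resulting times. It is a hypothetical illustration that handles only `*`, `*/n`, and plain numbers, and it ignores the day, month, and weekday fields, which are all `*` in this crontab line.

```python
def expand_field(field, lo, hi):
    """Expand one cron field into the list of values it matches.
    Supports only '*', '*/n', and a single number."""
    if field == "*":
        return list(range(lo, hi + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(lo, hi + 1, step))
    return [int(field)]

def daily_run_times(expr):
    """Return the HH:MM times at which the expression fires each day."""
    minute, hour, *_ = expr.split()
    return [
        f"{h:02d}:{m:02d}"
        for h in expand_field(hour, 0, 23)
        for m in expand_field(minute, 0, 59)
    ]

run_times = daily_run_times("0 */12 * * *")
print(run_times)  # ['00:00', '12:00']
```

The times are interpreted in whatever time zone the cron daemon uses on the host.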
Pipeline Log

No pipeline log found.

Query Traces 0

No query traces found. Traces are captured when questions are routed through the RetrievalRouter.

Repair Ledger 0

No repair ledger entries. Run Eval & Patch to populate.

System Health
Repair Stats
Total repairs: 0
Positive: 0
Negative: 0
Neutral: 0
Query Performance
Traces captured: 0
Avg latency: 0 ms
Data Freshness

No structured data sources loaded.

No risk report available. Run the pipeline to generate failure predictions.