gtMay16

Pipeline-Centric UI: All operations moved to Pipeline tab
Domain
Recreation
Backend
Elasticsearch
LLM
gpt-4o-mini
text-embedding-3-small
Features
Spatial, Incremental

class_schedule

Unstructured Sources (1)

Structured Data Sources (1)
Filters (10 defined)

🚪 Pipeline Stage Gates

🎮 Pipeline Controls

⚙️ Advanced Options

Systematic Foundation Discovery (Step 0)

RootFinder analyzes your document corpus to discover foundational types, tier assignments, and protected documents before extraction begins. This ensures proper ontology alignment.

Status
Not Started
No foundation analysis performed
Outputs
No outputs generated
GenClair Zone Discovery (Step 1)

GenClair analyzes your document corpus to discover knowledge zones using embedding clustering and domain-fit validation. This creates zone boundaries for focused extraction.
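
The two ideas named above, embedding clustering followed by domain-fit validation, can be illustrated with a minimal sketch. This is not GenClair's actual algorithm, which the page does not describe. It assumes a simple greedy (leader-style) clustering on cosine similarity, and the function names, thresholds, and toy vectors are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def discover_zones(embeddings, domain_vector, join_threshold=0.8, fit_threshold=0.5):
    """Greedy clustering, then a domain-fit filter (both thresholds are assumptions)."""
    zone_docs = []     # document ids per zone
    zone_vectors = []  # member vectors per zone, parallel to zone_docs
    for doc_id, vec in embeddings.items():
        best, best_sim = None, join_threshold
        for i, members in enumerate(zone_vectors):
            sim = cosine(vec, centroid(members))
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            # No existing zone is similar enough, so this document starts a new one.
            zone_docs.append([doc_id])
            zone_vectors.append([vec])
        else:
            zone_docs[best].append(doc_id)
            zone_vectors[best].append(vec)
    zones = []
    for docs, members in zip(zone_docs, zone_vectors):
        fit = cosine(centroid(members), domain_vector)
        if fit >= fit_threshold:  # keep only zones that fit the domain
            zones.append({"docs": docs, "domain_fit": fit})
    return zones
```

With two toy documents pointing in nearly the same direction as the domain vector and one orthogonal to it, the sketch keeps a single two-document zone and drops the outlier.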

Status
Not Started
No zone discovery performed
Zones Discovered
0
No documents selected
Domain Profile: recreation
Entity Types (8)
Program: A class, lesson, course, or structured activity
Facility: A building, room, pool, court, or gym space
Policy: A rule, requirement, restriction, or access policy
Fee: A price, membership fee, or registration cost
Contact: A phone number, email, office, or staff person
Person: A person type or group (member, guest, minor, senior, alumni
Amenity: Equipment, gear, or service available at a facility
Schedule: Operating hours, class times, or semester dates
Relationships (12)
COSTS: Entity has a fee or price
LOCATED_IN: Entity is at a location or facility
OFFERS: Facility or organization offers a program or servi
OFFERS_ACTIVITY: Entity offers a specific activity
ELIGIBLE_FOR: Person type is eligible for a program or facility
GOVERNED_BY: Entity is subject to a policy or rule
REQUIRES: Entity requires a prerequisite, certification, or
CONTACTS_VIA: Entity can be reached through a contact method
CONTAINS: Facility contains a sub-facility or amenity
HAS_AMENITY: Facility has an amenity or feature
RESTRICTS: Policy restricts or prohibits an action
HAS_SCHEDULE: Entity operates on a schedule or time period
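
The entity types and relationships listed above amount to a small schema. As a rough sketch of how extracted triplets might be checked against it, the following uses only the names shown in the profile. The `Entity`, `Triplet`, and `validate` names are hypothetical, and the pipeline's real validation logic is not shown on this page.

```python
from dataclasses import dataclass

# Names copied from the domain profile above.
ENTITY_TYPES = {
    "Program", "Facility", "Policy", "Fee",
    "Contact", "Person", "Amenity", "Schedule",
}
RELATIONSHIPS = {
    "COSTS", "LOCATED_IN", "OFFERS", "OFFERS_ACTIVITY",
    "ELIGIBLE_FOR", "GOVERNED_BY", "REQUIRES", "CONTACTS_VIA",
    "CONTAINS", "HAS_AMENITY", "RESTRICTS", "HAS_SCHEDULE",
}

@dataclass(frozen=True)
class Entity:
    name: str
    type: str

@dataclass(frozen=True)
class Triplet:
    subject: Entity
    relation: str
    obj: Entity

def validate(triplet):
    """Return a list of problems; an empty list means the triplet fits the profile."""
    problems = []
    for entity in (triplet.subject, triplet.obj):
        if entity.type not in ENTITY_TYPES:
            problems.append(f"unknown entity type: {entity.type}")
    if triplet.relation not in RELATIONSHIPS:
        problems.append(f"unknown relationship: {triplet.relation}")
    return problems
```

A triplet such as (Facility, HAS_AMENITY, Amenity) passes, while one using a type outside the eight listed, for example `Park`, is reported as a problem.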
Enrichment
Generate Comparison Chunks
Generate Eligibility Matrix
Generate Restriction Summary
Generate Policy Chains
Safety Critical
5 question templates
Guardrails
1 input check
2 output checks
1 data check
Step 0: Scrape (Pending)
Site Scrape

No scraped pages yet.

Class Scrape

No class data yet.

Scraper: https://mycrc.gatech.edu (Innosoft)
Step 1: Extract (Pending)

No extraction data yet. Run pipeline Step 1 (extract).

Step 2: Build (Pending)

No build data yet. Run pipeline Step 2 (build).

Step 3: Enrich (Pending)

No enrichment data yet. Run pipeline Step 3 (enrich).

Step 4: Policies (Pending)

No policy data yet. Run pipeline Step 4 (policies).

Step 5: Lookups (Pending)

No lookup data yet. Run pipeline Step 5 (lookups).

Steps 6-8: Upload / Parse / Configure (Pending)

Not uploaded yet. Run pipeline Steps 6-8 (upload/parse/configure).


Document Registry

Total Tracked: -
New: -
Changed: -
File | Status | Processed | Steps
Entities & Relationships

Entity Roles

Core: -
Supporting: -
Noise: -
Uncertain: -
Domain - Core Types -
Entity | Type | Role | Score | Cluster | Compare | QA | Heal | Reason
Suppression Log

Entities blocked from one or more synthesis steps. All remain searchable via retrieval.

Entity | Type | Role | Blocked From | Reason
Test Questions
Test History
Latest Results

Pipeline Control Center

Manage pipeline execution, monitor progress, and control operations

Ready
Never
Pipeline Operations
Data Sources Configuration
Structured Data
class_schedule
Path: output_gtMay16/structured/classes.json
Type: Programs data
Source: Innosoft (https://mycrc.gatech.edu)
Unstructured Data
park_documents
Location: output_gtMay16/unstructured
Type: File_path
Entities: Park, Facility, Program, Activity, Organization, Person
RootFinder Setup
Ready
Foundation discovery
393 total documents
1 structured (classes.json)
392 unstructured (.txt files)
Est. ~30 seconds
Outputs: 393 doc analysis
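
The document totals above (1 structured plus 392 unstructured, 393 in all) can be re-derived from disk with a short sketch. It assumes the two directories named in the Data Sources Configuration sit side by side under one output root, and that the unstructured files are `.txt` files directly inside their folder. `count_documents` is a hypothetical helper, not part of the pipeline.

```python
from pathlib import Path

def count_documents(root):
    """Count structured (.json) and unstructured (.txt) files under an output root.

    Assumes a flat layout: <root>/structured/*.json and <root>/unstructured/*.txt.
    """
    root = Path(root)
    structured = len(list((root / "structured").glob("*.json")))
    unstructured = len(list((root / "unstructured").glob("*.txt")))
    return {
        "structured": structured,
        "unstructured": unstructured,
        "total": structured + unstructured,
    }
```

Calling `count_documents("output_gtMay16")` should reproduce the figures in this panel if the layout assumption holds.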
GenClair Discovery
Ready
Zone boundaries
RootFinder outputs ready
Est. ~2-3 minutes
Cost: ~$0.15 embeddings
Outputs: Zone manifest
Extract Data
Waiting
Extract entities & relationships
Needs GenClair zones
Est. ~5-10 minutes
Cost: ~$4.20 LLM calls
Outputs: Entity triplets
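
The estimates for the three stages above can be added up with a small sketch. The figures come from the panels. RootFinder lists no cost, so it is treated as zero here, which is an assumption. Costs are kept in integer cents to avoid floating-point rounding.

```python
# (stage name, (min seconds, max seconds), estimated cost in cents)
STAGES = [
    ("RootFinder Setup", (30, 30), 0),        # no cost listed; assumed zero
    ("GenClair Discovery", (120, 180), 15),   # ~2-3 minutes, ~$0.15
    ("Extract Data", (300, 600), 420),        # ~5-10 minutes, ~$4.20
]

def summarize(stages):
    """Return (min_seconds, max_seconds, total_cents) across all stages."""
    low = sum(duration[0] for _, duration, _ in stages)
    high = sum(duration[1] for _, duration, _ in stages)
    cents = sum(cost for _, _, cost in stages)
    return low, high, cents

low, high, cents = summarize(STAGES)
print(f"Estimated time: {low / 60:.1f}-{high / 60:.1f} minutes")
print(f"Estimated cost: ${cents / 100:.2f}")
```

Under these figures, the run from foundation discovery through extraction is estimated at roughly 7.5 to 13.5 minutes and about $4.35.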
Pipeline Flow & Status
Overall Progress: 0% Complete
Foundation & Discovery
RootFinder (Steps 0.10-0.55):
Source Intake: Pending
Doc Identity: Pending
Ontology Input: Pending
Tier Assignment: Pending
Schema Discovery: Pending
Roots Lock: Pending
GenClair: Pending
Extract: Pending
Promote: Pending
Build & Validate
Canonical: Pending
Build: Pending
Validate Facts: Pending
Enrich: Pending
Domain & Policies
Domain Enrich: Pending
Policies: Pending
Lookups: Pending
Upload: Pending
Index & Test
Index ES: Pending
Auto Tests: Pending
Test: Pending
Evaluate: Pending
Healing
Grader: Pending
Diagnosis: Pending
Heal: Pending
Execution Console
[System] Pipeline console ready. Select an operation to begin.
Run Eval & Patch
ES backend: no Chat ID or Dataset ID needed. Eval uses config settings automatically.
Admin Review Queue

No pending review items.

Scheduled Operations
ES backend: IDs not needed
Cron Setup

Add this line to your crontab (crontab -e):

0 */12 * * * cd /path/to/project && python pipeline/scheduled_ops.py --config clients/gtMay16/config.json
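
In the crontab line above, the schedule `0 */12 * * *` means minute 0 of every hour divisible by 12, that is, 00:00 and 12:00 each day. As an illustration only, the sketch below lists upcoming run times for that one schedule. It is not a general cron parser, it uses naive local datetimes with no daylight-saving handling, and `next_runs` is a hypothetical helper.

```python
from datetime import datetime, timedelta

RUN_HOURS = (0, 12)  # hours matched by "*/12" in the hour field

def next_runs(after, count=3):
    """Return the next `count` run times strictly after `after` for '0 */12 * * *'."""
    runs = []
    # Start at the first top-of-hour boundary strictly after `after`.
    candidate = after.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    while len(runs) < count:
        if candidate.hour in RUN_HOURS:
            runs.append(candidate)
        candidate += timedelta(hours=1)
    return runs
```

For example, starting from 09:30 on 2024-05-16 (an arbitrary date), the next runs are 12:00 that day, then 00:00 and 12:00 on 2024-05-17.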
Pipeline Log

No pipeline log found.

Query Traces (0)

No query traces found. Traces are captured when questions are routed through the RetrievalRouter.

Repair Ledger (0)

No repair ledger entries. Run Eval & Patch to populate.

System Health
Repair Stats
Total repairs: 0
Positive: 0
Negative: 0
Neutral: 0
Query Performance
Traces captured: 0
Avg latency: 0 ms
Data Freshness
class_schedule: unknown
No risk report available. Run the pipeline to generate failure predictions.