Component Runner API Reference

The Heritage Data Processor Component Runner API provides endpoints for executing pipeline components, monitoring their execution in real-time, and managing component lifecycles.

Base URL

All endpoints are prefixed with /components.

Component Execution

Run Pipeline Component

Initiates asynchronous execution of the specified pipeline component with the provided inputs and parameters.

Endpoint: POST /components/<component_name>/run

URL Parameters:

component_name (string, required): Name of the component to execute (e.g., xml_processor, data_validator)

Request Body:

{
  "inputs": {
    "input_file": "/path/to/source.xml",
    "config_file": "/path/to/config.yaml"
  },
  "parameters": {
    "validate": true,
    "output_format": "json",
    "verbose": true
  },
  "output_directory": "/path/to/output"
}

Request Parameters:

inputs (object, optional): Dictionary mapping input names to file paths. Keys must match the input definitions in the component specification
parameters (object, optional): Dictionary of runtime parameters. Keys must match parameter definitions in the component specification
output_directory (string, required): Absolute path to the directory where output files should be created
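The request body can be assembled on the client before sending. The sketch below is a hypothetical helper (buildRunRequest is not part of the API): it defaults the optional inputs and parameters to empty objects and rejects a missing output_directory, mirroring the documented 400 response.

```javascript
// HYPOTHETICAL client-side helper: builds the body for
// POST /components/<component_name>/run. Not part of the documented API.
function buildRunRequest({ inputs = {}, parameters = {}, outputDirectory } = {}) {
  // output_directory is the only required field; the server answers
  // 400 Bad Request ("Missing 'output_directory'") when it is absent.
  if (typeof outputDirectory !== "string" || outputDirectory.length === 0) {
    throw new Error("Missing 'output_directory'");
  }
  // inputs and parameters are optional; default them to empty objects
  // so the payload shape stays predictable.
  return {
    inputs: { ...inputs },
    parameters: { ...parameters },
    output_directory: outputDirectory,
  };
}

const body = buildRunRequest({
  inputs: { input_file: "/path/to/source.xml" },
  parameters: { validate: true },
  outputDirectory: "/path/to/output",
});
console.log(JSON.stringify(body));
```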

Response:

{
  "success": true,
  "execution_id": "550e8400-e29b-41d4-a716-446655440000",
  "message": "Component execution started",
  "command": "python main.py --input \"/path/to/source.xml\" --output \"/path/to/output/source_processed.xml\" --validate",
  "output_strategy": "single_file_output",
  "estimated_outputs": ["/path/to/output/source_processed.xml"]
}

Response Fields:

success (boolean): Always true for successful execution start
execution_id (string): Unique UUID identifying this execution instance. Use this ID to monitor, cancel, or query the execution
message (string): Human-readable confirmation message
command (string): The complete shell command that will be executed, with arguments properly quoted
output_strategy (string): The output strategy detected for this component (e.g., single_file_output, directory, single_file_output_file)
estimated_outputs (array): List of predicted output file paths based on component specification and output strategy

Status Codes:

202 Accepted: Component execution started successfully. The execution runs asynchronously
400 Bad Request: Invalid JSON payload or missing output_directory parameter
404 Not Found: Component not found, or component's Python executable does not exist
500 Internal Server Error: Unexpected error during execution setup
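A client usually starts a run and keeps only the execution_id. The following is a sketch, assuming a fetch-compatible function (the global fetch in browsers and Node 18+) and a caller-supplied baseUrl; startComponent is a hypothetical helper name. It treats 202 as success and surfaces the error field of any other response.

```javascript
// HYPOTHETICAL client helper for POST /components/<component_name>/run.
// ASSUMPTION: fetchImpl is fetch-compatible; baseUrl is where the API is mounted.
async function startComponent(baseUrl, componentName, body, fetchImpl = fetch) {
  const url = `${baseUrl}/components/${encodeURIComponent(componentName)}/run`;
  const response = await fetchImpl(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  const payload = await response.json();
  // 202 Accepted: the component is now running asynchronously.
  if (response.status === 202) {
    return payload.execution_id;
  }
  // 400, 404 and 500 responses carry an "error" field.
  throw new Error(`HTTP ${response.status}: ${payload.error}`);
}
```

Passing fetchImpl explicitly also makes the helper easy to exercise with a stub instead of a live server.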

Execution Process:

The endpoint performs the following steps:

Validates the request payload
Delegates command construction to the build_full_command utility function
Generates a unique execution ID
Starts asynchronous execution via the component_executor service
Returns immediately with the execution ID
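The five steps above can be sketched as a single handler. The real server is Python; this is an illustrative JavaScript rendering in which buildFullCommand, startExecution and newId are hypothetical stand-ins for build_full_command, the component_executor service and UUID generation. The 404 message and the shape returned by buildFullCommand are assumptions, and 500 handling is omitted.

```javascript
// Illustrative sketch of the run endpoint's flow; all deps are HYPOTHETICAL stand-ins.
function handleRun(componentName, rawBody, deps) {
  // 1. Validate the request payload.
  let payload;
  try {
    payload = JSON.parse(rawBody);
  } catch {
    return { status: 400, body: { error: "Invalid JSON payload" } };
  }
  if (!payload || !payload.output_directory) {
    return { status: 400, body: { error: "Missing 'output_directory'" } };
  }
  // 2. Delegate command construction (stand-in for build_full_command).
  const built = deps.buildFullCommand(componentName, payload);
  if (!built) {
    return { status: 404, body: { error: "Component not found" } };
  }
  // 3. Generate a unique execution ID.
  const executionId = deps.newId();
  // 4. Start asynchronous execution without waiting for it to finish.
  deps.startExecution(executionId, built.command);
  // 5. Return immediately with the execution ID.
  return {
    status: 202,
    body: {
      success: true,
      execution_id: executionId,
      message: "Component execution started",
      command: built.command,
      output_strategy: built.outputStrategy,
      estimated_outputs: built.estimatedOutputs,
    },
  };
}
```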

Log Streaming

Stream Component Logs

Streams real-time execution logs using Server-Sent Events (SSE) for live monitoring of component execution.

Endpoint: GET /components/logs/<execution_id>

URL Parameters:

execution_id (string, required): UUID of the execution to monitor

Response Format:

Server-Sent Events stream with Content-Type text/event-stream

Event Format:

data: {"level": "info", "message": "Processing file: source.xml"}

data: {"level": "warning", "message": "Validation warning: missing optional field"}

data: {"level": "error", "message": "Processing failed", "status": "failed"}

Log Message Fields:

level (string): Log level (info, warning, error, debug)
message (string): Log message content
status (string, optional): Execution status when the execution completes, fails, or is cancelled. Values: completed, failed, cancelled

Stream Behavior:

The stream remains open and sends log messages as they are generated by the component. Heartbeat messages (: heartbeat\n\n) are sent periodically to keep the connection alive. The stream automatically closes when the execution reaches a terminal status (completed, failed, or cancelled).

Error Handling:

If the execution is not found, the stream sends a single error message and closes:

data: {"level": "error", "message": "Execution not found"}

Headers:

Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

Status Codes:

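200 OK: Stream established successfully. No other status codes are documented for this endpoint; an unknown execution ID is reported through the error event shown above, after which the stream closes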

Execution Control

Cancel Component Execution

Terminates a running component execution and releases associated resources.

Endpoint: POST /components/<execution_id>/cancel

URL Parameters:

execution_id (string, required): UUID of the execution to cancel

Response:

{
  "success": true,
  "message": "Execution cancelled"
}

Response Fields:

success (boolean): Always true when cancellation succeeds
message (string): Confirmation message

Error Response:

{
  "error": "Execution not found or already completed"
}

Status Codes:

200 OK: Execution cancelled successfully
404 Not Found: Execution does not exist or has already reached a terminal state (completed, failed, or cancelled)

Cancellation Behavior:

Cancellation terminates the underlying subprocess immediately. Any log streams monitoring this execution will receive a final log message with status: "cancelled" and close.

Immediate Termination

Component execution is terminated immediately without cleanup. Partial output files may exist in the output directory.
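Because a 404 covers both unknown and already-finished executions, a client can reasonably treat it as "nothing left to cancel" rather than as a failure. A minimal sketch, assuming a fetch-compatible fetchImpl; cancelExecution is a hypothetical helper name.

```javascript
// HYPOTHETICAL helper for POST /components/<execution_id>/cancel.
// Returns true when the execution was cancelled, false when it is unknown
// or already terminal (404). Any other status is reported as an error.
async function cancelExecution(baseUrl, executionId, fetchImpl = fetch) {
  const response = await fetchImpl(
    `${baseUrl}/components/${encodeURIComponent(executionId)}/cancel`,
    { method: "POST" }
  );
  if (response.status === 200) return true;
  if (response.status === 404) return false;
  const payload = await response.json();
  throw new Error(`HTTP ${response.status}: ${payload.error}`);
}
```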

Status Monitoring

Get Execution Status

Retrieves the current status and results of a specific component execution.

Endpoint: GET /components/executions/<execution_id>/status

URL Parameters:

execution_id (string, required): UUID of the execution to query

Response (Running):

{
  "status": "running"
}

Response (Completed):

{
  "status": "completed",
  "results": {
    "output_files": [
      "/path/to/output/source_processed.xml",
      "/path/to/output/metadata.json"
    ]
  }
}

Response (Failed):

{
  "status": "failed",
  "error": "Execution 550e8400-e29b-41d4-a716-446655440000 failed."
}

Response Fields:

status (string): Current execution status. Values: running, completed, failed, cancelled
results (object, optional): Only present when status is completed. Contains execution results
results.output_files (array): List of actual output file paths created by the component
error (string, optional): Only present when status is failed. Contains error description

Status Codes:

200 OK: Status retrieved successfully
404 Not Found: Execution with the specified ID does not exist

Output Files:

For completed executions, the output_files array contains the actual paths determined by the output strategy. These paths may differ from the estimated_outputs returned during execution start if the component produced additional or differently-named files.
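Where SSE is not available, the status endpoint can be polled until a terminal status appears. This is a sketch assuming a fetch-compatible fetchImpl; waitForCompletion is a hypothetical name and intervalMs is an arbitrary client choice, not a value defined by the API.

```javascript
// Statuses after which an execution can no longer change.
const FINAL_STATUSES = new Set(["completed", "failed", "cancelled"]);

// HYPOTHETICAL poller for GET /components/executions/<execution_id>/status.
async function waitForCompletion(baseUrl, executionId, { fetchImpl = fetch, intervalMs = 1000 } = {}) {
  const url = `${baseUrl}/components/executions/${encodeURIComponent(executionId)}/status`;
  for (;;) {
    const response = await fetchImpl(url);
    const payload = await response.json();
    if (response.status === 404) {
      throw new Error(payload.error); // "Execution not found"
    }
    if (FINAL_STATUSES.has(payload.status)) {
      // results.output_files when completed, error when failed
      return payload;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Remember that the returned output_files are authoritative; estimated_outputs from the run response are only predictions.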

Execution Lifecycle

Status Flow

Component executions progress through the following states:

Created: Execution initialized but not yet started (internal state)
Running: Component is actively executing
Completed: Execution finished successfully with output files available
Failed: Execution terminated with an error
Cancelled: Execution terminated by user request

Terminal States

Once an execution reaches a terminal state (completed, failed, or cancelled), it cannot be modified or restarted. Attempting to cancel an execution in a terminal state returns a 404 error.
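The lifecycle can be summarized as a transition table. This is an illustration derived from the state list above, not code from the service; in particular it assumes that created moves only to running, which the reference does not state explicitly.

```javascript
// Illustrative transition table; "created" is internal and never reported.
// ASSUMPTION: created -> running is the only way out of "created".
const ALLOWED_TRANSITIONS = {
  created: ["running"],
  running: ["completed", "failed", "cancelled"],
  completed: [],
  failed: [],
  cancelled: [],
};

// A state is terminal when nothing may follow it.
function isTerminalState(state) {
  return ALLOWED_TRANSITIONS[state]?.length === 0;
}

function canTransition(from, to) {
  return (ALLOWED_TRANSITIONS[from] ?? []).includes(to);
}
```

Under this model, a cancel request on a terminal execution corresponds to a forbidden transition, which the API reports as 404.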

Error Responses

All error responses follow a consistent format:

{
  "error": "Descriptive error message"
}

Common Error Scenarios

Invalid JSON Payload:

POST /components/xml_processor/run
Content-Type: application/json

{ invalid json }

Response: 400 Bad Request with error message "Invalid JSON payload"

Missing Output Directory:

{
  "inputs": {"file": "/path/to/input.xml"},
  "parameters": {}
}

Response: 400 Bad Request with error message "Missing 'output_directory'"

Component Not Found:

POST /components/nonexistent_component/run

Response: 404 Not Found with error message indicating the component's Python executable was not found

Execution Not Found:

GET /components/executions/invalid-uuid/status

Response: 404 Not Found with error message "Execution not found"

Usage Examples

Example 1: Execute Component with Inputs and Parameters

POST /components/xml_validator/run
Content-Type: application/json

{
  "inputs": {
    "input_file": "/data/manuscript.xml"
  },
  "parameters": {
    "schema": "tei_all",
    "strict_mode": true,
    "generate_report": true
  },
  "output_directory": "/output/validation"
}

Response:

{
  "success": true,
  "execution_id": "a1b2c3d4-e5f6-4789-a012-b3c4d5e6f789",
  "message": "Component execution started",
  "command": "python main.py --input_file \"/data/manuscript.xml\" --schema tei_all --strict_mode --generate_report --output \"/output/validation/manuscript_validated.xml\" --verbose",
  "output_strategy": "single_file_output",
  "estimated_outputs": ["/output/validation/manuscript_validated.xml"]
}

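Example 2: Monitor Execution with SSE

const eventSource = new EventSource(
  '/components/logs/a1b2c3d4-e5f6-4789-a012-b3c4d5e6f789'
);

eventSource.onmessage = (event) => {
  const log = JSON.parse(event.data);
  console.log(`[${log.level}] ${log.message}`);

  // "status" is only sent with terminal states (completed, failed, cancelled).
  // "Execution not found" carries no status, but the server closes the stream
  // after it; close here too so EventSource does not keep reconnecting.
  if (log.status || log.message === 'Execution not found') {
    console.log(`Execution ${log.status || 'not found'}`);
    eventSource.close();
  }
};

eventSource.onerror = () => {
  // EventSource reconnects on its own unless readyState is CLOSED.
  if (eventSource.readyState === EventSource.CLOSED) {
    console.error('Log stream closed');
  }
};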

Stream Output:

data: {"level": "info", "message": "Starting validation process"}

data: {"level": "info", "message": "Loading schema: tei_all"}

data: {"level": "warning", "message": "Non-standard element detected: <customTag>"}

data: {"level": "info", "message": "Validation completed successfully", "status": "completed"}

Example 3: Check Execution Status

GET /components/executions/a1b2c3d4-e5f6-4789-a012-b3c4d5e6f789/status

Response:

{
  "status": "completed",
  "results": {
    "output_files": [
      "/output/validation/manuscript_validated.xml",
      "/output/validation/validation_report.html"
    ]
  }
}

Example 4: Cancel Running Execution

POST /components/a1b2c3d4-e5f6-4789-a012-b3c4d5e6f789/cancel

Response:

{
  "success": true,
  "message": "Execution cancelled"
}

Integration with Component Service

Component Executor Service

The API delegates execution management to the component_executor service, which handles:

Asynchronous subprocess management
Log queue management for SSE streaming
Execution state tracking
Resource cleanup
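To make the state tracking and log queue concrete, here is a hypothetical in-memory model of what component_executor keeps per execution. It is not the service's code: subprocess management and cleanup are omitted, and the log level attached to the final message ("error" for failed, "info" otherwise) is an assumption.

```javascript
// HYPOTHETICAL model of per-execution state plus a queue of log messages
// waiting to be written to SSE clients.
class ExecutionRegistry {
  constructor() {
    this.executions = new Map();
  }

  create(executionId) {
    this.executions.set(executionId, { status: "running", logs: [] });
  }

  log(executionId, level, message) {
    this.executions.get(executionId)?.logs.push({ level, message });
  }

  // Moves a running execution to a terminal status; returns false if the
  // execution is unknown or already terminal (the cancel endpoint's 404 case).
  finish(executionId, status) {
    const execution = this.executions.get(executionId);
    if (!execution || execution.status !== "running") return false;
    execution.status = status;
    // ASSUMPTION: level is "error" for failed, "info" otherwise.
    const level = status === "failed" ? "error" : "info";
    // The final queued message carries the status so SSE readers can stop.
    execution.logs.push({ level, message: `Execution ${status}`, status });
    return true;
  }

  // Removes and returns all queued messages for one SSE write cycle.
  drain(executionId) {
    const execution = this.executions.get(executionId);
    if (!execution) return null;
    return execution.logs.splice(0, execution.logs.length);
  }
}
```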

Command Building

The API uses the build_full_command utility function to construct execution commands, which automatically:

Locates the component's Python executable
Introspects CLI patterns
Determines output strategies
Merges installation configurations with runtime parameters
Adds all inputs and parameters to the command
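The final step, turning inputs and parameters into arguments, can be approximated from the example commands in this reference: inputs become quoted `--name "path"` pairs and true booleans become bare flags. This is a simplified, hypothetical model; the real build_full_command also introspects CLI patterns, chooses an output strategy and merges installation configuration, none of which is shown. Omitting false and null parameters is an assumption.

```javascript
// Double-quote a value and escape the characters that stay special inside
// double quotes in POSIX shells: ", \, $ and `.
function quoteArg(value) {
  return '"' + String(value).replace(/(["\\$`])/g, "\\$1") + '"';
}

// Quote only when the value contains characters outside a conservative safe set.
function formatValue(value) {
  const text = String(value);
  return /^[\w.\/:-]+$/.test(text) ? text : quoteArg(text);
}

// HYPOTHETICAL simplification of build_full_command's argument assembly.
function buildArgs(inputs = {}, parameters = {}) {
  const args = [];
  // Inputs are file paths; always quote them, as the example commands do.
  for (const [name, path] of Object.entries(inputs)) {
    args.push(`--${name}`, quoteArg(path));
  }
  for (const [name, value] of Object.entries(parameters)) {
    if (value === true) {
      args.push(`--${name}`); // true booleans become bare flags
    } else if (value !== false && value !== null && value !== undefined) {
      args.push(`--${name}`, formatValue(value)); // false/null are omitted
    }
  }
  return args.join(" ");
}

console.log(
  buildArgs(
    { input_file: "/data/manuscript.xml" },
    { schema: "tei_all", strict_mode: true, generate_report: true }
  )
);
```

With the inputs and parameters of Example 1, this prints `--input_file "/data/manuscript.xml" --schema tei_all --strict_mode --generate_report`, the argument portion of that example's command before the output path and configured flags are added.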

Data Types & Formats

Execution ID Format

Execution IDs are UUID version 4 strings in the format xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx, where each x is a hexadecimal digit and y is one of 8, 9, a, or b.
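Clients can reject malformed IDs before calling the API. A minimal check for the documented format (isUuidV4 is a hypothetical helper name; matching is case-insensitive):

```javascript
// Version nibble fixed to 4; variant nibble y restricted to 8, 9, a or b.
const UUID_V4_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

function isUuidV4(value) {
  return typeof value === "string" && UUID_V4_PATTERN.test(value);
}

console.log(isUuidV4("550e8400-e29b-41d4-a716-446655440000"));
console.log(isUuidV4("invalid-uuid"));
```

Both execution IDs used in this reference pass the check, while the invalid-uuid value from the error scenarios does not.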

Input and Parameter Structure

Inputs and parameters must be provided as flat JSON objects with string keys:

{
  "inputs": {
    "key1": "value1",
    "key2": "value2"
  },
  "parameters": {
    "param1": true,
    "param2": "string_value",
    "param3": 42
  }
}
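"Flat" means every value is a scalar. The check below is a sketch of that rule, not the server's validation (which is not documented here); rejecting null as a value is an assumption. JSON object keys are always strings, so only the values need checking.

```javascript
// Returns true for a plain object whose values are all strings, numbers or
// booleans. ASSUMPTION: null, arrays and nested objects are not allowed.
function isFlatScalarObject(obj) {
  if (obj === null || typeof obj !== "object" || Array.isArray(obj)) return false;
  return Object.values(obj).every(
    (v) => typeof v === "string" || typeof v === "number" || typeof v === "boolean"
  );
}
```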

Path Requirements

All file paths in inputs and output_directory must be absolute paths. Relative paths may cause execution failures.
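A client-side pre-flight check can catch relative paths before a run fails. This sketch assumes POSIX-style absolute paths (leading "/") and also accepts Windows drive paths such as C:\data; findRelativePaths is a hypothetical helper name.

```javascript
// ASSUMPTION: absolute means "/..." (POSIX) or "X:\..." / "X:/..." (Windows).
function isAbsolutePath(p) {
  return typeof p === "string" && (p.startsWith("/") || /^[A-Za-z]:[\\/]/.test(p));
}

// Lists every input path and the output_directory that is not absolute.
function findRelativePaths(request) {
  const paths = Object.values(request.inputs ?? {});
  if (request.output_directory !== undefined) {
    paths.push(request.output_directory);
  }
  return paths.filter((p) => !isAbsolutePath(p));
}
```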

Server-Sent Events Details

SSE Protocol

The log streaming endpoint implements the Server-Sent Events specification:

Message Format: Each event consists of one or more lines prefixed with data:, followed by a blank line

Heartbeat: Comment lines (: heartbeat\n\n) are sent periodically to prevent timeouts; clients ignore them

Connection Management: Clients should handle reconnection if the connection drops before receiving a terminal status
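Clients that read the stream without EventSource (for example through a raw HTTP response) have to apply these rules themselves. The parser below is a minimal sketch for a complete chunk of stream text: data: lines accumulate, a blank line dispatches the event, and lines beginning with ":" are comments. Other SSE fields (event, id, retry) are ignored here, and parseSseText is a hypothetical name.

```javascript
// Returns the data payload of each complete event in the text.
function parseSseText(text) {
  const events = [];
  let dataLines = [];
  for (const line of text.split(/\r\n|\r|\n/)) {
    if (line === "") {
      // Blank line: dispatch the accumulated event, if any.
      if (dataLines.length > 0) events.push(dataLines.join("\n"));
      dataLines = [];
    } else if (line.startsWith(":")) {
      continue; // comment, e.g. ": heartbeat"
    } else if (line.startsWith("data:")) {
      // One leading space after the colon is not part of the value.
      let value = line.slice(5);
      if (value.startsWith(" ")) value = value.slice(1);
      dataLines.push(value);
    }
  }
  // Trailing lines without a closing blank line are an incomplete event and are dropped.
  return events;
}
```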

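Client Implementation

SSE clients should:

Parse the JSON payload of each event (when an event spans several data: lines, join them with newlines before parsing; EventSource does this automatically)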
Check for status field to detect completion
Close the connection after receiving a terminal status
Handle connection errors gracefully
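The checklist above can be folded into one function that turns an event's data into a log record plus a decision to close. This is a hypothetical sketch: it treats the "Execution not found" error as terminal because the server closes the stream after sending it, and it reports unparseable payloads instead of throwing.

```javascript
const TERMINAL_LOG_STATUSES = ["completed", "failed", "cancelled"];

// HYPOTHETICAL helper: returns { log, terminal } for one event's data string.
function interpretLogEvent(data) {
  let log = null;
  try {
    log = JSON.parse(data);
  } catch {
    log = null;
  }
  // Handle malformed payloads gracefully instead of throwing.
  if (log === null || typeof log !== "object") {
    return { log: { level: "error", message: `Unparseable event: ${data}` }, terminal: false };
  }
  const terminal =
    TERMINAL_LOG_STATUSES.includes(log.status) ||
    (log.level === "error" && log.message === "Execution not found");
  return { log, terminal };
}
```

In a browser, an onmessage handler would call interpretLogEvent(event.data), display the log, and call close() on the EventSource once terminal is true.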

Logging

The module uses Python's standard logging framework with logger name component_runner_bp:

Error Level: File not found errors and unexpected exceptions are logged with full stack traces