
Component Runner API Reference

The Heritage Data Processor Component Runner API provides endpoints for executing pipeline components, monitoring their execution in real time, and managing component lifecycles.

Base URL

All endpoints are prefixed with /components.


Component Execution

Run Pipeline Component

Initiates asynchronous execution of a specified pipeline component with provided inputs and parameters.

Endpoint: POST /components/<component_name>/run

URL Parameters:

  • component_name (string, required): Name of the component to execute (e.g., xml_processor, data_validator)

Request Body:

{
  "inputs": {
    "input_file": "/path/to/source.xml",
    "config_file": "/path/to/config.yaml"
  },
  "parameters": {
    "validate": true,
    "output_format": "json",
    "verbose": true
  },
  "output_directory": "/path/to/output"
}

Request Parameters:

  • inputs (object, optional): Dictionary mapping input names to file paths. Keys must match the input definitions in the component specification
  • parameters (object, optional): Dictionary of runtime parameters. Keys must match parameter definitions in the component specification
  • output_directory (string, required): Absolute path to the directory where output files should be created

Response:

{
  "success": true,
  "execution_id": "550e8400-e29b-41d4-a716-446655440000",
  "message": "Component execution started",
  "command": "python main.py --input \"/path/to/source.xml\" --output \"/path/to/output/source_processed.xml\" --validate",
  "output_strategy": "single_file_output",
  "estimated_outputs": ["/path/to/output/source_processed.xml"]
}

Response Fields:

  • success (boolean): Always true for successful execution start
  • execution_id (string): Unique UUID identifying this execution instance. Use this ID to monitor, cancel, or query the execution
  • message (string): Human-readable confirmation message
  • command (string): The complete shell command that will be executed, with arguments properly quoted
  • output_strategy (string): The output strategy detected for this component (e.g., single_file_output, directory, single_file_output_file)
  • estimated_outputs (array): List of predicted output file paths based on component specification and output strategy

Status Codes:

  • 202 Accepted: Component execution started successfully. The execution runs asynchronously
  • 400 Bad Request: Invalid JSON payload or missing output_directory parameter
  • 404 Not Found: Component not found, or component's Python executable does not exist
  • 500 Internal Server Error: Unexpected error during execution setup

Execution Process:

The endpoint performs the following steps:

  1. Validates the request payload
  2. Delegates command construction to the build_full_command utility function
  3. Generates a unique execution ID
  4. Starts asynchronous execution via the component_executor service
  5. Returns immediately with the execution ID
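From a client's side, these steps reduce to a single POST whose 202 response carries the execution_id that every other endpoint needs. The following JavaScript sketch only assembles the fetch() arguments. The helper name buildRunRequest, the baseUrl argument, and the choice to URL-encode the component name are illustrative assumptions, not part of the API.

```javascript
// Hypothetical helper: assembles the fetch() arguments for
// POST /components/<component_name>/run. It does not send anything.
function buildRunRequest(baseUrl, componentName, { inputs = {}, parameters = {}, outputDirectory }) {
  if (!outputDirectory) {
    // The server would answer 400 "Missing 'output_directory'" in this case.
    throw new Error("output_directory is required");
  }
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const url = `${root}/components/${encodeURIComponent(componentName)}/run`;
  const init = {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ inputs, parameters, output_directory: outputDirectory }),
  };
  return { url, init };
}

// Usage sketch (the server address is an assumption):
// const { url, init } = buildRunRequest("http://localhost:5000", "xml_validator", {
//   inputs: { input_file: "/data/manuscript.xml" },
//   outputDirectory: "/output/validation",
// });
// const res = await fetch(url, init);
// if (res.status === 202) { const { execution_id } = await res.json(); }
```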

Log Streaming

Stream Component Logs

Streams real-time execution logs using Server-Sent Events (SSE) for live monitoring of component execution.

Endpoint: GET /components/logs/<execution_id>

URL Parameters:

  • execution_id (string, required): UUID of the execution to monitor

Response Format:

Server-Sent Events stream with Content-Type text/event-stream

Event Format:

data: {"level": "info", "message": "Processing file: source.xml"}

data: {"level": "warning", "message": "Validation warning: missing optional field"}

data: {"level": "error", "message": "Processing failed", "status": "failed"}

Log Message Fields:

  • level (string): Log level (info, warning, error, debug)
  • message (string): Log message content
  • status (string, optional): Execution status when the execution completes, fails, or is cancelled. Values: completed, failed, cancelled

Stream Behavior:

The stream remains open and sends log messages as they are generated by the component. Heartbeat messages (: heartbeat\n\n) are sent periodically to keep the connection alive. The stream automatically closes when the execution reaches a terminal status (completed, failed, or cancelled).

Error Handling:

If the execution is not found, the stream sends a single error message and closes:

data: {"level": "error", "message": "Execution not found"}

Headers:

  • Content-Type: text/event-stream
  • Cache-Control: no-cache
  • Connection: keep-alive

Status Codes:

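  • 200 OK: Stream established successfully. Because an unknown execution is reported inside the stream (see Error Handling above), clients should not rely on the HTTP status alone to detect that case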

Execution Control

Cancel Component Execution

Terminates a running component execution and releases associated resources.

Endpoint: POST /components/<execution_id>/cancel

URL Parameters:

  • execution_id (string, required): UUID of the execution to cancel

Response:

{
  "success": true,
  "message": "Execution cancelled"
}

Response Fields:

  • success (boolean): Always true when cancellation succeeds
  • message (string): Confirmation message

Error Response:

{
  "error": "Execution not found or already completed"
}

Status Codes:

  • 200 OK: Execution cancelled successfully
  • 404 Not Found: Execution does not exist or has already reached a terminal state (completed, failed, or cancelled)

Cancellation Behavior:

Cancellation terminates the underlying subprocess immediately. Any log streams monitoring this execution will receive a final log message with status: "cancelled" and close.

Immediate Termination

The component is terminated immediately, without a chance to clean up, and files it has already written are not removed. Partial output files may therefore remain in the output directory.
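Because a 404 from the cancel endpoint covers both an unknown ID and an execution that has already reached a terminal state, a client has to decide how to treat it. The following sketch maps the documented responses to an outcome; interpretCancelResponse and its outcome labels are hypothetical.

```javascript
// Hypothetical helper mapping the cancel endpoint's documented responses
// to a client-side outcome.
function interpretCancelResponse(httpStatus, body) {
  if (httpStatus === 200 && body && body.success === true) {
    return { outcome: "cancelled", message: body.message };
  }
  if (httpStatus === 404) {
    // Ambiguous: the ID may be unknown, or the execution may already be
    // completed, failed, or cancelled.
    const message = (body && body.error) || "Execution not found or already completed";
    return { outcome: "not_cancellable", message };
  }
  // Any other status is not documented for this endpoint.
  return { outcome: "unexpected", message: `HTTP ${httpStatus}` };
}
```

To tell the two 404 cases apart, a client can follow up with GET /components/executions/<execution_id>/status, which, according to the status codes listed for that endpoint, returns 404 only when the execution does not exist.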


Status Monitoring

Get Execution Status

Retrieves the current status and results of a specific component execution.

Endpoint: GET /components/executions/<execution_id>/status

URL Parameters:

  • execution_id (string, required): UUID of the execution to query

Response (Running):

{
  "status": "running"
}

Response (Completed):

{
  "status": "completed",
  "results": {
    "output_files": [
      "/path/to/output/source_processed.xml",
      "/path/to/output/metadata.json"
    ]
  }
}

Response (Failed):

{
  "status": "failed",
  "error": "Execution 550e8400-e29b-41d4-a716-446655440000 failed."
}

Response Fields:

  • status (string): Current execution status. Values: running, completed, failed, cancelled
  • results (object, optional): Only present when status is completed. Contains execution results:
      • output_files (array): List of actual output file paths created by the component
  • error (string, optional): Only present when status is failed. Contains error description

Status Codes:

  • 200 OK: Status retrieved successfully
  • 404 Not Found: Execution with the specified ID does not exist
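When SSE is not convenient, a client can poll this endpoint until it reports a terminal status. This is a sketch under stated assumptions: the polling interval, the attempt limit, and the names pollUntilTerminal and getStatus are illustrative. The caller supplies getStatus, so the loop can wrap fetch() or a test double.

```javascript
// Hypothetical polling loop for GET /components/executions/<execution_id>/status.
// getStatus must resolve to the parsed JSON body of that endpoint.
const TERMINAL = ["completed", "failed", "cancelled"];

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function pollUntilTerminal(getStatus, { intervalMs = 1000, maxAttempts = 300 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const body = await getStatus();
    if (TERMINAL.includes(body.status)) return body; // completed, failed, or cancelled
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Execution still not finished after ${maxAttempts} attempts`);
}

// Example getStatus built on fetch (the server address is an assumption):
// const getStatus = async () => {
//   const res = await fetch(`http://localhost:5000/components/executions/${id}/status`);
//   if (res.status === 404) throw new Error("Execution not found");
//   return res.json();
// };
```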

Output Files:

For completed executions, the output_files array contains the actual paths determined by the output strategy. These paths may differ from the estimated_outputs returned during execution start if the component produced additional or differently-named files.
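A client that wants to flag such differences can compare the two lists directly. For instance, in Examples 1 and 3 below, validation_report.html appears in output_files but not in estimated_outputs. compareOutputs is a hypothetical helper:

```javascript
// Hypothetical helper comparing estimated_outputs (from the run response)
// with output_files (from the status response).
function compareOutputs(estimated, actual) {
  const actualSet = new Set(actual);
  const estimatedSet = new Set(estimated);
  return {
    missing: estimated.filter((p) => !actualSet.has(p)),     // predicted but not produced
    unexpected: actual.filter((p) => !estimatedSet.has(p)),  // produced but not predicted
  };
}
```

Since estimated_outputs is only a prediction, a client would typically treat output_files as authoritative and use the comparison for diagnostics.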


Execution Lifecycle

Status Flow

Component executions progress through the following states:

  1. Created: Execution initialized but not yet started (internal state)
  2. Running: Component is actively executing
  3. Completed: Execution finished successfully with output files available
  4. Failed: Execution terminated with an error
  5. Cancelled: Execution terminated by user request

Terminal States

Once an execution reaches a terminal state (completed, failed, or cancelled), it cannot be modified or restarted. Attempting to cancel an execution in a terminal state returns a 404 error.
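The lifecycle can be expressed as a transition table, which makes the terminal-state rule easy to check in client code or tests. This sketch assumes that a created execution can only move to running and that a running execution can end in any of the three terminal states; the reference does not spell out the transitions, and the names TRANSITIONS, canTransition, and isTerminal are hypothetical.

```javascript
// Assumed state transitions; "created" is internal and never returned
// by the status endpoint.
const TRANSITIONS = new Map([
  ["created", ["running"]],
  ["running", ["completed", "failed", "cancelled"]],
  ["completed", []],
  ["failed", []],
  ["cancelled", []],
]);

function canTransition(from, to) {
  return (TRANSITIONS.get(from) ?? []).includes(to);
}

// A state is terminal when no transitions leave it.
function isTerminal(state) {
  return TRANSITIONS.has(state) && TRANSITIONS.get(state).length === 0;
}
```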


Error Responses

All error responses follow a consistent format:

{
  "error": "Descriptive error message"
}

Common Error Scenarios

Invalid JSON Payload:

POST /components/xml_processor/run
Content-Type: application/json

{ invalid json }

Response: 400 Bad Request with error message "Invalid JSON payload"

Missing Output Directory:

{
  "inputs": {"file": "/path/to/input.xml"},
  "parameters": {}
}

Response: 400 Bad Request with error message "Missing 'output_directory'"

Component Not Found:

POST /components/nonexistent_component/run

Response: 404 Not Found with error message indicating the component's Python executable was not found

Execution Not Found:

GET /components/executions/invalid-uuid/status

Response: 404 Not Found with error message "Execution not found"
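Because every error body uses the same {"error": "..."} shape, one helper can turn any non-2xx response into an exception. The sketch below assumes a fetch() Response, or any object with a status property and an async json() method; toApiError is a hypothetical name, and it falls back to a placeholder when the body is not JSON.

```javascript
// Hypothetical helper: builds an Error from a non-2xx response using the
// documented {"error": "..."} body shape.
async function toApiError(response) {
  let detail;
  try {
    const body = await response.json();
    detail = body && typeof body.error === "string" ? body.error : JSON.stringify(body);
  } catch {
    detail = "(response body was not JSON)";
  }
  const err = new Error(`HTTP ${response.status}: ${detail}`);
  err.status = response.status; // keep the code for callers that branch on it
  return err;
}

// Usage sketch: if (!res.ok) throw await toApiError(res);
```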


Usage Examples

Example 1: Execute Component with Inputs and Parameters

POST /components/xml_validator/run
Content-Type: application/json

{
  "inputs": {
    "input_file": "/data/manuscript.xml"
  },
  "parameters": {
    "schema": "tei_all",
    "strict_mode": true,
    "generate_report": true
  },
  "output_directory": "/output/validation"
}

Response:

{
  "success": true,
  "execution_id": "a1b2c3d4-e5f6-4789-a012-b3c4d5e6f789",
  "message": "Component execution started",
  "command": "python main.py --input_file \"/data/manuscript.xml\" --schema tei_all --strict_mode --generate_report --output \"/output/validation/manuscript_validated.xml\" --verbose",
  "output_strategy": "single_file_output",
  "estimated_outputs": ["/output/validation/manuscript_validated.xml"]
}

Example 2: Monitor Execution with SSE

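const TERMINAL_STATUSES = ['completed', 'failed', 'cancelled'];

const eventSource = new EventSource(
  '/components/logs/a1b2c3d4-e5f6-4789-a012-b3c4d5e6f789'
);

eventSource.onmessage = (event) => {
  const log = JSON.parse(event.data);
  console.log(`[${log.level}] ${log.message}`);

  // Close on a terminal status. "Execution not found" carries no status field,
  // so handle it too; otherwise EventSource reconnects after the server closes.
  if (TERMINAL_STATUSES.includes(log.status) || log.message === 'Execution not found') {
    console.log(`Execution ${log.status ?? 'not found'}`);
    eventSource.close();
  }
};

eventSource.onerror = () => {
  // EventSource retries transient failures on its own;
  // readyState is CLOSED only once it has given up.
  if (eventSource.readyState === EventSource.CLOSED) {
    console.warn('Log stream closed');
  }
};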

Stream Output:

data: {"level": "info", "message": "Starting validation process"}

data: {"level": "info", "message": "Loading schema: tei_all"}

data: {"level": "warning", "message": "Non-standard element detected: <customTag>"}

data: {"level": "info", "message": "Validation completed successfully", "status": "completed"}

Example 3: Check Execution Status

GET /components/executions/a1b2c3d4-e5f6-4789-a012-b3c4d5e6f789/status

Response:

{
  "status": "completed",
  "results": {
    "output_files": [
      "/output/validation/manuscript_validated.xml",
      "/output/validation/validation_report.html"
    ]
  }
}

Example 4: Cancel Running Execution

POST /components/a1b2c3d4-e5f6-4789-a012-b3c4d5e6f789/cancel

Response:

{
  "success": true,
  "message": "Execution cancelled"
}

Integration with Component Service

Component Executor Service

The API delegates execution management to the component_executor service, which handles:

  • Asynchronous subprocess management
  • Log queue management for SSE streaming
  • Execution state tracking
  • Resource cleanup

Command Building

The API uses the build_full_command utility function to construct execution commands, which automatically:

  • Locates the component's Python executable
  • Introspects CLI patterns
  • Determines output strategies
  • Merges installation configurations with runtime parameters
  • Adds all inputs and parameters to the command
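The exact rules build_full_command applies are not documented here. Judging only from the command in Example 1 (true booleans such as strict_mode become bare flags, other values become --name value pairs), a simplified approximation might look like the following. toCliArgs is hypothetical: it ignores installation configuration, output strategies, and executable lookup, and it assumes false booleans are omitted.

```javascript
// Hypothetical approximation of rendering inputs and parameters as CLI
// arguments. Assumptions: true -> "--name", false/null/undefined -> omitted,
// anything else -> "--name", String(value).
function toCliArgs(inputs = {}, parameters = {}) {
  const args = [];
  for (const [name, value] of Object.entries({ ...inputs, ...parameters })) {
    if (value === true) {
      args.push(`--${name}`);
    } else if (value === false || value === null || value === undefined) {
      continue;
    } else {
      args.push(`--${name}`, String(value));
    }
  }
  return args;
}
```

Returning an argument array rather than a single string means a process-spawning API can receive the values unquoted, which sidesteps shell quoting of paths with spaces.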

Data Types & Formats

Execution ID Format

Execution IDs are UUID version 4 strings in the format xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx, where each x is a hexadecimal digit and y is one of 8, 9, a, or b.
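A client can reject malformed IDs before making a request, for example before building a /components/logs/<execution_id> URL. The following check follows the layout above; isExecutionId is a hypothetical helper, and whether the server validates the format itself is not documented here.

```javascript
// Matches the UUID version 4 layout: the third group starts with "4" and the
// fourth group starts with 8, 9, a, or b. Case-insensitive.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

function isExecutionId(value) {
  return typeof value === "string" && UUID_V4.test(value);
}
```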

Input and Parameter Structure

Inputs and parameters must be provided as flat JSON objects with string keys:

{
  "inputs": {
    "key1": "value1",
    "key2": "value2"
  },
  "parameters": {
    "param1": true,
    "param2": "string_value",
    "param3": 42
  }
}

Path Requirements

All file paths in inputs and output_directory must be absolute paths. Relative paths may cause execution failures.
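The structure and path rules above can be checked on the client before submitting a run. This sketch is hypothetical (validateRunPayload is not part of the API) and approximates "absolute" as a POSIX path beginning with / or a Windows drive path such as C:\data; a real client might use its platform's path library instead.

```javascript
// Hypothetical pre-flight check for the run request body.
// Absolute path approximation: "/..." (POSIX) or "C:\..." / "C:/..." (Windows).
const isAbsolutePath = (p) =>
  typeof p === "string" && (p.startsWith("/") || /^[A-Za-z]:[\\/]/.test(p));

// Flat object: plain object whose values are strings, numbers, or booleans.
const isFlatObject = (obj) =>
  obj !== null && typeof obj === "object" && !Array.isArray(obj) &&
  Object.values(obj).every((v) => ["string", "number", "boolean"].includes(typeof v));

function validateRunPayload(payload) {
  const problems = [];
  const { inputs = {}, parameters = {}, output_directory: outDir } = payload;
  if (!outDir) {
    problems.push("Missing 'output_directory'");
  } else if (!isAbsolutePath(outDir)) {
    problems.push(`output_directory is not absolute: ${outDir}`);
  }
  if (!isFlatObject(inputs)) {
    problems.push("inputs must be a flat object");
  } else {
    for (const [name, path] of Object.entries(inputs)) {
      if (!isAbsolutePath(path)) problems.push(`input '${name}' is not an absolute path: ${path}`);
    }
  }
  if (!isFlatObject(parameters)) problems.push("parameters must be a flat object");
  return problems; // empty array means no problems found
}
```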


Server-Sent Events Details

SSE Protocol

The log streaming endpoint implements the Server-Sent Events specification:

Message Format: Each event consists of one or more lines prefixed with data:, followed by a blank line

Heartbeat: Comment lines (: heartbeat\n\n) are sent periodically to prevent timeouts; SSE clients such as EventSource ignore them

Connection Management: Clients should handle reconnection if the connection drops before receiving a terminal status
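This framing can be parsed by hand where EventSource is unavailable, for instance when reading a fetch() response body. The sketch (parseSseText is a hypothetical name) handles only what this endpoint is documented to send, data: lines and : comment lines, and ignores event:, id:, and retry: fields. It expects complete text, so a live reader would buffer chunks and pass on only the part up to the last blank line.

```javascript
// Minimal parser for text/event-stream content. Per the SSE format,
// consecutive data: lines in one event are joined with "\n", lines starting
// with ":" are comments (heartbeats here), and a blank line ends the event.
function parseSseText(text) {
  const events = [];
  let dataLines = [];
  for (const line of text.split(/\r\n|\r|\n/)) {
    if (line === "") {
      if (dataLines.length > 0) {
        events.push(JSON.parse(dataLines.join("\n")));
        dataLines = [];
      }
    } else if (line.startsWith(":")) {
      continue; // comment line, e.g. ": heartbeat"
    } else if (line.startsWith("data:")) {
      let value = line.slice(5);
      if (value.startsWith(" ")) value = value.slice(1); // one leading space is not data
      dataLines.push(value);
    }
    // Other fields (event:, id:, retry:) are ignored in this sketch.
  }
  return events; // an event without a closing blank line is discarded
}
```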

Client Implementation

SSE clients should:

  • Parse the JSON payload of each event (its data: field)
  • Check for status field to detect completion
  • Close the connection after receiving a terminal status
  • Handle connection errors gracefully
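A client can implement these rules with a small amount of per-stream state. The sketch below is hypothetical and framework-agnostic (createLogTracker and its methods are not part of the API). It also treats the documented "Execution not found" error as final, because that message carries no status field and the server closes the stream after sending it.

```javascript
const TERMINAL_STATUSES = new Set(["completed", "failed", "cancelled"]);

// Hypothetical per-stream state for /components/logs/<execution_id>.
function createLogTracker() {
  let finalStatus = null; // terminal status, once seen
  let notFound = false;   // server reported "Execution not found"
  const isDone = () => finalStatus !== null || notFound;

  return {
    // Feed each parsed event payload; returns true when the client should close.
    onMessage(log) {
      if (TERMINAL_STATUSES.has(log.status)) finalStatus = log.status;
      if (log.level === "error" && log.message === "Execution not found") notFound = true;
      return isDone();
    },
    isDone,
    // Reconnect only if the connection dropped before the execution finished.
    shouldReconnect: () => !isDone(),
    getFinalStatus: () => finalStatus,
  };
}
```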

Logging

The module uses Python's standard logging framework with logger name component_runner_bp:

Error Level: File not found errors and unexpected exceptions are logged with full stack traces