Component Runner API Reference
The Heritage Data Processor Component Runner API provides endpoints for executing pipeline components, monitoring their execution in real-time, and managing component lifecycles.
Base URL
All endpoints are prefixed with /components.
Component Execution
Run Pipeline Component
Initiates asynchronous execution of a specified pipeline component with provided inputs and parameters.
Endpoint: POST /components/<component_name>/run
URL Parameters:
- component_name (string, required): Name of the component to execute (e.g., xml_processor, data_validator)
Request Body:
{
"inputs": {
"input_file": "/path/to/source.xml",
"config_file": "/path/to/config.yaml"
},
"parameters": {
"validate": true,
"output_format": "json",
"verbose": true
},
"output_directory": "/path/to/output"
}
Request Parameters:
- inputs (object, optional): Dictionary mapping input names to file paths. Keys must match the input definitions in the component specification.
- parameters (object, optional): Dictionary of runtime parameters. Keys must match the parameter definitions in the component specification.
- output_directory (string, required): Absolute path to the directory where output files should be created.
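A minimal client-side sketch, using only the Python standard library, that prepares (but does not send) a run request. It assumes the service is reachable at a hypothetical base URL, http://localhost:5000. The helper name build_run_request is illustrative and not part of the API. It rejects a missing output_directory locally, before the server would answer 400.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL; adjust to wherever the service is deployed.
BASE_URL = "http://localhost:5000"


def build_run_request(component_name, output_directory, inputs=None,
                      parameters=None, base_url=BASE_URL):
    """Prepare (without sending) POST /components/<component_name>/run."""
    if not output_directory:
        # The server answers 400 Bad Request when output_directory is missing.
        raise ValueError("output_directory is required")
    payload = {
        "inputs": inputs or {},
        "parameters": parameters or {},
        "output_directory": output_directory,
    }
    # Percent-encode the component name so it stays a single path segment.
    path = "/components/" + urllib.parse.quote(component_name, safe="") + "/run"
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Against a running service, `urllib.request.urlopen(req)` sends the request, and `json.load` on the response yields the 202 body described below. urllib treats any 2xx status as success.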
Response:
{
"success": true,
"execution_id": "550e8400-e29b-41d4-a716-446655440000",
"message": "Component execution started",
"command": "python main.py --input \"/path/to/source.xml\" --output \"/path/to/output/source_processed.xml\" --validate",
"output_strategy": "single_file_output",
"estimated_outputs": ["/path/to/output/source_processed.xml"]
}
Response Fields:
- success (boolean): Always true when execution starts successfully.
- execution_id (string): Unique UUID identifying this execution instance. Use this ID to monitor, cancel, or query the execution.
- message (string): Human-readable confirmation message.
- command (string): The complete shell command that will be executed, with arguments quoted.
- output_strategy (string): The output strategy detected for this component (e.g., single_file_output, directory, single_file_output_file).
- estimated_outputs (array): Predicted output file paths, based on the component specification and output strategy.
Status Codes:
- 202 Accepted: Component execution started successfully; the execution runs asynchronously.
- 400 Bad Request: Invalid JSON payload or missing output_directory parameter.
- 404 Not Found: Component not found, or the component's Python executable does not exist.
- 500 Internal Server Error: Unexpected error during execution setup.
Execution Process:
The endpoint performs the following steps:
- Validates the request payload
- Delegates command construction to the build_full_command utility function
- Generates a unique execution ID
- Starts asynchronous execution via the component_executor service
- Returns immediately with the execution ID
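The steps above can be sketched as a framework-agnostic function. This sketch illustrates the flow under stated assumptions and is not the service's actual code:

- build_command and start_execution are hypothetical stand-ins for build_full_command and the component_executor service.
- build_command is assumed to return a dict with command, output_strategy and estimated_outputs.
- The error-body key error is assumed.

```python
import json
import uuid


def handle_run(component_name, raw_body, build_command, start_execution):
    """Return (http_status, response_dict) for POST /components/<name>/run."""
    # Step 1: validate the payload.
    try:
        payload = json.loads(raw_body)
    except (TypeError, ValueError):
        return 400, {"error": "Invalid JSON payload"}
    if not isinstance(payload, dict):
        return 400, {"error": "Invalid JSON payload"}
    if not payload.get("output_directory"):
        return 400, {"error": "Missing 'output_directory'"}

    try:
        # Step 2: delegate command construction (hypothetical signature).
        built = build_command(
            component_name,
            payload.get("inputs", {}),
            payload.get("parameters", {}),
            payload["output_directory"],
        )
        # Step 3: generate a unique execution ID (UUID version 4).
        execution_id = str(uuid.uuid4())
        # Step 4: start asynchronous execution; this call must not block.
        start_execution(execution_id, built["command"])
    except FileNotFoundError as exc:
        # Component or its Python executable is missing.
        return 404, {"error": str(exc)}
    except Exception as exc:  # Unexpected failure during setup.
        return 500, {"error": str(exc)}

    # Step 5: return immediately with the execution ID.
    return 202, {
        "success": True,
        "execution_id": execution_id,
        "message": "Component execution started",
        "command": built["command"],
        "output_strategy": built["output_strategy"],
        "estimated_outputs": built["estimated_outputs"],
    }
```

Injecting the two collaborators keeps the validation and status-code mapping testable without starting any subprocess.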
Log Streaming
Stream Component Logs
Streams real-time execution logs using Server-Sent Events (SSE) for live monitoring of component execution.
Endpoint: GET /components/logs/<execution_id>
URL Parameters:
- execution_id (string, required): UUID of the execution to monitor
Response Format:
Server-Sent Events stream with Content-Type text/event-stream
Event Format:
data: {"level": "info", "message": "Processing file: source.xml"}
data: {"level": "warning", "message": "Validation warning: missing optional field"}
data: {"level": "error", "message": "Processing failed", "status": "failed"}
Log Message Fields:
- level (string): Log level (info, warning, error, debug).
- message (string): Log message content.
- status (string, optional): Execution status, included when the execution completes, fails, or is cancelled. Values: completed, failed, cancelled.
Stream Behavior:
The stream remains open and sends log messages as they are generated by the component. Heartbeat messages (: heartbeat\n\n) are sent periodically to keep the connection alive. The stream automatically closes when the execution reaches a terminal status (completed, failed, or cancelled).
Error Handling:
If the execution is not found, the stream sends a single error-level log message and then closes.
Headers:
- Content-Type: text/event-stream
- Cache-Control: no-cache
- Connection: keep-alive
Status Codes:
200 OK: Stream established successfully
Execution Control
Cancel Component Execution
Terminates a running component execution and releases associated resources.
Endpoint: POST /components/<execution_id>/cancel
URL Parameters:
- execution_id (string, required): UUID of the execution to cancel
Response Fields:
- success (boolean): Always true when cancellation succeeds.
- message (string): Confirmation message.
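Error Response:
If the execution cannot be cancelled, the endpoint returns 404 Not Found with an error message.
Status Codes:
- 200 OK: Execution cancelled successfully.
- 404 Not Found: Execution does not exist or has already reached a terminal state (completed, failed, or cancelled).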
Cancellation Behavior:
Cancellation terminates the underlying subprocess immediately. Any log streams monitoring this execution will receive a final log message with status: "cancelled" and close.
Immediate Termination
Component execution is terminated immediately without cleanup. Partial output files may exist in the output directory.
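The following client sketch handles cancellation and treats 404 as "unknown or already finished" rather than as a hard failure. It assumes a hypothetical base URL, http://localhost:5000. It also takes a replaceable send callable, for example for tests. The default send uses urllib from the standard library and folds HTTP error responses back into a (status, body) pair.

```python
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5000"  # Hypothetical deployment address.


def urllib_send(request):
    """Send a request and return (status_code, body_bytes), including HTTP errors."""
    try:
        with urllib.request.urlopen(request) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; unpack them into a tuple.
        return err.code, err.read()


def cancel_execution(execution_id, send=urllib_send, base_url=BASE_URL):
    """Request cancellation; True if cancelled, False if there was nothing to cancel."""
    quoted = urllib.parse.quote(execution_id, safe="")
    url = f"{base_url}/components/{quoted}/cancel"
    status, _body = send(urllib.request.Request(url, method="POST"))
    if status == 200:
        return True
    if status == 404:
        # Unknown ID, or the execution already completed, failed, or was cancelled.
        return False
    raise RuntimeError(f"unexpected status {status} from {url}")
```

Because cancellation gives no cleanup guarantee, a caller that receives True should still expect partial files in the output directory.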
Status Monitoring
Get Execution Status
Retrieves the current status and results of a specific component execution.
Endpoint: GET /components/executions/<execution_id>/status
URL Parameters:
- execution_id (string, required): UUID of the execution to query
Response (Running):
{
"status": "running"
}
Response (Completed):
{
"status": "completed",
"results": {
"output_files": [
"/path/to/output/source_processed.xml",
"/path/to/output/metadata.json"
]
}
}
Response (Failed):
The body contains status set to "failed" and an error field describing the failure; results is omitted.
Response Fields:
- status (string): Current execution status. Values: running, completed, failed, cancelled.
- results (object, optional): Present only when status is completed. Contains the execution results:
  - output_files (array): Paths of the output files actually created by the component.
- error (string, optional): Present only when status is failed. Contains an error description.
Status Codes:
- 200 OK: Status retrieved successfully.
- 404 Not Found: No execution with the specified ID exists.
Output Files:
For completed executions, the output_files array contains the actual paths determined by the output strategy. These paths may differ from the estimated_outputs returned during execution start if the component produced additional or differently-named files.
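When SSE is inconvenient, a client can poll this endpoint instead. The sketch below makes these assumptions:

- A caller-supplied get_status function returns the decoded JSON body of GET /components/executions/<execution_id>/status.
- The polling interval and timeout are arbitrary choices, not values defined by the API.

A second helper compares the actual output_files with the estimated_outputs returned when the execution started.

```python
import time

TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_execution(get_status, execution_id, interval=2.0, timeout=600.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until the execution reaches a terminal status; return the last body."""
    deadline = clock() + timeout
    while True:
        body = get_status(execution_id)
        if body["status"] in TERMINAL_STATUSES:
            return body
        if clock() >= deadline:
            raise TimeoutError(f"execution {execution_id} still {body['status']}")
        sleep(interval)


def compare_outputs(estimated, body):
    """Classify actual output files against the estimated_outputs list."""
    actual = body.get("results", {}).get("output_files", [])
    return {
        "expected": [p for p in actual if p in estimated],
        "unexpected": [p for p in actual if p not in estimated],  # extra or renamed
        "missing": [p for p in estimated if p not in actual],
    }
```

Injecting sleep and clock lets the loop be exercised without real waiting.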
Execution Lifecycle
Status Flow
Component executions progress through the following states:
- Created: Execution initialized but not yet started (internal state)
- Running: Component is actively executing
- Completed: Execution finished successfully with output files available
- Failed: Execution terminated with an error
- Cancelled: Execution terminated by user request
Terminal States
Once an execution reaches a terminal state (completed, failed, or cancelled), it cannot be modified or restarted. Attempting to cancel an execution in a terminal state returns a 404 error.
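The lifecycle can be encoded as a small transition table. This sketch covers the documented states and is not the service's internal representation. The transition table is inferred from the status flow above. Allowing a cancel request in the created state is an assumption, because the documentation only states that terminal executions cannot be cancelled.

```python
from enum import Enum


class Status(str, Enum):
    CREATED = "created"  # internal: initialized, not yet started
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


TERMINAL = frozenset({Status.COMPLETED, Status.FAILED, Status.CANCELLED})

# Assumed transitions derived from the documented status flow.
TRANSITIONS = {
    Status.CREATED: frozenset({Status.RUNNING}),
    Status.RUNNING: frozenset({Status.COMPLETED, Status.FAILED, Status.CANCELLED}),
    Status.COMPLETED: frozenset(),
    Status.FAILED: frozenset(),
    Status.CANCELLED: frozenset(),
}


def transition(current, new):
    """Return the new status, or raise if the move is not allowed."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


def cancel_http_status(current):
    """Status code the cancel endpoint returns for an execution in this state."""
    # Terminal executions cannot be modified, so cancellation yields 404.
    return 404 if current in TERMINAL else 200
```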
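Error Responses
All error responses follow a consistent format: an HTTP error status code accompanied by an error message describing the problem. The most common cases are listed below.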
Common Error Scenarios
Invalid JSON Payload:
Response: 400 Bad Request with error message "Invalid JSON payload"
Missing Output Directory:
Response: 400 Bad Request with error message "Missing 'output_directory'"
Component Not Found:
Response: 404 Not Found with error message indicating the component's Python executable was not found
Execution Not Found:
Response: 404 Not Found with error message "Execution not found"
Usage Examples
Example 1: Execute Component with Inputs and Parameters
POST /components/xml_validator/run
Content-Type: application/json
{
"inputs": {
"input_file": "/data/manuscript.xml"
},
"parameters": {
"schema": "tei_all",
"strict_mode": true,
"generate_report": true
},
"output_directory": "/output/validation"
}
Response:
{
"success": true,
"execution_id": "a1b2c3d4-e5f6-4789-a012-b3c4d5e6f789",
"message": "Component execution started",
"command": "python main.py --input_file \"/data/manuscript.xml\" --schema tei_all --strict_mode --generate_report --output \"/output/validation/manuscript_validated.xml\" --verbose",
"output_strategy": "single_file_output",
"estimated_outputs": ["/output/validation/manuscript_validated.xml"]
}
Example 2: Monitor Execution with SSE
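// Open an SSE connection to the log stream for this execution ID.
const eventSource = new EventSource(
  '/components/logs/a1b2c3d4-e5f6-4789-a012-b3c4d5e6f789'
);

// Heartbeat comments never reach onmessage; only data: events do.
eventSource.onmessage = (event) => {
  const log = JSON.parse(event.data);
  console.log(`[${log.level}] ${log.message}`);
  // A status field marks a terminal state: completed, failed, or cancelled.
  if (log.status && log.status !== 'running') {
    console.log(`Execution ${log.status}`);
    // Close explicitly; otherwise the browser reconnects when the server ends the stream.
    eventSource.close();
  }
};

eventSource.onerror = () => {
  // CLOSED means the browser gave up; otherwise it is already retrying.
  if (eventSource.readyState === EventSource.CLOSED) {
    console.error('Log stream closed');
  }
};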
Stream Output:
data: {"level": "info", "message": "Starting validation process"}
data: {"level": "info", "message": "Loading schema: tei_all"}
data: {"level": "warning", "message": "Non-standard element detected: <customTag>"}
data: {"level": "info", "message": "Validation completed successfully", "status": "completed"}
Example 3: Check Execution Status
GET /components/executions/a1b2c3d4-e5f6-4789-a012-b3c4d5e6f789/status
Response:
{
"status": "completed",
"results": {
"output_files": [
"/output/validation/manuscript_validated.xml",
"/output/validation/validation_report.html"
]
}
}
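Example 4: Cancel Running Execution
POST /components/a1b2c3d4-e5f6-4789-a012-b3c4d5e6f789/cancel
Response: 200 OK with a JSON body containing success set to true and a confirmation message. If the execution has already reached a terminal state, the endpoint returns 404 Not Found instead.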
Integration with Component Service
Component Executor Service
The API delegates execution management to the component_executor service, which handles:
- Asynchronous subprocess management
- Log queue management for SSE streaming
- Execution state tracking
- Resource cleanup
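The responsibilities above can be illustrated with a small executor built on subprocess, threading and queue from the standard library. This sketch is hypothetical and is not the component_executor implementation. The class name MiniExecutor and its methods are assumptions. The log-entry shape mirrors the SSE payloads documented above but is also assumed. Each execution gets a background thread that forwards output lines into a queue, which an SSE endpoint could drain.

```python
import queue
import subprocess
import threading


class MiniExecutor:
    """Hypothetical executor: background subprocesses, log queues, state tracking."""

    def __init__(self):
        self._lock = threading.Lock()
        self._procs = {}   # execution_id -> running subprocess.Popen
        self._status = {}  # execution_id -> "running" or a terminal status
        self.logs = {}     # execution_id -> queue.Queue of log dicts

    def start(self, execution_id, argv):
        """Launch argv without blocking the caller."""
        self.logs[execution_id] = queue.Queue()
        proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True)
        with self._lock:
            self._procs[execution_id] = proc
            self._status[execution_id] = "running"
        threading.Thread(target=self._pump, args=(execution_id, proc),
                         daemon=True).start()

    def _pump(self, execution_id, proc):
        """Forward output lines to the log queue, then record the final status."""
        log = self.logs[execution_id]
        for line in proc.stdout:
            log.put({"level": "info", "message": line.rstrip("\n")})
        proc.stdout.close()
        proc.wait()
        with self._lock:
            if self._status[execution_id] == "running":
                self._status[execution_id] = (
                    "completed" if proc.returncode == 0 else "failed")
            final = self._status[execution_id]
            del self._procs[execution_id]  # Resource cleanup.
        level = "info" if final == "completed" else "error"
        log.put({"level": level, "message": f"Execution {final}", "status": final})

    def cancel(self, execution_id):
        """Kill a running execution; False means unknown or already terminal."""
        with self._lock:
            if self._status.get(execution_id) != "running":
                return False
            self._status[execution_id] = "cancelled"
            self._procs[execution_id].kill()  # Immediate, no cleanup.
        return True

    def status(self, execution_id):
        with self._lock:
            return self._status.get(execution_id)
```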
Command Building
The API uses the build_full_command utility function to construct execution commands, which automatically:
- Locates the component's Python executable
- Introspects CLI patterns
- Determines output strategies
- Merges installation configurations with runtime parameters
- Adds all inputs and parameters to the command
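The flag-mapping step can be sketched as follows. The real rules in build_full_command come from CLI introspection, so this sketch assumes its own:

- Every input and parameter becomes --<key>.
- A true boolean becomes a bare flag and a false one is omitted.
- Any other value is passed as a separate argument.

The example commands in this reference also rename some keys (input_file becomes --input) and add installation defaults such as --verbose; this sketch does neither. For quoting it uses shlex.join, which applies POSIX single-quote quoting rather than the double quotes shown in the example responses.

```python
import shlex


def build_argv(entry_point, inputs, parameters):
    """Turn inputs and parameters into an argv list (hypothetical mapping rules)."""
    argv = ["python", entry_point]
    for key, path in inputs.items():
        argv += [f"--{key}", path]
    for key, value in parameters.items():
        if value is True:
            argv.append(f"--{key}")  # true -> bare flag
        elif value is False or value is None:
            continue  # false/None -> flag omitted
        else:
            argv += [f"--{key}", str(value)]
    return argv


def to_shell(argv):
    """Render argv as a single shell-safe command string."""
    return shlex.join(argv)
```

Building an argument list first, and producing a string only for display, avoids quoting bugs when the list is passed directly to subprocess.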
Data Types & Formats
Execution ID Format
Execution IDs are UUID version 4 strings in the format xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx, where x is any hexadecimal digit and y is one of 8, 9, a, or b.
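Clients may want to reject malformed IDs before calling the logs, cancel, or status endpoints. The sketch below uses Python's standard uuid module, and its function name is illustrative. It accepts only the canonical hyphenated form with version 4 and the RFC 4122 variant.

```python
import uuid


def is_execution_id(value):
    """True if value is a canonical hyphenated UUID version 4 string (any case)."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, TypeError, AttributeError):
        return False
    # uuid.UUID also accepts braces, URN prefixes, and unhyphenated hex,
    # so compare against the canonical form to enforce the documented layout.
    return (
        str(parsed) == value.lower()
        and parsed.variant == uuid.RFC_4122
        and parsed.version == 4
    )
```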
Input and Parameter Structure
Inputs and parameters must be provided as flat JSON objects with string keys:
{
"inputs": {
"key1": "value1",
"key2": "value2"
},
"parameters": {
"param1": true,
"param2": "string_value",
"param3": 42
}
}
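A client-side check for the "flat" requirement could look like this. It rests on one interpretation, which is an assumption:

- inputs map string keys to string file paths.
- parameters map string keys to scalar JSON values (string, number, boolean).
- Nested objects, arrays, and null are not allowed.

The function name validate_flat_payload is hypothetical.

```python
def validate_flat_payload(payload):
    """Return a list of problems; an empty list means the structure is flat."""
    problems = []
    for section in ("inputs", "parameters"):
        mapping = payload.get(section, {})
        if not isinstance(mapping, dict):
            problems.append(f"{section} must be an object")
            continue
        for key, value in mapping.items():
            if not isinstance(key, str):
                problems.append(f"{section}: key {key!r} is not a string")
            if section == "inputs" and not isinstance(value, str):
                problems.append(f"inputs.{key}: file path must be a string")
            elif section == "parameters" and not isinstance(value, (str, int, float, bool)):
                # Nested objects, arrays, and null are rejected.
                problems.append(f"parameters.{key}: expected a scalar value")
    return problems
```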
Path Requirements
All file paths in inputs and output_directory must be absolute paths. Relative paths may cause execution failures.
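A relative path is resolved against whatever working directory the service or component process uses, so absolutizing paths on the client before sending avoids surprises. The sketch below uses pathlib. Resolving against the client's current directory (or an explicit base) only makes sense when the client and the service share a filesystem, which is an assumption.

```python
from pathlib import Path


def absolutize_paths(inputs, output_directory, base=None):
    """Return (inputs, output_directory) with every path made absolute."""
    root = Path(base) if base is not None else Path.cwd()

    def fix(p):
        path = Path(p)
        # Leave absolute paths untouched; anchor relative ones at root.
        return str(path if path.is_absolute() else root / path)

    return {key: fix(value) for key, value in inputs.items()}, fix(output_directory)
```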
Server-Sent Events Details
SSE Protocol
The log streaming endpoint implements the Server-Sent Events specification:
Message Format: Each event consists of one or more lines prefixed with data:, followed by a blank line
Heartbeat: Comment lines (: heartbeat\n\n) are sent periodically to prevent timeouts; SSE clients ignore them.
Connection Management: Clients should handle reconnection if the connection drops before receiving a terminal status
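The framing rules above are enough to parse the stream without an SSE library. This minimal sketch handles only what this endpoint is documented to send: data: lines, comment heartbeats, and blank-line terminators. It ignores the event:, id:, and retry: fields that the full specification also defines. The function name parse_sse is illustrative.

```python
import json


def parse_sse(lines):
    """Yield decoded JSON payloads from an iterable of SSE text lines."""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event, if any.
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
        elif line.startswith(":"):
            continue  # Comment, e.g. the ": heartbeat" keep-alive.
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The specification strips one leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
```

With urllib, the response object iterates over byte lines, so `parse_sse(line.decode("utf-8") for line in resp)` can consume a live stream.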
Client Implementation
SSE clients should:
- Parse the JSON payload of each data: line
- Check for the status field to detect completion
- Close the connection after receiving a terminal status
- Handle connection errors gracefully
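The checklist can be sketched as a consumer loop over already-decoded event dicts. The function name and the print-based logging are illustrative. On a dropped connection the function returns None, and the caller is assumed to reconnect to the same logs URL. Bounding the number of reconnection attempts is advisable, because an unknown execution ID produces a single error message followed by a closed stream.

```python
TERMINAL_STATUSES = ("completed", "failed", "cancelled")


def follow_logs(events, on_log=print):
    """Consume log events until a terminal status arrives; return that status.

    Returns None if the events run out first (for example, the connection
    dropped), which tells the caller to reconnect.
    """
    for event in events:
        on_log(f"[{event.get('level', 'info')}] {event.get('message', '')}")
        status = event.get("status")
        if status in TERMINAL_STATUSES:
            return status  # Stop reading; the caller closes the connection.
    return None
```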
Logging
The module uses Python's standard logging framework with the logger name component_runner_bp:
- Error level: File-not-found errors and unexpected exceptions are logged with full stack traces.