CLI Pipeline API Reference¶
The Heritage Data Processor CLI Pipeline API provides endpoints for command-line interface operations. They execute batch pipeline operations using corrected draft creation logic.
Base URL¶
All endpoints are prefixed with /cli.
Pipeline Execution¶
Execute CLI Pipeline¶
Executes a batch pipeline operation for multiple records, creating Zenodo drafts and processing items with enhanced error handling designed for CLI usage.
Endpoint: POST /cli/pipelines/<pipeline_name>/execute
URL Parameters:
- pipeline_name (string, required): Name of the pipeline to execute. This parameter is captured from the URL but not currently used in the logic (reserved for future pipeline routing)
Request Body:
{
"record_ids": [1, 2, 3, 4, 5]
}
Request Parameters:
- record_ids (array, required): List of local record database IDs to process through the pipeline
Response (All Success):
{
"success": true,
"message": "Pipeline execution completed successfully for all items.",
"processed_items": [1, 2, 3, 4, 5]
}
Response (Partial Failure):
{
"success": false,
"message": "Pipeline run completed with errors.",
"processed_items": [1, 2, 4],
"failed_items": {
"3": "Record not found",
"5": "Initial Zenodo draft creation failed: Authentication failed"
}
}
Response Fields:
- success (boolean): true if all records processed successfully, false if any failures occurred
- message (string): Human-readable summary of the pipeline execution
- processed_items (array): List of record IDs that were successfully processed
- failed_items (object, optional): Dictionary mapping failed record IDs to their error messages. Only present when failures occur
Status Codes:
- 200 OK: All records processed successfully without errors
- 207 Multi-Status: Pipeline completed but some records failed. Check failed_items for details
- 400 Bad Request: No HDPC project loaded or record_ids array is empty
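A minimal client-side sketch of calling this endpoint, using only the Python standard library. The base address http://localhost:5000 and the helper names build_execute_request and describe_status are assumptions for illustration. Only the path, the record_ids field, and the three status codes come from this reference.

```python
import json
import urllib.request

# Assumed address of a locally running server; not stated in this reference.
BASE_URL = "http://localhost:5000"


def build_execute_request(pipeline_name, record_ids, base_url=BASE_URL):
    """Build the POST request for /cli/pipelines/<pipeline_name>/execute."""
    body = json.dumps({"record_ids": list(record_ids)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/cli/pipelines/{pipeline_name}/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_status(status_code):
    """Map the documented status codes to a short outcome label."""
    outcomes = {200: "all_succeeded", 207: "partial_failure", 400: "bad_request"}
    return outcomes.get(status_code, "unexpected")


request = build_execute_request("zenodo_upload", [1, 2, 3])
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it. Note that urlopen returns normally for 200 and 207 but raises urllib.error.HTTPError for 400, so a client built this way needs to handle both paths.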
Pipeline Logic¶
Execution Flow¶
The pipeline executes the following steps for each record ID:
- Record Retrieval: Queries the zenodo_records table to fetch the record with the specified record_id
- Existence Check: Validates that the record exists in the database. If not found, marks the item as failed and continues to the next record
- Draft Status Check: Examines the zenodo_record_id field to determine if a Zenodo draft already exists
- Draft Creation: If zenodo_record_id is NULL or empty, creates a new Zenodo draft by calling the corrected CLI draft creation endpoint
- Zenodo ID Validation: Verifies that the draft creation returned a valid zenodo_record_id
- Success Recording: Adds the record ID to processed_items and logs successful processing
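The steps above can be sketched as a single loop. This is an illustration under stated assumptions, not the actual implementation: the function name run_pipeline, the create_draft callable (standing in for the internal HTTP call), the record_id column name, and the two-column schema are all hypothetical.

```python
import sqlite3


def run_pipeline(conn, record_ids, create_draft):
    """Process each record ID; return (processed_items, failed_items).

    Expects conn.row_factory to be sqlite3.Row so columns can be read by name.
    """
    processed, failed = [], {}
    for record_id in record_ids:
        try:
            # Record retrieval (the column name record_id is an assumption).
            row = conn.execute(
                "SELECT * FROM zenodo_records WHERE record_id = ?", (record_id,)
            ).fetchone()
            # Existence check.
            if row is None:
                raise LookupError("Record not found")
            # Draft status check.
            zenodo_record_id = row["zenodo_record_id"]
            if not zenodo_record_id:
                # Draft creation, followed by Zenodo ID validation.
                zenodo_record_id = create_draft(record_id)
                if not zenodo_record_id:
                    raise RuntimeError(
                        "Draft created, but could not retrieve new Zenodo ID."
                    )
            # Success recording.
            processed.append(record_id)
        except Exception as exc:
            # A failure is recorded and the loop moves on to the next record.
            failed[record_id] = str(exc)
    return processed, failed


# In-memory stand-in for the project database, with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE zenodo_records (record_id INTEGER PRIMARY KEY, zenodo_record_id TEXT)"
)
conn.executemany("INSERT INTO zenodo_records VALUES (?, ?)", [(1, "9001"), (2, None)])

# Fake draft creator standing in for the internal endpoint call.
processed, failed = run_pipeline(conn, [1, 2, 3], create_draft=lambda rid: f"draft-{rid}")
print(processed, failed)  # prints [1, 2] {3: 'Record not found'}
```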
Error Handling¶
The pipeline implements error-tolerant batch processing:
- Individual record failures do not stop the entire pipeline
- Each record is processed inside its own try/except block
- Failed records are logged with their specific error messages
- Processing continues for remaining records even after failures
- Final response includes both successful and failed items
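One way the final response could be assembled from the two result collections is sketched below. The function name build_response is hypothetical. The message strings, field names, and the 200/207 split are taken from the response examples in this reference.

```python
def build_response(processed_items, failed_items):
    """Return (body, status_code) for a finished batch run."""
    if failed_items:
        body = {
            "success": False,
            "message": "Pipeline run completed with errors.",
            "processed_items": processed_items,
            # Keys are stringified because JSON object keys are always strings.
            "failed_items": {str(key): msg for key, msg in failed_items.items()},
        }
        return body, 207
    body = {
        "success": True,
        "message": "Pipeline execution completed successfully for all items.",
        "processed_items": processed_items,
    }
    return body, 200


body, status = build_response([1, 2, 4], {3: "Record not found"})
print(status, body["failed_items"])
```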
Draft Creation¶
Corrected Endpoint Integration¶
This endpoint specifically calls a corrected version of the Zenodo draft creation logic to bypass known bugs in the original implementation.
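Internal Endpoint Called: zenodo.create_api_draft_for_cli (the Flask endpoint name, resolved to a URL with url_for)
Request to Internal Endpoint: a JSON payload containing local_record_db_id, set to the ID of the record being processed
Expected Response from Internal Endpoint: a JSON object carrying the new Zenodo ID in the nested zenodo_response.id field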
Draft Creation Behavior:
- Only creates drafts for records without an existing zenodo_record_id
- Uses Flask's url_for with _external=True to generate the full internal endpoint URL
- Makes internal HTTP POST request using the requests library
- Validates the response status code is 200 OK
- Extracts the zenodo_record_id from the nested zenodo_response.id field
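The validation and extraction steps might look like the following sketch. The function name extract_zenodo_id and the "error" key used to read the failure reason are assumptions. The two error message prefixes are the ones shown under Error Scenarios.

```python
def extract_zenodo_id(status_code, payload):
    """Return the Zenodo ID from a draft-creation response, or raise."""
    if status_code != 200:
        # "error" is a hypothetical field name for the failure reason.
        reason = payload.get("error", "Unknown error")
        raise RuntimeError(f"Initial Zenodo draft creation failed: {reason}")
    # The ID is nested under zenodo_response.id; tolerate missing keys.
    zenodo_id = (payload.get("zenodo_response") or {}).get("id")
    if not zenodo_id:
        raise RuntimeError("Draft created, but could not retrieve new Zenodo ID.")
    return zenodo_id


print(extract_zenodo_id(200, {"zenodo_response": {"id": 12345}}))  # prints 12345
```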
Error Scenarios¶
Common Error Cases¶
No Project Loaded:
Response:
Status: 400 Bad Request
Empty Record IDs:
Response:
Status: 400 Bad Request
Record Not Found:
If a specified record ID does not exist in the zenodo_records table, the error is captured in failed_items:
{
"success": false,
"message": "Pipeline run completed with errors.",
"processed_items": [1, 2],
"failed_items": {
"3": "Record not found"
}
}
Status: 207 Multi-Status
Draft Creation Failed:
If the internal draft creation endpoint returns a non-200 status code:
{
"success": false,
"message": "Pipeline run completed with errors.",
"processed_items": [1],
"failed_items": {
"2": "Initial Zenodo draft creation failed: Invalid API token"
}
}
Status: 207 Multi-Status
Zenodo ID Not Retrieved:
If draft creation succeeds but the response does not contain a valid Zenodo ID:
{
"success": false,
"message": "Pipeline run completed with errors.",
"processed_items": [1],
"failed_items": {
"2": "Draft created, but could not retrieve new Zenodo ID."
}
}
Status: 207 Multi-Status
Database Access¶
Direct SQLite Connection¶
The endpoint uses direct SQLite database connections instead of the query service for granular control:
Connection Configuration:
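- Database path obtained from project_manager.db_path
- Uses a context manager (with sqlite3.connect()) around the connection. In Python's sqlite3 module this commits or rolls back the open transaction on exit; it does not close the connection, which requires an explicit close() or contextlib.closing
- Sets row_factory = sqlite3.Row for dictionary-like row access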
Query Executed:
Row Access:
Logging¶
Application Logger¶
The endpoint uses Flask's application logger (current_app.logger) for comprehensive logging:
Info Level:
- "CLI PIPELINE: Creating draft for record {record_id}."
- "CLI PIPELINE: Successfully processed record {record_id} with Zenodo ID {zenodo_record_id}."
Error Level:
- "Failed to process item {record_id}: {exception_message}"
- Includes full stack trace with exc_info=True
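A small sketch of the logging pattern, using a standard library logger as a stand-in for current_app.logger (Flask's application logger is a regular logging.Logger). The helper name process_with_logging and the process callable are hypothetical. The message texts follow the ones listed above.

```python
import logging

# Stand-in for current_app.logger so the sketch runs outside a Flask app.
logger = logging.getLogger("cli_pipeline_example")
logging.basicConfig(level=logging.INFO)


def process_with_logging(record_id, process):
    """Run process(record_id) and log the outcome; return True on success."""
    try:
        zenodo_record_id = process(record_id)
        logger.info(
            "CLI PIPELINE: Successfully processed record %s with Zenodo ID %s.",
            record_id,
            zenodo_record_id,
        )
        return True
    except Exception as exc:
        # exc_info=True attaches the active exception's traceback to the log entry.
        logger.error("Failed to process item %s: %s", record_id, exc, exc_info=True)
        return False


def always_missing(record_id):
    """Hypothetical processor that always fails, to exercise the error path."""
    raise LookupError("Record not found")


ok = process_with_logging(3, always_missing)
```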
Usage Examples¶
Example 1: Execute Pipeline for Multiple Records¶
POST /cli/pipelines/zenodo_upload/execute
Content-Type: application/json
{
"record_ids": [101, 102, 103, 104, 105]
}
Response (All Success):
{
"success": true,
"message": "Pipeline execution completed successfully for all items.",
"processed_items": [101, 102, 103, 104, 105]
}
Status: 200 OK
Example 2: Pipeline with Mixed Results¶
POST /cli/pipelines/zenodo_upload/execute
Content-Type: application/json
{
"record_ids": [201, 202, 203]
}
Response (Partial Failure):
{
"success": false,
"message": "Pipeline run completed with errors.",
"processed_items": [201, 203],
"failed_items": {
"202": "Initial Zenodo draft creation failed: Network timeout"
}
}
Status: 207 Multi-Status
Example 3: Error - No Project Loaded¶
POST /cli/pipelines/zenodo_upload/execute
Content-Type: application/json
{
"record_ids": [301, 302]
}
Response:
Status: 400 Bad Request
Example 4: Error - Empty Record IDs¶
POST /cli/pipelines/zenodo_upload/execute
Content-Type: application/json
{
"record_ids": []
}
Response:
Status: 400 Bad Request
Implementation Notes¶
CLI-Specific Design¶
This endpoint was specifically created for command-line interface usage to address bugs in the original web-based pipeline implementation:
Bug Bypass: Calls a corrected Zenodo draft creation endpoint (create_api_draft_for_cli) that fixes issues in the original draft creation logic
Duplicated Logic: Intentionally duplicates pipeline logic from other parts of the application to ensure CLI operations are isolated from web bugs
Future Development: Contains a TODO comment indicating the endpoint is primarily focused on draft creation, with additional pipeline steps to be implemented
URL Parameter Reserved¶
The pipeline_name URL parameter is captured but not currently used in the implementation. This design allows for future pipeline routing where different pipeline types (e.g., zenodo_upload, metadata_sync, file_processing) can be handled by the same endpoint with conditional logic based on the pipeline name.
Multi-Status Response¶
HTTP 207 Specification¶
The endpoint uses HTTP status code 207 Multi-Status to indicate partial success:
When Used: Returned when at least one record fails but others succeed
Semantic Meaning: The request itself was valid and processed, but not all sub-operations completed successfully
Client Handling: Clients should parse the failed_items object to identify which records require retry or manual intervention
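As an illustration of client handling, the sketch below separates failed records into those worth retrying and those needing manual attention. The function name split_failures is hypothetical, and treating "Record not found" as non-retryable is an assumption of this sketch rather than documented behavior.

```python
def split_failures(failed_items):
    """Split failed_items into (missing_ids, retry_ids) as sorted integer lists."""
    missing, retry = [], []
    for key, message in failed_items.items():
        record_id = int(key)  # keys arrive as strings in the JSON body
        # Assumption: a missing record will not succeed on retry.
        if message == "Record not found":
            missing.append(record_id)
        else:
            retry.append(record_id)
    return sorted(missing), sorted(retry)


failed_items = {
    "3": "Record not found",
    "5": "Initial Zenodo draft creation failed: Authentication failed",
}
missing_ids, retry_ids = split_failures(failed_items)
print(missing_ids, retry_ids)  # prints [3] [5]
```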
Integration Requirements¶
Project Manager Dependency¶
The endpoint requires the project_manager service to be in a loaded state:
- project_manager.is_loaded: Must return True
- project_manager.db_path: Must point to a valid SQLite database file
Zenodo Endpoint Dependency¶
The endpoint depends on the existence and correct implementation of the internal Zenodo draft creation endpoint:
- Route: zenodo.create_api_draft_for_cli
- Must accept JSON payload with local_record_db_id
- Must return JSON with nested zenodo_response.id
Internal HTTP Request¶
The endpoint makes internal HTTP requests to other Flask routes:
- Uses url_for with _external=True for full URL generation
- Makes synchronous POST request using requests library
- No timeout configured (may hang on slow internal responses)
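The missing timeout could be addressed along the lines of the sketch below. It is a suggestion, not the current behavior. The helper name post_draft_request, the 30-second default, and the example URL are assumptions. The post argument is any callable with the same keyword interface as requests.post, which accepts json and timeout, so the example can run with a fake in place of the real library.

```python
def post_draft_request(post, url, record_id, timeout_seconds=30.0):
    """POST the draft-creation payload with an explicit timeout.

    `post` is expected to behave like requests.post(url, json=..., timeout=...).
    """
    return post(url, json={"local_record_db_id": record_id}, timeout=timeout_seconds)


# Fake transport that records what it was called with instead of using the network.
calls = []


def fake_post(url, json=None, timeout=None):
    calls.append({"url": url, "json": json, "timeout": timeout})
    return "fake-response"


# The URL is a placeholder; the real one would come from url_for.
result = post_draft_request(fake_post, "http://localhost:5000/example-draft-url", 7)
print(calls[0]["timeout"])  # prints 30.0
```

With the real requests library, an expired timeout raises requests.exceptions.Timeout. Given the per-record exception handling described under Error Handling, such a record would presumably end up in failed_items rather than blocking the batch.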
Data Types & Formats¶
Record ID Format¶
All record IDs are integers representing primary keys in the zenodo_records table.
Failed Items Structure¶
The failed_items object maps record IDs (as strings) to error messages (as strings):
{
"3": "Record not found",
"5": "Initial Zenodo draft creation failed: Authentication failed"
}
Note: Record IDs are serialized as string keys in JSON objects.
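The string-key behavior is a property of JSON itself and can be seen with Python's json module. The round trip below shows integer keys becoming strings on serialization and how a client might convert them back; the variable names are illustrative only.

```python
import json

failed = {3: "Record not found"}

# json.dumps converts integer dictionary keys to strings.
encoded = json.dumps(failed)
print(encoded)  # prints {"3": "Record not found"}

# After decoding, keys stay strings unless converted explicitly.
decoded = json.loads(encoded)
restored = {int(key): message for key, message in decoded.items()}
print(restored)  # prints {3: 'Record not found'}
```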