CLI Pipeline API Reference¶

The Heritage Data Processor CLI Pipeline API provides endpoints intended for command-line use. They run batch pipeline operations and rely on corrected Zenodo draft creation logic.

Base URL¶

All endpoints are prefixed with /cli.

Pipeline Execution¶

Execute CLI Pipeline¶

Executes a batch pipeline operation for multiple records, creating Zenodo drafts and processing items with enhanced error handling designed for CLI usage.

Endpoint: POST /cli/pipelines/<pipeline_name>/execute

URL Parameters:

pipeline_name (string, required): Name of the pipeline to execute. The value is captured from the URL but is not currently used by the endpoint logic; it is reserved for future pipeline routing.

Request Body:

{
  "record_ids": [1, 2, 3, 4, 5]
}

Request Parameters:

record_ids (array, required): List of local record database IDs to process through the pipeline
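As a minimal sketch, a client might assemble this request with Python's standard library as shown here. The base URL (http://localhost:5000) and the helper name build_execute_request are assumptions for illustration and are not part of the API.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed base URL; the real host and port depend on the deployment.
BASE_URL = "http://localhost:5000"


def build_execute_request(pipeline_name, record_ids, base_url=BASE_URL):
    """Build (but do not send) the POST request for the execute endpoint."""
    url = f"{base_url}/cli/pipelines/{quote(pipeline_name, safe='')}/execute"
    body = json.dumps({"record_ids": list(record_ids)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_execute_request("zenodo_upload", [1, 2, 3, 4, 5])
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```

Note that urllib.request.urlopen raises HTTPError for a 400 response, while a 207 response is returned normally because it is a 2xx status.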

Response (All Success):

{
  "success": true,
  "message": "Pipeline execution completed successfully for all items.",
  "processed_items": [1, 2, 3, 4, 5]
}

Response (Partial Failure):

{
  "success": false,
  "message": "Pipeline run completed with errors.",
  "processed_items": [1, 2, 4],
  "failed_items": {
    "3": "Record not found",
    "5": "Initial Zenodo draft creation failed: Authentication failed"
  }
}

Response Fields:

success (boolean): true if all records processed successfully, false if any failures occurred
message (string): Human-readable summary of the pipeline execution
processed_items (array): List of record IDs that were successfully processed
failed_items (object, optional): Dictionary mapping failed record IDs to their error messages. Only present when failures occur

Status Codes:

200 OK: All records processed successfully without errors
207 Multi-Status: Pipeline completed but some records failed. Check failed_items for details
400 Bad Request: No HDPC project loaded or record_ids array is empty
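The three documented status codes could be handled on the client side roughly as follows. This is a sketch only; the function name summarize and the wording of the returned strings are illustrative assumptions.

```python
def summarize(status_code, body):
    """Turn a response from the execute endpoint into a one-line summary."""
    processed = body.get("processed_items", [])
    failed = body.get("failed_items", {})
    if status_code == 200:
        return f"all {len(processed)} records processed"
    if status_code == 207:
        return f"{len(processed)} processed, {len(failed)} failed"
    if status_code == 400:
        return f"request rejected: {body.get('error', 'unknown error')}"
    return f"unexpected status {status_code}"


partial = {
    "success": False,
    "message": "Pipeline run completed with errors.",
    "processed_items": [1, 2, 4],
    "failed_items": {"3": "Record not found"},
}
print(summarize(207, partial))
print(summarize(400, {"success": False, "error": "No project loaded."}))
```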

Pipeline Logic¶

Execution Flow¶

The pipeline executes the following steps for each record ID:

Record Retrieval: Queries the zenodo_records table to fetch the record with the specified record_id
Existence Check: Validates that the record exists in the database. If not found, marks the item as failed and continues to the next record
Draft Status Check: Examines the zenodo_record_id field to determine if a Zenodo draft already exists
Draft Creation: If zenodo_record_id is NULL or empty, creates a new Zenodo draft by calling the corrected CLI draft creation endpoint
Zenodo ID Validation: Verifies that the draft creation returned a valid zenodo_record_id
Success Recording: Adds the record ID to processed_items and logs successful processing
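The per-record steps above can be sketched as a single function. This is not the actual implementation: the two-column table schema is a simplification, and create_draft is a hypothetical callable standing in for the request to the internal draft creation endpoint.

```python
import sqlite3


def process_record(conn, record_id, create_draft):
    """Run the per-record steps; return the Zenodo ID or raise on failure.

    create_draft is a hypothetical stand-in for the internal draft creation
    call. It returns the new Zenodo ID, or None if no ID could be read.
    """
    # Record retrieval and existence check
    row = conn.execute(
        "SELECT * FROM zenodo_records WHERE record_id = ?", (record_id,)
    ).fetchone()
    if row is None:
        raise ValueError("Record not found")

    # Draft status check: NULL or empty means no draft exists yet
    zenodo_record_id = row["zenodo_record_id"]
    if not zenodo_record_id:
        zenodo_record_id = create_draft(record_id)
        # Zenodo ID validation
        if not zenodo_record_id:
            raise ValueError("Draft created, but could not retrieve new Zenodo ID.")
    return zenodo_record_id


# Demo with an in-memory database (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE zenodo_records "
    "(record_id INTEGER PRIMARY KEY, zenodo_record_id TEXT)"
)
conn.executemany("INSERT INTO zenodo_records VALUES (?, ?)", [(1, "111"), (2, None)])

print(process_record(conn, 1, lambda rid: None))     # existing draft is reused
print(process_record(conn, 2, lambda rid: 1234567))  # stub creates a new draft
```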

Error Handling¶

The pipeline implements error-tolerant batch processing:

Individual record failures do not stop the entire pipeline
Each record is processed inside its own try/except block
Failed records are logged with their specific error messages
Processing continues for remaining records even after failures
Final response includes both successful and failed items
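A minimal sketch of this error-tolerant loop follows. The function names run_batch and flaky are illustrative, and process_one is a hypothetical stand-in for the real per-record work; the response bodies and status codes mirror the ones documented above.

```python
def run_batch(record_ids, process_one):
    """Process every ID, collecting failures instead of aborting the run."""
    processed, failed = [], {}
    for record_id in record_ids:
        try:
            process_one(record_id)
            processed.append(record_id)
        except Exception as exc:  # one bad record must not stop the batch
            failed[str(record_id)] = str(exc)

    if failed:
        body = {
            "success": False,
            "message": "Pipeline run completed with errors.",
            "processed_items": processed,
            "failed_items": failed,
        }
        return body, 207
    body = {
        "success": True,
        "message": "Pipeline execution completed successfully for all items.",
        "processed_items": processed,
    }
    return body, 200


def flaky(record_id):
    # Stub processor: record 3 is treated as missing.
    if record_id == 3:
        raise LookupError("Record not found")


body, status = run_batch([1, 2, 3, 4], flaky)
print(status, body["processed_items"], body.get("failed_items"))
```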

Draft Creation¶

Corrected Endpoint Integration¶

This endpoint specifically calls a corrected version of the Zenodo draft creation logic to bypass known bugs in the original implementation.

Internal Endpoint Called:

POST /api/zenodo/create_api_draft_for_cli

Request to Internal Endpoint:

{
  "local_record_db_id": 123
}

Expected Response from Internal Endpoint:

{
  "zenodo_response": {
    "id": 1234567,
    "title": "Record Title",
    "status": "draft"
  }
}

Draft Creation Behavior:

Only creates drafts for records without an existing zenodo_record_id
Uses Flask's url_for with _external=True to generate the full internal endpoint URL
Makes internal HTTP POST request using the requests library
Validates the response status code is 200 OK
Extracts the zenodo_record_id from the nested zenodo_response.id field
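Reading the nested ID defensively might look like the sketch below. The helper name extract_zenodo_id is an assumption; only the zenodo_response.id path comes from the description above.

```python
def extract_zenodo_id(payload):
    """Return zenodo_response.id from the internal endpoint's JSON, or None."""
    if not isinstance(payload, dict):
        return None
    zenodo_response = payload.get("zenodo_response")
    if not isinstance(zenodo_response, dict):
        return None
    return zenodo_response.get("id")


sample = {
    "zenodo_response": {"id": 1234567, "title": "Record Title", "status": "draft"}
}
print(extract_zenodo_id(sample))
print(extract_zenodo_id({"zenodo_response": {}}))
```

A None result corresponds to the "Draft created, but could not retrieve new Zenodo ID." failure case.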

Error Scenarios¶

Common Error Cases¶

No Project Loaded:

POST /cli/pipelines/zenodo_upload/execute

Response:

{
  "success": false,
  "error": "No project loaded."
}

Status: 400 Bad Request

Empty Record IDs:

{
  "record_ids": []
}

Response:

{
  "success": false,
  "error": "No record_ids provided."
}

Status: 400 Bad Request
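The two request-level checks can be sketched as a small validation helper. The function name and the order of the checks (project first, then record_ids) are assumptions; the error bodies are the ones shown above.

```python
def validate_request(project_loaded, payload):
    """Return (error_body, 400) for a rejected request, or None if it may proceed."""
    if not project_loaded:
        return {"success": False, "error": "No project loaded."}, 400
    record_ids = (payload or {}).get("record_ids")
    if not record_ids:  # missing key or empty list
        return {"success": False, "error": "No record_ids provided."}, 400
    return None


print(validate_request(False, {"record_ids": [1]}))
print(validate_request(True, {"record_ids": []}))
print(validate_request(True, {"record_ids": [1, 2]}))
```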

Record Not Found:

If a specified record ID does not exist in the zenodo_records table, the error is captured in failed_items:

{
  "success": false,
  "message": "Pipeline run completed with errors.",
  "processed_items": [1, 2],
  "failed_items": {
    "3": "Record not found"
  }
}

Status: 207 Multi-Status

Draft Creation Failed:

If the internal draft creation endpoint returns a non-200 status code:

{
  "success": false,
  "message": "Pipeline run completed with errors.",
  "processed_items": [1],
  "failed_items": {
    "2": "Initial Zenodo draft creation failed: Invalid API token"
  }
}

Status: 207 Multi-Status

Zenodo ID Not Retrieved:

If draft creation succeeds but the response does not contain a valid Zenodo ID:

{
  "success": false,
  "message": "Pipeline run completed with errors.",
  "processed_items": [1],
  "failed_items": {
    "2": "Draft created, but could not retrieve new Zenodo ID."
  }
}

Status: 207 Multi-Status

Database Access¶

Direct SQLite Connection¶

The endpoint uses direct SQLite database connections instead of the query service for granular control:

Connection Configuration:

Database path obtained from project_manager.db_path
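Uses a context manager (with sqlite3.connect()). Note that in Python's sqlite3 module this context manager commits or rolls back the transaction on exit; it does not close the connection itself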
Sets row_factory = sqlite3.Row for dictionary-like row access

Query Executed:

SELECT * FROM zenodo_records WHERE record_id = ?

Row Access:

zenodo_record_id = record["zenodo_record_id"]

Logging¶

Application Logger¶

The endpoint uses Flask's application logger (current_app.logger) for comprehensive logging:

Info Level:

"CLI PIPELINE: Creating draft for record {record_id}."
"CLI PIPELINE: Successfully processed record {record_id} with Zenodo ID {zenodo_record_id}."

Error Level:

"Failed to process item {record_id}: {exception_message}"
Includes full stack trace with exc_info=True
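Flask's current_app.logger is a standard Python logging.Logger, so the same calls can be tried outside Flask. The sketch below uses a plain logger with an assumed name (cli_pipeline_demo) and writes to an in-memory stream to show what exc_info=True adds.

```python
import io
import logging

# Capture log output in memory so the result can be inspected.
stream = io.StringIO()
logger = logging.getLogger("cli_pipeline_demo")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False

record_id = 7
logger.info(f"CLI PIPELINE: Creating draft for record {record_id}.")
try:
    raise RuntimeError("Invalid API token")  # simulated failure
except Exception as e:
    # exc_info=True appends the full traceback to the log entry
    logger.error(f"Failed to process item {record_id}: {e}", exc_info=True)

output = stream.getvalue()
print(output)
```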

Usage Examples¶

Example 1: Execute Pipeline for Multiple Records¶

POST /cli/pipelines/zenodo_upload/execute
Content-Type: application/json

{
  "record_ids": [101, 102, 103, 104, 105]
}

Response (All Success):

{
  "success": true,
  "message": "Pipeline execution completed successfully for all items.",
  "processed_items": [101, 102, 103, 104, 105]
}

Status: 200 OK

Example 2: Pipeline with Mixed Results¶

POST /cli/pipelines/zenodo_upload/execute
Content-Type: application/json

{
  "record_ids": [201, 202, 203]
}

Response (Partial Failure):

{
  "success": false,
  "message": "Pipeline run completed with errors.",
  "processed_items": [201, 203],
  "failed_items": {
    "202": "Initial Zenodo draft creation failed: Network timeout"
  }
}

Status: 207 Multi-Status

Example 3: Error - No Project Loaded¶

POST /cli/pipelines/zenodo_upload/execute
Content-Type: application/json

{
  "record_ids": [301, 302]
}

Response:

{
  "success": false,
  "error": "No project loaded."
}

Status: 400 Bad Request

Example 4: Error - Empty Record IDs¶

POST /cli/pipelines/zenodo_upload/execute
Content-Type: application/json

{
  "record_ids": []
}

Response:

{
  "success": false,
  "error": "No record_ids provided."
}

Status: 400 Bad Request

Implementation Notes¶

CLI-Specific Design¶

This endpoint was specifically created for command-line interface usage to address bugs in the original web-based pipeline implementation:

Bug Bypass: Calls a corrected Zenodo draft creation endpoint (create_api_draft_for_cli) that fixes issues in the original draft creation logic

Duplicated Logic: Intentionally duplicates pipeline logic from other parts of the application to ensure CLI operations are isolated from web bugs

Future Development: Contains a TODO comment indicating the endpoint is primarily focused on draft creation, with additional pipeline steps to be implemented

URL Parameter Reserved¶

The pipeline_name URL parameter is captured but not currently used in the implementation. This design allows for future pipeline routing where different pipeline types (e.g., zenodo_upload, metadata_sync, file_processing) can be handled by the same endpoint with conditional logic based on the pipeline name.
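One way such routing could eventually look is a lookup table keyed by pipeline name, used here in place of a chain of conditionals. Everything in this sketch is hypothetical, including the handler and registry names; nothing like it exists in the current endpoint.

```python
def run_zenodo_upload(record_ids):
    """Placeholder handler; a real one would run the pipeline steps."""
    return {"pipeline": "zenodo_upload", "count": len(record_ids)}


# Hypothetical registry mapping pipeline names to handler functions.
PIPELINE_HANDLERS = {
    "zenodo_upload": run_zenodo_upload,
}


def dispatch_pipeline(pipeline_name, record_ids):
    """Look up the handler for pipeline_name and run it."""
    handler = PIPELINE_HANDLERS.get(pipeline_name)
    if handler is None:
        raise ValueError(f"Unknown pipeline: {pipeline_name}")
    return handler(record_ids)


result = dispatch_pipeline("zenodo_upload", [1, 2, 3])
print(result)
```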

Multi-Status Response¶

HTTP 207 Specification¶

The endpoint uses HTTP status code 207 Multi-Status to indicate partial success:

When Used: Returned when at least one record fails but others succeed

Semantic Meaning: The request itself was valid and processed, but not all sub-operations completed successfully

Client Handling: Clients should parse the failed_items object to identify which records require retry or manual intervention
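A client could turn a 207 response into a follow-up request along these lines. The helper name build_retry_payload is an assumption, and whether a retry is appropriate depends on the error message for each record.

```python
def build_retry_payload(body):
    """Build a request body containing only the failed record IDs, or None."""
    failed = body.get("failed_items", {})
    # Keys arrive as strings; the request body expects integer IDs.
    record_ids = sorted(int(key) for key in failed)
    return {"record_ids": record_ids} if record_ids else None


response_body = {
    "success": False,
    "message": "Pipeline run completed with errors.",
    "processed_items": [1, 2, 4],
    "failed_items": {
        "5": "Initial Zenodo draft creation failed: Authentication failed",
        "3": "Record not found",
    },
}
print(build_retry_payload(response_body))
```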

Integration Requirements¶

Project Manager Dependency¶

The endpoint requires the project_manager service to be in a loaded state:

project_manager.is_loaded: Must return True
project_manager.db_path: Must point to a valid SQLite database file

Zenodo Endpoint Dependency¶

The endpoint depends on the existence and correct implementation of the internal Zenodo draft creation endpoint:

Route: zenodo.create_api_draft_for_cli
Must accept JSON payload with local_record_db_id
Must return JSON with nested zenodo_response.id

Internal HTTP Request¶

The endpoint makes internal HTTP requests to other Flask routes:

Uses url_for with _external=True for full URL generation
Makes synchronous POST request using requests library
No timeout is configured, so the request may wait indefinitely if the internal endpoint is slow or unresponsive
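If a timeout were added, the call could be shaped as in this sketch. The helper post_draft_request, the timeout values, and the host in the URL are assumptions; the post argument is expected to behave like requests.post, which accepts json and timeout keyword arguments, and a stub is used here so the example runs without a server.

```python
def post_draft_request(post, url, record_id, timeout=(3.05, 30)):
    """POST the draft creation payload with an explicit timeout.

    timeout is a (connect, read) pair in seconds, as accepted by requests.
    """
    return post(url, json={"local_record_db_id": record_id}, timeout=timeout)


def fake_post(url, json=None, timeout=None):
    # Stand-in for requests.post that simply echoes its arguments.
    return {"url": url, "json": json, "timeout": timeout}


result = post_draft_request(
    fake_post,
    "http://localhost:5000/api/zenodo/create_api_draft_for_cli",  # assumed host
    123,
)
print(result)
```

With the real library, the call would be post_draft_request(requests.post, url, 123), and an expired timeout raises requests.exceptions.Timeout, which the per-record error handling could then record as a failure.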

Data Types & Formats¶

Record ID Format¶

All record IDs are integers representing primary keys in the zenodo_records table.

Failed Items Structure¶

The failed_items object maps record IDs (as strings) to error messages (as strings):

{
  "123": "Error message for record 123",
  "456": "Error message for record 456"
}

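Note: JSON object keys are always strings, so record IDs appear as string keys in failed_items even though they are integers in record_ids and processed_items. Clients should convert them back to integers before reusing them.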
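The effect is easy to reproduce with Python's json module, which converts integer dictionary keys to strings when encoding. Whether the server converts the keys explicitly or relies on the encoder is not stated here; the sketch only illustrates the round trip a client sees.

```python
import json

# Integer keys on the Python side...
failed_items = {123: "Error message for record 123"}
encoded = json.dumps({"failed_items": failed_items})
print(encoded)

# ...come back as string keys after decoding.
decoded = json.loads(encoded)
print(list(decoded["failed_items"]))

# Convert back to integers before reusing the IDs.
by_int_id = {int(key): msg for key, msg in decoded["failed_items"].items()}
print(by_int_id)
```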