
CLI Pipeline API Reference

The Heritage Data Processor CLI Pipeline API provides endpoints for command-line interface operations. They execute batch pipeline operations using corrected draft creation logic.

Base URL

All endpoints are prefixed with /cli.


Pipeline Execution

Execute CLI Pipeline

Executes a batch pipeline operation for multiple records, creating Zenodo drafts and processing items with enhanced error handling designed for CLI usage.

Endpoint: POST /cli/pipelines/<pipeline_name>/execute

URL Parameters:

  • pipeline_name (string, required): Name of the pipeline to execute. This parameter is captured from the URL but not currently used in the logic (reserved for future pipeline routing)

Request Body:

{
  "record_ids": [1, 2, 3, 4, 5]
}

Request Parameters:

  • record_ids (array, required): List of local record database IDs to process through the pipeline
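As an illustration of the request shape described above, the following sketch builds (but does not send) the POST request using only the Python standard library. The host and port in BASE_URL and the helper name build_execute_request are assumptions made for this example; they are not part of the documented API.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumption: the application is served locally on this host and port.
BASE_URL = "http://localhost:5000"


def build_execute_request(pipeline_name, record_ids, base_url=BASE_URL):
    """Build (but do not send) the POST request for a CLI pipeline run."""
    # Percent-encode the pipeline name so it is safe as a single path segment.
    url = f"{base_url}/cli/pipelines/{quote(pipeline_name, safe='')}/execute"
    body = json.dumps({"record_ids": list(record_ids)}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_execute_request("zenodo_upload", [1, 2, 3, 4, 5])
print(request.get_method(), request.full_url)
```

The request can then be sent with urllib.request.urlopen(request). Note that urlopen raises HTTPError for a 400 response, while a 207 response is returned normally.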

Response (All Success):

{
  "success": true,
  "message": "Pipeline execution completed successfully for all items.",
  "processed_items": [1, 2, 3, 4, 5]
}

Response (Partial Failure):

{
  "success": false,
  "message": "Pipeline run completed with errors.",
  "processed_items": [1, 2, 4],
  "failed_items": {
    "3": "Record not found",
    "5": "Initial Zenodo draft creation failed: Authentication failed"
  }
}

Response Fields:

  • success (boolean): true if all records processed successfully, false if any failures occurred
  • message (string): Human-readable summary of the pipeline execution
  • processed_items (array): List of record IDs that were successfully processed
  • failed_items (object, optional): Dictionary mapping failed record IDs to their error messages. Only present when failures occur

Status Codes:

  • 200 OK: All records processed successfully without errors
  • 207 Multi-Status: Pipeline completed but some records failed. Check failed_items for details
  • 400 Bad Request: No HDPC project loaded or record_ids array is empty
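A client might translate the three documented status codes into a single summary structure. The sketch below is one possible approach; the function name summarize and the outcome labels are hypothetical and chosen only for this example.

```python
def summarize(status_code, body):
    """Map a status code and parsed JSON body to a small summary dict."""
    if status_code == 200:
        return {
            "outcome": "success",
            "processed": body.get("processed_items", []),
            "failed": {},
        }
    if status_code == 207:
        return {
            "outcome": "partial",
            "processed": body.get("processed_items", []),
            "failed": body.get("failed_items", {}),
        }
    if status_code == 400:
        # 400 bodies carry an "error" string instead of "message".
        return {
            "outcome": "rejected",
            "processed": [],
            "failed": {},
            "error": body.get("error"),
        }
    raise ValueError(f"Unexpected status code: {status_code}")


partial = summarize(207, {
    "success": False,
    "message": "Pipeline run completed with errors.",
    "processed_items": [1, 2, 4],
    "failed_items": {"3": "Record not found"},
})
print(partial["outcome"], partial["failed"])
```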

Pipeline Logic

Execution Flow

The pipeline executes the following steps for each record ID:

  1. Record Retrieval: Queries the zenodo_records table to fetch the record with the specified record_id
  2. Existence Check: Validates that the record exists in the database. If not found, marks the item as failed and continues to the next record
  3. Draft Status Check: Examines the zenodo_record_id field to determine if a Zenodo draft already exists
  4. Draft Creation: If zenodo_record_id is NULL or empty, creates a new Zenodo draft by calling the corrected CLI draft creation endpoint
  5. Zenodo ID Validation: Verifies that the draft creation returned a valid zenodo_record_id
  6. Success Recording: Adds the record ID to processed_items and logs successful processing
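The six steps above can be sketched for a single record as follows. This is a simplified illustration, not the actual implementation: the function name process_record, the injected fetch_record and create_draft callables, and the use of exceptions to signal failure are all assumptions. A plain dictionary stands in for the zenodo_records table.

```python
def process_record(record_id, fetch_record, create_draft):
    """Run the documented steps for one record; raise on failure."""
    record = fetch_record(record_id)                    # 1. record retrieval
    if record is None:                                  # 2. existence check
        raise LookupError("Record not found")
    zenodo_record_id = record.get("zenodo_record_id")   # 3. draft status check
    if not zenodo_record_id:                            # NULL or empty
        zenodo_record_id = create_draft(record_id)      # 4. draft creation
        if not zenodo_record_id:                        # 5. Zenodo ID validation
            raise RuntimeError(
                "Draft created, but could not retrieve new Zenodo ID."
            )
    return zenodo_record_id                             # 6. caller records success


# Stand-in for the zenodo_records table (hypothetical data).
table = {
    1: {"zenodo_record_id": 555},   # draft already exists
    2: {"zenodo_record_id": None},  # draft must be created
}

result_existing = process_record(1, table.get, lambda rid: 999)
result_new = process_record(2, table.get, lambda rid: 1234567)
print(result_existing, result_new)
```

Passing the lookup and draft creation steps in as callables keeps the flow testable without a database or a running server.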

Error Handling

The pipeline implements error-tolerant batch processing:

  • Individual record failures do not stop the entire pipeline
  • Each record is processed in its own try/except block
  • Failed records are logged with their specific error messages
  • Processing continues for remaining records even after failures
  • Final response includes both successful and failed items
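The error-tolerant behavior listed above can be sketched as a loop that catches failures per record and then chooses the response body and status code. The names run_batch and process_one are hypothetical; the messages and field names are taken from the response examples in this reference.

```python
def run_batch(record_ids, process_one):
    """Process every record, collecting successes and failures separately."""
    processed_items = []
    failed_items = {}
    for record_id in record_ids:
        try:
            process_one(record_id)
            processed_items.append(record_id)
        except Exception as exc:
            # One failing record must not stop the remaining records.
            failed_items[str(record_id)] = str(exc)

    if failed_items:
        body = {
            "success": False,
            "message": "Pipeline run completed with errors.",
            "processed_items": processed_items,
            "failed_items": failed_items,
        }
        return body, 207
    body = {
        "success": True,
        "message": "Pipeline execution completed successfully for all items.",
        "processed_items": processed_items,
    }
    return body, 200


def flaky(record_id):
    """Hypothetical per-record step that fails for record 3 only."""
    if record_id == 3:
        raise LookupError("Record not found")


body, status = run_batch([1, 2, 3, 4], flaky)
print(status, body["processed_items"], body["failed_items"])
```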

Draft Creation

Corrected Endpoint Integration

This endpoint specifically calls a corrected version of the Zenodo draft creation logic to bypass known bugs in the original implementation.

Internal Endpoint Called:

POST /api/zenodo/create_api_draft_for_cli

Request to Internal Endpoint:

{
  "local_record_db_id": 123
}

Expected Response from Internal Endpoint:

{
  "zenodo_response": {
    "id": 1234567,
    "title": "Record Title",
    "status": "draft"
  }
}

Draft Creation Behavior:

  • Only creates drafts for records without an existing zenodo_record_id
  • Uses Flask's url_for with _external=True to generate the full internal endpoint URL
  • Makes internal HTTP POST request using the requests library
  • Validates the response status code is 200 OK
  • Extracts the zenodo_record_id from the nested zenodo_response.id field
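The status check and ID extraction described above might look like the sketch below. The function name extract_zenodo_id is hypothetical, and reading the failure detail from an "error" key in the internal endpoint's response is an assumption; this reference does not specify where that detail comes from.

```python
def extract_zenodo_id(status_code, payload):
    """Return the new Zenodo ID, or None if the response lacks one."""
    if status_code != 200:
        # Assumption: the internal endpoint reports details under "error".
        detail = payload.get("error", "unknown error")
        raise RuntimeError(f"Initial Zenodo draft creation failed: {detail}")
    # The ID is nested under zenodo_response.id; tolerate a missing level.
    return (payload.get("zenodo_response") or {}).get("id")


zenodo_id = extract_zenodo_id(200, {
    "zenodo_response": {"id": 1234567, "title": "Record Title", "status": "draft"}
})
print(zenodo_id)
```

A caller can treat a None result as the "Draft created, but could not retrieve new Zenodo ID." failure case.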

Error Scenarios

Common Error Cases

No Project Loaded:

POST /cli/pipelines/zenodo_upload/execute

Response:

{
  "success": false,
  "error": "No project loaded."
}

Status: 400 Bad Request


Empty Record IDs:

{
  "record_ids": []
}

Response:

{
  "success": false,
  "error": "No record_ids provided."
}

Status: 400 Bad Request


Record Not Found:

If a specified record ID does not exist in the zenodo_records table, the error is captured in failed_items:

{
  "success": false,
  "message": "Pipeline run completed with errors.",
  "processed_items": [1, 2],
  "failed_items": {
    "3": "Record not found"
  }
}

Status: 207 Multi-Status


Draft Creation Failed:

If the internal draft creation endpoint returns a non-200 status code:

{
  "success": false,
  "message": "Pipeline run completed with errors.",
  "processed_items": [1],
  "failed_items": {
    "2": "Initial Zenodo draft creation failed: Invalid API token"
  }
}

Status: 207 Multi-Status


Zenodo ID Not Retrieved:

If draft creation succeeds but the response does not contain a valid Zenodo ID:

{
  "success": false,
  "message": "Pipeline run completed with errors.",
  "processed_items": [1],
  "failed_items": {
    "2": "Draft created, but could not retrieve new Zenodo ID."
  }
}

Status: 207 Multi-Status


Database Access

Direct SQLite Connection

The endpoint uses direct SQLite database connections instead of the query service for granular control:

Connection Configuration:

  • Database path obtained from project_manager.db_path
  • Uses a context manager (with sqlite3.connect()). In Python's sqlite3 module this commits or rolls back the transaction on exit; it does not close the connection
  • Sets row_factory = sqlite3.Row for dictionary-like row access

Query Executed:

SELECT * FROM zenodo_records WHERE record_id = ?

Row Access:

zenodo_record_id = record["zenodo_record_id"]

Logging

Application Logger

The endpoint uses Flask's application logger (current_app.logger) for comprehensive logging:

Info Level:

  • "CLI PIPELINE: Creating draft for record {record_id}."
  • "CLI PIPELINE: Successfully processed record {record_id} with Zenodo ID {zenodo_record_id}."

Error Level:

  • "Failed to process item {record_id}: {exception_message}"
  • Includes full stack trace with exc_info=True
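Flask's current_app.logger is a standard logging.Logger, so the messages above can be reproduced with the logging module alone. In this sketch the logger name, the helper log_outcome, and the use of %-style arguments are assumptions; the message texts follow the ones listed above.

```python
import logging

logging.basicConfig(level=logging.INFO)
# Stand-in for current_app.logger outside a Flask application context.
logger = logging.getLogger("cli_pipeline_demo")
logger.setLevel(logging.INFO)


def log_outcome(record_id, action):
    """Run action(record_id) and log the result at the documented levels."""
    try:
        zenodo_record_id = action(record_id)
        logger.info(
            "CLI PIPELINE: Successfully processed record %s with Zenodo ID %s.",
            record_id,
            zenodo_record_id,
        )
        return True
    except Exception as exc:
        # exc_info=True attaches the full stack trace to the log record.
        logger.error(
            "Failed to process item %s: %s", record_id, exc, exc_info=True
        )
        return False


log_outcome(101, lambda rid: 1234567)
```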

Usage Examples

Example 1: Execute Pipeline for Multiple Records

POST /cli/pipelines/zenodo_upload/execute
Content-Type: application/json

{
  "record_ids": [101, 102, 103, 104, 105]
}

Response (All Success):

{
  "success": true,
  "message": "Pipeline execution completed successfully for all items.",
  "processed_items": [101, 102, 103, 104, 105]
}

Status: 200 OK


Example 2: Pipeline with Mixed Results

POST /cli/pipelines/zenodo_upload/execute
Content-Type: application/json

{
  "record_ids": [201, 202, 203]
}

Response (Partial Failure):

{
  "success": false,
  "message": "Pipeline run completed with errors.",
  "processed_items": [201, 203],
  "failed_items": {
    "202": "Initial Zenodo draft creation failed: Network timeout"
  }
}

Status: 207 Multi-Status


Example 3: Error - No Project Loaded

POST /cli/pipelines/zenodo_upload/execute
Content-Type: application/json

{
  "record_ids": [301, 302]
}

Response:

{
  "success": false,
  "error": "No project loaded."
}

Status: 400 Bad Request


Example 4: Error - Empty Record IDs

POST /cli/pipelines/zenodo_upload/execute
Content-Type: application/json

{
  "record_ids": []
}

Response:

{
  "success": false,
  "error": "No record_ids provided."
}

Status: 400 Bad Request


Implementation Notes

CLI-Specific Design

This endpoint was specifically created for command-line interface usage to address bugs in the original web-based pipeline implementation:

Bug Bypass: Calls a corrected Zenodo draft creation endpoint (create_api_draft_for_cli) that fixes issues in the original draft creation logic

Duplicated Logic: Intentionally duplicates pipeline logic from other parts of the application to ensure CLI operations are isolated from web bugs

Future Development: Contains a TODO comment indicating the endpoint is primarily focused on draft creation, with additional pipeline steps to be implemented

URL Parameter Reserved

The pipeline_name URL parameter is captured but not currently used in the implementation. This design allows for future pipeline routing where different pipeline types (e.g., zenodo_upload, metadata_sync, file_processing) can be handled by the same endpoint with conditional logic based on the pipeline name.


Multi-Status Response

HTTP 207 Specification

The endpoint uses HTTP status code 207 Multi-Status to indicate partial success:

When Used: Returned when at least one record fails but others succeed

Semantic Meaning: The request itself was valid and processed, but not all sub-operations completed successfully

Client Handling: Clients should parse the failed_items object to identify which records require retry or manual intervention
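One way a client could act on a 207 response is to turn failed_items into the request body for a follow-up run. The helper name build_retry_body is hypothetical, and whether a retry is appropriate depends on the error message; this sketch simply retries everything that failed.

```python
def build_retry_body(response_body):
    """Return a request body for the failed records, or None if none failed."""
    failed = response_body.get("failed_items", {})
    if not failed:
        # An empty record_ids array would be rejected with 400, so skip it.
        return None
    # failed_items keys are strings; record_ids expects integers.
    return {"record_ids": sorted(int(key) for key in failed)}


retry_body = build_retry_body({
    "success": False,
    "message": "Pipeline run completed with errors.",
    "processed_items": [1, 2, 4],
    "failed_items": {
        "3": "Record not found",
        "5": "Initial Zenodo draft creation failed: Authentication failed",
    },
})
print(retry_body)
```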


Integration Requirements

Project Manager Dependency

The endpoint requires the project_manager service to be in a loaded state:

  • project_manager.is_loaded: Must return True
  • project_manager.db_path: Must point to a valid SQLite database file
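The precondition checks that lead to the two 400 responses might be sketched as follows. The function name validate_request and the order of the checks (project state first, then record_ids) are assumptions, and SimpleNamespace stands in for the real project_manager service.

```python
from types import SimpleNamespace


def validate_request(project_manager, payload):
    """Return (body, 400) if a precondition fails, else None."""
    if not project_manager.is_loaded:
        return {"success": False, "error": "No project loaded."}, 400
    record_ids = (payload or {}).get("record_ids")
    if not record_ids:
        return {"success": False, "error": "No record_ids provided."}, 400
    return None


# Hypothetical stand-ins for the project_manager service.
unloaded = SimpleNamespace(is_loaded=False, db_path=None)
loaded = SimpleNamespace(is_loaded=True, db_path="/tmp/project.sqlite")

print(validate_request(unloaded, {"record_ids": [301, 302]}))
print(validate_request(loaded, {"record_ids": []}))
```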

Zenodo Endpoint Dependency

The endpoint depends on the existence and correct implementation of the internal Zenodo draft creation endpoint:

  • Route: zenodo.create_api_draft_for_cli
  • Must accept JSON payload with local_record_db_id
  • Must return JSON with nested zenodo_response.id

Internal HTTP Request

The endpoint makes internal HTTP requests to other Flask routes:

  • Uses url_for with _external=True for full URL generation
  • Makes synchronous POST request using requests library
  • No timeout configured (may hang on slow internal responses)
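If the missing timeout becomes a problem, requests.post accepts a timeout keyword argument in seconds. The sketch below shows how the internal call could pass one; it is a suggestion, not the current behavior. The helper name post_draft_request and the 10-second default are assumptions, and a fake post callable replaces requests.post so the example runs without a server.

```python
def post_draft_request(post, url, local_record_db_id, timeout=10.0):
    """Call a requests.post-compatible callable with an explicit timeout."""
    response = post(
        url,
        json={"local_record_db_id": local_record_db_id},
        timeout=timeout,
    )
    return response.status_code, response.json()


class FakeResponse:
    """Minimal stand-in for a requests.Response object."""
    status_code = 200

    def json(self):
        return {"zenodo_response": {"id": 1234567}}


calls = []


def fake_post(url, json=None, timeout=None):
    """Record the call arguments instead of performing an HTTP request."""
    calls.append({"url": url, "json": json, "timeout": timeout})
    return FakeResponse()


status, payload = post_draft_request(
    fake_post,
    "http://localhost:5000/api/zenodo/create_api_draft_for_cli",
    123,
)
print(status, payload, calls[0]["timeout"])
```

With the real library, passing requests.post as the first argument would raise requests.exceptions.Timeout when the limit is exceeded, which per-record error handling could then report as a failed item.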

Data Types & Formats

Record ID Format

All record IDs are integers representing primary keys in the zenodo_records table.

Failed Items Structure

The failed_items object maps record IDs (as strings) to error messages (as strings):

{
  "123": "Error message for record 123",
  "456": "Error message for record 456"
}

Note: Record IDs are serialized as string keys in JSON objects.
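The string keys follow from JSON itself, which only allows strings as object keys. The short round trip below uses Python's json module to show an integer key becoming a string; whether the endpoint converts keys explicitly before serialization is not stated in this reference.

```python
import json

# Integer key on the Python side (hypothetical server-side dictionary).
failed_items = {3: "Record not found"}

encoded = json.dumps({"failed_items": failed_items})
decoded = json.loads(encoded)

print(encoded)
print(list(decoded["failed_items"].keys()))
```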