Data Query API Reference
The Heritage Data Processor Data Query API provides read-only endpoints for retrieving project data: source files and their hierarchies, Zenodo records, pipeline steps, project configuration settings, batches, API logs, and credential metadata.
Base URL
All endpoint paths below are relative to the URL prefix at which the API's Blueprint is registered.
Project Requirement
Project Context Required
All endpoints in this API require an active HDPC project to be loaded. They use the @project_required decorator, which returns a 400 Bad Request error if no project is loaded.
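The framework-agnostic sketch below shows how the decorator behaves. Several details are assumptions for illustration: the ProjectManager class, the (body, status) tuple returned on failure, and the error message text. Only three things come from this reference: the decorator name, the db_path attribute of project_manager, and the 400 status.

```python
from functools import wraps
from typing import Any, Callable, Optional


class ProjectManager:
    """Hypothetical stand-in for the application's project manager."""

    def __init__(self) -> None:
        # Path of the loaded project's SQLite database; None while no project is loaded.
        self.db_path: Optional[str] = None


project_manager = ProjectManager()


def project_required(view: Callable[..., Any]) -> Callable[..., Any]:
    """Short-circuit with 400 Bad Request when no project is loaded."""

    @wraps(view)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if project_manager.db_path is None:
            # Assumed error body; the real decorator's payload may differ.
            return {"error": "No project loaded"}, 400
        return view(*args, **kwargs)

    return wrapper


@project_required
def get_configuration():
    # Placeholder body; a real view would query project_manager.db_path.
    return [], 200


print(get_configuration())  # ({'error': 'No project loaded'}, 400)
```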
File Management
List Source Files
Retrieves a paginated list of source files from the project database with optional search filtering.
Endpoint: GET /files
Query Parameters:
- page (integer, optional): Page number for pagination. Defaults to 1.
- limit (integer, optional): Number of items per page. Defaults to 25.
- search (string, optional): Search term used to filter files by filename (partial match).
Response:
{
"items": [
{
"filename": "manuscript_001.xml",
"relative_path": "data/manuscripts/manuscript_001.xml",
"size_bytes": 45120,
"mime_type": "application/xml",
"file_type": "xml",
"status": "processed"
},
{
"filename": "manuscript_002.xml",
"relative_path": "data/manuscripts/manuscript_002.xml",
"size_bytes": 38945,
"mime_type": "application/xml",
"file_type": "xml",
"status": "pending"
}
],
"totalItems": 150,
"page": 1,
"totalPages": 6
}
Response Fields:
- items (array): List of source file objects for the current page. Each object contains:
  - filename (string): Name of the file
  - relative_path (string): Path relative to the project root
  - size_bytes (integer): File size in bytes
  - mime_type (string): MIME type of the file
  - file_type (string): Classification of the file type
  - status (string): Processing status of the file
- totalItems (integer): Total number of files matching the search criteria
- page (integer): Current page number
- totalPages (integer): Total number of pages for the given limit
Search Behavior:
The search parameter matches any filename that contains the search term, using SQL LIKE with surrounding wildcards (%search%). Matching is case-insensitive for ASCII characters, which is SQLite's default LIKE behavior.
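The search and pagination rules can be approximated against an in-memory SQLite database. This is a minimal sketch, not the actual handler. The following are assumptions: the list_files helper, the exact source_files schema, and the ORDER BY filename clause (this reference does not state an ordering for /files).

```python
import sqlite3


def list_files(conn: sqlite3.Connection, page: int = 1, limit: int = 25, search: str = "") -> dict:
    """Sketch of GET /files: optional LIKE filter plus LIMIT/OFFSET pagination."""
    conn.row_factory = sqlite3.Row
    where, params = "", []
    if search:
        # The term is bound as a parameter; only the fixed WHERE text is interpolated.
        where = "WHERE filename LIKE ?"
        params.append(f"%{search}%")
    total = conn.execute(f"SELECT COUNT(*) FROM source_files {where}", params).fetchone()[0]
    rows = conn.execute(
        "SELECT filename, relative_path, size_bytes, mime_type, file_type, status "
        f"FROM source_files {where} ORDER BY filename LIMIT ? OFFSET ?",  # ORDER BY is assumed
        [*params, limit, (page - 1) * limit],
    ).fetchall()
    return {
        "items": [dict(row) for row in rows],
        "totalItems": total,
        "page": page,
        "totalPages": (total + limit - 1) // limit,
    }


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_files (filename TEXT, relative_path TEXT, size_bytes INTEGER, "
    "mime_type TEXT, file_type TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO source_files VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("manuscript_001.xml", "data/manuscripts/manuscript_001.xml", 45120, "application/xml", "xml", "processed"),
        ("MANUSCRIPT_002.xml", "data/manuscripts/MANUSCRIPT_002.xml", 38945, "application/xml", "xml", "pending"),
        ("notes.txt", "data/notes.txt", 512, "text/plain", "txt", "pending"),
    ],
)
# SQLite's LIKE ignores ASCII case by default, so both manuscripts match.
print(list_files(conn, page=1, limit=1, search="manuscript")["totalPages"])  # 2
```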
Status Codes:
- 200 OK: Files retrieved successfully
- 400 Bad Request: No project loaded
Get File Hierarchy
Recursively retrieves a file and all of its descendant files (dependencies and associated files) and returns them as a nested tree.
Endpoint: GET /files/<file_id>/hierarchy
URL Parameters:
file_id(integer, required): Database ID of the root file
Response:
{
"file_id": 42,
"filename": "model.obj",
"absolute_path": "/project/models/model.obj",
"file_type": "obj",
"status": "processed",
"error_message": null,
"children": [
{
"file_id": 43,
"filename": "model.mtl",
"absolute_path": "/project/models/model.mtl",
"file_type": "mtl",
"status": "processed",
"error_message": null,
"children": [
{
"file_id": 44,
"filename": "texture.png",
"absolute_path": "/project/textures/texture.png",
"file_type": "texture",
"status": "processed",
"error_message": null,
"children": []
}
]
}
]
}
Response Fields:
- file_id (integer): Database identifier for the file
- filename (string): Name of the file
- absolute_path (string): Full filesystem path to the file
- file_type (string): Classification of the file type
- status (string): Processing status of the file
- error_message (string, nullable): Error message if processing failed, otherwise null
- children (array): Child file objects with the same structure, nested recursively
Hierarchy Logic:
The endpoint uses the parent_file_id foreign key relationship in the source_files table to build the tree. Children are sorted alphabetically by filename.
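A recursive walk over parent_file_id might look like this sketch. The file_id primary-key column and the fetch_hierarchy name are assumptions. The reference only states that children are linked through parent_file_id and sorted by filename.

```python
import sqlite3
from typing import Optional

COLUMNS = "file_id, filename, absolute_path, file_type, status, error_message"


def fetch_hierarchy(conn: sqlite3.Connection, file_id: int) -> Optional[dict]:
    """Return a file and its descendants as nested dicts, or None if the ID is unknown."""
    conn.row_factory = sqlite3.Row
    row = conn.execute(f"SELECT {COLUMNS} FROM source_files WHERE file_id = ?", (file_id,)).fetchone()
    if row is None:
        return None  # the endpoint reports this case as 404 Not Found
    node = dict(row)
    child_ids = conn.execute(
        "SELECT file_id FROM source_files WHERE parent_file_id = ? ORDER BY filename",
        (file_id,),
    ).fetchall()
    node["children"] = [fetch_hierarchy(conn, child["file_id"]) for child in child_ids]
    return node


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_files (file_id INTEGER PRIMARY KEY, parent_file_id INTEGER,
        filename TEXT, absolute_path TEXT, file_type TEXT, status TEXT, error_message TEXT);
    INSERT INTO source_files VALUES
        (42, NULL, 'model.obj', '/project/models/model.obj', 'obj', 'processed', NULL),
        (43, 42, 'model.mtl', '/project/models/model.mtl', 'mtl', 'processed', NULL),
        (44, 43, 'texture.png', '/project/textures/texture.png', 'texture', 'processed', NULL);
    """
)
tree = fetch_hierarchy(conn, 42)
print(tree["children"][0]["children"][0]["filename"])  # texture.png
```

Each node costs one extra query. A cycle in parent_file_id would make this sketch recurse until Python's recursion limit is reached. Tracking visited IDs is one possible guard.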
Status Codes:
- 200 OK: Hierarchy retrieved successfully
- 400 Bad Request: No project loaded
- 404 Not Found: File with the specified ID does not exist
- 500 Internal Server Error: Database error during the recursive fetch
Use Case:
This endpoint is designed for displaying file dependencies in modals or tree views, such as 3D models with their material files and textures.
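For a plain-text tree view, a client can walk the returned JSON recursively. render_tree is a hypothetical client-side helper, not part of the API.

```python
def render_tree(node: dict, depth: int = 0) -> list[str]:
    """Flatten a hierarchy node into indented lines, one per file."""
    lines = ["  " * depth + f"{node['filename']} [{node['status']}]"]
    for child in node.get("children", []):
        lines.extend(render_tree(child, depth + 1))
    return lines


hierarchy = {
    "filename": "model.obj", "status": "processed", "children": [
        {"filename": "model.mtl", "status": "processed", "children": [
            {"filename": "texture.png", "status": "processed", "children": []},
        ]},
    ],
}
print("\n".join(render_tree(hierarchy)))
```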
Zenodo Integration
Get Latest Zenodo Record
Retrieves the most recently updated Zenodo record metadata from the project database.
Endpoint: GET /zenodo_record
Response:
{
"record_title": "Medieval Manuscript Collection - Volume 1",
"zenodo_doi": "10.5281/zenodo.1234567",
"record_status": "published",
"record_metadata_json": "{\"title\": \"Medieval Manuscript Collection\", \"creators\": [...]}",
"version": "1.2.0"
}
Response Fields:
- record_title (string): Title of the Zenodo record
- zenodo_doi (string): Digital Object Identifier assigned by Zenodo
- record_status (string): Current status of the record (e.g., draft, published)
- record_metadata_json (string): Full Zenodo metadata, serialized as a JSON string
- version (string): Version number of the record
Empty Response:
If no Zenodo records exist in the project, the endpoint returns an empty object {}.
Ordering:
Records are ordered by last_updated_timestamp in descending order, ensuring the most recent record is returned.
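The "latest record or empty object" behavior corresponds to an ORDER BY last_updated_timestamp DESC LIMIT 1 query. In this sketch the table name zenodo_records and the helper name are hypothetical. The selected columns follow the response fields above.

```python
import sqlite3


def latest_zenodo_record(conn: sqlite3.Connection) -> dict:
    """Return the most recently updated record, or {} when no records exist."""
    conn.row_factory = sqlite3.Row
    row = conn.execute(
        "SELECT record_title, zenodo_doi, record_status, record_metadata_json, version "
        "FROM zenodo_records ORDER BY last_updated_timestamp DESC LIMIT 1"
    ).fetchone()
    return dict(row) if row is not None else {}


SCHEMA = (
    "CREATE TABLE zenodo_records (record_title TEXT, zenodo_doi TEXT, record_status TEXT, "
    "record_metadata_json TEXT, version TEXT, last_updated_timestamp TEXT)"
)
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
print(latest_zenodo_record(conn))  # {}
conn.executemany(
    "INSERT INTO zenodo_records VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Uniformly formatted ISO 8601 strings sort chronologically as text.
        ("Draft", None, "draft", "{}", "1.0.0", "2025-01-01T00:00:00Z"),
        ("Volume 1", "10.5281/zenodo.1234567", "published", "{}", "1.2.0", "2025-03-01T00:00:00Z"),
    ],
)
print(latest_zenodo_record(conn)["version"])  # 1.2.0
```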
Status Codes:
- 200 OK: Record retrieved successfully (or an empty object if no records exist)
- 400 Bad Request: No project loaded
List Record Files
Retrieves all files associated with a specific Zenodo record, including their upload status and metadata.
Endpoint: GET /records/<record_id>/files
URL Parameters:
record_id(integer, required): Database ID of the Zenodo record
Response:
[
{
"file_id": 101,
"filename": "manuscript_001.xml",
"absolute_path": "/project/data/manuscript_001.xml",
"file_type": "xml",
"pipeline_source": "xml_processor",
"step_source": "validation",
"upload_status": "uploaded"
},
{
"file_id": 102,
"filename": "metadata.json",
"absolute_path": "/project/output/metadata.json",
"file_type": "json",
"pipeline_source": "metadata_generator",
"step_source": "generation",
"upload_status": "pending"
}
]
Response Fields:
- file_id (integer): Database identifier for the file
- filename (string): Name of the file
- absolute_path (string): Full filesystem path to the file
- file_type (string): Classification of the file type
- pipeline_source (string): Name of the pipeline component that produced the file
- step_source (string): Processing step that generated the file
- upload_status (string): Current upload status (e.g., pending, uploaded, failed)
Ordering:
Files are sorted by file_type in descending order, then by filename in ascending order.
Database Join:
This endpoint joins the record_files_map table with the source_files table to retrieve comprehensive file information.
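The join can be sketched as follows. Only the table names and the ordering come from this reference. Everything else is hypothetical: the mapping-table columns (record_id, file_id, upload_status), the placement of pipeline_source and step_source on source_files, and the record ID 7.

```python
import sqlite3

RECORD_FILES_SQL = """
SELECT sf.file_id, sf.filename, sf.absolute_path, sf.file_type,
       sf.pipeline_source, sf.step_source, rfm.upload_status
FROM record_files_map AS rfm
JOIN source_files AS sf ON sf.file_id = rfm.file_id
WHERE rfm.record_id = ?
ORDER BY sf.file_type DESC, sf.filename ASC
"""


def list_record_files(conn: sqlite3.Connection, record_id: int) -> list[dict]:
    """Return the files mapped to one record, ordered as the endpoint describes."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(RECORD_FILES_SQL, (record_id,))]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_files (file_id INTEGER PRIMARY KEY, filename TEXT, absolute_path TEXT,
                               file_type TEXT, pipeline_source TEXT, step_source TEXT);
    CREATE TABLE record_files_map (record_id INTEGER, file_id INTEGER, upload_status TEXT);
    INSERT INTO source_files VALUES
      (101, 'manuscript_001.xml', '/project/data/manuscript_001.xml', 'xml', 'xml_processor', 'validation'),
      (102, 'metadata.json', '/project/output/metadata.json', 'json', 'metadata_generator', 'generation');
    INSERT INTO record_files_map VALUES (7, 101, 'uploaded'), (7, 102, 'pending');
    """
)
for f in list_record_files(conn, 7):
    print(f["file_type"], f["filename"], f["upload_status"])  # xml rows sort before json
```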
Status Codes:
- 200 OK: Files retrieved successfully
- 400 Bad Request: No project loaded
- 500 Internal Server Error: Database query failed
Pipeline Configuration
Get Pipeline Steps
Retrieves all configured pipeline steps for the project, ordered by modality and execution order.
Endpoint: GET /pipeline_steps
Response:
[
{
"modality": "text",
"component_name": "xml_validator",
"component_order": 1,
"is_active": 1,
"parameters": "{\"schema\": \"tei_all\", \"strict\": true}"
},
{
"modality": "text",
"component_name": "metadata_extractor",
"component_order": 2,
"is_active": 1,
"parameters": "{\"format\": \"json\"}"
},
{
"modality": "image",
"component_name": "image_processor",
"component_order": 1,
"is_active": 0,
"parameters": "{\"resize\": true, \"quality\": 95}"
}
]
Response Fields:
- modality (string): Data modality or category (e.g., text, image, 3d_model)
- component_name (string): Name of the pipeline component
- component_order (integer): Execution order within the modality
- is_active (integer): Boolean flag: 1 if the component is active, 0 if it is disabled
- parameters (string): Component parameters, serialized as a JSON string
Ordering:
Pipeline steps are ordered first by modality, then by component_order to reflect the execution sequence.
Empty Response:
If no pipeline steps are configured, the endpoint returns an empty array [].
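Because steps arrive sorted by modality and component_order, a client can derive per-modality execution plans in one pass. The execution_plan helper below is a hypothetical client-side sketch. It keeps only active steps and decodes each parameters string with json.loads.

```python
import json
from collections import defaultdict


def execution_plan(steps: list[dict]) -> dict[str, list[tuple[str, dict]]]:
    """Group active steps by modality, keeping component_order within each group."""
    plan: dict[str, list[tuple[str, dict]]] = defaultdict(list)
    # The endpoint already sorts this way; re-sorting keeps the helper order-independent.
    for step in sorted(steps, key=lambda s: (s["modality"], s["component_order"])):
        if step["is_active"] == 1:  # 0 marks a disabled component
            plan[step["modality"]].append((step["component_name"], json.loads(step["parameters"])))
    return dict(plan)


steps = [
    {"modality": "text", "component_name": "xml_validator", "component_order": 1,
     "is_active": 1, "parameters": '{"schema": "tei_all", "strict": true}'},
    {"modality": "text", "component_name": "metadata_extractor", "component_order": 2,
     "is_active": 1, "parameters": '{"format": "json"}'},
    {"modality": "image", "component_name": "image_processor", "component_order": 1,
     "is_active": 0, "parameters": '{"resize": true, "quality": 95}'},
]
print(execution_plan(steps))
```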
Status Codes:
- 200 OK: Pipeline steps retrieved successfully
- 400 Bad Request: No project loaded
Project Configuration
Get Configuration Settings
Retrieves all project-level configuration key-value pairs.
Endpoint: GET /configuration
Response:
[
{
"config_key": "project_name",
"config_value": "Medieval Manuscripts Archive"
},
{
"config_key": "default_output_format",
"config_value": "xml"
},
{
"config_key": "enable_auto_backup",
"config_value": "true"
}
]
Response Fields:
- config_key (string): Configuration parameter name
- config_value (string): Configuration parameter value, always stored as a string regardless of its logical type
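Every config_value is a string, so clients must convert values themselves. This hypothetical helper applies three coercion rules: "true"/"false" become booleans, integer strings become int, and everything else stays text. These rules are assumptions about how values are written, not documented behavior.

```python
def coerce_config(entries: list[dict]) -> dict:
    """Convert config_key/config_value rows into a dict with best-effort typed values."""
    result = {}
    for entry in entries:
        raw = entry["config_value"]
        text = raw.strip().lower()
        if text in ("true", "false"):
            value = text == "true"  # assumed spelling of booleans
        else:
            try:
                value = int(text)  # assumed: whole numbers are stored as plain digits
            except ValueError:
                value = raw  # everything else stays a string
        result[entry["config_key"]] = value
    return result


config = coerce_config([
    {"config_key": "project_name", "config_value": "Medieval Manuscripts Archive"},
    {"config_key": "default_output_format", "config_value": "xml"},
    {"config_key": "enable_auto_backup", "config_value": "true"},
])
print(config["enable_auto_backup"] is True)  # True
```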
Empty Response:
If no configuration entries exist, the endpoint returns an empty array [].
Status Codes:
- 200 OK: Configuration retrieved successfully
- 400 Bad Request: No project loaded
Batch Processing
List Batches
Retrieves all processing batches created in the project, ordered by creation timestamp.
Endpoint: GET /batches
Response:
[
{
"batch_name": "Manuscript Validation - January 2025",
"batch_description": "Validation of all manuscript files received in January",
"status": "completed",
"created_timestamp": "2025-01-15T10:30:00Z"
},
{
"batch_name": "Image Processing - February 2025",
"batch_description": "Batch processing of scanned images",
"status": "in_progress",
"created_timestamp": "2025-02-01T08:00:00Z"
}
]
Response Fields:
- batch_name (string): Descriptive name of the batch
- batch_description (string): Detailed description of the batch's purpose
- status (string): Current processing status (e.g., pending, in_progress, completed, failed)
- created_timestamp (string): ISO 8601 timestamp of batch creation
Ordering:
Batches are sorted by created_timestamp in descending order, with the most recent batches first.
Empty Response:
If no batches exist, the endpoint returns an empty array [].
Status Codes:
- 200 OK: Batches retrieved successfully
- 400 Bad Request: No project loaded
API Logging
List API Log Entries
Retrieves paginated API activity logs showing HTTP requests made to external services (e.g., Zenodo).
Endpoint: GET /apilog
Query Parameters:
- page (integer, optional): Page number for pagination. Defaults to 1.
- limit (integer, optional): Number of items per page. Defaults to 25.
Response:
{
"items": [
{
"timestamp": "2025-10-21T11:45:23Z",
"http_method": "POST",
"endpoint_url": "https://zenodo.org/api/deposit/depositions",
"response_status_code": 201,
"status": "success"
},
{
"timestamp": "2025-10-21T11:40:15Z",
"http_method": "GET",
"endpoint_url": "https://zenodo.org/api/deposit/depositions/1234567",
"response_status_code": 200,
"status": "success"
},
{
"timestamp": "2025-10-21T11:35:08Z",
"http_method": "PUT",
"endpoint_url": "https://zenodo.org/api/deposit/depositions/1234567/files",
"response_status_code": 500,
"status": "failed"
}
],
"totalItems": 487,
"page": 1,
"totalPages": 20
}
Response Fields:
- items (array): List of API log entries for the current page. Each entry contains:
  - timestamp (string): ISO 8601 timestamp of the API request
  - http_method (string): HTTP method used (e.g., GET, POST, PUT, DELETE)
  - endpoint_url (string): Full URL of the external API endpoint
  - response_status_code (integer): HTTP status code returned by the external API
  - status (string): Interpreted status of the request (e.g., success, failed)
- totalItems (integer): Total number of log entries
- page (integer): Current page number
- totalPages (integer): Total number of pages for the given limit
Ordering:
Log entries are sorted by timestamp in descending order, with the most recent entries first.
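To read the whole log rather than a single page, a client can request successive pages until page reaches totalPages. In this sketch the HTTP call is abstracted into a fetch_page(page, limit) callable. That callable is a hypothetical parameter that any HTTP client could implement; here a fake fetcher stands in for the server.

```python
from typing import Callable, Iterator


def iter_all_items(fetch_page: Callable[[int, int], dict], limit: int = 25) -> Iterator[dict]:
    """Yield every item from a paginated endpoint, one page request at a time."""
    page = 1
    while True:
        body = fetch_page(page, limit)
        yield from body["items"]
        if page >= body["totalPages"]:  # also stops when totalPages is 0
            break
        page += 1


# Fake fetcher standing in for HTTP calls to /apilog.
entries = [{"timestamp": f"2025-10-21T11:{m:02d}:00Z"} for m in range(7)]


def fake_fetch(page: int, limit: int) -> dict:
    start = (page - 1) * limit
    return {
        "items": entries[start:start + limit],
        "totalItems": len(entries),
        "page": page,
        "totalPages": (len(entries) + limit - 1) // limit,
    }


print(len(list(iter_all_items(fake_fetch, limit=3))))  # 7
```

Entries are returned newest first and paged by offset. As a result, entries logged while the loop runs shift older rows onto later pages, so the same entry can be yielded twice.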
Status Codes:
- 200 OK: Log entries retrieved successfully
- 400 Bad Request: No project loaded
Credentials Management
List API Credentials
Retrieves all stored API credentials for external services, without exposing sensitive credential values.
Endpoint: GET /credentials
Response:
[
{
"credential_name": "Zenodo Production",
"credential_type": "zenodo_api_token",
"is_sandbox": 0
},
{
"credential_name": "Zenodo Sandbox",
"credential_type": "zenodo_api_token",
"is_sandbox": 1
}
]
Response Fields:
- credential_name (string): Human-readable name for the credential
- credential_type (string): Type or category of the credential (e.g., zenodo_api_token, oauth_token)
- is_sandbox (integer): Boolean flag: 1 for a sandbox environment, 0 for production
Security:
This endpoint does not return actual credential values (API keys, tokens, passwords) for security reasons.
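One way to keep secrets out of responses is to select only an allow-list of non-sensitive columns. The table name api_credentials and the secret column credential_value are hypothetical, since this reference does not document the credentials schema.

```python
import sqlite3

# Allow-list of columns that are safe to return (the secret column is deliberately absent).
SAFE_COLUMNS = ("credential_name", "credential_type", "is_sandbox")


def list_credentials(conn: sqlite3.Connection) -> list[dict]:
    """Return credential metadata sorted by name, never the secret value."""
    conn.row_factory = sqlite3.Row
    sql = f"SELECT {', '.join(SAFE_COLUMNS)} FROM api_credentials ORDER BY credential_name"
    return [dict(row) for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE api_credentials (credential_name TEXT, credential_type TEXT, "
    "is_sandbox INTEGER, credential_value TEXT)"
)
conn.executemany(
    "INSERT INTO api_credentials VALUES (?, ?, ?, ?)",
    [
        ("Zenodo Sandbox", "zenodo_api_token", 1, "placeholder-token-b"),
        ("Zenodo Production", "zenodo_api_token", 0, "placeholder-token-a"),
    ],
)
print(list_credentials(conn))
```

An allow-list fails safe: a secret column added to the table later is not exposed automatically. A SELECT * followed by deleting keys would not offer that guarantee.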
Ordering:
Credentials are sorted alphabetically by credential_name.
Empty Response:
If no credentials are stored, the endpoint returns an empty array [].
Status Codes:
- 200 OK: Credentials retrieved successfully
- 400 Bad Request: No project loaded
Pagination
Pagination Format
Endpoints that support pagination (/files and /apilog) use a consistent pagination structure:
Query Parameters:
- page: 1-indexed page number (defaults to 1)
- limit: Number of items per page (defaults to 25)
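Response Format:
Paginated responses return a JSON object with four keys: items (the records on the requested page), totalItems, page, and totalPages.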
Page Calculation:
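The totalPages value is computed by ceiling division: (totalItems + limit - 1) // limit. This expression alone evaluates to 0 when totalItems is 0. A minimum of one page therefore holds only if the result is additionally clamped, for example as max(1, (totalItems + limit - 1) // limit).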
Offset Calculation:
The database offset is calculated as: (page - 1) * limit.
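Both formulas can be checked with a few lines of Python. The at_least_one flag is an illustrative addition. As written, the ceiling division returns 0 for an empty result set, and clamping with max(1, pages) is one way to obtain a minimum of one page.

```python
def total_pages(total_items: int, limit: int, at_least_one: bool = False) -> int:
    """Ceiling division of total_items by limit, optionally clamped to at least 1."""
    pages = (total_items + limit - 1) // limit
    return max(1, pages) if at_least_one else pages


def page_offset(page: int, limit: int) -> int:
    """Rows to skip before a 1-indexed page."""
    return (page - 1) * limit


print(total_pages(150, 25))  # 6, matching the /files example
print(total_pages(487, 25))  # 20, matching the /apilog example
print(total_pages(0, 25), total_pages(0, 25, at_least_one=True))  # 0 1
print(page_offset(3, 25))  # 50
```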
Error Responses
Standard Error Format
All error responses follow a consistent JSON format:
Common Error Scenarios
No Project Loaded:
All endpoints return 400 Bad Request when no HDPC project is loaded due to the @project_required decorator.
File Not Found:
Response: 404 Not Found with error message "File not found"
Database Query Failed:
Response: 500 Internal Server Error with error message "Database query failed"
General Exception:
The /files/<file_id>/hierarchy endpoint catches all exceptions and returns 500 Internal Server Error with the exception message.
Database Integration¶
Query Service
All endpoints execute SQL queries against the project's SQLite database through the database service, most of them via its query_db function.
Connection Management:
Most endpoints use query_db, which manages the connection lifecycle automatically. The /files/<file_id>/hierarchy endpoint instead calls get_db_connection directly and manages its connection manually, because it issues repeated queries while recursing through the tree.
Database Path:
All queries execute against project_manager.db_path, which points to the currently loaded project's database file.
Row Factory
Query results are returned as dictionaries with column names as keys, making them directly serializable to JSON.
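A helper that opens a connection, returns rows as dictionaries, and closes the connection could look like the following. The signature query_db(db_path, sql, args, one) is an assumption; the real function in the database service may differ.

```python
import sqlite3
from contextlib import closing
from typing import Any, Sequence


def dict_factory(cursor: sqlite3.Cursor, row: tuple) -> dict:
    """Map a row to a dict keyed by column name, ready for JSON serialization."""
    return {column[0]: value for column, value in zip(cursor.description, row)}


def query_db(db_path: str, sql: str, args: Sequence[Any] = (), one: bool = False) -> Any:
    """Run a single query, close the connection, and return dict rows (or one row/None)."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.row_factory = dict_factory
        rows = conn.execute(sql, args).fetchall()
    if one:
        return rows[0] if rows else None
    return rows


print(query_db(":memory:", "SELECT 1 AS answer, 'xml' AS file_type"))
# [{'answer': 1, 'file_type': 'xml'}]
```

Using sqlite3.connect() as a context manager only commits or rolls back the transaction. It does not close the connection, which is why the sketch wraps it in contextlib.closing. sqlite3.Row is an alternative row factory, but its rows must be converted with dict(row) before JSON serialization.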
Usage Examples
Example 1: Search Files with Pagination
Response:
{
"items": [
{
"filename": "manuscript_011.xml",
"relative_path": "data/manuscripts/manuscript_011.xml",
"size_bytes": 42300,
"mime_type": "application/xml",
"file_type": "xml",
"status": "processed"
}
],
"totalItems": 35,
"page": 2,
"totalPages": 4
}
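A /files request that combines search with pagination can be assembled with the standard library. BASE_URL stands for the Blueprint's URL prefix and, like the search term, is a hypothetical placeholder.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:5000/api"  # hypothetical Blueprint prefix


def files_url(page: int = 1, limit: int = 25, search: str = "") -> str:
    """Build a GET /files URL; the search parameter is omitted when empty."""
    params: dict[str, object] = {"page": page, "limit": limit}
    if search:
        params["search"] = search  # urlencode() escapes the term for a query string
    return f"{BASE_URL}/files?{urlencode(params)}"


print(files_url(page=2, limit=10, search="manuscript"))
# http://localhost:5000/api/files?page=2&limit=10&search=manuscript
```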
Example 2: Get Complete File Hierarchy
Response:
{
"file_id": 42,
"filename": "scene.obj",
"absolute_path": "/project/3d/scene.obj",
"file_type": "obj",
"status": "processed",
"error_message": null,
"children": [
{
"file_id": 43,
"filename": "scene.mtl",
"absolute_path": "/project/3d/scene.mtl",
"file_type": "mtl",
"status": "processed",
"error_message": null,
"children": [
{
"file_id": 44,
"filename": "diffuse.jpg",
"absolute_path": "/project/3d/textures/diffuse.jpg",
"file_type": "texture",
"status": "processed",
"error_message": null,
"children": []
}
]
}
]
}
Example 3: Monitor API Activity
Response:
{
"items": [
{
"timestamp": "2025-10-21T13:30:00Z",
"http_method": "POST",
"endpoint_url": "https://sandbox.zenodo.org/api/deposit/depositions",
"response_status_code": 201,
"status": "success"
}
],
"totalItems": 142,
"page": 1,
"totalPages": 29
}
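When monitoring activity, a client can filter the failed calls out of a page of log entries. The failed_calls helper is hypothetical. Its rule, that a response_status_code of 400 or above counts as a failure, is an assumption; the API's own status field may be derived differently.

```python
def failed_calls(entries: list[dict]) -> list[str]:
    """Summarize log entries whose HTTP status code indicates an error (>= 400)."""
    return [
        f"{e['timestamp']} {e['http_method']} {e['endpoint_url']} -> {e['response_status_code']}"
        for e in entries
        if e["response_status_code"] >= 400
    ]


log_page = {"items": [
    {"timestamp": "2025-10-21T11:40:15Z", "http_method": "GET",
     "endpoint_url": "https://zenodo.org/api/deposit/depositions/1234567",
     "response_status_code": 200, "status": "success"},
    {"timestamp": "2025-10-21T11:35:08Z", "http_method": "PUT",
     "endpoint_url": "https://zenodo.org/api/deposit/depositions/1234567/files",
     "response_status_code": 500, "status": "failed"},
]}
for line in failed_calls(log_page["items"]):
    print(line)
```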
Example 4: View Pipeline Configuration
Response:
[
{
"modality": "text",
"component_name": "tei_validator",
"component_order": 1,
"is_active": 1,
"parameters": "{\"encoding\": \"utf-8\", \"validate_schema\": true}"
},
{
"modality": "text",
"component_name": "metadata_enricher",
"component_order": 2,
"is_active": 1,
"parameters": "{\"add_timestamps\": true}"
}
]
Data Types & Constraints
Integer Fields
Identifiers, counts, and sizes (file_id, record_id, page, limit, size_bytes, etc.) are integers. URL and query parameters are parsed as integers from the request, and values read from the database keep their SQL column types.
Boolean Fields
Boolean values are stored as integers in SQLite: 0 for false, 1 for true (e.g., is_active, is_sandbox).
Timestamps
All timestamp fields follow ISO 8601 format (e.g., 2025-10-21T13:30:00Z).
JSON Fields
Parameters and metadata fields are stored as JSON-serialized strings and must be parsed by the client.
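This section describes three storage conventions: 0/1 flags, ISO 8601 strings, and JSON-encoded strings. All three can be decoded in one client-side pass. Which keys count as flags, timestamps, or JSON is a hypothetical configuration based on the field names used in this reference.

```python
import json
from datetime import datetime

# Hypothetical key lists, taken from field names that appear in this reference.
FLAG_FIELDS = {"is_active", "is_sandbox"}
TIMESTAMP_FIELDS = {"timestamp", "created_timestamp"}
JSON_FIELDS = {"parameters", "record_metadata_json"}


def normalize(record: dict) -> dict:
    """Return a copy with flags as bool, timestamps as datetime, and JSON strings parsed."""
    out = dict(record)
    for key, value in record.items():
        if value is None:
            continue
        if key in FLAG_FIELDS:
            out[key] = bool(value)
        elif key in TIMESTAMP_FIELDS:
            # fromisoformat() before Python 3.11 rejects a trailing "Z", so map it to +00:00.
            out[key] = datetime.fromisoformat(value.replace("Z", "+00:00"))
        elif key in JSON_FIELDS:
            out[key] = json.loads(value)
    return out


step = normalize({"component_name": "tei_validator", "is_active": 1,
                  "parameters": '{"encoding": "utf-8", "validate_schema": true}'})
print(step["is_active"], step["parameters"]["validate_schema"])  # True True
```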