Project Management API Reference¶
The Heritage Data Processor Project Management API provides comprehensive endpoints for creating, loading, and managing HDPC (.hdpc) projects, including hierarchical file scanning, validation, metadata preparation, and Zenodo integration.
Base URL¶
Endpoints are grouped under several path prefixes: `/hdpc`, `/project_info`, and `/project`.
Project Lifecycle¶
Load HDPC Database¶
Loads an existing .hdpc project database file into the application.
Endpoint: POST /hdpc/load
Request Body:
{
  "path": "/projects/my_project.hdpc"
}
Request Parameters:
- `path` (string, required): Absolute file path to the .hdpc database file
Response:
{
"message": "HDPC loaded successfully",
"project_name": "Medieval Manuscripts Project",
"project_id": 1
}
Response Fields:
- `message` (string): Confirmation message
- `project_name` (string): Name of the loaded project. Defaults to `"Unknown Project"` if the `project_info` table is empty
- `project_id` (integer, nullable): Database ID of the project, or `null` if not available
Status Codes:
- `200 OK`: Project loaded successfully
- `400 Bad Request`: Path not provided
- `500 Internal Server Error`: Failed to load HDPC (file invalid or not found)
Get Project Info¶
Retrieves basic project information from the loaded HDPC database.
Endpoint: GET /project_info
Response:
{
"project_id": 1,
"project_name": "Medieval Manuscripts Project",
"description": "Digital preservation of medieval manuscript collection",
"hdpc_schema_version": "1.2.0"
}
Response Fields:
- `project_id` (integer): Project database identifier
- `project_name` (string): Project name
- `description` (string): Project description
- `hdpc_schema_version` (string): Version of the HDPC database schema
Empty Response:
Returns an empty object `{}` if no project info exists.
Status Codes:
- `200 OK`: Project info retrieved successfully
- `400 Bad Request`: No HDPC loaded
Get Project Details with Modality¶
Retrieves project information including the configured data modality.
Endpoint: GET /project_details_with_modality
Response:
{
"project_id": 1,
"project_name": "3D Models Archive",
"description": "Archive of 3D cultural heritage models",
"hdpc_schema_version": "1.2.0",
"modality": "[\"3d_model\", \"image\"]"
}
Response Fields:
- `project_id` (integer): Project database identifier
- `project_name` (string): Project name
- `description` (string): Project description
- `hdpc_schema_version` (string): Database schema version
- `modality` (string): JSON-serialized array of data modalities, or `"Not Set"` if not configured
Status Codes:
- `200 OK`: Project details retrieved successfully
- `400 Bad Request`: No HDPC loaded
- `500 Internal Server Error`: Could not retrieve project info
Project Creation¶
Create and Scan Project¶
Creates a new HDPC project database with comprehensive file scanning, validation, hierarchical structure detection, and optional asset archiving.
Endpoint: POST /project/create-and-scan
Request Body:
{
"projectName": "3D Models Collection",
"shortCode": "3DMODELS",
"hdpcPath": "/projects/3d_models.hdpc",
"modalities": ["3d_model", "image"],
"dataInPath": "/data/input/models",
"dataOutPath": "/data/output",
"batchEntity": "subdirectory",
"scanOptions": {
"extensions": [".obj", ".mtl", ".jpg", ".png"],
"primarysourceext": ".obj",
"bundling": {
"enabled": true,
"strategy": "stem_match"
},
"obj_options": {
"add_mtl": true,
"add_textures": true,
"archive_subdirectories": true
}
}
}
Request Parameters:
- `projectName` (string, required): Human-readable project name
- `shortCode` (string, required): Short code identifier for the project
- `hdpcPath` (string, required): Path where the .hdpc file will be created
- `modalities` (array, required): List of data modalities (e.g., `["3d_model"]`, `["text", "image"]`)
- `dataInPath` (string, required): Input data directory path for file scanning
- `dataOutPath` (string, required): Output data directory path for generated files
- `batchEntity` (string, required): File processing mode: `"root"`, `"subdirectory"`, or `"hybrid"`
- `scanOptions` (object, required): File scanning configuration
    - `extensions` (array): List of file extensions to scan (with dot prefix)
    - `primarysourceext` (string): Primary source file extension for each group
    - `bundling` (object): Bundling configuration
        - `enabled` (boolean): Whether to bundle congruent files
        - `strategy` (string): Bundling strategy (e.g., `"stem_match"`)
    - `obj_options` (object): Options for 3D model processing
        - `add_mtl` (boolean): Whether to scan for MTL files
        - `add_textures` (boolean): Whether to scan for texture files
        - `archive_subdirectories` (boolean): Whether to archive texture subdirectories into ZIP files
Response:
{
"success": true,
"message": "Project created and 47 files scanned.",
"projectId": 1,
"filesAdded": 47,
"foundFiles": [
{
"name": "model_001.obj",
"path": "/data/input/models/model_001.obj",
"type": "primary_source",
"status": "Valid",
"is_primary_source": true,
"relative_path": "model_001.obj",
"validation_report": {},
"children": [
{
"name": "model_001.mtl",
"path": "/data/input/models/model_001.mtl",
"type": "primary",
"status": "Valid",
"children": [
{
"name": "model_001_textures.zip",
"path": "/data/output/archives/model_001_textures.zip",
"type": "archive",
"status": "Valid",
"children": [...]
}
]
}
]
}
]
}
Response Fields:
- `success` (boolean): Operation success status
- `message` (string): Summary message with file count
- `projectId` (integer): Created project database ID
- `filesAdded` (integer): Total number of files added to the database
- `foundFiles` (array): Hierarchical file structure array
Status Codes:
- `201 Created`: Project created successfully
- `400 Bad Request`: Missing required parameters or input directory does not exist
- `500 Internal Server Error`: Failed to create database schema or unexpected error during creation
Batch Entity Modes¶
The batchEntity parameter controls how files are grouped for processing:
Root Mode: Scans only files in the root of dataInPath. When bundling is enabled, groups files by common stem (e.g., model.obj, model.mtl, model.jpg become one group).
Subdirectory Mode: Treats each subdirectory as a separate processing group. All files within a subdirectory are grouped together recursively.
Hybrid Mode: Combines both approaches - scans root files with optional bundling and processes each subdirectory as a group.
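For illustration, root-mode bundling with the `"stem_match"` strategy can be sketched as a simple grouping by filename stem. This is a minimal sketch of the idea, not the scanner's actual code:

```python
from collections import defaultdict
from pathlib import Path

def group_by_stem(filenames):
    """Group file names that share a common stem,
    e.g. model.obj, model.mtl, model.jpg become one group."""
    groups = defaultdict(list)
    for name in filenames:
        groups[Path(name).stem].append(name)
    return dict(groups)

files = ["model.obj", "model.mtl", "model.jpg", "statue.obj"]
print(group_by_stem(files))
# {'model': ['model.obj', 'model.mtl', 'model.jpg'], 'statue': ['statue.obj']}
```

In subdirectory mode, the grouping key would instead be the subdirectory containing each file.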
File Validation Statuses¶
During scanning, files are validated and assigned one of these statuses:
- `Valid`: File passed all validation checks
- `Invalid`: File failed basic validation (corrupted, empty, wrong format)
- `Problems`: File has multiple issues, including validation failures and missing dependencies
- `MTL Missing`: OBJ file missing its referenced MTL file
- `Textures Missing`: MTL file missing some referenced texture files
- `File Conflict`: Multiple non-identical files found for the same texture reference
Hierarchical File Structure¶
The scanner builds hierarchical relationships:
Primary Source: The main file for a record (e.g., .obj file)
Primary Dependencies: Direct dependencies (e.g., .mtl file referenced by OBJ)
Secondary Dependencies: Nested dependencies (e.g., textures referenced by MTL)
Archive Files: ZIP archives containing grouped assets
Archived Files: Individual files within archives
Archival Logic¶
When obj_options.archive_subdirectories is enabled, texture files in subdirectories are automatically archived:
- Detects texture files in subdirectories relative to the MTL file
- Creates the ZIP archive in the `{dataOutPath}/archives/` directory
- Archive name format: `{base_stem}_{subdirectory_name}.zip`
- Validates the ZIP archive and marks archived files with `archive_name` metadata
- Maintains the hierarchy: MTL → Archive → Archived Textures
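The documented naming convention can be sketched as a small helper (the real archiver also creates and validates the ZIP and records metadata; this only shows the path construction):

```python
from pathlib import PurePosixPath

def texture_archive_path(data_out_path, base_stem, subdirectory_name):
    """Build the archive path {dataOutPath}/archives/{base_stem}_{subdirectory_name}.zip
    following the documented naming convention."""
    return PurePosixPath(data_out_path) / "archives" / f"{base_stem}_{subdirectory_name}.zip"

print(texture_archive_path("/data/output", "model_001", "textures"))
# /data/output/archives/model_001_textures.zip
```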
Project Inspection¶
Inspect Project¶
Performs comprehensive analysis of an HDPC database, returning detailed statistics, file trees, and metadata about all project components.
Endpoint: POST /project/inspect
Request Body:
{
  "hdpcPath": "/projects/my_project.hdpc",
  "showFiles": true
}
Request Parameters:
- `hdpcPath` (string, required): Path to the .hdpc database file to inspect
- `showFiles` (boolean, optional): Whether to include the complete file tree in the response. Defaults to `false`
Response:
{
"success": true,
"hdpc_path": "/projects/my_project.hdpc",
"project_info": {
"project_id": 1,
"project_name": "3D Models Archive",
"project_short_code": "3DMA",
"description": "Archive description",
"creation_timestamp": "2025-10-15T10:00:00Z",
"last_modified_timestamp": "2025-10-21T13:00:00Z",
"hdpc_schema_version": "1.2.0"
},
"configuration": [...],
"scan_settings": [...],
"file_statistics": {
"total_files": 150,
"primary_files": 50,
"root_files": 50,
"associated_files": 100,
"total_size_bytes": 2147483648,
"unique_file_types": 8,
"unique_statuses": 5
},
"file_status_breakdown": [...],
"file_type_breakdown": [...],
"mime_type_breakdown": [...],
"zenodo_statistics": {...},
"metadata_mappings": [...],
"batches": [...],
"pipeline_steps": [...],
"recent_api_activity": [...],
"file_tree": [...]
}
Response Fields:
- `success` (boolean): Operation success status
- `hdpc_path` (string): Absolute path to the inspected database
- `project_info` (object): Complete project metadata from the `project_info` table
- `configuration` (array): All project configuration key-value pairs
- `scan_settings` (array): File scan settings with modality and scan options
- `file_statistics` (object): Aggregated file statistics
- `file_status_breakdown` (array): Count of files grouped by status
- `file_type_breakdown` (array): Count of files grouped by type
- `mime_type_breakdown` (array): Top 10 MIME types by file count
- `zenodo_statistics` (object): Zenodo records statistics
- `metadata_mappings` (array): Configured metadata mappings
- `batches` (array): Processing batches
- `pipeline_steps` (array): Configured pipeline steps
- `recent_api_activity` (array): 10 most recent API log entries
- `file_tree` (array, optional): Complete hierarchical file tree (only if `showFiles` is true)
File Tree Structure:
When showFiles is true, the file_tree array contains recursive file objects:
{
"file_id": 1,
"filename": "model.obj",
"relative_path": "models/model.obj",
"file_type": "primary_source",
"status": "Valid",
"is_primary_source": true,
"size_bytes": 1048576,
"mime_type": "model/obj",
"parent_file_id": null,
"added_timestamp": "2025-10-15T10:30:00Z",
"children": [...]
}
Status Codes:
- `200 OK`: Inspection completed successfully
- `400 Bad Request`: Missing `hdpcPath` parameter
- `404 Not Found`: HDPC file not found at specified path
- `500 Internal Server Error`: Database error or unexpected error during inspection
Zenodo Integration¶
Get Uploads Tab Counts¶
Retrieves counts for different stages of the Zenodo upload workflow, used to populate UI tabs.
Endpoint: GET /project/uploads_tab_counts
Query Parameters:
- `is_sandbox` (string, optional): Whether to count sandbox records. Defaults to `"true"`. Accepts `"true"` or `"false"`
Response:
{
"pending_preparation": 15,
"pending_operations": 8,
"drafts": 12,
"published": 45,
"versioning": 0
}
Response Fields:
- `pending_preparation` (integer): Source files without prepared metadata
- `pending_operations` (integer): Records with prepared metadata but no Zenodo draft created
- `drafts` (integer): Active Zenodo drafts
- `published` (integer): Unique published Zenodo concept records
- `versioning` (integer): Records eligible for versioning (currently always 0)
Pending Preparation Query:
Counts root-level source files with statuses indicating they need metadata preparation, excluding files that already have records in non-failed states.
Status Codes:
- `200 OK`: Counts retrieved successfully
- `400 Bad Request`: No HDPC loaded
- `500 Internal Server Error`: Database query failed
Match Files for Versioning¶
Matches files in a directory against published Zenodo records to facilitate new version creation.
Endpoint: POST /project/match_files_for_versioning
Decorator: @project_required
Request Body:
{
  "directory_path": "/data/new_versions",
  "match_method": "filename"
}
Request Parameters:
- `directory_path` (string, required): Path to the directory containing potential new version files
- `match_method` (string, optional): Matching strategy. Options: `"filename"` (default) or `"hashcode"`
Response:
{
"success": true,
"matches": [
{
"concept_rec_id": "7891234",
"record_title": "Medieval Manuscript - Volume 1",
"matched_file_path": "/data/new_versions/manuscript_v2.xml"
}
]
}
Response Fields:
- `success` (boolean): Operation success status
- `matches` (array): List of matched file-record pairs
    - `concept_rec_id` (string): Zenodo concept record ID for versioning
    - `record_title` (string): Title of the published record
    - `matched_file_path` (string): Absolute path to the matched file
Matching Methods:
Filename: Matches if the file in the directory has the same name as the source file of a published record
Hashcode: Matches if the SHA256 hash of the file content equals the hash of a published record's source file
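The two strategies can be sketched as follows. The record dictionary keys used here (`source_filename`, `source_sha256`) are illustrative assumptions, not the actual database schema:

```python
import hashlib
from pathlib import Path

def sha256_of(path):
    """SHA256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def match_file(candidate_path, published_records, method="filename"):
    """Return the concept_rec_ids of published records the candidate matches.
    `published_records` is a list of dicts with hypothetical keys
    'concept_rec_id', 'source_filename', and 'source_sha256'."""
    matches = []
    for rec in published_records:
        if method == "filename":
            # Filename strategy: compare base names only
            if Path(candidate_path).name == rec["source_filename"]:
                matches.append(rec["concept_rec_id"])
        elif method == "hashcode":
            # Hashcode strategy: compare content digests
            if sha256_of(candidate_path) == rec["source_sha256"]:
                matches.append(rec["concept_rec_id"])
    return matches
```

A filename match is cheap but assumes new versions keep the original name; the hashcode method is slower but name-independent.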
Empty Response:
Returns empty matches array if no published records exist or no matches are found.
Status Codes:
- `200 OK`: Matching completed successfully
- `400 Bad Request`: Missing `directory_path` or no project loaded
- `404 Not Found`: Directory not found at specified path
- `500 Internal Server Error`: Matching operation failed
Metadata Management¶
Preview Prepared Metadata¶
Performs a dry run of metadata preparation without saving to the database, allowing users to preview the result.
Endpoint: POST /project/preview_prepared_metadata
Request Body:
{
  "source_file_db_id": 42
}
Request Parameters:
- `source_file_db_id` (integer, required): Database ID of the source file
Response:
{
"success": true,
"prepared_metadata": {
"metadata": {
"title": "Medieval Manuscript 001",
"upload_type": "dataset",
"description": "Zenodo record for the data file: Medieval Manuscript 001.",
"creators": [{"name": "Smith, John", "affiliation": "University"}],
"access_right": "open"
}
},
"filename": "manuscript_001.xml"
}
Response Fields:
- `success` (boolean): Operation success status
- `prepared_metadata` (object): Complete Zenodo API payload that would be submitted
- `filename` (string): Name of the source file
Description Auto-Construction:
If the metadata mapping contains a construct_later flag for description, the endpoint automatically generates: "Zenodo record for the data file: {title}."
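A sketch of this fallback behavior (the handling of the `construct_later` flag is simplified here):

```python
def auto_description(metadata, mapping_flags):
    """If the description field is flagged construct_later, generate
    the documented fallback text from the record title."""
    if mapping_flags.get("description") == "construct_later":
        return f"Zenodo record for the data file: {metadata['title']}."
    return metadata.get("description", "")

print(auto_description({"title": "Medieval Manuscript 001"},
                       {"description": "construct_later"}))
# Zenodo record for the data file: Medieval Manuscript 001.
```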
Status Codes:
- `200 OK`: Metadata preview generated successfully
- `400 Bad Request`: No HDPC loaded or missing `source_file_db_id`
- `404 Not Found`: No active metadata mapping configured, or source file not found
- `500 Internal Server Error`: Metadata preparation failed
Prepare Metadata for File¶
Prepares and validates metadata for a source file, storing it in the database and creating a Zenodo record entry.
Endpoint: POST /project/prepare_metadata_for_file
Decorator: @project_required
Request Body:
{
"source_file_db_id": 42,
"target_is_sandbox": true,
"overrides": {
"title": "Custom Title Override",
"keywords": ["heritage", "manuscript", "medieval"]
}
}
Request Parameters:
- `source_file_db_id` (integer, required): Database ID of the source file
- `target_is_sandbox` (boolean, optional): Whether to prepare for the Zenodo sandbox. Defaults to `true`
- `overrides` (object, optional): User-provided metadata field overrides
Response:
{
"success": true,
"message": "Metadata prepared and validated successfully.",
"log": [
"Prepare metadata for File ID: 42, Target Sandbox: true",
"Applying user overrides for fields: ['title', 'keywords']",
"Sanitizing metadata: Removing empty optional fields: ['language']",
"Metadata validated successfully.",
"Metadata stored and record status set to 'prepared'."
]
}
Response Fields:
- `success` (boolean): Operation success status
- `message` (string): Summary message
- `log` (array): Detailed execution log messages
Error Response:
{
"success": false,
"error": "Metadata validation failed.",
"validation_errors": ["Title is required", "Invalid upload type"],
"log": [...]
}
Preparation Process:
The endpoint follows this workflow:
1. Extracts metadata using the active metadata mapping configuration
2. Applies user overrides from the request
3. Sanitizes the metadata by removing empty optional fields
4. Auto-constructs the description if needed
5. Prepares the metadata in Zenodo API format
6. Validates against the Zenodo schema
7. Stores the result in the database and creates a `zenodo_records` entry with status `prepared`
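The sanitization step can be illustrated with a small sketch; the list of optional fields shown here is hypothetical, not the server's actual list:

```python
def sanitize_metadata(metadata, optional_fields=("language", "notes", "keywords")):
    """Drop optional fields whose values are empty, mirroring the
    'Sanitizing metadata' log line. Returns the cleaned metadata
    and the names of the removed fields."""
    removed = [f for f in optional_fields if f in metadata and not metadata[f]]
    cleaned = {k: v for k, v in metadata.items() if k not in removed}
    return cleaned, removed

cleaned, removed = sanitize_metadata(
    {"title": "Manuscript", "language": "", "keywords": ["heritage"]})
print(removed)  # ['language']
```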
Status Codes:
- `200 OK`: Metadata prepared successfully
- `400 Bad Request`: Validation failed or no project loaded
- `500 Internal Server Error`: Unexpected error during preparation
File Management¶
Add Source Files¶
Adds source files to the project database, optionally associating them with an existing Zenodo record.
Endpoint: POST /project/source_files/add
Decorator: @project_required
Request Body:
{
"absolute_file_paths": [
"/data/output/processed_file_001.xml",
"/data/output/processed_file_002.xml"
],
"record_id_to_associate": 15,
"pipeline_name": "xml_processor",
"step_name": "transformation"
}
Request Parameters:
- `absolute_file_paths` (array, required): List of absolute file paths to add
- `record_id_to_associate` (integer, optional): Zenodo record ID to associate the files with
- `pipeline_name` (string, optional): Name of the pipeline that generated these files
- `step_name` (string, optional): Name of the pipeline step that generated these files
Response:
{
"message": "File addition process completed.",
"added_count": 2,
"skipped_existing_path": 0,
"errors_count": 0,
"errors": []
}
Response Fields:
- `message` (string): Summary message
- `added_count` (integer): Number of files successfully added
- `skipped_existing_path` (integer): Number of files skipped because they already exist in the database
- `errors_count` (integer): Number of files that failed to add
- `errors` (array): List of error messages for failed files
Duplicate Detection:
The endpoint performs intelligent duplicate detection:
Content Hash Matching: If a file's SHA256 hash matches the original source file for the associated record, it's skipped (prevents adding the source file as its own derivative)
Path Matching: If the exact file path already exists in the database, it's skipped but included in association if record_id_to_associate is provided
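A sketch of the two checks (the function name and signature here are illustrative, not the server's actual code):

```python
import hashlib

def should_skip(candidate_bytes, candidate_path, original_source_hash, existing_paths):
    """Return the reason an incoming file would be skipped, or None.
    Mirrors the two documented checks: content hash against the
    record's original source file, then exact path lookup."""
    if hashlib.sha256(candidate_bytes).hexdigest() == original_source_hash:
        return "content_hash_match"   # would re-add the source as its own derivative
    if candidate_path in existing_paths:
        return "existing_path"        # already in the database
    return None
```

Note that a path-matched file is still included in the record association when `record_id_to_associate` is provided; only the insert is skipped.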
File Type Assignment:
- `derived`: Files associated with a record (pipeline outputs)
- `source`: Standalone files not associated with records
Status Assignment:
- `pending_upload`: Derived files ready for Zenodo upload
- `pending`: Standalone source files
Record Association:
When record_id_to_associate is provided, files are added to the record_files_map table with pending upload status.
Status Codes:
- `200 OK`: File addition process completed (check individual counts for details)
- `400 Bad Request`: Missing `absolute_file_paths` or no project loaded
- `500 Internal Server Error`: Database error during file addition
Project Settings¶
Update Project Description¶
Updates the project description field in the project_info table.
Endpoint: POST /project/update_description
Decorator: @project_required
Request Body:
{
  "description": "Digital preservation of medieval manuscript collection"
}
Request Parameters:
- `description` (string, required): New project description (can be an empty string)
Response:
Status Codes:
- `200 OK`: Description updated successfully
- `400 Bad Request`: Missing `description` field or no project loaded
- `500 Internal Server Error`: Database operation failed
Update Project Title¶
Updates the project name in the project_info table.
Endpoint: POST /project/update_title
Decorator: @project_required
Request Body:
{
  "title": "Medieval Manuscripts Project"
}
Request Parameters:
- `title` (string, required): New project title (cannot be empty)
Response:
Status Codes:
- `200 OK`: Title updated successfully
- `400 Bad Request`: Empty title or no project loaded
- `500 Internal Server Error`: Database operation failed
Dashboard & Statistics¶
Get Dashboard Statistics¶
Retrieves aggregated statistics for the project dashboard, including record counts by status and environment.
Endpoint: GET /project/dashboard_stats
Decorator: @project_required
Response:
{
"drafts_sandbox": 5,
"published_sandbox": 12,
"drafts_production": 2,
"published_production": 38,
"total_files": 250,
"files_with_metadata": 220
}
Response Fields:
- `drafts_sandbox` (integer): Count of draft records in the Zenodo sandbox
- `published_sandbox` (integer): Count of published records in the Zenodo sandbox
- `drafts_production` (integer): Count of draft records in Zenodo production
- `published_production` (integer): Count of published records in Zenodo production
- `total_files` (integer): Total number of files in the project
- `files_with_metadata` (integer): Count of distinct files that have metadata values
Status Codes:
- `200 OK`: Statistics retrieved successfully
- `400 Bad Request`: No project loaded
- `500 Internal Server Error`: Database query failed
Get Published Records¶
Retrieves the 10 most recently published Zenodo records from the project.
Endpoint: GET /project/published_records
Decorator: @project_required
Response:
[
{
"record_title": "Medieval Manuscript Collection - Volume 5",
"zenodo_doi": "10.5281/zenodo.1234567",
"zenodo_record_id": "1234567",
"publication_date": "2025-10-20T15:30:00Z"
},
{
"record_title": "3D Model - Ancient Artifact",
"zenodo_doi": "10.5281/zenodo.7654321",
"zenodo_record_id": "7654321",
"publication_date": "2025-10-19T09:15:00Z"
}
]
Response Fields:
Each record contains:
- `record_title` (string): Title of the published record
- `zenodo_doi` (string): Digital Object Identifier
- `zenodo_record_id` (string): Zenodo record ID
- `publication_date` (string): ISO 8601 timestamp of the last update
Ordering:
Records are ordered by last_updated_timestamp in descending order.
Status Codes:
- `200 OK`: Records retrieved successfully
- `400 Bad Request`: No project loaded
- `500 Internal Server Error`: Database query failed
Error Handling¶
Common Error Scenarios¶
Project Not Loaded:
Most endpoints return 400 Bad Request with message "No HDPC loaded" when no project is active.
Invalid File Path:
During project creation, if dataInPath does not exist: 400 Bad Request with message "The specified Input Data Directory does not exist: {path}".
Database Creation Failed:
500 Internal Server Error with message "Failed to create the HDPC database schema from YAML."
Transaction Rollback:
If an error occurs during project creation, the transaction is rolled back and the partially created .hdpc file is deleted.
Database Schema Integration¶
Tables Used¶
The API interacts with numerous database tables:
- `project_info`: Core project metadata
- `project_configuration`: Key-value configuration storage
- `file_scan_settings`: Scan options and modality settings
- `source_files`: File metadata and hierarchical relationships
- `zenodo_records`: Zenodo record metadata and status
- `metadata_mapping_files`: Metadata mapping configurations
- `metadata_values`: Extracted metadata values
- `record_files_map`: File-to-record associations
- `batches`: Batch processing records
- `project_pipelines`: Pipeline step configurations
- `api_log`: API activity logging
Transaction Management¶
Critical operations like project creation use explicit transaction control with BEGIN TRANSACTION, COMMIT, and ROLLBACK for data integrity.
Usage Examples¶
Example 1: Create Project with 3D Models¶
POST /project/create-and-scan
Content-Type: application/json
{
"projectName": "Cultural Heritage 3D Models",
"shortCode": "CH3D",
"hdpcPath": "/projects/heritage_3d.hdpc",
"modalities": ["3d_model"],
"dataInPath": "/data/3d_models",
"dataOutPath": "/data/output",
"batchEntity": "subdirectory",
"scanOptions": {
"extensions": [".obj", ".mtl", ".jpg", ".png"],
"primarysourceext": ".obj",
"bundling": {"enabled": false},
"obj_options": {
"add_mtl": true,
"add_textures": true,
"archive_subdirectories": true
}
}
}
Example 2: Load and Inspect Project¶
POST /hdpc/load
Content-Type: application/json
{
  "path": "/projects/heritage_3d.hdpc"
}
Then:
POST /project/inspect
Content-Type: application/json
{
"hdpcPath": "/projects/heritage_3d.hdpc",
"showFiles": false
}
Example 3: Prepare Metadata with Overrides¶
POST /project/prepare_metadata_for_file
Content-Type: application/json
{
"source_file_db_id": 42,
"target_is_sandbox": false,
"overrides": {
"title": "Medieval Manuscript - Enhanced Edition",
"description": "High-resolution scan with enhanced metadata",
"keywords": ["medieval", "manuscript", "heritage"]
}
}
Example 4: Add Derived Files from Pipeline¶
POST /project/source_files/add
Content-Type: application/json
{
"absolute_file_paths": [
"/output/processed/manuscript_001_validated.xml",
"/output/processed/manuscript_001_metadata.json"
],
"record_id_to_associate": 15,
"pipeline_name": "xml_validation_pipeline",
"step_name": "validation"
}
Configuration¶
Schema File Location¶
The HDPC database schema is loaded from: {CONFIG_FILE_PATH}/hdpc_schema.yaml
File Hash Calculation¶
SHA256 hashes are calculated for all files during scanning and file addition using the calculate_file_hash utility.
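A plausible chunked implementation of such a utility (the actual `calculate_file_hash` may differ in details such as chunk size):

```python
import hashlib

def calculate_file_hash(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading in chunks so
    large scan targets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```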
MIME Type Detection¶
MIME types are automatically detected using the get_file_mime_type utility function.
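A minimal sketch of extension-based detection (the actual `get_file_mime_type` utility may also inspect file contents or maintain its own type map):

```python
import mimetypes

def get_file_mime_type(path):
    """Guess the MIME type from the filename, falling back to a
    generic binary type for unrecognized extensions."""
    mime, _ = mimetypes.guess_type(path)
    return mime or "application/octet-stream"

print(get_file_mime_type("scan.png"))  # image/png
```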