Project Management API Reference

The Heritage Data Processor Project Management API provides comprehensive endpoints for creating, loading, and managing HDPC (.hdpc) projects, including hierarchical file scanning, validation, metadata preparation, and Zenodo integration.

Base URL

Endpoints in this reference are grouped under three path prefixes: /hdpc, /project_info, and /project.


Project Lifecycle

Load HDPC Database

Loads an existing .hdpc project database file into the application.

Endpoint: POST /hdpc/load

Request Body:

{
  "path": "/path/to/project.hdpc"
}

Request Parameters:

  • path (string, required): Absolute file path to the .hdpc database file

Response:

{
  "message": "HDPC loaded successfully",
  "project_name": "Medieval Manuscripts Project",
  "project_id": 1
}

Response Fields:

  • message (string): Confirmation message
  • project_name (string): Name of the loaded project. Defaults to "Unknown Project" if the project_info table is empty
  • project_id (integer, nullable): Database ID of the project, or null if not available

Status Codes:

  • 200 OK: Project loaded successfully
  • 400 Bad Request: Path not provided
  • 500 Internal Server Error: Failed to load HDPC (file invalid or not found)
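
A minimal client sketch for this call, assuming the Python requests library and a hypothetical local server at http://localhost:5000 (host, port, and any API prefix depend on your deployment):

import requests

BASE_URL = "http://localhost:5000"  # hypothetical; adjust for your deployment

resp = requests.post(f"{BASE_URL}/hdpc/load", json={"path": "/path/to/project.hdpc"})
resp.raise_for_status()  # raises on 400/500 responses

data = resp.json()
print(data["project_name"], data["project_id"])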

Get Project Info

Retrieves basic project information from the loaded HDPC database.

Endpoint: GET /project_info

Response:

{
  "project_id": 1,
  "project_name": "Medieval Manuscripts Project",
  "description": "Digital preservation of medieval manuscript collection",
  "hdpc_schema_version": "1.2.0"
}

Response Fields:

  • project_id (integer): Project database identifier
  • project_name (string): Project name
  • description (string): Project description
  • hdpc_schema_version (string): Version of the HDPC database schema

Empty Response:

Returns an empty object {} if no project info exists.

Status Codes:

  • 200 OK: Project info retrieved successfully
  • 400 Bad Request: No HDPC loaded

Get Project Details with Modality

Retrieves project information including the configured data modality.

Endpoint: GET /project_details_with_modality

Response:

{
  "project_id": 1,
  "project_name": "3D Models Archive",
  "description": "Archive of 3D cultural heritage models",
  "hdpc_schema_version": "1.2.0",
  "modality": "[\"3d_model\", \"image\"]"
}

Response Fields:

  • project_id (integer): Project database identifier
  • project_name (string): Project name
  • description (string): Project description
  • hdpc_schema_version (string): Database schema version
  • modality (string): JSON-serialized array of data modalities, or "Not Set" if not configured

Status Codes:

  • 200 OK: Project details retrieved successfully
  • 400 Bad Request: No HDPC loaded
  • 500 Internal Server Error: Could not retrieve project info
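
Because modality is returned as a JSON-serialized string (or the literal "Not Set"), clients typically decode it before use. A minimal sketch:

import json

def parse_modality(value):
    """Decode the modality field returned by /project_details_with_modality."""
    if value == "Not Set":
        return []
    return json.loads(value)  # e.g. '["3d_model", "image"]' -> ['3d_model', 'image']

print(parse_modality('["3d_model", "image"]'))  # ['3d_model', 'image']
print(parse_modality("Not Set"))                # []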

Project Creation

Create and Scan Project

Creates a new HDPC project database with comprehensive file scanning, validation, hierarchical structure detection, and optional asset archiving.

Endpoint: POST /project/create-and-scan

Request Body:

{
  "projectName": "3D Models Collection",
  "shortCode": "3DMODELS",
  "hdpcPath": "/projects/3d_models.hdpc",
  "modalities": ["3d_model", "image"],
  "dataInPath": "/data/input/models",
  "dataOutPath": "/data/output",
  "batchEntity": "subdirectory",
  "scanOptions": {
    "extensions": [".obj", ".mtl", ".jpg", ".png"],
    "primarysourceext": ".obj",
    "bundling": {
      "enabled": true,
      "strategy": "stem_match"
    },
    "obj_options": {
      "add_mtl": true,
      "add_textures": true,
      "archive_subdirectories": true
    }
  }
}

Request Parameters:

  • projectName (string, required): Human-readable project name
  • shortCode (string, required): Short code identifier for the project
  • hdpcPath (string, required): Path where the .hdpc file will be created
  • modalities (array, required): List of data modalities (e.g., ["3d_model"], ["text", "image"])
  • dataInPath (string, required): Input data directory path for file scanning
  • dataOutPath (string, required): Output data directory path for generated files
  • batchEntity (string, required): File processing mode: "root", "subdirectory", or "hybrid"
  • scanOptions (object, required): File scanning configuration
    • extensions (array): List of file extensions to scan (with dot prefix)
    • primarysourceext (string): Primary source file extension for each group
    • bundling (object): Bundling configuration
      • enabled (boolean): Whether to bundle congruent files
      • strategy (string): Bundling strategy (e.g., "stem_match")
    • obj_options (object): Options for 3D model processing
      • add_mtl (boolean): Whether to scan for MTL files
      • add_textures (boolean): Whether to scan for texture files
      • archive_subdirectories (boolean): Whether to archive texture subdirectories into ZIP files

Response:

{
  "success": true,
  "message": "Project created and 47 files scanned.",
  "projectId": 1,
  "filesAdded": 47,
  "foundFiles": [
    {
      "name": "model_001.obj",
      "path": "/data/input/models/model_001.obj",
      "type": "primary_source",
      "status": "Valid",
      "is_primary_source": true,
      "relative_path": "model_001.obj",
      "validation_report": {},
      "children": [
        {
          "name": "model_001.mtl",
          "path": "/data/input/models/model_001.mtl",
          "type": "primary",
          "status": "Valid",
          "children": [
            {
              "name": "model_001_textures.zip",
              "path": "/data/output/archives/model_001_textures.zip",
              "type": "archive",
              "status": "Valid",
              "children": [...]
            }
          ]
        }
      ]
    }
  ]
}

Response Fields:

  • success (boolean): Operation success status
  • message (string): Summary message with file count
  • projectId (integer): Created project database ID
  • filesAdded (integer): Total number of files added to database
  • foundFiles (array): Hierarchical file structure array

Status Codes:

  • 201 Created: Project created successfully
  • 400 Bad Request: Missing required parameters or input directory does not exist
  • 500 Internal Server Error: Failed to create database schema or unexpected error during creation

Batch Entity Modes

The batchEntity parameter controls how files are grouped for processing:

Root Mode: Scans only files in the root of dataInPath. When bundling is enabled, groups files by common stem (e.g., model.obj, model.mtl, model.jpg become one group).

Subdirectory Mode: Treats each subdirectory as a separate processing group. All files within a subdirectory are grouped together recursively.

Hybrid Mode: Combines both approaches, scanning root files (with optional bundling) and processing each subdirectory as a group.
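
As an illustration of how stem_match bundling groups root-level files, the sketch below clusters paths that share a filename stem. This is an approximation for explanatory purposes, not the scanner's actual implementation:

from collections import defaultdict
from pathlib import Path

def group_by_stem(paths):
    """Group files sharing a stem, e.g. model.obj, model.mtl, model.jpg form one group."""
    groups = defaultdict(list)
    for p in paths:
        groups[Path(p).stem].append(p)
    return dict(groups)

files = ["model.obj", "model.mtl", "model.jpg", "other.obj"]
print(group_by_stem(files))
# {'model': ['model.obj', 'model.mtl', 'model.jpg'], 'other': ['other.obj']}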


File Validation Statuses

During scanning, files are validated and assigned one of these statuses:

  • Valid: File passed all validation checks
  • Invalid: File failed basic validation (corrupted, empty, wrong format)
  • Problems: File has multiple issues including validation failures and missing dependencies
  • MTL Missing: OBJ file missing its referenced MTL file
  • Textures Missing: MTL file missing some referenced texture files
  • File Conflict: Multiple non-identical files found for the same texture reference

Hierarchical File Structure

The scanner builds hierarchical relationships:

Primary Source: The main file for a record (e.g., .obj file)

Primary Dependencies: Direct dependencies (e.g., .mtl file referenced by OBJ)

Secondary Dependencies: Nested dependencies (e.g., textures referenced by MTL)

Archive Files: ZIP archives containing grouped assets

Archived Files: Individual files within archives


Archival Logic

When obj_options.archive_subdirectories is enabled, texture files in subdirectories are automatically archived:

  1. Detects texture files in subdirectories relative to MTL file
  2. Creates ZIP archive in {dataOutPath}/archives/ directory
  3. Archive name format: {base_stem}_{subdirectory_name}.zip
  4. Validates ZIP archive and marks archived files with archive_name metadata
  5. Maintains hierarchy: MTL → Archive → Archived Textures
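
A minimal sketch of the archive-creation step, assuming Python's standard zipfile module; the real scanner additionally validates the archive and records archive_name metadata:

import zipfile
from pathlib import Path

def archive_texture_subdirectory(base_stem, subdir, data_out_path):
    """Zip a texture subdirectory into {dataOutPath}/archives/{base_stem}_{subdirectory_name}.zip."""
    subdir = Path(subdir)
    archives_dir = Path(data_out_path) / "archives"
    archives_dir.mkdir(parents=True, exist_ok=True)
    archive_path = archives_dir / f"{base_stem}_{subdir.name}.zip"
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for texture in sorted(subdir.rglob("*")):
            if texture.is_file():
                zf.write(texture, arcname=texture.relative_to(subdir))
    return archive_path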

Project Inspection

Inspect Project

Performs comprehensive analysis of an HDPC database, returning detailed statistics, file trees, and metadata about all project components.

Endpoint: POST /project/inspect

Request Body:

{
  "hdpcPath": "/projects/my_project.hdpc",
  "showFiles": true
}

Request Parameters:

  • hdpcPath (string, required): Path to the .hdpc database file to inspect
  • showFiles (boolean, optional): Whether to include complete file tree in response. Defaults to false

Response:

{
  "success": true,
  "hdpc_path": "/projects/my_project.hdpc",
  "project_info": {
    "project_id": 1,
    "project_name": "3D Models Archive",
    "project_short_code": "3DMA",
    "description": "Archive description",
    "creation_timestamp": "2025-10-15T10:00:00Z",
    "last_modified_timestamp": "2025-10-21T13:00:00Z",
    "hdpc_schema_version": "1.2.0"
  },
  "configuration": [...],
  "scan_settings": [...],
  "file_statistics": {
    "total_files": 150,
    "primary_files": 50,
    "root_files": 50,
    "associated_files": 100,
    "total_size_bytes": 2147483648,
    "unique_file_types": 8,
    "unique_statuses": 5
  },
  "file_status_breakdown": [...],
  "file_type_breakdown": [...],
  "mime_type_breakdown": [...],
  "zenodo_statistics": {...},
  "metadata_mappings": [...],
  "batches": [...],
  "pipeline_steps": [...],
  "recent_api_activity": [...],
  "file_tree": [...]
}

Response Fields:

  • success (boolean): Operation success status
  • hdpc_path (string): Absolute path to the inspected database
  • project_info (object): Complete project metadata from project_info table
  • configuration (array): All project configuration key-value pairs
  • scan_settings (array): File scan settings with modality and scan options
  • file_statistics (object): Aggregated file statistics
  • file_status_breakdown (array): Count of files grouped by status
  • file_type_breakdown (array): Count of files grouped by type
  • mime_type_breakdown (array): Top 10 MIME types by file count
  • zenodo_statistics (object): Zenodo records statistics
  • metadata_mappings (array): Configured metadata mappings
  • batches (array): Processing batches
  • pipeline_steps (array): Configured pipeline steps
  • recent_api_activity (array): 10 most recent API log entries
  • file_tree (array, optional): Complete hierarchical file tree (only if showFiles is true)

File Tree Structure:

When showFiles is true, the file_tree array contains recursive file objects:

{
  "file_id": 1,
  "filename": "model.obj",
  "relative_path": "models/model.obj",
  "file_type": "primary_source",
  "status": "Valid",
  "is_primary_source": true,
  "size_bytes": 1048576,
  "mime_type": "model/obj",
  "parent_file_id": null,
  "added_timestamp": "2025-10-15T10:30:00Z",
  "children": [...]
}
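
Because file_tree entries nest arbitrarily deep via children, clients usually walk them recursively. A small sketch that flattens the tree and tallies files by status (using the fields documented above):

def walk_file_tree(nodes):
    """Flatten the recursive file_tree structure into a single list of file objects."""
    flat = []
    for node in nodes:
        flat.append(node)
        flat.extend(walk_file_tree(node.get("children", [])))
    return flat

def count_by_status(file_tree):
    counts = {}
    for node in walk_file_tree(file_tree):
        counts[node["status"]] = counts.get(node["status"], 0) + 1
    return counts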

Status Codes:

  • 200 OK: Inspection completed successfully
  • 400 Bad Request: Missing hdpcPath parameter
  • 404 Not Found: HDPC file not found at specified path
  • 500 Internal Server Error: Database error or unexpected error during inspection

Zenodo Integration

Get Uploads Tab Counts

Retrieves counts for different stages of the Zenodo upload workflow, used to populate UI tabs.

Endpoint: GET /project/uploads_tab_counts

Query Parameters:

  • is_sandbox (string, optional): Whether to count sandbox records. Defaults to "true". Accepts "true" or "false"

Response:

{
  "pending_preparation": 15,
  "pending_operations": 8,
  "drafts": 12,
  "published": 45,
  "versioning": 0
}

Response Fields:

  • pending_preparation (integer): Source files without prepared metadata
  • pending_operations (integer): Records with prepared metadata but no Zenodo draft created
  • drafts (integer): Active Zenodo drafts
  • published (integer): Unique published Zenodo concept records
  • versioning (integer): Records eligible for versioning (currently always 0)

Pending Preparation Query:

Counts root-level source files with statuses indicating they need metadata preparation, excluding files that already have records in non-failed states.

Status Codes:

  • 200 OK: Counts retrieved successfully
  • 400 Bad Request: No HDPC loaded
  • 500 Internal Server Error: Database query failed
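
Because is_sandbox is passed as a string query parameter rather than a boolean, a client sketch (assuming the requests library and a hypothetical local server) looks like:

import requests

BASE_URL = "http://localhost:5000"  # hypothetical; adjust for your deployment

resp = requests.get(
    f"{BASE_URL}/project/uploads_tab_counts",
    params={"is_sandbox": "false"},  # the string "true" or "false", not a boolean
)
print(resp.json()["pending_preparation"])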

Match Files for Versioning

Matches files in a directory against published Zenodo records to facilitate new version creation.

Endpoint: POST /project/match_files_for_versioning

Decorator: @project_required

Request Body:

{
  "directory_path": "/data/new_versions",
  "match_method": "filename"
}

Request Parameters:

  • directory_path (string, required): Path to directory containing potential new version files
  • match_method (string, optional): Matching strategy. Options: "filename" (default) or "hashcode"

Response:

{
  "success": true,
  "matches": [
    {
      "concept_rec_id": "7891234",
      "record_title": "Medieval Manuscript - Volume 1",
      "matched_file_path": "/data/new_versions/manuscript_v2.xml"
    }
  ]
}

Response Fields:

  • success (boolean): Operation success status
  • matches (array): List of matched file-record pairs, each containing:
    • concept_rec_id (string): Zenodo concept record ID for versioning
    • record_title (string): Title of the published record
    • matched_file_path (string): Absolute path to the matched file

Matching Methods:

Filename: Matches if the file in the directory has the same name as the source file of a published record

Hashcode: Matches if the SHA256 hash of the file content equals the hash of a published record's source file
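
An illustrative sketch of the two matching strategies, assuming a hypothetical list of published records carrying source_filename and source_sha256 fields; the endpoint's actual queries may differ:

import hashlib
from pathlib import Path

def sha256_of(path):
    """Compute the SHA256 hex digest of a file's content in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def match_file(candidate, published_records, method="filename"):
    """Return the first published record matched by filename or by content hash."""
    for record in published_records:
        if method == "filename" and Path(candidate).name == record["source_filename"]:
            return record
        if method == "hashcode" and sha256_of(candidate) == record["source_sha256"]:
            return record
    return None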

Empty Response:

Returns empty matches array if no published records exist or no matches are found.

Status Codes:

  • 200 OK: Matching completed successfully
  • 400 Bad Request: Missing directory_path or no project loaded
  • 404 Not Found: Directory not found at specified path
  • 500 Internal Server Error: Matching operation failed

Metadata Management

Preview Prepared Metadata

Performs a dry run of metadata preparation without saving to the database, allowing users to preview the result.

Endpoint: POST /project/preview_prepared_metadata

Request Body:

{
  "source_file_db_id": 42
}

Request Parameters:

  • source_file_db_id (integer, required): Database ID of the source file

Response:

{
  "success": true,
  "prepared_metadata": {
    "metadata": {
      "title": "Medieval Manuscript 001",
      "upload_type": "dataset",
      "description": "Zenodo record for the data file: Medieval Manuscript 001.",
      "creators": [{"name": "Smith, John", "affiliation": "University"}],
      "access_right": "open"
    }
  },
  "filename": "manuscript_001.xml"
}

Response Fields:

  • success (boolean): Operation success status
  • prepared_metadata (object): Complete Zenodo API payload that would be submitted
  • filename (string): Name of the source file

Description Auto-Construction:

If the metadata mapping contains a construct_later flag for description, the endpoint automatically generates: "Zenodo record for the data file: {title}."

Status Codes:

  • 200 OK: Metadata preview generated successfully
  • 400 Bad Request: No HDPC loaded or missing source_file_db_id
  • 404 Not Found: No active metadata mapping configured or source file not found
  • 500 Internal Server Error: Metadata preparation failed

Prepare Metadata for File

Prepares and validates metadata for a source file, storing it in the database and creating a Zenodo record entry.

Endpoint: POST /project/prepare_metadata_for_file

Decorator: @project_required

Request Body:

{
  "source_file_db_id": 42,
  "target_is_sandbox": true,
  "overrides": {
    "title": "Custom Title Override",
    "keywords": ["heritage", "manuscript", "medieval"]
  }
}

Request Parameters:

  • source_file_db_id (integer, required): Database ID of the source file
  • target_is_sandbox (boolean, optional): Whether to prepare for Zenodo sandbox. Defaults to true
  • overrides (object, optional): User-provided metadata field overrides

Response:

{
  "success": true,
  "message": "Metadata prepared and validated successfully.",
  "log": [
    "Prepare metadata for File ID: 42, Target Sandbox: true",
    "Applying user overrides for fields: ['title', 'keywords']",
    "Sanitizing metadata: Removing empty optional fields: ['language']",
    "Metadata validated successfully.",
    "Metadata stored and record status set to 'prepared'."
  ]
}

Response Fields:

  • success (boolean): Operation success status
  • message (string): Summary message
  • log (array): Detailed execution log messages

Error Response:

{
  "success": false,
  "error": "Metadata validation failed.",
  "validation_errors": ["Title is required", "Invalid upload type"],
  "log": [...]
}

Preparation Process:

The endpoint follows this workflow:

  1. Extracts metadata using active metadata mapping configuration
  2. Applies user overrides from request
  3. Sanitizes metadata by removing empty optional fields
  4. Auto-constructs description if needed
  5. Prepares metadata for Zenodo API format
  6. Validates against Zenodo schema
  7. Stores in database and creates zenodo_records entry with status prepared
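
A minimal sketch of steps 3 and 4 (sanitization and description auto-construction), assuming empty optional fields are simply dropped from the metadata dictionary; the exact rules live in the server's preparation logic:

def sanitize_metadata(metadata, optional_fields):
    """Drop optional fields whose values are empty strings, empty lists, or None."""
    return {
        key: value
        for key, value in metadata.items()
        if key not in optional_fields or value not in ("", [], None)
    }

def auto_construct_description(metadata):
    """Fill the description when the mapping flagged it as construct_later."""
    if not metadata.get("description"):
        metadata["description"] = f"Zenodo record for the data file: {metadata['title']}."
    return metadata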

Status Codes:

  • 200 OK: Metadata prepared successfully
  • 400 Bad Request: Validation failed or no project loaded
  • 500 Internal Server Error: Unexpected error during preparation

File Management

Add Source Files

Adds source files to the project database, optionally associating them with an existing Zenodo record.

Endpoint: POST /project/source_files/add

Decorator: @project_required

Request Body:

{
  "absolute_file_paths": [
    "/data/output/processed_file_001.xml",
    "/data/output/processed_file_002.xml"
  ],
  "record_id_to_associate": 15,
  "pipeline_name": "xml_processor",
  "step_name": "transformation"
}

Request Parameters:

  • absolute_file_paths (array, required): List of absolute file paths to add
  • record_id_to_associate (integer, optional): Zenodo record ID to associate files with
  • pipeline_name (string, optional): Name of the pipeline that generated these files
  • step_name (string, optional): Name of the pipeline step that generated these files

Response:

{
  "message": "File addition process completed.",
  "added_count": 2,
  "skipped_existing_path": 0,
  "errors_count": 0,
  "errors": []
}

Response Fields:

  • message (string): Summary message
  • added_count (integer): Number of files successfully added
  • skipped_existing_path (integer): Number of files skipped because they already exist in database
  • errors_count (integer): Number of files that failed to add
  • errors (array): List of error messages for failed files

Duplicate Detection:

The endpoint detects duplicates in two ways:

Content Hash Matching: If a file's SHA256 hash matches the original source file for the associated record, it's skipped (prevents adding the source file as its own derivative)

Path Matching: If the exact file path already exists in the database, the file is skipped, but it is still associated with the record when record_id_to_associate is provided

File Type Assignment:

  • derived: Files associated with a record (pipeline outputs)
  • source: Standalone files not associated with records

Status Assignment:

  • pending_upload: Derived files ready for Zenodo upload
  • pending: Standalone source files

Record Association:

When record_id_to_associate is provided, files are added to the record_files_map table with pending upload status.
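
The type and status assignment described above reduces to a small decision rule; an illustrative sketch (applied server-side when rows are inserted):

def classify_added_file(record_id_to_associate):
    """Return (file_type, status) for a newly added file, per the rules above."""
    if record_id_to_associate is not None:
        return "derived", "pending_upload"  # pipeline output tied to a record
    return "source", "pending"              # standalone source file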

Status Codes:

  • 200 OK: File addition process completed (check individual counts for details)
  • 400 Bad Request: Missing absolute_file_paths or no project loaded
  • 500 Internal Server Error: Database error during file addition

Project Settings

Update Project Description

Updates the project description field in the project_info table.

Endpoint: POST /project/update_description

Decorator: @project_required

Request Body:

{
  "description": "Updated project description with more details about the collection."
}

Request Parameters:

  • description (string, required): New project description (can be empty string)

Response:

{
  "success": true,
  "message": "Project description updated successfully."
}

Status Codes:

  • 200 OK: Description updated successfully
  • 400 Bad Request: Missing description field or no project loaded
  • 500 Internal Server Error: Database operation failed

Update Project Title

Updates the project name in the project_info table.

Endpoint: POST /project/update_title

Decorator: @project_required

Request Body:

{
  "title": "New Project Title"
}

Request Parameters:

  • title (string, required): New project title (cannot be empty)

Response:

{
  "success": true,
  "message": "Project title updated successfully."
}

Status Codes:

  • 200 OK: Title updated successfully
  • 400 Bad Request: Empty title or no project loaded
  • 500 Internal Server Error: Database operation failed

Dashboard & Statistics

Get Dashboard Statistics

Retrieves aggregated statistics for the project dashboard, including record counts by status and environment.

Endpoint: GET /project/dashboard_stats

Decorator: @project_required

Response:

{
  "drafts_sandbox": 5,
  "published_sandbox": 12,
  "drafts_production": 2,
  "published_production": 38,
  "total_files": 250,
  "files_with_metadata": 220
}

Response Fields:

  • drafts_sandbox (integer): Count of draft records in Zenodo sandbox
  • published_sandbox (integer): Count of published records in Zenodo sandbox
  • drafts_production (integer): Count of draft records in Zenodo production
  • published_production (integer): Count of published records in Zenodo production
  • total_files (integer): Total number of files in the project
  • files_with_metadata (integer): Count of distinct files that have metadata values

Status Codes:

  • 200 OK: Statistics retrieved successfully
  • 400 Bad Request: No project loaded
  • 500 Internal Server Error: Database query failed

Get Published Records

Retrieves the 10 most recently published Zenodo records from the project.

Endpoint: GET /project/published_records

Decorator: @project_required

Response:

[
  {
    "record_title": "Medieval Manuscript Collection - Volume 5",
    "zenodo_doi": "10.5281/zenodo.1234567",
    "zenodo_record_id": "1234567",
    "publication_date": "2025-10-20T15:30:00Z"
  },
  {
    "record_title": "3D Model - Ancient Artifact",
    "zenodo_doi": "10.5281/zenodo.7654321",
    "zenodo_record_id": "7654321",
    "publication_date": "2025-10-19T09:15:00Z"
  }
]

Response Fields:

Each record contains:

  • record_title (string): Title of the published record
  • zenodo_doi (string): Digital Object Identifier
  • zenodo_record_id (string): Zenodo record ID
  • publication_date (string): ISO 8601 timestamp of last update

Ordering:

Records are ordered by last_updated_timestamp in descending order.

Status Codes:

  • 200 OK: Records retrieved successfully
  • 400 Bad Request: No project loaded
  • 500 Internal Server Error: Database query failed

Error Handling

Common Error Scenarios

Project Not Loaded:

Most endpoints return 400 Bad Request with message "No HDPC loaded" when no project is active.

Invalid File Path:

During project creation, if dataInPath does not exist, the endpoint returns 400 Bad Request with the message "The specified Input Data Directory does not exist: {path}".

Database Creation Failed:

500 Internal Server Error with message "Failed to create the HDPC database schema from YAML."

Transaction Rollback:

If an error occurs during project creation, the transaction is rolled back and the partially created .hdpc file is deleted.


Database Schema Integration

Tables Used

The API interacts with the following database tables:

  • project_info: Core project metadata
  • project_configuration: Key-value configuration storage
  • file_scan_settings: Scan options and modality settings
  • source_files: File metadata and hierarchical relationships
  • zenodo_records: Zenodo record metadata and status
  • metadata_mapping_files: Metadata mapping configurations
  • metadata_values: Extracted metadata values
  • record_files_map: File-to-record associations
  • batches: Batch processing records
  • project_pipelines: Pipeline step configurations
  • api_log: API activity logging

Transaction Management

Critical operations like project creation use explicit transaction control with BEGIN TRANSACTION, COMMIT, and ROLLBACK for data integrity.
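
A minimal sketch of this pattern, assuming the .hdpc file is a SQLite database accessed with Python's sqlite3 module (an assumption based on the schema and transaction vocabulary above):

import sqlite3

def update_description(hdpc_path, new_description):
    """Illustrative explicit-transaction pattern: BEGIN, COMMIT, ROLLBACK on error."""
    conn = sqlite3.connect(hdpc_path, isolation_level=None)  # autocommit; transactions managed explicitly
    try:
        conn.execute("BEGIN TRANSACTION")
        conn.execute("UPDATE project_info SET description = ?", (new_description,))  # single-row table assumed
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise
    finally:
        conn.close()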


Usage Examples

Example 1: Create Project with 3D Models

POST /project/create-and-scan
Content-Type: application/json

{
  "projectName": "Cultural Heritage 3D Models",
  "shortCode": "CH3D",
  "hdpcPath": "/projects/heritage_3d.hdpc",
  "modalities": ["3d_model"],
  "dataInPath": "/data/3d_models",
  "dataOutPath": "/data/output",
  "batchEntity": "subdirectory",
  "scanOptions": {
    "extensions": [".obj", ".mtl", ".jpg", ".png"],
    "primarysourceext": ".obj",
    "bundling": {"enabled": false},
    "obj_options": {
      "add_mtl": true,
      "add_textures": true,
      "archive_subdirectories": true
    }
  }
}

Example 2: Load and Inspect Project

POST /hdpc/load
Content-Type: application/json

{"path": "/projects/heritage_3d.hdpc"}

Then:

POST /project/inspect
Content-Type: application/json

{
  "hdpcPath": "/projects/heritage_3d.hdpc",
  "showFiles": false
}

Example 3: Prepare Metadata with Overrides

POST /project/prepare_metadata_for_file
Content-Type: application/json

{
  "source_file_db_id": 42,
  "target_is_sandbox": false,
  "overrides": {
    "title": "Medieval Manuscript - Enhanced Edition",
    "description": "High-resolution scan with enhanced metadata",
    "keywords": ["medieval", "manuscript", "heritage"]
  }
}

Example 4: Add Derived Files from Pipeline

POST /project/source_files/add
Content-Type: application/json

{
  "absolute_file_paths": [
    "/output/processed/manuscript_001_validated.xml",
    "/output/processed/manuscript_001_metadata.json"
  ],
  "record_id_to_associate": 15,
  "pipeline_name": "xml_validation_pipeline",
  "step_name": "validation"
}

Configuration

Schema File Location

The HDPC database schema is loaded from: {CONFIG_FILE_PATH}/hdpc_schema.yaml

File Hash Calculation

SHA256 hashes are calculated for all files during scanning and file addition using the calculate_file_hash utility.

MIME Type Detection

MIME types are automatically detected using the get_file_mime_type utility function.
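
These utilities are internal to the server. As a rough stand-in for extension-based detection, Python's standard mimetypes module can be sketched as follows (the actual get_file_mime_type utility may use a different mechanism):

import mimetypes

def guess_mime_type(path):
    """Best-effort MIME type detection based on file extension."""
    mime, _encoding = mimetypes.guess_type(path)
    return mime or "application/octet-stream"  # fallback when the extension is unknown

print(guess_mime_type("manuscript_001.xml"))  # 'application/xml' or 'text/xml', depending on platform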