Project Management API Reference

The Heritage Data Processor Project Management API provides comprehensive endpoints for creating, loading, and managing HDPC (.hdpc) projects, including hierarchical file scanning, validation, metadata preparation, and Zenodo integration.

Base URL

Endpoints in this reference are grouped under three path prefixes: /hdpc, /project_info, and /project.


Project Lifecycle

Load HDPC Database

Loads an existing .hdpc project database file into the application.

Endpoint: POST /hdpc/load

Request Body:

{
  "path": "/path/to/project.hdpc"
}

Request Parameters:

  • path (string, required): Absolute file path to the .hdpc database file

Response:

{
  "message": "HDPC loaded successfully",
  "project_name": "Medieval Manuscripts Project",
  "project_id": 1
}

Response Fields:

  • message (string): Confirmation message
  • project_name (string): Name of the loaded project. Defaults to "Unknown Project" if the project_info table is empty
  • project_id (integer, nullable): Database ID of the project, or null if not available

Status Codes:

  • 200 OK: Project loaded successfully
  • 400 Bad Request: Path not provided
  • 500 Internal Server Error: Failed to load HDPC (file invalid or not found)
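
A minimal client sketch for this call, assuming the Python requests library and a hypothetical local server at http://localhost:5000 (host, port, and any API prefix depend on your deployment):

import requests

BASE_URL = "http://localhost:5000"  # hypothetical; adjust for your deployment

resp = requests.post(f"{BASE_URL}/hdpc/load", json={"path": "/path/to/project.hdpc"})
resp.raise_for_status()  # raises on 400/500 responses

data = resp.json()
print(data["project_name"], data["project_id"])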

Get Project Info

Retrieves basic project information from the loaded HDPC database.

Endpoint: GET /project_info

Response:

{
  "project_id": 1,
  "project_name": "Medieval Manuscripts Project",
  "description": "Digital preservation of medieval manuscript collection",
  "hdpc_schema_version": "1.2.0"
}

Response Fields:

  • project_id (integer): Project database identifier
  • project_name (string): Project name
  • description (string): Project description
  • hdpc_schema_version (string): Version of the HDPC database schema

Empty Response:

Returns an empty object {} if no project info exists.

Status Codes:

  • 200 OK: Project info retrieved successfully
  • 400 Bad Request: No HDPC loaded

Get Project Details with Modality

Retrieves project information including the configured data modality.

Endpoint: GET /project_details_with_modality

Response:

{
  "project_id": 1,
  "project_name": "3D Models Archive",
  "description": "Archive of 3D cultural heritage models",
  "hdpc_schema_version": "1.2.0",
  "modality": "[\"3d_model\", \"image\"]"
}

Response Fields:

  • project_id (integer): Project database identifier
  • project_name (string): Project name
  • description (string): Project description
  • hdpc_schema_version (string): Database schema version
  • modality (string): JSON-serialized array of data modalities, or "Not Set" if not configured

Status Codes:

  • 200 OK: Project details retrieved successfully
  • 400 Bad Request: No HDPC loaded
  • 500 Internal Server Error: Could not retrieve project info
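
Because modality is returned as a JSON-serialized string (or the literal "Not Set"), clients typically decode it before use. A minimal sketch:

import json

def parse_modality(value):
    """Decode the modality field returned by /project_details_with_modality."""
    if value == "Not Set":
        return []
    return json.loads(value)  # e.g. '["3d_model", "image"]' -> ['3d_model', 'image']

print(parse_modality('["3d_model", "image"]'))  # ['3d_model', 'image']
print(parse_modality("Not Set"))                # []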

Project Creation

Create and Scan Project

Creates a new HDPC project database with comprehensive file scanning, validation, hierarchical structure detection, and optional asset archiving.

Endpoint: POST /project/create-and-scan

Request Body:

{
  "projectName": "3D Models Collection",
  "shortCode": "3DMODELS",
  "hdpcPath": "/projects/3d_models.hdpc",
  "modalities": ["3d_model", "image"],
  "dataInPath": "/data/input/models",
  "dataOutPath": "/data/output",
  "batchEntity": "subdirectory",
  "scanOptions": {
    "extensions": [".obj", ".mtl", ".jpg", ".png"],
    "primarysourceext": ".obj",
    "bundling": {
      "enabled": true,
      "strategy": "stem_match"
    },
    "obj_options": {
      "add_mtl": true,
      "add_textures": true,
      "archive_subdirectories": true
    }
  }
}

Request Parameters:

  • projectName (string, required): Human-readable project name
  • shortCode (string, required): Short code identifier for the project
  • hdpcPath (string, required): Path where the .hdpc file will be created
  • modalities (array, required): List of data modalities (e.g., ["3d_model"], ["text", "image"])
  • dataInPath (string, required): Input data directory path for file scanning
  • dataOutPath (string, required): Output data directory path for generated files
  • batchEntity (string, required): File processing mode: "root", "subdirectory", or "hybrid"
  • scanOptions (object, required): File scanning configuration
    • extensions (array): List of file extensions to scan (with dot prefix)
    • primarysourceext (string): Primary source file extension for each group
    • bundling (object): Bundling configuration
      • enabled (boolean): Whether to bundle congruent files
      • strategy (string): Bundling strategy (e.g., "stem_match")
    • obj_options (object): Options for 3D model processing
      • add_mtl (boolean): Whether to scan for MTL files
      • add_textures (boolean): Whether to scan for texture files
      • archive_subdirectories (boolean): Whether to archive texture subdirectories into ZIP files

Response:

{
  "success": true,
  "message": "Project created and 47 files scanned.",
  "projectId": 1,
  "filesAdded": 47,
  "foundFiles": [
    {
      "name": "model_001.obj",
      "path": "/data/input/models/model_001.obj",
      "type": "primary_source",
      "status": "Valid",
      "is_primary_source": true,
      "relative_path": "model_001.obj",
      "validation_report": {},
      "children": [
        {
          "name": "model_001.mtl",
          "path": "/data/input/models/model_001.mtl",
          "type": "primary",
          "status": "Valid",
          "children": [
            {
              "name": "model_001_textures.zip",
              "path": "/data/output/archives/model_001_textures.zip",
              "type": "archive",
              "status": "Valid",
              "children": [...]
            }
          ]
        }
      ]
    }
  ]
}

Response Fields:

  • success (boolean): Operation success status
  • message (string): Summary message with file count
  • projectId (integer): Created project database ID
  • filesAdded (integer): Total number of files added to database
  • foundFiles (array): Hierarchical file structure array

Status Codes:

  • 201 Created: Project created successfully
  • 400 Bad Request: Missing required parameters or input directory does not exist
  • 500 Internal Server Error: Failed to create database schema or unexpected error during creation

Batch Entity Modes

The batchEntity parameter controls how files are grouped for processing:

Root Mode: Scans only files in the root of dataInPath. When bundling is enabled, groups files by common stem (e.g., model.obj, model.mtl, model.jpg become one group).

Subdirectory Mode: Treats each subdirectory as a separate processing group. All files within a subdirectory are grouped together recursively.

Hybrid Mode: Combines both approaches, scanning root files (with optional bundling) and processing each subdirectory as a group.
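
As an illustration of how stem_match bundling groups root-level files, the sketch below clusters paths that share a filename stem. This is an approximation for explanatory purposes, not the scanner's actual implementation:

from collections import defaultdict
from pathlib import Path

def group_by_stem(paths):
    """Group files sharing a stem, e.g. model.obj, model.mtl, model.jpg form one group."""
    groups = defaultdict(list)
    for p in paths:
        groups[Path(p).stem].append(p)
    return dict(groups)

files = ["model.obj", "model.mtl", "model.jpg", "other.obj"]
print(group_by_stem(files))
# {'model': ['model.obj', 'model.mtl', 'model.jpg'], 'other': ['other.obj']}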


File Validation Statuses

During scanning, files are validated and assigned one of these statuses:

  • Valid: File passed all validation checks
  • Invalid: File failed basic validation (corrupted, empty, wrong format)
  • Problems: File has multiple issues including validation failures and missing dependencies
  • MTL Missing: OBJ file missing its referenced MTL file
  • Textures Missing: MTL file missing some referenced texture files
  • File Conflict: Multiple non-identical files found for the same texture reference

Hierarchical File Structure

The scanner builds hierarchical relationships:

Primary Source: The main file for a record (e.g., .obj file)

Primary Dependencies: Direct dependencies (e.g., .mtl file referenced by OBJ)

Secondary Dependencies: Nested dependencies (e.g., textures referenced by MTL)

Archive Files: ZIP archives containing grouped assets

Archived Files: Individual files within archives


Archival Logic

When obj_options.archive_subdirectories is enabled, texture files in subdirectories are automatically archived:

  1. Detects texture files in subdirectories relative to MTL file
  2. Creates ZIP archive in {dataOutPath}/archives/ directory
  3. Archive name format: {base_stem}_{subdirectory_name}.zip
  4. Validates ZIP archive and marks archived files with archive_name metadata
  5. Maintains hierarchy: MTL → Archive → Archived Textures
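
A minimal sketch of the archive-creation step, assuming Python's standard zipfile module; the real scanner additionally validates the archive and records archive_name metadata:

import zipfile
from pathlib import Path

def archive_texture_subdirectory(base_stem, subdir, data_out_path):
    """Zip a texture subdirectory into {dataOutPath}/archives/{base_stem}_{subdirectory_name}.zip."""
    subdir = Path(subdir)
    archives_dir = Path(data_out_path) / "archives"
    archives_dir.mkdir(parents=True, exist_ok=True)
    archive_path = archives_dir / f"{base_stem}_{subdir.name}.zip"
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for texture in sorted(subdir.rglob("*")):
            if texture.is_file():
                zf.write(texture, arcname=texture.relative_to(subdir))
    return archive_path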

Project Inspection

Inspect Project

Performs comprehensive analysis of an HDPC database, returning detailed statistics, file trees, and metadata about all project components.

Endpoint: POST /project/inspect

Request Body:

{
  "hdpcPath": "/projects/my_project.hdpc",
  "showFiles": true
}

Request Parameters:

  • hdpcPath (string, required): Path to the .hdpc database file to inspect
  • showFiles (boolean, optional): Whether to include complete file tree in response. Defaults to false

Response:

{
  "success": true,
  "hdpc_path": "/projects/my_project.hdpc",
  "project_info": {
    "project_id": 1,
    "project_name": "3D Models Archive",
    "project_short_code": "3DMA",
    "description": "Archive description",
    "creation_timestamp": "2025-10-15T10:00:00Z",
    "last_modified_timestamp": "2025-10-21T13:00:00Z",
    "hdpc_schema_version": "1.2.0"
  },
  "configuration": [...],
  "scan_settings": [...],
  "file_statistics": {
    "total_files": 150,
    "primary_files": 50,
    "root_files": 50,
    "associated_files": 100,
    "total_size_bytes": 2147483648,
    "unique_file_types": 8,
    "unique_statuses": 5
  },
  "file_status_breakdown": [...],
  "file_type_breakdown": [...],
  "mime_type_breakdown": [...],
  "zenodo_statistics": {...},
  "metadata_mappings": [...],
  "batches": [...],
  "pipeline_steps": [...],
  "recent_api_activity": [...],
  "file_tree": [...]
}

Response Fields:

  • success (boolean): Operation success status
  • hdpc_path (string): Absolute path to the inspected database
  • project_info (object): Complete project metadata from project_info table
  • configuration (array): All project configuration key-value pairs
  • scan_settings (array): File scan settings with modality and scan options
  • file_statistics (object): Aggregated file statistics
  • file_status_breakdown (array): Count of files grouped by status
  • file_type_breakdown (array): Count of files grouped by type
  • mime_type_breakdown (array): Top 10 MIME types by file count
  • zenodo_statistics (object): Zenodo records statistics
  • metadata_mappings (array): Configured metadata mappings
  • batches (array): Processing batches
  • pipeline_steps (array): Configured pipeline steps
  • recent_api_activity (array): 10 most recent API log entries
  • file_tree (array, optional): Complete hierarchical file tree (only if showFiles is true)

File Tree Structure:

When showFiles is true, the file_tree array contains recursive file objects:

{
  "file_id": 1,
  "filename": "model.obj",
  "relative_path": "models/model.obj",
  "file_type": "primary_source",
  "status": "Valid",
  "is_primary_source": true,
  "size_bytes": 1048576,
  "mime_type": "model/obj",
  "parent_file_id": null,
  "added_timestamp": "2025-10-15T10:30:00Z",
  "children": [...]
}
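
Because file_tree entries nest arbitrarily deep via children, clients usually walk them recursively. A small sketch that flattens the tree and tallies files by status (using the fields documented above):

def walk_file_tree(nodes):
    """Flatten the recursive file_tree structure into a single list of file objects."""
    flat = []
    for node in nodes:
        flat.append(node)
        flat.extend(walk_file_tree(node.get("children", [])))
    return flat

def count_by_status(file_tree):
    counts = {}
    for node in walk_file_tree(file_tree):
        counts[node["status"]] = counts.get(node["status"], 0) + 1
    return counts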

Status Codes:

  • 200 OK: Inspection completed successfully
  • 400 Bad Request: Missing hdpcPath parameter
  • 404 Not Found: HDPC file not found at specified path
  • 500 Internal Server Error: Database error or unexpected error during inspection

Zenodo Integration

Get Uploads Tab Counts

Retrieves counts for different stages of the Zenodo upload workflow, used to populate UI tabs.

Endpoint: GET /project/uploads_tab_counts

Query Parameters:

  • is_sandbox (string, optional): Whether to count sandbox records. Defaults to "true". Accepts "true" or "false"

Response:

{
  "pending_preparation": 15,
  "pending_operations": 8,
  "drafts": 12,
  "published": 45,
  "versioning": 0
}

Response Fields:

  • pending_preparation (integer): Source files without prepared metadata
  • pending_operations (integer): Records with prepared metadata but no Zenodo draft created
  • drafts (integer): Active Zenodo drafts
  • published (integer): Unique published Zenodo concept records
  • versioning (integer): Records eligible for versioning (currently always 0)

Pending Preparation Query:

Counts root-level source files with statuses indicating they need metadata preparation, excluding files that already have records in non-failed states.

Status Codes:

  • 200 OK: Counts retrieved successfully
  • 400 Bad Request: No HDPC loaded
  • 500 Internal Server Error: Database query failed
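
Because is_sandbox is passed as a string query parameter rather than a boolean, a client sketch (assuming the requests library and a hypothetical local server) looks like:

import requests

BASE_URL = "http://localhost:5000"  # hypothetical; adjust for your deployment

resp = requests.get(
    f"{BASE_URL}/project/uploads_tab_counts",
    params={"is_sandbox": "false"},  # the string "true" or "false", not a boolean
)
print(resp.json()["pending_preparation"])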

Match Files for Versioning

Matches files in a directory against published Zenodo records to facilitate new version creation.

Endpoint: POST /project/match_files_for_versioning

Decorator: @project_required

Request Body:

{
  "directory_path": "/data/new_versions",
  "match_method": "filename"
}

Request Parameters:

  • directory_path (string, required): Path to directory containing potential new version files
  • match_method (string, optional): Matching strategy. Options: "filename" (default) or "hashcode"

Response:

{
  "success": true,
  "matches": [
    {
      "concept_rec_id": "7891234",
      "record_title": "Medieval Manuscript - Volume 1",
      "matched_file_path": "/data/new_versions/manuscript_v2.xml"
    }
  ]
}

Response Fields:

  • success (boolean): Operation success status
  • matches (array): List of matched file-record pairs, each containing:
    • concept_rec_id (string): Zenodo concept record ID for versioning
    • record_title (string): Title of the published record
    • matched_file_path (string): Absolute path to the matched file

Matching Methods:

Filename: Matches if the file in the directory has the same name as the source file of a published record

Hashcode: Matches if the SHA256 hash of the file content equals the hash of a published record's source file
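
An illustrative sketch of the two matching strategies, assuming a hypothetical list of published records carrying source_filename and source_sha256 fields; the endpoint's actual queries may differ:

import hashlib
from pathlib import Path

def sha256_of(path):
    """Compute the SHA256 hex digest of a file's content in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def match_file(candidate, published_records, method="filename"):
    """Return the first published record matched by filename or by content hash."""
    for record in published_records:
        if method == "filename" and Path(candidate).name == record["source_filename"]:
            return record
        if method == "hashcode" and sha256_of(candidate) == record["source_sha256"]:
            return record
    return None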

Empty Response:

Returns empty matches array if no published records exist or no matches are found.

Status Codes:

  • 200 OK: Matching completed successfully
  • 400 Bad Request: Missing directory_path or no project loaded
  • 404 Not Found: Directory not found at specified path
  • 500 Internal Server Error: Matching operation failed

Metadata Management

Preview Prepared Metadata

Performs a dry run of metadata preparation without saving to the database, allowing users to preview the result.

Endpoint: POST /project/preview_prepared_metadata

Request Body:

{
  "source_file_db_id": 42
}

Request Parameters:

  • source_file_db_id (integer, required): Database ID of the source file

Response:

{
  "success": true,
  "prepared_metadata": {
    "metadata": {
      "title": "Medieval Manuscript 001",
      "upload_type": "dataset",
      "description": "Zenodo record for the data file: Medieval Manuscript 001.",
      "creators": [{"name": "Smith, John", "affiliation": "University"}],
      "access_right": "open"
    }
  },
  "filename": "manuscript_001.xml"
}

Response Fields:

  • success (boolean): Operation success status
  • prepared_metadata (object): Complete Zenodo API payload that would be submitted
  • filename (string): Name of the source file

Description Auto-Construction:

If the metadata mapping contains a construct_later flag for description, the endpoint automatically generates: "Zenodo record for the data file: {title}."

Status Codes:

  • 200 OK: Metadata preview generated successfully
  • 400 Bad Request: No HDPC loaded or missing source_file_db_id
  • 404 Not Found: No active metadata mapping configured or source file not found
  • 500 Internal Server Error: Metadata preparation failed

Prepare Metadata for File

Prepares and validates metadata for a source file, storing it in the database and creating a Zenodo record entry.

Endpoint: POST /project/prepare_metadata_for_file

Decorator: @project_required

Request Body:

{
  "source_file_db_id": 42,
  "target_is_sandbox": true,
  "overrides": {
    "title": "Custom Title Override",
    "keywords": ["heritage", "manuscript", "medieval"]
  }
}

Request Parameters:

  • source_file_db_id (integer, required): Database ID of the source file
  • target_is_sandbox (boolean, optional): Whether to prepare for Zenodo sandbox. Defaults to true
  • overrides (object, optional): User-provided metadata field overrides

Response:

{
  "success": true,
  "message": "Metadata prepared and validated successfully.",
  "log": [
    "Prepare metadata for File ID: 42, Target Sandbox: true",
    "Applying user overrides for fields: ['title', 'keywords']",
    "Sanitizing metadata: Removing empty optional fields: ['language']",
    "Metadata validated successfully.",
    "Metadata stored and record status set to 'prepared'."
  ]
}

Response Fields:

  • success (boolean): Operation success status
  • message (string): Summary message
  • log (array): Detailed execution log messages

Error Response:

{
  "success": false,
  "error": "Metadata validation failed.",
  "validation_errors": ["Title is required", "Invalid upload type"],
  "log": [...]
}

Preparation Process:

The endpoint follows this workflow:

  1. Extracts metadata using active metadata mapping configuration
  2. Applies user overrides from request
  3. Sanitizes metadata by removing empty optional fields
  4. Auto-constructs description if needed
  5. Prepares metadata for Zenodo API format
  6. Validates against Zenodo schema
  7. Stores in database and creates zenodo_records entry with status prepared
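
A minimal sketch of steps 3 and 4 (sanitization and description auto-construction), assuming empty optional fields are simply dropped from the metadata dictionary; the exact rules live in the server's preparation logic:

def sanitize_metadata(metadata, optional_fields):
    """Drop optional fields whose values are empty strings, empty lists, or None."""
    return {
        key: value
        for key, value in metadata.items()
        if key not in optional_fields or value not in ("", [], None)
    }

def auto_construct_description(metadata):
    """Fill the description when the mapping flagged it as construct_later."""
    if not metadata.get("description"):
        metadata["description"] = f"Zenodo record for the data file: {metadata['title']}."
    return metadata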

Status Codes:

  • 200 OK: Metadata prepared successfully
  • 400 Bad Request: Validation failed or no project loaded
  • 500 Internal Server Error: Unexpected error during preparation

File Management

Add Source Files

Adds source files to the project database, optionally associating them with an existing Zenodo record.

Endpoint: POST /project/source_files/add

Decorator: @project_required

Request Body:

{
  "absolute_file_paths": [
    "/data/output/processed_file_001.xml",
    "/data/output/processed_file_002.xml"
  ],
  "record_id_to_associate": 15,
  "pipeline_name": "xml_processor",
  "step_name": "transformation"
}

Request Parameters:

  • absolute_file_paths (array, required): List of absolute file paths to add
  • record_id_to_associate (integer, optional): Zenodo record ID to associate files with
  • pipeline_name (string, optional): Name of the pipeline that generated these files
  • step_name (string, optional): Name of the pipeline step that generated these files

Response:

{
  "message": "File addition process completed.",
  "added_count": 2,
  "skipped_existing_path": 0,
  "errors_count": 0,
  "errors": []
}

Response Fields:

  • message (string): Summary message
  • added_count (integer): Number of files successfully added
  • skipped_existing_path (integer): Number of files skipped because they already exist in database
  • errors_count (integer): Number of files that failed to add
  • errors (array): List of error messages for failed files

Duplicate Detection:

The endpoint detects duplicates in two ways:

Content Hash Matching: If a file's SHA256 hash matches the original source file for the associated record, it's skipped (prevents adding the source file as its own derivative)

Path Matching: If the exact file path already exists in the database, the file is skipped, but it is still associated with the record when record_id_to_associate is provided

File Type Assignment:

  • derived: Files associated with a record (pipeline outputs)
  • source: Standalone files not associated with records

Status Assignment:

  • pending_upload: Derived files ready for Zenodo upload
  • pending: Standalone source files

Record Association:

When record_id_to_associate is provided, files are added to the record_files_map table with pending upload status.
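
The type and status assignment described above reduces to a small decision rule; an illustrative sketch (applied server-side when rows are inserted):

def classify_added_file(record_id_to_associate):
    """Return (file_type, status) for a newly added file, per the rules above."""
    if record_id_to_associate is not None:
        return "derived", "pending_upload"  # pipeline output tied to a record
    return "source", "pending"              # standalone source file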

Status Codes:

  • 200 OK: File addition process completed (check individual counts for details)
  • 400 Bad Request: Missing absolute_file_paths or no project loaded
  • 500 Internal Server Error: Database error during file addition

Project Settings

Update Project Description

Updates the project description field in the project_info table.

Endpoint: POST /project/update_description

Decorator: @project_required

Request Body:

{
  "description": "Updated project description with more details about the collection."
}

Request Parameters:

  • description (string, required): New project description (can be empty string)

Response:

{
  "success": true,
  "message": "Project description updated successfully."
}

Status Codes:

  • 200 OK: Description updated successfully
  • 400 Bad Request: Missing description field or no project loaded
  • 500 Internal Server Error: Database operation failed

Update Project Title

Updates the project name in the project_info table.

Endpoint: POST /project/update_title

Decorator: @project_required

Request Body:

{
  "title": "New Project Title"
}

Request Parameters:

  • title (string, required): New project title (cannot be empty)

Response:

{
  "success": true,
  "message": "Project title updated successfully."
}

Status Codes:

  • 200 OK: Title updated successfully
  • 400 Bad Request: Empty title or no project loaded
  • 500 Internal Server Error: Database operation failed

Dashboard & Statistics

Get Dashboard Statistics

Retrieves aggregated statistics for the project dashboard, including record counts by status and environment.

Endpoint: GET /project/dashboard_stats

Decorator: @project_required

Response:

{
  "drafts_sandbox": 5,
  "published_sandbox": 12,
  "drafts_production": 2,
  "published_production": 38,
  "total_files": 250,
  "files_with_metadata": 220
}

Response Fields:

  • drafts_sandbox (integer): Count of draft records in Zenodo sandbox
  • published_sandbox (integer): Count of published records in Zenodo sandbox
  • drafts_production (integer): Count of draft records in Zenodo production
  • published_production (integer): Count of published records in Zenodo production
  • total_files (integer): Total number of files in the project
  • files_with_metadata (integer): Count of distinct files that have metadata values

Status Codes:

  • 200 OK: Statistics retrieved successfully
  • 400 Bad Request: No project loaded
  • 500 Internal Server Error: Database query failed

Get Published Records

Retrieves the 10 most recently published Zenodo records from the project.

Endpoint: GET /project/published_records

Decorator: @project_required

Response:

[
  {
    "record_title": "Medieval Manuscript Collection - Volume 5",
    "zenodo_doi": "10.5281/zenodo.1234567",
    "zenodo_record_id": "1234567",
    "publication_date": "2025-10-20T15:30:00Z"
  },
  {
    "record_title": "3D Model - Ancient Artifact",
    "zenodo_doi": "10.5281/zenodo.7654321",
    "zenodo_record_id": "7654321",
    "publication_date": "2025-10-19T09:15:00Z"
  }
]

Response Fields:

Each record contains:

  • record_title (string): Title of the published record
  • zenodo_doi (string): Digital Object Identifier
  • zenodo_record_id (string): Zenodo record ID
  • publication_date (string): ISO 8601 timestamp of last update

Ordering:

Records are ordered by last_updated_timestamp in descending order.

Status Codes:

  • 200 OK: Records retrieved successfully
  • 400 Bad Request: No project loaded
  • 500 Internal Server Error: Database query failed

Error Handling

Common Error Scenarios

Project Not Loaded:

Most endpoints return 400 Bad Request with message "No HDPC loaded" when no project is active.

Invalid File Path:

During project creation, if dataInPath does not exist, the endpoint returns 400 Bad Request with the message "The specified Input Data Directory does not exist: {path}".

Database Creation Failed:

500 Internal Server Error with message "Failed to create the HDPC database schema from YAML."

Transaction Rollback:

If an error occurs during project creation, the transaction is rolled back and the partially created .hdpc file is deleted.


Database Schema Integration

Tables Used

The API interacts with the following database tables:

  • project_info: Core project metadata
  • project_configuration: Key-value configuration storage
  • file_scan_settings: Scan options and modality settings
  • source_files: File metadata and hierarchical relationships
  • zenodo_records: Zenodo record metadata and status
  • metadata_mapping_files: Metadata mapping configurations
  • metadata_values: Extracted metadata values
  • record_files_map: File-to-record associations
  • batches: Batch processing records
  • project_pipelines: Pipeline step configurations
  • api_log: API activity logging

Transaction Management

Critical operations like project creation use explicit transaction control with BEGIN TRANSACTION, COMMIT, and ROLLBACK for data integrity.
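
A minimal sketch of this pattern, assuming the .hdpc file is a SQLite database accessed with Python's sqlite3 module (an assumption based on the schema and transaction vocabulary above):

import sqlite3

def update_description(hdpc_path, new_description):
    """Illustrative explicit-transaction pattern: BEGIN, COMMIT, ROLLBACK on error."""
    conn = sqlite3.connect(hdpc_path, isolation_level=None)  # autocommit; transactions managed explicitly
    try:
        conn.execute("BEGIN TRANSACTION")
        conn.execute("UPDATE project_info SET description = ?", (new_description,))  # single-row table assumed
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise
    finally:
        conn.close()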


Usage Examples

Example 1: Create Project with 3D Models

POST /project/create-and-scan
Content-Type: application/json

{
  "projectName": "Cultural Heritage 3D Models",
  "shortCode": "CH3D",
  "hdpcPath": "/projects/heritage_3d.hdpc",
  "modalities": ["3d_model"],
  "dataInPath": "/data/3d_models",
  "dataOutPath": "/data/output",
  "batchEntity": "subdirectory",
  "scanOptions": {
    "extensions": [".obj", ".mtl", ".jpg", ".png"],
    "primarysourceext": ".obj",
    "bundling": {"enabled": false},
    "obj_options": {
      "add_mtl": true,
      "add_textures": true,
      "archive_subdirectories": true
    }
  }
}

Example 2: Load and Inspect Project

POST /hdpc/load
Content-Type: application/json

{"path": "/projects/heritage_3d.hdpc"}

Then:

POST /project/inspect
Content-Type: application/json

{
  "hdpcPath": "/projects/heritage_3d.hdpc",
  "showFiles": false
}

Example 3: Prepare Metadata with Overrides

POST /project/prepare_metadata_for_file
Content-Type: application/json

{
  "source_file_db_id": 42,
  "target_is_sandbox": false,
  "overrides": {
    "title": "Medieval Manuscript - Enhanced Edition",
    "description": "High-resolution scan with enhanced metadata",
    "keywords": ["medieval", "manuscript", "heritage"]
  }
}

Example 4: Add Derived Files from Pipeline

POST /project/source_files/add
Content-Type: application/json

{
  "absolute_file_paths": [
    "/output/processed/manuscript_001_validated.xml",
    "/output/processed/manuscript_001_metadata.json"
  ],
  "record_id_to_associate": 15,
  "pipeline_name": "xml_validation_pipeline",
  "step_name": "validation"
}

Configuration

Schema File Location

The HDPC database schema is loaded from: {CONFIG_FILE_PATH}/hdpc_schema.yaml

File Hash Calculation

SHA256 hashes are calculated for all files during scanning and file addition using the calculate_file_hash utility.

MIME Type Detection

MIME types are automatically detected using the get_file_mime_type utility function.
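
These utilities are internal to the server. As a rough stand-in for extension-based detection, Python's standard mimetypes module can be sketched as follows (the actual get_file_mime_type utility may use a different mechanism):

import mimetypes

def guess_mime_type(path):
    """Best-effort MIME type detection based on file extension."""
    mime, _encoding = mimetypes.guess_type(path)
    return mime or "application/octet-stream"  # fallback when the extension is unknown

print(guess_mime_type("manuscript_001.xml"))  # 'application/xml' or 'text/xml', depending on platform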