
Heritage Data Processor - Command-Line Interface Documentation

Overview

The Heritage Data Processor (HDP) provides a comprehensive command-line interface for managing cultural heritage data workflows, including file processing, Zenodo integration, pipeline execution, and component management.

All commands start with hdp (e.g., hdp upload, hdp process). The CLI automatically manages the backend server, starting it when needed unless explicitly disabled with --no-auto-server.



Global Options

These options can be used with any command:

| Option | Default | Description |
| --- | --- | --- |
| --server-host HOST | 127.0.0.1 | Server host address |
| --server-port PORT | 5001 | Server port number |
| --server-config FILE | config.yaml | Path to server configuration file |
| --no-auto-server | False | Disable automatic server startup (requires running hdp-server manually) |

Example:

hdp upload --hdpc project.hdpc --input-dir /data/files --server-port 8080


Core Workflow Commands

upload

Add files to your project, prepare metadata, and create Zenodo drafts.

Usage:

hdp upload --hdpc <PROJECT_FILE> --input-dir <DIRECTORY> [OPTIONS]

Required Arguments:

| Argument | Description |
| --- | --- |
| --hdpc PATH | Path to your .hdpc project file |
| --input-dir PATH | Directory containing files to upload |

Optional Arguments:

| Argument | Description |
| --- | --- |
| --extensions EXT [EXT...] | File extensions to include (e.g., .jpg .png .obj) |
| --recursive | Scan subdirectories recursively |
| --sandbox | Use the Zenodo Sandbox environment (default) |
| --production | Use the Zenodo Production environment |

Examples:

Upload all supported files from a directory:

hdp upload --hdpc myproject.hdpc --input-dir /data/images

Upload specific file types recursively:

hdp upload --hdpc myproject.hdpc --input-dir /data/3d_models \
  --extensions .obj .mtl .png --recursive

Upload to production environment:

hdp upload --hdpc myproject.hdpc --input-dir /data/artifacts --production

Workflow:

1. Loads the specified HDPC project
2. Scans the input directory for matching files
3. Adds the files to the project database
4. Prepares metadata for each file
5. Creates Zenodo drafts for all prepared records
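
The directory scan described in step 2 can be approximated with a standard find invocation. The sketch below is illustrative only — the directory layout and extensions are invented, and HDP's actual scanner may differ:

```shell
# Illustrative stand-in for the upload scan: recursively collect files
# matching --extensions .jpg .png (paths are made up for the demo).
demo=$(mktemp -d)
mkdir -p "$demo/sub"
touch "$demo/a.jpg" "$demo/b.txt" "$demo/sub/c.png"
# --recursive corresponds to find descending into subdirectories;
# the extension list becomes the -name alternatives.
find "$demo" -type f \( -name '*.jpg' -o -name '*.png' \) | sort
```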


process

Execute a pipeline on existing Zenodo drafts with optional filtering.

Usage:

hdp process --hdpc <PROJECT_FILE> --pipeline <PIPELINE_NAME> [OPTIONS]

Required Arguments:

| Argument | Description |
| --- | --- |
| --hdpc PATH | Path to your .hdpc project file |
| --pipeline NAME | Name identifier of the pipeline to execute |

Environment Selection:

| Argument | Description |
| --- | --- |
| --sandbox | Target Sandbox environment drafts (default) |
| --production | Target Production environment drafts |

Filtering Options:

| Argument | Description |
| --- | --- |
| --search TERM | General search term to filter records by title or filename |
| --title-pattern PATTERN | Pattern to match record titles (use % as a wildcard) |
| --since DATE | Only records created on or after this date (YYYY-MM-DD) |
| --until DATE | Only records created on or before this date (YYYY-MM-DD) |
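
The % wildcard in --title-pattern follows SQL LIKE conventions, so "Artifact%" matches any title that begins with "Artifact". In shell-glob terms that corresponds to Artifact* — a rough analogy, not HDP's implementation; the title below is invented:

```shell
# "Artifact%" in --title-pattern behaves like the shell glob Artifact*.
title="Artifact 042: Bronze Bowl"
case "$title" in
  Artifact*) echo "match" ;;
  *)         echo "no match" ;;
esac
```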

Examples:

Process all drafts with a pipeline:

hdp process --hdpc myproject.hdpc --pipeline image-enhancement

Process only recent drafts:

hdp process --hdpc myproject.hdpc --pipeline metadata-enrichment \
  --since 2025-10-01

Process drafts matching a title pattern:

hdp process --hdpc myproject.hdpc --pipeline 3d-optimization \
  --title-pattern "Artifact%" --sandbox

Process specific records by search term:

hdp process --hdpc myproject.hdpc --pipeline thumbnail-generation \
  --search "pottery"


publish

Publish ready Zenodo drafts to make them publicly accessible.

Usage:

hdp publish {--all | --record-ids ID [ID...]} --hdpc <PROJECT_FILE> [OPTIONS]

Required Arguments (Mutually Exclusive):

| Argument | Description |
| --- | --- |
| --all | Publish ALL ready drafts (all files uploaded) |
| --record-ids ID [ID...] | Space-separated list of local record database IDs to publish |

Other Required:

| Argument | Description |
| --- | --- |
| --hdpc PATH | Path to your .hdpc project file |

Environment Selection:

| Argument | Description |
| --- | --- |
| --sandbox | Target Sandbox environment (default) |
| --production | Target Production environment |

Examples:

Publish all ready drafts:

hdp publish --all --hdpc myproject.hdpc --sandbox

Publish specific records:

hdp publish --record-ids 45 67 89 --hdpc myproject.hdpc

Publish to production:

hdp publish --all --hdpc myproject.hdpc --production

Safety Features:

- Prompts for confirmation before publishing
- Only publishes drafts whose files have all been uploaded successfully
- Provides a detailed success/failure report for each record


create

Initialize a new HDPC project with automatic file scanning and configuration.

Usage:

hdp create --hdpc-path <PATH> --project-name <NAME> --short-code <CODE> \
  --modality <TYPE> --input-dir <DIR> --output-dir <DIR> [OPTIONS]

Required Arguments:

| Argument | Description |
| --- | --- |
| --hdpc-path PATH | Path where the .hdpc project file will be created |
| --project-name NAME | Descriptive name for the project |
| --short-code CODE | Short unique code for the project (e.g., MyProject2025) |
| --modality TYPE | Primary data modality (see choices below) |
| --input-dir PATH | Directory containing source data files to scan |
| --output-dir PATH | Directory where processed outputs will be stored |

Modality Choices:

- 3D Model
- Image / Photography
- Audio
- Video
- Text / Document

File Scanning Options:

| Argument | Description |
| --- | --- |
| --extensions EXT [EXT...] | File extensions to scan (e.g., .obj .mtl .png); if omitted, uses modality defaults |
| --primary-ext EXT | Primary source file extension (e.g., .obj for 3D models) |
| --batch-entity MODE | File grouping mode: root, subdirectory, or hybrid (default: root) |
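
As a rough illustration of the subdirectory grouping mode, each immediate subdirectory of the input directory becomes one batch. This is a sketch with invented directory names — HDP's actual grouping logic may differ in detail:

```shell
# Sketch of --batch-entity subdirectory: one batch per immediate
# subdirectory (site names are invented).
root=$(mktemp -d)
mkdir -p "$root/siteA" "$root/siteB"
touch "$root/siteA/scan1.obj" "$root/siteA/scan2.obj" "$root/siteB/scan3.obj"
for d in "$root"/*/; do
  echo "batch $(basename "$d"): $(find "$d" -type f | wc -l) file(s)"
done
```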

Bundling Options:

| Argument | Description |
| --- | --- |
| --enable-bundling | Enable automatic file bundling (default: enabled) |
| --no-bundling | Disable automatic file bundling |
| --bundling-strategy TYPE | Strategy: smart (recommended), strict, or loose (default: smart) |

3D Model Specific Options:

| Argument | Description |
| --- | --- |
| --add-mtl | Scan for and include MTL files with OBJ files (default: enabled) |
| --no-add-mtl | Do not scan for MTL files |
| --add-textures | Scan for and include texture files (default: enabled) |
| --no-add-textures | Do not scan for texture files |
| --archive-textures | Archive texture subdirectories into ZIP files |
| --texture-paths PATH [PATH...] | Additional directories to search for texture files |

Examples:

Create a basic 3D model project:

hdp create --hdpc-path /projects/museum.hdpc \
  --project-name "Museum Artifacts 2025" \
  --short-code "MA2025" \
  --modality "3D Model" \
  --input-dir /data/3d_scans \
  --output-dir /data/processed

Create with specific file types and bundling:

hdp create --hdpc-path /projects/photos.hdpc \
  --project-name "Archaeological Photography" \
  --short-code "ArchPhoto2025" \
  --modality "Image / Photography" \
  --input-dir /data/photos \
  --output-dir /data/output \
  --extensions .jpg .png .tif \
  --batch-entity subdirectory \
  --bundling-strategy strict

Create 3D project with texture handling:

hdp create --hdpc-path /projects/sculptures.hdpc \
  --project-name "Digital Sculptures" \
  --short-code "DS2025" \
  --modality "3D Model" \
  --input-dir /data/models \
  --output-dir /data/results \
  --add-textures \
  --archive-textures \
  --texture-paths /data/textures /data/materials


inspect

Analyze and display comprehensive information about an HDPC project.

Usage:

hdp inspect --hdpc-path <PATH> [OPTIONS]

Required Arguments:

| Argument | Description |
| --- | --- |
| --hdpc-path PATH | Path to the .hdpc project file to inspect |

Optional Arguments:

| Argument | Description |
| --- | --- |
| --show-files | Display the complete file tree with bundles and processing states (verbose) |

Examples:

Basic project inspection:

hdp inspect --hdpc-path myproject.hdpc

Detailed inspection with file tree:

hdp inspect --hdpc-path myproject.hdpc --show-files

Output Sections:

- Project Information: Name, ID, version, timestamps
- Configuration: Key-value configuration pairs
- File Statistics: Total files, sizes, types
- File Status Breakdown: Counts by processing status
- File Type Breakdown: Distribution of file types
- MIME Type Breakdown: Top 10 MIME types
- Zenodo Statistics: Draft/published record counts
- Metadata Mappings: Configured mapping templates
- Batches: Batch processing information
- File Tree: Hierarchical file structure (if --show-files is enabled)


run-pipeline

Execute a pipeline directly on local files without creating Zenodo drafts first.

Usage:

hdp run-pipeline --hdpc <PROJECT_FILE> --pipeline <PIPELINE_NAME> \
  --input-dir <DIRECTORY> [OPTIONS]

Required Arguments:

| Argument | Description |
| --- | --- |
| --hdpc PATH | Path to your .hdpc project file |
| --pipeline NAME | Name identifier of the pipeline to execute |
| --input-dir PATH | Directory containing the input files to process |

Optional Arguments:

| Argument | Description |
| --- | --- |
| --extensions EXT [EXT...] | File extensions to filter by (e.g., .jpg .png) |
| --recursive | Scan for files in subdirectories |
| --sandbox | Execute in the Sandbox environment (default) |
| --production | Execute in the Production environment |

Examples:

Run pipeline on all files in directory:

hdp run-pipeline --hdpc myproject.hdpc \
  --pipeline batch-thumbnail \
  --input-dir /data/images

Run pipeline on specific file types recursively:

hdp run-pipeline --hdpc myproject.hdpc \
  --pipeline format-conversion \
  --input-dir /data/scans \
  --extensions .tif .tiff \
  --recursive

Use Cases:

- Quick batch processing without Zenodo integration
- Testing pipelines on local data
- Processing files that won't be published
- Direct file transformations and analysis


create-version

Create new versions of existing published Zenodo records with updated files.

Usage:

hdp create-version --hdpc <PROJECT_FILE> --input-dir <DIRECTORY> \
  --pipeline <PIPELINE_NAME> [OPTIONS]

Required Arguments:

| Argument | Description |
| --- | --- |
| --hdpc PATH | Path to your .hdpc project file |
| --input-dir PATH | Directory containing the new or updated source files |
| --pipeline NAME | Name of the pipeline to execute for creating the new version |

Optional Arguments:

| Argument | Description |
| --- | --- |
| --match-method METHOD | Method to match files with existing records: filename or hashcode (default: filename) |
| --sandbox | Target Sandbox environment records (default) |
| --production | Target Production environment records |

Examples:

Create versions using filename matching:

hdp create-version --hdpc myproject.hdpc \
  --input-dir /data/updated_models \
  --pipeline version-update \
  --match-method filename

Create versions using content hash matching:

hdp create-version --hdpc myproject.hdpc \
  --input-dir /data/corrected_files \
  --pipeline version-correction \
  --match-method hashcode \
  --production

Workflow:

1. Scans the input directory for files
2. Matches files to existing published records using the specified method
3. Creates a new draft version for each matched record
4. Executes the pipeline on the new versions
5. New versions remain drafts until published with hdp publish

Matching Methods:

- filename: Matches based on the original filename (faster, less precise)
- hashcode: Matches based on the file content hash (slower, more accurate)
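
Content-hash matching can pair a file with its original record even when the filename has changed, because the digest depends only on the bytes. The sketch below uses sha256 as an assumption — the documentation does not specify HDP's hash algorithm — and the paths are invented:

```shell
# Two files with different names but identical bytes produce the same
# digest, so a hashcode match would pair them (sha256 is an assumption).
printf 'mesh data v1' > /tmp/original_name.obj
cp /tmp/original_name.obj /tmp/renamed_copy.obj
a=$(sha256sum /tmp/original_name.obj | cut -d' ' -f1)
b=$(sha256sum /tmp/renamed_copy.obj | cut -d' ' -f1)
[ "$a" = "$b" ] && echo "same content: matched despite different filenames"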


Component Management Commands

component-list

List all available and installed HDP pipeline components.

Usage:

hdp component-list [OPTIONS]

Optional Arguments:

| Argument | Description |
| --- | --- |
| --details | Show detailed information for each component |
| --category CATEGORY | Filter by component category |
| --installed-only | Show only installed components |
| --available-only | Show only available (not installed) components |

Examples:

List all components:

hdp component-list

Show detailed component information:

hdp component-list --details

List only installed components:

hdp component-list --installed-only

Filter by category:

hdp component-list --category "Image Processing"


component-install

Install one or more HDP pipeline components.

Usage:

hdp component-install {--components NAME [NAME...] | --all} [OPTIONS]

Required Arguments (Mutually Exclusive):

| Argument | Description |
| --- | --- |
| --components NAME [NAME...] | Space-separated list of component names to install |
| --all | Install all available components |

Optional Arguments:

| Argument | Description |
| --- | --- |
| --skip-install-script | Skip running the component's install.py script |
| --stop-on-error | Stop the installation process if any component fails |

Examples:

Install specific components:

hdp component-install --components image-enhancer metadata-extractor

Install all available components:

hdp component-install --all

Install with error handling:

hdp component-install --components 3d-converter --stop-on-error

Skip installation scripts (dependencies only):

hdp component-install --components custom-processor --skip-install-script

Installation Process:

1. Validates the component structure
2. Creates an isolated Python virtual environment
3. Installs required dependencies
4. Runs the component-specific installation script
5. Registers the component in the database
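
The isolated environment in step 2 is presumably an ordinary Python virtual environment. A minimal sketch, with an invented location and --without-pip to keep the demo fast and offline-safe:

```shell
# Sketch of a per-component virtual environment (path is invented;
# HDP's actual layout is not documented here).
env_dir=$(mktemp -d)/component_env
python3 -m venv --without-pip "$env_dir"
# The environment gets its own interpreter, isolated from the system one.
"$env_dir/bin/python" -c 'import sys; print(sys.prefix)'
```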


component-uninstall

Uninstall one or more HDP pipeline components.

Usage:

hdp component-uninstall --components NAME [NAME...]

Required Arguments:

| Argument | Description |
| --- | --- |
| --components NAME [NAME...] | Space-separated list of component names to uninstall |

Examples:

Uninstall a single component:

hdp component-uninstall --components old-processor

Uninstall multiple components:

hdp component-uninstall --components deprecated-tool unused-filter legacy-converter

Warning: Uninstallation removes the component's virtual environment, dependencies, and registration. This action cannot be undone.


component-download

Download pipeline components from the remote Zenodo repository.

Usage:

hdp component-download [--components NAME [NAME...]] [OPTIONS]

Optional Arguments:

| Argument | Description |
| --- | --- |
| --components NAME [NAME...] | Specific component names to download; if omitted, lists available components |
| --install | Automatically install components after downloading |

Examples:

List available remote components:

hdp component-download

Download specific components:

hdp component-download --components advanced-filter custom-pipeline

Download and install automatically:

hdp component-download --components image-processor --install

Workflow:

1. Fetches the component catalog from the remote repository
2. Downloads the specified component packages (ZIP archives)
3. Extracts the components to the local directory
4. Optionally installs the components if the --install flag is used


component-info

Display comprehensive information about a specific component.

Usage:

hdp component-info <COMPONENT_NAME>

Required Arguments:

| Argument | Description |
| --- | --- |
| COMPONENT_NAME | Name of the component to inspect |

Examples:

View component details:

hdp component-info image-enhancer

View 3D processing component:

hdp component-info mesh-optimizer

Output Sections:

- Basic Information: Name, label, version, category, description
- Authors: Contributors with affiliations and ORCID identifiers
- Contact: Email and support information
- License: License type and URL
- Status: Development status and availability
- Tags/Keywords: Searchable metadata tags
- Installation Status: Whether installed, plus installation details
- Validation Status: Component validation errors (if any)
- Inputs: Required and optional input specifications (first 10)
- Outputs: Generated output specifications (first 10)
- Parameter Groups: Configurable parameters organized by category
- Requirements: Python packages, system dependencies, resource requirements
- Execution Settings: Timeout, idempotency, error handling behavior


component-search

Search for components by name, description, or keywords.

Usage:

hdp component-search <QUERY> [OPTIONS]

Required Arguments:

| Argument | Description |
| --- | --- |
| QUERY | Search query string (searches name, label, description, tags) |

Optional Arguments:

| Argument | Description |
| --- | --- |
| --category CATEGORY | Filter results by component category |

Examples:

Search for image processing components:

hdp component-search "image processing"

Search with category filter:

hdp component-search "3d" --category "Mesh Processing"

Search for metadata tools:

hdp component-search metadata

Search Behavior:

- Case-insensitive matching
- Searches across name, label, description, and tags
- Returns both installed and available components
- Displays installation status for each result
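
The case-insensitive matching described above behaves like grep -i over a catalog listing. The component lines below are a made-up catalog for illustration, not real output:

```shell
# Case-insensitive search over an invented two-line component listing:
# an upper-case query still finds the lower-case name.
printf '%s\n' \
  'image-enhancer   Image Processing   installed' \
  'mesh-optimizer   Mesh Processing    available' \
  | grep -i 'IMAGE'
```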


component-update

Check for and optionally install updates for installed components.

Usage:

hdp component-update [OPTIONS]

Optional Arguments:

| Argument | Description |
| --- | --- |
| --install-updates | Automatically install available updates |

Examples:

Check for updates:

hdp component-update

Check and install updates automatically:

hdp component-update --install-updates

Update Process:

1. Compares local component versions with the remote repository
2. Lists components with available updates
3. Displays version changes and release notes
4. Optionally downloads and installs updates

Output:

- Current version → latest version for each component
- Zenodo record URL for release notes
- Success/failure status for each update
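
The version comparison in step 1 of the update process can be sketched with GNU sort -V, which orders dotted version strings numerically. The version numbers below are invented, and this is an analogy rather than HDP's actual comparison code:

```shell
# sort -V treats 1.10.0 as newer than 1.2.0 (plain string sort would not).
local_ver="1.2.0"
remote_ver="1.10.0"
latest=$(printf '%s\n%s\n' "$local_ver" "$remote_ver" | sort -V | tail -n 1)
if [ "$latest" != "$local_ver" ]; then
  echo "update available: $local_ver -> $remote_ver"
else
  echo "up to date"
fi
```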


Server Management

hdp-server

Start the HDP Flask backend server independently (separate from CLI commands).

Usage:

hdp-server [OPTIONS]

Optional Arguments:

| Argument | Description |
| --- | --- |
| --host HOST | Server host address (default: 127.0.0.1) |
| --port PORT | Server port number (default: 5001) |
| --config FILE | Path to configuration file (default: config.yaml) |
| --alpha-features | Enable experimental alpha features |

Examples:

Start server with defaults:

hdp-server

Start on custom port:

hdp-server --port 8080

Start with specific configuration and alpha features:

hdp-server --config /path/to/custom-config.yaml --alpha-features

Notes:

- The CLI automatically starts the server when needed unless --no-auto-server is used
- Manual server startup is useful for:
  - Long-running server sessions
  - Development and debugging
  - Custom server configurations
  - Multiple concurrent CLI operations
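
Before opting into --no-auto-server, you can check whether a backend is already listening. This sketch uses bash's /dev/tcp pseudo-device with the documented default host and port; it only reports reachability and is not an HDP feature:

```shell
# Quick reachability probe for the backend (bash /dev/tcp sketch).
host=127.0.0.1 port=5001
if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
  echo "backend reachable on $host:$port"
else
  echo "no backend on $host:$port - start hdp-server or omit --no-auto-server"
fi
```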


Workflows and Examples

Complete Upload-to-Publish Workflow

Step 1: Create a new project

hdp create --hdpc-path /projects/artifacts.hdpc \
  --project-name "Museum Artifacts 2025" \
  --short-code "MA2025" \
  --modality "3D Model" \
  --input-dir /data/3d_scans \
  --output-dir /data/processed

Step 2: Upload files and create drafts

hdp upload --hdpc /projects/artifacts.hdpc \
  --input-dir /data/3d_scans \
  --extensions .obj .mtl .png \
  --recursive \
  --sandbox

Step 3: Execute processing pipeline

hdp process --hdpc /projects/artifacts.hdpc \
  --pipeline mesh-optimization \
  --sandbox

Step 4: Inspect project status

hdp inspect --hdpc-path /projects/artifacts.hdpc

Step 5: Publish ready drafts

hdp publish --all --hdpc /projects/artifacts.hdpc --sandbox

Component Installation and Usage Workflow

Step 1: Search for components

hdp component-search "image enhancement"

Step 2: View component details

hdp component-info image-enhancer

Step 3: Install component

hdp component-install --components image-enhancer

Step 4: Use component in pipeline

hdp process --hdpc myproject.hdpc --pipeline image-enhancement

Versioning Workflow

Step 1: Update local files

# User updates files in /data/updated_files/

Step 2: Create new versions

hdp create-version --hdpc myproject.hdpc \
  --input-dir /data/updated_files \
  --pipeline version-update \
  --match-method filename \
  --production

Step 3: Review drafts

hdp inspect --hdpc-path myproject.hdpc

Step 4: Publish new versions

hdp publish --all --hdpc myproject.hdpc --production

Batch Processing Workflow

Direct pipeline execution on local files:

hdp run-pipeline --hdpc myproject.hdpc \
  --pipeline thumbnail-generator \
  --input-dir /data/raw_images \
  --extensions .jpg .png \
  --recursive


Environment Variables

The following environment variables can be used to configure HDP behavior:

| Variable | Description | Default |
| --- | --- | --- |
| HDP_SERVER_HOST | Default server host | 127.0.0.1 |
| HDP_SERVER_PORT | Default server port | 5001 |
| HDP_CONFIG_PATH | Default configuration file path | config.yaml |
| HDP_LOG_LEVEL | Logging level (DEBUG, INFO, WARNING, ERROR) | INFO |

Example:

export HDP_SERVER_PORT=8080
export HDP_LOG_LEVEL=DEBUG
hdp upload --hdpc project.hdpc --input-dir /data/files


Exit Codes

The CLI uses standard exit codes to indicate command execution status:

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | General error (API failure, invalid arguments, processing error) |
| 130 | User interruption (Ctrl+C) |
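
A wrapper script can branch on these codes. In the sketch below, `true` and `false` stand in for successful and failing hdp invocations, since the exit codes — not the commands — are the point:

```shell
# Interpret the documented exit codes; `false` stands in for a failing
# hdp command (exit code 1), `true` for a successful one.
report() {
  "$@"
  status=$?
  case $status in
    0)   echo "success" ;;
    130) echo "interrupted (Ctrl+C)" ;;
    *)   echo "error: exit code $status" ;;
  esac
}
report true
report false
```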

Tips and Best Practices

1. Always use Sandbox first
   - Test workflows in the Sandbox environment before publishing to Production
   - Sandbox allows unlimited testing without affecting permanent records

2. Use meaningful project names and short codes
   - Helps identify projects in logs and databases
   - Short codes should be unique and descriptive

3. Filter processing strategically
   - Use --search, --title-pattern, --since, and --until to process specific subsets
   - Reduces processing time and resource usage

4. Inspect projects regularly
   - Run hdp inspect periodically to monitor project status
   - Use --show-files to verify file structure and bundling

5. Keep components updated
   - Regularly check for component updates with hdp component-update
   - Updates bring bug fixes and new features

6. Choose the right batch workflow
   - Use run-pipeline for quick batch processing without Zenodo integration
   - Use upload → process → publish for the full archival workflow

7. Handle errors in scripts
   - Check exit codes: if [ $? -ne 0 ]; then echo "Error occurred"; fi
   - Review logs for detailed error information


Troubleshooting

Server connection errors:

# Check if server is running
hdp inspect --hdpc-path test.hdpc

# Start server manually if needed
hdp-server --port 5001

# Use --no-auto-server if managing server separately
hdp upload --hdpc project.hdpc --input-dir /data --no-auto-server

Component installation failures:

# Re-run the installation and review its output
hdp component-install --components problematic-component

# Skip installation script if it's causing issues
hdp component-install --components component-name --skip-install-script

File matching issues in versioning:

# Try the alternative matching method
hdp create-version --hdpc project.hdpc \
  --input-dir /data/files \
  --pipeline update \
  --match-method hashcode  # Instead of filename


Version: 0.1.0-alpha.4 Last Updated: October 14, 2025