# Heritage Data Processor - Command-Line Interface Documentation

## Overview
The Heritage Data Processor (HDP) provides a comprehensive command-line interface for managing cultural heritage data workflows, including file processing, Zenodo integration, pipeline execution, and component management.
All commands start with `hdp` (e.g., `hdp upload`, `hdp process`). The CLI automatically manages the backend server, starting it when needed unless this is explicitly disabled with `--no-auto-server`.
## Table of Contents

- Heritage Data Processor - Command-Line Interface Documentation
  - Overview
  - Table of Contents
  - Global Options
  - Core Workflow Commands
  - Component Management Commands
  - Server Management
  - Workflows and Examples
  - Environment Variables
  - Exit Codes
  - Tips and Best Practices
  - Troubleshooting
## Global Options

These options can be used with any command:

| Option | Default | Description |
|---|---|---|
| `--server-host HOST` | `127.0.0.1` | Server host address |
| `--server-port PORT` | `5001` | Server port number |
| `--server-config FILE` | `config.yaml` | Path to server configuration file |
| `--no-auto-server` | `False` | Disable automatic server startup (requires manual `hdp-server` execution) |
Example:
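For instance, any command can be pointed at a non-default server (the project file and directory here are illustrative):

```bash
# Run an upload against a backend server on a custom port
hdp upload --hdpc project.hdpc --input-dir /data/files \
  --server-host 127.0.0.1 --server-port 8080
```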
## Core Workflow Commands

### upload
Add files to your project, prepare metadata, and create Zenodo drafts.
Usage:
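Based on the arguments documented below, the general form is:

```bash
hdp upload --hdpc <PROJECT_FILE> --input-dir <DIRECTORY> [OPTIONS]
```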
Required Arguments:

| Argument | Description |
|---|---|
| `--hdpc PATH` | Path to your `.hdpc` project file |
| `--input-dir PATH` | Directory containing files to upload |
Optional Arguments:

| Argument | Description |
|---|---|
| `--extensions EXT [EXT...]` | File extensions to include (e.g., `.jpg .png .obj`) |
| `--recursive` | Scan subdirectories recursively |
| `--sandbox` | Use Zenodo Sandbox environment (default) |
| `--production` | Use Zenodo Production environment |
Examples:
Upload all supported files from a directory:
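A minimal sketch (the project file and path are illustrative):

```bash
hdp upload --hdpc myproject.hdpc --input-dir /data/photos
```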
Upload specific file types recursively:
```bash
hdp upload --hdpc myproject.hdpc --input-dir /data/3d_models \
  --extensions .obj .mtl .png --recursive
```
Upload to production environment:
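Using the `--production` flag documented above (paths are illustrative):

```bash
hdp upload --hdpc myproject.hdpc --input-dir /data/final_models --production
```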
Workflow:

1. Loads the specified HDPC project
2. Scans the input directory for matching files
3. Adds files to the project database
4. Prepares metadata for each file
5. Creates Zenodo drafts for all prepared records
### process
Execute a pipeline on existing Zenodo drafts with optional filtering.
Usage:
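Based on the arguments documented below, the general form is:

```bash
hdp process --hdpc <PROJECT_FILE> --pipeline <PIPELINE_NAME> [OPTIONS]
```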
Required Arguments:

| Argument | Description |
|---|---|
| `--hdpc PATH` | Path to your `.hdpc` project file |
| `--pipeline NAME` | Name identifier of the pipeline to execute |
Environment Selection:

| Argument | Description |
|---|---|
| `--sandbox` | Target Sandbox environment drafts (default) |
| `--production` | Target Production environment drafts |
Filtering Options:

| Argument | Description |
|---|---|
| `--search TERM` | General search term to filter records by title or filename |
| `--title-pattern PATTERN` | Pattern to match record titles (use `%` as wildcard) |
| `--since DATE` | Filter records created on or after this date (YYYY-MM-DD) |
| `--until DATE` | Filter records created on or before this date (YYYY-MM-DD) |
Examples:
Process all drafts with a pipeline:
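A sketch, reusing the `3d-optimization` pipeline name from the examples below:

```bash
hdp process --hdpc myproject.hdpc --pipeline 3d-optimization
```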
Process only recent drafts:
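A sketch combining the `--since` filter with an illustrative date:

```bash
hdp process --hdpc myproject.hdpc --pipeline 3d-optimization \
  --since 2025-01-01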
Process drafts matching a title pattern:
```bash
hdp process --hdpc myproject.hdpc --pipeline 3d-optimization \
  --title-pattern "Artifact%" --sandbox
```
Process specific records by search term:
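A sketch using the `--search` filter (the pipeline name and search term are illustrative):

```bash
hdp process --hdpc myproject.hdpc --pipeline metadata-enrichment \
  --search "ceramic"
```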
### publish
Publish ready Zenodo drafts to make them publicly accessible.
Usage:
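Based on the arguments documented below, the general form is:

```bash
hdp publish --hdpc <PROJECT_FILE> (--all | --record-ids ID [ID...]) [OPTIONS]
```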
Required Arguments (Mutually Exclusive):

| Argument | Description |
|---|---|
| `--all` | Publish ALL ready drafts (all files uploaded) |
| `--record-ids ID [ID...]` | Space-separated list of specific local record database IDs to publish |
Other Required:

| Argument | Description |
|---|---|
| `--hdpc PATH` | Path to your `.hdpc` project file |
Environment Selection:

| Argument | Description |
|---|---|
| `--sandbox` | Target Sandbox environment (default) |
| `--production` | Target Production environment |
Examples:
Publish all ready drafts:
Publish specific records:
Publish to production:
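The three invocations above might look like this (the record IDs are illustrative):

```bash
# Publish all ready drafts
hdp publish --hdpc myproject.hdpc --all

# Publish specific records
hdp publish --hdpc myproject.hdpc --record-ids 12 15 18

# Publish to production
hdp publish --hdpc myproject.hdpc --all --production
```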
Safety Features:

- Prompts for confirmation before publishing
- Only publishes drafts with all files successfully uploaded
- Provides a detailed success/failure report for each record
### create
Initialize a new HDPC project with automatic file scanning and configuration.
Usage:
```bash
hdp create --hdpc-path <PATH> --project-name <NAME> --short-code <CODE> \
  --modality <TYPE> --input-dir <DIR> --output-dir <DIR> [OPTIONS]
```
Required Arguments:

| Argument | Description |
|---|---|
| `--hdpc-path PATH` | Path where the `.hdpc` project file will be created |
| `--project-name NAME` | Descriptive name for the project |
| `--short-code CODE` | Short unique code for the project (e.g., `MyProject2025`) |
| `--modality TYPE` | Primary data modality (see choices below) |
| `--input-dir PATH` | Directory containing source data files to scan |
| `--output-dir PATH` | Directory where processed outputs will be stored |
Modality Choices:
- 3D Model
- Image / Photography
- Audio
- Video
- Text / Document
File Scanning Options:

| Argument | Description |
|---|---|
| `--extensions EXT [EXT...]` | File extensions to scan (e.g., `.obj .mtl .png`). If omitted, uses modality defaults |
| `--primary-ext EXT` | Primary source file extension (e.g., `.obj` for 3D models) |
| `--batch-entity MODE` | File grouping mode: `root`, `subdirectory`, or `hybrid` (default: `root`) |
Bundling Options:

| Argument | Description |
|---|---|
| `--enable-bundling` | Enable automatic file bundling (default: enabled) |
| `--no-bundling` | Disable automatic file bundling |
| `--bundling-strategy TYPE` | Strategy: `smart` (recommended), `strict`, or `loose` (default: `smart`) |
3D Model Specific Options:

| Argument | Description |
|---|---|
| `--add-mtl` | Scan for and include MTL files with OBJ files (default: enabled) |
| `--no-add-mtl` | Do not scan for MTL files |
| `--add-textures` | Scan for and include texture files (default: enabled) |
| `--no-add-textures` | Do not scan for texture files |
| `--archive-textures` | Archive texture subdirectories into ZIP files |
| `--texture-paths PATH [PATH...]` | Additional directories to search for texture files |
Examples:
Create a basic 3D model project:
```bash
hdp create --hdpc-path /projects/museum.hdpc \
  --project-name "Museum Artifacts 2025" \
  --short-code "MA2025" \
  --modality "3D Model" \
  --input-dir /data/3d_scans \
  --output-dir /data/processed
```
Create with specific file types and bundling:
```bash
hdp create --hdpc-path /projects/photos.hdpc \
  --project-name "Archaeological Photography" \
  --short-code "ArchPhoto2025" \
  --modality "Image / Photography" \
  --input-dir /data/photos \
  --output-dir /data/output \
  --extensions .jpg .png .tif \
  --batch-entity subdirectory \
  --bundling-strategy strict
```
Create 3D project with texture handling:
```bash
hdp create --hdpc-path /projects/sculptures.hdpc \
  --project-name "Digital Sculptures" \
  --short-code "DS2025" \
  --modality "3D Model" \
  --input-dir /data/models \
  --output-dir /data/results \
  --add-textures \
  --archive-textures \
  --texture-paths /data/textures /data/materials
```
### inspect
Analyze and display comprehensive information about an HDPC project.
Usage:
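Based on the arguments documented below, the general form is:

```bash
hdp inspect --hdpc-path <PROJECT_FILE> [--show-files]
```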
Required Arguments:

| Argument | Description |
|---|---|
| `--hdpc-path PATH` | Path to the `.hdpc` project file to inspect |
Optional Arguments:

| Argument | Description |
|---|---|
| `--show-files` | Display complete file tree with bundles and processing states (verbose) |
Examples:
Basic project inspection:
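A minimal sketch (the project file is illustrative):

```bash
hdp inspect --hdpc-path myproject.hdpc
```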
Detailed inspection with file tree:
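The same inspection with the verbose file tree enabled:

```bash
hdp inspect --hdpc-path myproject.hdpc --show-files
```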
Output Sections:
- Project Information: Name, ID, version, timestamps
- Configuration: Key-value configuration pairs
- File Statistics: Total files, sizes, types
- File Status Breakdown: Counts by processing status
- File Type Breakdown: Distribution of file types
- MIME Type Breakdown: Top 10 MIME types
- Zenodo Statistics: Draft/published record counts
- Metadata Mappings: Configured mapping templates
- Batches: Batch processing information
- File Tree: Hierarchical file structure (if `--show-files` enabled)
### run-pipeline
Execute a pipeline directly on local files without creating Zenodo drafts first.
Usage:
```bash
hdp run-pipeline --hdpc <PROJECT_FILE> --pipeline <PIPELINE_NAME> \
  --input-dir <DIRECTORY> [OPTIONS]
```
Required Arguments:

| Argument | Description |
|---|---|
| `--hdpc PATH` | Path to your `.hdpc` project file |
| `--pipeline NAME` | Name identifier of the pipeline to execute |
| `--input-dir PATH` | Directory containing the input files to process |
Optional Arguments:

| Argument | Description |
|---|---|
| `--extensions EXT [EXT...]` | File extensions to filter by (e.g., `.jpg .png`) |
| `--recursive` | Scan for files in subdirectories |
| `--sandbox` | Execute in Sandbox environment (default) |
| `--production` | Execute in Production environment |
Examples:
Run pipeline on all files in directory:
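A sketch, reusing the `thumbnail-generator` pipeline name from the Batch Processing Workflow (paths are illustrative):

```bash
hdp run-pipeline --hdpc myproject.hdpc \
  --pipeline thumbnail-generator \
  --input-dir /data/images
```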
Run pipeline on specific file types recursively:
```bash
hdp run-pipeline --hdpc myproject.hdpc \
  --pipeline format-conversion \
  --input-dir /data/scans \
  --extensions .tif .tiff \
  --recursive
```
Use Cases:

- Quick batch processing without Zenodo integration
- Testing pipelines on local data
- Processing files that won't be published
- Direct file transformations and analysis
### create-version
Create new versions of existing published Zenodo records with updated files.
Usage:
```bash
hdp create-version --hdpc <PROJECT_FILE> --input-dir <DIRECTORY> \
  --pipeline <PIPELINE_NAME> [OPTIONS]
```
Required Arguments:

| Argument | Description |
|---|---|
| `--hdpc PATH` | Path to your `.hdpc` project file |
| `--input-dir PATH` | Directory containing the new or updated source files |
| `--pipeline NAME` | Name of the pipeline to execute for creating the new version |
Optional Arguments:

| Argument | Description |
|---|---|
| `--match-method METHOD` | Method to match files with existing records: `filename` or `hashcode` (default: `filename`) |
| `--sandbox` | Target Sandbox environment records (default) |
| `--production` | Target Production environment records |
Examples:
Create versions using filename matching:
```bash
hdp create-version --hdpc myproject.hdpc \
  --input-dir /data/updated_models \
  --pipeline version-update \
  --match-method filename
```
Create versions using content hash matching:
```bash
hdp create-version --hdpc myproject.hdpc \
  --input-dir /data/corrected_files \
  --pipeline version-correction \
  --match-method hashcode \
  --production
```
Workflow:

1. Scans input directory for files
2. Matches files to existing published records using the specified method
3. Creates new draft versions for each matched record
4. Executes pipeline on the new versions
5. New versions remain as drafts until published with `hdp publish`
Matching Methods:

- `filename`: Matches based on original filename (faster, less precise)
- `hashcode`: Matches based on file content hash (slower, more accurate)
## Component Management Commands

### component-list
List all available and installed HDP pipeline components.
Usage:
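Based on the arguments documented below, the general form is:

```bash
hdp component-list [OPTIONS]
```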
Optional Arguments:

| Argument | Description |
|---|---|
| `--details` | Show detailed information for each component |
| `--category CATEGORY` | Filter by component category |
| `--installed-only` | Show only installed components |
| `--available-only` | Show only available (not installed) components |
Examples:
List all components:
Show detailed component information:
List only installed components:
Filter by category:
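The invocations above might look like this (the category name is illustrative):

```bash
# List all components
hdp component-list

# Show detailed component information
hdp component-list --details

# List only installed components
hdp component-list --installed-only

# Filter by category
hdp component-list --category "Image Processing"
```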
### component-install
Install one or more HDP pipeline components.
Usage:
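Based on the arguments documented below, the general form is:

```bash
hdp component-install (--components NAME [NAME...] | --all) [OPTIONS]
```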
Required Arguments (Mutually Exclusive):

| Argument | Description |
|---|---|
| `--components NAME [NAME...]` | Space-separated list of component names to install |
| `--all` | Install all available components |
Optional Arguments:

| Argument | Description |
|---|---|
| `--skip-install-script` | Skip running the component's `install.py` script |
| `--stop-on-error` | Stop the installation process if any component fails |
Examples:
Install specific components:
Install all available components:
Install with error handling:
Skip installation scripts (dependencies only):
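The invocations above might look like this (component names are illustrative):

```bash
# Install specific components
hdp component-install --components image-converter metadata-extractor

# Install all available components
hdp component-install --all

# Stop at the first failure
hdp component-install --all --stop-on-error

# Skip installation scripts (dependencies only)
hdp component-install --components image-converter --skip-install-script
```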
Installation Process:

1. Validates component structure
2. Creates isolated Python virtual environment
3. Installs required dependencies
4. Runs component-specific installation script
5. Registers component in database
### component-uninstall
Uninstall one or more HDP pipeline components.
Usage:
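Based on the arguments documented below, the general form is:

```bash
hdp component-uninstall --components NAME [NAME...]
```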
Required Arguments:

| Argument | Description |
|---|---|
| `--components NAME [NAME...]` | Space-separated list of component names to uninstall |
Examples:
Uninstall a single component:
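A sketch (the component name is illustrative):

```bash
hdp component-uninstall --components image-converter
```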
Uninstall multiple components:
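The same command accepts a space-separated list (component names are illustrative):

```bash
hdp component-uninstall --components image-converter metadata-extractor
```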
Warning: Uninstallation removes the component's virtual environment, dependencies, and registration. This action cannot be undone.
### component-download
Download pipeline components from the remote Zenodo repository.
Usage:
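Based on the arguments documented below, the general form is:

```bash
hdp component-download [--components NAME [NAME...]] [--install]
```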
Optional Arguments:

| Argument | Description |
|---|---|
| `--components NAME [NAME...]` | Specific component names to download. If omitted, lists available components |
| `--install` | Automatically install components after downloading |
Examples:
List available remote components:
Download specific components:
Download and install automatically:
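The invocations above might look like this (component names are illustrative):

```bash
# List available remote components
hdp component-download

# Download specific components
hdp component-download --components image-converter metadata-extractor

# Download and install automatically
hdp component-download --components image-converter --install
```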
Workflow:

1. Fetches component catalog from remote repository
2. Downloads specified component packages (ZIP archives)
3. Extracts components to local directory
4. Optionally installs components if the `--install` flag is used
### component-info
Display comprehensive information about a specific component.
Usage:
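Based on the arguments documented below, the general form is:

```bash
hdp component-info COMPONENT_NAME
```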
Required Arguments:

| Argument | Description |
|---|---|
| `COMPONENT_NAME` | Name of the component to inspect |
Examples:
View component details:
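A sketch (the component name is illustrative):

```bash
hdp component-info metadata-extractor
```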
View 3D processing component:
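Likewise for a 3D processing component (the name is illustrative):

```bash
hdp component-info mesh-decimator
```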
Output Sections:

- Basic Information: Name, label, version, category, description
- Authors: Contributors with affiliations and ORCID identifiers
- Contact: Email and support information
- License: License type and URL
- Status: Development status and availability
- Tags/Keywords: Searchable metadata tags
- Installation Status: Whether installed and installation details
- Validation Status: Component validation errors (if any)
- Inputs: Required and optional input specifications (first 10)
- Outputs: Generated output specifications (first 10)
- Parameter Groups: Configurable parameters organized by category
- Requirements: Python packages, system dependencies, resource requirements
- Execution Settings: Timeout, idempotency, error handling behavior
### component-search
Search for components by name, description, or keywords.
Usage:
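Based on the arguments documented below, the general form is:

```bash
hdp component-search QUERY [--category CATEGORY]
```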
Required Arguments:

| Argument | Description |
|---|---|
| `QUERY` | Search query string (searches name, label, description, tags) |
Optional Arguments:

| Argument | Description |
|---|---|
| `--category CATEGORY` | Filter results by component category |
Examples:
Search for image processing components:
Search with category filter:
Search for metadata tools:
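The invocations above might look like this (query strings and category are illustrative):

```bash
# Search for image processing components
hdp component-search "image"

# Search with a category filter
hdp component-search "conversion" --category "Image Processing"

# Search for metadata tools
hdp component-search "metadata"
```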
Search Behavior:

- Case-insensitive matching
- Searches across name, label, description, and tags
- Returns both installed and available components
- Displays installation status for each result
### component-update
Check for and optionally install updates for installed components.
Usage:
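Based on the arguments documented below, the general form is:

```bash
hdp component-update [--install-updates]
```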
Optional Arguments:

| Argument | Description |
|---|---|
| `--install-updates` | Automatically install available updates |
Examples:
Check for updates:
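A minimal sketch:

```bash
hdp component-update
```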
Check and install updates automatically:
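The same check with automatic installation, using the flag documented above:

```bash
hdp component-update --install-updates
```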
Update Process:

1. Compares local component versions with the remote repository
2. Lists components with available updates
3. Displays version changes and release notes
4. Optionally downloads and installs updates

Output:

- Current version → latest version for each component
- Zenodo record URL for release notes
- Success/failure status for each update
## Server Management

### hdp-server
Start the HDP Flask backend server independently (separate from CLI commands).
Usage:
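Based on the arguments documented below, the general form is:

```bash
hdp-server [--host HOST] [--port PORT] [--config FILE] [--alpha-features]
```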
Optional Arguments:

| Argument | Description |
|---|---|
| `--host HOST` | Server host address (default: `127.0.0.1`) |
| `--port PORT` | Server port number (default: `5001`) |
| `--config FILE` | Path to configuration file (default: `config.yaml`) |
| `--alpha-features` | Enable experimental alpha features |
Examples:
Start server with defaults:
Start on custom port:
Start with specific configuration and alpha features:
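The invocations above might look like this (the config filename is illustrative):

```bash
# Start server with defaults
hdp-server

# Start on a custom port
hdp-server --port 8080

# Start with a specific configuration and alpha features
hdp-server --config custom-config.yaml --alpha-features
```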
Notes:

- The CLI automatically starts the server when needed unless `--no-auto-server` is used
- Manual server startup is useful for:
  - Long-running server sessions
  - Development and debugging
  - Custom server configurations
  - Multiple concurrent CLI operations
## Workflows and Examples

### Complete Upload-to-Publish Workflow
Step 1: Create a new project
```bash
hdp create --hdpc-path /projects/artifacts.hdpc \
  --project-name "Museum Artifacts 2025" \
  --short-code "MA2025" \
  --modality "3D Model" \
  --input-dir /data/3d_scans \
  --output-dir /data/processed
```
Step 2: Upload files and create drafts
```bash
hdp upload --hdpc /projects/artifacts.hdpc \
  --input-dir /data/3d_scans \
  --extensions .obj .mtl .png \
  --recursive \
  --sandbox
```
Step 3: Execute processing pipeline
Step 4: Inspect project status
Step 5: Publish ready drafts
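Steps 3 through 5 above might look like this (the pipeline name is illustrative):

```bash
# Step 3: Execute processing pipeline
hdp process --hdpc /projects/artifacts.hdpc --pipeline 3d-optimization --sandbox

# Step 4: Inspect project status
hdp inspect --hdpc-path /projects/artifacts.hdpc --show-files

# Step 5: Publish ready drafts
hdp publish --hdpc /projects/artifacts.hdpc --all --sandbox
```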
### Component Installation and Usage Workflow
Step 1: Search for components
Step 2: View component details
Step 3: Install component
Step 4: Use component in pipeline
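The four steps above might look like this (the component and pipeline names are illustrative):

```bash
# Step 1: Search for components
hdp component-search "optimization"

# Step 2: View component details
hdp component-info mesh-optimizer

# Step 3: Install the component
hdp component-install --components mesh-optimizer

# Step 4: Use the component in a pipeline
hdp process --hdpc myproject.hdpc --pipeline mesh-optimizer --sandbox
```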
### Versioning Workflow
Step 1: Update local files
Step 2: Create new versions
```bash
hdp create-version --hdpc myproject.hdpc \
  --input-dir /data/updated_files \
  --pipeline version-update \
  --match-method filename \
  --production
```
Step 3: Review drafts
Step 4: Publish new versions
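Steps 3 and 4 above might look like this (the project file is illustrative):

```bash
# Step 3: Review the new drafts
hdp inspect --hdpc-path myproject.hdpc

# Step 4: Publish the new versions
hdp publish --hdpc myproject.hdpc --all --production
```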
### Batch Processing Workflow
Direct pipeline execution on local files:
```bash
hdp run-pipeline --hdpc myproject.hdpc \
  --pipeline thumbnail-generator \
  --input-dir /data/raw_images \
  --extensions .jpg .png \
  --recursive
```
## Environment Variables
The following environment variables can be used to configure HDP behavior:
| Variable | Description | Default |
|---|---|---|
| `HDP_SERVER_HOST` | Default server host | `127.0.0.1` |
| `HDP_SERVER_PORT` | Default server port | `5001` |
| `HDP_CONFIG_PATH` | Default configuration file path | `config.yaml` |
| `HDP_LOG_LEVEL` | Logging level (DEBUG, INFO, WARNING, ERROR) | `INFO` |
Example:
```bash
export HDP_SERVER_PORT=8080
export HDP_LOG_LEVEL=DEBUG
hdp upload --hdpc project.hdpc --input-dir /data/files
```
## Exit Codes
The CLI uses standard exit codes to indicate command execution status:
| Code | Meaning |
|---|---|
| `0` | Success |
| `1` | General error (API failure, invalid arguments, processing error) |
| `130` | User interruption (Ctrl+C) |
## Tips and Best Practices

1. Always use Sandbox first
   - Test workflows in the Sandbox environment before publishing to Production
   - Sandbox allows unlimited testing without affecting permanent records
2. Use meaningful project names and short codes
   - Helps identify projects in logs and databases
   - Short codes should be unique and descriptive
3. Filter processing strategically
   - Use `--search`, `--title-pattern`, `--since`, and `--until` to process specific subsets
   - Reduces processing time and resource usage
4. Regular project inspection
   - Run `hdp inspect` periodically to monitor project status
   - Use `--show-files` to verify file structure and bundling
5. Component updates
   - Regularly check for component updates with `hdp component-update`
   - Keep components updated for bug fixes and new features
6. Batch operations
   - Use `run-pipeline` for quick batch processing without Zenodo integration
   - Use `upload` → `process` → `publish` for the full archival workflow
7. Error handling
   - Check exit codes in scripts: `if [ $? -ne 0 ]; then echo "Error occurred"; fi`
   - Review logs for detailed error information
## Troubleshooting
Server connection errors:
```bash
# Check if the server is running
hdp inspect --hdpc-path test.hdpc

# Start the server manually if needed
hdp-server --port 5001

# Use --no-auto-server if managing the server separately
hdp upload --hdpc project.hdpc --input-dir /data --no-auto-server
```
Component installation failures:
```bash
# Install with verbose output
hdp component-install --components problematic-component

# Skip the installation script if it's causing issues
hdp component-install --components component-name --skip-install-script
```
File matching issues in versioning:

```bash
# Try alternative matching method
hdp create-version --hdpc project.hdpc \
  --input-dir /data/files \
  --pipeline update \
  --match-method hashcode  # Instead of filename
```
Version: 0.1.0-alpha.4
Last Updated: October 14, 2025