# Heritage Data Processor - Command-Line Interface Documentation

## Overview
The Heritage Data Processor (HDP) provides a comprehensive command-line interface for managing cultural heritage data workflows, including file processing, Zenodo integration, pipeline execution, and component management.
All commands start with `hdp` (e.g., `hdp upload`, `hdp process`). The CLI automatically manages the backend server, starting it when needed unless this is explicitly disabled with `--no-auto-server`.
## Table of Contents

- Heritage Data Processor - Command-Line Interface Documentation
  - Overview
  - Table of Contents
  - Global Options
  - Core Workflow Commands
  - Component Management Commands
  - Server Management
  - Workflows and Examples
  - Environment Variables
  - Exit Codes
  - Tips and Best Practices
  - Troubleshooting
## Global Options

These options can be used with any command:

| Option | Default | Description |
|---|---|---|
| `--server-host HOST` | `127.0.0.1` | Server host address |
| `--server-port PORT` | `5001` | Server port number |
| `--server-config FILE` | `config.yaml` | Path to server configuration file |
| `--no-auto-server` | `False` | Disable automatic server startup (requires manual `hdp-server` execution) |
Example:
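For instance, any command can be pointed at a non-default server (the project file and directory here are illustrative):

```bash
# Run an upload against a backend server on a custom port
hdp upload --hdpc project.hdpc --input-dir /data/files \
  --server-host 127.0.0.1 --server-port 8080
```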
## Core Workflow Commands

### upload
Add files to your project, prepare metadata, and create Zenodo drafts.
Usage:
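Based on the arguments documented below, the general form is:

```bash
hdp upload --hdpc <PROJECT_FILE> --input-dir <DIRECTORY> [OPTIONS]
```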
Required Arguments:

| Argument | Description |
|---|---|
| `--hdpc PATH` | Path to your `.hdpc` project file |
| `--input-dir PATH` | Directory containing files to upload |
Optional Arguments:

| Argument | Description |
|---|---|
| `--extensions EXT [EXT...]` | File extensions to include (e.g., `.jpg .png .obj`) |
| `--recursive` | Scan subdirectories recursively |
| `--sandbox` | Use Zenodo Sandbox environment (default) |
| `--production` | Use Zenodo Production environment |
Examples:
Upload all supported files from a directory:
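A minimal sketch (the project file and path are illustrative):

```bash
hdp upload --hdpc myproject.hdpc --input-dir /data/photos
```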
Upload specific file types recursively:
```bash
hdp upload --hdpc myproject.hdpc --input-dir /data/3d_models \
  --extensions .obj .mtl .png --recursive
```
Upload to production environment:
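Using the `--production` flag documented above (paths are illustrative):

```bash
hdp upload --hdpc myproject.hdpc --input-dir /data/final_models --production
```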
Workflow:

1. Loads the specified HDPC project
2. Scans the input directory for matching files
3. Adds files to the project database
4. Prepares metadata for each file
5. Creates Zenodo drafts for all prepared records
### process
Execute a pipeline on existing Zenodo drafts with optional filtering.
Usage:
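Based on the arguments documented below, the general form is:

```bash
hdp process --hdpc <PROJECT_FILE> --pipeline <PIPELINE_NAME> [OPTIONS]
```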
Required Arguments:

| Argument | Description |
|---|---|
| `--hdpc PATH` | Path to your `.hdpc` project file |
| `--pipeline NAME` | Name identifier of the pipeline to execute |
Environment Selection:

| Argument | Description |
|---|---|
| `--sandbox` | Target Sandbox environment drafts (default) |
| `--production` | Target Production environment drafts |
Filtering Options:

| Argument | Description |
|---|---|
| `--search TERM` | General search term to filter records by title or filename |
| `--title-pattern PATTERN` | Pattern to match record titles (use `%` as wildcard) |
| `--since DATE` | Filter records created on or after this date (YYYY-MM-DD) |
| `--until DATE` | Filter records created on or before this date (YYYY-MM-DD) |
Examples:
Process all drafts with a pipeline:
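A sketch, reusing the `3d-optimization` pipeline name from the examples below:

```bash
hdp process --hdpc myproject.hdpc --pipeline 3d-optimization
```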
Process only recent drafts:
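A sketch combining the `--since` filter with an illustrative date:

```bash
hdp process --hdpc myproject.hdpc --pipeline 3d-optimization \
  --since 2025-01-01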
Process drafts matching a title pattern:
```bash
hdp process --hdpc myproject.hdpc --pipeline 3d-optimization \
  --title-pattern "Artifact%" --sandbox
```
Process specific records by search term:
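A sketch using the `--search` filter (the pipeline name and search term are illustrative):

```bash
hdp process --hdpc myproject.hdpc --pipeline metadata-enrichment \
  --search "ceramic"
```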
### publish
Publish ready Zenodo drafts to make them publicly accessible.
Usage:
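Based on the arguments documented below, the general form is:

```bash
hdp publish --hdpc <PROJECT_FILE> (--all | --record-ids ID [ID...]) [OPTIONS]
```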
Required Arguments (Mutually Exclusive):

| Argument | Description |
|---|---|
| `--all` | Publish ALL ready drafts (all files uploaded) |
| `--record-ids ID [ID...]` | Space-separated list of specific local record database IDs to publish |
Other Required:

| Argument | Description |
|---|---|
| `--hdpc PATH` | Path to your `.hdpc` project file |
Environment Selection:

| Argument | Description |
|---|---|
| `--sandbox` | Target Sandbox environment (default) |
| `--production` | Target Production environment |
Examples:
Publish all ready drafts:
Publish specific records:
Publish to production:
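The three invocations above might look like this (the record IDs are illustrative):

```bash
# Publish all ready drafts
hdp publish --hdpc myproject.hdpc --all

# Publish specific records
hdp publish --hdpc myproject.hdpc --record-ids 12 15 18

# Publish to production
hdp publish --hdpc myproject.hdpc --all --production
```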
Safety Features:

- Prompts for confirmation before publishing
- Only publishes drafts with all files successfully uploaded
- Provides a detailed success/failure report for each record
### create
Initialize a new HDPC project with automatic file scanning and configuration.
Usage:
```bash
hdp create --hdpc-path <PATH> --project-name <NAME> --short-code <CODE> \
  --modality <TYPE> --input-dir <DIR> --output-dir <DIR> [OPTIONS]
```
Required Arguments:

| Argument | Description |
|---|---|
| `--hdpc-path PATH` | Path where the `.hdpc` project file will be created |
| `--project-name NAME` | Descriptive name for the project |
| `--short-code CODE` | Short unique code for the project (e.g., `MyProject2025`) |
| `--modality TYPE` | Primary data modality (see choices below) |
| `--input-dir PATH` | Directory containing source data files to scan |
| `--output-dir PATH` | Directory where processed outputs will be stored |
Modality Choices:
- 3D Model
- Image / Photography
- Audio
- Video
- Text / Document
File Scanning Options:

| Argument | Description |
|---|---|
| `--extensions EXT [EXT...]` | File extensions to scan (e.g., `.obj .mtl .png`). If omitted, uses modality defaults |
| `--primary-ext EXT` | Primary source file extension (e.g., `.obj` for 3D models) |
| `--batch-entity MODE` | File grouping mode: `root`, `subdirectory`, or `hybrid` (default: `root`) |
Bundling Options:

| Argument | Description |
|---|---|
| `--enable-bundling` | Enable automatic file bundling (default: enabled) |
| `--no-bundling` | Disable automatic file bundling |
| `--bundling-strategy TYPE` | Strategy: `smart` (recommended), `strict`, or `loose` (default: `smart`) |
3D Model Specific Options:

| Argument | Description |
|---|---|
| `--add-mtl` | Scan for and include MTL files with OBJ files (default: enabled) |
| `--no-add-mtl` | Do not scan for MTL files |
| `--add-textures` | Scan for and include texture files (default: enabled) |
| `--no-add-textures` | Do not scan for texture files |
| `--archive-textures` | Archive texture subdirectories into ZIP files |
| `--texture-paths PATH [PATH...]` | Additional directories to search for texture files |
Examples:
Create a basic 3D model project:
```bash
hdp create --hdpc-path /projects/museum.hdpc \
  --project-name "Museum Artifacts 2025" \
  --short-code "MA2025" \
  --modality "3D Model" \
  --input-dir /data/3d_scans \
  --output-dir /data/processed
```
Create with specific file types and bundling:
```bash
hdp create --hdpc-path /projects/photos.hdpc \
  --project-name "Archaeological Photography" \
  --short-code "ArchPhoto2025" \
  --modality "Image / Photography" \
  --input-dir /data/photos \
  --output-dir /data/output \
  --extensions .jpg .png .tif \
  --batch-entity subdirectory \
  --bundling-strategy strict
```
Create 3D project with texture handling:
```bash
hdp create --hdpc-path /projects/sculptures.hdpc \
  --project-name "Digital Sculptures" \
  --short-code "DS2025" \
  --modality "3D Model" \
  --input-dir /data/models \
  --output-dir /data/results \
  --add-textures \
  --archive-textures \
  --texture-paths /data/textures /data/materials
```
### inspect
Analyze and display comprehensive information about an HDPC project.
Usage:
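Based on the arguments documented below, the general form is:

```bash
hdp inspect --hdpc-path <PROJECT_FILE> [--show-files]
```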
Required Arguments:

| Argument | Description |
|---|---|
| `--hdpc-path PATH` | Path to the `.hdpc` project file to inspect |
Optional Arguments:

| Argument | Description |
|---|---|
| `--show-files` | Display complete file tree with bundles and processing states (verbose) |
Examples:
Basic project inspection:
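A minimal sketch (the project file is illustrative):

```bash
hdp inspect --hdpc-path myproject.hdpc
```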
Detailed inspection with file tree:
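The same inspection with the verbose file tree enabled:

```bash
hdp inspect --hdpc-path myproject.hdpc --show-files
```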
Output Sections:
- Project Information: Name, ID, version, timestamps
- Configuration: Key-value configuration pairs
- File Statistics: Total files, sizes, types
- File Status Breakdown: Counts by processing status
- File Type Breakdown: Distribution of file types
- MIME Type Breakdown: Top 10 MIME types
- Zenodo Statistics: Draft/published record counts
- Metadata Mappings: Configured mapping templates
- Batches: Batch processing information
- File Tree: Hierarchical file structure (if `--show-files` enabled)
### run-pipeline
Execute a pipeline directly on local files without creating Zenodo drafts first.
Usage:
```bash
hdp run-pipeline --hdpc <PROJECT_FILE> --pipeline <PIPELINE_NAME> \
  --input-dir <DIRECTORY> [OPTIONS]
```
Required Arguments:

| Argument | Description |
|---|---|
| `--hdpc PATH` | Path to your `.hdpc` project file |
| `--pipeline NAME` | Name identifier of the pipeline to execute |
| `--input-dir PATH` | Directory containing the input files to process |
Optional Arguments:

| Argument | Description |
|---|---|
| `--extensions EXT [EXT...]` | File extensions to filter by (e.g., `.jpg .png`) |
| `--recursive` | Scan for files in subdirectories |
| `--sandbox` | Execute in Sandbox environment (default) |
| `--production` | Execute in Production environment |
Examples:
Run pipeline on all files in directory:
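A sketch, reusing the `thumbnail-generator` pipeline name from the Batch Processing Workflow (paths are illustrative):

```bash
hdp run-pipeline --hdpc myproject.hdpc \
  --pipeline thumbnail-generator \
  --input-dir /data/images
```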
Run pipeline on specific file types recursively:
```bash
hdp run-pipeline --hdpc myproject.hdpc \
  --pipeline format-conversion \
  --input-dir /data/scans \
  --extensions .tif .tiff \
  --recursive
```
Use Cases:

- Quick batch processing without Zenodo integration
- Testing pipelines on local data
- Processing files that won't be published
- Direct file transformations and analysis
### create-version
Create new versions of existing published Zenodo records with updated files.
Usage:
```bash
hdp create-version --hdpc <PROJECT_FILE> --input-dir <DIRECTORY> \
  --pipeline <PIPELINE_NAME> [OPTIONS]
```
Required Arguments:

| Argument | Description |
|---|---|
| `--hdpc PATH` | Path to your `.hdpc` project file |
| `--input-dir PATH` | Directory containing the new or updated source files |
| `--pipeline NAME` | Name of the pipeline to execute for creating the new version |
Optional Arguments:

| Argument | Description |
|---|---|
| `--match-method METHOD` | Method to match files with existing records: `filename` or `hashcode` (default: `filename`) |
| `--sandbox` | Target Sandbox environment records (default) |
| `--production` | Target Production environment records |
Examples:
Create versions using filename matching:
```bash
hdp create-version --hdpc myproject.hdpc \
  --input-dir /data/updated_models \
  --pipeline version-update \
  --match-method filename
```
Create versions using content hash matching:
```bash
hdp create-version --hdpc myproject.hdpc \
  --input-dir /data/corrected_files \
  --pipeline version-correction \
  --match-method hashcode \
  --production
```
Workflow:

1. Scans input directory for files
2. Matches files to existing published records using the specified method
3. Creates new draft versions for each matched record
4. Executes pipeline on the new versions
5. New versions remain as drafts until published with `hdp publish`
Matching Methods:

- `filename`: Matches based on original filename (faster, less precise)
- `hashcode`: Matches based on file content hash (slower, more accurate)
## Component Management Commands

### component-list
List all available and installed HDP pipeline components.
Usage:
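Based on the arguments documented below, the general form is:

```bash
hdp component-list [OPTIONS]
```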
Optional Arguments:

| Argument | Description |
|---|---|
| `--details` | Show detailed information for each component |
| `--category CATEGORY` | Filter by component category |
| `--installed-only` | Show only installed components |
| `--available-only` | Show only available (not installed) components |
Examples:
List all components:
Show detailed component information:
List only installed components:
Filter by category:
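The invocations above might look like this (the category name is illustrative):

```bash
# List all components
hdp component-list

# Show detailed component information
hdp component-list --details

# List only installed components
hdp component-list --installed-only

# Filter by category
hdp component-list --category "Image Processing"
```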
### component-install
Install one or more HDP pipeline components.
Usage:
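Based on the arguments documented below, the general form is:

```bash
hdp component-install (--components NAME [NAME...] | --all) [OPTIONS]
```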
Required Arguments (Mutually Exclusive):

| Argument | Description |
|---|---|
| `--components NAME [NAME...]` | Space-separated list of component names to install |
| `--all` | Install all available components |
Optional Arguments:

| Argument | Description |
|---|---|
| `--skip-install-script` | Skip running the component's `install.py` script |
| `--stop-on-error` | Stop the installation process if any component fails |
Examples:
Install specific components:
Install all available components:
Install with error handling:
Skip installation scripts (dependencies only):
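The invocations above might look like this (component names are illustrative):

```bash
# Install specific components
hdp component-install --components image-converter metadata-extractor

# Install all available components
hdp component-install --all

# Stop at the first failure
hdp component-install --all --stop-on-error

# Skip installation scripts (dependencies only)
hdp component-install --components image-converter --skip-install-script
```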
Installation Process:

1. Validates component structure
2. Creates isolated Python virtual environment
3. Installs required dependencies
4. Runs component-specific installation script
5. Registers component in database
### component-uninstall
Uninstall one or more HDP pipeline components.
Usage:
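Based on the arguments documented below, the general form is:

```bash
hdp component-uninstall --components NAME [NAME...]
```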
Required Arguments:

| Argument | Description |
|---|---|
| `--components NAME [NAME...]` | Space-separated list of component names to uninstall |
Examples:
Uninstall a single component:
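A sketch (the component name is illustrative):

```bash
hdp component-uninstall --components image-converter
```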
Uninstall multiple components:
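The same command accepts a space-separated list (component names are illustrative):

```bash
hdp component-uninstall --components image-converter metadata-extractor
```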
Warning: Uninstallation removes the component's virtual environment, dependencies, and registration. This action cannot be undone.
### component-download
Download pipeline components from the remote Zenodo repository.
Usage:
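Based on the arguments documented below, the general form is:

```bash
hdp component-download [--components NAME [NAME...]] [--install]
```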
Optional Arguments:

| Argument | Description |
|---|---|
| `--components NAME [NAME...]` | Specific component names to download. If omitted, lists available components |
| `--install` | Automatically install components after downloading |
Examples:
List available remote components:
Download specific components:
Download and install automatically:
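The invocations above might look like this (component names are illustrative):

```bash
# List available remote components
hdp component-download

# Download specific components
hdp component-download --components image-converter metadata-extractor

# Download and install automatically
hdp component-download --components image-converter --install
```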
Workflow:

1. Fetches component catalog from remote repository
2. Downloads specified component packages (ZIP archives)
3. Extracts components to local directory
4. Optionally installs components if the `--install` flag is used
### component-info
Display comprehensive information about a specific component.
Usage:
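Based on the arguments documented below, the general form is:

```bash
hdp component-info COMPONENT_NAME
```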
Required Arguments:

| Argument | Description |
|---|---|
| `COMPONENT_NAME` | Name of the component to inspect |
Examples:
View component details:
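A sketch (the component name is illustrative):

```bash
hdp component-info metadata-extractor
```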
View 3D processing component:
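Likewise for a 3D processing component (the name is illustrative):

```bash
hdp component-info mesh-decimator
```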
Output Sections:

- Basic Information: Name, label, version, category, description
- Authors: Contributors with affiliations and ORCID identifiers
- Contact: Email and support information
- License: License type and URL
- Status: Development status and availability
- Tags/Keywords: Searchable metadata tags
- Installation Status: Whether installed and installation details
- Validation Status: Component validation errors (if any)
- Inputs: Required and optional input specifications (first 10)
- Outputs: Generated output specifications (first 10)
- Parameter Groups: Configurable parameters organized by category
- Requirements: Python packages, system dependencies, resource requirements
- Execution Settings: Timeout, idempotency, error handling behavior
### component-search
Search for components by name, description, or keywords.
Usage:
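Based on the arguments documented below, the general form is:

```bash
hdp component-search QUERY [--category CATEGORY]
```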
Required Arguments:

| Argument | Description |
|---|---|
| `QUERY` | Search query string (searches name, label, description, tags) |
Optional Arguments:

| Argument | Description |
|---|---|
| `--category CATEGORY` | Filter results by component category |
Examples:
Search for image processing components:
Search with category filter:
Search for metadata tools:
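The invocations above might look like this (query strings and category are illustrative):

```bash
# Search for image processing components
hdp component-search "image"

# Search with a category filter
hdp component-search "conversion" --category "Image Processing"

# Search for metadata tools
hdp component-search "metadata"
```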
Search Behavior:

- Case-insensitive matching
- Searches across name, label, description, and tags
- Returns both installed and available components
- Displays installation status for each result
### component-update
Check for and optionally install updates for installed components.
Usage:
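Based on the arguments documented below, the general form is:

```bash
hdp component-update [--install-updates]
```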
Optional Arguments:

| Argument | Description |
|---|---|
| `--install-updates` | Automatically install available updates |
Examples:
Check for updates:
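A minimal sketch:

```bash
hdp component-update
```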
Check and install updates automatically:
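The same check with automatic installation, using the flag documented above:

```bash
hdp component-update --install-updates
```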
Update Process:

1. Compares local component versions with the remote repository
2. Lists components with available updates
3. Displays version changes and release notes
4. Optionally downloads and installs updates

Output:

- Current version → latest version for each component
- Zenodo record URL for release notes
- Success/failure status for each update
## Server Management

### hdp-server
Start the HDP Flask backend server independently (separate from CLI commands).
Usage:
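Based on the arguments documented below, the general form is:

```bash
hdp-server [--host HOST] [--port PORT] [--config FILE] [--alpha-features]
```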
Optional Arguments:

| Argument | Description |
|---|---|
| `--host HOST` | Server host address (default: `127.0.0.1`) |
| `--port PORT` | Server port number (default: `5001`) |
| `--config FILE` | Path to configuration file (default: `config.yaml`) |
| `--alpha-features` | Enable experimental alpha features |
Examples:
Start server with defaults:
Start on custom port:
Start with specific configuration and alpha features:
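The invocations above might look like this (the config filename is illustrative):

```bash
# Start server with defaults
hdp-server

# Start on a custom port
hdp-server --port 8080

# Start with a specific configuration and alpha features
hdp-server --config custom-config.yaml --alpha-features
```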
Notes:

- The CLI automatically starts the server when needed unless `--no-auto-server` is used
- Manual server startup is useful for:
  - Long-running server sessions
  - Development and debugging
  - Custom server configurations
  - Multiple concurrent CLI operations
## Workflows and Examples

### Complete Upload-to-Publish Workflow
Step 1: Create a new project
```bash
hdp create --hdpc-path /projects/artifacts.hdpc \
  --project-name "Museum Artifacts 2025" \
  --short-code "MA2025" \
  --modality "3D Model" \
  --input-dir /data/3d_scans \
  --output-dir /data/processed
```
Step 2: Upload files and create drafts
```bash
hdp upload --hdpc /projects/artifacts.hdpc \
  --input-dir /data/3d_scans \
  --extensions .obj .mtl .png \
  --recursive \
  --sandbox
```
Step 3: Execute processing pipeline
Step 4: Inspect project status
Step 5: Publish ready drafts
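Steps 3 through 5 above might look like this (the pipeline name is illustrative):

```bash
# Step 3: Execute processing pipeline
hdp process --hdpc /projects/artifacts.hdpc --pipeline 3d-optimization --sandbox

# Step 4: Inspect project status
hdp inspect --hdpc-path /projects/artifacts.hdpc --show-files

# Step 5: Publish ready drafts
hdp publish --hdpc /projects/artifacts.hdpc --all --sandbox
```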
### Component Installation and Usage Workflow
Step 1: Search for components
Step 2: View component details
Step 3: Install component
Step 4: Use component in pipeline
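The four steps above might look like this (the component and pipeline names are illustrative):

```bash
# Step 1: Search for components
hdp component-search "optimization"

# Step 2: View component details
hdp component-info mesh-optimizer

# Step 3: Install the component
hdp component-install --components mesh-optimizer

# Step 4: Use the component in a pipeline
hdp process --hdpc myproject.hdpc --pipeline mesh-optimizer --sandbox
```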
### Versioning Workflow
Step 1: Update local files
Step 2: Create new versions
```bash
hdp create-version --hdpc myproject.hdpc \
  --input-dir /data/updated_files \
  --pipeline version-update \
  --match-method filename \
  --production
```
Step 3: Review drafts
Step 4: Publish new versions
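Steps 3 and 4 above might look like this (the project file is illustrative):

```bash
# Step 3: Review the new drafts
hdp inspect --hdpc-path myproject.hdpc

# Step 4: Publish the new versions
hdp publish --hdpc myproject.hdpc --all --production
```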
### Batch Processing Workflow
Direct pipeline execution on local files:
```bash
hdp run-pipeline --hdpc myproject.hdpc \
  --pipeline thumbnail-generator \
  --input-dir /data/raw_images \
  --extensions .jpg .png \
  --recursive
```
## Environment Variables
The following environment variables can be used to configure HDP behavior:
| Variable | Description | Default |
|---|---|---|
| `HDP_SERVER_HOST` | Default server host | `127.0.0.1` |
| `HDP_SERVER_PORT` | Default server port | `5001` |
| `HDP_CONFIG_PATH` | Default configuration file path | `config.yaml` |
| `HDP_LOG_LEVEL` | Logging level (DEBUG, INFO, WARNING, ERROR) | `INFO` |
Example:
```bash
export HDP_SERVER_PORT=8080
export HDP_LOG_LEVEL=DEBUG
hdp upload --hdpc project.hdpc --input-dir /data/files
```
## Exit Codes
The CLI uses standard exit codes to indicate command execution status:
| Code | Meaning |
|---|---|
| `0` | Success |
| `1` | General error (API failure, invalid arguments, processing error) |
| `130` | User interruption (Ctrl+C) |
## Tips and Best Practices

1. Always use Sandbox first
   - Test workflows in the Sandbox environment before publishing to Production
   - Sandbox allows unlimited testing without affecting permanent records
2. Use meaningful project names and short codes
   - Helps identify projects in logs and databases
   - Short codes should be unique and descriptive
3. Filter processing strategically
   - Use `--search`, `--title-pattern`, `--since`, and `--until` to process specific subsets
   - Reduces processing time and resource usage
4. Regular project inspection
   - Run `hdp inspect` periodically to monitor project status
   - Use `--show-files` to verify file structure and bundling
5. Component updates
   - Regularly check for component updates with `hdp component-update`
   - Keep components updated for bug fixes and new features
6. Batch operations
   - Use `run-pipeline` for quick batch processing without Zenodo integration
   - Use `upload` → `process` → `publish` for the full archival workflow
7. Error handling
   - Check exit codes in scripts: `if [ $? -ne 0 ]; then echo "Error occurred"; fi`
   - Review logs for detailed error information
## Troubleshooting
Server connection errors:
```bash
# Check if the server is running
hdp inspect --hdpc-path test.hdpc

# Start the server manually if needed
hdp-server --port 5001

# Use --no-auto-server if managing the server separately
hdp upload --hdpc project.hdpc --input-dir /data --no-auto-server
```
Component installation failures:
```bash
# Install with verbose output
hdp component-install --components problematic-component

# Skip the installation script if it's causing issues
hdp component-install --components component-name --skip-install-script
```
File matching issues in versioning:

```bash
# Try alternative matching method
hdp create-version --hdpc project.hdpc \
  --input-dir /data/files \
  --pipeline update \
  --match-method hashcode  # Instead of filename
```
Version: 0.1.0-alpha.4
Last Updated: October 14, 2025