
Heritage Data Processor CLI: create Command Guide

Overview

The create command is the primary CLI tool for initializing new Heritage Data Processor (HDP) projects. It performs a complete, atomic project setup in a single operation, creating the .hdpc database file, scanning source files, validating them, and preparing them for further processing.

This guide focuses on 3D model workflows, the most common use case for heritage digitization projects.


Table of Contents

  1. Basic Concepts
  2. Command Syntax
  3. Required Arguments
  4. Batch Entity Modes
  5. Bundling Strategies
  6. 3D Model-Specific Options
  7. Practical Examples
  8. Troubleshooting

Basic Concepts

What is a .hdpc Project?

An .hdpc file is a SQLite database that contains:

  • Project metadata: Name, short code, timestamps
  • File inventory: Scanned source files with hierarchical relationships
  • Configuration: Paths, modality templates, scan options
  • Metadata mappings: Field mappings for publication workflows
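Because an .hdpc file is a regular SQLite database, it can be inspected with any SQLite client. The following is a minimal sketch using Python's standard sqlite3 module. The table names inside an .hdpc file are not documented in this guide, so the sketch only lists whatever tables exist; list_tables is a hypothetical helper name, not part of HDP.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(hdpc_path):
    """Return the names of all tables in an .hdpc (SQLite) file, sorted."""
    # Open read-only so that inspecting a project can never modify it.
    uri = Path(hdpc_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        # sqlite_master is SQLite's built-in catalog; no HDP-specific
        # schema is assumed here.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Opening the file in read-only mode means a mistyped path raises sqlite3.OperationalError instead of silently creating an empty database.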

Modality Templates

Modality templates are predefined configurations that specify:

  • Which file extensions to scan initially
  • Default scan behaviors for that data type
  • Common validation rules

For 3D models, the default modality is "3D Model", which includes extensions like .obj, .mtl, .glb, .gltf, .fbx, .ply, and .stl.

Batch Entity Modes

The batch entity mode determines how files are grouped into records (publishable units):

| Mode | Behavior | Use Case |
|------|----------|----------|
| root | Each file in the root directory becomes a separate record | Individual artifacts with no subfolder organization |
| subdirectory | Each subdirectory becomes one record, containing all its files | Collections organized by artifact/site in folders |
| hybrid | Combines both: root files as separate records + subdirectories as grouped records | Mixed datasets with both standalone files and organized collections |

Command Syntax

python main.py create \
  --hdpc-path <path_to_hdpc_file> \
  --project-name <descriptive_name> \
  --short-code <unique_code> \
  --input-dir <source_directory> \
  --output-dir <output_directory> \
  [OPTIONS]

Required Arguments

--hdpc-path

Path where the .hdpc project file will be created.

  • Must end with .hdpc extension
  • Parent directory must exist
  • File will be created by the command (must not already exist)

Example:

--hdpc-path ./projects/museum_collection.hdpc


--project-name

Descriptive human-readable name for the project.

  • Can contain spaces and special characters
  • Will be used in reports and publication metadata
  • Should be meaningful and descriptive

Example:

--project-name "Ancient Greek Pottery Collection 2025"


--short-code

Short unique identifier for the project.

  • Typically alphanumeric, no spaces
  • Used for internal references and file naming
  • Should be concise (e.g., institution code + year)

Example:

--short-code "AGPC2025"


--input-dir

Directory containing the source data files to scan.

  • Must be an existing directory
  • All files matching the modality extensions will be scanned
  • Subdirectories are scanned based on --batch-entity mode

Example:

--input-dir ./test_data/root_mode_examples/stem_bundling


--output-dir

Directory where processed outputs will be stored.

  • Will be created if it doesn't exist
  • Used for Zenodo uploads, derivatives, and exports
  • Should be separate from input directory

Example:

--output-dir ./output/processed_collections
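The constraints listed for the required arguments can be checked before running create. The sketch below mirrors only the rules stated above (extension, parent directory, no existing file, existing input directory); preflight_errors is a hypothetical helper, and the messages are illustrative, not the CLI's actual error output.

```python
from pathlib import Path


def preflight_errors(hdpc_path, input_dir):
    """Return a list of problems that violate the documented argument rules."""
    errors = []
    hdpc = Path(hdpc_path)
    if hdpc.suffix != ".hdpc":
        errors.append("--hdpc-path must end with .hdpc")
    if not hdpc.parent.is_dir():
        errors.append("parent directory of --hdpc-path does not exist")
    if hdpc.exists():
        errors.append("--hdpc-path already exists")
    if not Path(input_dir).is_dir():
        errors.append("--input-dir is not an existing directory")
    return errors
```

An empty list means the arguments satisfy the documented rules; it does not guarantee that the command itself will succeed.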


Batch Entity Modes

Mode 1: root (Default)

Behavior: Each file in the root level of --input-dir becomes a separate record.

When to use:

  • Individual artifact scans stored as separate files
  • No subfolder organization
  • Each 3D model represents a distinct publishable item

Example directory structure:

test_data/root_mode_examples/no_bundling/
├── artifact_photo.png          → Record 1
├── building_scan.glb           → Record 2
├── statue_model.obj            → Record 3
└── terrain_data.fbx            → Record 4

Command:

python main.py create \
  --hdpc-path ./projects/individual_artifacts.hdpc \
  --project-name "Individual Artifacts" \
  --short-code "INDART2025" \
  --input-dir ./test_data/root_mode_examples/no_bundling \
  --output-dir ./output/individual \
  --batch-entity root

Result: 4 separate records, one per file.


Mode 2: subdirectory

Behavior: Each subdirectory in --input-dir becomes one record, containing all files within it.

When to use:

  • Files are pre-organized into folders by artifact/site
  • Each folder represents one publishable unit
  • Multiple related files (OBJ + MTL + textures) belong together

Example directory structure:

test_data/subdirectory_mode_examples/
├── archaelogical_site_001/        → Record 1
│   ├── excavation_photo_001.png
│   ├── excavation_photo_002.png
│   ├── site_overview.obj
│   ├── site_overview.mtl
│   └── textures/
│       ├── stone_texture.jpg
│       └── wood_normal.jpg
├── archaelogical_site_002/        → Record 2
│   ├── artifact_scan.glb
│   ├── context_photo.png
│   └── documentation.pdf
└── museum_collection_item_045/    → Record 3
    ├── detail_scan_001.obj
    ├── detail_scan_001.mtl
    ├── main_model.fbx
    └── reference_images/
        ├── front_view.jpg
        └── side_view.jpg

Command:

python main.py create \
  --hdpc-path ./projects/site_collections.hdpc \
  --project-name "Archaeological Site Collections" \
  --short-code "ASC2025" \
  --input-dir ./test_data/subdirectory_mode_examples \
  --output-dir ./output/sites \
  --batch-entity subdirectory

Result: 3 records (one per subdirectory), each containing multiple files.


Mode 3: hybrid

Behavior: Combines both approaches:

  • Files in the root directory become individual records
  • Each subdirectory becomes one grouped record

When to use:

  • Mixed dataset with both standalone artifacts and collections
  • Flexibility for differently organized data

Example directory structure:

test_data/hybrid_mode_examples/
├── standalone_artifact_001.obj    → Record 1 (standalone)
├── standalone_artifact_001.mtl
├── standalone_reference_photo.png → Record 2 (standalone)
├── excavation_batch_alpha/        → Record 3 (grouped)
│   ├── fragment_001.obj
│   ├── fragment_002.obj
│   ├── fragment_003.obj
│   └── shared_textures/
│       ├── clay_texture.jpg
│       └── weathering_normal.jpg
└── excavation_batch_beta/         → Record 4 (grouped)
    ├── complete_vessel.glb
    ├── vessel_fragments.fbx
    └── documentation.txt

Command:

python main.py create \
  --hdpc-path ./projects/mixed_collection.hdpc \
  --project-name "Mixed Artifact Collection" \
  --short-code "MAC2025" \
  --input-dir ./test_data/hybrid_mode_examples \
  --output-dir ./output/mixed \
  --batch-entity hybrid

Result: 4 records: 2 standalone + 2 grouped subdirectories.
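The three modes differ only in which top-level entries of --input-dir become records. The sketch below illustrates that rule under simplifying assumptions: it ignores modality extension filtering and bundling, and group_records is a hypothetical function, not HDP's scanner.

```python
from pathlib import Path


def group_records(input_dir, mode="root"):
    """Map each record name to the files it would contain (paths relative to input_dir)."""
    root = Path(input_dir)
    records = {}
    for entry in sorted(root.iterdir()):
        if entry.is_file() and mode in ("root", "hybrid"):
            # Root-level files become one record each.
            records[entry.name] = [entry.name]
        elif entry.is_dir() and mode in ("subdirectory", "hybrid"):
            # A subdirectory becomes one record holding all files beneath it.
            records[entry.name] = sorted(
                p.relative_to(root).as_posix()
                for p in entry.rglob("*")
                if p.is_file()
            )
    return records
```

Because the sketch does no bundling, a root-level pair such as standalone_artifact_001.obj and standalone_artifact_001.mtl appears as two records here; bundling (described in the next section) is what merges them into one.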


Bundling Strategies

Bundling determines how related files are grouped together within a record. This is especially important for 3D models, where one logical artifact might consist of multiple files (e.g., model.obj + model.mtl + textures).

Key Concepts

  • Primary Source File: The main file (e.g., .obj file)
  • Associated Files: Supporting files (MTL, textures, etc.)
  • Bundling Strategy: The rule for determining which files belong together

Strategy 1: stem (Default)

Rule: Files with the exact same base filename (stem) are bundled together.

Example:

temple_column.obj       ← Primary
temple_column.mtl       ← Associated (same stem)
temple_column.jpg       ← Associated (same stem)
ceramic_bowl.fbx        ← Separate bundle

When to use:

  • Standard naming convention where related files share the same name
  • Most common scenario for 3D models exported from software

Command:

python main.py create \
  --hdpc-path ./projects/stem_bundle.hdpc \
  --project-name "Stem Bundling Example" \
  --short-code "STEM2025" \
  --input-dir ./test_data/root_mode_examples/stem_bundling \
  --output-dir ./output/stem \
  --batch-entity root \
  --enable-bundling \
  --bundling-strategy stem

Result: Files grouped by identical stems.
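The stem rule can be previewed on a list of filenames before running the command. This is a minimal sketch of the grouping idea only; bundle_by_stem is a hypothetical helper, and it does not decide which file in a bundle is the primary source file.

```python
from collections import defaultdict
from pathlib import Path


def bundle_by_stem(filenames):
    """Group filenames whose stem (name without the final extension) is identical."""
    bundles = defaultdict(list)
    for name in filenames:
        bundles[Path(name).stem].append(name)
    return dict(bundles)


bundles = bundle_by_stem(
    ["temple_column.obj", "temple_column.mtl", "temple_column.jpg", "ceramic_bowl.fbx"]
)
# bundles has two keys: "temple_column" (three files) and "ceramic_bowl" (one file)
```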


Strategy 2: pattern

Rule: Files matching a regex pattern are bundled together by extracting a common identifier.

Example pattern: (.+?)_(Section|Column)_[A-Z0-9]+ groups files like:

Athens_Temple_Section_A.obj    → Bundle: "Athens_Temple"
Athens_Temple_Section_B.obj    → Bundle: "Athens_Temple"
Athens_Temple_materials.mtl    → Bundle: "Athens_Temple"
Rome_Forum_Column_01.obj       → Bundle: "Rome_Forum"
Rome_Forum_Column_02.obj       → Bundle: "Rome_Forum"
Rome_Forum_materials.mtl       → Bundle: "Rome_Forum"

When to use:

  • Complex naming schemes with prefixes/suffixes
  • Multi-part models (e.g., large building sections)
  • Need custom grouping logic

Command:

python main.py create \
  --hdpc-path ./projects/pattern_bundle.hdpc \
  --project-name "Pattern Bundling Example" \
  --short-code "PATT2025" \
  --input-dir ./test_data/bundling_strategies/pattern_matching \
  --output-dir ./output/pattern \
  --batch-entity root \
  --enable-bundling \
  --bundling-strategy pattern \
  --bundling-pattern "(.+?)_(Section|Column)_[A-Z0-9]+"

Result: Files grouped by extracted identifier from regex pattern.
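A regex can be tried against sample filenames with Python's re module before it is passed to --bundling-pattern. The sketch assumes that the first capture group is the bundle identifier and that the pattern is applied to the file stem; both are assumptions about HDP's behavior, and pattern_key is a hypothetical helper.

```python
import re
from pathlib import Path

# Same expression as in the command above.
PATTERN = re.compile(r"(.+?)_(Section|Column)_[A-Z0-9]+")


def pattern_key(filename, pattern=PATTERN):
    """Return capture group 1 for a matching stem, or None when nothing matches."""
    match = pattern.match(Path(filename).stem)
    return match.group(1) if match else None
```

With this pattern, Athens_Temple_Section_A.obj yields "Athens_Temple" and Rome_Forum_Column_01.obj yields "Rome_Forum". Athens_Temple_materials.mtl does not match the expression by itself, so the sketch returns None for it; how the tool attaches such files to a bundle is not covered here.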


Strategy 3: prefix_suffix

Rule: Remove specified prefix and/or suffix patterns from filenames, then bundle by resulting core name.

Example with prefix [vn]\d+_ and suffix _(hiRes|lowRes):

v1_pottery_fragment.obj       → Core: "pottery_fragment"
v2_pottery_fragment.obj       → Core: "pottery_fragment"
v3_pottery_fragment.obj       → Core: "pottery_fragment"
pottery_fragment.mtl          → Core: "pottery_fragment"

n001_vase_hiRes.obj           → Core: "vase"
n001_vase_lowRes.obj          → Core: "vase"
n001_vase.mtl                 → Core: "vase"

When to use:

  • Version numbers at the start of filenames
  • Resolution indicators (hiRes/lowRes)
  • Inventory prefixes (n001_, n002_)

Command (prefix removal):

python main.py create \
  --hdpc-path ./projects/prefix_bundle.hdpc \
  --project-name "Prefix Removal Example" \
  --short-code "PREF2025" \
  --input-dir ./test_data/bundling_strategies/prefix_removal \
  --output-dir ./output/prefix \
  --batch-entity root \
  --enable-bundling \
  --bundling-strategy prefix_suffix \
  --bundling-prefix "v\d+_"

Command (suffix removal):

python main.py create \
  --hdpc-path ./projects/suffix_bundle.hdpc \
  --project-name "Suffix Removal Example" \
  --short-code "SUFF2025" \
  --input-dir ./test_data/bundling_strategies/suffix_removal \
  --output-dir ./output/suffix \
  --batch-entity root \
  --enable-bundling \
  --bundling-strategy prefix_suffix \
  --bundling-suffix "_(edge|obverse|reverse)_scan"

Result: Files grouped after removing prefix/suffix patterns.
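The effect of a prefix or suffix pattern on individual filenames can be checked in isolation. The sketch assumes that the prefix is anchored at the start of the stem and the suffix at its end; core_name is a hypothetical helper and may differ from how HDP applies the patterns.

```python
import re
from pathlib import Path


def core_name(filename, prefix=None, suffix=None):
    """Strip an optional prefix regex and suffix regex from the file stem."""
    stem = Path(filename).stem
    if prefix:
        # Anchor the prefix pattern at the start of the stem.
        stem = re.sub(r"^(?:" + prefix + r")", "", stem)
    if suffix:
        # Anchor the suffix pattern at the end of the stem.
        stem = re.sub(r"(?:" + suffix + r")$", "", stem)
    return stem
```

For example, core_name("v2_pottery_fragment.obj", prefix=r"v\d+_") and core_name("pottery_fragment.mtl", prefix=r"v\d+_") both give "pottery_fragment", which is why the versioned OBJ files and the shared MTL end up with the same core name.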


Strategy 4: core_identifier

Rule: Extract a core identifier from filenames using a specific pattern (e.g., site042, site093), ignoring descriptive suffixes.

Example:

site042_structure_detail.obj   → Bundle: "site042"
site042_structure_main.fbx     → Bundle: "site042"
site093_artifact_complete.glb  → Bundle: "site093"
site093_artifact_fragment.obj  → Bundle: "site093"

When to use:

  • Site/excavation number prefixes
  • Catalog identifiers embedded in filenames
  • Need to group by administrative code

Command:

python main.py create \
  --hdpc-path ./projects/core_id_bundle.hdpc \
  --project-name "Core Identifier Example" \
  --short-code "CORE2025" \
  --input-dir ./test_data/bundling_strategies/core_identifier \
  --output-dir ./output/core_id \
  --batch-entity root \
  --enable-bundling \
  --bundling-strategy core_identifier

Result: Files grouped by extracted core identifier (e.g., site042).
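This guide does not state the exact pattern that core_identifier uses. As an illustration only, the sketch below assumes the identifier is a run of letters followed by digits at the start of the stem (as in site042); the regex and the function name are assumptions, not HDP's actual rule.

```python
import re
from pathlib import Path

# Assumed shape of an identifier: letters then digits, followed by "_" or end of stem.
CORE_ID = re.compile(r"^([A-Za-z]+\d+)(?:_|$)")


def core_identifier(filename):
    """Return the leading identifier of a filename, or None if there is none."""
    match = CORE_ID.match(Path(filename).stem)
    return match.group(1) if match else None
```

Under this assumption, site042_structure_detail.obj and site042_structure_main.fbx both map to "site042", while a name without a leading identifier maps to None.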


3D Model-Specific Options

--add-mtl / --no-add-mtl

Purpose: Automatically scan for and include .mtl (material) files associated with .obj files.

Default: True (enabled)

Behavior:

  • When an .obj file is found, the scanner looks for a corresponding .mtl file with the same stem
  • Example: roman_statue.obj → scans for roman_statue.mtl

When to disable:

  • OBJ files don't use materials
  • MTL files are stored separately or managed differently
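The same-stem lookup described above can be reproduced to check a dataset by hand. This is a minimal sketch that only looks next to the OBJ file; find_mtl is a hypothetical helper, and the real scanner may look in other places as well.

```python
from pathlib import Path


def find_mtl(obj_path):
    """Return the sibling .mtl path with the same stem, or None if it is absent."""
    candidate = Path(obj_path).with_suffix(".mtl")
    return candidate if candidate.is_file() else None
```

Running this over every .obj file in --input-dir gives a quick list of models that have no material file, which helps decide whether --no-add-mtl is appropriate.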

Command (enabled):

python main.py create \
  --hdpc-path ./projects/with_mtl.hdpc \
  --project-name "OBJ with Materials" \
  --short-code "MTL2025" \
  --input-dir ./test_data/complex_dependencies/obj_with_dependencies \
  --output-dir ./output/with_mtl \
  --add-mtl

Command (disabled):

python main.py create \
  --hdpc-path ./projects/no_mtl.hdpc \
  --project-name "OBJ without Materials" \
  --short-code "NOMTL2025" \
  --input-dir ./test_data/root_mode_examples/no_bundling \
  --output-dir ./output/no_mtl \
  --no-add-mtl


--add-textures / --no-add-textures

Purpose: Automatically scan for and include texture image files referenced in .mtl files.

Default: True (enabled)

Behavior:

  • Parses .mtl files to extract texture references (e.g., map_Kd marble_diffuse.jpg)
  • Searches for these texture files in:
      • Same directory as the MTL file
      • Subdirectories (e.g., textures/, Materials/)
      • Additional paths specified with --texture-paths

Supported texture maps:

  • Diffuse: map_Kd
  • Specular: map_Ks
  • Normal: map_Bump, bump
  • Specular exponent (shininess): map_Ns
  • Ambient: map_Ka
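To see which textures an MTL file references, the map statements can be extracted with a few lines of Python. The sketch assumes the texture filename is the last whitespace-separated token on the line (so filenames containing spaces are not handled) and matches keywords case-insensitively; texture_references is a hypothetical helper, not the parser HDP uses.

```python
# Keywords from the list of supported texture maps, lowercased for comparison.
TEXTURE_KEYS = {"map_kd", "map_ks", "map_bump", "bump", "map_ns", "map_ka"}


def texture_references(mtl_text):
    """Return the texture filenames referenced by map statements, in order, without duplicates."""
    refs = []
    for line in mtl_text.splitlines():
        parts = line.split()
        # Skip blank lines, comments, and statements without an argument.
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        # Options such as "-bm 1.0" precede the filename, so take the last token.
        if parts[0].lower() in TEXTURE_KEYS and parts[-1] not in refs:
            refs.append(parts[-1])
    return refs
```

Comparing the returned names with the files actually present on disk is a quick way to predict which textures the scanner will not be able to find.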

When to disable:

  • Textures are stored in a separate archive
  • Texture files are too large or not needed for publication

Command (enabled with custom search paths):

python main.py create \
  --hdpc-path ./projects/with_textures.hdpc \
  --project-name "OBJ with Textures" \
  --short-code "TEX2025" \
  --input-dir ./test_data/complex_dependencies/obj_with_dependencies \
  --output-dir ./output/with_textures \
  --add-textures \
  --texture-paths ./additional_textures ./shared_materials


--archive-textures

Purpose: Archive texture subdirectories into ZIP files for more efficient storage and upload.

Default: False (disabled)

Behavior:

  • When a texture subdirectory is detected (e.g., textures/, Materials/), it's compressed into a .zip archive
  • The archive is included as a child file in the hierarchy
  • Original texture files are still tracked but archived

When to use:

  • Large texture collections (many small files)
  • Zenodo uploads (fewer files = faster uploads)
  • Organized texture folders
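The archiving step amounts to zipping one directory into a single file. The sketch below shows the idea with Python's standard zipfile module; where HDP writes the archive and how it names entries inside it are assumptions here, and archive_directory is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def archive_directory(texture_dir):
    """Zip every file under texture_dir into <texture_dir>.zip and return the archive path."""
    src = Path(texture_dir)
    archive = src.parent / (src.name + ".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Keep the folder name in the entry so the archive unpacks as textures/...
                zf.write(path, path.relative_to(src.parent).as_posix())
    return archive
```

The sketch leaves the original directory in place, so it can be used to estimate archive sizes without changing the source data.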

Example directory:

archaelogical_site_001/
├── site_overview.obj
├── site_overview.mtl
└── textures/              ← Will be archived to textures.zip
    ├── stone_texture.jpg
    ├── wood_normal.jpg
    └── roof.jpg

Command:

python main.py create \
  --hdpc-path ./projects/archived_textures.hdpc \
  --project-name "Archived Textures Example" \
  --short-code "ARCH2025" \
  --input-dir ./test_data/subdirectory_mode_examples/archaelogical_site_001 \
  --output-dir ./output/archived \
  --batch-entity subdirectory \
  --add-textures \
  --archive-textures

Result: textures/ directory archived to textures.zip in the file hierarchy.


--texture-paths

Purpose: Specify additional directories to search for texture files.

Default: None (only the model directory and its subdirectories are searched)

Behavior:

  • When texture files are not found in the default locations, the scanner searches these additional paths
  • Useful for shared texture libraries or centralized material repositories
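The fallback search can be pictured as an ordered lookup. The sketch assumes the order model directory, then its subdirectories, then the additional paths, and assumes the reference is a bare filename without glob characters; the precedence HDP actually applies is not documented here, and resolve_texture is a hypothetical helper.

```python
from pathlib import Path


def resolve_texture(name, model_dir, extra_paths=()):
    """Return the first existing file matching a texture filename, or None."""
    model_dir = Path(model_dir)
    # 1. Same directory as the model.
    candidates = [model_dir / name]
    # 2. Any subdirectory of the model directory (sorted for a stable result).
    candidates += sorted(model_dir.rglob(name))
    # 3. Additional directories, in the order they were given.
    candidates += [Path(extra) / name for extra in extra_paths]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

A None result for a texture that exists elsewhere on disk indicates that its directory should be added to --texture-paths.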

Command:

python main.py create \
  --hdpc-path ./projects/shared_textures.hdpc \
  --project-name "Shared Texture Library" \
  --short-code "SHARE2025" \
  --input-dir ./models \
  --output-dir ./output/shared \
  --add-textures \
  --texture-paths ./common_textures ./material_library


Practical Examples

Example 1: Simple Individual Artifacts (Root Mode)

Scenario: 4 individual artifact scans, each as separate files, no bundling needed.

Directory:

root_mode_examples/no_bundling/
├── artifact_photo.png
├── building_scan.glb
├── statue_model.obj
└── terrain_data.fbx

Command:

python main.py create \
  --hdpc-path ./projects/simple_artifacts.hdpc \
  --project-name "Simple Individual Artifacts" \
  --short-code "SIA2025" \
  --input-dir ./test_data/root_mode_examples/no_bundling \
  --output-dir ./output/simple_artifacts \
  --batch-entity root

Expected Result:

  • 4 separate records created
  • Each file becomes its own publishable unit
  • No bundling or complex dependencies


Example 2: OBJ Models with MTL and Textures (Stem Bundling)

Scenario: Standard 3D model workflow with OBJ files, their MTL materials, and texture images.

Directory:

root_mode_examples/stem_bundling/
├── temple_column.obj
├── temple_column.mtl
├── temple_column.jpg
├── ceramic_bowl.fbx
└── ancient_tablet.glb

Command:

python main.py create \
  --hdpc-path ./projects/obj_with_materials.hdpc \
  --project-name "OBJ Models with Materials" \
  --short-code "OBJMAT2025" \
  --input-dir ./test_data/root_mode_examples/stem_bundling \
  --output-dir ./output/obj_materials \
  --batch-entity root \
  --enable-bundling \
  --bundling-strategy stem \
  --add-mtl \
  --add-textures

Expected Result:

  • Record 1: temple_column (OBJ + MTL + JPG bundled together)
  • Record 2: ceramic_bowl (FBX standalone)
  • Record 3: ancient_tablet (GLB standalone)


Example 3: Complex OBJ with Multiple Textures (Subdirectory Mode)

Scenario: One artifact per subdirectory, with texture files in a separate subfolder.

Directory:

subdirectory_mode_examples/archaelogical_site_001/
├── site_overview.obj
├── site_overview.mtl
├── excavation_photo_001.png
├── excavation_photo_002.png
└── textures/
    ├── stone_texture.jpg
    ├── wood_normal.jpg
    └── roof.jpg

Command:

python main.py create \
  --hdpc-path ./projects/complex_site.hdpc \
  --project-name "Archaeological Site with Textures" \
  --short-code "ASTEXT2025" \
  --input-dir ./test_data/subdirectory_mode_examples \
  --output-dir ./output/complex_site \
  --batch-entity subdirectory \
  --add-mtl \
  --add-textures \
  --archive-textures

Expected Result:

  • Record 1: archaelogical_site_001 containing:
      • Primary: site_overview.obj
      • Associated: site_overview.mtl
      • Associated: excavation_photo_001.png, excavation_photo_002.png
      • Archived: textures.zip (containing all 3 texture files)


Example 4: Pattern Bundling for Multi-Part Models

Scenario: Large building model split into multiple sections (Section A, Section B), sharing common materials.

Directory:

bundling_strategies/pattern_matching/
├── Athens_Temple_Section_A.obj
├── Athens_Temple_Section_B.obj
├── Athens_Temple_materials.mtl
├── Rome_Forum_Column_01.obj
├── Rome_Forum_Column_02.obj
└── Rome_Forum_materials.mtl

Command:

python main.py create \
  --hdpc-path ./projects/multipart_models.hdpc \
  --project-name "Multi-Part Building Models" \
  --short-code "MULTI2025" \
  --input-dir ./test_data/bundling_strategies/pattern_matching \
  --output-dir ./output/multipart \
  --batch-entity root \
  --enable-bundling \
  --bundling-strategy pattern \
  --bundling-pattern "(.+?)_(Section|Column)_[A-Z0-9]+"

Expected Result:

  • Record 1: Athens_Temple (Section A OBJ + Section B OBJ + materials MTL)
  • Record 2: Rome_Forum (Column 01 OBJ + Column 02 OBJ + materials MTL)


Example 5: Version Control (Prefix Removal)

Scenario: Multiple scan versions of the same artifact, with version prefixes v1_, v2_, v3_.

Directory:

bundling_strategies/prefix_removal/
├── v1_pottery_fragment.obj
├── v2_pottery_fragment.obj
├── v3_pottery_fragment.obj
└── pottery_fragment.mtl

Command:

python main.py create \
  --hdpc-path ./projects/versioned_scans.hdpc \
  --project-name "Versioned Pottery Scans" \
  --short-code "VERS2025" \
  --input-dir ./test_data/bundling_strategies/prefix_removal \
  --output-dir ./output/versioned \
  --batch-entity root \
  --enable-bundling \
  --bundling-strategy prefix_suffix \
  --bundling-prefix "v\d+_"

Expected Result:

  • Record 1: pottery_fragment (all 3 OBJ versions + shared MTL bundled together)


Example 6: Multi-View Scans (Suffix Removal)

Scenario: One artifact scanned from several views (obverse, reverse, edge), with a shared material file.

Directory:

bundling_strategies/suffix_removal/
├── coin_obverse_scan.obj
├── coin_reverse_scan.obj
├── coin_edge_scan.obj
└── coin.mtl

Command:

python main.py create \
  --hdpc-path ./projects/multiview_scans.hdpc \
  --project-name "Multi-View Coin Scans" \
  --short-code "COIN2025" \
  --input-dir ./test_data/bundling_strategies/suffix_removal \
  --output-dir ./output/multiview \
  --batch-entity root \
  --enable-bundling \
  --bundling-strategy prefix_suffix \
  --bundling-suffix "_(obverse|reverse|edge)_scan"

Expected Result:

  • Record 1: coin (obverse + reverse + edge OBJ scans + shared MTL)


Example 7: Hybrid Mode (Mixed Organization)

Scenario: Root-level standalone artifacts + subdirectories for grouped collections.

Directory:

hybrid_mode_examples/
├── standalone_artifact_001.obj
├── standalone_artifact_001.mtl
├── standalone_reference_photo.png
├── excavation_batch_alpha/
│   ├── fragment_001.obj
│   ├── fragment_002.obj
│   ├── fragment_003.obj
│   └── shared_textures/
│       ├── clay_texture.jpg
│       └── weathering_normal.jpg
└── excavation_batch_beta/
    ├── complete_vessel.glb
    ├── vessel_fragments.fbx
    └── documentation.txt

Command:

python main.py create \
  --hdpc-path ./projects/hybrid_collection.hdpc \
  --project-name "Hybrid Artifact Collection" \
  --short-code "HYB2025" \
  --input-dir ./test_data/hybrid_mode_examples \
  --output-dir ./output/hybrid \
  --batch-entity hybrid \
  --enable-bundling \
  --bundling-strategy stem \
  --add-mtl \
  --add-textures \
  --archive-textures

Expected Result:

  • Record 1: standalone_artifact_001 (OBJ + MTL bundled)
  • Record 2: standalone_reference_photo (standalone PNG)
  • Record 3: excavation_batch_alpha (3 OBJ fragments + archived textures)
  • Record 4: excavation_batch_beta (GLB + FBX + TXT)


Troubleshooting

Issue: "No files found matching extensions"

Cause: The specified extensions don't match any files in the input directory.

Solution:

  1. Verify --input-dir path is correct
  2. Check that files have the expected extensions (e.g., .obj, .glb)
  3. Manually specify extensions with --extensions .obj .mtl .png

Example:

python main.py create \
  --hdpc-path ./projects/custom_ext.hdpc \
  --project-name "Custom Extensions" \
  --short-code "CUST2025" \
  --input-dir ./my_data \
  --output-dir ./output \
  --extensions .obj .mtl .fbx .glb .png .jpg
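To decide which values to pass to --extensions, it helps to see which extensions are actually present in the input directory. The sketch below counts files by lowercased extension; count_extensions is a hypothetical helper, and whether HDP itself matches extensions case-insensitively is not stated in this guide.

```python
from collections import Counter
from pathlib import Path


def count_extensions(input_dir):
    """Count files under input_dir (recursively) by lowercased extension."""
    return Counter(
        p.suffix.lower() for p in Path(input_dir).rglob("*") if p.is_file()
    )
```

Extensions that appear in the count but not in the modality's extension list are the ones to add explicitly.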


Issue: "MTL file not found for OBJ"

Cause: OBJ file exists but corresponding MTL file is missing.

Solution:

  1. Verify MTL file has the same stem as OBJ (e.g., model.obj → model.mtl)
  2. Check if MTL file is in a different directory
  3. Use --no-add-mtl if materials are not needed


Issue: "Texture files not found"

Cause: MTL file references textures that don't exist or are in a different location.

Solution:

  1. Check MTL file contents for texture paths
  2. Ensure texture files are in the same directory or subdirectory
  3. Use --texture-paths to specify additional search locations:

python main.py create \
  --hdpc-path ./projects/missing_tex.hdpc \
  --project-name "Missing Textures Fix" \
  --short-code "TEX2025" \
  --input-dir ./models \
  --output-dir ./output \
  --add-textures \
  --texture-paths ./external_textures ./shared_materials

Issue: "Files not bundled correctly"

Cause: Bundling strategy doesn't match the filename convention.

Solution:

  1. Verify filenames match the expected pattern
  2. Try different bundling strategies:
      • stem: Exact filename match
      • pattern: Custom regex pattern
      • prefix_suffix: Remove prefixes/suffixes
      • core_identifier: Group by an identifier embedded in the filename
  3. Test a custom regex with the --bundling-pattern flag

Example (debugging pattern):

# Test pattern bundling against a sample directory
python main.py create \
  --hdpc-path ./projects/test_pattern.hdpc \
  --project-name "Test Pattern Bundling" \
  --short-code "TESTPATT" \
  --input-dir ./test_data \
  --output-dir ./output \
  --enable-bundling \
  --bundling-strategy pattern \
  --bundling-pattern "(.+?)_Section_[A-Z]"


Issue: "Too many small files in record"

Cause: Large texture directories creating many individual file entries.

Solution: Use --archive-textures to compress texture folders:

python main.py create \
  --hdpc-path ./projects/archived.hdpc \
  --project-name "Archived Texture Folders" \
  --short-code "ARCH2025" \
  --input-dir ./large_textures \
  --output-dir ./output \
  --add-textures \
  --archive-textures

Issue: "Project already exists"

Cause: .hdpc file already exists at the specified path.

Solution:

  1. Delete or rename the existing .hdpc file
  2. Choose a different --hdpc-path
  3. Use the existing project with other commands (e.g., upload, process)


Advanced Tips

Tip 1: Test with Small Dataset First

Before processing large collections, test with a small subset:

# Create test subdirectory with a few files
mkdir -p ./test_subset
cp -r ./test_data/subdirectory_mode_examples/archaelogical_site_001 ./test_subset/

# Test command
python main.py create \
  --hdpc-path ./projects/test.hdpc \
  --project-name "Test Run" \
  --short-code "TEST" \
  --input-dir ./test_subset \
  --output-dir ./output/test \
  --batch-entity subdirectory

Tip 2: Use Absolute Paths for Clarity

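Relative paths are resolved against the current working directory, so the same command can behave differently depending on where it is run. Use absolute paths for production: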

python main.py create \
  --hdpc-path /full/path/to/projects/production.hdpc \
  --project-name "Production Collection" \
  --short-code "PROD2025" \
  --input-dir /full/path/to/source_data \
  --output-dir /full/path/to/output

Tip 3: Organize Output by Project

Create dedicated output directories for each project:

mkdir -p ./output/museum_pottery
python main.py create \
  --hdpc-path ./projects/museum_pottery.hdpc \
  --project-name "Museum Pottery Collection" \
  --short-code "MPC2025" \
  --input-dir ./source/pottery \
  --output-dir ./output/museum_pottery

Tip 4: Document Your Bundling Strategy

Save your bundling configuration in a short text file kept alongside the project:

# bundling_config.txt
Project: Archaeological Site Scans
Strategy: pattern
Pattern: (.+?)_Section_[A-Z0-9]+
Reasoning: Large building models split into lettered sections

Summary

The create command provides a powerful, flexible way to initialize Heritage Data Processor projects with comprehensive file scanning and validation. Key takeaways:

  1. Batch Entity Mode determines how files are grouped into records
  2. Bundling Strategy controls how related files are associated within records
  3. 3D Model Options enable automatic MTL and texture scanning
  4. Test with small datasets before processing large collections
  5. Choose bundling strategy based on your filename conventions

For further assistance, consult the test data examples or contact the development team.