Heritage Data Processor CLI: `create` Command Guide

Overview

The create command is the primary CLI tool for initializing new Heritage Data Processor (HDP) projects. It performs a complete, atomic project setup in a single operation: it creates the .hdpc database file, scans the source files, validates them, and prepares them for further processing.

This guide focuses on 3D model workflows, which are the most common use case for heritage digitization projects.

Basic Concepts

What is a `.hdpc` Project?

An .hdpc file is a SQLite database that contains:

- Project metadata: name, short code, timestamps
- File inventory: scanned source files with hierarchical relationships
- Configuration: paths, modality templates, scan options
- Metadata mappings: field mappings for publication workflows
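Because a .hdpc file is an ordinary SQLite database, it can be inspected with Python's standard sqlite3 module. The minimal sketch below lists the tables in a project file. It assumes nothing about HDP's internal table names, `list_hdpc_tables` is a hypothetical helper (not part of HDP), and the file is opened read-only so the inspection cannot modify the project.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_hdpc_tables(hdpc_path: str) -> list[str]:
    """Return the names of all tables in a .hdpc (SQLite) project file.

    Hypothetical helper for inspection only; it is not part of the HDP CLI.
    """
    path = Path(hdpc_path).resolve()
    # "mode=ro" opens the database read-only, so inspection cannot alter the project.
    uri = f"{path.as_uri()}?mode=ro"
    # closing() guarantees the connection is closed; sqlite3's own context
    # manager only commits or rolls back a transaction.
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, `list_hdpc_tables("./projects/museum_collection.hdpc")` returns the table names of that project once it has been created.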

Modality Templates

Modality templates are predefined configurations that specify:

- Which file extensions to scan initially
- Default scan behaviors for that data type
- Common validation rules

For 3D models, the default modality is "3D Model", which includes extensions like .obj, .mtl, .glb, .gltf, .fbx, .ply, and .stl.

Batch Entity Modes

The batch entity mode determines how files are grouped into records (publishable units):

| Mode | Behavior | Use Case |
|---|---|---|
| root | Each file in the root directory becomes a separate record | Individual artifacts with no subfolder organization |
| subdirectory | Each subdirectory becomes one record containing all its files | Collections organized into folders by artifact or site |
| hybrid | Combines both: root files become separate records and subdirectories become grouped records | Mixed datasets with both standalone files and organized collections |
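A rough sketch of how the three modes partition an input directory, assuming a record is simply a list of file paths. This illustrates the rules in the table and is not HDP's scanner: it ignores extension filtering, bundling, and MTL/texture detection, and `group_records` is a hypothetical name.

```python
from pathlib import Path


def group_records(input_dir: str, mode: str) -> dict[str, list[Path]]:
    """Partition input_dir into records following the batch entity mode rules.

    Hypothetical illustration only; HDP's real scanner does much more.
    """
    if mode not in {"root", "subdirectory", "hybrid"}:
        raise ValueError(f"unknown batch entity mode: {mode}")

    root = Path(input_dir)
    entries = sorted(root.iterdir())
    records: dict[str, list[Path]] = {}

    if mode in {"root", "hybrid"}:
        # Each file directly inside input_dir becomes its own record.
        for entry in entries:
            if entry.is_file():
                records[entry.name] = [entry]

    if mode in {"subdirectory", "hybrid"}:
        # Each immediate subdirectory becomes one record holding every file beneath it.
        for entry in entries:
            if entry.is_dir():
                records[entry.name] = sorted(p for p in entry.rglob("*") if p.is_file())

    return records
```

Because it skips the MTL attachment that --add-mtl performs, this sketch would count standalone_artifact_001.mtl in the hybrid example as its own record, giving 5 records instead of the 4 that HDP reports.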

Command Syntax

python main.py create \
  --hdpc-path <path_to_hdpc_file> \
  --project-name <descriptive_name> \
  --short-code <unique_code> \
  --input-dir <source_directory> \
  --output-dir <output_directory> \
  [OPTIONS]

Required Arguments

`--hdpc-path`

Path where the .hdpc project file will be created.

Must end with .hdpc extension
Parent directory must exist
File will be created by the command (must not already exist)

Example:

--hdpc-path ./projects/museum_collection.hdpc
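If you script project creation, you can check the three --hdpc-path requirements before invoking the CLI. The sketch below is a hypothetical pre-flight helper (`check_hdpc_path` is not part of HDP) that mirrors the rules listed above; the CLI still performs its own validation.

```python
from pathlib import Path


def check_hdpc_path(hdpc_path: str) -> list[str]:
    """Return the problems with a proposed --hdpc-path (an empty list means none).

    Hypothetical helper mirroring the documented requirements.
    """
    path = Path(hdpc_path)
    problems = []
    if path.suffix != ".hdpc":
        problems.append("path must end with the .hdpc extension")
    if not path.parent.is_dir():
        problems.append(f"parent directory does not exist: {path.parent}")
    if path.exists():
        problems.append(f"file already exists: {path}")
    return problems
```

An empty result for ./projects/museum_collection.hdpc means the ./projects directory exists and no project file is there yet.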

`--project-name`

Descriptive human-readable name for the project.

Can contain spaces and special characters
Will be used in reports and publication metadata
Should be meaningful and descriptive

Example:

--project-name "Ancient Greek Pottery Collection 2025"

`--short-code`

Short unique identifier for the project.

Typically alphanumeric, no spaces
Used for internal references and file naming
Should be concise (e.g., institution code + year)

Example:

--short-code "AGPC2025"

`--input-dir`

Directory containing the source data files to scan.

Must be an existing directory
All files matching the modality extensions will be scanned
Subdirectories are scanned based on --batch-entity mode

Example:

--input-dir ./test_data/root_mode_examples/stem_bundling

`--output-dir`

Directory where processed outputs will be stored.

Will be created if it doesn't exist
Used for Zenodo uploads, derivatives, and exports
Should be separate from input directory

Example:

--output-dir ./output/processed_collections

Batch Entity Modes

Mode 1: `root` (Default)

Behavior: Each file in the root level of --input-dir becomes a separate record.

When to use:
- Individual artifact scans stored as separate files
- No subfolder organization
- Each 3D model represents a distinct publishable item

Example directory structure:

test_data/root_mode_examples/no_bundling/
├── artifact_photo.png          → Record 1
├── building_scan.glb           → Record 2
├── statue_model.obj            → Record 3
└── terrain_data.fbx            → Record 4

Command:

python main.py create \
  --hdpc-path ./projects/individual_artifacts.hdpc \
  --project-name "Individual Artifacts" \
  --short-code "INDART2025" \
  --input-dir ./test_data/root_mode_examples/no_bundling \
  --output-dir ./output/individual \
  --batch-entity root

Result: 4 separate records, one per file.

Mode 2: `subdirectory`

Behavior: Each subdirectory in --input-dir becomes one record, containing all files within it.

When to use:
- Files are pre-organized into folders by artifact or site
- Each folder represents one publishable unit
- Multiple related files (OBJ + MTL + textures) belong together

Example directory structure:

test_data/subdirectory_mode_examples/
├── archaelogical_site_001/        → Record 1
│   ├── excavation_photo_001.png
│   ├── excavation_photo_002.png
│   ├── site_overview.obj
│   ├── site_overview.mtl
│   └── textures/
│       ├── stone_texture.jpg
│       └── wood_normal.jpg
├── archaelogical_site_002/        → Record 2
│   ├── artifact_scan.glb
│   ├── context_photo.png
│   └── documentation.pdf
└── museum_collection_item_045/    → Record 3
    ├── detail_scan_001.obj
    ├── detail_scan_001.mtl
    ├── main_model.fbx
    └── reference_images/
        ├── front_view.jpg
        └── side_view.jpg

Command:

python main.py create \
  --hdpc-path ./projects/site_collections.hdpc \
  --project-name "Archaeological Site Collections" \
  --short-code "ASC2025" \
  --input-dir ./test_data/subdirectory_mode_examples \
  --output-dir ./output/sites \
  --batch-entity subdirectory

Result: 3 records (one per subdirectory), each containing multiple files.

Mode 3: `hybrid`

Behavior: Combines both approaches:
- Files in the root directory become individual records
- Each subdirectory becomes one grouped record

When to use:
- Mixed datasets with both standalone artifacts and collections
- Data that is organized differently in different parts of the input directory

Example directory structure:

test_data/hybrid_mode_examples/
├── standalone_artifact_001.obj    → Record 1 (standalone)
├── standalone_artifact_001.mtl
├── standalone_reference_photo.png → Record 2 (standalone)
├── excavation_batch_alpha/        → Record 3 (grouped)
│   ├── fragment_001.obj
│   ├── fragment_002.obj
│   ├── fragment_003.obj
│   └── shared_textures/
│       ├── clay_texture.jpg
│       └── weathering_normal.jpg
└── excavation_batch_beta/         → Record 4 (grouped)
    ├── complete_vessel.glb
    ├── vessel_fragments.fbx
    └── documentation.txt

Command:

python main.py create \
  --hdpc-path ./projects/mixed_collection.hdpc \
  --project-name "Mixed Artifact Collection" \
  --short-code "MAC2025" \
  --input-dir ./test_data/hybrid_mode_examples \
  --output-dir ./output/mixed \
  --batch-entity hybrid

Result: 4 records: 2 standalone + 2 grouped subdirectories.

Bundling Strategies

Bundling determines how related files are grouped together within a record. This is especially important for 3D models, where one logical artifact might consist of multiple files (e.g., model.obj + model.mtl + textures).

Key Concepts

Primary Source File: The main file (e.g., .obj file)
Associated Files: Supporting files (MTL, textures, etc.)
Bundling Strategy: The rule for determining which files belong together

Strategy 1: `stem` (Default)

Rule: Files with exactly the same base filename (stem, i.e. the name without its extension) are bundled together.

Example:

temple_column.obj       ← Primary
temple_column.mtl       ← Associated (same stem)
temple_column.jpg       ← Associated (same stem)
ceramic_bowl.fbx        ← Separate bundle

When to use:
- Related files follow a standard naming convention and share the same name
- The most common scenario for 3D models exported from modeling software

Command:

python main.py create \
  --hdpc-path ./projects/stem_bundle.hdpc \
  --project-name "Stem Bundling Example" \
  --short-code "STEM2025" \
  --input-dir ./test_data/root_mode_examples/stem_bundling \
  --output-dir ./output/stem \
  --batch-entity root \
  --enable-bundling \
  --bundling-strategy stem
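The stem rule can be sketched in a few lines with pathlib. This is an illustration under the assumption that the stem is the name minus its last extension (PurePath.stem semantics, so archive.tar.gz has the stem archive.tar); `bundle_by_stem` is a hypothetical helper and does not pick a primary file.

```python
from collections import defaultdict
from pathlib import PurePath


def bundle_by_stem(filenames: list[str]) -> dict[str, list[str]]:
    """Group filenames whose base name (without the last extension) is identical.

    Hypothetical illustration of the stem strategy.
    """
    bundles: dict[str, list[str]] = defaultdict(list)
    for name in sorted(filenames):
        bundles[PurePath(name).stem].append(name)
    return dict(bundles)
```

Run on temple_column.obj, temple_column.mtl, temple_column.jpg, and ceramic_bowl.fbx, it returns two bundles: temple_column with three files and ceramic_bowl with one.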

Result: Files grouped by identical stems.

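Strategy 2: `pattern`

Rule: Files matching a regex pattern are bundled together under a common identifier extracted by the pattern.

Example pattern: (.+?)_(Section|Column)_[A-Z0-9]+ (the pattern used in the command below). The text captured by (.+?) becomes the bundle name for files like: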

Athens_Temple_Section_A.obj    → Bundle: "Athens_Temple"
Athens_Temple_Section_B.obj    → Bundle: "Athens_Temple"
Athens_Temple_materials.mtl    → Bundle: "Athens_Temple"
Rome_Forum_Column_01.obj       → Bundle: "Rome_Forum"
Rome_Forum_Column_02.obj       → Bundle: "Rome_Forum"
Rome_Forum_materials.mtl       → Bundle: "Rome_Forum"

When to use:
- Complex naming schemes with prefixes or suffixes
- Multi-part models (e.g., large buildings split into sections)
- Grouping that requires custom logic

Command:

python main.py create \
  --hdpc-path ./projects/pattern_bundle.hdpc \
  --project-name "Pattern Bundling Example" \
  --short-code "PATT2025" \
  --input-dir ./test_data/bundling_strategies/pattern_matching \
  --output-dir ./output/pattern \
  --batch-entity root \
  --enable-bundling \
  --bundling-strategy pattern \
  --bundling-pattern "(.+?)_(Section|Column)_[A-Z0-9]+"
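A sketch of regex-based grouping, assuming the pattern is matched from the start of each file's stem and that its first capture group is the bundle identifier (both assumptions about HDP). Note that Athens_Temple_materials.mtl and Rome_Forum_materials.mtl do not match this regex at all; the example above shows HDP assigning them to the matching bundles, a step this sketch does not reproduce, so it returns them as unmatched. `bundle_by_pattern` is a hypothetical name.

```python
import re
from collections import defaultdict
from pathlib import PurePath


def bundle_by_pattern(
    filenames: list[str], pattern: str
) -> tuple[dict[str, list[str]], list[str]]:
    """Group filenames by the first capture group of a regex matched against each stem.

    Hypothetical illustration; returns (bundles, unmatched filenames).
    """
    regex = re.compile(pattern)
    bundles: dict[str, list[str]] = defaultdict(list)
    unmatched: list[str] = []
    for name in sorted(filenames):
        # re.match anchors at the start of the stem.
        match = regex.match(PurePath(name).stem)
        if match:
            bundles[match.group(1)].append(name)
        else:
            unmatched.append(name)
    return dict(bundles), unmatched
```

With the six example files and the pattern above, the sketch yields the bundles Athens_Temple and Rome_Forum (two OBJ files each) and reports the two _materials.mtl files as unmatched.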

Result: Files grouped by extracted identifier from regex pattern.

Strategy 3: `prefix_suffix`

Rule: Remove the specified prefix and/or suffix patterns from each filename, then bundle files by the resulting core name.

Example with prefix [vn]\d+_ (version prefixes such as v1_ and inventory prefixes such as n001_) and suffix _(hiRes|lowRes):

v1_pottery_fragment.obj       → Core: "pottery_fragment"
v2_pottery_fragment.obj       → Core: "pottery_fragment"
v3_pottery_fragment.obj       → Core: "pottery_fragment"
pottery_fragment.mtl          → Core: "pottery_fragment"

n001_vase_hiRes.obj           → Core: "vase"
n001_vase_lowRes.obj          → Core: "vase"
n001_vase.mtl                 → Core: "vase"

When to use:
- Version numbers at the start of filenames
- Resolution indicators (hiRes/lowRes)
- Inventory prefixes (n001_, n002_)

Command (prefix removal):

python main.py create \
  --hdpc-path ./projects/prefix_bundle.hdpc \
  --project-name "Prefix Removal Example" \
  --short-code "PREF2025" \
  --input-dir ./test_data/bundling_strategies/prefix_removal \
  --output-dir ./output/prefix \
  --batch-entity root \
  --enable-bundling \
  --bundling-strategy prefix_suffix \
  --bundling-prefix "v\d+_"

Command (suffix removal):

python main.py create \
  --hdpc-path ./projects/suffix_bundle.hdpc \
  --project-name "Suffix Removal Example" \
  --short-code "SUFF2025" \
  --input-dir ./test_data/bundling_strategies/suffix_removal \
  --output-dir ./output/suffix \
  --batch-entity root \
  --enable-bundling \
  --bundling-strategy prefix_suffix \
  --bundling-suffix "_(edge|obverse|reverse)_scan"
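A sketch of prefix/suffix stripping, assuming the prefix regex is anchored to the start of the stem, the suffix regex to its end, and each is removed at most once. `core_name` and `bundle_by_prefix_suffix` are hypothetical names, and HDP's exact matching rules may differ.

```python
import re
from collections import defaultdict
from pathlib import PurePath
from typing import Optional


def core_name(stem: str, prefix: Optional[str] = None, suffix: Optional[str] = None) -> str:
    """Strip a prefix regex from the start and a suffix regex from the end of a stem.

    Hypothetical helper; the anchoring and single removal are assumptions.
    """
    if prefix:
        stem = re.sub(f"^(?:{prefix})", "", stem, count=1)
    if suffix:
        stem = re.sub(f"(?:{suffix})$", "", stem, count=1)
    return stem


def bundle_by_prefix_suffix(
    filenames: list[str], prefix: Optional[str] = None, suffix: Optional[str] = None
) -> dict[str, list[str]]:
    """Group filenames by the core name left after prefix/suffix removal."""
    bundles: dict[str, list[str]] = defaultdict(list)
    for name in sorted(filenames):
        bundles[core_name(PurePath(name).stem, prefix, suffix)].append(name)
    return dict(bundles)
```

For instance, core_name("n001_vase_hiRes", r"[vn]\d+_", r"_(hiRes|lowRes)") returns "vase", and the four coin files bundle under "coin" with the suffix pattern from the command above.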

Result: Files grouped after removing prefix/suffix patterns.

Strategy 4: `core_identifier`

Rule: Extract a core identifier such as site042 or site093 from each filename and bundle files by it, ignoring the descriptive suffix that follows (e.g., _structure_detail, _artifact_complete).

Example:

site042_structure_detail.obj   → Bundle: "site042"
site042_structure_main.fbx     → Bundle: "site042"
site093_artifact_complete.glb  → Bundle: "site093"
site093_artifact_fragment.obj  → Bundle: "site093"

When to use:
- Site or excavation numbers used as filename prefixes
- Catalog identifiers embedded in filenames
- Grouping by an administrative code

Command:

python main.py create \
  --hdpc-path ./projects/core_id_bundle.hdpc \
  --project-name "Core Identifier Example" \
  --short-code "CORE2025" \
  --input-dir ./test_data/bundling_strategies/core_identifier \
  --output-dir ./output/core_id \
  --batch-entity root \
  --enable-bundling \
  --bundling-strategy core_identifier
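This guide does not spell out the exact rule the core_identifier strategy uses, so the sketch below assumes a hypothetical one: a run of letters followed by digits at the start of the stem (site042). Files without such an identifier fall back to their own stem. Treat `CORE_ID`, `core_identifier`, and `bundle_by_core_identifier` as illustrations only, not HDP's implementation.

```python
import re
from collections import defaultdict
from pathlib import PurePath
from typing import Optional

# Hypothetical rule: letters followed by digits at the start of the stem (e.g. "site042").
# HDP's actual core_identifier rule may differ.
CORE_ID = re.compile(r"^([A-Za-z]+\d+)")


def core_identifier(filename: str) -> Optional[str]:
    """Return the leading identifier of a filename, or None if there is none."""
    match = CORE_ID.match(PurePath(filename).stem)
    return match.group(1) if match else None


def bundle_by_core_identifier(filenames: list[str]) -> dict[str, list[str]]:
    """Group filenames by their core identifier, falling back to the stem."""
    bundles: dict[str, list[str]] = defaultdict(list)
    for name in sorted(filenames):
        key = core_identifier(name) or PurePath(name).stem
        bundles[key].append(name)
    return dict(bundles)
```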

Result: Files grouped by extracted core identifier (e.g., site042).

3D Model-Specific Options

`--add-mtl` / `--no-add-mtl`

Purpose: Automatically scan for and include .mtl (material) files associated with .obj files.

Default: True (enabled)

Behavior:
- When an .obj file is found, the scanner looks for a corresponding .mtl file with the same stem
- Example: roman_statue.obj → scans for roman_statue.mtl

When to disable:
- The OBJ files don't use materials
- MTL files are stored separately or managed differently

Command (enabled):

python main.py create \
  --hdpc-path ./projects/with_mtl.hdpc \
  --project-name "OBJ with Materials" \
  --short-code "MTL2025" \
  --input-dir ./test_data/complex_dependencies/obj_with_dependencies \
  --output-dir ./output/with_mtl \
  --add-mtl

Command (disabled):

python main.py create \
  --hdpc-path ./projects/no_mtl.hdpc \
  --project-name "OBJ without Materials" \
  --short-code "NOMTL2025" \
  --input-dir ./test_data/root_mode_examples/no_bundling \
  --output-dir ./output/no_mtl \
  --no-add-mtl
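The same-stem lookup that --add-mtl describes amounts to swapping the extension and checking for the file. A minimal sketch, with `find_companion_mtl` as a hypothetical helper; the match is exact, so on case-sensitive file systems ROMAN_STATUE.MTL would not be found for roman_statue.obj.

```python
from pathlib import Path
from typing import Optional


def find_companion_mtl(obj_path: str) -> Optional[Path]:
    """Return the .mtl file with the same stem as obj_path in the same folder, if any.

    Hypothetical helper mirroring the documented --add-mtl rule; exact-case match only.
    """
    candidate = Path(obj_path).with_suffix(".mtl")
    return candidate if candidate.is_file() else None
```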

`--add-textures` / `--no-add-textures`

Purpose: Automatically scan for and include texture image files referenced in .mtl files.

Default: True (enabled)

Behavior:
- Parses .mtl files to extract texture references (e.g., map_Kd marble_diffuse.jpg)
- Searches for these texture files in:
  - the same directory as the MTL file
  - subdirectories (e.g., textures/, Materials/)
  - additional paths specified with --texture-paths

Supported texture maps:
- Diffuse: map_Kd
- Specular: map_Ks
- Bump/normal: map_Bump, bump
- Specular exponent (shininess, often used as a roughness proxy): map_Ns
- Ambient: map_Ka

When to disable:
- Textures are stored in a separate archive
- Texture files are too large or not needed for publication

Command (enabled with custom search paths):

python main.py create \
  --hdpc-path ./projects/with_textures.hdpc \
  --project-name "OBJ with Textures" \
  --short-code "TEX2025" \
  --input-dir ./test_data/complex_dependencies/obj_with_dependencies \
  --output-dir ./output/with_textures \
  --add-textures \
  --texture-paths ./additional_textures ./shared_materials
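The MTL parsing step can be approximated as below. Assumptions: only the statements listed above are recognized, keywords are compared case-insensitively (exporters write both map_Bump and map_bump), and the filename is the last token on the line, because the MTL format allows options such as -bm 0.5 before it. Filenames containing spaces are therefore not handled. `texture_references` is a hypothetical helper.

```python
# Texture statements listed in this guide, lower-cased for case-insensitive matching.
TEXTURE_KEYWORDS = {"map_kd", "map_ks", "map_bump", "bump", "map_ns", "map_ka"}


def texture_references(mtl_text: str) -> list[str]:
    """Return the unique texture filenames referenced in MTL file contents, in order.

    Hypothetical helper; takes the last token of each texture statement as the filename.
    """
    refs: list[str] = []
    for raw_line in mtl_text.splitlines():
        tokens = raw_line.split()
        if len(tokens) >= 2 and tokens[0].lower() in TEXTURE_KEYWORDS:
            refs.append(tokens[-1])
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(refs))
```

To parse a file on disk, pass its contents, e.g. texture_references(Path("site_overview.mtl").read_text()) after importing Path from pathlib.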

`--archive-textures`

Purpose: Archive texture subdirectories into ZIP files for more efficient storage and upload.

Default: False (disabled)

Behavior:
- When a texture subdirectory is detected (e.g., textures/, Materials/), it is compressed into a .zip archive
- The archive is included as a child file in the file hierarchy
- The original texture files remain tracked in the project, but are packaged inside the archive

When to use:
- Large texture collections (many small files)
- Zenodo uploads (fewer files mean faster uploads)
- Textures already grouped into dedicated folders

Example directory:

archaelogical_site_001/
├── site_overview.obj
├── site_overview.mtl
└── textures/              ← Will be archived to textures.zip
    ├── stone_texture.jpg
    ├── wood_normal.jpg
    └── roof.jpg

Command:

python main.py create \
  --hdpc-path ./projects/archived_textures.hdpc \
  --project-name "Archived Textures Example" \
  --short-code "ARCH2025" \
  --input-dir ./test_data/subdirectory_mode_examples \
  --output-dir ./output/archived \
  --batch-entity subdirectory \
  --add-textures \
  --archive-textures

Result: textures/ directory archived to textures.zip in the file hierarchy.
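A sketch of the archiving step using shutil.make_archive from Python's standard library. It writes textures.zip next to the folder, with textures/ as the top-level entry inside the archive. Where HDP actually places the archive is an assumption here, and `archive_texture_dir` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def archive_texture_dir(texture_dir: str) -> Path:
    """Zip a texture folder into <folder>.zip beside it and return the archive path.

    Hypothetical helper; archive placement is an assumption.
    """
    folder = Path(texture_dir).resolve()
    if not folder.is_dir():
        raise NotADirectoryError(str(folder))
    # make_archive appends ".zip" to base_name. root_dir/base_dir make the folder
    # name the top-level entry (e.g. textures/stone_texture.jpg).
    archive = shutil.make_archive(
        base_name=str(folder),
        format="zip",
        root_dir=str(folder.parent),
        base_dir=folder.name,
    )
    return Path(archive)
```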

`--texture-paths`

Purpose: Specify additional directories to search for texture files.

Default: None (only the default locations listed under --add-textures are searched)

Behavior:
- When a texture file is not found in the default locations, the scanner also searches these additional paths
- Useful for shared texture libraries or centralized material repositories

Command:

python main.py create \
  --hdpc-path ./projects/shared_textures.hdpc \
  --project-name "Shared Texture Library" \
  --short-code "SHARE2025" \
  --input-dir ./models \
  --output-dir ./output/shared \
  --add-textures \
  --texture-paths ./common_textures ./material_library
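The search order described for --add-textures and --texture-paths can be sketched as an ordered list of candidate locations: the MTL's own folder, then known texture subfolders, then each extra path. The subfolder names come from this guide's examples, and the exact order HDP uses is an assumption; `locate_texture` and `TEXTURE_SUBDIRS` are hypothetical.

```python
from pathlib import Path, PureWindowsPath
from typing import Iterable, Optional

# Subfolder names taken from this guide's examples; HDP's real list is an assumption.
TEXTURE_SUBDIRS = ("textures", "Materials")


def locate_texture(
    reference: str, mtl_dir: str, extra_paths: Iterable[str] = ()
) -> Optional[Path]:
    """Resolve a texture reference by checking candidate folders in order.

    Hypothetical helper: default locations first, then each --texture-paths entry.
    """
    # PureWindowsPath accepts both "/" and "\" separators, so this strips any
    # folder part the exporter wrote into the MTL file.
    name = PureWindowsPath(reference).name
    base = Path(mtl_dir)
    candidates = [base / reference, base / name]
    candidates += [base / sub / name for sub in TEXTURE_SUBDIRS]
    candidates += [Path(extra) / name for extra in extra_paths]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```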

Practical Examples

Example 1: Simple Individual Artifacts (Root Mode)

Scenario: 4 individual artifact scans, each as separate files, no bundling needed.

Directory:

root_mode_examples/no_bundling/
├── artifact_photo.png
├── building_scan.glb
├── statue_model.obj
└── terrain_data.fbx

Command:

python main.py create \
  --hdpc-path ./projects/simple_artifacts.hdpc \
  --project-name "Simple Individual Artifacts" \
  --short-code "SIA2025" \
  --input-dir ./test_data/root_mode_examples/no_bundling \
  --output-dir ./output/simple_artifacts \
  --batch-entity root

Expected Result:
- 4 separate records created
- Each file becomes its own publishable unit
- No bundling or complex dependencies

Example 2: OBJ Models with MTL and Textures (Stem Bundling)

Scenario: Standard 3D model workflow with OBJ files, their MTL materials, and texture images.

Directory:

root_mode_examples/stem_bundling/
├── temple_column.obj
├── temple_column.mtl
├── temple_column.jpg
├── ceramic_bowl.fbx
└── ancient_tablet.glb

Command:

python main.py create \
  --hdpc-path ./projects/obj_with_materials.hdpc \
  --project-name "OBJ Models with Materials" \
  --short-code "OBJMAT2025" \
  --input-dir ./test_data/root_mode_examples/stem_bundling \
  --output-dir ./output/obj_materials \
  --batch-entity root \
  --enable-bundling \
  --bundling-strategy stem \
  --add-mtl \
  --add-textures

Expected Result:
- Record 1: temple_column (OBJ + MTL + JPG bundled together)
- Record 2: ceramic_bowl (FBX standalone)
- Record 3: ancient_tablet (GLB standalone)

Example 3: Complex OBJ with Multiple Textures (Subdirectory Mode)

Scenario: One artifact per subdirectory, with texture files in a separate subfolder.

Directory:

subdirectory_mode_examples/archaelogical_site_001/
├── site_overview.obj
├── site_overview.mtl
├── excavation_photo_001.png
├── excavation_photo_002.png
└── textures/
    ├── stone_texture.jpg
    ├── wood_normal.jpg
    └── roof.jpg

Command:

python main.py create \
  --hdpc-path ./projects/complex_site.hdpc \
  --project-name "Archaeological Site with Textures" \
  --short-code "ASTEXT2025" \
  --input-dir ./test_data/subdirectory_mode_examples \
  --output-dir ./output/complex_site \
  --batch-entity subdirectory \
  --add-mtl \
  --add-textures \
  --archive-textures

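Expected Result:
- Record 1: archaelogical_site_001, containing:
  - Primary: site_overview.obj
  - Associated: site_overview.mtl
  - Associated: excavation_photo_001.png, excavation_photo_002.png
  - Archived: textures.zip (containing all 3 texture files)
- The other subdirectories of subdirectory_mode_examples (archaelogical_site_002 and museum_collection_item_045) each become a record of their own as well.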

Example 4: Pattern Bundling for Multi-Part Models

Scenario: Large building model split into multiple sections (Section A, Section B), sharing common materials.

Directory:

bundling_strategies/pattern_matching/
├── Athens_Temple_Section_A.obj
├── Athens_Temple_Section_B.obj
├── Athens_Temple_materials.mtl
├── Rome_Forum_Column_01.obj
├── Rome_Forum_Column_02.obj
└── Rome_Forum_materials.mtl

Command:

python main.py create \
  --hdpc-path ./projects/multipart_models.hdpc \
  --project-name "Multi-Part Building Models" \
  --short-code "MULTI2025" \
  --input-dir ./test_data/bundling_strategies/pattern_matching \
  --output-dir ./output/multipart \
  --batch-entity root \
  --enable-bundling \
  --bundling-strategy pattern \
  --bundling-pattern "(.+?)_(Section|Column)_[A-Z0-9]+"

Expected Result:
- Record 1: Athens_Temple (Section A OBJ + Section B OBJ + materials MTL)
- Record 2: Rome_Forum (Column 01 OBJ + Column 02 OBJ + materials MTL)

Example 5: Versioned Scans (Prefix Removal)

Scenario: Multiple scan versions of the same artifact, with version prefixes v1_, v2_, v3_.

Directory:

bundling_strategies/prefix_removal/
├── v1_pottery_fragment.obj
├── v2_pottery_fragment.obj
├── v3_pottery_fragment.obj
└── pottery_fragment.mtl

Command:

python main.py create \
  --hdpc-path ./projects/versioned_scans.hdpc \
  --project-name "Versioned Pottery Scans" \
  --short-code "VERS2025" \
  --input-dir ./test_data/bundling_strategies/prefix_removal \
  --output-dir ./output/versioned \
  --batch-entity root \
  --enable-bundling \
  --bundling-strategy prefix_suffix \
  --bundling-prefix "v\d+_"

Expected Result:
- Record 1: pottery_fragment (all 3 OBJ versions + shared MTL bundled together)

Example 6: Multi-View Scans (Suffix Removal)

Scenario: Each artifact is scanned from several views (obverse, reverse, edge), and the view is encoded as a filename suffix.

Directory:

bundling_strategies/suffix_removal/
├── coin_obverse_scan.obj
├── coin_reverse_scan.obj
├── coin_edge_scan.obj
└── coin.mtl

Command:

python main.py create \
  --hdpc-path ./projects/multiview_scans.hdpc \
  --project-name "Multi-View Coin Scans" \
  --short-code "COIN2025" \
  --input-dir ./test_data/bundling_strategies/suffix_removal \
  --output-dir ./output/multiview \
  --batch-entity root \
  --enable-bundling \
  --bundling-strategy prefix_suffix \
  --bundling-suffix "_(obverse|reverse|edge)_scan"

Expected Result:
- Record 1: coin (obverse + reverse + edge OBJ scans + shared MTL)

Example 7: Hybrid Mode (Mixed Organization)

Scenario: Root-level standalone artifacts + subdirectories for grouped collections.

Directory:

hybrid_mode_examples/
├── standalone_artifact_001.obj
├── standalone_artifact_001.mtl
├── standalone_reference_photo.png
├── excavation_batch_alpha/
│   ├── fragment_001.obj
│   ├── fragment_002.obj
│   ├── fragment_003.obj
│   └── shared_textures/
│       ├── clay_texture.jpg
│       └── weathering_normal.jpg
└── excavation_batch_beta/
    ├── complete_vessel.glb
    ├── vessel_fragments.fbx
    └── documentation.txt

Command:

python main.py create \
  --hdpc-path ./projects/hybrid_collection.hdpc \
  --project-name "Hybrid Artifact Collection" \
  --short-code "HYB2025" \
  --input-dir ./test_data/hybrid_mode_examples \
  --output-dir ./output/hybrid \
  --batch-entity hybrid \
  --enable-bundling \
  --bundling-strategy stem \
  --add-mtl \
  --add-textures \
  --archive-textures

Expected Result:
- Record 1: standalone_artifact_001 (OBJ + MTL bundled)
- Record 2: standalone_reference_photo (standalone PNG)
- Record 3: excavation_batch_alpha (3 OBJ fragments + archived textures)
- Record 4: excavation_batch_beta (GLB + FBX + TXT)

Troubleshooting

Issue: "No files found matching extensions"

Cause: The specified extensions don't match any files in the input directory.

Solution:
1. Verify that the --input-dir path is correct
2. Check that the files have the expected extensions (e.g., .obj, .glb)
3. Specify extensions manually with --extensions .obj .mtl .png

Example:

python main.py create \
  --hdpc-path ./projects/custom_ext.hdpc \
  --project-name "Custom Extensions" \
  --short-code "CUST2025" \
  --input-dir ./my_data \
  --output-dir ./output \
  --extensions .obj .mtl .fbx .glb .png .jpg
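Before re-running with --extensions, it can help to see which extensions the input directory actually contains. The sketch below counts them recursively and reports how many files fall into the 3D Model extension set listed earlier in this guide. Extensions are counted exactly as written so that case mismatches such as .OBJ stand out; whether HDP itself matches extensions case-insensitively is not stated here. `extension_report` is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path

# Extensions this guide lists for the default "3D Model" modality.
MODEL_EXTENSIONS = {".obj", ".mtl", ".glb", ".gltf", ".fbx", ".ply", ".stl"}


def extension_report(input_dir: str) -> tuple[Counter, int]:
    """Count file extensions under input_dir and how many files match MODEL_EXTENSIONS.

    Hypothetical diagnostic; counts keep the original case, the match count ignores it.
    """
    counts = Counter(
        p.suffix or "(none)" for p in Path(input_dir).rglob("*") if p.is_file()
    )
    matching = sum(n for ext, n in counts.items() if ext.lower() in MODEL_EXTENSIONS)
    return counts, matching
```

For example, counts, n = extension_report("./my_data") followed by print(counts.most_common(10), n) shows the ten most common extensions and the number of model files found.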

Issue: "MTL file not found for OBJ"

Cause: An OBJ file exists but its corresponding MTL file is missing.

Solution:
1. Verify that the MTL file has the same stem as the OBJ (e.g., model.obj → model.mtl)
2. Check whether the MTL file is in a different directory
3. Use --no-add-mtl if materials are not needed

Issue: "Texture files not found"

Cause: The MTL file references textures that don't exist or are in a different location.

Solution:
1. Check the MTL file contents for texture paths
2. Ensure the texture files are in the same directory as the MTL file or in a subdirectory
3. Use --texture-paths to specify additional search locations:

python main.py create \
  --hdpc-path ./projects/missing_tex.hdpc \
  --project-name "Missing Textures Fix" \
  --short-code "TEX2025" \
  --input-dir ./models \
  --output-dir ./output \
  --add-textures \
  --texture-paths ./external_textures ./shared_materials

Issue: "Files not bundled correctly"

Cause: The bundling strategy doesn't match the filename convention.

Solution:
1. Verify that the filenames match the expected pattern
2. Try a different bundling strategy:
   - stem: exact filename match
   - pattern: custom regex pattern
   - prefix_suffix: remove prefixes/suffixes
   - core_identifier: group by an identifier such as site042
3. Test your regex with --bundling-pattern on a throwaway project file before running it on the full collection

Example (debugging pattern):

# Test pattern bundling on a throwaway project file
python main.py create \
  --hdpc-path ./projects/test_pattern.hdpc \
  --project-name "Test Pattern Bundling" \
  --short-code "TESTPATT" \
  --input-dir ./test_data \
  --output-dir ./output \
  --enable-bundling \
  --bundling-strategy pattern \
  --bundling-pattern "(.+?)_Section_[A-Z]"
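A regex can also be checked offline before spending a full run on it. The sketch below reports three common problems: the pattern does not compile, it has no capture group to serve as the bundle identifier, or some sample stems do not match from the start. That HDP takes the identifier from the first capture group and matches from the start of the stem are assumptions; `check_bundling_pattern` is a hypothetical helper.

```python
import re
from typing import Optional


def check_bundling_pattern(pattern: str, sample_stems: list[str]) -> Optional[str]:
    """Return an error message for a bundling regex, or None if it looks usable.

    Hypothetical pre-flight check; assumes the first capture group is the identifier.
    """
    try:
        regex = re.compile(pattern)
    except re.error as exc:
        return f"invalid regex: {exc}"
    if regex.groups < 1:
        return "pattern has no capture group to use as the bundle identifier"
    unmatched = [stem for stem in sample_stems if not regex.match(stem)]
    if unmatched:
        return f"{len(unmatched)} stem(s) do not match, e.g. {unmatched[0]!r}"
    return None
```

For example, checking (.+?)_Section_[A-Z] against Rome_Forum_Column_01 reports one non-matching stem, which explains why those files would not bundle with that pattern.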

Issue: "Too many small files in record"

Cause: Large texture directories creating many individual file entries.

Solution: Use --archive-textures to compress texture folders:

python main.py create \
  --hdpc-path ./projects/archived.hdpc \
  --project-name "Archived Texture Folders" \
  --short-code "ARCH2025" \
  --input-dir ./large_textures \
  --output-dir ./output \
  --add-textures \
  --archive-textures

Issue: "Project already exists"

Cause: An .hdpc file already exists at the specified path.

Solution:
1. Delete or rename the existing .hdpc file
2. Choose a different --hdpc-path
3. Use the existing project with other commands (e.g., upload, process)

Advanced Tips

Tip 1: Test with Small Dataset First

Before processing large collections, test with a small subset:

# Create test subdirectory with a few files
mkdir -p ./test_subset
cp -r ./test_data/subdirectory_mode_examples/archaelogical_site_001 ./test_subset/

# Test command
python main.py create \
  --hdpc-path ./projects/test.hdpc \
  --project-name "Test Run" \
  --short-code "TEST" \
  --input-dir ./test_subset \
  --output-dir ./output/test \
  --batch-entity subdirectory

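Tip 2: Use Absolute Paths for Clarity

Relative paths are resolved against the directory the command is run from, so the same command can point at different data depending on where it is launched. Use absolute paths for production runs: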

python main.py create \
  --hdpc-path /full/path/to/projects/production.hdpc \
  --project-name "Production Collection" \
  --short-code "PROD2025" \
  --input-dir /full/path/to/source_data \
  --output-dir /full/path/to/output

Tip 3: Organize Output by Project

Create dedicated output directories for each project:

mkdir -p ./output/museum_pottery
python main.py create \
  --hdpc-path ./projects/museum_pottery.hdpc \
  --project-name "Museum Pottery Collection" \
  --short-code "MPC2025" \
  --input-dir ./source/pottery \
  --output-dir ./output/museum_pottery

Tip 4: Document Your Bundling Strategy

Record your bundling configuration in a text file kept alongside the project (for example, a README or bundling_config.txt):

# bundling_config.txt
Project: Archaeological Site Scans
Strategy: pattern
Pattern: (.+?)_Section_[A-Z0-9]+
Reasoning: Large building models split into lettered sections

Summary

The create command provides a powerful, flexible way to initialize Heritage Data Processor projects with comprehensive file scanning and validation. Key takeaways:

Batch Entity Mode determines how files are grouped into records
Bundling Strategy controls how related files are associated within records
3D Model Options enable automatic MTL and texture scanning
Test with small datasets before processing large collections
Choose bundling strategy based on your filename conventions

For further assistance, consult the test data examples or contact the development team.
