Test Data: Core Identifier Bundling Strategy

Purpose

This test data set demonstrates extracting a core pattern (such as a site number) from filenames in which it may appear with various prefixes and suffixes.

Processing Mode Configuration

  • Batch Entity: root
  • Bundle Congruent Patterns: Yes (checked)
  • Bundling Strategy: coreidentifier
  • Core Pattern: site\d+ (extracts site numbers like site042, site093)
  • Primary Source Extension: .obj

File Extensions to Select

  • .obj
  • .fbx
  • .glb

OBJ File Options

  • Add MTL Files: No (unchecked)
  • Add Texture Files: No (unchecked)

Expected Behavior

When scanned:

  • 2 Zenodo records will be created (site042 and site093)
  • The core identifier ("site042" or "site093") is extracted from various filename formats
  • Files with the same site number are grouped regardless of surrounding text

Bundle 1 (site042):

  • site042_structure_main.fbx (source)
  • site042_structure_detail.obj (source)

Bundle 2 (site093):

  • site093_artifact_complete.glb (source)
  • site093_artifact_fragment.obj (source)
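
The grouping above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's actual implementation: the function name bundle_by_core_identifier is hypothetical, and skipping files that do not match the core pattern is an assumption.

```python
import re
from collections import defaultdict


def bundle_by_core_identifier(filenames, core_pattern):
    """Group filenames by the first match of core_pattern.

    Hypothetical helper: files without a match are skipped here,
    which is an assumption about how unmatched files are handled.
    """
    regex = re.compile(core_pattern)
    bundles = defaultdict(list)
    for name in filenames:
        match = regex.search(name)  # search() finds the pattern anywhere in the name
        if match:
            bundles[match.group(0)].append(name)
    return dict(bundles)


files = [
    "site042_structure_main.fbx",
    "site042_structure_detail.obj",
    "site093_artifact_complete.glb",
    "site093_artifact_fragment.obj",
]

bundles = bundle_by_core_identifier(files, r"site\d+")
for core_id, members in bundles.items():
    print(core_id, members)
```

With the four test files, this yields two groups keyed by site042 and site093, matching the two bundles listed above.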

File Count

  • Total files: 4
  • Primary sources: 4 (mixed formats)
  • Dependencies: 0

Pattern Explanation

  • Core Pattern: site\d+
  • site - Literal "site" text
  • \d+ - One or more digits
  • Result: Extracts "site042" from "site042_structure_main.fbx"
  • Matches anywhere in the filename, regardless of surrounding text
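
To see what "matches anywhere" means in practice, the sketch below contrasts Python's re.search (which scans the whole string) with re.match (which only matches at the start). It assumes the tool behaves like an unanchored search; the helper name extract_core_id and the filenames scan_site093_v2.obj and overview.glb are made up for illustration.

```python
import re

CORE_PATTERN = re.compile(r"site\d+")


def extract_core_id(filename):
    """Return the first core identifier found anywhere in filename, or None."""
    match = CORE_PATTERN.search(filename)
    return match.group(0) if match else None


# Identifier at the start of the name (from the test data).
print(extract_core_id("site042_structure_main.fbx"))

# Identifier embedded after a prefix (hypothetical filename).
print(extract_core_id("scan_site093_v2.obj"))

# No identifier at all (hypothetical filename).
print(extract_core_id("overview.glb"))

# An anchored match would miss the embedded identifier.
anchored = CORE_PATTERN.match("scan_site093_v2.obj")
print(anchored)
```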

Alternative Core Patterns to Try

  • ID\d{6} - Extracts 6-digit IDs (ID000123)
  • specimen_[A-Z]+ - Extracts specimen codes (specimen_ABC)
  • \d{4}-\d{2} - Extracts date patterns (2024-03)
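
A quick way to try these alternatives before configuring a scan is to run them against a few sample names. The filenames below are invented for illustration and are not part of the test data.

```python
import re

# Each alternative core pattern paired with a hypothetical filename.
examples = {
    r"ID\d{6}": "scan_ID000123_final.obj",
    r"specimen_[A-Z]+": "specimen_ABC_top.glb",
    r"\d{4}-\d{2}": "survey_2024-03_north.fbx",
}

results = {}
for pattern, filename in examples.items():
    match = re.search(pattern, filename)
    results[pattern] = match.group(0) if match else None
    print(f"{pattern!r} on {filename!r} -> {results[pattern]!r}")
```

Because these patterns are unanchored, \d{4}-\d{2} will also match inside a longer run of digits (for example, it finds "2345-67" in "12345-678"), so it is worth checking a pattern against real filenames first.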

Use Case

The core identifier strategy is ideal when:

  • A specific identifier appears consistently across related files
  • The identifier may be embedded in different filename structures
  • Surrounding text varies but the core ID remains the same