Update Script Reference

The Heritage Data Processor Update Script (update_hdp.sh) safely updates the repository and dependencies with extensive validation, backup, and rollback capabilities.

Overview

This production-ready update script provides automated updates with safety features including dry-run mode, automatic backups, intelligent file cleanup strategies, and comprehensive error recovery.


Usage

Basic Usage

./update_hdp.sh

Updates to the latest version, prompting interactively for version selection and file cleanup decisions.

Command-Line Options

./update_hdp.sh [OPTIONS]

Options:

  • --dry-run: Preview changes without applying them
  • --non-interactive: Skip interactive prompts, use safe defaults
  • --branch BRANCH: Update to a specific branch (default: main)
  • --version TAG: Update to a specific version tag
  • --latest: Update to the latest version (default behavior)
  • --cleanup: Aggressively remove files not in target version
  • --help: Display usage information and exit
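
The options above could be parsed with a plain while/case loop. The following is a minimal sketch, not the script's actual parser: the function name parse_args and the variable names (DRY_RUN, INTERACTIVE, BRANCH, VERSION, CLEANUP, SHOW_HELP) are assumptions made for illustration.

```shell
# Hypothetical option parser mirroring the documented flags.
parse_args() {
  DRY_RUN=false
  INTERACTIVE=true
  BRANCH=main        # documented default for --branch
  VERSION=""         # empty means "latest"
  CLEANUP=false
  SHOW_HELP=false
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --dry-run)         DRY_RUN=true ;;
      --non-interactive) INTERACTIVE=false ;;
      --branch)
        [ "$#" -ge 2 ] || { echo "--branch requires a value" >&2; return 2; }
        BRANCH=$2; shift ;;
      --version)
        [ "$#" -ge 2 ] || { echo "--version requires a value" >&2; return 2; }
        VERSION=$2; shift ;;
      --latest)          VERSION="" ;;
      --cleanup)         CLEANUP=true ;;
      --help)            SHOW_HELP=true ;;
      *)
        echo "Unknown option: $1" >&2; return 2 ;;
    esac
    shift
  done
}
```

Calling parse_args "$@" at the top of a script leaves the parsed values in the variables above. Unknown options and missing values return a non-zero status so the caller can print usage and stop.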

Usage Examples

Example 1: Standard Update

./update_hdp.sh

Interactive update to latest version with file preservation prompts.


Example 2: Specific Version

./update_hdp.sh --version v1.2.5

Updates directly to version 1.2.5.


Example 3: Automated Update

./update_hdp.sh --non-interactive --cleanup

Fully automated update with aggressive cleanup for CI/CD.


Example 4: Preview Changes

./update_hdp.sh --dry-run

Preview all changes without applying them.


Example 5: Branch Update

./update_hdp.sh --branch development

Updates to development branch instead of tagged release.


Requirements

Required Tools

  • git: Version control operations
  • uv: Python package management

Optional Tools

  • npm, pnpm, or yarn: Node.js package management (detected automatically)

System Resources

  • Disk Space: Minimum 500 MB free space
  • Memory: At least 100 MB available (warning if less)
  • Network: Internet connectivity for remote repository access

Update Process

Pre-Flight Checks

Comprehensive validation before any changes are made.

System Validation:

  • Tool Availability: Verifies git and uv are installed
  • Git Repository: Confirms running inside a git repository with commits
  • Disk Space: Ensures at least 500 MB available
  • Memory: Checks available RAM (warns if low)
  • Git Configuration: Validates user.name and user.email are set
  • Network: Tests connectivity to remote repository

Git State Validation:

  • Merge/Rebase: Detects ongoing merge or rebase operations
  • Remote: Verifies origin remote is configured
  • Connectivity: Tests git ls-remote to confirm repository access
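
A few of these checks can be sketched with standard tools. The helper names below (require_tool, free_mb, has_min_free_mb, merge_or_rebase_in_progress) are hypothetical, and the real script may implement its checks differently.

```shell
# Succeeds if the named tool is on PATH.
require_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Prints the free space, in whole MB, of the filesystem holding the given path.
# df -P guarantees one line per filesystem; column 4 is available 1K blocks.
free_mb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeeds if at least $2 MB are free at path $1.
has_min_free_mb() {
  [ "$(free_mb "$1")" -ge "$2" ]
}

# Succeeds if a merge or rebase is underway in the current repository.
# Git records these states as marker files or directories inside the git dir.
merge_or_rebase_in_progress() {
  git_dir=$(git rev-parse --git-dir 2>/dev/null) || return 1
  [ -f "$git_dir/MERGE_HEAD" ] ||
    [ -d "$git_dir/rebase-merge" ] ||
    [ -d "$git_dir/rebase-apply" ]
}
```

For example, `require_tool git && require_tool uv && has_min_free_mb . 500` covers the tool and disk-space requirements listed above.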

Error Messages:

Git Not Configured:

🔴 Git user.name and user.email must be configured.
Run:
  git config --global user.name 'Your Name'
  git config --global user.email 'your.email@example.com'

Merge in Progress:

🔴 Merge in progress. Complete or abort the merge first:
  git merge --abort # to abort
  git merge --continue # to complete

Cannot Connect to Remote:

🔴 Cannot connect to remote repository.
Check:
 - Internet connectivity
 - VPN connection if required
 - SSH keys or credentials
 - Repository access permissions

Step 1: Version Selection

Fetches the available versions and lets the user select a target.

Process:

  1. Fetch Remote: Runs git fetch --all --tags --prune
  2. Current Version: Displays current tag or branch
  3. Available Versions: Lists tags sorted by semantic version
  4. Selection: Interactive or automatic based on mode
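
Listing tags in semantic-version order can be sketched as follows. This assumes tags carry a v prefix (as in the examples in this document) and that sort supports the -V version-sort extension (GNU coreutils and recent BSD sort); both function names are hypothetical.

```shell
# Sorts version strings read from stdin, newest first.
# -V compares numeric components numerically, so v1.10.0 ranks above v1.3.0.
sort_versions() {
  sort -rV
}

# Lists the repository's version tags, newest first.
list_version_tags() {
  git tag -l 'v*' | sort_versions
}
```

Git can also sort tags itself with `git tag --sort=-v:refname`, which avoids the dependency on sort -V.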

Interactive Selection:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Available Versions
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  0) Latest (main branch)
  1) v1.3.0
  2) v1.2.5
  3) v1.2.4
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Select version (0-15, default: 0):

Non-Interactive Mode:

Automatically selects the latest tagged version, or the tag given with --version.


Step 2: File Change Analysis

Analyzes which files will be added, modified, or deleted during update.

Analysis Process:

  1. Get Target Commit: Resolves target version to commit SHA
  2. Compare Trees: Uses git diff to compare current and target
  3. Categorize Changes:
       • Tracked files to delete (exist in current, not in target)
       • Untracked files (not in git, may be user data)
       • Files to modify (changed between versions)
       • Files to add (new in target version)
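
The categorization step can be sketched by reading the output of git diff --name-status, which prints a status letter (A, M, D), a tab, and the path. The function categorize_changes is a hypothetical helper; it uses the same +, ~ and - markers as the display format in this section and ignores other status codes such as renames.

```shell
# Reads `git diff --name-status` lines from stdin and prints one marked
# line per file: " + " added, " ~ " modified, " - [tracked] " deleted.
categorize_changes() {
  tab=$(printf '\t')
  while IFS="$tab" read -r status path _rest; do
    case "$status" in
      A) printf ' + %s\n' "$path" ;;
      M) printf ' ~ %s\n' "$path" ;;
      D) printf ' - [tracked] %s\n' "$path" ;;
      *) ;;  # renames, copies, etc. are not handled in this sketch
    esac
  done
}

# Example wiring (assumes TARGET_VERSION names a tag or branch):
#   git diff --name-status HEAD "$TARGET_VERSION" | categorize_changes
# Untracked files are not part of the diff; they can be listed with:
#   git ls-files --others --exclude-standard
```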

Display Format:

[INFO] Files that will be updated/added: 23

Modified files:
 ~ server_app/routes/zenodo.py
 ~ requirements.txt
 ~ package.json

New files:
 + server_app/routes/pipeline_manager.py
 + docs/api/pipeline.md

Deletion Warning:

⚠️ 15 file(s) would be affected by cleanup:
 - 3 tracked file(s) removed in target version
 - 12 untracked file(s) (not in git)

Files that would be deleted:
 - [tracked] old_module.py
 - [tracked] deprecated_script.sh
 - [untracked] my_config.yaml
 - [untracked] user_data.json

File Cleanup Strategies

Three strategies for handling files that don't exist in target version:

Strategy 1: Delete All Files (clean-all)

Removes all files not present in target version, including untracked files:

  • Enabled with --cleanup flag
  • Most aggressive cleanup
  • Suitable for fresh installs or CI environments

Strategy 2: Preserve Untracked Files

Removes tracked files deleted from target, but keeps untracked files:

  • Balances cleanup with data preservation
  • Removes obsolete tracked files
  • Protects user-created files

Strategy 3: Preserve All Files

Keeps all extra files regardless of status:

  • Default in non-interactive mode
  • Safest option for production
  • Prevents accidental data loss

Interactive Selection:

Options:
  1) Delete all files (clean update)
  2) Keep untracked files only (delete tracked removals)
  3) Keep all files (preserve everything)

Choose option (1-3, default: 3):
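
How the --cleanup flag and the menu answer map to a cleanup mode can be sketched as below. The function name resolve_cleanup_mode and its argument convention are assumptions; the mode names clean-all, preserve-untracked and preserve-all are the ones used elsewhere in this document.

```shell
# $1: "true" if --cleanup was passed, anything else otherwise
# $2: menu answer (1-3); empty or invalid input falls back to option 3
resolve_cleanup_mode() {
  cleanup_flag=$1
  choice=${2:-3}
  if [ "$cleanup_flag" = "true" ]; then
    echo clean-all
    return 0
  fi
  case "$choice" in
    1) echo clean-all ;;
    2) echo preserve-untracked ;;
    *) echo preserve-all ;;   # safest default, also used non-interactively
  esac
}
```

Defaulting to preserve-all means an empty or mistyped answer never deletes anything.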

Step 3: Backup Creation

Creates automatic backup of local changes before updating.

Stash Process:

  1. Detect Changes: Checks for unstaged, staged, and untracked changes
  2. Create Stash: Uses git stash push with descriptive message
  3. Verify Stash: Confirms stash was created successfully

Stash Naming:

Format: update-backup-{timestamp}

Example: update-backup-1698765432

Stash Strategy:

Depends on cleanup mode:

  • clean-all: Stashes including untracked files (-u flag)
  • preserve modes: Stashes tracked changes only
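
The naming and flag selection described above might look like the following sketch. The helper names are hypothetical, and date +%s (seconds since the epoch) is assumed to be the timestamp source because the example name uses an epoch-style number.

```shell
# Prints a stash name such as update-backup-1698765432.
backup_stash_name() {
  printf 'update-backup-%s\n' "$(date +%s)"
}

# Prints the extra `git stash push` flag for a cleanup mode:
# clean-all also stashes untracked files (-u); preserve modes add nothing.
stash_flags() {
  case "$1" in
    clean-all) printf '%s\n' '-u' ;;
    *)         printf '\n' ;;
  esac
}

# Creates the stash and verifies it exists; prints the stash name on success.
# Assumes the caller has already confirmed there are local changes, since
# `git stash push` creates nothing when the tree is clean.
create_backup_stash() {
  mode=$1
  name=$(backup_stash_name)
  # Unquoted on purpose: an empty flag must expand to no argument at all.
  git stash push $(stash_flags "$mode") -m "$name" >/dev/null || return 1
  git stash list | grep -q "$name" || return 1
  printf '%s\n' "$name"
}
```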

Dry-Run Mode:

[INFO] [DRY RUN] Would create stash: update-backup-1698765432

Step 4: Repository Update

Updates git repository to target version using appropriate strategy.

Update Strategies:

Clean Checkout (clean-all mode):

git checkout -f "$TARGET_VERSION"  # Force checkout
git clean -fd                       # Remove untracked files

Selective Cleanup (preserve-untracked mode):

git checkout "$TARGET_VERSION"     # Regular checkout
# Untracked files preserved automatically

Merge Checkout (preserve-all mode):

git checkout --merge "$TARGET_VERSION"  # Attempt merge

If the merge checkout fails, the script falls back to an alternative preservation method:

  1. Backup Files: Copies files that would be deleted to temp directory
  2. Checkout: Performs regular checkout
  3. Restore Files: Copies backed-up files back to their locations
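
The backup and restore steps of this fallback can be sketched with cp and a temporary directory. The function names backup_files and restore_files are hypothetical, and the sketch assumes paths are given relative to the repository root.

```shell
# Copies each existing file into $1, keeping its relative path.
backup_files() {
  dest=$1; shift
  for f in "$@"; do
    [ -f "$f" ] || continue
    mkdir -p "$dest/$(dirname "$f")"
    cp -p "$f" "$dest/$f"
  done
}

# Copies each backed-up file from $1 back to its original relative path.
restore_files() {
  src=$1; shift
  for f in "$@"; do
    [ -f "$src/$f" ] || continue
    mkdir -p "$(dirname "$f")"
    cp -p "$src/$f" "$f"
  done
}

# Example wiring around a checkout (TARGET_VERSION is assumed to be set):
#   tmp=$(mktemp -d)
#   backup_files "$tmp" my_config.yaml user_data.json
#   git checkout "$TARGET_VERSION"
#   restore_files "$tmp" my_config.yaml user_data.json
#   rm -rf "$tmp"
```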

Branch Handling:

If the target is a branch (not a tag), the script performs a pull after checkout:

git pull origin "$TARGET_VERSION"

Commit Information:

Displays current commit after update:

[DEBUG] Commit: abc123d (2025-10-21)

Step 5: Python Dependencies Update

Updates Python packages to match target version requirements.

Dependency Files:

Searches for dependency files in priority order:

  1. pyproject.toml (modern Python projects)
  2. requirements.txt (traditional approach)

Virtual Environment:

Searches for existing venv:

  • .venv (preferred)
  • venv (alternative)

If not found, creates .venv with Python 3.11:

uv venv .venv --python=3.11

Installation Commands:

For pyproject.toml:

uv sync

For requirements.txt:

uv pip install --python .venv -r requirements.txt
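
The lookup order for the dependency file and virtual environment can be sketched as two small helpers. Both function names are hypothetical; the priorities follow the lists above.

```shell
# Prints the dependency file to use in directory $1 (default: current dir),
# preferring pyproject.toml over requirements.txt. Fails if neither exists.
detect_dependency_file() {
  dir=${1:-.}
  if [ -f "$dir/pyproject.toml" ]; then
    echo pyproject.toml
  elif [ -f "$dir/requirements.txt" ]; then
    echo requirements.txt
  else
    return 1
  fi
}

# Prints the existing virtual environment directory, preferring .venv.
# Fails if none exists, in which case the caller would create .venv.
detect_venv() {
  dir=${1:-.}
  for candidate in .venv venv; do
    if [ -d "$dir/$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}
```

A caller could then branch on the result: uv sync for pyproject.toml, or the uv pip install command shown above for requirements.txt.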

Error Handling:

Provides detailed troubleshooting for common failures:

🔴 Failed to sync Python dependencies.
Common causes:
 - Package version conflicts
 - Network issues
 - Missing build dependencies

Check the log for details: update_hdp_20251021_140000.log

Try manually:
 uv sync --verbose

Step 6: Node.js Dependencies Update

Updates Node.js packages using detected package manager.

Package Manager Detection:

Automatic detection based on lock files:

  • pnpm-lock.yaml → pnpm
  • yarn.lock → yarn
  • package-lock.json → npm
  • package.json only → npm (with warning)
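
Lock-file detection of this kind can be sketched as below. The function name detect_package_manager is hypothetical, and the sketch assumes the list order above is also the priority order when several lock files are present.

```shell
# Prints the package manager for directory $1 (default: current dir).
# Fails if the directory has no package.json at all.
detect_package_manager() {
  dir=${1:-.}
  if [ -f "$dir/pnpm-lock.yaml" ]; then
    echo pnpm
  elif [ -f "$dir/yarn.lock" ]; then
    echo yarn
  elif [ -f "$dir/package-lock.json" ]; then
    echo npm
  elif [ -f "$dir/package.json" ]; then
    # No lock file: fall back to npm and warn on stderr.
    echo "No lock file found, defaulting to npm" >&2
    echo npm
  else
    return 1
  fi
}
```

The warning goes to stderr so that command substitution, as in `package_manager=$(detect_package_manager)`, captures only the manager name.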

Installation:

$package_manager install

Error Diagnostics:

🔴 Failed to install Node.js dependencies.
Common causes:
 - Network issues
 - Package version conflicts
 - Peer dependency issues
 - Registry authentication

Check the log: update_hdp_20251021_140000.log

Try manually:
 npm install --verbose

Step 7: Version Info Update

Updates metadata file tracking installation version.

File Location: .installed_version

Contents:

VERSION=v1.3.0
UPDATE_DATE=2025-10-21 14:00:00
COMMIT=abc123def456789

Purpose: Tracks which version is currently installed for troubleshooting.
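
Writing and reading a file in this KEY=value format can be sketched as follows. The function names are hypothetical; the date format string is chosen to match the UPDATE_DATE example above.

```shell
# Writes the version metadata file.
# $1: file path, $2: version label, $3: commit SHA
write_version_info() {
  file=$1; version=$2; commit=$3
  {
    printf 'VERSION=%s\n' "$version"
    printf 'UPDATE_DATE=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')"
    printf 'COMMIT=%s\n' "$commit"
  } > "$file"
}

# Prints the VERSION value stored in the file at $1.
read_installed_version() {
  sed -n 's/^VERSION=//p' "$1"
}

# Example wiring inside a repository:
#   write_version_info .installed_version "$TARGET_VERSION" "$(git rev-parse HEAD)"
```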


Step 8: Restore Stashed Changes

Attempts to restore previously stashed changes.

Restoration:

git stash pop

Conflict Handling:

If conflicts occur during stash pop, provides detailed resolution instructions:

═══════════════════════════════════════════════════════════
 MANUAL INTERVENTION REQUIRED
═══════════════════════════════════════════════════════════

Your stashed changes conflict with the updated code.
Your changes are safe in: update-backup-1698765432

To resolve:
 1. Check conflicts: git status
 2. Edit conflicting files and resolve markers (<<<<, ====, >>>>)
 3. Stage resolved files: git add <file>
 4. Test your changes
 5. Drop the stash: git stash drop

To abort and restore original state:
 git reset --hard
 git stash pop

Log file: update_hdp_20251021_140000.log
═══════════════════════════════════════════════════════════

No Stash:

If no changes were stashed, skips this step:

[INFO] Step 7: No stashed changes to restore

Rollback & Recovery

Automatic Rollback

On error, attempts to restore previous state:

cleanup_on_error() {
  info "Attempting to restore previous state..."
  if [ -n "$BACKUP_STASH" ]; then
    info "Restoring from backup stash: $BACKUP_STASH"
    git stash pop "stash@{0}" 2>/dev/null || true
  fi
}

Trap Registration:

The cleanup function is registered for script exit (EXIT) and for the interrupt (INT) and termination (TERM) signals:

trap cleanup_on_error EXIT INT TERM
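
An EXIT trap fires on every exit, including a successful one, so a handler usually inspects the exit status before rolling back. The sketch below shows that pattern. It is an illustration rather than the script's confirmed behaviour: handle_exit is a hypothetical wrapper, and cleanup_on_error here is a stand-in that only prints a message.

```shell
# Stand-in for the real rollback function, so the sketch runs on its own.
cleanup_on_error() {
  echo "Attempting to restore previous state..."
}

# Runs the rollback only when the shell is exiting with a failure status.
# Inside an EXIT trap, $? still holds the status the shell is exiting with.
handle_exit() {
  exit_code=$?
  if [ "$exit_code" -ne 0 ]; then
    cleanup_on_error
  fi
}

# Registration, done once near the top of a script:
#   trap handle_exit EXIT
#   trap 'exit 130' INT    # turn Ctrl-C into a failing exit, handled above
#   trap 'exit 143' TERM
```

Routing INT and TERM through exit keeps all rollback logic in one place and avoids running it twice.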

Manual Rollback

If automatic rollback fails, the log file contains the complete history needed for manual recovery.

Manual Recovery Steps:

  1. Check log file for point of failure
  2. Review git status
  3. Restore from stash if needed
  4. Reset to previous commit if necessary

Dry-Run Mode

Purpose

Preview all changes without modifying anything.

Activation:

./update_hdp.sh --dry-run

Behavior:

  • Performs all checks and analysis
  • Displays what would be done
  • Skips actual git operations
  • Skips dependency installations
  • Logs all planned actions
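
A common way to get this behavior is to route every state-changing command through a wrapper that either runs it or only reports it. The sketch below assumes a DRY_RUN variable and a hypothetical run_cmd helper; the message format follows the example output in this section.

```shell
DRY_RUN=${DRY_RUN:-false}

# Runs the given command, or only reports it when DRY_RUN is "true".
run_cmd() {
  if [ "$DRY_RUN" = "true" ]; then
    echo "[INFO] [DRY RUN] Would run: $*"
  else
    "$@"
  fi
}

# Example:
#   run_cmd npm install
```

Read-only steps such as checks and diff analysis would be called directly, so they still run in dry-run mode.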

Example Output:

⚠️ DRY RUN MODE - No changes will be made

[INFO] [DRY RUN] Would create stash: update-backup-1698765432
[INFO] [DRY RUN] Would checkout: v1.3.0
[INFO] [DRY RUN] Would delete files not in target version
[INFO] [DRY RUN] Would update Python dependencies
[INFO] [DRY RUN] Would run: npm install
[INFO] [DRY RUN] Would update version info
[INFO] [DRY RUN] Would restore stash: update-backup-1698765432

⚠️ DRY RUN completed - no changes were made
[INFO] Re-run without --dry-run to apply changes

Update Summary

Displays comprehensive summary after successful update:

═══════════════════════════════════════════════════════════
✅ Update Complete!
═══════════════════════════════════════════════════════════

Summary:
 Version: v1.3.0
 Python: Updated
 Node.js: Updated
 Duration: 87s
 Log file: update_hdp_20251021_140000.log