7  Chapter 6: Data Extraction

A save editor needs game data: weapon stats, part definitions, manufacturer information. You might assume this data lives neatly in game files, waiting to be extracted. The reality is more complicated—and more interesting.

This chapter explores what data we can extract, what we can’t, and why. Along the way, we’ll document our investigation into authoritative category mappings, including the binary analysis that revealed why some data simply doesn’t exist in extractable form.


7.1 The Game File Landscape

BL4’s data lives in Unreal Engine pak files, stored in IoStore format:

Borderlands 4/OakGame/Content/Paks/
├── pakchunk0-Windows_0_P.utoc    ← Main game assets
├── pakchunk0-Windows_0_P.ucas    ← Compressed data
├── pakchunk2-Windows_0_P.utoc    ← Audio (Wwise)
├── pakchunk3-Windows_0_P.utoc    ← Localized audio
├── global.utoc                   ← Shared engine data
└── ...

IoStore, the container format UE5 uses by default, splits the table of contents (.utoc) from the compressed chunk data (.ucas). This differs from the single-file PAK containers of older Unreal titles and requires IoStore-aware tools.

!!! note BL4 uses IoStore (UE5’s format), not legacy PAK. Tools like repak won’t work on .utoc/.ucas files. You need retoc or similar IoStore-aware extractors.
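Before reaching for an extractor, a quick sanity check can help. The sketch below lists each .utoc in a Paks directory, checks for its paired .ucas, and tests for the 16-byte magic that, per our reading of Unreal's FIoStoreTocHeader, opens every IoStore table of contents. The helper names are ours, and the magic value is an assumption worth confirming against your own files.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed IoStore TOC magic (FIoStoreTocHeader in UE source); verify on real files.
TOC_MAGIC = b"-==--==--==--==-"

def is_iostore_toc(path: Path) -> bool:
    """Return True if the file starts with the IoStore TOC magic."""
    with open(path, "rb") as f:
        return f.read(len(TOC_MAGIC)) == TOC_MAGIC

def list_containers(paks_dir) -> List[Tuple[str, bool, bool]]:
    """For each .utoc, report (stem, has_toc_magic, has_matching_ucas)."""
    rows = []
    for toc in sorted(Path(paks_dir).glob("*.utoc")):
        ucas = toc.with_suffix(".ucas")
        rows.append((toc.stem, is_iostore_toc(toc), ucas.exists()))
    return rows
```

Running `list_containers(".../OakGame/Content/Paks")` should show every chunk with both flags set; a missing .ucas usually means an incomplete install or copy.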


7.2 What We Can Extract

Some game data extracts cleanly from pak files:

Balance data: Stat templates and modifiers for weapons, shields, and gear. These define base damage, fire rate, and accuracy scales.

Naming strategies: How weapons get their prefix names. “Damage → Tortuous” mappings live in extractable assets.

Body definitions: Weapon body assets that reference parts and mesh fragments.

Loot pools: Drop tables and rarity weights for different sources.

Gestalt meshes: Visual mesh fragments that parts reference.

These assets follow Unreal’s content structure:

OakGame/Content/
├── Gear/
│   ├── Weapons/
│   │   ├── _Shared/BalanceData/
│   │   ├── Pistols/JAK/Parts/
│   │   └── ...
│   └── Shields/
├── PlayerCharacters/
│   ├── DarkSiren/
│   └── ...
└── GameData/Loot/

7.3 What We Can’t Extract

Here’s where it gets interesting. The mappings between serial tokens and actual game parts—the heart of what makes serial decoding work—don’t exist as extractable pak file assets.

We wanted authoritative category mappings. Serial token {4} on a Vladof SMG should mean a specific part, and we wanted the game’s own data to tell us which one. So we investigated.

7.3.1 The Investigation: Binary Analysis

We used Rizin (a radare2 fork) to analyze the Borderlands4.exe binary directly:

rz-bin -S Borderlands4.exe

Results:

  • Total size: 715 MB
  • .sdata section: 157 MB (code)
  • .rodata section: 313 MB (read-only data)

We searched for part prefix strings like “DAD_PS.part_” and “VLA_SM.part_barrel”. Nothing. The prefixes don’t exist as literal strings in the binary.

We searched for category value sequences. Serial decoding uses Part Group IDs like 2, 3, 4, 5, 6, 7 (consecutive integers stored as i64). We found one promising sequence at offset 0x02367554:

# Found sequence 2,3,4,5,6,7 as consecutive i64 values at 0x02367554

But examining the context revealed it was near crypto code—specifically “Poly1305 for x86_64, CRYPTOGAMS”. Those consecutive integers were coincidental, not category definitions.

!!! warning “False Positives” When searching binaries for numeric patterns, verify the context. Small consecutive integers appear in many places: crypto code, lookup tables, version numbers. Always examine surrounding bytes.
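The sequence search and the context check can be sketched in a few lines of Python. This is a minimal illustration, not the tooling we used (that was Rizin); the function names are ours, and it assumes the executable's bytes are already loaded into memory.

```python
import re
import struct
from typing import List

def find_i64_sequence(data: bytes, values: List[int]) -> List[int]:
    """Return every offset where `values` appear as consecutive little-endian i64s."""
    needle = struct.pack(f"<{len(values)}q", *values)
    offsets, start = [], 0
    while (hit := data.find(needle, start)) != -1:
        offsets.append(hit)
        start = hit + 1
    return offsets

def nearby_strings(data: bytes, offset: int, radius: int = 4096, min_len: int = 6) -> List[str]:
    """Printable ASCII runs within `radius` bytes of `offset`, for a quick context check."""
    lo, hi = max(0, offset - radius), offset + radius
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data[lo:hi])]
```

For a binary of this size, `Path.read_bytes()` is workable; an `mmap.mmap` object also supports `find()` and regex matching if memory is tight. A hit whose neighboring strings name a crypto library, as in the Poly1305 case, is almost certainly a coincidence.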

7.3.2 UE5 Metadata: What We Know

From usmap analysis, we confirmed the exact structure linking parts to serials:

GbxSerialNumberIndex (12 bytes)
├── Category (Int64): Part Group ID
├── scope (Byte): EGbxSerialNumberIndexScope (Root=1, Sub=2)
├── status (Byte): EGbxSerialNumberIndexStatus
└── Index (Int16): Position in category

Every InventoryPartDef contains this structure. The Category field maps to Part Group IDs (2=Daedalus Pistol, 22=Vladof SMG, etc.). The Index field determines which part token decodes to this part.
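To make the 12-byte layout concrete, here is a sketch that packs and unpacks it with Python's struct module. It assumes little-endian byte order and no padding between fields, matching the 12-byte size given above; a native in-memory copy may be laid out differently, which is one reason the runtime view in 7.3.4 looks different. The names are ours.

```python
import struct
from typing import NamedTuple

# Int64 Category, Byte scope, Byte status, Int16 Index: 8 + 1 + 1 + 2 = 12 bytes.
# "<" means little-endian with no alignment padding between fields.
SERIAL_INDEX = struct.Struct("<qBBh")

class GbxSerialNumberIndex(NamedTuple):
    category: int  # Part Group ID, e.g. 2 = Daedalus Pistol, 22 = Vladof SMG
    scope: int     # EGbxSerialNumberIndexScope: Root=1, Sub=2
    status: int    # EGbxSerialNumberIndexStatus
    index: int     # position within the category

def pack_serial_index(value: GbxSerialNumberIndex) -> bytes:
    return SERIAL_INDEX.pack(*value)

def unpack_serial_index(raw: bytes) -> GbxSerialNumberIndex:
    return GbxSerialNumberIndex(*SERIAL_INDEX.unpack(raw))
```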

But here’s the problem: we found zero InventoryPartDef assets in pak files.

uextract /path/to/Paks find-by-class InventoryPartDef
# Result: 0 assets found

7.3.3 Where Parts Actually Live

Parts aren’t stored as individual pak file assets. They’re:

  1. Runtime UObjects — Created when the game initializes
  2. Code-defined — Registrations happen in native code
  3. Self-describing — Each part carries its own index internally

7.3.4 The Key Insight: Self-Describing Parts

Here’s the crucial design pattern we discovered: there is no separate mapping file because each part stores its own index.

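Every part UObject stores the scope, status, and index fields of its GbxSerialNumberIndex at offset +0x28 (the Int64 Category is not stored at a fixed offset; see 7.4.1):

UObject + 0x28: GbxSerialNumberIndex fields (4 bytes)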
├── Scope (1 byte)   ← EGbxSerialNumberIndexScope (Root=1, Sub=2)
├── Status (1 byte)  ← Reserved/state flags
└── Index (2 bytes)  ← THE serial index for this part

This is a “reverse mapping” architecture:

  • Traditional approach: Separate lookup table maps index → part_name
  • BL4’s approach: Each part stores its own index; the “mapping” IS the parts themselves
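The reverse-mapping idea can be shown with a toy decode table: instead of loading a separate index-to-name file, we invert the indices the parts carry themselves. The records below reuse the DAD_PS indices verified in 7.4.2; the function name and the choice to reject collisions are our own.

```python
# Each part record carries its own (category, index), as in the self-describing design.
parts = [
    {"name": "DAD_PS.part_barrel_01", "category": 2, "index": 7},
    {"name": "DAD_PS.part_barrel_02", "category": 2, "index": 8},
    {"name": "DAD_PS.part_barrel_01_Zipgun", "category": 2, "index": 1},
]

def build_decode_table(parts):
    """Invert per-part indices into a (category, index) -> name lookup.

    A slot claimed by two different parts is treated as a data error.
    """
    table = {}
    for part in parts:
        key = (part["category"], part["index"])
        if key in table and table[key] != part["name"]:
            raise ValueError(f"{key} claimed by {table[key]} and {part['name']}")
        table[key] = part["name"]
    return table
```

`build_decode_table(parts)[(2, 7)]` returns `"DAD_PS.part_barrel_01"`. The "mapping" is never stored separately; it is derived from the parts on demand.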

Why this design makes sense:

| Benefit | Explanation |
|---|---|
| No central registry | Adding DLC parts doesn’t require updating a mapping file |
| Self-contained | Each part is fully self-describing |
| Stable indices | A part’s index never changes because it’s intrinsic to that part |
| No sync issues | Impossible for mapping to drift from actual parts |

Practical implication: When we extract parts from memory, we’re not building a mapping from separate data—we’re reading the authoritative index directly from each part. The memory dump contains the complete, correct mapping because that mapping IS the parts.

!!! note “Why Memory Dumps Are Essential” Since each part carries its own index internally, and parts only exist as runtime UObjects (not pak file assets), memory dumps are the only way to capture this data. The game’s binary contains the code to create parts, but the actual GbxSerialNumberIndex values are set during initialization.


7.4 Memory Extraction: The Breakthrough

Through systematic memory analysis, we found that authoritative part-to-index mappings can be extracted from memory dumps. Here’s the structure:

7.4.1 The Part Registration Structure

When the game loads, it creates UObjects for each part and registers them in an internal array. This array has a discoverable pattern:

Part Array Entry (24 bytes):
├── FName Index (4 bytes)     ← References the part name in FNamePool
├── Padding (4 bytes)         ← Always zero
├── Pointer (8 bytes)         ← Address of the part's UObject
├── Marker (4 bytes)          ← 0xFFFFFFFF sentinel value
└── Priority (4 bytes)        ← Selection priority (not the serial index!)
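Decoding one 24-byte entry is a single struct call. The format string follows the field layout above and assumes little-endian order (as on x86-64). It reads Priority as unsigned, which is our guess. The validation mirrors the padding and sentinel checks used in the extraction algorithm in 7.4.3.

```python
import struct
from typing import NamedTuple, Optional

# 24-byte part array entry: fname index, padding, pointer, marker, priority.
# Little-endian; Priority is read as unsigned (its signedness is not established).
ENTRY = struct.Struct("<IIQII")
SENTINEL = 0xFFFFFFFF

class PartArrayEntry(NamedTuple):
    fname_index: int
    padding: int
    pointer: int
    marker: int
    priority: int

def parse_entry(raw: bytes) -> Optional[PartArrayEntry]:
    """Decode one entry, or return None if the padding/sentinel checks fail."""
    entry = PartArrayEntry(*ENTRY.unpack(raw))
    if entry.padding != 0 or entry.marker != SENTINEL:
        return None
    return entry
```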

The serial Index is stored inside the pointed UObject, at offset +0x28:

UObject at Pointer (offset +0x28):
├── Scope (1 byte)            ← EGbxSerialNumberIndexScope (always 2 for parts)
├── Reserved (1 byte)         ← Usually 0
└── Index (2 bytes, Int16)    ← THE SERIAL INDEX we need!

!!! important “Category Derivation” The Part Group ID (category) is not stored in the UObject at a fixed offset. Instead, derive it from the part name prefix (e.g., DAD_PS → category 2, VLA_AR → category 17). The bl4 tool includes a complete prefix-to-category mapping.

7.4.2 Verified Example

Searching for FName DAD_PS.part_barrel_01 (FName index 0x736a0a):

  1. Find the array entry: FName appears in the part array with pointer 0x7ff4ca7d75d0
  2. Read offset +0x28: At 0x7ff4ca7d75f8 we find 02 00 07 00
  3. Parse: Scope=2, Reserved=0, Index=7
  4. Derive category: DAD_PS prefix → category 2
  5. Verify: Reference data confirms DAD_PS.part_barrel_01 has index 7 ✓

Additional verified mappings:

  • DAD_PS.part_barrel_02 → Index 8 ✓
  • DAD_PS.part_barrel_01_Zipgun → Index 1 ✓
  • DAD_PS.part_barrel_02_rangefinder → Index 78 ✓
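The arithmetic in steps 2 and 3 can be checked directly: add 0x28 to the UObject pointer, then split the four bytes into two unsigned bytes and a little-endian Int16. The constant and function names are illustrative.

```python
import struct
from typing import Tuple

SERIAL_INDEX_OFFSET = 0x28  # scope/reserved/index live here inside a part UObject

def decode_index_bytes(raw: bytes) -> Tuple[int, int, int]:
    """Split the 4 bytes at UObject + 0x28 into (scope, reserved, index)."""
    return struct.unpack("<BBh", raw)

pointer = 0x7ff4ca7d75d0
addr = pointer + SERIAL_INDEX_OFFSET
scope, reserved, index = decode_index_bytes(bytes.fromhex("02000700"))
print(hex(addr), scope, reserved, index)  # 0x7ff4ca7d75f8 2 0 7
```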

7.4.3 Extraction Algorithm

# Sketch for extracting all part mappings (Python 3).
# Assumed helpers, not a real library: dump.read(addr, size) -> bytes,
# dump.find_all(pattern) -> iterable of addresses, dump.is_valid_pointer(addr),
# and fname_entries, an iterable of (fname_index, name) pairs decoded from FNamePool.
import struct

ENTRY = struct.Struct("<IIQII")       # fname_idx, padding, pointer, marker, priority
SERIAL_INDEX = struct.Struct("<BBh")  # scope, reserved, Int16 index (at UObject + 0x28)
MARKER = b"\xff\xff\xff\xff"

# Prefix -> Part Group ID (excerpt; complete mapping in bl4 source)
PREFIX_CATEGORIES = {
    "dad_ps": 2,  # Pistols
    "jak_ps": 3,
}

def get_category_from_prefix(name):
    prefix = name.split(".part_")[0].lower()
    return PREFIX_CATEGORIES.get(prefix)

def extract_parts(dump, fname_entries):
    # Step 1: Build FName lookup table (fname_idx -> name) for part names only
    fname_table = {idx: name for idx, name in fname_entries
                   if ".part_" in name.lower()}

    parts = []

    # Step 2: Scan memory for 0xFFFFFFFF markers
    for marker_addr in dump.find_all(MARKER):
        # Read the 24-byte entry (marker is at offset 16) and decode it as integers
        fname_idx, padding, pointer, _marker, _priority = ENTRY.unpack(
            dump.read(marker_addr - 16, ENTRY.size))

        # Validate: known FName, padding=0, valid pointer
        if fname_idx not in fname_table:
            continue
        if padding != 0 or not dump.is_valid_pointer(pointer):
            continue

        # Read scope, reserved byte, and Int16 serial index at UObject + 0x28
        _scope, _reserved, index = SERIAL_INDEX.unpack(
            dump.read(pointer + 0x28, SERIAL_INDEX.size))

        # Derive category from part name prefix
        name = fname_table[fname_idx]
        category = get_category_from_prefix(name)

        if category is not None:
            parts.append({"name": name, "category": category, "index": index})

    return parts

7.4.4 Why This Works

The game registers parts at startup into internal arrays. Each entry links:

  • FName reference → The part’s name (e.g., “VLA_SM.part_barrel_01”)
  • UObject pointer → The full part definition, including serial index

By scanning for the 0xFFFFFFFF sentinel pattern that marks entry boundaries, we can walk these arrays and extract every part mapping the game knows about.
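The extraction algorithm in 7.4.3 leaves the dump reader abstract. For a single contiguous memory region, the sentinel walk can be made concrete as below. `region` and `base` stand in for one block of a dump and its starting virtual address. The 8-byte alignment filter is our assumption (an array whose elements contain a pointer is normally 8-byte aligned on x86-64); it simply cuts down on stray 0xFFFFFFFF hits.

```python
import struct
from typing import Iterator, Set, Tuple

ENTRY = struct.Struct("<IIQII")  # fname_idx, padding, pointer, marker, priority
MARKER = b"\xff\xff\xff\xff"
MARKER_OFFSET = 16               # the sentinel sits 16 bytes into each entry

def walk_entries(region: bytes, base: int, known_fnames: Set[int]) -> Iterator[Tuple[int, int, int]]:
    """Yield (entry_address, fname_idx, pointer) for plausible part array entries.

    `region` is one contiguous block of dumped memory starting at virtual
    address `base`. Entries are assumed to be 8-byte aligned.
    """
    pos = region.find(MARKER, MARKER_OFFSET)
    while pos != -1:
        start = pos - MARKER_OFFSET
        if (base + start) % 8 == 0 and start + ENTRY.size <= len(region):
            fname_idx, padding, pointer, _marker, _prio = ENTRY.unpack_from(region, start)
            if padding == 0 and pointer != 0 and fname_idx in known_fnames:
                yield base + start, fname_idx, pointer
        pos = region.find(MARKER, pos + 1)
```

Each yielded pointer still needs the +0x28 read and a range check against the dump's mapped regions before its index can be trusted.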

!!! tip “Practical Implication” Memory dumps contain authoritative part-to-index mappings. Extract them directly—no empirical testing required for known parts. Empirical validation is only needed for new parts added in patches.

7.4.5 Extraction Results

Running the extraction on a Dec 2025 memory dump yields:

| Metric | Value |
|---|---|
| Total parts extracted | 1,070 |
| FNames scanned | 1,399 |
| Categories covered | 49 |
| Match rate vs reference | 84.3% |
| Core weapons (cat 2-29) | 100% accurate |

Distribution by type:

| Type | Categories | Parts |
|---|---|---|
| Pistols | 2-7 | 207 |
| Shotguns | 8-13 | 190 |
| Assault Rifles | 14-19 | 169 |
| SMGs | 20-23 | 135 |
| Snipers | 25-29 | 176 |
| Heavy Weapons | 244-247 | 37 |
| Shields | 279-287 | 44 |
| Gadgets | 300-330 | 146 |
| Enhancements | 400-409 | 23 |

!!! success “Core Weapon Accuracy” The extraction achieves zero mismatches for core weapon categories (2-29). Mismatches only occur in heavy weapons, shields, gadgets, and enhancements—likely due to different UObject layouts or reference data issues for those categories.


7.5 Empirical Validation (Fallback)

For edge cases or when memory extraction isn’t possible, empirical validation remains an option:

  1. Collect serials from real game items
  2. Decode the Part Group ID and part tokens
  3. Record which weapon/part combinations the tokens represent
  4. Validate by injecting serials into saves and checking in-game

The parts_database.json file combines memory-extracted mappings with empirically-verified data for comprehensive coverage.
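The chapter doesn't spell out how the two sources are combined, so the following is only one possible policy, sketched under our own assumptions: memory-extracted records win, empirical records fill slots the dump missed, and disagreements are returned for review instead of being overwritten. The `source` field and the function name are hypothetical.

```python
def merge_part_sources(memory_parts, empirical_parts):
    """Combine memory-extracted and empirically verified part records.

    Memory records are treated as authoritative; empirical records only fill
    (category, index) slots the dump did not cover. Conflicts are reported.
    """
    merged = {(p["category"], p["index"]): dict(p, source="memory") for p in memory_parts}
    conflicts = []
    for p in empirical_parts:
        key = (p["category"], p["index"])
        if key not in merged:
            merged[key] = dict(p, source="empirical")
        elif merged[key]["name"] != p["name"]:
            conflicts.append((key, merged[key]["name"], p["name"]))
    ordered = sorted(merged.values(), key=lambda p: (p["category"], p["index"]))
    return ordered, conflicts
```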


7.6 Extraction Tools

7.6.1 retoc — IoStore Extraction

The essential tool for BL4’s pak format:

cargo install --git https://github.com/trumank/retoc retoc_cli

# List assets in a container
retoc list /path/to/pakchunk0-Windows_0_P.utoc

# Extract all assets
retoc unpack /path/to/pakchunk0-Windows_0_P.utoc ./output/

!!! warning For converting to legacy format, point at the Paks directory, not a single file. The tool needs access to global.utoc for ScriptObjects:

retoc to-legacy /path/to/Paks/ ./output/ --no-script-objects

7.6.2 uextract — Project Tool

The bl4 project’s custom extraction tool:

cargo build --release -p uextract

# List all assets
./target/release/uextract /path/to/Paks --list

# Extract with filtering
./target/release/uextract /path/to/Paks -o ./output --ifilter "BalanceData"

# Use usmap for property resolution
./target/release/uextract /path/to/Paks -o ./output --usmap share/manifest/mappings.usmap

7.7 The Usmap Requirement

UE5 uses “unversioned” serialization. Properties are stored without field names:

Versioned (old):   "Damage": 50.0, "Level": 10
Unversioned (new): 0x42480000 0x0000000A
                   └── Just values, no names

To parse unversioned data, you need a usmap file containing the schema: all class and struct definitions, with each property’s name, type, and position in its struct’s serialization order.
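A simplified model of why the schema matters: given the property order and types a usmap would supply, the raw bytes from the example decode cleanly. The two-property schema is hypothetical, and real unversioned serialization also writes a small header recording which properties are present or zero; this sketch skips that header entirely.

```python
import struct

# Hypothetical schema in serialization order, as a usmap would describe it:
# (property name, struct format code). "f" = 32-bit float, "i" = 32-bit int.
SCHEMA = [("Damage", "f"), ("Level", "i")]

def decode_unversioned(raw: bytes, schema=SCHEMA) -> dict:
    """Decode values stored back to back in schema order (no header, no names)."""
    fmt = "<" + "".join(code for _, code in schema)
    values = struct.unpack(fmt, raw)
    return {name: value for (name, _), value in zip(schema, values)}

raw = struct.pack("<fi", 50.0, 10)
print(raw.hex())                # 000048420a000000
print(decode_unversioned(raw))  # {'Damage': 50.0, 'Level': 10}
```

The values in the example above are shown as 32-bit words; on disk they appear little-endian, which is why 0x42480000 shows up as `00 00 48 42`.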

We generate usmap from memory dumps:

bl4 memory --dump share/dumps/game.dmp dump-usmap

# Output: mappings.usmap
# Names: 64917, Enums: 2986, Structs: 16849, Properties: 58793

The project includes a pre-generated usmap at share/manifest/mappings.usmap.


7.8 Extracting Parts from Memory

Since parts only exist at runtime, memory extraction is the path forward.

7.8.1 Step 1: Create Memory Dump

Follow Chapter 3’s instructions to capture game memory while playing.

7.8.2 Step 2: Extract Part Names

bl4 memory --dump share/dumps/game.dmp dump-parts \
    -o share/manifest/parts_dump.json

This scans for strings matching XXX_YY.part_* patterns:

{
  "DAD_AR": [
    "DAD_AR.part_barrel_01",
    "DAD_AR.part_barrel_01_a",
    "DAD_AR.part_body"
  ],
  "VLA_SM": [
    "VLA_SM.part_barrel_01"
  ]
}
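If you already have a list of candidate strings, the grouping step can be approximated with a regular expression. The pattern encodes our reading of XXX_YY.part_* (three uppercase letters, an underscore, two uppercase letters) and is an assumption, not the matcher bl4 uses. Order within each group is preserved rather than sorted, because alphabetical order says nothing about serial indices.

```python
import re
from collections import defaultdict

# Our reading of "XXX_YY.part_*": three uppercase letters, underscore,
# two uppercase letters, ".part_", then word characters.
PART_NAME = re.compile(r"^([A-Z]{3}_[A-Z]{2})\.part_\w+$")

def group_by_prefix(names):
    """Group part names by manufacturer/type prefix, dropping duplicates.

    First-seen order is kept inside each group; nothing here implies an index.
    """
    groups = defaultdict(list)
    for name in names:
        match = PART_NAME.match(name)
        if match:
            groups[match.group(1)].append(name)
    return {prefix: list(dict.fromkeys(found)) for prefix, found in groups.items()}
```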

7.8.3 Step 3: Build Parts Database

bl4 memory --dump share/dumps/game.dmp build-parts-db \
    -i share/manifest/parts_dump.json \
    -o share/manifest/parts_database.json

The result maps parts to categories and indices:

{
  "parts": [
    {"category": 2, "index": 7, "name": "DAD_PS.part_barrel_01"},
    {"category": 22, "index": 5, "name": "VLA_SM.part_body_a"}
  ],
  "categories": {
    "2": {"count": 74, "name": "Daedalus Pistol"},
    "22": {"count": 84, "name": "Vladof SMG"}
  }
}
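Once the database exists, decoding a (category, index) pair is a dictionary lookup. This sketch assumes only the JSON shape shown above; the function names are ours. One detail worth handling: JSON object keys are always strings, so the category IDs under `categories` need converting back to integers.

```python
import json

def load_parts_db(path):
    """Load parts_database.json into a (category, index) -> name lookup plus category names."""
    with open(path, encoding="utf-8") as f:
        db = json.load(f)
    by_slot = {(p["category"], p["index"]): p["name"] for p in db["parts"]}
    # JSON object keys are strings, so convert category IDs back to int.
    categories = {int(k): v["name"] for k, v in db["categories"].items()}
    return by_slot, categories

def describe(by_slot, categories, category, index):
    """Human-readable label for a decoded (category, index) pair."""
    name = by_slot.get((category, index), "<unknown part>")
    return f"{categories.get(category, f'category {category}')}: {name}"
```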

!!! important “Index Ordering” Part indices from memory dumps reflect the game’s internal registration order—not alphabetical. Parts typically register in this order: unique variants, bodies, barrels, shields, magazines, scopes, grips, licensed parts. Alphabetical sorting produces wrong indices.
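The DAD_PS barrel indices verified in 7.4.2 illustrate the ordering trap. Numbering those four names alphabetically gets three of them wrong; the one that matches (the Zipgun barrel at index 1) matches only by coincidence.

```python
# DAD_PS barrel indices verified against reference data in 7.4.2.
verified = {
    "DAD_PS.part_barrel_01": 7,
    "DAD_PS.part_barrel_01_Zipgun": 1,
    "DAD_PS.part_barrel_02": 8,
    "DAD_PS.part_barrel_02_rangefinder": 78,
}

# The mistake: assigning each name its position in alphabetical order.
alphabetical = {name: i for i, name in enumerate(sorted(verified))}

# Collect (alphabetical guess, real index) for every name the guess gets wrong.
mismatches = {
    name: (alphabetical[name], real)
    for name, real in verified.items()
    if alphabetical[name] != real
}
print(len(mismatches))  # 3
```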


7.9 Working with Extracted Assets

7.9.1 Asset Structure

Extracted .uasset files follow the Zen package format:

Package
├── Header
├── Name Map (local FNames)
├── Import Map (external dependencies)
├── Export Map (objects defined here)
└── Export Data (serialized properties)

With usmap, these parse into readable JSON:

{
  "asset_path": "OakGame/Content/Gear/Weapons/_Shared/BalanceData/WeaponStats/Struct_Weapon_Barrel_Init",
  "exports": [
    {
      "class": "ScriptStruct",
      "properties": {
        "Damage_Scale": 1.0,
        "FireRate_Scale": 1.0,
        "Accuracy_Scale": 1.0
      }
    }
  ]
}
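With the JSON shape above, pulling scale modifiers out of a parsed asset is straightforward. This assumes uextract's output nests properties under `exports[].properties` exactly as shown; other classes may nest values more deeply, which this sketch does not handle. The function name is ours.

```python
def collect_scales(asset):
    """Gather every property ending in '_Scale' across an asset's exports."""
    scales = {}
    for export in asset.get("exports", []):
        for key, value in export.get("properties", {}).items():
            if key.endswith("_Scale"):
                scales[key] = value
    return scales
```

Load an asset with `json.load` and pass the resulting dict; for the example above the result maps `Damage_Scale`, `FireRate_Scale`, and `Accuracy_Scale` to 1.0.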

7.9.2 Finding Specific Data

# Find legendary items
find ./bl4_assets -name "*legendary*" -type f

# Find manufacturer data
find ./bl4_assets -iname "*manufacturer*"

# Search asset contents
grep -r "Linebacker" ./bl4_assets --include="*.uasset" -l

7.9.3 Stat Patterns

Stats follow naming conventions: StatName_ModifierType_Index_GUID

| Modifier | Meaning |
|---|---|
| Scale | Multiplier (×) |
| Add | Flat addition (+) |
| Value | Absolute override |
| Percent | Percentage bonus |
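A sketch for splitting a field name into its parts. The four modifier names come from the table above; the 32-hex-digit GUID suffix is an assumption based on how Unreal names members of user-defined structs, and the sample name in the usage line is made up.

```python
import re
from typing import NamedTuple, Optional

# StatName_ModifierType_Index_GUID; the 32-hex-digit GUID shape is an assumption.
STAT_NAME = re.compile(
    r"^(?P<stat>.+)_(?P<modifier>Scale|Add|Value|Percent)_(?P<index>\d+)_(?P<guid>[0-9A-Fa-f]{32})$"
)

class StatField(NamedTuple):
    stat: str
    modifier: str
    index: int
    guid: str

def parse_stat_field(name: str) -> Optional[StatField]:
    """Split a stat property name, or return None if it doesn't fit the pattern."""
    m = STAT_NAME.match(name)
    if not m:
        return None
    return StatField(m["stat"], m["modifier"], int(m["index"]), m["guid"])
```

For example, `parse_stat_field("Damage_Scale_12_0123456789ABCDEF0123456789ABCDEF")` yields stat `Damage`, modifier `Scale`, index 12.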

7.10 Oodle Compression

BL4 uses Oodle compression (RAD Game Tools). The retoc tool handles decompression automatically by loading the game’s DLL:

~/.steam/steam/steamapps/common/"Borderlands 4"/Engine/Binaries/ThirdParty/Oodle/
└── oo2core_9_win64.dll

!!! tip If extraction fails with Oodle errors, verify the game is installed and the DLL path is accessible. On Linux, Wine must be able to load the DLL.
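A small helper can confirm the DLL is present before you start a long extraction. It searches the Oodle directory recursively in case the DLL sits in a platform subdirectory; the function name is ours and the directory is the one shown above.

```python
from pathlib import Path
from typing import List

def find_oodle_dlls(game_dir) -> List[Path]:
    """Return any oo2core_*_win64.dll files under the game's Oodle directory."""
    oodle_dir = Path(game_dir).expanduser() / "Engine" / "Binaries" / "ThirdParty" / "Oodle"
    if not oodle_dir.is_dir():
        return []
    return sorted(oodle_dir.rglob("oo2core_*_win64.dll"))
```

An empty result for `find_oodle_dlls("~/.steam/steam/steamapps/common/Borderlands 4")` points at a path or install problem rather than an extractor bug.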


7.11 Building a Data Pipeline

An automated extraction script saves time when the game updates:

#!/bin/bash
set -euo pipefail  # stop at the first failing step instead of continuing with partial output
GAME_DIR="$HOME/.steam/steam/steamapps/common/Borderlands 4"
OUTPUT_DIR="./bl4_data"
USMAP="./share/manifest/mappings.usmap"

# Extract pak files
retoc unpack "$GAME_DIR/OakGame/Content/Paks/pakchunk0-Windows_0_P.utoc" "$OUTPUT_DIR/raw"

# Parse with usmap
./target/release/uextract "$OUTPUT_DIR/raw" -o "$OUTPUT_DIR/parsed" --usmap "$USMAP"

# Generate manifest
bl4-research pak-manifest -e "$OUTPUT_DIR/parsed" -o "$OUTPUT_DIR/manifest"

7.12 Summary: Data Sources

| Data | Source | Extractable? |
|---|---|---|
| Balance/stats | Pak files | Yes |
| Naming strategies | Pak files | Yes |
| Loot pools | Pak files | Yes |
| Body definitions | Pak files | Yes |
| Part definitions | Memory dump | Yes (via UObject array scan) |
| Category mappings | Memory dump | Yes (embedded in part UObjects) |

While parts don’t exist as pak file assets, memory dumps capture the complete runtime state, including every part definition with its authoritative serial index. The 0xFFFFFFFF sentinel pattern and the Index field at UObject offset +0x28 provide reliable extraction paths.


7.13 Exercises

Exercise 1: Extract and Explore

Extract the main pak file. Find balance data for a weapon type you use. What stats does the base template define?

Exercise 2: Search for Part References

Search extracted assets for references to specific parts (like “JAK_PS.part_barrel”). Where do they appear? What references them?

Exercise 3: Compare Manufacturers

Extract assets for two manufacturers (Jakobs vs Maliwan). Compare directory structures. What patterns emerge?


7.14 What’s Next

We’ve covered the full data extraction story—what works, what doesn’t, and why. The bl4 project wraps all these techniques into command-line tools.

Next, we’ll tour those tools: how to decode serials, edit saves, extract data, and more, all from the command line.

Next: Chapter 7: Using bl4 Tools