Integration Guide

Validating ESMFold2 & ESM3 Structures

CZ Biohub's ESMFold2 and ESM3 produce high-quality structure predictions — but pLDDT and pTM only measure model confidence. They don't check rotamer geometry, clashscores, backbone validity, or docking readiness. That gap is what StructSure fills.

From Biohub's own model card: "Model proposals may not be physically realizable; pLDDT/pTM are helpful but imperfect. Not intended for clinical or therapeutic applications without further validation."

StructSure is that validation step.

Workflow

  1. ESMFold2 / ESM3: Biohub API → fold sequence or generate novel protein
  2. StructSure: validate → clean → repair → docking-ready export
  3. Docking / MD simulation: ColabDock, ClusPro, ZDOCK, GROMACS, AMBER…

Quick Start

  1. Fold your sequence with Biohub
    from esm.sdk.forge import SequenceStructureForgeInferenceClient
    
    client = SequenceStructureForgeInferenceClient(
        model="esmfold2-fast-2026-05",
        url="https://biohub.ai",
        token="<your Biohub API token>",
    )
    
    result = client.fold(sequence="MKTAYIAKQRQISFVKSHFSRQ...")
    result.to_pdb("my_protein.pdb")
  2. Validate for free — check before you clean

    Upload to Validate in the app (free). You'll get a safe_for_docking flag, clash count, gap list, and missing atom count before spending credits on cleanup.

  3. Clean and repair with the CLI
    # Fills missing atoms, normalises chains, reports gaps
    pdbprep submit \
      --recipe structure_cleanup \
      --file my_protein.pdb \
      --output cleaned.pdb
  4. Export for docking (ColabDock, ClusPro, ZDOCK)
    pdbprep submit \
      --recipe docking_ready_export \
      --file my_protein.pdb \
      --options '{"target_service": "colabdock"}' \
      --output docking_ready.pdb

    The colabdock preset strips hydrogens, removes OXT terminal oxygens, renumbers residues from 1 per chain, reorders atom serials, and auto-detects VHH / nanobody chains to put the antigen first.
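
To make the preset's transformations concrete, here is a minimal local sketch of the same kinds of edits on fixed-column PDB records: dropping hydrogens and OXT atoms, renumbering residues from 1 per chain, and reassigning atom serials. It is an illustration under stated assumptions, not StructSure's implementation; `atom_line` and `docking_style_cleanup` are hypothetical names, and the VHH / antigen chain reordering is not attempted.

```python
def atom_line(serial, name, res, chain, resseq, element, icode=" ", altloc=" ", b=80.0):
    """Hypothetical helper: build one fixed-column ATOM record (name is 4 chars wide)."""
    return (f"ATOM  {serial:5d} {name}{altloc}{res} {chain}{resseq:4d}{icode}   "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b:6.2f}          {element:>2s}")


def docking_style_cleanup(pdb_text):
    """Strip H and OXT atoms, renumber residues per chain, reassign serials."""
    out = []
    serial = 0
    new_resnum = {}  # chain ID -> current renumbered residue index
    last_key = {}    # chain ID -> last seen original (resSeq + iCode) field
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue  # sketch: keep coordinate records only
        line = line.ljust(80)
        name = line[12:16].strip()
        element = line[76:78].strip().upper()
        # Prefer the element column; fall back to the atom name if it is blank.
        is_h = element in ("H", "D") if element else name.lstrip("0123456789").startswith("H")
        if is_h or name == "OXT":
            continue
        chain = line[21]
        key = line[22:27]
        if last_key.get(chain) != key:
            new_resnum[chain] = new_resnum.get(chain, 0) + 1
            last_key[chain] = key
        serial += 1
        # Columns 23-26 get the new residue number; the insertion code is blanked.
        out.append(f"{line[:6]}{serial:5d}{line[11:22]}{new_resnum[chain]:4d} {line[27:]}".rstrip())
    out.append("END")
    return "\n".join(out) + "\n"


sample = "\n".join([
    atom_line(7, " N  ", "MET", "A", 5, "N"),
    atom_line(8, " H  ", "MET", "A", 5, "H"),
    atom_line(9, " CA ", "MET", "A", 5, "C"),
    atom_line(12, " CA ", "LYS", "A", 6, "C"),
    atom_line(13, " OXT", "LYS", "A", 6, "O"),
    atom_line(20, " CA ", "GLY", "B", 101, "C"),
])
cleaned = docking_style_cleanup(sample)
print(cleaned)
```

Running a sketch like this on a copy of your file can help you see what to expect from the export, but the recipe output remains the reference.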

What StructSure Catches That pLDDT Doesn't

Issue | pLDDT / pTM | StructSure
Backbone gaps (missing residues) | ✗ | ✓ detected + reported
Missing heavy atoms | ✗ | ✓ filled where safe
Steric clashes | ✗ | ✓ counted
Non-standard chain IDs | ✗ | ✓ normalised
Alternate location records | ✗ | ✓ resolved
Insertion codes | ✗ | ✓ removed
Non-sequential residue numbering | ✗ | ✓ renumbered
Hydrogen atoms (docking incompatibility) | ✗ | ✓ stripped
Safe-for-docking verdict | ✗ | ✓ explicit flag
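
Several rows in the table above are properties of the PDB text itself, so they can be spot-checked locally. The sketch below counts alternate-location atoms, insertion codes, and hydrogens, and flags jumps in residue numbering as possible backbone gaps. It assumes standard fixed-column ATOM records; `atom_line` and `scan_pdb` are hypothetical names, a numbering jump is only a heuristic for a missing residue, and clash counting is left out because it needs a distance search.

```python
from collections import defaultdict


def atom_line(serial, name, res, chain, resseq, element, icode=" ", altloc=" ", b=80.0):
    """Hypothetical helper: build one fixed-column ATOM record (name is 4 chars wide)."""
    return (f"ATOM  {serial:5d} {name}{altloc}{res} {chain}{resseq:4d}{icode}   "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b:6.2f}          {element:>2s}")


def scan_pdb(pdb_text):
    """Return a small report of format-level issues found in ATOM records."""
    residues = defaultdict(list)  # chain ID -> residue numbers in file order
    report = {"gaps": [], "altloc_atoms": 0, "insertion_codes": 0, "hydrogens": 0}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        line = line.ljust(80)
        chain, resseq = line[21], int(line[22:26])
        if line[16] != " ":   # column 17: alternate location indicator
            report["altloc_atoms"] += 1
        if line[26] != " ":   # column 27: insertion code
            report["insertion_codes"] += 1
        if line[76:78].strip().upper() == "H":
            report["hydrogens"] += 1
        if not residues[chain] or residues[chain][-1] != resseq:
            residues[chain].append(resseq)
    # A jump of more than 1 in residue numbering suggests missing residues.
    for chain, numbers in residues.items():
        for prev, nxt in zip(numbers, numbers[1:]):
            if nxt - prev > 1:
                report["gaps"].append((chain, prev, nxt))
    return report


sample = "\n".join([
    atom_line(1, " CA ", "MET", "A", 1, "C"),
    atom_line(2, " H  ", "MET", "A", 1, "H"),
    atom_line(3, " CA ", "LYS", "A", 2, "C"),
    atom_line(4, " CA ", "THR", "A", 5, "C"),
    atom_line(5, " CB ", "THR", "A", 5, "C", altloc="A"),
    atom_line(6, " CA ", "ALA", "A", 5, "C", icode="A"),
])
report = scan_pdb(sample)
print(report)
```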

Which Recipe to Use

Goal | Recipe | Cost
Quick sanity check | validate_only | Free
Feed into ABB3 or RFdiffusion | structure_cleanup | ~$0.15 / structure
ColabDock / ClusPro / ZDOCK | docking_ready_export + colabdock preset | ~$0.15 / structure
GROMACS / AMBER MD | md_preparation | ~$0.20 / structure
Screen 100+ ESM3 designs | Batch validate → cleanup passing only | Free to triage
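
The saving from triaging first is simple arithmetic, sketched below using the approximate prices listed above, stored in cents to avoid floating-point rounding. The function name and the example of 20 passing designs out of 100 are hypothetical; actual pass rates depend on your designs.

```python
# Approximate per-structure prices in US cents, taken from the recipe table.
RECIPE_COST_CENTS = {
    "validate_only": 0,
    "structure_cleanup": 15,
    "docking_ready_export": 15,
    "md_preparation": 20,
}


def estimate_cost_cents(n_candidates, n_passing, recipe):
    """Validate every candidate (free), then run `recipe` on the passing ones only."""
    validate = n_candidates * RECIPE_COST_CENTS["validate_only"]
    return validate + n_passing * RECIPE_COST_CENTS[recipe]


# Hypothetical screen: 100 designs, 20 of which pass validation.
triaged = estimate_cost_cents(100, 20, "structure_cleanup")
untriaged = estimate_cost_cents(100, 100, "structure_cleanup")
print(f"triaged: ${triaged / 100:.2f}, untriaged: ${untriaged / 100:.2f}")
```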

ESMFold2 vs ESM3: What to Expect

ESMFold2

Folds a single sequence to all-atom coordinates. Output is generally clean but may have missing side-chain atoms in low-confidence regions (pLDDT < 70) and occasionally non-standard chain labelling. structure_cleanup handles both.
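
To see where missing side-chain atoms are most likely, you can list the low-confidence residues before uploading. The sketch below assumes the PDB stores per-residue pLDDT in the B-factor column on a 0 to 100 scale, as the original ESMFold output does; check this for your ESMFold2 files, since some tools write a 0 to 1 scale. `atom_line` and `low_confidence_residues` are hypothetical names.

```python
def atom_line(serial, name, res, chain, resseq, element, icode=" ", altloc=" ", b=80.0):
    """Hypothetical helper: build one fixed-column ATOM record (name is 4 chars wide)."""
    return (f"ATOM  {serial:5d} {name}{altloc}{res} {chain}{resseq:4d}{icode}   "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b:6.2f}          {element:>2s}")


def low_confidence_residues(pdb_text, threshold=70.0):
    """Return (chain, residue number, pLDDT) for residues below the threshold.

    Assumption: the B-factor column (columns 61-66) holds pLDDT on a 0-100 scale.
    Only the first CA atom seen for each residue is considered.
    """
    low = []
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        key = (line[21], int(line[22:26]))
        plddt = float(line[60:66])
        if key not in seen and plddt < threshold:
            low.append((key[0], key[1], plddt))
        seen.add(key)
    return low


sample = "\n".join([
    atom_line(1, " CA ", "MET", "A", 1, "C", b=91.5),
    atom_line(2, " CA ", "LYS", "A", 2, "C", b=64.2),
    atom_line(3, " CA ", "THR", "A", 3, "C", b=55.0),
])
low = low_confidence_residues(sample)
print(low)
```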

ESM3

A generative model that can hallucinate structurally plausible but geometrically strained regions, especially when conditioning on partial functional motifs. Run validate_only first to triage candidates before paying for cleanup — the free validation report tells you which structures are worth cleaning.

Batch Screening Workflow

When screening ESM3 designs, validate all candidates in one batch, filter on safe_for_docking == true and clash_count == 0, then clean only the structures that pass.

import requests, zipfile, pathlib

# Zip your ESM3 outputs
with zipfile.ZipFile("candidates.zip", "w") as zf:
    for pdb in pathlib.Path("esm3_outputs").glob("*.pdb"):
        zf.write(pdb, pdb.name)

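# Create the validation batch (free)
r = requests.post(
    "https://api.structsure.bio/v1/batches",
    json={
        "recipe_id": "validate_only",
        "recipe_version": "1.0.0",
        "input_format": "zip",
        "batch_mode": {"file_glob": "**/*.pdb", "stop_on_error": False},
    },
    timeout=30,
)
r.raise_for_status()  # fail early on an HTTP error
batch = r.json()

# Upload the archive to the returned URL, closing the file afterwards
with open("candidates.zip", "rb") as fh:
    requests.put(batch["upload"]["url"], data=fh, timeout=300).raise_for_status()

# Start processing the uploaded batch
requests.post(
    f"https://api.structsure.bio/v1/batches/{batch['batch_id']}/submit",
    timeout=30,
).raise_for_status()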

# Poll until done, download batch_report.csv
# Filter: safe_for_docking == true, clash_count == 0
# Submit passing structures for structure_cleanup
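
The filtering step can be sketched with the standard library once batch_report.csv has been downloaded. The safe_for_docking and clash_count columns come from the description above; the `filename` and `missing_atoms` column names, the exact boolean spelling, and the function name are assumptions, so adjust them to match the real report header.

```python
import csv
import io


def passing_structures(report_csv_text):
    """Return file names whose row is safe_for_docking with zero clashes.

    Assumes a header row with 'filename', 'safe_for_docking', and
    'clash_count' columns; 'filename' is a hypothetical column name.
    """
    keep = []
    for row in csv.DictReader(io.StringIO(report_csv_text)):
        safe = row["safe_for_docking"].strip().lower() == "true"
        no_clashes = int(row["clash_count"]) == 0
        if safe and no_clashes:
            keep.append(row["filename"])
    return keep


# Made-up report content, for illustration only.
sample_report = """filename,safe_for_docking,clash_count,missing_atoms
design_001.pdb,true,0,0
design_002.pdb,true,3,0
design_003.pdb,false,0,12
design_004.pdb,True,0,2
"""
passing = passing_structures(sample_report)
print(passing)
```

Only the files in the returned list would then be submitted for structure_cleanup.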

Privacy Notes

StructSure processes files ephemerally. Input files are deleted immediately after processing begins; output files are available for 30–60 minutes, then deleted. No user accounts or file contents are retained — only SHA-256 hashes and metrics for reproducibility.

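Folding runs on Biohub's infrastructure with your own Biohub token, and only the resulting PDB is uploaded to StructSure. StructSure never receives your original sequence input or your Biohub request, though the residue names in the PDB's ATOM records do spell out the sequence of the folded chain.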
