Split Command

Extract and visualize specific structural regions from protein structures with automatic alignment and progressive gap layout.

Usage

flatprot split STRUCTURE_FILE --regions "REGIONS" [OPTIONS]

Parameters

Required

STRUCTURE_FILE - Path to input structure file (PDB/CIF format)
--regions / -r - Comma-separated residue regions in format "CHAIN:START-END"
Examples: "A:1-100,A:150-250", "A:1-100,B:50-150,A:200-300"

Output Options

--output / -o - Output SVG file path [default: split_output.svg]
--canvas-width - Canvas width in pixels [default: 1000]
--canvas-height - Canvas height in pixels [default: 1000]

Gap Options

--gap-x - Progressive horizontal gap between domains in pixels [default: 0.0]
--gap-y - Progressive vertical gap between domains in pixels [default: 0.0]

Alignment Options

--alignment-mode - Alignment strategy [default: family-identity]
family-identity: Align each region using FoldSeek database search
inertia: Use principal component analysis alignment
--min-probability - Minimum alignment probability threshold [default: 0.5]
--foldseek / -f - FoldSeek executable path [default: foldseek]
--show-database-alignment - Enable database alignment and show family area annotations

Styling Options

--style - Custom style TOML file path
--show-positions - Position annotation level: none, minimal, major, full [default: minimal]

Input Options

--dssp - Path to a DSSP file; required when STRUCTURE_FILE is in PDB format

Requirements

FoldSeek: Required for database alignment functionality
DSSP (mkdssp v4.4.0+): Required for PDB input files

Install via conda:

conda install bioconda::foldseek conda-forge::dssp

Alignment Modes

Family-Identity Alignment (Recommended)

Uses FoldSeek to align each region against a curated database of protein families:

- Automatic database download on first use
- Region-specific alignment with rotation-only transformations
- Family annotations show SCOP family IDs and alignment probabilities

# Basic family-identity alignment with annotations
flatprot split protein.cif --regions "A:1-100,A:150-250" --show-database-alignment -o aligned_regions.svg

# High-confidence alignments only
flatprot split protein.cif --regions "A:1-100,A:150-250" --min-probability 0.8 --show-database-alignment

Inertia Alignment

Uses principal component analysis for structure alignment. Fast processing with no external database dependencies.

flatprot split protein.cif --regions "A:1-100,A:150-250" --alignment-mode inertia

Position Annotations

Controls residue numbering and terminus labels for each domain:

none: No position annotations
minimal (default): Only N and C terminus labels for each domain
major: Terminus labels + residue numbers for major secondary structures (≥3 residues)
full: All position annotations including single-residue elements
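
To compare the four levels side by side, a small shell loop can emit one command per level. This is only a sketch: the helper name position_variants and the output file names are made up for illustration, and the commands are printed rather than executed so they can be inspected first.

```shell
# Print one flatprot command per annotation level (dry run).
# Names like positions_major.svg are illustrative, not a flatprot convention.
position_variants() {
  structure=$1
  regions=$2
  for level in none minimal major full; do
    printf 'flatprot split %s --regions "%s" --show-positions %s -o positions_%s.svg\n' \
      "$structure" "$regions" "$level" "$level"
  done
}

position_variants protein.cif "A:1-100,A:150-250"
```

Piping the printed lines to sh runs them once they look right.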

Progressive Gap Positioning

Each domain is offset from the previous one using progressive gaps:

- Last domain remains at origin (0,0)
- Previous domains offset by gap amount × position
- Flexible arrangements combine --gap-x and --gap-y for custom layouts

# Horizontal arrangement
flatprot split protein.cif --regions "A:1-100,A:150-250,B:1-80" --gap-x 150

# Vertical arrangement
flatprot split protein.cif --regions "A:1-100,A:150-250,B:1-80" --gap-y 200

# Diagonal arrangement
flatprot split protein.cif --regions "A:1-100,A:150-250,B:1-80" --gap-x 100 --gap-y 150
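
The arithmetic behind the progressive offsets can be sketched in a few lines of shell. This assumes one particular reading of the rule above: "position" is the number of steps a region sits before the last one, so with N regions the i-th region is shifted by (N - i) × gap. The helper name gap_offsets is hypothetical, and the sign and direction of the shift in the rendered SVG are not confirmed here.

```shell
# Print "region_index x_offset y_offset" for each region, assuming the
# last region stays at (0,0) and each earlier one moves one more gap step.
gap_offsets() {
  n=$1
  gap_x=$2
  gap_y=$3
  awk -v n="$n" -v gx="$gap_x" -v gy="$gap_y" 'BEGIN {
    for (i = 1; i <= n; i++) {
      steps = n - i                      # steps away from the last region
      printf "%d %g %g\n", i, steps * gx, steps * gy
    }
  }'
}

# Three regions with --gap-x 100 --gap-y 150 (the diagonal example above)
gap_offsets 3 100 150
```

Under this reading, the diagonal example places the first region two steps away, the second one step away, and the third at the origin.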

Region Specification

Format

Pattern: "CHAIN:START-END"
Multiple regions: Comma-separated list
Chain IDs: Single letters (A, B, C, etc.)
Residue numbers: 1-based indexing
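
A region string can be checked against this format before running flatprot. The following is a minimal sketch, not flatprot's own parser: the helper name validate_regions is hypothetical, and it assumes single-letter chain IDs, positive integer residue numbers, and START not greater than END.

```shell
# Succeeds only if every comma-separated item looks like CHAIN:START-END
# with a single-letter chain, START >= 1, and START <= END.
validate_regions() {
  printf '%s\n' "$1" | awk -F',' '
    {
      if (NF == 0) exit 1                              # empty input
      for (i = 1; i <= NF; i++) {
        if ($i !~ /^[A-Za-z]:[0-9]+-[0-9]+$/) exit 1   # wrong shape
        split($i, parts, /[-:]/)                       # chain, start, end
        if (parts[2] + 0 < 1 || parts[2] + 0 > parts[3] + 0) exit 1
      }
    }'
}

validate_regions "A:1-100,B:50-150" && echo "ok"
validate_regions "A:1:100,B:50-150" || echo "invalid"
```

The actual command may accept or reject inputs that this check treats differently, so treat it as a pre-flight sanity check only.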

Examples

# Single chain, multiple domains
flatprot split protein.cif --regions "A:1-100,A:150-250,A:300-400"

# Multiple chains
flatprot split structure.cif --regions "A:1-100,B:50-150,C:20-120"

# Overlapping regions
flatprot split protein.cif --regions "A:1-120,A:80-200"  # 41-residue overlap (residues 80-120)

Examples

Basic Usage

# Simple domain splitting
flatprot split protein.cif --regions "A:1-100,A:150-250" -o domains.svg

# Multiple chains with progressive gaps
flatprot split structure.cif --regions "A:1-100,B:50-150,A:200-300" --gap-x 150 --gap-y 100 -o multi_chain.svg

Database Alignment and Annotations

# Enable family alignment with annotations
flatprot split protein.cif --regions "A:1-100,A:150-250" --show-database-alignment -o aligned_annotated.svg

# High-confidence alignments with custom threshold
flatprot split protein.cif --regions "A:1-80,A:100-180,A:200-280" --show-database-alignment --min-probability 0.7 -o high_confidence.svg

Custom Styling and Visualization

# Clean domains without annotations
flatprot split protein.cif --regions "A:10-110,A:130-230" --style custom.toml --show-positions none -o clean.svg

# Major structures with residue numbers
flatprot split protein.cif --regions "A:10-110,A:130-230" --style custom.toml --show-positions major -o detailed.svg

# Large canvas with progressive gaps
flatprot split protein.cif --regions "A:1-100,A:150-250,A:300-400" --canvas-width 1500 --gap-x 200 --gap-y 100 -o large_canvas.svg

PDB Input Workflow

# 1. Generate secondary structure
mkdssp -i protein.pdb -o protein.dssp

# 2. Split with database alignment
flatprot split protein.pdb --regions "A:1-100,A:150-200" --dssp protein.dssp --show-database-alignment -o output.svg
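
When a script handles both PDB and CIF inputs, the --dssp argument can be added conditionally. The sketch below assumes the format can be told from the file extension and that the DSSP file sits next to the structure with a .dssp suffix; the helper name split_cmd is hypothetical. It prints the command instead of running it.

```shell
# Print the flatprot split command for a structure, adding --dssp only
# for .pdb inputs (format guessed from the extension, which is an assumption).
split_cmd() {
  structure=$1
  regions=$2
  case "$structure" in
    *.pdb|*.PDB)
      printf 'flatprot split %s --regions "%s" --dssp %s\n' \
        "$structure" "$regions" "${structure%.*}.dssp"
      ;;
    *)
      printf 'flatprot split %s --regions "%s"\n' "$structure" "$regions"
      ;;
  esac
}

split_cmd protein.pdb "A:1-100,A:150-200"
split_cmd protein.cif "A:1-100,A:150-200"
```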

Comparison Studies

# Compare alignment modes
flatprot split protein.cif --regions "A:1-100,A:150-250" --alignment-mode family-identity --show-database-alignment -o family_aligned.svg
flatprot split protein.cif --regions "A:1-100,A:150-250" --alignment-mode inertia -o inertia_aligned.svg

Troubleshooting

Common Issues

"DSSP file required for PDB input" error:

# Generate DSSP file first
mkdssp -i structure.pdb -o structure.dssp
flatprot split structure.pdb --regions "A:1-100" --dssp structure.dssp

"Invalid region format" error:

# Check region format
flatprot split protein.cif --regions "A:1-100,B:50-150"  # Correct
flatprot split protein.cif --regions "A:1:100,B:50-150"  # Incorrect (colon instead of dash)

"No successful alignments found" error:

# Lower probability threshold
flatprot split protein.cif --regions "A:1-100" --min-probability 0.3 --show-database-alignment

# Or disable database alignment
flatprot split protein.cif --regions "A:1-100" --alignment-mode inertia
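
The two remedies can be combined into an automatic fallback. This sketch assumes that flatprot exits with a non-zero status when no alignment succeeds, which is not stated above; the wrapper name split_with_fallback is hypothetical.

```shell
# Try database alignment first; if the command fails, retry with inertia.
# Assumes a non-zero exit status signals the alignment failure.
split_with_fallback() {
  "$@" --alignment-mode family-identity --show-database-alignment ||
    "$@" --alignment-mode inertia
}

# Example invocation (not run here):
# split_with_fallback flatprot split protein.cif --regions "A:1-100" -o out.svg
```

Because the wrapper takes the base command as its arguments, the same regions and output path are reused for the retry.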

"Chain not found in structure" error:

# Check available chains in PDB (chain ID is column 22 of ATOM records)
grep "^ATOM" structure.pdb | cut -c22 | sort -u
# Or examine CIF structure to verify chain IDs

Database download issues:

# Confirm FoldSeek is on PATH; also check network connectivity for the database download
which foldseek

# Use inertia mode as fallback
flatprot split protein.cif --regions "A:1-100" --alignment-mode inertia

Performance Tips

Speed Optimization

Use CIF input (no DSSP file generation required)
Use --alignment-mode inertia to skip database alignment when only gap-based positioning is needed
Smaller canvas sizes for faster processing
Fewer regions reduce extraction and alignment time

Quality Optimization

Enable database alignment for biologically meaningful orientations
Higher alignment probability thresholds for confident annotations
Custom styling for publication-ready output
Larger canvas sizes for detailed visualization

Memory Optimization

Limit number of regions for memory-constrained systems
Smaller region sizes reduce memory usage
Use minimal/none position annotations for simpler scenes

Integration with Other Commands

Workflow with Align Command

# 1. Explore protein family alignment
flatprot align protein.cif -i alignment_info.json

# 2. Extract domains with family-specific alignment
flatprot split protein.cif --regions "A:1-100,A:150-250" --show-database-alignment -o aligned_domains.svg

Workflow with Project Command

# 1. Create full structure projection
flatprot project protein.cif -o full_structure.svg

# 2. Create domain-specific split view
flatprot split protein.cif --regions "A:1-100,A:150-250" --gap-x 150 -o domain_split.svg