Survival Support Vector Machines

Overview

Survival Support Vector Machines (SVMs) adapt the traditional SVM framework to survival analysis with censored data. They optimize a ranking objective that encourages correct ordering of survival times.

Core Idea

SVMs for survival analysis learn a function f(x) that produces risk scores, where the optimization ensures that subjects with shorter survival times receive higher risk scores than those with longer times.
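
As a rough illustration of the ranking idea, the sketch below computes a squared hinge ranking loss over comparable pairs in plain Python. The function names (`comparable_pairs`, `ranking_loss`) and the toy data are hypothetical and not part of scikit-survival. A pair is treated as comparable only when the subject with the shorter time experienced the event, which is the usual way censoring is handled in ranking objectives.

```python
def comparable_pairs(times, events):
    """Yield (i, j) where subject i had an observed event before the time of j."""
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                yield i, j


def ranking_loss(risk, times, events):
    """Squared hinge loss: penalize pairs where the earlier event does not
    receive a risk score at least 1 higher than the later subject."""
    total = 0.0
    for i, j in comparable_pairs(times, events):
        margin = risk[i] - risk[j]
        total += max(0.0, 1.0 - margin) ** 2
    return total


# Toy data (made up): times in months, event=False means censored
times = [2.0, 5.0, 7.0, 9.0]
events = [True, False, True, True]

well_ordered = [3.0, 2.0, 1.0, 0.0]    # shorter time -> higher risk
reversed_order = [0.0, 1.0, 2.0, 3.0]  # shorter time -> lower risk

loss_good = ranking_loss(well_ordered, times, events)
loss_bad = ranking_loss(reversed_order, times, events)
print(loss_good, loss_bad)
```

The censored subject (index 1) still appears as the later member of a pair, but never as the earlier one, because its true event time is unknown.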

When to Use Survival SVMs

Appropriate for:

Medium-sized datasets (typically 100-10,000 samples)
Need for non-linear decision boundaries (kernel SVMs)
Want margin-based learning with regularization
Have well-defined feature space

Not ideal for:

Very large datasets (>100,000 samples) - ensemble methods may be faster
Need interpretable coefficients - use Cox models instead
Require survival function estimates - use Random Survival Forest
Very high dimensional data - use regularized Cox or gradient boosting

Model Types

FastSurvivalSVM

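Linear survival SVM optimized for speed. Training uses truncated Newton optimization with order-statistic trees, which avoids enumerating all pairs of samples explicitly.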

When to Use:

Linear relationships expected
Large datasets where speed matters
Want fast training and prediction

Key Parameters:

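alpha: Weight of the loss term in the objective (default: 1.0)
- Higher = less regularization (the loss counts more); lower = more regularization
rank_ratio: Trade-off between ranking and regression, between 0 and 1 (default: 1.0, ranking only)
- With rank_ratio < 1.0, predictions are on the time scale (lower = shorter survival) rather than risk scores
max_iter: Maximum iterations (default: 20)
tol: Tolerance for stopping criterion (default: None, which leaves the choice to the optimizer)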
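
To make the roles of alpha and rank_ratio concrete, here is a minimal sketch of how the pieces of the objective are combined. It assumes the structure described in the scikit-survival documentation: a squared-norm penalty on the weights plus a loss term weighted by alpha, with rank_ratio mixing the ranking and regression losses. The function `combined_objective` and its inputs are hypothetical; the real estimator computes both losses internally.

```python
def combined_objective(w, ranking_loss, regression_loss, alpha=1.0, rank_ratio=1.0):
    """Illustrative only: 0.5 * ||w||^2 + alpha / 2 * mixed loss."""
    if not 0.0 <= rank_ratio <= 1.0:
        raise ValueError("rank_ratio must lie in [0, 1]")
    penalty = 0.5 * sum(wi * wi for wi in w)
    mixed = rank_ratio * ranking_loss + (1.0 - rank_ratio) * regression_loss
    return penalty + 0.5 * alpha * mixed


w = [0.5, -1.0]  # penalty = 0.5 * (0.25 + 1.0) = 0.625

# Made-up loss values, just to show the weighting
pure_ranking = combined_objective(w, ranking_loss=4.0, regression_loss=10.0)
half_and_half = combined_objective(w, 4.0, 10.0, alpha=1.0, rank_ratio=0.5)
small_alpha = combined_objective(w, 4.0, 10.0, alpha=0.1)
print(pure_ranking, half_and_half, small_alpha)
```

Under this form, a smaller alpha lets the penalty term dominate, so the fit is regularized more strongly.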

python

from sksurv.svm import FastSurvivalSVM

# Fit linear survival SVM
estimator = FastSurvivalSVM(alpha=1.0, max_iter=100, tol=1e-5, random_state=42)
estimator.fit(X, y)

# Predict risk scores
risk_scores = estimator.predict(X_test)

FastKernelSurvivalSVM

Kernel survival SVM for non-linear relationships.

When to Use:

Non-linear relationships between features and survival
Medium-sized datasets
Can afford longer training time for better performance

Kernel Options:

'linear': Linear kernel, equivalent to FastSurvivalSVM
'poly': Polynomial kernel
'rbf': Radial basis function (Gaussian) kernel - most common
'sigmoid': Sigmoid kernel
Custom kernel function

Key Parameters:

alpha: Weight of the loss term (default: 1.0); higher = less regularization
kernel: Kernel function (default: 'rbf')
gamma: Kernel coefficient for rbf, poly, sigmoid; a float or None (default: None, which falls back to 1 / n_features)
degree: Degree for polynomial kernel
coef0: Independent term for poly and sigmoid
rank_ratio: Trade-off parameter (default: 1.0)
max_iter: Maximum iterations (default: 20)
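
The kernel parameters above enter the standard kernel formulas as follows. This is a plain-Python sketch of the textbook definitions, which to my knowledge are the same forms scikit-learn's pairwise kernels use; the helper names are hypothetical.

```python
import math


def dot(x, z):
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma):
    """exp(-gamma * ||x - z||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def poly_kernel(x, z, gamma, degree=3, coef0=1.0):
    """(gamma * <x, z> + coef0) ** degree"""
    return (gamma * dot(x, z) + coef0) ** degree


def sigmoid_kernel(x, z, gamma, coef0=1.0):
    """tanh(gamma * <x, z> + coef0)"""
    return math.tanh(gamma * dot(x, z) + coef0)


x = [1.0, 2.0]
z = [2.0, 0.0]
gamma = 0.5  # equals 1 / n_features for two features

k_rbf = rbf_kernel(x, z, gamma)      # squared distance is 1 + 4 = 5
k_poly = poly_kernel(x, z, gamma)    # (0.5 * 2 + 1) ** 3
k_sig = sigmoid_kernel(x, z, gamma)  # tanh(0.5 * 2 + 1)
print(k_rbf, k_poly, k_sig)
```

A larger gamma makes the RBF kernel fall off faster with distance, which gives a more flexible (and more easily overfit) decision function.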

python

from sksurv.svm import FastKernelSurvivalSVM

# Fit RBF kernel survival SVM
estimator = FastKernelSurvivalSVM(
    alpha=1.0,
    kernel='rbf',
    gamma=None,  # float or None (None -> 1 / n_features); strings such as 'scale' are not accepted
    max_iter=50,
    random_state=42
)
estimator.fit(X, y)

# Predict risk scores
risk_scores = estimator.predict(X_test)

HingeLossSurvivalSVM

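Kernel survival SVM using hinge loss over pairs of samples, more similar to a classification SVM. It is solved as a quadratic program over the pairs, so it is practical only for small datasets.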

When to Use:

Want hinge loss instead of squared hinge
Sparse solutions desired
Similar behavior to classification SVMs
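
The difference between the two losses is easiest to see numerically. The sketch below compares hinge and squared hinge on a few pair margins, where the margin is the difference in predicted scores for a comparable pair. The function names and margin values are illustrative only.

```python
def hinge(margin):
    """Linear penalty once the margin drops below 1."""
    return max(0.0, 1.0 - margin)


def squared_hinge(margin):
    """Quadratic penalty once the margin drops below 1."""
    return max(0.0, 1.0 - margin) ** 2


margins = [2.0, 1.0, 0.5, 0.0, -1.0]
for m in margins:
    print(f"margin={m:+.1f}  hinge={hinge(m):.2f}  squared_hinge={squared_hinge(m):.2f}")
```

Hinge loss grows linearly with the size of a violation, while squared hinge punishes large violations more heavily and small ones more lightly.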

Key Parameters:

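alpha: Weight of the loss term in the objective (default: 1.0)
kernel: Kernel function (default: 'linear')
pairs: Which pairs of samples to compare: 'all', 'nearest', or 'next' (default: 'all')

python

from sksurv.svm import HingeLossSurvivalSVM

# Fit hinge loss SVM (this estimator has no fit_intercept or random_state argument)
estimator = HingeLossSurvivalSVM(alpha=1.0, kernel='linear', pairs='all')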
estimator.fit(X, y)

# Predict risk scores
risk_scores = estimator.predict(X_test)

NaiveSurvivalSVM

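Naive formulation of the linear survival SVM: it builds the difference vector for every comparable pair of samples and trains a standard linear SVM classifier on those pairs.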

When to Use:

Small datasets
Research/benchmarking purposes
Other methods don't converge

Limitations:

Slower than Fast variants
Less scalable
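
The scalability limit comes from the pairwise transformation: the naive approach builds one difference vector per comparable pair, so the training set grows roughly quadratically with the number of samples. A small sketch of that idea, with a hypothetical helper `pairwise_differences` and made-up data:

```python
def pairwise_differences(X, times, events):
    """Return x_i - x_j for every comparable pair (i had an event before t_j)."""
    diffs = []
    for xi, ti, ei in zip(X, times, events):
        if not ei:
            continue  # censored subjects cannot be the earlier member of a pair
        for xj, tj in zip(X, times):
            if ti < tj:
                diffs.append([a - b for a, b in zip(xi, xj)])
    return diffs


X = [[1.0, 0.0], [0.5, 1.0], [0.2, 2.0], [0.0, 3.0]]
times = [1.0, 2.0, 3.0, 4.0]
events = [True, True, True, True]

diffs = pairwise_differences(X, times, events)
n = len(X)
# With no censoring and distinct times there are n * (n - 1) / 2 pairs
print(len(diffs), "pairs from", n, "samples")
```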

python

from sksurv.svm import NaiveSurvivalSVM

# Fit naive SVM (slower)
estimator = NaiveSurvivalSVM(alpha=1.0, random_state=42)
estimator.fit(X, y)

# Predict
risk_scores = estimator.predict(X_test)

MinlipSurvivalAnalysis

Survival analysis using the MINLIP approach, which minimizes the Lipschitz constant of the prediction function.

When to Use:

Want different optimization objective
Research applications
Alternative to standard survival SVMs

python

from sksurv.svm import MinlipSurvivalAnalysis

# Fit Minlip model
estimator = MinlipSurvivalAnalysis(alpha=1.0)
estimator.fit(X, y)

# Predict
risk_scores = estimator.predict(X_test)

Hyperparameter Tuning

Tuning Alpha (Regularization)

python

from sklearn.model_selection import GridSearchCV
from sksurv.metrics import as_concordance_index_ipcw_scorer
from sksurv.svm import FastSurvivalSVM

# as_concordance_index_ipcw_scorer wraps the estimator so that score()
# returns the IPCW C-index; parameters are addressed via "estimator__"
param_grid = {
    'estimator__alpha': [0.1, 0.5, 1.0, 5.0, 10.0, 50.0]
}

# Grid search
cv = GridSearchCV(
    as_concordance_index_ipcw_scorer(FastSurvivalSVM()),
    param_grid,
    cv=5,
    n_jobs=-1
)
cv.fit(X, y)

print(f"Best alpha: {cv.best_params_['estimator__alpha']}")
print(f"Best C-index: {cv.best_score_:.3f}")
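
For reference, the C-index reported here measures how often the model orders a comparable pair correctly. The sketch below is a simplified Harrell-style version in plain Python, without the inverse probability of censoring weights that the IPCW variant adds. `harrell_c_index` is a hypothetical helper, not the scikit-survival implementation.

```python
def harrell_c_index(times, events, risk):
    """Fraction of comparable pairs ranked correctly (ties in risk count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # pairs are anchored on subjects with an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up): event=False means censored
times = [2.0, 5.0, 7.0, 9.0]
events = [True, False, True, True]
risk = [0.9, 0.2, 0.4, 0.6]

c = harrell_c_index(times, events, risk)
print(f"C-index: {c:.2f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking of all comparable pairs.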

Tuning Kernel Parameters

python

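from sklearn.model_selection import GridSearchCV
from sksurv.metrics import as_concordance_index_ipcw_scorer
from sksurv.svm import FastKernelSurvivalSVM

# Define parameter grid for kernel SVM (gamma must be a float or None)
param_grid = {
    'estimator__alpha': [0.1, 1.0, 10.0],
    'estimator__gamma': [None, 0.001, 0.01, 0.1, 1.0]
}

# Grid search; the wrapper makes score() return the IPCW C-index
cv = GridSearchCV(
    as_concordance_index_ipcw_scorer(FastKernelSurvivalSVM(kernel='rbf')),
    param_grid,
    cv=5,
    n_jobs=-1
)
cv.fit(X, y)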

print(f"Best parameters: {cv.best_params_}")
print(f"Best C-index: {cv.best_score_:.3f}")

Clinical Kernel Transform

ClinicalKernelTransform

Kernel designed for clinical data that mixes variable types. Continuous, ordinal, and nominal features are each compared in a way that suits their type, and the per-feature similarities are averaged into one kernel value.

Use Case:

Have clinical variables of mixed type (age, stage, grade, etc.)
Features on very different scales should contribute equally
Want to integrate heterogeneous variable types without one-hot encoding

Key Parameters:

fit_once: If True, fit() does not update the internal state and prepare() should be called once on the whole dataset beforehand (default: False)
Column types are taken from the DataFrame: numeric columns are continuous, ordered categoricals are ordinal, unordered categoricals are nominal

python

from sksurv.kernels import clinical_kernel
from sksurv.svm import FastKernelSurvivalSVM

# X is a pandas DataFrame of clinical variables, e.g. a numeric 'age' column
# and categorical 'stage' and 'grade' columns
kernel_matrix = clinical_kernel(X)

# Fit model on the precomputed kernel matrix
estimator = FastKernelSurvivalSVM(kernel='precomputed', random_state=42)
estimator.fit(kernel_matrix, y)
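
For intuition, the clinical kernel of Daemen and De Moor compares each variable according to its type and averages the results: continuous and ordinal variables score (range - |a - b|) / range, and nominal variables score 1 when equal and 0 otherwise. The sketch below is a hand-rolled illustration of that idea for two patients. The helper names, ranges, and values are hypothetical, and this is not the scikit-survival implementation.

```python
def continuous_similarity(a, b, value_range):
    """Similarity for continuous or ordinal values, in [0, 1]."""
    return (value_range - abs(a - b)) / value_range


def nominal_similarity(a, b):
    """Similarity for unordered categories: 1 if equal, else 0."""
    return 1.0 if a == b else 0.0


def clinical_similarity(p, q, ranges):
    """Average per-feature similarity.

    ranges maps each numeric feature to (max - min) over the dataset;
    features not listed there are treated as nominal.
    """
    scores = []
    for name in p:
        if name in ranges:
            scores.append(continuous_similarity(p[name], q[name], ranges[name]))
        else:
            scores.append(nominal_similarity(p[name], q[name]))
    return sum(scores) / len(scores)


# Made-up ranges and patients
ranges = {"age": 50.0, "stage": 3.0}
patient_a = {"age": 60.0, "stage": 2, "sex": "F"}
patient_b = {"age": 50.0, "stage": 3, "sex": "M"}

k_ab = clinical_similarity(patient_a, patient_b, ranges)
print(f"k(a, b) = {k_ab:.3f}")
```

Because every feature contributes a value between 0 and 1, no single variable dominates the kernel simply because of its units.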

Practical Examples

Example 1: Linear SVM with Cross-Validation

python

from sksurv.svm import FastSurvivalSVM
from sklearn.model_selection import cross_val_score
from sksurv.metrics import as_concordance_index_ipcw_scorer
from sklearn.preprocessing import StandardScaler

# Standardize features (important for SVMs!)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Create model; the wrapper makes score() return the IPCW C-index
svm = as_concordance_index_ipcw_scorer(
    FastSurvivalSVM(alpha=1.0, max_iter=100, random_state=42)
)

# Cross-validation (uses the wrapped estimator's score method)
scores = cross_val_score(
    svm, X_scaled, y,
    cv=5,
    n_jobs=-1
)

print(f"Mean C-index: {scores.mean():.3f} (±{scores.std():.3f})")

Example 2: Kernel SVM with Different Kernels

python

from sksurv.svm import FastKernelSurvivalSVM
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sksurv.metrics import concordance_index_ipcw

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Compare different kernels
kernels = ['linear', 'poly', 'rbf', 'sigmoid']
results = {}

for kernel in kernels:
    # Fit model
    svm = FastKernelSurvivalSVM(kernel=kernel, alpha=1.0, random_state=42)
    svm.fit(X_train_scaled, y_train)

    # Predict
    risk_scores = svm.predict(X_test_scaled)

    # Evaluate
    c_index = concordance_index_ipcw(y_train, y_test, risk_scores)[0]
    results[kernel] = c_index

    print(f"{kernel:10s}: C-index = {c_index:.3f}")

# Best kernel
best_kernel = max(results, key=results.get)
print(f"\nBest kernel: {best_kernel} (C-index = {results[best_kernel]:.3f})")

Example 3: Full Pipeline with Hyperparameter Tuning

python

from sksurv.svm import FastKernelSurvivalSVM
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sksurv.metrics import as_concordance_index_ipcw_scorer, concordance_index_ipcw

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', FastKernelSurvivalSVM(kernel='rbf'))
])

# Define parameter grid; the "estimator__" prefix addresses the wrapped pipeline,
# and gamma must be a float or None
param_grid = {
    'estimator__svm__alpha': [0.1, 1.0, 10.0],
    'estimator__svm__gamma': [None, 0.01, 0.1, 1.0]
}

# Grid search; the wrapper makes score() return the IPCW C-index
cv = GridSearchCV(
    as_concordance_index_ipcw_scorer(pipeline),
    param_grid,
    cv=5,
    n_jobs=-1,
    verbose=1
)
cv.fit(X_train, y_train)

# Best model
best_model = cv.best_estimator_
print(f"Best parameters: {cv.best_params_}")
print(f"Best CV C-index: {cv.best_score_:.3f}")

# Evaluate on test set
risk_scores = best_model.predict(X_test)
c_index = concordance_index_ipcw(y_train, y_test, risk_scores)[0]
print(f"Test C-index: {c_index:.3f}")

Important Considerations

Feature Scaling

CRITICAL: Always standardize features before using SVMs!

python

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
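
To see why scaling matters, consider an RBF kernel on two features with very different ranges: the squared distance is dominated by the large-scale feature until both are standardized. Below is a plain-Python sketch with made-up values (age in years, a biomarker between 0 and 1). The helper `standardize` is hypothetical and mirrors what StandardScaler does column by column.

```python
import statistics


def standardize(column):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]


def sq_dist_contributions(f1, f2, i, j):
    """Per-feature contribution to the squared distance between samples i and j."""
    return (f1[i] - f1[j]) ** 2, (f2[i] - f2[j]) ** 2


age = [40.0, 50.0, 60.0, 70.0]
biomarker = [0.1, 0.9, 0.2, 0.8]

raw = sq_dist_contributions(age, biomarker, 0, 1)
scaled = sq_dist_contributions(standardize(age), standardize(biomarker), 0, 1)

print("raw contributions (age, biomarker):   ", raw)
print("scaled contributions (age, biomarker):", scaled)
```

Before scaling, the age difference contributes more than a hundred times as much as the biomarker difference. After scaling, the large relative jump in the biomarker is the bigger term.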

Computational Complexity

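FastSurvivalSVM: roughly linear in n × p per iteration - fast
FastKernelSurvivalSVM: builds an n × n kernel matrix, so time and memory grow at least quadratically in n - slower
NaiveSurvivalSVM: builds all comparable pairs, which needs O(n²) space - very slow for large datasets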
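
The quadratic growth is largely a memory question, since a kernel method has to hold an n × n matrix. A back-of-the-envelope sketch, assuming a dense matrix of 8-byte floats (the helper name is hypothetical):

```python
def kernel_matrix_bytes(n_samples, bytes_per_value=8):
    """Memory needed for a dense n x n kernel matrix."""
    return n_samples * n_samples * bytes_per_value


for n in (1_000, 10_000, 100_000):
    gib = kernel_matrix_bytes(n) / 1024 ** 3
    print(f"n={n:>7,}: {gib:8.2f} GiB")
```

Under these assumptions the matrix needs roughly 0.75 GiB at 10,000 samples and roughly 75 GiB at 100,000, which is why kernel variants are usually limited to medium-sized datasets.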

For large datasets (>10,000 samples), prefer:

FastSurvivalSVM (linear)
Gradient Boosting
Random Survival Forest

When SVMs May Not Be Best Choice

Very large datasets: Ensemble methods are faster
Need survival functions: Use Random Survival Forest or Cox models
Need interpretability: Use Cox models
Very high dimensional: Use penalized Cox (Coxnet) or gradient boosting with feature selection

Model Selection Guide

Model	Speed	Non-linearity	Scalability	Interpretability
FastSurvivalSVM	Fast	No	High	Medium
FastKernelSurvivalSVM	Medium	Yes	Medium	Low
HingeLossSurvivalSVM	Slow	Yes (with kernels)	Low	Medium
NaiveSurvivalSVM	Slow	No	Low	Medium

General Recommendations:

Start with FastSurvivalSVM for baseline
Try FastKernelSurvivalSVM with RBF if non-linearity expected
Use grid search to tune alpha and gamma
Always standardize features
Compare with Random Survival Forest and Gradient Boosting