scientific-skills/arboreto/references/algorithms.md
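Arboreto provides two algorithms for gene regulatory network (GRN) inference, GRNBoost2 and GENIE3, both built on the same multiple-regression approach.
Both algorithms follow the same inference strategy: for each target gene, a tree-based regression model is trained to predict that gene's expression from the expression of the candidate transcription factors (TFs), and the model's feature importances are taken as the strengths of the putative TF-to-target links.
The two algorithms differ in computational efficiency and in the underlying regression method.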
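The per-target regression strategy can be illustrated with a small sketch. It calls scikit-learn's `RandomForestRegressor` directly on a toy matrix and is not arboreto's implementation; the `infer_links` helper, the gene names, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def infer_links(expression, gene_names, tf_names, seed=0):
    """Return (tf, target, importance) tuples, strongest first."""
    gene_index = {name: i for i, name in enumerate(gene_names)}
    links = []
    for target in gene_names:
        # A gene is never used as a predictor of itself.
        predictors = [tf for tf in tf_names if tf != target]
        if not predictors:
            continue
        X = expression[:, [gene_index[tf] for tf in predictors]]
        y = expression[:, gene_index[target]]
        model = RandomForestRegressor(n_estimators=50, random_state=seed)
        model.fit(X, y)
        # Feature importances serve as putative regulatory link strengths.
        for tf, importance in zip(predictors, model.feature_importances_):
            links.append((tf, target, float(importance)))
    return sorted(links, key=lambda link: link[2], reverse=True)


# Toy data: gene "g" is driven by "tf1"; "tf2" and "tf3" are independent noise.
rng = np.random.default_rng(0)
tf1, tf2, tf3 = (rng.normal(size=200) for _ in range(3))
g = 2.0 * tf1 + 0.1 * rng.normal(size=200)
expression = np.column_stack([tf1, tf2, tf3, g])  # rows: observations, columns: genes

links = infer_links(expression, ["tf1", "tf2", "tf3", "g"], ["tf1", "tf2", "tf3"])
```

On this toy matrix the strongest link should be `tf1` to `g`, since `g` is constructed from `tf1`. Each target's regression is independent of the others, which is what makes the strategy easy to parallelize.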
Purpose of GRNBoost2: fast GRN inference for large-scale datasets using gradient boosting.
```python
from arboreto.algo import grnboost2

network = grnboost2(
    expression_data=expression_matrix,
    tf_names=tf_names,
    seed=42,  # For reproducibility
)
```
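The returned network is a pandas DataFrame of putative regulatory links with the columns `TF`, `target`, and `importance`. A minimal post-processing sketch follows; the hand-written frame only stands in for real output (its values are made up), and `top_links` is a hypothetical helper, not part of arboreto.

```python
import pandas as pd

# Stand-in for the DataFrame returned by grnboost2 (values are made up).
network = pd.DataFrame({
    "TF": ["tfA", "tfB", "tfA", "tfC"],
    "target": ["g1", "g1", "g2", "g2"],
    "importance": [12.5, 0.3, 4.1, 7.8],
})


def top_links(network, n):
    """Return the n strongest links, highest importance first."""
    return network.nlargest(n, "importance").reset_index(drop=True)


strongest = top_links(network, 2)

# Candidate regulators per target, useful for a quick sanity check.
regulators = network.groupby("target")["TF"].apply(list).to_dict()
```

Keeping only the strongest links is a common way to reduce a dense network to a manageable size; the cutoff is a judgment call that depends on the dataset.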
```python
grnboost2(
    expression_data,            # Required: pandas DataFrame or numpy array
    gene_names=None,            # Required for numpy arrays
    tf_names='all',             # List of TF names or 'all'
    verbose=False,              # Print progress messages
    client_or_address='local',  # Dask client or scheduler address
    seed=None                   # Random seed for reproducibility
)
```
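A NumPy array carries no column labels, so `gene_names` has to supply exactly one name per column. A minimal sketch of a pre-flight check, assuming rows are observations and columns are genes; `check_inputs` is a hypothetical helper, not part of arboreto, and it is deliberately strict in flagging any TF name that is absent from `gene_names`.

```python
import numpy as np


def check_inputs(expression, gene_names, tf_names="all"):
    """Validate array-style inputs and return the TF names that would be used."""
    if expression.ndim != 2:
        raise ValueError("expression data must be 2-dimensional")
    if len(gene_names) != expression.shape[1]:
        raise ValueError("need exactly one gene name per column")
    if isinstance(tf_names, str) and tf_names == "all":
        # 'all' means every gene is treated as a candidate regulator.
        return list(gene_names)
    missing = sorted(set(tf_names) - set(gene_names))
    if missing:
        raise ValueError(f"TFs not found in gene_names: {missing}")
    return list(tf_names)


expression = np.zeros((5, 3))  # 5 observations, 3 genes
tfs = check_inputs(expression, ["g1", "g2", "g3"], ["g1"])
```

Running a check like this before starting a long inference job catches mislabeled columns early.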
Purpose of GENIE3: classic Random Forest-based GRN inference; it serves as the conceptual blueprint for GRNBoost2.
```python
from arboreto.algo import genie3

network = genie3(
    expression_data=expression_matrix,
    tf_names=tf_names,
    seed=42,  # For reproducibility
)
```
```python
genie3(
    expression_data,            # Required: pandas DataFrame or numpy array
    gene_names=None,            # Required for numpy arrays
    tf_names='all',             # List of TF names or 'all'
    verbose=False,              # Print progress messages
    client_or_address='local',  # Dask client or scheduler address
    seed=None                   # Random seed for reproducibility
)
```
| Feature | GRNBoost2 | GENIE3 |
|---|---|---|
| Speed | Fast (optimized for large data) | Slower |
| Method | Gradient boosting | Random Forest |
| Best for | Large-scale data (10k+ observations) | Small-medium datasets |
| Output format | Same | Same |
| Inference strategy | Multiple regression | Multiple regression |
| Recommended | Yes (default choice) | For comparison/validation |
For advanced users, custom scikit-learn regressor parameters can be passed through `arboreto.algo.diy`, which takes a `regressor_type` and a `regressor_kwargs` dictionary. The `grnboost2` and `genie3` functions use fixed regressor settings and do not accept these two arguments.

```python
from arboreto.algo import diy

# Custom gradient boosting parameters (GRNBoost2-style regression)
custom_grnboost2 = diy(
    expression_data=expression_matrix,
    regressor_type='GBM',
    regressor_kwargs={
        'n_estimators': 100,
        'max_depth': 5,
        'learning_rate': 0.1,
    },
)

# Custom Random Forest parameters (GENIE3-style regression)
custom_genie3 = diy(
    expression_data=expression_matrix,
    regressor_type='RF',
    regressor_kwargs={
        'n_estimators': 1000,
        'max_features': 'sqrt',
    },
)
```
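Decision guide: use GRNBoost2 as the default, particularly for large-scale data (10k+ observations); use GENIE3 for small to medium datasets or to cross-check GRNBoost2 results.
Both algorithms return networks in the same output format and produce comparable results, so they are interchangeable in most analyses.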
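One way to use GENIE3 for comparison or validation is to measure how much the strongest links of the two networks overlap. A minimal sketch, assuming each network is a DataFrame with `TF`, `target`, and `importance` columns; the two frames below are made-up stand-ins for real results, and `top_link_overlap` is a hypothetical helper.

```python
import pandas as pd


def top_link_overlap(net_a, net_b, k):
    """Jaccard overlap between the k strongest (TF, target) pairs of two networks."""
    def top_pairs(net):
        top = net.nlargest(k, "importance")
        return set(zip(top["TF"], top["target"]))

    a, b = top_pairs(net_a), top_pairs(net_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up stand-ins for the outputs of grnboost2 and genie3.
grnboost2_net = pd.DataFrame({
    "TF": ["tfA", "tfB", "tfC"],
    "target": ["g1", "g1", "g2"],
    "importance": [9.0, 5.0, 1.0],
})
genie3_net = pd.DataFrame({
    "TF": ["tfA", "tfC", "tfB"],
    "target": ["g1", "g2", "g1"],
    "importance": [0.8, 0.6, 0.1],
})

overlap = top_link_overlap(grnboost2_net, genie3_net, k=2)
```

The comparison is rank-based because importance values from the two methods need not share a scale, so comparing sets of top-ranked links is safer than comparing raw scores.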