The SCC (strongly connected components) algorithm computes the strongly connected components of a directed graph.
On a directed graph, the SCC algorithm assigns the same label to all nodes that belong to the same strongly connected component. We implemented the SCC algorithm for large-scale networks on Spark on Angel. The parameter server (PS) maintains the latest label and status estimate for each node. The Spark side maintains the adjacency list of the network and pulls the latest label and status estimates in each round. Each node in the graph is in one of two states: final or non-final. Final nodes are those whose labels are certain; non-final nodes are those whose labels are still uncertain.
The algorithm uses the minimum node ID within a strongly connected component as the label of that component.
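The labeling rule and the final/non-final states can be illustrated with a small single-machine sketch. This is not the Spark on Angel implementation: it assumes a forward/backward min-label propagation scheme, and the function name `scc_min_label` and the in-memory edge list are hypothetical, chosen only for illustration. In each round, non-final nodes propagate the smallest reachable-from ID forward, and a backward pass from each root confirms which nodes share its component and can be marked final.

```python
from collections import defaultdict, deque


def scc_min_label(edges):
    """Return {node: label}, where label is the smallest node ID in the node's SCC.

    Illustrative sketch only; assumes integer node IDs and a small in-memory graph.
    """
    out_adj = defaultdict(list)  # forward adjacency list
    in_adj = defaultdict(list)   # reversed adjacency list
    nodes = set()
    for src, dst in edges:
        out_adj[src].append(dst)
        in_adj[dst].append(src)
        nodes.update((src, dst))

    label = {}               # labels of final nodes
    non_final = set(nodes)   # nodes whose label is still uncertain

    while non_final:
        # Forward phase: each non-final node ends up with the minimum ID
        # among the non-final nodes that can reach it.
        color = {v: v for v in non_final}
        changed = True
        while changed:
            changed = False
            for u in non_final:
                for v in out_adj[u]:
                    if v in non_final and color[u] < color[v]:
                        color[v] = color[u]
                        changed = True

        # Backward phase: from each root (color equals own ID), walk reversed
        # edges through nodes of the same color; those nodes form one SCC.
        newly_final = set()
        for root in (v for v in non_final if color[v] == v):
            seen = {root}
            queue = deque([root])
            while queue:
                v = queue.popleft()
                for u in in_adj[v]:
                    if u in non_final and u not in seen and color[u] == root:
                        seen.add(u)
                        queue.append(u)
            for v in seen:
                label[v] = root  # root is the minimum ID in this SCC
            newly_final |= seen

        non_final -= newly_final  # these nodes are now final

    return label
```

For example, with edges `[(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 4), (5, 6)]`, nodes 1, 2, and 3 receive label 1, nodes 4 and 5 receive label 4, and node 6 receives label 6. In a distributed setting the `label` and `non_final` structures would correspond to the label and status held on the PS, while the adjacency lists would stay on the Spark side.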
storageLevel: DISK_ONLY/MEMORY_ONLY/MEMORY_AND_DISK
input=hdfs://my-hdfs/data
output=hdfs://my-hdfs/output
source ./spark-on-angel-env.sh
$SPARK_HOME/bin/spark-submit \
--master yarn-cluster \
--conf spark.ps.instances=1 \
--conf spark.ps.cores=1 \
--conf spark.ps.jars=$SONA_ANGEL_JARS \
--conf spark.ps.memory=10g \
--name "cc angel" \
--jars $SONA_SPARK_JARS \
--driver-memory 5g \
--num-executors 1 \
--executor-cores 4 \
--executor-memory 10g \
--class org.apache.spark.angel.examples.graph.SCCExample \
../lib/spark-on-angel-examples-3.3.0.jar \
input:$input output:$output sep:tab storageLevel:MEMORY_ONLY useBalancePartition:true \
partitionNum:4 psPartitionNum:1