HANP (Hop Attenuation & Node Preference) is a community detection algorithm based on label propagation. For more details about the algorithm, please refer to the article HANP.
We have implemented the HANP algorithm on the Spark on Angel framework, which can handle large-scale industrial data. The degrees, labels and scores of the nodes are stored on the Angel parameter servers (PSs). In each iteration, every Spark executor pulls the degrees, labels and scores of the nodes in its data partitions, computes new labels and scores from them, and pushes the results back to the PSs to update the corresponding nodes. The pull, compute and push steps are repeated until the maximum number of iterations is reached.
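The pull, compute and push loop described above can be illustrated with a small single-process sketch. This is not the Spark on Angel implementation: plain dictionaries stand in for the PSs, lists of node ids stand in for executor partitions, and the voting rule (neighbor score times degree to the power `m` times edge weight, with the winning score reduced by `delta`) is an assumption based on the general idea of hop attenuation and node preference. The function name `hanp_sketch`, the exponent `m`, the tie-breaking rule and the synchronous update are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def hanp_sketch(edges, max_iteration=10, delta=0.1, m=0.1, num_partitions=2):
    """Toy imitation of the pull / compute / push loop (illustrative only)."""
    # Build symmetric adjacency lists: node -> list of (neighbor, weight).
    adj = defaultdict(list)
    for src, dst, w in edges:
        adj[src].append((dst, w))
        adj[dst].append((src, w))
    nodes = sorted(adj)

    # Stand-ins for the state kept on the parameter servers.
    ps_degree = {v: len(adj[v]) for v in nodes}
    ps_label = {v: v for v in nodes}      # every node starts with its own label
    ps_score = {v: 1.0 for v in nodes}    # every label starts with score 1

    # Stand-ins for the data partitions held by the executors.
    partitions = [nodes[i::num_partitions] for i in range(num_partitions)]

    for _ in range(max_iteration):
        updates = {}
        for part in partitions:
            # Pull: fetch state for the partition's nodes and their neighbors.
            needed = set(part) | {n for v in part for n, _ in adj[v]}
            degree = {v: ps_degree[v] for v in needed}
            label = {v: ps_label[v] for v in needed}
            score = {v: ps_score[v] for v in needed}

            # Compute: each node votes over the labels of its neighbors.
            for v in part:
                votes = defaultdict(float)
                best_score = {}
                for n, w in adj[v]:
                    if score[n] <= 0:
                        continue  # fully attenuated labels no longer spread
                    lab = label[n]
                    votes[lab] += score[n] * (degree[n] ** m) * w
                    best_score[lab] = max(best_score.get(lab, 0.0), score[n])
                if not votes:
                    continue  # nothing to adopt, keep the current label
                # Highest vote wins; ties go to the smallest label id.
                new_label = min(votes, key=lambda l: (-votes[l], l))
                updates[v] = (new_label, best_score[new_label] - delta)

        # Push: write the new labels and scores back in one step.
        for v, (lab, sc) in updates.items():
            ps_label[v] = lab
            ps_score[v] = sc
    return ps_label


# Two disjoint triangles with unit weights.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0)]
labels = hanp_sketch(edges, max_iteration=5)
```

On this toy graph each triangle ends up sharing a single label, and labels never cross between the two triangles because they only travel along edges.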
input=hdfs://my-hdfs/data
output=hdfs://my-hdfs/hanp_result
source ./spark-on-angel-env.sh
$SPARK_HOME/bin/spark-submit \
--master yarn-cluster \
--conf spark.ps.instances=1 \
--conf spark.ps.cores=1 \
--conf spark.ps.jars=$SONA_ANGEL_JARS \
--conf spark.ps.memory=10g \
--jars $SONA_SPARK_JARS \
--driver-memory 5g \
--num-executors 1 \
--executor-cores 4 \
--executor-memory 10g \
--class com.tencent.angel.spark.examples.cluster.HanpExample \
../lib/spark-on-angel-examples-3.3.0.jar \
input:$input output:$output isWeighted:true sep:tab maxIteration:10 preserveRate:0.1 delta:0.1 psPartitionNum:10 partitionNum:12
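The arguments `isWeighted:true` and `sep:tab` suggest that the input is an edge list with one weighted edge per line. As a hedged illustration, the sketch below writes a tiny file in that shape, assuming each line is `src<TAB>dst<TAB>weight`; the exact column layout expected by the job is an assumption here, and the helper `write_edge_list` and the file name `edges.txt` are hypothetical.

```python
import os
import tempfile


def write_edge_list(edges, path, sep="\t"):
    """Write (src, dst, weight) triples, one edge per line."""
    with open(path, "w", encoding="utf-8") as f:
        for src, dst, weight in edges:
            f.write(f"{src}{sep}{dst}{sep}{weight}\n")


# A three-node sample graph written to a temporary directory.
sample_edges = [(0, 1, 1.0), (1, 2, 0.5), (0, 2, 2.0)]
sample_path = os.path.join(tempfile.mkdtemp(), "edges.txt")
write_edge_list(sample_edges, sample_path)
```

A file produced this way would still need to be uploaded to the HDFS location given in `input` before the job is submitted.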