The closeness centrality of a node measures its average farness (inverse distance) to all other nodes. The algorithm aims to detect nodes that can spread information very efficiently through a graph.
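
As a point of reference, the definition above can be sketched exactly on a tiny graph. This is a minimal illustration, not the Angel implementation: it assumes an unweighted graph stored as a plain adjacency dictionary, uses the common convention "number of reachable nodes divided by the sum of hop distances", and the helper names `bfs_distances` and `closeness` are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj, v):
    """Inverse of the average distance from v to the nodes it can reach."""
    dist = bfs_distances(adj, v)
    total = sum(dist.values())   # the distance from v to itself is 0
    reachable = len(dist) - 1    # exclude v itself
    return reachable / total if total > 0 else 0.0


# Path graph a - b - c: the middle node is closest to everyone else.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = {v: closeness(adj, v) for v in adj}
```

Running one breadth-first search per node is what becomes too expensive on large graphs, which is why an approximate method is attractive.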
Based on Spark on Angel and the paper "Centralities in Large Networks: Algorithms and Observations", we implemented a large-scale closeness algorithm.
The implementation uses a HyperLogLog++ cardinality counter to record the n-order neighbors (the vertices within n hops) of each vertex. Closeness is then computed approximately, following the idea of the HyperANF algorithm.
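
The neighborhood-counting idea can be sketched on a single machine as follows. This is only an illustration under stated assumptions: exact Python sets stand in for the HyperLogLog++ counters (so the counts here are exact rather than estimated), the graph is a small in-memory adjacency dictionary, and the function name `approx_closeness` and the `max_hops` parameter are hypothetical rather than part of Angel.

```python
def approx_closeness(adj, max_hops):
    """HyperANF-style closeness: grow each vertex's reachable set hop by hop.

    reach[v] plays the role of the per-vertex cardinality counter. A real
    implementation would keep a HyperLogLog++ sketch and merge sketches
    instead of taking set unions.
    """
    reach = {v: {v} for v in adj}      # vertices within 0 hops of v
    dist_sum = {v: 0 for v in adj}     # running sum of hop distances from v
    for r in range(1, max_hops + 1):
        new_reach = {}
        for v in adj:
            merged = set(reach[v])
            for w in adj[v]:
                merged |= reach[w]     # union of neighbor counters
            new_reach[v] = merged
        changed = False
        for v in adj:
            # Growth of the counter = number of vertices at exactly r hops.
            newly = len(new_reach[v]) - len(reach[v])
            if newly:
                dist_sum[v] += r * newly
                changed = True
        reach = new_reach
        if not changed:                # every counter has stabilized
            break
    return {
        v: (len(reach[v]) - 1) / dist_sum[v] if dist_sum[v] else 0.0
        for v in adj
    }


# Path graph a - b - c.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = approx_closeness(adj, max_hops=10)
```

With probabilistic counters each per-hop difference is an estimate, so the resulting closeness values are approximate; the benefit is that every vertex needs only a fixed-size sketch instead of a full set of reachable vertices.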
useBalancePartition: true / false; true is suggested when the distribution of graph vertices is unbalanced

storageLevel: DISK_ONLY / MEMORY_ONLY / MEMORY_AND_DISK

input=hdfs://my-hdfs/data
output=hdfs://my-hdfs/output
source ./spark-on-angel-env.sh
$SPARK_HOME/bin/spark-submit \
--master yarn-cluster \
--conf spark.ps.instances=1 \
--conf spark.ps.cores=1 \
--conf spark.ps.jars=$SONA_ANGEL_JARS \
--conf spark.ps.memory=10g \
--name "closeness angel" \
--jars $SONA_SPARK_JARS \
--driver-memory 5g \
--num-executors 1 \
--executor-cores 4 \
--executor-memory 10g \
--class org.apache.spark.angel.examples.cluster.ClosenessExample \
../lib/spark-on-angel-examples-3.3.0.jar \
input:$input output:$output sep:tab storageLevel:MEMORY_ONLY useBalancePartition:true \
balancePartitionPercent:0.7 partitionNum:4 psPartitionNum:1 msgNumBatch:8 \
pullBatchSize:1000 verboseSaving:true src:1 dst:2 mode:yarn-cluster
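
The application arguments at the end of the command are plain `key:value` tokens. The sketch below shows one way such tokens could be turned into a dictionary; it is a hypothetical helper for illustration, not Angel's actual argument handling. The point to notice is that each token is split on the first colon only, so values such as HDFS URLs keep their own colons.

```python
def parse_args(tokens):
    """Parse key:value tokens, splitting each one on its first colon only."""
    params = {}
    for token in tokens:
        key, sep, value = token.partition(":")
        if not sep:
            raise ValueError(f"expected key:value, got {token!r}")
        params[key] = value
    return params


# A subset of the arguments from the submit command, with the shell
# variables already expanded.
tokens = (
    "input:hdfs://my-hdfs/data output:hdfs://my-hdfs/output "
    "sep:tab storageLevel:MEMORY_ONLY partitionNum:4"
).split()
params = parse_args(tokens)
```

All values come back as strings, so numeric parameters such as `partitionNum` would still need an explicit conversion before use.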