docs/algo/sona/pagerank_on_sona_en.md
The PageRank algorithm is probably the most famous algorithm for evaluating node importance. It was originally proposed by Larry Page and used to rank web pages in Google search. For details, please refer to the paper "The PageRank Citation Ranking: Bringing Order to the Web".
We implemented large-scale PageRank on Spark On Angel. The parameter server (PS) maintains the state of all nodes, namely the vectors of received messages, sent messages, and rank values. The messages and rank values are computed on the Spark executor side, and the results are written back through the push / update operations of the PS.
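The division of work described above can be illustrated with a minimal single-process sketch. It is not the Angel implementation: `ToyParameterServer`, `executor_step`, and `pagerank` are hypothetical names, the "PS" is a plain in-memory object, and the update rule is the textbook damped PageRank with unnormalized ranks (initial rank 1.0 per node), which is an assumption about the formulation rather than something stated in this document.

```python
from collections import defaultdict


class ToyParameterServer:
    """Hypothetical in-memory stand-in for the PS: holds rank and message vectors."""

    def __init__(self, nodes, reset_prob):
        self.reset_prob = reset_prob
        self.ranks = {n: 1.0 for n in nodes}
        self.messages = defaultdict(float)

    def pull_ranks(self, nodes):
        # Executors pull only the ranks they need for their batch.
        return {n: self.ranks[n] for n in nodes}

    def push_messages(self, partial):
        # Accumulate the contributions pushed by one executor batch.
        for node, value in partial.items():
            self.messages[node] += value

    def update_ranks(self):
        # Apply the damped update on the PS and return the largest change.
        max_delta = 0.0
        for node in self.ranks:
            new_rank = self.reset_prob + (1.0 - self.reset_prob) * self.messages[node]
            max_delta = max(max_delta, abs(new_rank - self.ranks[node]))
            self.ranks[node] = new_rank
        self.messages.clear()
        return max_delta


def executor_step(ps, edge_batch, out_degree):
    """Executor side: pull source ranks, compute messages, push them to the PS."""
    sources = {src for src, _ in edge_batch}
    ranks = ps.pull_ranks(sources)
    partial = defaultdict(float)
    for src, dst in edge_batch:
        partial[dst] += ranks[src] / out_degree[src]
    ps.push_messages(partial)


def pagerank(edges, tol=0.01, reset_prob=0.15, batch_size=2, max_iter=100):
    """Iterate until the largest rank change falls below tol (or max_iter is hit)."""
    nodes = {n for edge in edges for n in edge}
    out_degree = defaultdict(int)
    for src, _ in edges:
        out_degree[src] += 1
    ps = ToyParameterServer(nodes, reset_prob)
    for _ in range(max_iter):
        for start in range(0, len(edges), batch_size):
            executor_step(ps, edges[start:start + batch_size], out_degree)
        if ps.update_ranks() < tol:
            break
    return ps.ranks
```

For example, `pagerank([(0, 1), (1, 2), (2, 0)])` runs the loop on a three-node cycle. The point of the sketch is only the data flow: ranks are pulled from the PS, messages are computed per edge batch, partial sums are pushed back, and the rank update itself happens on the PS.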
input=hdfs://my-hdfs/data
output=hdfs://my-hdfs/model
source ./spark-on-angel-env.sh
$SPARK_HOME/bin/spark-submit \
--master yarn-cluster \
--conf spark.ps.instances=1 \
--conf spark.ps.cores=1 \
--conf spark.ps.jars=$SONA_ANGEL_JARS \
--conf spark.ps.memory=10g \
--jars $SONA_SPARK_JARS \
--driver-memory 5g \
--num-executors 1 \
--executor-cores 4 \
--executor-memory 10g \
--class com.tencent.angel.spark.examples.cluster.PageRankExample \
../lib/spark-on-angel-examples-3.3.0.jar \
input:$input output:$output tol:0.01 resetProp:0.15 version:edge-cut batchSize:1000 psPartitionNum:10 dataPartitionNum:10
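The last line of the script passes the algorithm parameters as space-separated `key:value` tokens. The sketch below shows one way such tokens could be read; `parse_params` is a hypothetical helper, not the parser used by `PageRankExample`. The detail worth noting is that values such as `hdfs://my-hdfs/data` contain colons themselves, so each token should be split on its first colon only. Reading `tol` as a convergence tolerance and `resetProp` as the reset probability is an assumption based on the parameter names.

```python
def parse_params(argv):
    """Split each 'key:value' token on its first colon only (hypothetical helper)."""
    params = {}
    for token in argv:
        key, sep, value = token.partition(":")
        if not sep:
            raise ValueError(f"expected key:value, got {token!r}")
        params[key] = value
    return params


# The argument line from the script, with $input and $output expanded.
args = (
    "input:hdfs://my-hdfs/data output:hdfs://my-hdfs/model tol:0.01 "
    "resetProp:0.15 version:edge-cut batchSize:1000 "
    "psPartitionNum:10 dataPartitionNum:10"
).split()

params = parse_params(args)
tol = float(params["tol"])                # assumed: convergence tolerance
reset_prob = float(params["resetProp"])   # assumed: reset probability
batch_size = int(params["batchSize"])
```

With this approach `params["input"]` keeps the full `hdfs://my-hdfs/data` URI, whereas a naive `token.split(":")` would break it into three pieces.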