DeepWalk is a graph representation learning algorithm. Its sampling resembles a depth-first traversal in which nodes that have already been visited may be visited again.

DeepWalk samples nodes with a random walk (RandomWalk):

1. Start from the current node.
2. Randomly pick one of its neighbors as the next node to visit.
3. Repeat until the walk sequence reaches the preset length.

This implementation covers only the random-walk stage. The embedding-training stage of DeepWalk is not included.
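The walk procedure can be sketched in a few lines of Python. This is a single-machine illustration only, not the Spark on Angel implementation. It makes three assumptions:

* Edges are treated as undirected.
* Neighbors are chosen uniformly, so edge weights are ignored.
* Every node starts one walk per round.

The names `build_adjacency`, `random_walk` and `deepwalk_walks` are hypothetical.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build an adjacency list from (src, dst) pairs.

    Assumption: the graph is undirected, so each edge is stored in both directions.
    """
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
        adj[dst].append(src)
    return adj


def random_walk(adj, start, walk_length, rng):
    """Return one walk of at most walk_length nodes beginning at start."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = adj.get(walk[-1])
        if not neighbors:  # node has no neighbors: end the walk early
            break
        # Uniform choice among neighbors; already-visited nodes may be chosen again.
        walk.append(rng.choice(neighbors))
    return walk


def deepwalk_walks(adj, walk_length, walks_per_node=1, seed=0):
    """Start walks_per_node walks from every node, visiting start nodes in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adj)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(adj, node, walk_length, rng))
    return walks


if __name__ == "__main__":
    # The sample edge list from this page, weights dropped.
    edges = [(0, 1), (2, 1), (3, 1), (3, 2), (4, 1)]
    for walk in deepwalk_walks(build_adjacency(edges), walk_length=10):
        print(walk)
```

Because already-visited nodes remain eligible, a walk can bounce back and forth between neighbors. This is the "revisiting" behavior described above.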
The input is an edge list with one edge per line, for example:

```
0 1 0.3
2 1 0.5
3 1 0.1
3 2 0.7
4 1 0.3
```

From left to right, the three columns are the source node ID, the destination node ID, and the edge weight. The weight column can be omitted for unweighted graphs.
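A minimal parser for this line format is sketched below. It is an illustration under stated assumptions, not Angel's actual loader:

* Node IDs are integers.
* A missing weight defaults to `1.0`.
* `sep=None` splits on any run of whitespace, which covers both tab and space separators.

The function names are hypothetical.

```python
def parse_edge_line(line, sep=None):
    """Parse 'src dst [weight]' into (src, dst, weight)."""
    fields = line.split(sep)
    if len(fields) not in (2, 3):
        raise ValueError(f"expected 2 or 3 columns, got {len(fields)}: {line!r}")
    src, dst = int(fields[0]), int(fields[1])
    # Assumption: an unweighted edge gets weight 1.0.
    weight = float(fields[2]) if len(fields) == 3 else 1.0
    return src, dst, weight


def parse_edges(text, sep=None):
    """Parse every non-blank line of an edge-list text."""
    return [parse_edge_line(line, sep) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = "0 1 0.3\n2 1 0.5\n3 1\n"
    print(parse_edges(sample))
```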
Two of the parameters used below take values from a fixed set:

- `sep`: the column separator of the input data, e.g. `tab` or `space`
- `storageLevel`: the RDD storage level, one of `DISK_ONLY`, `MEMORY_ONLY`, or `MEMORY_AND_DISK`

A sample submission script:

```shell
# HDFS paths of the input edge list and the output directory
input=hdfs://my-hdfs/data
output=hdfs://my-hdfs/output

# Sets SONA_ANGEL_JARS and SONA_SPARK_JARS used below
source ./spark-on-angel-env.sh

$SPARK_HOME/bin/spark-submit \
--master yarn --deploy-mode cluster \
--conf spark.ps.instances=1 \
--conf spark.ps.cores=1 \
--conf spark.ps.jars=$SONA_ANGEL_JARS \
--conf spark.ps.memory=10g \
--name "deepwalk angel" \
--jars $SONA_SPARK_JARS \
--driver-memory 5g \
--num-executors 1 \
--executor-cores 4 \
--executor-memory 10g \
--class com.tencent.angel.spark.examples.cluster.DeepWalkExample \
../lib/spark-on-angel-examples-3.3.0.jar \
input:$input output:$output \
sep:tab storageLevel:MEMORY_ONLY useBalancePartition:true \
partitionNum:4 psPartitionNum:1 walkLength:10 needReplicateEdge:true
```
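The application arguments after the jar path follow a `key:value` convention. The sketch below shows one way such tokens can be turned into a parameter map. `parse_app_args` is a hypothetical helper, not Angel's own argument parser.

```python
def parse_app_args(args):
    """Turn 'key:value' tokens into a dict, splitting at the first ':' only."""
    params = {}
    for token in args:
        key, sep, value = token.partition(":")
        if not sep:
            raise ValueError(f"expected key:value, got {token!r}")
        params[key] = value  # values stay strings; callers convert as needed
    return params


if __name__ == "__main__":
    argv = ["input:hdfs://my-hdfs/data", "sep:tab", "walkLength:10"]
    print(parse_app_args(argv))
```

The split happens only at the first colon because values such as `hdfs://my-hdfs/data` contain colons themselves.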