BruteForce searches for the top-K nearest neighbors of each query node based on distance similarity.
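As an illustration of the idea, here is a minimal single-machine sketch of a brute-force top-K search in Python. It is not Angel's implementation: the function names, the `DISTANCES` table, and the in-memory dictionary input are hypothetical, and the Jaccard variant assumes that non-zero positions of a vector are treated as set members. Only the four distance names come from this document.

```python
import heapq
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # convention chosen for this sketch when a vector is all zeros
    return 1.0 - dot / (norm_a * norm_b)


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard_distance(a, b):
    # Assumption: the indices of non-zero entries form the sets being compared.
    set_a = {i for i, x in enumerate(a) if x != 0}
    set_b = {i for i, y in enumerate(b) if y != 0}
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


# Keys mirror the distance names listed in this document.
DISTANCES = {
    "cosine-distance": cosine_distance,
    "l1-distance": l1_distance,
    "l2-distance": l2_distance,
    "jaccard-distance": jaccard_distance,
}


def brute_force_top_k(query, vectors, k, distance_function="cosine-distance"):
    """Compare the query against every vector and keep the k smallest distances.

    vectors maps a node id to its feature vector.
    Returns a list of (distance, node_id) pairs, closest first.
    """
    dist = DISTANCES[distance_function]
    scored = ((dist(query, vec), node) for node, vec in vectors.items())
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    nodes = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(brute_force_top_k([1.0, 0.0], nodes, k=2))
```

Every query is compared against every candidate vector, so the cost grows with the product of the two set sizes. That is the trade-off of brute force: exact results with no index to build.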
- `itemSep`: supported separators are space, comma, tab and colon. By default, it is colon.
- `vecSep`: supported separators are space, comma, tab and colon. By default, it is space.
- `saveItemSep`: supported separators are space, comma and tab. By default, it is tab.
- `distanceFunction`: the distance computed between nodes. Supported values are cosine-distance, l1-distance, l2-distance and jaccard-distance.
- `storageLevel`: `MEMORY_ONLY`.

- The product of `spark.ps.instances` and `spark.ps.memory` is the total memory configured for the PS. To ensure that Angel does not hang, configure about twice the size of the model. For PageRank, the model size is: number of nodes * 3 * 4 Byte, from which you can estimate the PS memory needed for graph inputs of different sizes.
- For example, a 10 billion edge set is about 160G in size, and a 20G * 20 configuration is sufficient. When resources are really tight, try increasing the number of partitions.

vectorPath=hdfs://my-hdfs/nodeToVector
queryPath=hdfs://my-hdfs/queryNodeToVertor
outputPath=hdfs://my-hdfs/output
source ./spark-on-angel-env.sh
$SPARK_HOME/bin/spark-submit \
--master yarn-cluster \
--conf spark.ps.instances=1 \
--conf spark.ps.cores=1 \
--conf spark.ps.jars=$SONA_ANGEL_JARS \
--conf spark.ps.memory=10g \
--name "swing angel" \
--jars $SONA_SPARK_JARS \
--driver-memory 5g \
--num-executors 1 \
--executor-cores 4 \
--executor-memory 10g \
--class org.apache.spark.angel.examples.graph.SwingExample \
../lib/spark-on-angel-examples-3.3.0.jar \
vectorPath:$vectorPath queryPath:$queryPath outputPath:$outputPath itemSep:colon \
vecSep:space saveItemSep:tab storageLevel:MEMORY_ONLY partitionNum:4 psPartitionNum:1 \
distanceFunction:cosine-distance queryPartitionNum:4
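The script sets `spark.ps.instances` and `spark.ps.memory`, whose product is the total PS memory. The sizing rule quoted in this document (model size of number of nodes * 3 * 4 Byte for PageRank, with about twice that configured) can be sketched as simple arithmetic. The function names below are hypothetical, the source does not say whether the same formula applies to BruteForce, and reading `10g` as 10 * 1024^3 bytes is an assumption.

```python
def estimate_ps_memory_bytes(num_nodes, values_per_node=3, bytes_per_value=4,
                             safety_factor=2):
    """Return (model_bytes, recommended_ps_bytes) for the PageRank formula.

    model size  = number of nodes * 3 * 4 Byte
    recommended = about twice the model size
    """
    model_bytes = num_nodes * values_per_node * bytes_per_value
    return model_bytes, model_bytes * safety_factor


def configured_ps_memory_bytes(ps_instances, ps_memory_gb):
    """Total PS memory: number of PS instances times memory per instance."""
    # Assumption: a setting such as "10g" means 10 * 1024**3 bytes.
    return ps_instances * ps_memory_gb * 1024 ** 3


if __name__ == "__main__":
    model, recommended = estimate_ps_memory_bytes(1_000_000_000)
    configured = configured_ps_memory_bytes(ps_instances=1, ps_memory_gb=10)
    print("model bytes:", model)
    print("recommended PS bytes:", recommended)
    print("configuration is sufficient:", configured >= recommended)
```

Comparing the recommended figure against the configured total before submitting is a quick way to catch an undersized PS.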
Set `spark.hadoop.angel.am.appstate.timeout.ms = xxx` to increase the timeout. The default value is 600000 ms, which is 10 minutes.