HNSW is an approximate K-nearest neighbor (ANN) search algorithm based on navigable small world graphs with controllable hierarchy; see also the introduction to Hierarchical Navigable Small World graphs.
itemSep: separator, one of space, comma, tab and colon. By default, it is colon.
vecSep: separator, one of space, comma, tab and colon. By default, it is space.
saveItemSep: separator, one of space, comma and tab. By default, it is tab.
distanceFunction: the distance between nodes, one of cosine-distance, l1-distance, l2-distance and jaccard-distance.
storageLevel: MEMORY_ONLY.
ef: a candidate node set whose members constitute the Small World Graph nodes on the same layer.

The product of ps.instance and ps.memory is the total memory configured for the ps. To ensure that Angel does not hang, configure about twice the size of the model. For PageRank, the model size is calculated as: number of nodes * 3 * 4 Byte, from which you can estimate the ps memory needed for graph inputs of different sizes.

A 10 billion edge set is about 160G in size, and a 20G * 20 configuration is sufficient. When resources are really tight, try increasing the number of partitions!

vectorPath=hdfs://my-hdfs/nodeToVector
queryPath=hdfs://my-hdfs/queryNodeToVertor
outputPath=hdfs://my-hdfs/output
source ./spark-on-angel-env.sh
$SPARK_HOME/bin/spark-submit \
--master yarn-cluster \
--conf spark.ps.instances=1 \
--conf spark.ps.cores=1 \
--conf spark.ps.jars=$SONA_ANGEL_JARS \
--conf spark.ps.memory=10g \
--name "swing angel" \
--jars $SONA_SPARK_JARS \
--driver-memory 5g \
--num-executors 1 \
--executor-cores 4 \
--executor-memory 10g \
--class org.apache.spark.angel.examples.graph.SwingExample \
../lib/spark-on-angel-examples-3.3.0.jar \
vectorPath:$vectorPath queryPath:$queryPath outputPath:$outputPath itemSep:colon \
vecSep:space saveItemSep:tab storageLevel:MEMORY_ONLY partitionNum:4 psPartitionNum:1 \
distanceFunction:cosine-distance queryPartitionNum:4 ef:40 efConstruction:40 M:16 maxM:16 maxM0:32 mL:1.0
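
The ps memory rule of thumb quoted earlier (model size = number of nodes * 3 * 4 Byte, with configured memory about twice the model size) can be worked through in a few lines of shell. This is only a rough sketch: the function names and the node count are illustrative assumptions, and the formula is the PageRank one quoted in the text, not a measured figure for HNSW.

```shell
# Model size in bytes, using the formula quoted in the text:
#   number of nodes * 3 * 4 Byte
model_size_bytes() {
  local nodes=$1
  echo $(( nodes * 3 * 4 ))
}

# Suggested total ps memory: about twice the model size.
ps_memory_bytes() {
  local nodes=$1
  echo $(( 2 * $(model_size_bytes "$nodes") ))
}

# Hypothetical example: a graph with 1 billion nodes.
nodes=1000000000
echo "model size:      $(model_size_bytes "$nodes") bytes"
echo "total ps memory: $(ps_memory_bytes "$nodes") bytes"
```

The total printed here is what the number of ps instances multiplied by the memory per ps instance would need to cover.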
Set spark.hadoop.angel.am.appstate.timeout.ms=xxx to increase the timeout; the default value is 600000 ms, which is 10 minutes.
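
As a sketch of how the timeout might be raised, the snippet below converts a value in minutes to milliseconds and builds the matching `--conf` option. The 20-minute value and the helper name `minutes_to_ms` are illustrative assumptions; passing the property through `--conf` follows the pattern of the other spark.* settings in the submit script.

```shell
# Convert minutes to milliseconds for angel.am.appstate.timeout.ms.
minutes_to_ms() {
  echo $(( $1 * 60 * 1000 ))
}

# Hypothetical choice: raise the timeout from the default 10 minutes to 20.
timeout_ms=$(minutes_to_ms 20)
timeout_conf="--conf spark.hadoop.angel.am.appstate.timeout.ms=${timeout_ms}"

# Add this option to the spark-submit command alongside the other --conf lines.
echo "$timeout_conf"
```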