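HNSW (Hierarchical Navigable Small World) is a graph-based approximate nearest neighbor (ANN) search algorithm. It builds a layered, navigable search graph, similar in structure to a skip list, so that the top-K neighbors can be recalled quickly from large-scale, high-dimensional vector data. The algorithm comes from the paper "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs".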
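
The skip-list-like layering can be illustrated with the level-assignment rule from the HNSW paper: each inserted node draws its top layer as `floor(-ln(U) * mL)` with `U` uniform on (0, 1], so higher layers hold exponentially fewer nodes. The sketch below only simulates that rule; the function names `sample_level` and `level_histogram` are hypothetical and are not part of Angel's API.

```python
import math
import random


def sample_level(m_l: float, rng: random.Random) -> int:
    """Draw the top layer for a new node: floor(-ln(U) * m_l), U uniform in (0, 1]."""
    u = 1.0 - rng.random()  # random() is in [0, 1); shift to (0, 1] so log is defined
    return int(math.floor(-math.log(u) * m_l))


def level_histogram(n: int, m_l: float, seed: int = 0) -> dict:
    """Count how many of n simulated nodes receive each top layer."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n):
        level = sample_level(m_l, rng)
        counts[level] = counts.get(level, 0) + 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # With m_l = 1.0, about 1 - 1/e (roughly 63%) of nodes live on layer 0 only.
    for level, count in level_histogram(10_000, m_l=1.0).items():
        print(level, count)
```

A larger `m_l` spreads nodes over more layers, while a smaller one flattens the graph toward a single layer.
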
storageLevel: the RDD storage level (a Spark storage level name); defaults to MEMORY_ONLY
vectorPath=hdfs://my-hdfs/nodeToVector
queryPath=hdfs://my-hdfs/queryNodeToVector
outputPath=hdfs://my-hdfs/output
source ./spark-on-angel-env.sh
$SPARK_HOME/bin/spark-submit \
--master yarn-cluster \
--conf spark.ps.instances=1 \
--conf spark.ps.cores=1 \
--conf spark.ps.jars=$SONA_ANGEL_JARS \
--conf spark.ps.memory=10g \
--name "hnsw angel" \
--jars $SONA_SPARK_JARS \
--driver-memory 5g \
--num-executors 1 \
--executor-cores 4 \
--executor-memory 10g \
--class org.apache.spark.angel.examples.graph.SwingExample \
../lib/spark-on-angel-examples-3.3.0.jar \
vectorPath:$vectorPath queryPath:$queryPath outputPath:$outputPath itemSep:colon vecSep:space saveItemSep:tab storageLevel:MEMORY_ONLY \
partitionNum:4 psPartitionNum:1 distanceFunction:cosine-distance queryPartitionNum:4 ef:40 efConstruction:40 M:16 maxM:16 maxM0:32 mL:1.0
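
To sanity-check input files and results on a small sample, the following sketch may help. It assumes, based only on the `itemSep:colon` and `vecSep:space` arguments above, that each input line looks like `id:v1 v2 v3`; that layout is an assumption, not something the job's documentation confirms. It also uses the common definition of cosine distance, 1 minus cosine similarity, which may differ in detail from what `distanceFunction:cosine-distance` computes. All function names are hypothetical.

```python
import math


def parse_line(line: str, item_sep: str = ":", vec_sep: str = " "):
    """Split one input line into (item_id, vector), assuming 'id:v1 v2 ...'."""
    item_id, _, raw = line.strip().partition(item_sep)
    vector = [float(x) for x in raw.split(vec_sep) if x]
    return item_id, vector


def cosine_distance(a, b) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_top_k(query, items, k):
    """Exact top-k by cosine distance: a reference for checking ANN recall."""
    scored = sorted((cosine_distance(query, vec), item_id) for item_id, vec in items)
    return [(item_id, dist) for dist, item_id in scored[:k]]


if __name__ == "__main__":
    items = [parse_line(s) for s in ["a:1 0", "b:0 1", "c:1 1"]]
    _, query = parse_line("q:1 0.1")
    for item_id, dist in brute_force_top_k(query, items, k=2):
        print(item_id, round(dist, 4))
```

Comparing this exact top-k list with the job's output on a few hundred vectors gives a rough recall estimate before tuning `ef`, `efConstruction`, and `M` on the full data set.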