GhidraDocs/GhidraClass/BSim/BSimTutorial_Basic_Queries.md
In this section, we demonstrate some applications of our BSim database.
In order to query the database, you must register it with Ghidra:
example.mv.db
Before presenting the exercises, we describe the general mechanics of querying a BSim database.
There are a number of ways to initiate a BSim query, including:
For these cases, the function(s) being queried depend on the current selection.
If there is no selection, the function containing the current address is queried.
If there is a selection, all functions whose entry points are within the selection are queried.
An easy way to query all functions in a program is to select all addresses with Ctrl-A in the Listing window and then initiate a BSim query.
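The selection rules above can be sketched in plain Python. This is only a model of the described behavior, not Ghidra's API: the `Function` record, the `functions_to_query` helper, and the addresses are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Function:
    """Hypothetical stand-in for a function in a program listing."""
    name: str
    entry: int  # address of the entry point
    end: int    # last address of the function body (inclusive)


def functions_to_query(
    functions: List[Function],
    current_address: int,
    selection: Optional[List[Tuple[int, int]]] = None,
) -> List[Function]:
    """Return the functions a query would cover under the rules above."""
    if selection:
        # With a selection: every function whose ENTRY POINT is selected.
        return [
            f for f in functions
            if any(lo <= f.entry <= hi for lo, hi in selection)
        ]
    # Without a selection: the function containing the current address.
    return [f for f in functions if f.entry <= current_address <= f.end]


funcs = [
    Function("main", 0x1000, 0x10FF),
    Function("helper", 0x1100, 0x117F),
    Function("cleanup", 0x1180, 0x11FF),
]

no_selection = functions_to_query(funcs, current_address=0x1120)
partial = functions_to_query(funcs, 0x1000, selection=[(0x10F0, 0x1190)])
select_all = functions_to_query(funcs, 0x1000, selection=[(0x1000, 0x11FF)])
```

In the `partial` case the selection overlaps the tail of `main`, but `main` is not queried because its entry point lies outside the selection. Selecting the whole address range, as Ctrl-A does, picks up every function.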
It is also possible to initiate a BSim query from the Decompiler window. Simply right-click on a function name token and select BSim... to query the corresponding function. This action is available on the name token in the decompiled function's signature as well as tokens corresponding to names of callees.
All of these actions bring up the BSim Search Dialog.
From the BSim Search Dialog, you can
To query a registered BSim database, select that server from the BSim Server drop-down.
Similarity and confidence are scores used to evaluate the relationship between two vectors. The respective fields in the dialog set lower bounds for these values for the matches returned by BSim.
Similarity
Confidence
Confidence is used to judge the significance of a match. For example, many executables contain a function which simply returns a constant value. Given two executables, each with such a function, the similarity score between the corresponding BSim vectors will be 1.0. However, the confidence score of the match will be quite low, indicating that it is not very significant that the two executables "share" this code.
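To make the constant-returning example concrete, here is a minimal sketch that assumes similarity behaves like the cosine similarity of two weighted feature vectors. The feature names and weights are invented for illustration, and confidence is deliberately not modeled because its formula is not given in this section.

```python
import math
from typing import Dict


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse feature vectors."""
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)


# Two trivial functions that each reduce to a single (hypothetical) feature.
tiny_a = {"return_constant": 1.0}
tiny_b = {"return_constant": 1.0}

# Two larger functions sharing three of their four features.
large_a = {"f1": 1.0, "f2": 1.0, "f3": 1.0, "f4": 1.0}
large_b = {"f1": 1.0, "f2": 1.0, "f3": 1.0, "f5": 1.0}

tiny_score = cosine_similarity(tiny_a, tiny_b)     # 1.0
large_score = cosine_similarity(large_a, large_b)  # 0.75
```

The trivial pair scores a perfect 1.0 even though it agrees on only one feature, while the larger pair scores lower despite sharing more. This is why similarity alone cannot tell you how significant a match is.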
In general, setting the thresholds involves a tradeoff: lower values mean that the database is more likely to return legitimate matches with significant differences, but also more likely to return matches which simply happen to share some features by chance. The results of a BSim query can be sorted by the similarity and/or confidence of each match, so a common practice is to set the thresholds relatively low and to examine the matches in descending sort order.
The Matches per Function bound controls the number of results returned for a single function. Note that in large collections, certain small or common functions might have substantial numbers of identical matches.
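The interplay of the two thresholds and the Matches per Function bound can be sketched as a filter, a descending sort, and a per-function cap. This is an illustrative model only: the `Match` record, the `select_matches` helper, and the scores below are made up, and the actual order in which BSim applies these limits is an assumption here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Match:
    """Hypothetical record for one query result."""
    queried: str      # function being queried
    matched: str      # function found in the database
    similarity: float
    confidence: float


def select_matches(
    matches: List[Match],
    min_similarity: float,
    min_confidence: float,
    max_per_function: int,
) -> Dict[str, List[Match]]:
    """Apply lower bounds, sort best-first, and cap results per function."""
    kept = [
        m for m in matches
        if m.similarity >= min_similarity and m.confidence >= min_confidence
    ]
    # Descending by similarity, then confidence (the sort is stable for ties).
    kept.sort(key=lambda m: (m.similarity, m.confidence), reverse=True)
    per_function: Dict[str, List[Match]] = defaultdict(list)
    for m in kept:
        if len(per_function[m.queried]) < max_per_function:
            per_function[m.queried].append(m)
    return dict(per_function)


candidates = [
    Match("parse_args", "db_func_a", 1.0, 120.0),
    Match("parse_args", "db_func_b", 0.62, 8.5),
    Match("parse_args", "db_func_c", 0.31, 1.2),
    Match("return_zero", "db_stub_a", 1.0, 0.4),
    Match("return_zero", "db_stub_b", 1.0, 0.4),
    Match("return_zero", "db_stub_c", 1.0, 0.4),
]

result = select_matches(candidates, min_similarity=0.5,
                        min_confidence=0.0, max_per_function=2)
```

With these made-up scores, `db_func_c` falls below the similarity bound, and the trivial `return_zero` function loses one of its three identical matches to the per-function cap.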
Filters are discussed in BSim Filters.
Click the Search button in the dialog to perform a query.
After successfully issuing a query, you will also see a Search Function(s) action (without the ellipsis) in certain contexts. This will perform a BSim query on the selected functions using the same parameters as the last query (skipping the BSim Search Dialog).
The database example contains vectors from a Linux executable used by Ghidra's GNU demangler.
Ghidra ships with several other versions of this executable.
We use these different versions to demonstrate some of the capabilities of BSim.
Note: Use the default query settings and autoanalysis options for the exercises unless otherwise specified.
<ghidra_install_dir>/GPL/DemanglerGnu/os/win_x86_64/demangler_gnu_v2_41.exe.
demangler_gnu_v2_41 but compiled with Visual Studio instead of GCC.
demangler_gnu_v2_41.
example for matches to the function at 140006760.
Note: We cover the Decompiler View in greater detail and discuss the various "Apply" actions in Evaluating Matches and Applying Information.
<ghidra_install_dir>/GPL/DemanglerGnu/os/linux_x86_64/demangler_gnu_v2_24.
example.
expandargv in demangler_gnu_v2_24 and issue a BSim query.
<ghidra_install_dir>/GPL/DemanglerGnu/src/demangler_gnu_v2_24/c/argv.c
<ghidra_install_dir>/GPL/DemanglerGnu/src/demangler_gnu_v2_41/c/argv.c
<ghidra_install_dir>/GPL/DemanglerGnu/os/mac_arm_64/demangler_gnu_v2_41.
example but compiled for a different architecture.
_expandargv and issue a BSim query with a similarity bound of 0.5.
In the decompiler diff view of the single match, what differences do you see regarding memmove and memcpy?
<details><summary>In the arm64 version...</summary> In the arm64 version, the compiler replaced these functions with __memmove_chk and __memcpy_chk. The __chk versions take an extra parameter used to guard against buffer overflows. Neither the names nor the bodies of callees are incorporated into BSim signatures, but the arguments of a call are, so this change partly explains why the BSim vectors are not identical.</details>
Q: If you set the similarity and confidence thresholds to 0.0, will a BSim query return all of the functions in the database?
A: No, because
Next Section: Ghidra from the Command Line