plugins/evaluation-plugin/README.md
The plugin deals with the evaluation of IDE features based on artificial queries. The output is not only a numerical quality value: the HTML reports contain examples of source code together with the results of feature invocation, so you can see how the feature behaves in specific situations.
Plugin builds are available from the custom plugin repository
`https://buildserver.labs.intellij.net/guestAuth/repository/download/ijplatform_IntelliJProjectDependencies_CodeCompletionProjects_CompletionEvaluation_BuildFromIdea/lastSuccessful/updatePlugins.xml`
(see the instruction); the Evaluation Plugin is also available in Marketplace.

Strategy example for the token-completion feature:

```json
{
  "context": "ALL", // ALL, PREVIOUS
  "prefix": {
    "name": "SimplePrefix", // SimplePrefix (type 1 or more letters), CapitalizePrefix or NoPrefix
    "n": 1
  },
  "filters": { // set of filters that allow filtering some completion locations out
    "statementTypes": [ // possible values: METHOD_CALL, FIELD, VARIABLE, TYPE_REFERENCE, ARGUMENT_NAME
      "METHOD_CALL"
    ],
    "isStatic": true, // null / true / false
    "packageRegex": ".*" // regex to check that the Java package of the resulting token is suitable for evaluation
  }
}
```
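To make the filter semantics concrete, here is a small Python sketch of how a completion location could be checked against `statementTypes`, `isStatic`, and `packageRegex`. This is illustrative only; the field and function names are assumptions, not the plugin's actual code.

```python
import re

def location_passes(location, statement_types, is_static, package_regex):
    """Return True if a completion location matches the strategy filters.

    A None filter value means "do not filter on this property", mirroring
    the null values allowed in the strategy JSON above.
    """
    if statement_types is not None and location["statementType"] not in statement_types:
        return False
    if is_static is not None and location["isStatic"] != is_static:
        return False
    # packageRegex must match the whole package name of the resulting token
    if not re.fullmatch(package_regex, location["package"]):
        return False
    return True

loc = {"statementType": "METHOD_CALL", "isStatic": True, "package": "java.util"}
print(location_passes(loc, ["METHOD_CALL"], True, ".*"))  # True
```

A location is evaluated only if it passes every configured filter; relaxing a filter to `null` simply skips that check.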
Strategy example for the line-completion (completion golf) feature:

```json
{
  "mode": "TOKENS", // call completion only at meaningful tokens or everywhere; possible values: TOKENS, ALL
  "invokeOnEachChar": true, // close the popup after an unsuccessful completion and invoke it again (only for the line-completion-golf feature)
  "topN": 5, // take only the N top proposals, applied after filtering by source
  "checkLine": true, // accept multi-token proposals
  "source": "INTELLIJ", // take suggestions with a specific source; possible values: INTELLIJ (full-line), TAB_NINE, CODOTA
  "suggestionsProvider": "DEFAULT" // provider of proposals (DEFAULT - completion engine), can be extended
}
```
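The interplay of `source` and `topN` described above can be sketched in Python: proposals are first filtered by source, and only then truncated to the top N. This is an illustrative sketch; the proposal structure is an assumption, not the plugin's data model.

```python
def select_proposals(proposals, source=None, top_n=None):
    """Keep proposals from the requested source, then take the top N of them."""
    if source is not None:
        proposals = [p for p in proposals if p["source"] == source]
    if top_n is not None:
        proposals = proposals[:top_n]
    return proposals

proposals = [
    {"text": "a", "source": "INTELLIJ"},
    {"text": "b", "source": "TAB_NINE"},
    {"text": "c", "source": "INTELLIJ"},
]
print(select_proposals(proposals, source="INTELLIJ", top_n=1))
```

The ordering matters: because `topN` is applied after the source filter, it limits the proposals of the chosen source rather than the mixed list.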
Also, `pathToModelZip` can be set to use a custom ranking model for the completions (do not pass `source` in this case, so that the suggestions from all contributors are used).

Strategy example for the rename feature:

```json
{
  "placeholderName": "DUMMY", // identifier for renaming existing variables
  "suggestionsProvider": "DEFAULT", // provider of proposals (DEFAULT - IDE refactoring engine, LLM-rename - proposals of the LLM plugin), can be extended
  "filters": {
    "statementTypes": null // currently not supported
  }
}
```
You can find descriptions of all metrics in the code (com.intellij.cce.metric.Metric.getDescription).
Most of the metrics are also described here.
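As an illustration of what such metrics typically measure, here is a sketch of a session-level "found in top N" metric in Python. The metric names and exact formulas used by the plugin may differ; refer to `com.intellij.cce.metric.Metric.getDescription` for the authoritative definitions.

```python
def found_at_n(sessions, n):
    """Fraction of sessions where the expected token appears among the top-n suggestions.

    Each session is assumed to carry the expected token and the ordered
    suggestion list; this structure is an assumption for the sketch.
    """
    if not sessions:
        return 0.0
    hits = sum(1 for s in sessions if s["expected"] in s["suggestions"][:n])
    return hits / len(sessions)

sessions = [
    {"expected": "size", "suggestions": ["size", "isEmpty"]},
    {"expected": "get", "suggestions": ["put", "remove", "get"]},
]
print(found_at_n(sessions, 1))  # 0.5
print(found_at_n(sessions, 3))  # 1.0
```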
The plugin works in the headless mode of the IDE. To start the evaluation, you should describe where the project to evaluate is placed and the rules for the evaluation (language, strategy, output directories, etc.). We use a JSON file for this kind of description. Here is an example of such a file with descriptions of the possible options; note that the strategy block depends on the feature being evaluated.
```json
{
  "projectPath": "", // string with a path to the IDEA project
  "language": "Java",
  "outputDir": "", // string with a path to the output directory
  "strategy": { // describes parameters of the evaluation - depends on the feature (the example below is for token completion)
    "context": "ALL",
    "prefix": {
      "name": "SimplePrefix",
      "n": 1
    },
    "filters": {
      "statementTypes": [
        "METHOD_CALL"
      ],
      "isStatic": true,
      "packageRegex": ".*"
    }
  },
  "actions": { // part of the config about the action generation step
    "evaluationRoots": [], // list of strings with paths to files/directories for evaluation
    "ignoreFileNames": [] // list of file/directory names to be ignored inside evaluationRoots
  },
  "interpret": { // part of the config about the action interpretation step
    "sessionProbability": 1.0, // probability that a session won't be skipped
    "sessionSeed": null, // seed for the random generator (for the previous option)
    "saveLogs": false, // save completion logs or not (only if the stats-collector plugin is installed)
    "logsTrainingPercentage": 70 // percentage of logs used for training (the rest is used for validation)
  },
  "reports": { // part of the config about the report generation step
    "evaluationTitle": "Basic", // header name in the HTML report (use different names when generating a report over multiple evaluations)
    "sessionsFilters": [ // create multiple reports corresponding to these session filters (the "All" filter is created by default)
      {
        "name": "Static method calls only",
        "filters": {
          "statementTypes": [
            "METHOD_CALL"
          ],
          "isStatic": true,
          "packageRegex": ".*"
        }
      }
    ],
    "comparisonFilters": []
  }
}
```
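Since a malformed config only surfaces once the headless IDE starts, it can be useful to sanity-check the file first. Below is a minimal sketch (not part of the plugin; the helper name is an assumption) that verifies the top-level sections described above are present:

```python
import json

# Top-level sections of the evaluation config described above.
REQUIRED_SECTIONS = (
    "projectPath", "language", "outputDir",
    "strategy", "actions", "interpret", "reports",
)

def check_config(path):
    """Load a config file and fail early if a required section is missing."""
    with open(path) as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_SECTIONS if key not in config]
    if missing:
        raise ValueError(f"config is missing sections: {missing}")
    return config
```

Note that the annotated example above uses `//` comments for documentation purposes; a real config file passed to the plugin has to be plain JSON, so strip the comments before running a check like this.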
Example of `config.json` to evaluate code completion on several modules of the intellij-community project:
```json
{
  "projectPath": "PATH_TO_COMMUNITY_PROJECT",
  "language": "Java",
  "outputDir": "PATH_TO_COMMUNITY_PROJECT/completion-evaluation",
  "strategy": {
    "type": "BASIC",
    "context": "ALL",
    "prefix": {
      "name": "SimplePrefix",
      "n": 1
    },
    "filters": {
      "statementTypes": [
        "METHOD_CALL"
      ],
      "isStatic": null,
      "packageRegex": ".*"
    }
  },
  "actions": {
    "evaluationRoots": [
      "java/java-indexing-impl",
      "java/java-analysis-impl",
      "platform/analysis-impl",
      "platform/core-impl",
      "platform/indexing-impl",
      "platform/vcs-impl",
      "platform/xdebugger-impl",
      "plugins/git4idea",
      "plugins/java-decompiler",
      "plugins/gradle",
      "plugins/markdown",
      "plugins/sh",
      "plugins/terminal",
      "plugins/yaml"
    ]
  },
  "interpret": {
    "experimentGroup": null,
    "sessionProbability": 1.0,
    "sessionSeed": null,
    "saveLogs": false,
    "saveFeatures": false,
    "logLocationAndItemText": false,
    "trainTestSplit": 70
  },
  "reports": {
    "evaluationTitle": "Basic",
    "sessionsFilters": [],
    "comparisonFilters": []
  }
}
```
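The `sessionProbability`/`sessionSeed` pair in the `interpret` section can be illustrated with a short Python sketch: each session is kept with the given probability, and setting a seed makes the sampling reproducible across runs. This mirrors the option descriptions above, not the plugin's actual implementation.

```python
import random

def sample_sessions(sessions, probability=1.0, seed=None):
    """Keep each session with the given probability; a fixed seed gives a
    reproducible subset (useful when comparing two evaluations)."""
    rng = random.Random(seed)
    return [s for s in sessions if rng.random() < probability]
```

With `probability` 1.0 (the default in the examples above) every session is evaluated; lowering it trades accuracy of the aggregate metrics for a faster run.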
There are several options for running the plugin:

- `ml-evaluate full FEATURE_NAME [PATH_TO_CONFIG]`: runs the whole evaluation. If `PATH_TO_CONFIG` is missing, a default config will be created.
- `ml-evaluate actions FEATURE_NAME [PATH_TO_CONFIG]`: runs only the action generation step.
- `ml-evaluate custom FEATURE_NAME [--interpret-actions | -i] [--generate-report | -r] PATH_TO_WORKSPACE`: runs the action interpretation and/or report generation steps on an existing workspace.
- `ml-evaluate multiple-evaluations FEATURE_NAME PATH_TO_WORKSPACE...`: generates a comparison report for several workspaces.
- `ml-evaluate compare-in FEATURE_NAME PATH_TO_DIRECTORY`: generates a comparison report for all workspaces inside a directory.

There are many ways to start the evaluation in headless mode. Some of them are listed below.

- From sources (for the line-completion feature there are ready-made run configurations among Machine Learning/[full-line] Completion Evaluation for <Language>): in a run configuration for IDEA or another IDE, add the required options:
  - `-Djava.awt.headless=true` to jvm-options;
  - `ml-evaluate OPTION FEATURE_NAME OPTION_ARGS` to cli arguments.
- From an installed IDE (see the instruction): add `-Djava.awt.headless=true` to jvm-options and run `<IntelliJ IDEA> ml-evaluate OPTION FEATURE_NAME OPTION_ARGS` with the corresponding option and feature.

We have a set of build configurations on TeamCity based on the evaluation-plugin project. Most of them are devoted to estimating the quality of code completion in different languages and products.
At the top level there are a few configurations: Build (compiles the plugin) and Test (checks that everything still works). Below them there is a bunch of language-specific projects: Java, Python, Kotlin, etc. Each of these projects contains a set of build configurations that can be split into three groups:

- Evaluate (ML/Basic) *: takes the latest build of the IDE/plugin and starts the evaluation process. Usually takes 30-120 minutes.
- Compare ML and basic *: takes the output of the corresponding Evaluate * builds and creates a comparison report (see build artifacts).
- Generate logs *: takes a nightly IDE build and the latest evaluation plugin build and starts the evaluation. During the evaluation it collects the same logs we send from users. These logs can be fed into the ML Pipeline project.

Q: How can I compare default completion quality vs ML?
A: Run the Evaluate ML * and Evaluate Basic * configurations (possibly simultaneously).
After they finish, just start the corresponding Compare ML and Basic * configuration.
Q: I implemented collecting a new feature into completion logs. How can I check that the feature is collected and has any impact on completion quality?
A: Start the Generate logs * configuration. Once it has finished, start Build * model in the ML Pipeline project.
Q: I want similar reports for a new language.
A: Contact Alexey Kalina. The main challenge here is to set up the SDK and the project to evaluate on in headless mode. If you can do that yourself, we can advise where to add it.
Q: I want to compare quality with specific parameters but cannot find a suitable build configuration. What can I do?
A: Contact Alexey Kalina or Vitaliy Bibaev.