integration/schema-language-server/README.md
This directory contains a backend and several frontends that provide language support when writing schema files.
Communication with clients uses the Language Server Protocol (LSP), which standardizes how language support is provided by defining a set of messages and types that the server and client exchange over JSON-RPC.
https://microsoft.github.io/language-server-protocol/
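As an illustration of the wire format, a hover request from a client looks roughly like the JSON-RPC message below. The payload is a hand-written example following the LSP specification, not a capture from this server; the file path is made up.

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "textDocument/hover",
  "params": {
    "textDocument": { "uri": "file:///app/schemas/music.sd" },
    "position": { "line": 3, "character": 10 }
  }
}
```

The server replies with a message carrying the same `id` and a `result` containing the hover contents, so any LSP-capable editor can consume it without schema-specific knowledge.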
This means that the bulk of the functionality lies inside `/language-server`. The clients are merely small bootstrapping wrappers that create an extension/plugin and launch the language server.
To release the language server, start the GitHub Action "Vespa Schema LSP - Deploy extension". Note that the action must be started manually. The action publishes the extension to all the supported marketplaces and creates a GitHub release. In addition, it bumps the version and creates a PR with the updated version.
To publish a new release from a branch other than master, use the following command:

```shell
gh workflow run "Vespa Schema LSP - Deploy extension" --ref <branch> -F version=<major | minor | patch>
```
Holds our client implementations, currently IntelliJ and VSCode. They contain the code for building, running and packaging the plugins.
Maven project containing the language server implementation.
- `./ccc`: CongoCC parsers. These include the Schema, Indexing language and Ranking expression parsers, ported from JavaCC to CongoCC. The CongoCC Maven plugin generates Java classes from the parsers and places them in `./target/generated-sources/ccc`.
- `./python`: Python code for fetching documentation from the Vespa documentation GitHub repo and placing Markdown files in `./target/generated-resources`.
- `./java/ai/vespa/schemals`: The actual language server logic. Files in the root of this directory contain the code for setting up the language server. It is launched by `SchemaLSLauncher`, while `SchemaLanguageServer` handles initialization and sets up capabilities. The other files are wrappers that handle incoming and outgoing requests by implementing interfaces from LSP.
The language server needs a client to start it, so running and testing it happens through an editor with an extension or plugin. The language server is primarily developed for VSCode, but it can run in other editors as well. This guide covers running the extension in a development environment. The `clients` folder contains the extensions and plugins for the supported editors.

To run the extension in VSCode:

1. Run `mvn install -pl :schema-language-server -Pschema-language-server -amd` in the project root to build the language server.
2. Open `./clients/vscode` in a new VSCode window.
3. Make sure `npm` is downloaded, and run `npm install` to install the necessary dependencies.
4. Open the Run and Debug tab on the left of the window.
5. Select Run Extension, or alternatively hit F5, to run and test the extension.

To run the plugin in IntelliJ:

1. Open `./clients/intellij` in an IntelliJ window.
2. Run `Tasks/intellij platform/runIde`.
The server is launched as an executable by the client that wants to use it. It then runs as a separate process, and the two communicate over standard input/output. Upon initialization, the server and client exchange 'capabilities': the server declares the subset of the LSP specification it supports, and the client does the same.
The main bookkeeping work typically happens through `textDocument/didOpen` and `textDocument/didChange` requests. The client gives the server the entire contents of the current text document. The server then does the following:
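The server's half of this exchange is the result of the `initialize` request. An illustrative capability declaration might look like the fragment below; the exact set this server declares may differ.

```json
{
  "capabilities": {
    "textDocumentSync": 1,
    "hoverProvider": true,
    "definitionProvider": true,
    "completionProvider": { "triggerCharacters": ["."] }
  }
}
```

Here `"textDocumentSync": 1` requests full-document sync, meaning the client resends the whole file on every change rather than incremental edits.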
During the above process, symbols and their relationships are registered in a global index. A symbol is anything with a user-defined identifier, for instance a field, a document or a struct. The remaining LSP requests simply use the index and the CST generated in the parsing step.
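The index can be pictured as a map from identifier to definition site, with references tracked alongside, so that a request like "go to definition" becomes a lookup. The sketch below is purely illustrative: the names `SymbolIndex`, `Symbol` and `SymbolKind` are hypothetical and not taken from the actual `ai.vespa.schemals` implementation.

```java
import java.util.*;

// Hypothetical sketch of a global symbol index; names and structure are
// illustrative, not the actual implementation in this repository.
public class SymbolIndex {
    public enum SymbolKind { DOCUMENT, FIELD, STRUCT, RANK_PROFILE }

    // A symbol: a user-defined identifier plus where it was declared.
    public record Symbol(SymbolKind kind, String name, String fileUri, int line) {}

    private final Map<String, Symbol> definitions = new HashMap<>();
    private final Map<String, List<Symbol>> references = new HashMap<>();

    // Registered during the identify/resolve passes over the CST.
    public void define(Symbol s) { definitions.put(s.name(), s); }

    public void addReference(String name, Symbol from) {
        references.computeIfAbsent(name, k -> new ArrayList<>()).add(from);
    }

    // "go to definition" reduces to a lookup in the index.
    public Optional<Symbol> definitionOf(String name) {
        return Optional.ofNullable(definitions.get(name));
    }

    public static void main(String[] args) {
        SymbolIndex index = new SymbolIndex();
        index.define(new Symbol(SymbolKind.FIELD, "title", "file:///music.sd", 4));
        index.addReference("title",
                new Symbol(SymbolKind.RANK_PROFILE, "default", "file:///music.sd", 12));
        System.out.println(index.definitionOf("title").map(Symbol::line).orElse(-1));
    }
}
```

Because symbols carry their file URI and position, references can point across files, which is what makes cross-schema navigation possible.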
We needed a fault-tolerant parser for the schema language. A fault-tolerant parser can continue parsing even if there are syntax errors in the document. This is crucial for providing good language support, since most of the time the document being edited is unfinished.
The original Schema parser, written in JavaCC, is not fault tolerant. To avoid duplicating the entire language definition, we ideally wanted a fault-tolerant version of the exact same parser that is used for deploying Vespa applications. That is of course impossible. However, CongoCC is the continuation of a project called "JavaCC 21", which in turn is a continuation of JavaCC, and it supports fault-tolerant parsing and generates an AST/CST out of the box.
The syntax in CongoCC is very similar to JavaCC; the most notable differences are `SCAN` instead of `LOOKAHEAD` and a slightly different syntax for defining a rule.
We therefore ported the JavaCC parsers for the schema language, indexing language and ranking expressions to CongoCC with some small modifications, mainly to catch some exceptions early and to add bookkeeping information to some nodes in the AST.
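As a rough illustration of the difference (these snippets are invented for this README and not taken from the actual grammars), a production needing lookahead might be written like this in JavaCC:

```
void fieldBody() :
{}
{
  LOOKAHEAD(2) indexingStatement()
| attributeStatement()
}
```

and roughly like this in CongoCC:

```
FieldBody :
  SCAN 2 IndexingStatement
| AttributeStatement
;
```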
Many errors that can occur when creating a Vespa application are not caught in the parsing phase but much later. In order to make the language server as simple, fault tolerant and flexible as possible, we sacrificed some correctness. This means that we don't go through all the steps you usually do when deploying an application, but try to catch as much as possible by inspecting the CST.
This means that the language server will not catch every mistake you can make when writing a schema, but a correct schema should never show any errors.
We have split the "parsing" process into three steps: "parse", "identify" and "resolve". This involves a few traversals of the CST (maybe more than strictly necessary). The errors generated by "identify" and "resolve" involve some very specific inspections of the syntax tree.
This means that we don't model the 'Vespa application', but rather assign symbol types to different nodes in the syntax tree and use information about where they are to determine semantic correctness of the schema. It allows all constructs to be traced to their exact location in a text document, and symbols can reference other symbols across files.
A brief description of the types of requests we support:
We scraped `/config-model` for `*.sd` files to use for testing. The tests mainly verify that a file or directory parses and generates the appropriate number of errors.
The appropriate number of errors is 0 if the schema in question is supposed to be "deployable" as-is; otherwise it is the number of errors we expect given the current implementation of the language server.
There are some things we wanted to implement but didn't have time for: