docs/design/019-file-api.md
| author | @oleiade |
| status | 🔧 proposal |
| revisions | previous, initial |
| Proof of concept | branch |
| references | #2977 #2974 |
The current version of k6 lets users load file content via the `open` function, which is only accessible in the init context. However, `open` diverges from its counterparts in other languages and the Linux stack: it reads the whole file into memory rather than opening it for further interaction. This leads to considerable memory consumption when loading large binary files (as the content ends up copied into each VU) or when `SharedArray` cannot be used.
In line with our commitment to optimize large file handling in k6, we propose introducing a new fs module. This module is intended to offer an intuitive and user-friendly API for file interactions within k6 scripts. We'll also provide some ideas for efficient file handling to minimize memory consumption during k6 execution.
Currently, files cannot be opened from within a function executed by a VU, only in the init context.
This is due to k6's design for distributed execution, particularly in the cloud. k6 runs the init context once, gathers resources (including files), and sends them to the other instances where VU code runs.
We suggest implementing a minimalist, experimental file system (`fs`) module based on Deno's fs module. The new module will allow users to interact with files, separating text and binary files. The module will provide an `open` function that returns a file handle/view for performing read operations.
The initial API will mostly be asynchronous, except for the open functionality which will be synchronous due to the current lack of support for await operations within the init context.
The API will have the following characteristics:
A working proof of concept of the new API is available on GitHub.
```typescript
/*
 * openSync opens a file and returns an instance of a
 * `File`.
 */
openSync(path: string): File

/*
 * open opens a file and resolves to an instance of a
 * `File`.
 *
 * Because the k6 init context does not support using await yet,
 * to use this function, users must use a workaround:
 *
 * ```
 * let f;
 * (async function() {
 *   f = await asyncOpen("./somefile"); // name for emphasis not as a proposal
 * }());
 * ```
 */
open(path: string): Promise<File>

/*
 * File is an abstraction to interact with
 * files which exposes read-only operations.
 */
interface File {
    /*
     * read reads up to `p.byteLength` bytes of the file into `p`.
     * It resolves to either the number of bytes read during the operation
     * or `null` if there was nothing to read.
     */
    read(p: ArrayBuffer | TypedArray | DataView): Promise<number | null>

    /*
     * readAll reads the whole content of the file and
     * returns a promise that will resolve to its content
     * as an `ArrayBuffer`.
     */
    readAll(): Promise<ArrayBuffer>

    /*
     * Seek to the given `offset` under mode given by `whence`.
     * The call resolves to the new position within the resource
     * (bytes from the start).
     */
    seek(offset: number, whence: SeekMode): Promise<number>

    /*
     * Resolves to a `FileInfo` describing the file.
     */
    stat(): Promise<FileInfo>

    /*
     * close closes the file.
     */
    close(): Promise<void>
}

/*
 * FileInfo provides information about a file.
 */
interface FileInfo {
    /*
     * the filename of the file
     */
    name: string

    /*
     * the size of the file, in bytes.
     */
    size: number
}
```
N.B.: the `File` operations only support working with `ArrayBuffer` as of this proposal. This is based on the assumption that we could fairly easily add `TextDecoder` support to k6 (see comments #2291 and #2440). If this assumption were invalidated, we could adopt the same API format and have two different read-operation variants on the `File`, or even expose two different kinds of files, `TextFileHandle` and `BinaryFileHandle` for instance.
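Until `TextDecoder` is available, a script has to turn the bytes it reads into text itself. The sketch below is one way to do that under a loud assumption: the file contains only single-byte (ASCII or Latin-1) characters. `bytesToString` is a hypothetical helper, not part of the proposed module, and it only converts the first `n` bytes so that a partially filled buffer does not leak stale data into the result.

```javascript
// Hypothetical helper, not part of the proposed fs module.
// Assumes every byte maps to exactly one character (ASCII/Latin-1);
// multi-byte encodings such as UTF-8 would need a real decoder.
function bytesToString(bytes, n) {
  let out = "";
  // Only the first n bytes were filled by the read; ignore the rest.
  for (let i = 0; i < n; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// Simulate a read that filled 5 bytes of a 10-byte buffer.
const buffer = new Uint8Array(10);
buffer.set([104, 101, 108, 108, 111]); // the bytes of "hello"
const text = bytesToString(buffer, 5);
```

In a script, `n` would be the value resolved by `file.read(buffer)`, with `null` treated as the end of the file.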
```javascript
import { openSync, SeekMode } from 'k6/experimental/fs';

export const options = {
  scenarios: {
    default: {
      executor: 'constant-vus',
      vus: 100,
      duration: '1m',
    },
  },
};

// Files can only be opened in the init context.
const file = openSync("./data.csv");

export default async function () {
  let resultString = "";
  let buffer = new Uint8Array(10);

  // Read the first 10 bytes of the file.
  let n = await file.read(buffer);
  resultString += ab2str(buffer);

  // Read the next 10 bytes: each read advances the file offset.
  n = await file.read(buffer);
  resultString += ab2str(buffer);

  // Read the next 10 bytes.
  n = await file.read(buffer);
  resultString += ab2str(buffer);

  // Rewind so the next iteration starts from the beginning of the file.
  await file.seek(0, SeekMode.Start);

  console.log(`[vu ${__VU}] resultString: ${resultString}`);
}

// teardown is a named export; a module can only have one default export.
export async function teardown() {
  await file.close();
}

// Converts a buffer of single-byte characters to a string.
function ab2str(buf) {
  return String.fromCharCode.apply(null, new Uint8Array(buf));
}
```
The proposed API is made feasible through the following implementation aspects:
- Each VU calling `open*` receives a unique file handle linked to the same memory area, and each handle has its own offset. This setup allows each VU to process file data independently, without conflicts or race conditions.

The Deno API's `FsFile`, which inspired our proposal, exposes a read-only `readable` property: a Streams API `ReadableStream` that allows streaming the content of the file. We have an open issue tracking the implementation of the Streams API in k6: #2978.
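To make the handle-per-VU idea concrete, here is a minimal in-memory sketch, not the k6 implementation. It assumes a hypothetical `FileHandle` class, synchronous methods for brevity (the proposed API is promise-based), and `SeekMode` values that mirror Deno's (`Start`, `Current`, `End`). Every handle points at the same byte array but keeps its own offset.

```javascript
// Hypothetical model of the proposal's handle semantics.
// SeekMode values mirror Deno's; the k6 values are an assumption.
const SeekMode = { Start: 0, Current: 1, End: 2 };

class FileHandle {
  constructor(data) {
    this.data = data; // shared Uint8Array, referenced but never copied
    this.offset = 0;  // private to this handle
  }

  // Copies up to p.length bytes into p and advances the offset.
  // Returns the number of bytes read, or null at the end of the data.
  read(p) {
    if (this.offset >= this.data.length) return null;
    const chunk = this.data.subarray(this.offset, this.offset + p.length);
    p.set(chunk);
    this.offset += chunk.length;
    return chunk.length;
  }

  // Moves the offset relative to the start, current position, or end.
  // Returns the new position in bytes from the start.
  seek(offset, whence) {
    let base;
    if (whence === SeekMode.Start) base = 0;
    else if (whence === SeekMode.Current) base = this.offset;
    else if (whence === SeekMode.End) base = this.data.length;
    else throw new TypeError(`unknown whence: ${whence}`);
    const next = base + offset;
    if (next < 0) throw new RangeError("cannot seek before the start");
    this.offset = next;
    return next;
  }
}

// Two "VUs" share one copy of the content but read independently.
const shared = Uint8Array.from([1, 2, 3, 4, 5, 6]);
const vu1 = new FileHandle(shared);
const vu2 = new FileHandle(shared);
const n1 = vu1.read(new Uint8Array(4));
const n2 = vu2.read(new Uint8Array(2));
```

Because the only mutable state is the per-handle offset, reads from one handle never disturb another, which is what lets the content live in memory once regardless of the number of VUs.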
See #3020
This is currently unachievable because we must anticipate which files will be opened. While some quick fixes might appear feasible (e.g., parsing the AST before execution to identify files), they quickly fall apart: What if the filename resides in a variable? A plausible solution would involve requiring users to declare necessary resources (files/folders) ahead of time. This approach would ensure these resources are captured and included in the archive for future VU code access.
We believe the proof of concept developed with this proposal illustrates the feasibility and benefits of developing such an API.