Roadmap

Documentation

Work with Mintlify to translate docs. How does Mintlify let us translate our documentation automatically? I know there's a way.
Better comments throughout the package (they're like docs for contributors)
Show how to replace interpreter.llm so you can use a custom llm

New features

Future-proofing

What's in our scope?

Open Interpreter contains two projects which support each other, whose scopes are as follows:

core, which is dedicated to figuring out how to get LLMs to safely control a computer. Right now, this means creating a real-time code execution environment that language models can operate.
terminal_interface, a text-only way for users to direct the code-running LLM running inside core. This includes functions for connecting the core to various local and hosted LLMs (which the core itself should not know about).

What's not in our scope?

Our guiding philosophy is minimalism, so we have also decided to explicitly consider the following as out of scope:

Additional functions in core beyond running code.
More complex interactions with the LLM in terminal_interface beyond text (but file paths to more complex inputs, like images or video, can be included in that text).

This roadmap gets pretty rough from here. More like working notes.

Working Notes

* Roughly, how to build `computer.browser`:

First I think we should have a part, like computer.browser.ask(query) which just hits up perplexity for fast answers to questions.

Then we want these sorts of things:

browser.open(url)
browser.screenshot()
browser.click()

It should actually be based closely on Selenium. Copy their API so the LLM knows it.

Other than that, basically should be = to the computer module itself, at least the IO / keyboard and mouse parts.

However, for non vision models, browser.screenshot() can return the accessibility tree, not an image. And for browser.click(some text) we can use the HTML to find that text.

Here's how GPT suggests we implement the first steps of this:

Creating a Python script that automates the opening of Chrome with the necessary flags and then interacts with it to navigate to a URL and retrieve the accessibility tree involves a few steps. Here's a comprehensive approach:

Script to Launch Chrome with Remote Debugging:
- This script will start Chrome with the --remote-debugging-port=9222 flag.
- It will handle different platforms (Windows, macOS, Linux).
Python Script for Automation:
- This script uses pychrome to connect to the Chrome instance, navigate to a URL, and retrieve the accessibility tree.

Step 1: Launching Chrome with Remote Debugging

You'll need a script to launch Chrome. This script varies based on the operating system. Below is an example for Windows. You can adapt it for macOS or Linux by changing the path and command to start Chrome.

python

import subprocess
import sys
import os

def launch_chrome():
    chrome_path = "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe"  # Update this path for your system
    url = "http://localhost:9222/json/version"
    subprocess.Popen([chrome_path, '--remote-debugging-port=9222'], shell=True)
    print("Chrome launched with remote debugging on port 9222.")

if __name__ == "__main__":
    launch_chrome()

Step 2: Python Script to Navigate and Retrieve Accessibility Tree

Next, you'll use pychrome to connect to this Chrome instance. Ensure you've installed pychrome:

bash

pip install pychrome

Here's the Python script:

python

import pychrome
import time

def get_accessibility_tree(tab):
    # Enable the Accessibility domain
    tab.call_method("Accessibility.enable")

    # Get the accessibility tree
    tree = tab.call_method("Accessibility.getFullAXTree")
    return tree

def main():
    # Create a browser instance
    browser = pychrome.Browser(url="http://127.0.0.1:9222")

    # Create a new tab
    tab = browser.new_tab()

    # Start the tab
    tab.start()

    # Navigate to a URL
    tab.set_url("https://www.example.com")
    time.sleep(3)  # Wait for page to load

    # Retrieve the accessibility tree
    accessibility_tree = get_accessibility_tree(tab)
    print(accessibility_tree)

    # Stop the tab (closes it)
    tab.stop()

    # Close the browser
    browser.close()

if __name__ == "__main__":
    main()

This script will launch Chrome, connect to it, navigate to "https://www.example.com", and then print the accessibility tree to the console.

Note: The script to launch Chrome assumes a typical installation path on Windows. You will need to modify this path according to your Chrome installation location and operating system. Additionally, handling different operating systems requires conditional checks and respective commands for each OS.

Roadmap

Roadmap

Documentation

New features

Future-proofing

What's in our scope?

What's not in our scope?

Working Notes

* Roughly, how to build computer.browser:

Step 1: Launching Chrome with Remote Debugging

Step 2: Python Script to Navigate and Retrieve Accessibility Tree

* Roughly, how to build `computer.browser`: