docs/ROADMAP.md
input(). Do we somehow detect a delay in the output..? Is there some universal flag that TUIs emit when they expect user input? Should we do this semantically with embeddings, then ask OI to review it and respond..?textualtextual offers, like could we make it easier to select text? Copy paste in and out? Code editing interface?rich artifactsinterpreter.chat(async=True) bool. OI can use this to open OI on a new thread
interpreter.await() which waits for interpreter.running (?) to = False, and interpreter.result() which returns the last assistant messages content.interpreter.functions) using regex
interpreter.functions != []:
interpreter.computer.languages to only use Pythoninterpreter.embed or computer.ai.embed) which will let us do semantic searchappdir or platformdirs stuff should be only for the TUI
interpreter.__dict__ = a dict derived from config is how the python package should be set, and this should be from the TUI. interpreter should not know about the configOpen Interpreter contains two projects which support each other, whose scopes are as follows:
core, which is dedicated to figuring out how to get LLMs to safely control a computer. Right now, this means creating a real-time code execution environment that language models can operate.terminal_interface, a text-only way for users to direct the code-running LLM running inside core. This includes functions for connecting the core to various local and hosted LLMs (which the core itself should not know about).Our guiding philosophy is minimalism, so we have also decided to explicitly consider the following as out of scope:
core beyond running code.terminal_interface beyond text (but file paths to more complex inputs, like images or video, can be included in that text).This roadmap gets pretty rough from here. More like working notes.
computer.browser:First I think we should have a part, like computer.browser.ask(query) which just hits up perplexity for fast answers to questions.
Then we want these sorts of things:
browser.open(url)browser.screenshot()browser.click()It should actually be based closely on Selenium. Copy their API so the LLM knows it.
Other than that, basically should be = to the computer module itself, at least the IO / keyboard and mouse parts.
However, for non vision models, browser.screenshot() can return the accessibility tree, not an image. And for browser.click(some text) we can use the HTML to find that text.
Here's how GPT suggests we implement the first steps of this:
Creating a Python script that automates the opening of Chrome with the necessary flags and then interacts with it to navigate to a URL and retrieve the accessibility tree involves a few steps. Here's a comprehensive approach:
Script to Launch Chrome with Remote Debugging:
--remote-debugging-port=9222 flag.Python Script for Automation:
pychrome to connect to the Chrome instance, navigate to a URL, and retrieve the accessibility tree.You'll need a script to launch Chrome. This script varies based on the operating system. Below is an example for Windows. You can adapt it for macOS or Linux by changing the path and command to start Chrome.
import subprocess
import sys
import os
def launch_chrome():
chrome_path = "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe" # Update this path for your system
url = "http://localhost:9222/json/version"
subprocess.Popen([chrome_path, '--remote-debugging-port=9222'], shell=True)
print("Chrome launched with remote debugging on port 9222.")
if __name__ == "__main__":
launch_chrome()
Next, you'll use pychrome to connect to this Chrome instance. Ensure you've installed pychrome:
pip install pychrome
Here's the Python script:
import pychrome
import time
def get_accessibility_tree(tab):
# Enable the Accessibility domain
tab.call_method("Accessibility.enable")
# Get the accessibility tree
tree = tab.call_method("Accessibility.getFullAXTree")
return tree
def main():
# Create a browser instance
browser = pychrome.Browser(url="http://127.0.0.1:9222")
# Create a new tab
tab = browser.new_tab()
# Start the tab
tab.start()
# Navigate to a URL
tab.set_url("https://www.example.com")
time.sleep(3) # Wait for page to load
# Retrieve the accessibility tree
accessibility_tree = get_accessibility_tree(tab)
print(accessibility_tree)
# Stop the tab (closes it)
tab.stop()
# Close the browser
browser.close()
if __name__ == "__main__":
main()
This script will launch Chrome, connect to it, navigate to "https://www.example.com", and then print the accessibility tree to the console.
Note: The script to launch Chrome assumes a typical installation path on Windows. You will need to modify this path according to your Chrome installation location and operating system. Additionally, handling different operating systems requires conditional checks and respective commands for each OS.