# iOS constraints

`plugins/plugin-computeruse/docs/IOS_CONSTRAINTS.md`

Apple does not let third-party apps drive other apps. There is no equivalent
to Android's MediaProjection for full-screen capture without a user prompt,
no ScreenCaptureKit (that is macOS only), and no system-wide accessibility
API that a sandboxed iOS app can use to inject events into other apps.
This document spells out what is possible on iOS, exactly how each surface behaves, and the manual validation checklist that has to be run on a real device before any of this ships.

## ReplayKit foreground capture

`RPScreenRecorder.shared().startCapture(handler:completionHandler:)` returns
`CMSampleBuffer`s of the host app's window. Useful for "show me what's on
the Milady screen right now."
```swift
import ReplayKit

RPScreenRecorder.shared().startCapture(handler: { sampleBuffer, type, error in
    // Receive frames for own-app capture only; skip errors and audio buffers.
    guard error == nil, type == .video else { return }
}, completionHandler: { error in
    // Setup result; ReplayKit shows a system permission prompt on first use.
})
```
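On the JavaScript side, the validation checklist names `replayKitForegroundStart` and `replayKitForegroundDrain` for this surface. The sketch below shows one way a caller might drain frames at the configured `frameRate`. The `ForegroundCaptureBridge` interface, every frame field other than `jpegBase64`, the 30-poll cap, and `pollForegroundFrames` itself are hypothetical. They are not the bridge's real typings.

```typescript
// Hypothetical shapes: the real typings live in the plugin's iOS bridge.
interface CapturedFrame {
  jpegBase64: string;
  width: number;
  height: number;
}

interface ForegroundCaptureBridge {
  replayKitForegroundStart(opts: { frameRate: number }): Promise<void>;
  replayKitForegroundDrain(): Promise<{ frames: CapturedFrame[] }>;
}

type Sleep = (ms: number) => Promise<void>;
const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Starts in-app capture, then drains buffered frames roughly once per frame
// interval until `maxFrames` non-empty frames arrive or `maxPolls` is used up.
async function pollForegroundFrames(
  bridge: ForegroundCaptureBridge,
  opts: { maxFrames: number; frameRate?: number; maxPolls?: number },
  sleep: Sleep = realSleep,
): Promise<CapturedFrame[]> {
  const frameRate = opts.frameRate ?? 1; // Hz; 1 mirrors the documented default
  const maxPolls = opts.maxPolls ?? 30;
  await bridge.replayKitForegroundStart({ frameRate });
  const collected: CapturedFrame[] = [];
  for (let poll = 0; poll < maxPolls && collected.length < opts.maxFrames; poll++) {
    const { frames } = await bridge.replayKitForegroundDrain();
    // Drop empty payloads instead of passing them downstream.
    collected.push(...frames.filter((f) => f.jpegBase64.length > 0));
    if (collected.length < opts.maxFrames) await sleep(1000 / frameRate);
  }
  return collected.slice(0, opts.maxFrames);
}
```

The `sleep` parameter is injectable so the loop can be exercised without real timers.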
Constraints:

- … `frameRate` (default 1Hz).
- … `BGProcessingTask`.

## Broadcast extension

A separate extension target with a ~50MB memory ceiling that streams frames over an App Group shared container into the main app. The user must start the broadcast themselves via the system share-sheet picker; apps cannot programmatically launch the extension.
```swift
import ReplayKit

class SampleHandler: RPBroadcastSampleHandler {
    override func processSampleBuffer(_ sampleBuffer: CMSampleBuffer,
                                      with sampleBufferType: RPSampleBufferType) {
        // Compress + dump into the App Group container.
    }
}
```
iOS 26 / 26.1 beta regression: extensions are killed within ~3 seconds
even when memory headroom is fine. We surface this as
`extension_died` from `broadcastExtensionHandshake` so callers can fall back
to foreground capture. Track Apple's feedback status before shipping this
target on iOS 26.
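A caller-side version of that fallback might look like this. It assumes, hypothetically, that `broadcastExtensionHandshake` resolves with an object carrying `broadcastActive`, an optional `error` string, and the `regression.observed` flag named in the checklist. If the real bridge rejects with `extension_died` instead, the same check moves into a `catch`. `HandshakeResult` and `chooseCaptureMode` are illustrative names only.

```typescript
// Hypothetical result shape, loosely modeled on fields named in the checklist.
interface HandshakeResult {
  broadcastActive: boolean;
  error?: string; // e.g. "extension_died"
  regression?: { observed: boolean };
}

interface BroadcastBridge {
  broadcastExtensionHandshake(): Promise<HandshakeResult>;
}

type CaptureMode = "broadcast" | "foreground";

// Prefer the broadcast extension, but fall back to in-app ReplayKit capture
// when the extension has died (the iOS 26 beta regression) or was never started.
async function chooseCaptureMode(bridge: BroadcastBridge): Promise<CaptureMode> {
  const result = await bridge.broadcastExtensionHandshake();
  const died = result.error === "extension_died" || result.regression?.observed === true;
  return result.broadcastActive && !died ? "broadcast" : "foreground";
}
```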

## Vision OCR

`VNRecognizeTextRequest` is on-device, free, supports ~30 languages, and
typically runs in under 300 ms on modern devices.
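```swift
import Vision

func recognizeText(in cg: CGImage) throws {
    let request = VNRecognizeTextRequest { request, error in
        let observations = request.results as? [VNRecognizedTextObservation] ?? []
        // Top candidate per observation; each has bounding box, text, confidence.
        let candidates = observations.compactMap { $0.topCandidates(1).first }
    }
    request.recognitionLevel = .accurate
    request.recognitionLanguages = ["en-US"]
    // perform(_:) is synchronous; the completion handler runs before it returns.
    try VNImageRequestHandler(cgImage: cg).perform([request])
}
```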
This is the OCR provider the WS6 scene-builder will pick up on iOS. The
provider interface is in
`eliza/plugins/plugin-computeruse/src/mobile/ocr-provider.ts`.
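Downstream of the Swift request, a provider has to fold the per-observation top candidates into one string. The checklist later asserts on a `fullText` field. The sketch below is a minimal version of that step. The `OcrLine` shape, the 0.3 confidence cut-off, and both function names are hypothetical and were chosen only for illustration. Vision reports normalized bounding boxes with the origin at the lower-left, so a larger top edge means the text sits higher on the image.

```typescript
// Hypothetical per-line shape a provider might build from Vision observations.
interface OcrLine {
  text: string;
  confidence: number; // 0...1, from the top candidate
  // Normalized bounding box, origin at the lower-left (Vision convention).
  box: { x: number; y: number; width: number; height: number };
}

const topEdge = (l: OcrLine): number => l.box.y + l.box.height;

// Drops low-confidence lines, orders the rest top-to-bottom (then left-to-right
// for lines whose top edges match), and joins them with newlines.
function assembleFullText(lines: OcrLine[], minConfidence = 0.3): string {
  return lines
    .filter((l) => l.confidence >= minConfidence)
    .sort((a, b) => topEdge(b) - topEdge(a) || a.box.x - b.box.x)
    .map((l) => l.text)
    .join("\n");
}

// Mirrors the checklist's case-insensitive containment check.
function containsCaseInsensitive(fullText: string, needle: string): boolean {
  return fullText.toLowerCase().includes(needle.toLowerCase());
}
```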

## App Intents (x-callback)

App Intents are the only sanctioned way to drive other apps. Each target app must expose
intents (Shortcuts-style). We support invocation via x-callback URL schemes
for the system apps in the static registry
(`ios-app-intent-registry.ts`):

- `mailto:` with subject / body / cc / bcc
- `sms:` with body
- `http://maps.apple.com/?daddr=...&dirflg=...`

For richer intents (Notes append, Reminders add, Music play with a query)
the user has to donate the action via Shortcuts; we can then invoke it via
`AppIntent` on iOS 16+. The bridge's `appIntentList` returns the runtime
list of donated intents this app sees.
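The three URL shapes in the static registry can be built with plain string encoding. This sketch is illustrative only. The function names are hypothetical; the real registry is `ios-app-intent-registry.ts`. Every value is percent-encoded with `encodeURIComponent`, so spaces become `%20` rather than `+`. The `?body=` form for `sms:` follows RFC 5724, and how iOS Messages parses it is an assumption to confirm on device. Apple's map links use `dirflg` values `d` (driving), `w` (walking), and `r` (transit).

```typescript
// Hypothetical helpers mirroring the URL shapes listed in the static registry.
interface MailtoParams {
  to: string[];
  subject?: string;
  body?: string;
  cc?: string[];
  bcc?: string[];
}

// Builds "?k=v&k2=v2", skipping missing or empty values and percent-encoding
// each value (spaces become %20, not +).
function queryString(pairs: Array<[string, string | undefined]>): string {
  const parts = pairs
    .filter((pair): pair is [string, string] => pair[1] !== undefined && pair[1] !== "")
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`);
  return parts.length > 0 ? `?${parts.join("&")}` : "";
}

function buildMailtoUrl(p: MailtoParams): string {
  return `mailto:${p.to.join(",")}` + queryString([
    ["subject", p.subject],
    ["body", p.body],
    ["cc", p.cc?.join(",")],
    ["bcc", p.bcc?.join(",")],
  ]);
}

// Assumes the RFC 5724 "?body=" form; confirm how iOS Messages parses it on device.
function buildSmsUrl(recipient: string, body?: string): string {
  return `sms:${recipient}` + queryString([["body", body]]);
}

// dirflg: d = driving, w = walking, r = public transit (Apple map links).
function buildMapsDirectionsUrl(destination: string, dirflg: "d" | "w" | "r" = "d"): string {
  return `http://maps.apple.com/?daddr=${encodeURIComponent(destination)}&dirflg=${dirflg}`;
}
```

On the native side, the resulting string would then be handed to `UIApplication.shared.open`.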

## Accessibility snapshot

`accessibilitySnapshot` walks the key window's view hierarchy and returns
`accessibilityLabel` / `accessibilityValue` / role for each element. iOS gives us no way to
read another app's `UIAccessibility` tree.
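The snapshot only covers this app's own window, so consumers mainly need to walk and summarize it. This is a sketch built on a hypothetical `A11yNode` shape with `role`, optional `label`/`value`, and `children`. The bridge's real field names may differ. `looksPopulated` encodes the condition used in the validation checklist.

```typescript
// Hypothetical node shape for the snapshot returned by the bridge.
interface A11yNode {
  role: string;
  label?: string; // accessibilityLabel
  value?: string; // accessibilityValue
  children: A11yNode[];
}

// Depth-first, pre-order walk that emits one indented line per node carrying
// a label or value; unlabeled containers only contribute indentation depth.
function describeTree(root: A11yNode): string[] {
  const lines: string[] = [];
  const visit = (node: A11yNode, depth: number): void => {
    if (node.label !== undefined || node.value !== undefined) {
      const value = node.value !== undefined ? ` = ${node.value}` : "";
      lines.push(`${"  ".repeat(depth)}${node.role}: ${node.label ?? ""}${value}`);
    }
    for (const child of node.children) visit(child, depth + 1);
  };
  visit(root, 0);
  return lines;
}

// The checklist's "populated screen" condition on the top-level node.
function looksPopulated(root: A11yNode): boolean {
  return root.role !== "labeled" && root.children.length > 0;
}
```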

## Foundation Models

Apple ships an on-device LLM in the `FoundationModels` framework when
Apple Intelligence is enabled. We expose this as an opportunistic
fast path. If it is unavailable, the existing `llama-cpp-capacitor` (Qwen3-VL-2B)
local-inference path stays as the default.
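The fast-path-then-fallback routing can be expressed as shown below. The sketch assumes, hypothetically, that `foundationModelGenerate` resolves with either `text` or an `error` string. The checklist says the call resolves with `foundation_model_unavailable` when Apple Intelligence is off. The llama.cpp path is modeled as an opaque `LocalModel`. Neither interface is the plugin's actual API.

```typescript
// Hypothetical bridge and local-model shapes, for illustration only.
interface FoundationModelResult {
  text?: string;
  error?: string; // e.g. "foundation_model_unavailable"
}

interface FoundationModelBridge {
  foundationModelGenerate(opts: { prompt: string }): Promise<FoundationModelResult>;
}

interface LocalModel {
  generate(prompt: string): Promise<string>;
}

type Generation = { text: string; source: "foundation-models" | "llama-cpp" };

// Tries Apple's on-device model first and falls back to the default local path
// when the model is unavailable or returns no text; other errors propagate.
async function generateWithFallback(
  bridge: FoundationModelBridge,
  local: LocalModel,
  prompt: string,
): Promise<Generation> {
  const result = await bridge.foundationModelGenerate({ prompt });
  if (result.error === undefined && result.text) {
    return { text: result.text, source: "foundation-models" };
  }
  if (result.error !== undefined && result.error !== "foundation_model_unavailable") {
    throw new Error(`foundationModelGenerate failed: ${result.error}`);
  }
  return { text: await local.generate(prompt), source: "llama-cpp" };
}
```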
Entitlement and Info.plist updates are listed below.

## What iOS does not allow

Stock iOS does not allow any of the following from a third-party app, and we do not pretend otherwise:

- No MediaProjection / ScreenCaptureKit equivalent, no system-wide accessibility event injection.
- No `ps`-equivalent.
- … `BGProcessingTask` (opportunistic, OS-scheduled, not guaranteed).

If a feature spec calls for any of the above, escalate it as not feasible on iOS rather than inventing a workaround that will get the app rejected or silently broken by Apple in the next OS update.

## Entitlements and Info.plist

Add the following to `apps/app/ios/App/App/App.entitlements`:

```xml
<key>com.apple.developer.kernel.increased-memory-limit</key>
<true/>
<key>com.apple.developer.kernel.extended-virtual-addressing</key>
<true/>
```

Add the following to `apps/app/ios/App/App/Info.plist`:

```xml
<key>NSScreenCaptureDescription</key>
<string>Captures the Milady app window when you ask it to see the screen.</string>
<key>NSAppleEventsUsageDescription</key>
<string>Sends Shortcuts to other apps when you authorize an action.</string>
```
The screen capture string is shown on ReplayKit's first-launch system
prompt. The Apple Events string is shown only if the host (this app) ever
needs to drive AppleScript via the Catalyst surface. iOS apps that only
use `UIApplication.shared.open(_:options:completionHandler:)` to invoke x-callback intents do not
need it, but it costs nothing to include for forward compatibility with
Mac Catalyst builds.

## Device validation checklist

The Swift code in this repo is unverified locally because no iOS
device is available. Every file is marked
`// TODO: validate on device — checklist in IOS_CONSTRAINTS.md`. Before
shipping any of this:

| Surface | Simulator | Real device |
|---|---|---|
| ReplayKit foreground capture | Partial | Required |
| Broadcast extension | No | Required |
| Vision OCR | Yes | Required |
| App Intents (x-callback) | Partial | Required |
| Accessibility snapshot | Yes | Required |
| Foundation Models | No | Required |
| Memory pressure probe | Limited | Required |
For each method below, run on at least one A14-or-later iPhone running the target iOS (currently iOS 26.1, with iOS 17.6 as the floor):

- `probe()`: assert `data.platform === "ios"`, `osVersion` matches the
  device, and all six capability bits are present.
- `replayKitForegroundStart`: assert the system permission prompt is
  shown on first use; on accept, `replayKitForegroundDrain` returns frames
  with non-empty `jpegBase64` and the device's screen resolution.
- `broadcastExtensionHandshake`: bundle the extension target via Xcode,
  tap the share-sheet broadcast picker, and assert `broadcastActive`
  transitions to `true`. On iOS 26 betas, verify `regression.observed`
  eventually flips to `true` after the extension dies.
- `visionOcr`: render a known PNG with the string "WS9 OCR SMOKE"
  into the app, run OCR with `recognitionLevel: "accurate"`, and assert
  `fullText` contains the source string (case-insensitive).
- `appIntentList`: verify the returned list is empty for a fresh
  install and grows after the user donates intents via Shortcuts.
- `appIntentInvoke` with `com.apple.mobilesafari.open-url`: assert
  Safari opens to the provided URL. With `com.apple.MobileSMS.send-message`,
  assert Messages opens with the recipient and body pre-filled.
- `accessibilitySnapshot`: assert the returned tree's top-level node
  has `role !== "labeled"` and `children.length > 0` on a populated screen.
- `foundationModelGenerate`: on a device with Apple Intelligence
  enabled, assert a short prompt returns non-empty text. With Apple Intelligence disabled,
  assert the call resolves with `foundation_model_unavailable`.
- `memoryPressureProbe`: invoke `UIApplication.shared.performMemoryWarning()`
  (debug-only) and assert the next probe call returns
  `severity >= 0.7` with `lastWarningAt` set.

## Integration notes

- … `MemoryPressureSample` via the shared `IPressureSignal` contract in `ios-bridge.ts`.
  The bridge is the producer; the arbiter is the consumer. WS1 will subscribe to push events
  via the bridge once that channel lands; for now it polls `memoryPressureProbe`.
- … `selectOcrProvider()` in `ocr-provider.ts`. Register the provider at app
  boot when running on iOS:
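```typescript
import { Capacitor } from "@capacitor/core";

// registerOcrProvider / createIosVisionOcrProvider come from the plugin's src/mobile modules.
if (Capacitor.getPlatform() === "ios") {
  registerOcrProvider(createIosVisionOcrProvider(() => Capacitor.Plugins.ComputerUse));
}
```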
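For the interim polling arrangement described for WS1, a minimal poller could look like this. Several pieces are hypothetical: the callback signature, the 5-second default interval, `createPressurePoller`, and the assumption that `severity` is normalized to 0..1. The real contract is `IPressureSignal` in `ios-bridge.ts`. The 0.7 threshold is borrowed from the checklist, not from the arbiter's actual policy.

```typescript
// Hypothetical sample shape; the real contract is IPressureSignal in ios-bridge.ts.
interface MemoryPressureSample {
  severity: number; // assumed normalized to 0..1, higher is worse
  lastWarningAt?: number; // assumed epoch ms of the last memory warning
}

interface PressureBridge {
  memoryPressureProbe(): Promise<MemoryPressureSample>;
}

type PressureListener = (sample: MemoryPressureSample, critical: boolean) => void;

// Polls the bridge (producer) and forwards each sample to the arbiter
// (consumer), flagging samples at or above `threshold`. `tick` runs one poll
// so it can be driven directly; `start`/`stop` wrap it in setInterval.
function createPressurePoller(bridge: PressureBridge, onSample: PressureListener, threshold = 0.7) {
  let timer: ReturnType<typeof setInterval> | undefined;
  const tick = async (): Promise<void> => {
    const sample = await bridge.memoryPressureProbe();
    onSample(sample, sample.severity >= threshold);
  };
  return {
    tick,
    start(intervalMs = 5000): void {
      // A failed poll is dropped; the next interval simply retries.
      if (timer === undefined) timer = setInterval(() => void tick().catch(() => undefined), intervalMs);
    },
    stop(): void {
      if (timer !== undefined) clearInterval(timer);
      timer = undefined;
    },
  };
}
```

Once the bridge's push channel lands, `tick` would be replaced by an event subscription while the listener side stays the same.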