docs/infra/Flutter-Framework-Gardener-Rotation.md
The framework gardener role currently relies on several tools and communication channels that are only available to Google employees. See On-call scheduling for Flutter for an overview of the work required to make this process more open.
The framework gardener's role is to eliminate impediments to engineering velocity on the framework, tool, and related teams working on the Flutter framework repository, to unblock rolls into the repository, and to minimize the latency with which critical fixes arrive in our customers' hands. To that end, we maintain a rotation so that there is a clear owner and point of contact for Flutter repo issues, and so that engineers can plan their work around an assumption of reduced productivity during their rotation.
The framework gardener's core responsibilities are:
Gardener responsibility does not include:
As such, the gardener should use their badge and hat to:
Rotations are managed in the Rotations tool. You can add the Framework Gardener calendar to your own calendar. Both of these links are currently Google-internal.
Team members are not expected to participate in multiple Flutter rotations. For example, those on the engine rotation are exempt from the framework rotation, and vice versa. New team members should be added to a single rotation, depending on the team they belong to.
Before heading out on holiday, or if you get to your shift and find you can't do it, check the upcoming rotations and find a volunteer to swap shifts with while you're out. During some holiday periods when many team members are out and activity is particularly low on the tree, it may not be essential to have a dedicated gardener.
Open the Framework build dashboard.
Unmute the tree-gardener and hackers-infra channels on Discord. Contributors are encouraged to escalate tree closures to you, so respond there as quickly as possible. If you'd like automatic notifications when the tree goes red, you can also unmute the tree-status channel.
Escalate to the test owner. File GitHub issues if none are already open.
P1 priority.
c: contributor-productivity label.
See Why Flutter Devicelab Tests Break.
The devices in the Firebase test lab are not managed by the Flutter infra team.
If the test failure is not a known flake or infrastructure issue, revert the commit immediately.
If the commit landed within the last 24 hours:
Revert label. Our infrastructure will automatically create a revert PR and land it.
If the commit could not be automatically reverted:
revert label to the PR to allow the bot to land it without approval.
analyze-linux test passes, merge it. You do not need to wait for all presubmit tests to pass or for an LGTM.
Flakes are particularly productivity-killing because they silently trigger the key problems the gardener is meant to prevent, such as a red tree. As such, flakes should be treated the same way as a reproducible breakage: as though the test were always failing.
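The revert policy described earlier in this section can be read as a small decision procedure. The Python sketch below is illustrative only: the `Failure` type, the `triage` function, and the action strings are hypothetical names, not part of any Flutter tool. It assumes that a commit older than 24 hours also needs a manual revert PR, which the text does not state explicitly, and it collapses the handling of known flakes and infrastructure issues into a single "escalate" result.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Window from the policy: commits that landed within the last 24 hours
# are reverted by adding the revert label.
AUTO_REVERT_WINDOW = timedelta(hours=24)


@dataclass
class Failure:
    """Hypothetical model of a failing test seen on the build dashboard."""

    known_flake_or_infra: bool  # already tracked as a flake or infra issue
    commit_landed_at: datetime  # when the suspected commit was merged
    auto_revert_failed: bool = False  # the bot could not revert it


def triage(failure: Failure, now: datetime) -> str:
    """Return the gardener's next action (a sketch, not an official tool)."""
    if failure.known_flake_or_infra:
        # Known flakes and infra issues are escalated rather than reverted.
        return "escalate"
    recent = now - failure.commit_landed_at <= AUTO_REVERT_WINDOW
    if recent and not failure.auto_revert_failed:
        # The revert label makes the infrastructure create and land a revert PR.
        return "add-revert-label"
    # Assumption: everything else gets a manual revert PR carrying the
    # revert label, so it can land without approval.
    return "manual-revert-pr"


now = datetime(2024, 5, 1, 12, 0)
print(triage(Failure(False, now - timedelta(hours=3)), now))  # add-revert-label
```

Modeling the failed automatic revert as a flag keeps the two manual paths (an old commit, or a failed bot revert) in one branch, mirroring how the text sends both to the same manual process.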
If you see a test failure that appears to be a flake:
If the failure is happening on an engine roll, post in the Engine Sheriff chat so the engine sheriff can locate and revert the engine or upstream commit(s) causing the issue.
Coordinate with the engine sheriff on pausing and unpausing the Engine to Framework autoroller during this process.
Check framework benchmarks for regressions. File issues and escalate.
Review engine benchmarks for any regressions. Choose the Triage item on the left and walk through the new issues. For each commit that caused a regression, you'll see marks in the columns corresponding to the regression; each mark indicates whether the results at that commit are low or high.
Click a mark, and you'll be taken to a popup with the plot of recent data around the commit in question. From here you can:
If there is a new regression not deemed to be noise in a benchmark:
team: benchmark, and severe:regression labels. Label it with the severe:performance label if the benchmark is a performance one.
P0 for regressions significantly (1.5x or more) above the noted baselines, or with regular spikes that suggest a possible issue with the device lab.
P2 for issues where slow creep appears to be happening.
See the golden test build breakage guide.
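The benchmark regression priority guidance earlier in this section can be sketched as a function. This is a minimal illustration, assuming higher benchmark values are worse (as with a timing metric) and that the measured value and baseline share units. `regression_priority` is a hypothetical helper, and a `None` result stands for cases the guidance leaves to the gardener's judgment.

```python
from typing import Optional

# From the guidance: 1.5x or more above baseline warrants P0.
P0_RATIO = 1.5


def regression_priority(
    measured: float,
    baseline: float,
    regular_spikes: bool = False,
    slow_creep: bool = False,
) -> Optional[str]:
    """Suggest a priority label for a benchmark regression (sketch).

    Assumes higher values are worse. Returns None when the guidance
    does not settle the priority.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    ratio = measured / baseline
    # Large jumps, or regular spikes hinting at a device lab problem, are P0.
    if ratio >= P0_RATIO or regular_spikes:
        return "P0"
    # Gradual degradation over time is P2.
    if slow_creep:
        return "P2"
    return None  # left to the gardener's judgment
```

For a metric where lower values are worse (for example, a throughput score), the ratio would need to be inverted before comparing it with the threshold.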
team-infra label and a priority label:
P0 (immediate): such as a build break or regression.
P1 (high): Users are suffering but not blocked; or, an immediate-level incident will happen if this is not addressed (e.g., almost out of quota).
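The `team-infra` priority definitions can also be expressed as a small classifier. The `Priority` enum and `infra_priority` function below are hypothetical, and only the two levels listed here are modeled; any other situation returns `None` rather than guessing at a lower priority.

```python
from enum import Enum
from typing import Optional


class Priority(Enum):
    P0 = "P0"  # immediate
    P1 = "P1"  # high


def infra_priority(
    build_break_or_regression: bool,
    users_suffering_not_blocked: bool,
    incident_imminent: bool,
) -> Optional[Priority]:
    """Map an infra issue to a priority label (sketch of the rules above)."""
    if build_break_or_regression:
        return Priority.P0
    # Users hurt but unblocked, or an immediate-level incident coming
    # (for example, nearly out of quota), is P1.
    if users_suffering_not_blocked or incident_imminent:
        return Priority.P1
    return None  # lower priorities are not covered here
```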
The bulk of communication happens on Discord.