Back to Tock

Tock Core Notes 2021

doc/wg/core/notes/core-notes-2021-02-12.md

latest16.8 KB
Original Source

Tock Core Notes 2021

Attending

  • Pat Pannuto
  • Amit Levy
  • Leon Schuermann
  • Hudson Ayers
  • Alistair
  • Arjun Deopujari
  • Brad Campbell
  • Johnathan Van Why
  • Philip Levis
  • Vadim Sukhomlinov

Updates

  • johnathan - libtock-rs: internal deadline, racing ahead in personal branch to go faster; will slow down testing though
  • phil - 2.0 progress: down to the last 2! nrf serialization and IPC. IPC likely the hardest for many reasons. Serializiation is ported, but userspace library isn't working for reasons.. unknown. Hoping someone more familiar with the serializiation can take a look.
    • brad: why did the syscall interface change
    • phil: because it was performing operations on allows
    • brad: but if we just made an excpetion here, we wouldn't have to do anything...
    • phil: in theory, but what if it still doesn't work?
    • brad: don't feel like debugging any more than you do...
    • phil: should we cut it?
    • brad: no! it's our only working BLE stack...
    • [...] (indecision)
    • amit: porting it without fixing the system calls is simpler, and more like to just work
    • amit: would be nice to fix the interface in addition, but maybe this makes it more of a stepwise change
    • amit: I can try to help out with this
    • amit: think Branden originally wrote the driver, but I have at least some familiarity
    • amit: still, Brad's suggestion to 'get it working' might be a good start
    • phil: but then it will never get fixed..
    • amit: was thinking 'within the same PR'
    • phil: ah, incremental development makes sense. Get it working with old syscall semantics first
    • phil: but vehemently opposed to leaving this unchanged for 2.0
    • phil: okay, I will try the stepwise then; start with preserving operation-on-allow, see if that works at least

TBF Headers

  • amit: There are two header permission PRs, both just documentation (specification)
  • amit: #2172 is a general notion, adding a list of system calls that an app declares what it will need to use
  • amit: #2177 is a specific set of permissions for flash
  • amit: Question 1, does this align with the security model
  • amit: Is it fair to say that in a sense this is similar to Android's manifest permissions?
  • amit: Declaration vs enforcement are separate questions
  • alistair: yes that's fair; and systems such as OT can have additional
  • johnathan: in Android analogy: this PR is the app declaring its permissions in a manifest, but it's not the Settings app giving toggles; which is why allowing apps to control this is not at odds with the threat model
  • amit: So for OT, part of this will expect that apps (and these manifests) are signed
  • alistair: Yes, but in the case of OT, the theory is that users will also be able to load apps from an untrusted third party. So there the kernel will have a list of permissions it is willing to give to apps
  • amit: so this serves two ends, early-detection that an app can't run, and an opportunity for an app to sandbox itself
  • amit: perm's granted are only the intersection of those allowed and those it needs
  • alistair: Yes.
  • amit: This seems sane to me, and there seems to be at least one good use case for it.
  • amit: The other question is whether this is all-encompassing (does this cover all cases we care about; e.g. the non-OT cases)? If not, does this add additional, unnecessary bloat that could be solved better?
  • amit: My personal take, so far, is to not see a reason not to do this
  • alistair: My goal was to make this not OT-specific
  • amit: Would a reasonable policy, e.g., for a dev board (e.g. hail) that tockloader inspects these permissions and verifies at flash-time that the flasher wants to give these permissions
  • alistair: Yeah, on a dev board, the permissions are generated, but dev boards just allows them all. For something like OT, there are more likely to be flash time and runtime checks of these
  • amit: Pat just added some specific Q's to the PR, but are there any high level issues?
  • phil: No high level issues; some questions on the actual structure, e.g. are we restricting ourselves to just 32 commands given the data structure here? [nvm, not something needed to discuss here; will take offline]
  • johnathan: the allowed commands feild is a 64-bit structure, which limits to commands up to 64
  • alistair: I didn't see any drivers that use numbers >64
  • phil: I think there are some; statement here about commands using a compact space
  • alistair: yeah, there are some that use big numbers, but they don't have that many actual commands
  • amit: would it be reaonsable to have an additional offset field, or a header type, that allow for multiple numbers/masks for more commands?
  • phil: the place where this might break (the 64 bit type), the design that we accepted but didn't implement was something like where userspace would have access to multiple SPI buses, they would use the higher order bits
  • pat: We just revisited the syscall interface, and decided that commands should have a fairly wide bit width, but here we are implicitly make the opposite decision -- something feels at odds there
  • amit: does this only restrict commands? the thought being that subscribe/allow doesn't have side effects on its own
  • alistair: right, you can allow all you want without harming anything
  • phil: right, and we should probably make explicit in the TRD that you don't have subscribe callbacks to your app if your app didn't issue the command?
  • alistair: Maybe if that driver number isn't even mentioned, then subscribe/allow not allowed for that driver number?
  • phil: We did ultimately think somewhat deeply about the choice to have a wide command identifier type, but that was not necessarily at odds with being sparse
  • alistair: Yeah, it makes sense to me to compact the command space
  • phil: But will still need to handle the multiple user case
  • brad: ninedof for example has command number that are not sequential
  • phil: this is an easy fix: syscall TRD should say that commands should be dense; can point to this for justification
  • amit: The second one to discuss is flash
  • amit: anything beyond the comment hudson left yesterday?
  • [crickets]

Flash HIL update #2248

  • amit: This is a PR that looks at addressing issues raised from the OpenSK folks in RFC #1459
  • amit: This PR does not change the flash HIL to be synchronous, still an async HIL
  • amit: The main thing is that the interface to writing flash is now absolute addresses rather than pages/regions and uses types to express minimum transaction size
  • amit: Some other things as well, e.g. making it clear that erase and write have to happen separately
  • amit: I don't feel super comfortable leading this discussion, as I don't have strong opinions or intuitions about flash
  • amit: it seems that we have rough consensus that we should have separate interfaces for external and internal flash; does this improve things for at least one of those?
  • phil: It improves things....
  • phil: Agree with Pat's comment on the PR where the interface changing between pages and byte ranges is awkward. Maybe expose an erase that takes byte ranges, and internally converts to pages?
  • alistair: but what if the byte range doesn't line up on page boundaries, throw an error?
  • phil: yeah.., a few options there; could have an "erase_precise" that requires alignment; could do best effort; could re-write regions that should not have been erased
  • leon: The issue is perhaps that the alignment is only partially enforced by the type system?
  • alistair: yeah, the type gives the alignment for writes, which is often more granular than for erase
  • amit: what does this actually look like?
  • alistair: right, generally erasing in ~4K and writes are ~word
  • phil: sometimes things that are messier are you can 'write more than once, but only a few times', etc -- good that the interface doesn't try to capture that
  • leon: at the risk of increasing complexity, should we add another layer that encapsulates write and erase boundaries?
  • amit: I kind of agree with that I think. The goal of the HILs should be to abstract differences between hardware specific implementations, but not necessarily to abstract the hardware interface into something that is easier to use. The cost of the latter would be losing capability of the hardware and pushing complexity of adhering to the interface into each of the HW implementations
  • phil: agree, not worried about the most ergonomic interface, but do want one that doesn't make it easy to make mistakes on
  • amit: how do you find page number?
  • alistair: just count
  • amit: so the client needs to know the base address?
  • alistair: I think so
  • leon: this is exactly what I'm talking about, that a wrapper or library can encapsulate these details
  • amit: Alistair, this is why in write, the address is a usize rather than a pointer, because it's not a pointer, it's an offset from the base address of flash?
  • alistiar: yes, I should make that clearer
  • leon: since this is an async HIL, we could pass in larger than usize things, right? Could have >4GB of flash
  • alistair: I guess.. but you need a buffer for that?
  • phil: right, but external flash can have an offset; external flash can easiler by larger
  • alistair: but then we have to deal with 64 bit types everywhere
  • phil: it gets tricky when offset is platform-dependent though in how much you can address
  • leon: I think this could also be solved by an absract interface: if I request to write to something outside of valid bounds then the wrapper can reject the request; it can then pass friendly types to the internal impelmentation
  • amit: what about an additional type parameter with a default
  • leon: then you still expose this difference to the user; could make issues for use in hardware independent manner
  • hudson: if the user is requesting reads and writes from a given address, then they probably need to know the offset, not sure how you would hide that
  • leon: if the user is just the kernel driver, it will need to pass through the requested addresses
  • hudson: if you're not going to propogate the raw address through, how would an app write to >4GB in external flash
  • amit: allowing 64 bits up to userspace feels not that bad
  • amit: is it the case that the API for external flash has a similar hardware interface (erase pages; write smaller)?
  • phil: usually, but not always
  • phil: Alistair's use of the W parameter covers the odd case already where W is block size
  • amit: the downside is the address isn't necessarily aligned at the type level
  • phil: I think it is if you look at the comments
  • phil: I like Leon's suggestion that encapsulates this
  • phil: Alistair, if I want to do writes larger than W then I'll do a series of writes?
  • Alistair: I guess so
  • Vadim: What about region-locking, or if one app tries to write data owned by another app?
  • Alistair: that's a different problem, this is a low-level interface
  • phil: I think your interface still supports that; this can be a logical (i.e. virtual memory translation-esque) will work here
  • vadim: this actually allows regions via base+offset then?
  • jonathan: I don't think so as you'll need some virtualizing at the HIL internally if you want to do isolation here alone, which probably isn't the right place
  • alistair: we talked last time about an 'advanced' one as well; is that something we still want to do?
  • leon: I got the impression that that was just for things that are advanced
  • alistair: that was the hope, but I guess in theory there are some that support only reading/writing pages
  • amit: but W solves that, right?
  • alistiar: yeah.. I've never seen a flash with something like a read minimum? just making sure this will work
  • leon: yeah, could work around that easily by just discarding what wasn't asked for
  • phil: could separate out into read/write/erase traits
  • alistair: flash always does all of these though
  • phil: yeah, but UART also transmits and receives, but we split those. Allows structs to only implement what they should have access to
  • leon: Interesting concept; I don't write write without erase, but read without write makes sense. Perhaps a use case for super traits
  • phil: there are write without read cases: e.g. a shared log where you can't read others log entries
  • amit: if it proves necessary, then you could have different write traits for weird edge cases this way, and maybe compatibility layers
  • leon: one issue with UartAdvanced you either have lots of compatibility layers or higher level things lose out becuase of one small missing piece
  • leon: maybe then there is benefit to forcing lower layers to implement to avoid fragmentation
  • leon: otherwise there are many consumers all requiring slightly different flash traits
  • amit: okay -- as we have a packed agenda, may need to move on, summary thus far
    • rough interface is good thus far
    • some Q's still about how to separate the pieces
    • especially how to handle write, partiuclarly odder impl's with big writes
    • an implementaiton for an external or other flash might help surface issues or establish validity
  • amit: it seems that there remain non-obvious tradeoffs that will need an implementation to suss out best option
  • alistair: I'm away for the next two weeks, so if folks comment on the PR in that time then when I get back I can lean into an implementation to test out the ideas
  • [consensus yes]

libtock-rs + Tock 2.0

  • amit: Q's from johnathan
    • We normally have a 7 day rule for merging significant PRs
  • johnathan: the part I want to jump is that if everyone approves, we can merge without waiting out the window for all reviewers -- looking for approval from core to skip this for 2.0
  • alistair: I'm not sure we should waive the policy just because we're in a rush to get things in
  • leon: appreciate tagging me, but don't wait for me
  • alistair: the problem is that sometimes the tagged people review quickly, so things can merge in hours and others don't get a chance to look, but maybe 7 days is too long
  • amit: no horse in this race..
  • phil: is there a number between 1 and 7 days that's better?
  • alistair: 3? we should consider just changing the policy
  • loen: 3 can be pretty short though..
  • johnathan: okay; this is a more complex conversation than I'd hoped; let's shelve this one so we can get to the next one

libtock-rs: Better safe or better fast?

  • johnathan: Especially down at the syscall boundary, there's a tradeoff between safety and performance
    • e.g. can shave off two instructions in yeild-no-wait with an unsafe
    • e.g. can be faster to trust error codes and cast rather than bounds-check
  • johnathan: currently, anywhere where using unsafe would save code size, I've been doing it, especially if it aligns with the TRD wording, but not all reviewers seem in agreement
  • amit: two cases, one where the core kernel is making promises about the type (e.g. that it'll be a valid error code) and others where individual capsules will be making that promise
  • johnathan: I'm only considering the core kernel, otherwise it would be a threat model issue to trust capsules
  • amit: exactly, so in my mind, it seems fair to trust the core kernel
  • amit: we already trust the core kernel, e.g. it could do anything to the memory space of a process..
  • johnathan: we could potentially categorize by how common the call is; e.g. memop is not called often
  • alistair: we should be able to make decisions for this based on how much overhead there really is, right?
  • johnathan: per syscall or total? e.g. adding to command will add thousands to the applicaiton
  • alistair: right, thinking total; agree with the memop example
  • hudson: I suspect it might be challenging to have a general policy on this; can we trust the kernel to return valid error codes, yes, but are there other issues, e.g. alignment, that I as a reviewer will need to think about?
  • johnthan: I can definitely comment these better; plus in the future these will all be unit tested under MIRI
  • leon: part of why might help these questions is the boundaries, i.e. the raw_syscalls which close?
  • johnathan: actually raw_syscalls is the farthest of all of the crates under discussion here; if you're comfortable with those, comfortable with all
  • amit: agree Hudson, unlikely to be a hard and fast rule; will need case-by-case evaluation
  • amit: want to avoid unsafe if it's unnecessary -- if it is doing useful things, should be easier to justify
  • pat: is this something I can offer the team of students doing measurement work to help provide hard numbers for? they are on a 3-4 week to end product timeline
  • johnathan: my timeline is to be done with implmentation by the end of the weekend, so that doesn't quite line up
  • amit: still use in measuring; can implement the unsafe version for now, and see how much win it actually gets later

Quick circle back

  • amit: proposal, for now, reduce libtock-rs merge window to 3 days for 2.0; can revisit later whether to make this permanent. [passes]