Tock Core Notes 2020-09-25

Attending

Pat Pannuto
Johnathan Van Why
Philip Levis
Amit Levy
Leon Schuermann
Alistair
Brad Campbell
Hudson Ayers
Branden Ghena
Samuel Jero
Vadim Sukhomlinov

Updates

Phil: Timer merged!!
Brad: elf2tab rounds application outputs to power-of-2, because of Cortex-M MPU limitations. For Risc-V, this power-of-2 isn't a requirement, so looking at moving the logic out of elf2tab
- Amit: Sounds reasonable, perhaps tockloader responsibility?
- Brad: Actually, it is already a tockloader responsibility. The multiple compiled binaries already a major R5 complexity
- Amit: Yeah, seems like a pre-tockloader choice whose time has come to move
Johnathan: OpenSK is working on adding NFC support for 52840; also working on CryptoCell, but that's moving slower b/c very complex

Interrupt control semantics in Tock/HIL

Amit: Summary: Who in the kernel is responsible to enable/disable NVIC on ARM / equiv on R5?
Phil: Not just about who, but about what happens
Phil: Q's are: Can an interrupt wake a processor, and do you do something in response to it?
Phil: Today, semantics are unclear and somewhat varying on what happens when interrupts are enabled/disabled
Phil: Things seem to be tied more to how it's implemented rather than a concerted design
Phil: ARM originally did a FIFO queue in top-half handlers
Phil: What happens if queue overflows? Replaced with a no-queue design that scans over what interrupts are pending. Problem: no way to tell the kernel, "don't call this handler" (i.e. active interrupt wakes processor and masked interrupt is pending, it will be called)
Phil: This came up again with R5; thought is maybe that core kernel loop should control which interrupts should be enabled; but what if a peripheral wants to disable and the core loop overrides?
Phil: Personal opinion: peripherals should own their interrupts -- they should decide if the core will wake and if their handler will be called
Phil: If that's not tractable, will have to push the operation into handlers (i.e. first line of handler just checks if handling now and ignores if needed) [editor's note: may be more complexity here]
Amit: The current state is that capsules do not control whether interrupts can wake the CPU. In general, they cannot access the NVIC methods that control this
Phil: Directly, indirectly, or both? e.g. if SPI capsule tells SPI HW to turn off, does that count?
Amit: That would count... as far as I can tell, there should not be and are no references to nvic enable/disable except for board configurations and in chip handle_pending_interrupts functions
Amit: Today: capsules & low-level drivers should not and cannot affect interrupts
Amit: Not necessarily a thought-out design decision; was a solution to the FIFO queue at the time
Amit: The way that peripherals do have control is ensuring that their peripheral will not generate interrupts at all
Hudson: There was an exception in R5 around mtimer, but that was just removed
Amit: Is thre something like NVIC on R5?
Brad: Yes. There's two levels. There's an architectural state register, which has bits for different interrupts, one of which is the external interrupt, which maps to something akin to NVIC [there are multiple implementation of this, commonly the PLIC]; inside PLIC there are mappings to peripherals
Brad: Caveat: that's a simplification of course
Leon: Important to note: Many chips have a PLIC, but the thing named PLIC varies somewhat wildly
Phil: On idea on Cortex-M was maybe to do this in a handler via software bits. For R5, there are 1024 which is very expensive to scan for wakeups
Amit: And you don't have to do that to find the pending bits?
Phil: Yeah, R5 has a magic "give me next interrupt" register
Amit: Ah, which is effectively what we were trying to do in software on ARM...
Amit: To move forward...
- 1. Agree on what current semantics are
- 1. Whatever PRs we have outstanding should go in if the match the existing semantics
- 1. We should then [task force?] figure out what desired semantics should be
Amit: Towards that last point, what do we actually want for flexibility for drivers; what's the ideal interface?
Alistair: Drivers, or for handling interrupts?
Amit: Well, the interrupt logic is in service of what drivers want to do. My understanding is that the current interface doesn't give enough flexibility to drivers. e.g. on ARM a driver cannot mask themselves off on the NVIC
Alistair: Think drivers shouldn't have affect on the global interrupts -- they can stop themselves from generating interrupts, but shouldn't suppress interrupts. But a board should have the ability. e.g. board can disable USB interrupts b/c the hardware is broken
Alistair: The problem with the current PR is that the timer core is making assumptions about interrupt state (e.g. always turn on timer) rather than checking what was there before
Phil: Q, are you saying that drivers should not be able to clear out PLIC or NVIC, even for themselves?
Alistair: Yes, I think so
Phil: What if I have a driver with multiple things that can cause interrupts (e.g. 7 gpio pins), how do I make an atomic change to behavior .. eg if this is across multiple registers?
Alistair: Is it any different if thery're across multiple registers on the PLIC?
Phil: Typically you would globally disable interrupts; trying to think of cases wehere interrupts are spread across multiple registers, but can't really think of any
Amit: e.g. sam4l where each GPIO bank has different NVIC entry – but think there is still one register for interrupts
Alistair: but whether devices can or cannot clear interrupts, we still shouldn't be forcefully enabling things in the interupt handler
Phil: Yeah, agreed. The question is whether the 'driver no access' works always; 90% of the time, probably; but there are cases where you need to disable in order to be able to do atomic operations
Leon: If it's only atomic operations, wouldn't it make sense to have a general atomic closure?
Pat: We do have this, but there's inherent unsafety here to gloablly disabling interrupts
Leon: Yes, but this is a rare use case, it's already unsafe; likely within the scope of auditable
Amit: I'm not convinced that doing an atomic CPU operation would give the semantics that you wanted. e.g. I believe that on ARM it prevents ISRs from running. It does not, however, prevent pending bits from being set. So if the semantics that I care about is I get interrupts for all or none of the events that I'm messing with, then disabling interrupts doesn't help
Amit: Conversely, the ISR running has some performance overhead, but likely not a funcitonal issue
Amit: Could check in ISR if things have associated interrupt; only interrupts execution temporarily
Phil: I think the way to think about this is, have complex periph trying to set up operations for, and while this complex setup is happening, don't want ISRs to run. Given Tock's concurrency model (i.e. push everything to bottom-halfs) it could be that this already just goes away
Amit: Yes. Though it would be good to cover this expliclty
Phil: Yes.. because of Tock's sync model, many of the typical problems are elided, but would put $ down that there are still problems hidden somewhere here
Phil: Ran into this on H1b
Amit: The more dramatic option would be moving the whole kernel to the top half
Amit: We only use this top/bottom mask model to allow pending bits to sit there
Phil: Just realized one use case: Have an on-chip peripheral, with multiple interrupts, during power-down either clear all config or just turn off interrupts
Amit: Practically speaking, is that a way that chips behave?
Amit: e.g. on SAM4L interrupts are disabled implicity if peripheral clocks are disabled
Phil: Yeah, again this is a 90% time works out; but need to support the weird cases
Vadim: How might this interact with future multi-core considerations?
Vadim: Some of our projects are looking to things such as I/O core and compute core, might want to figure out how to adapt Tock to this; know there are a lot of assumptions about single core in Tock now, but this is an opportunity to look ahead
Amit: We've thought about this a few times; it's involved, but we've largely punted on the issue – we should chat more post-call
Phil: What I was hoping for from this discussion is being able to say, "when you disable an interrupt, and you ensured that your handler will not later be invoked"?
Amit: For clarity, let's call NVIC interrupts and ISR the top-half handler
Amit: if you disable an interrupt in your driver, the gaurentee that you have is that hardware will not set an interrupt bit [however: it may have already been set]; this means that the ISR may have run and your handle_interrupt function may be called in the future
Phil: That's usually not that expected semantics, if you disable interrupts, you expect pending to be cleared
Amit: Yes, so today, you have to do the additional work of clearing your own pending bit in the hardware. But today, driver's can't control NVIC directly and can't clear directly; there's one case in upstream STM where clear_pending uses unsafe correclty to do this, but not explored deeply
Amit: Generally, seems it should be safe to allow access to pending and not break the contract that interrupt handling code expects
Phil: Comfortable if that's the preferred approach, want to minmize unsafe of course, but this may be a needed escape hatch
Amit: Current answer is that this is all low-level logic happening in chips crate, so unsafe is accessible. Calling enable/disable will do unexpected things because it'll get re-enabled elsewhere later. Messing with pending will probably do what they want.
Amit: Again, this is descriptive of today's model, not necessarily right
Phil: Looking at the NVIC for cortex-M, instantiation is unsafe, but none of the methods are unsafe
Amit: Right; however I believe that we never pass an NVIC to any driver
Amit: Again, the one exception found is in the STM who made an NVIC on the fly
Amit: Really, that NVIC interface is a holdover from when we were passing NVICs instances around, and maybe should go away

Libtock C Switchover

Phil: New Alarm API in kernel with new associated syscall
Phil: Current driver also allows old syscall
Phil: Would be good to update usersapce
Pat: We do parallel libtock-c releases; so this should be a 1.6 blocker and testing should be atop this new interface
[consensus]

Tock 2.0 syscall ABI

Phil: Primary goal today just to raise awareness
Phil: Long ago decided that it'd be good to switch over to results rather than ReturnCode, hasn't happened yet
Phil: Issue is that in the kernel the simplicity of ReturnCode as a value was useful early on; may be able to just move over to Result
Phil: All of the complication really is translation to formal syscall return values; ideally this goes away with 2.0
Leon: Proposed a transition phase where both are used, is that in there?
Phil: Yeah, this code in there now doesn't change the Driver trait -- this just changes the ABI. Haven't had a chance to look over your proposed soultion yet
Phil: Instinct: transition periods stretch on, easier to do a clean break
Amit: Really an orthogonal question
Phil: Today, uses current driver trait and translates into syscall return values
L/Phil: We'll look at each other's code
Phil: Long-standing Rust wishlist miss: Can't match on enum values, ergonomics a bit worse as a result

USB/CTAP

Brad: Hard to talk specifics w/out Guillaume, but can talk higher-level issues
Brad: Have two different takes on a chunk of the USB stack right now; need to figure out how to harmonize
Alistair: Really don't both, that seems like a bad idea in the long run; constant USB HIL refactoring etc
Alistair: Big thing that I think is important is separate USB HIL
Hudson: Seems like part of the concern is that OpenSK has had this implementaiton for a while; hesitation to swap out working code for something new
Amit: Presumably their version has a working userspace? As it's been there for a while?
Alistair: Yeah, theirs is in one of the SK repos; my PR has one in libtock-rs
Brad: Userspace impl's is a bit of a red herring; shouldn't drive choice here
Brad: Can expose two syscall driver interfaces in the short term
Brad: Don't want different USB stacks underneath those syscall drivers
Amit: That seems reasonable – how do we move forward w/out G? How different are they under the hood?
Alistair: They aren't that different
Alistair: I don't like two syscalls as much, but that's not as big of a problem as two stacks; seems like one stack two syscalls good place to be for now
Amit: Also probably easier to just maintain the syscall layer out-of-tree
Brad: Right. So the core issue seems to be should we have interfaces/traits in USB for CTAP/HID?
Pat: I think this boils down should USB have a "CTAP+HID" (current OpenSK) or USB adds "HID" and then a separate "CTAP" atop that (other PR)
Brad: Q is whether we should add layers of abstraction when there's only one user of each layer [tinyos problem?]
Phil: HILs are hard to change once they are in; seems priority should be sorting what USB HIL looks like
Alistair: That's the big difference; OpenSK version has no HIL
Phil: Having a HIL seems important
Brad: Important to remember that this isn't a HIL that touches hardware -- just a software HIL. Not going to have 15 impls, maybe just 2
Phil: Doesn't mean you can be sloppy about it
Brad: Yes, but also not the same barrier to change as something like the time HIL; more realistic to change
Amit: Maybe closer to some of the interfaces in the networking stack rather than timer HIL
Hudson: Yeah, those networking traits are just in a net/ folder
Amit: Seems the thing to do would be to have a call with Al + G + other salient folks. If the underlying implementations are really that similar, should be able to come to a consensus
Amit: If we have an interface, should be able to support both use cases
Brad: We're having an email discussion already and that's not really resolving – need a dedicated call
Phil: Important result is that this ends in one API
Amit: Userspace or kernel?
Phil: both.
Amit: Think in the short term, converging in the kernel is important; but userspace can be deferred
Amit: I can organize and moderate this
Alistair: I really don't think they are actually that different
Hudson: Yeah, I really think this is talking past each other a bit on github, and a phone will should hopefully resolve this quickly
Hudson: Really important for Tock to support both OpenSK and OpenTitan well
{consensus on these points}
Alistair: Happy to keep updating, just want to make sure work doesn't go to waste
Amit: Sounds good, I will set up call