doc/wg/network/notes/network-notes-2024-03-04.md
Tyler: Good progress! The OpenThread stack works right now: it fully joins a Thread network and remains attached with child update requests. The Wireshark trace for OpenThread on Tock matches OpenThread on bare-metal Nordic boards.
Tyler: As part of that, there was a major fix for a weird bug in 6LoWPAN with fragmented packets. With 15.4 the max packet size is 127 bytes. 6LoWPAN provides a way to fragment a payload across packets and recombine it. As a result, the packets are sent in quick succession. The current design for receiving: the user process provides a buffer with 129 bytes of space. When receive is called, the kernel copies the packet into the user buffer and schedules an upcall. But the kernel maintains control of the buffer until the upcall has been handled. So what was happening is that the second packet arrived and overwrote the first packet before userspace had read it. So the first one wasn't received and all of the packets were dropped. The solution is a larger buffer for queuing packets. So I made a way for the user process to provide a ring buffer instead of just one packet. That's not too much of a change in all honesty, but expect the PR today.
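To illustrate the queuing idea described above, here is a minimal sketch of a fixed-capacity ring of frame slots. The 129-byte slot size comes from the notes; the type name `PacketRing`, its fields, and its methods are hypothetical and are not the layout used in the PR.

```rust
/// Slot size taken from the notes: the process buffer holds 129 bytes per frame.
const FRAME_SLOT: usize = 129;

/// Hypothetical fixed-capacity ring of received frames (not the PR's layout).
struct PacketRing<const N: usize> {
    slots: [[u8; FRAME_SLOT]; N],
    lens: [usize; N],
    head: usize,  // index of the oldest unread frame
    count: usize, // number of unread frames
}

impl<const N: usize> PacketRing<N> {
    fn new() -> Self {
        PacketRing {
            slots: [[0; FRAME_SLOT]; N],
            lens: [0; N],
            head: 0,
            count: 0,
        }
    }

    /// Receive side: queue a frame. Returns false (frame dropped) when the
    /// ring is full or the frame does not fit in a slot.
    fn push(&mut self, frame: &[u8]) -> bool {
        if self.count == N || frame.len() > FRAME_SLOT {
            return false;
        }
        let tail = (self.head + self.count) % N;
        self.slots[tail][..frame.len()].copy_from_slice(frame);
        self.lens[tail] = frame.len();
        self.count += 1;
        true
    }

    /// Consumer side: take the oldest frame, if any.
    fn pop(&mut self) -> Option<Vec<u8>> {
        if self.count == 0 {
            return None;
        }
        let frame = self.slots[self.head][..self.lens[self.head]].to_vec();
        self.head = (self.head + 1) % N;
        self.count -= 1;
        Some(frame)
    }
}
```

With a single slot, a second fragment arriving before the first is read has nowhere to go. With several slots, back-to-back fragments queue up in arrival order until the consumer drains them.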
Leon: This is a very familiar problem to me. The TAP driver on tock-ethernet runs into this problem. We can't be sure when userspace will be scheduled, for how long, and whether it has processed everything. The current design of Tock's upcalls, where capsules don't know whether their upcalls have been received, is intentional, but there's a broader need for a more efficient ring buffer data structure. Way back when we redesigned the userspace, the rule was that either the kernel or userspace has sole ownership of allowed data at any given time. So a "shared" ring buffer is likely non-compliant. It needs to be unallowed before userspace modifies it.
Tyler: I very carefully handle that actually, and think I'm in compliance.
Leon: That's great. I think in the long run we'll want to have an explicit system call for transferring packets through a lock-read data structure.
Tyler: Right now it's a pretty rudimentary data structure. The userspace unallows the buffer before making changes and allows the buffer back after copying data over to a separate userspace location. Do you think we should implement a bigger, long-term fix now?
Leon: Wouldn't want to hold anything up on this. Your PR is probably a good first step. What I want is a general solution that's got stability guarantees. So I wouldn't want to blindly promote a solution for general use.
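The unallow, copy, re-allow sequence described above can be modeled as an ownership handoff. The following is a toy sketch, not the Tock system call API: `Kernel`, `allow`, `unallow`, `on_receive`, and `drain` are hypothetical names, and a `Vec<Vec<u8>>` stands in for the packet queue.

```rust
/// Toy model of the kernel's view of an allowed buffer (all names hypothetical).
struct Kernel {
    /// Some(..) while the kernel owns the queue, None while userspace owns it.
    allowed: Option<Vec<Vec<u8>>>,
}

impl Kernel {
    /// Models "allow": hand a buffer to the kernel and get back whatever
    /// was allowed before, if anything.
    fn allow(&mut self, buf: Vec<Vec<u8>>) -> Option<Vec<Vec<u8>>> {
        self.allowed.replace(buf)
    }

    /// Models "unallow": take the buffer back so the kernel can no longer
    /// write to it.
    fn unallow(&mut self) -> Option<Vec<Vec<u8>>> {
        self.allowed.take()
    }

    /// Receive path in this model: the frame is dropped if no buffer is
    /// currently allowed.
    fn on_receive(&mut self, frame: &[u8]) -> bool {
        match self.allowed.as_mut() {
            Some(queue) => {
                queue.push(frame.to_vec());
                true
            }
            None => false,
        }
    }
}

/// Userspace side: unallow, copy the frames to a separate location,
/// clear the queue, then allow it again.
fn drain(kernel: &mut Kernel, out: &mut Vec<Vec<u8>>) {
    if let Some(mut buf) = kernel.unallow() {
        out.extend(buf.iter().cloned());
        buf.clear();
        kernel.allow(buf);
    }
}
```

At every point exactly one side holds the queue, which is the sole-ownership property discussed above. In this toy model, a frame that arrives while the buffer is unallowed is dropped, so the unallowed window should be kept short.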
Branden: What's left for OpenThread development then?
Tyler: It "works", but only on channel 26 right now. Getting channel switching implemented keeps falling on the priority list because getting sending / receiving to work has been challenging. I need to be able to switch channels, then clean up the PR and submit it. There is some hand-waving around signal-strength indication (currently the RSSI is just hard-coded to -50, and the link-quality indicator is hard-coded as well). This information doesn't make it up to the 15.4 part of the stack. Thread works by having a child issue a parent request and choose the best parent based on the RSSI (so it's important for router selection). In my opinion, the most challenging part is getting the radio packets correctly parsed. It made me really happy to see that Thread works generally.
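To show why the hard-coded RSSI matters for parent selection, here is a deliberately simplified sketch that picks the parent with the strongest RSSI. It is not OpenThread's selection logic, which weighs more than raw RSSI; `ParentCandidate`, its fields, and `best_parent` are hypothetical.

```rust
/// Hypothetical summary of one parent response as seen by the child.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ParentCandidate {
    id: u16,      // made-up identifier for illustration
    rssi_dbm: i8, // received signal strength in dBm
}

/// Pick the candidate with the strongest RSSI. On ties, the earliest
/// response wins.
fn best_parent(candidates: &[ParentCandidate]) -> Option<ParentCandidate> {
    let mut best: Option<ParentCandidate> = None;
    for &c in candidates {
        match best {
            // Keep the current best unless the new one is strictly stronger.
            Some(b) if b.rssi_dbm >= c.rssi_dbm => {}
            _ => best = Some(c),
        }
    }
    best
}
```

If every response reports the same hard-coded -50, all candidates tie and the choice degenerates to response order, so a distant router is as likely to be picked as a nearby one.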
Current architecture: a loop that calls "do thread work", then yields. When a packet arrives or an alarm fires, there's a delay before yielding again, and I didn't know if that would be okay.
Next steps: leave the device running and see whether it crashes or falls off the network.
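The loop structure described above can be sketched as a small simulation. This is not the actual OpenThread port: `Platform`, `yield_once`, `do_thread_work`, and `run` are hypothetical stand-ins, and unlike a real yield, the simulated one returns `None` when no events are left so that the loop terminates.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    PacketReceived,
    AlarmFired,
}

/// Stand-in for the process's pending upcalls (hypothetical).
struct Platform {
    pending: VecDeque<Event>,
}

impl Platform {
    /// Models yield: returns the next pending event, or None once the
    /// simulation has run out of events.
    fn yield_once(&mut self) -> Option<Event> {
        self.pending.pop_front()
    }
}

/// Stand-in for "do thread work": records the event that woke the process.
fn do_thread_work(handled: &mut Vec<Event>, event: Option<Event>) {
    if let Some(e) = event {
        handled.push(e);
    }
}

/// The loop from the notes: do the work, then yield until the next event.
fn run(platform: &mut Platform) -> Vec<Event> {
    let mut handled = Vec::new();
    let mut last = None;
    loop {
        do_thread_work(&mut handled, last);
        match platform.yield_once() {
            Some(e) => last = Some(e),
            None => return handled,
        }
    }
}
```

Events that arrive while `do_thread_work` is running stay queued until the next yield, which is the delay mentioned above.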