lib/libesp32/berry/DEEP_REPOSITORY_ANALYSIS.md
Berry is a sophisticated embedded scripting language with a register-based virtual machine, one-pass compiler, and mark-sweep garbage collector. The architecture prioritizes memory efficiency and performance for embedded systems while maintaining full dynamic language capabilities.
(be_vm.h, be_vm.c)
struct bvm {
bglobaldesc gbldesc; // Global variable management
bvalue *stack; // Register stack (not call stack!)
bvalue *stacktop; // Stack boundary
bupval *upvalist; // Open upvalue chain for closures
bstack callstack; // Function call frames
bstack exceptstack; // Exception handling stack
bcallframe *cf; // Current call frame
bvalue *reg; // Current function base register
bvalue *top; // Current function top register
binstruction *ip; // Instruction pointer
struct bgc gc; // Garbage collector state
// ... performance counters, hooks, etc.
};
Key Architectural Decisions:
- bvalue structure with type tagging
(be_object.h)
typedef struct bvalue {
union bvaldata v; // Value data (int, real, pointer, etc.)
int type; // Type tag (BE_INT, BE_STRING, etc.)
} bvalue;
Type Hierarchy:
Basic Types (not GC'd):
├── BE_NIL (0) - null value
├── BE_INT (1) - integer numbers
├── BE_REAL (2) - floating point
├── BE_BOOL (3) - boolean values
├── BE_COMPTR (4) - common pointer
└── BE_FUNCTION (6) - function reference
GC Objects (BE_GCOBJECT = 16):
├── BE_STRING (16) - string objects
├── BE_CLASS (17) - class definitions
├── BE_INSTANCE (18) - class instances
├── BE_PROTO (19) - function prototypes
├── BE_LIST (20) - dynamic arrays
├── BE_MAP (21) - hash tables
├── BE_MODULE (22) - module objects
└── BE_COMOBJ (23) - common objects
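Because every collectable type in the hierarchy above is numbered at or above BE_GCOBJECT, a single comparison on the tag is enough to tell heap objects from plain values. The following is a minimal sketch of that idea, not Berry's code: `value_t`, the `T_*` constants, and the helper functions are hypothetical stand-ins that only mirror the numbering listed above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Tags copied from the hierarchy above; the T_ prefix is hypothetical. */
enum {
    T_NIL = 0, T_INT = 1, T_REAL = 2, T_BOOL = 3,
    T_GCOBJECT = 16,              /* first tag that denotes a heap object */
    T_STRING = 16, T_LIST = 20, T_MAP = 21
};

/* Simplified tagged value: a payload union plus an integer type tag. */
typedef struct {
    union {
        long   i;   /* integer payload */
        double r;   /* real payload */
        bool   b;   /* boolean payload */
        void  *p;   /* pointer to a heap object */
    } v;
    int type;
} value_t;

/* Build an integer value. */
value_t value_int(long i) {
    value_t x;
    x.v.i = i;
    x.type = T_INT;
    return x;
}

/* Build a value that refers to a heap object with the given tag. */
value_t value_obj(int tag, void *p) {
    value_t x;
    x.v.p = p;
    x.type = tag;
    return x;
}

/* One comparison separates collectable values from plain ones. */
bool value_is_gc(const value_t *x) {
    return x->type >= T_GCOBJECT;
}
```

With this layout a collector can skip integers, reals, and booleans without inspecting their payload, and only follow values for which `value_is_gc` is true.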
Performance Optimization:
(be_lexer.c, be_lexer.h)
Token Processing Pipeline:
Source Code → Lexer → Token Stream → Parser → Code Generator → Bytecode
Key Features:
(be_parser.c, be_parser.h)
Expression Descriptor System:
typedef struct {
    union {
        struct { /* for suffix expressions */
            unsigned int idx:9; // RK index (register/constant)
            unsigned int obj:9; // object RK index
            unsigned int tt:5;  // object type
        } ss;
        breal r;    // for real constants
        bint i;     // for integer constants
        bstring *s; // for string constants
        int idx;    // variable index
    } v;
    int t, f;   // true/false jump patch lists
    bbyte not;  // logical NOT flag
    bbyte type; // expression type (ETLOCAL, ETGLOBAL, etc.)
} bexpdesc;
Expression Types:
- ETLOCAL: Local variables (register-allocated)
- ETGLOBAL: Global variables (by index)
- ETUPVAL: Upvalues (closure variables)
- ETMEMBER: Object member access (obj.member)
- ETINDEX: Array/map indexing (obj[key])
- ETREG: Temporary registers
(be_code.c)
Bytecode Instruction Format:
32-bit instruction = [6-bit opcode][26-bit parameters]
Parameter formats:
- A: 8-bit register index
- B, C: 9-bit register/constant (RK) indices
- Bx: 18-bit constant index
- sBx: 18-bit signed offset (jumps)
Register Allocation Strategy:
(be_gc.c, be_gc.h)
Mark-Sweep Algorithm:
struct bgc {
bgcobject *list; // All GC objects
bgcobject *gray; // Gray objects (mark phase)
bgcobject *fixed; // Fixed objects (never collected)
struct gc16_t* pool16; // Small object pool (≤16 bytes)
struct gc32_t* pool32; // Medium object pool (17-32 bytes)
size_t usage; // Current memory usage
size_t threshold; // GC trigger threshold
bbyte steprate; // Threshold growth rate
bbyte status; // GC state
};
GC Object Header:
#define bcommon_header \
    struct bgcobject *next; /* Linked list pointer */ \
    bbyte type;             /* Object type */ \
    bbyte marked            /* GC mark bits */
Tri-color Marking:
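As an illustration of the general tri-color technique, here is a minimal sketch. It assumes a simplified object with at most two outgoing references; `obj`, `gc_state`, and the `gc_*` functions are hypothetical names, and a real collector would free swept objects instead of only unlinking them.

```c
#include <stddef.h>

enum { WHITE, GRAY, BLACK };   /* unvisited, queued for scanning, fully scanned */

typedef struct obj {
    struct obj *next;          /* link in the list of all objects */
    struct obj *gray_next;     /* link in the gray work list */
    struct obj *child[2];      /* outgoing references (may be NULL) */
    unsigned char marked;      /* current color */
} obj;

typedef struct {
    obj *list;                 /* all objects */
    obj *gray;                 /* reached but not yet scanned */
} gc_state;

/* Turn a white object gray and queue it; other colors are left alone. */
void gc_mark_gray(gc_state *gc, obj *o) {
    if (o != NULL && o->marked == WHITE) {
        o->marked = GRAY;
        o->gray_next = gc->gray;
        gc->gray = o;
    }
}

/* Mark phase: blacken everything reachable from root. */
void gc_mark(gc_state *gc, obj *root) {
    gc_mark_gray(gc, root);
    while (gc->gray != NULL) {
        obj *o = gc->gray;
        gc->gray = o->gray_next;
        o->marked = BLACK;
        for (int i = 0; i < 2; ++i) {
            gc_mark_gray(gc, o->child[i]);
        }
    }
}

/* Sweep phase: unlink white objects and reset survivors to white.
 * Returns the number of objects unlinked. */
size_t gc_sweep(gc_state *gc) {
    size_t freed = 0;
    obj **link = &gc->list;
    while (*link != NULL) {
        obj *o = *link;
        if (o->marked == WHITE) {
            *link = o->next;   /* a real collector would release o here */
            ++freed;
        } else {
            o->marked = WHITE;
            link = &o->next;
        }
    }
    return freed;
}
```

The gray list is what keeps the mark phase iterative: objects are queued when first reached and scanned later, so deep object graphs do not consume C stack through recursion.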
Memory Pools:
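The `pool16` and `pool32` fields of `struct bgc` suggest fixed-size slot pools for small objects. The layout of `gc16_t` is not shown here, so the following is only a generic free-list pool under assumed names (`small_pool`, `pool_init`, `pool_alloc`, `pool_free`) and an arbitrary capacity; it shows why such pools make small allocations cheap.

```c
#include <stddef.h>

#define SLOT_SIZE  16   /* bytes per slot, matching the <=16-byte size class */
#define SLOT_COUNT 8    /* arbitrary capacity chosen for the sketch */

/* A free slot stores the link to the next free slot in its own bytes. */
typedef union slot {
    union slot *next_free;
    unsigned char bytes[SLOT_SIZE];
} slot;

typedef struct {
    slot slots[SLOT_COUNT];
    slot *free_list;
} small_pool;

/* Thread every slot onto the free list. */
void pool_init(small_pool *p) {
    p->free_list = NULL;
    for (size_t i = 0; i < SLOT_COUNT; ++i) {
        p->slots[i].next_free = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* Pop a slot in constant time; NULL means the pool is exhausted. */
void *pool_alloc(small_pool *p) {
    slot *s = p->free_list;
    if (s != NULL) {
        p->free_list = s->next_free;
    }
    return s;
}

/* Push a slot back in constant time. */
void pool_free(small_pool *p, void *ptr) {
    slot *s = ptr;
    s->next_free = p->free_list;
    p->free_list = s;
}
```

Allocation and release are a pointer pop and push with no per-object header, which is the usual motivation for pooling small objects on memory-constrained targets.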
(be_string.c, be_string.h)
String Interning System:
struct bstringtable {
bstring **table; // Hash table of interned strings
int size; // Table size
int count; // Number of strings
};
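Interning means that equal strings share one object, so equality checks reduce to a pointer comparison. A minimal sketch of such a table follows. The names `istr`, `strtable`, `strtable_init`, and `str_intern` are hypothetical, and FNV-1a is used as the hash purely for illustration; the sketch does not resize the table or release entries.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct istr {
    struct istr *next;   /* collision chain within one bucket */
    uint32_t hash;
    char *chars;         /* NUL-terminated copy of the text */
} istr;

typedef struct {
    istr **table;        /* hash table of interned strings */
    int size;            /* number of buckets */
    int count;           /* number of strings stored */
} strtable;

/* FNV-1a over the bytes of s. */
uint32_t str_hash(const char *s, size_t len) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; ++i) {
        h = (h ^ (unsigned char)s[i]) * 16777619u;
    }
    return h;
}

/* Returns 0 on success, -1 if the bucket array cannot be allocated. */
int strtable_init(strtable *t, int size) {
    t->table = calloc((size_t)size, sizeof *t->table);
    t->size = size;
    t->count = 0;
    return t->table != NULL ? 0 : -1;
}

/* Return the unique entry for s, creating it on first use. */
const istr *str_intern(strtable *t, const char *s) {
    size_t len = strlen(s);
    uint32_t h = str_hash(s, len);
    istr **slot = &t->table[h % (uint32_t)t->size];
    for (istr *e = *slot; e != NULL; e = e->next) {
        if (e->hash == h && strcmp(e->chars, s) == 0) {
            return e;    /* already interned */
        }
    }
    istr *e = malloc(sizeof *e);
    if (e == NULL) {
        return NULL;
    }
    e->chars = malloc(len + 1);
    if (e->chars == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->chars, s, len + 1);
    e->hash = h;
    e->next = *slot;
    *slot = e;
    t->count++;
    return e;
}
```

Interning the same text twice returns the same pointer, which is what lets identifier and member-name lookups compare addresses instead of characters.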
String Types:
(be_jsonlib.c) - SECURITY CRITICAL
Recent Security Enhancements:
// Safe Unicode string length calculation
static size_t json_strlen_safe(const char *str, size_t len) {
    size_t result = 0;
    const char *end = str + len;
    while (str < end) {
        if (*str == '\\' && str + 1 < end) {
            if (str[1] == 'u') {
                // Unicode escape: \uXXXX → 1-3 UTF-8 bytes
                result += 3; // Conservative allocation
                str += 6;    // Skip \uXXXX
            } else {
                result += 1; // Other escapes → 1 byte
                str += 2;    // Skip \X
            }
        } else {
            result += 1;
            str += 1;
        }
    }
    return result;
}
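The conservative figure of 3 bytes per `\uXXXX` escape follows from UTF-8 itself: four hex digits can name at most U+FFFF, and any code point up to U+FFFF encodes in one to three bytes. The sketch below makes that bound concrete. `utf8_encode_bmp` is a hypothetical helper written for this illustration, not a function from be_jsonlib.c, and it does not treat surrogate values specially.

```c
#include <stddef.h>

/* Encode a code point in U+0000..U+FFFF as UTF-8.
 * Writes 1 to 3 bytes into out and returns the number written. */
size_t utf8_encode_bmp(unsigned cp, unsigned char out[3]) {
    cp &= 0xFFFFu;                                  /* four hex digits at most */
    if (cp < 0x80u) {
        out[0] = (unsigned char)cp;                 /* 0xxxxxxx */
        return 1;
    }
    if (cp < 0x800u) {
        out[0] = (unsigned char)(0xC0u | (cp >> 6));            /* 110xxxxx */
        out[1] = (unsigned char)(0x80u | (cp & 0x3Fu));         /* 10xxxxxx */
        return 2;
    }
    out[0] = (unsigned char)(0xE0u | (cp >> 12));               /* 1110xxxx */
    out[1] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));      /* 10xxxxxx */
    out[2] = (unsigned char)(0x80u | (cp & 0x3Fu));             /* 10xxxxxx */
    return 3;
}
```

Since the return value never exceeds 3, reserving 3 bytes per escape can over-allocate but cannot under-allocate the decoded buffer.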
Security Features:
Function Registration:
typedef int (*bntvfunc)(bvm *vm);
// Native function descriptor
typedef struct {
const char *name;
bntvfunc function;
} bnfuncinfo;
Calling Convention:
- be_return() or be_returnvalue()
(be_func.c)
Upvalue Management:
typedef struct bupval {
    bcommon_header;
    bvalue *value; // Points to stack slot or own storage
    union {
        bvalue val;          // Closed upvalue storage
        struct bupval *next; // Open upvalue chain
    } u;
} bupval;
Closure Lifecycle:
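The `value` pointer in `bupval` is what lets a captured variable outlive its stack frame: while the frame is live the upvalue is open and points at the stack slot, and when the frame exits the value is copied into the upvalue's own storage. A minimal sketch of that closing step follows, assuming simplified types and an open list ordered from the highest stack slot down; `cell_t`, `upval`, and `upval_close` are hypothetical names.

```c
#include <stddef.h>

typedef struct { int type; long i; } cell_t;   /* simplified stack value */

typedef struct upval {
    cell_t *value;             /* stack slot while open, &u.closed once closed */
    union {
        cell_t closed;         /* own storage after closing */
        struct upval *next;    /* chain link while open */
    } u;
} upval;

/* Close every open upvalue that refers to a slot at or above level.
 * Returns the head of the list of upvalues that remain open. */
upval *upval_close(upval *open, cell_t *level) {
    while (open != NULL && open->value >= level) {
        upval *next = open->u.next;    /* read the link before the union is reused */
        open->u.closed = *open->value; /* copy the value out of the stack */
        open->value = &open->u.closed; /* redirect to the private copy */
        open = next;
    }
    return open;
}
```

After closing, later writes to the old stack slot no longer affect the captured value, while closures sharing the same upvalue still see each other's updates through the single `value` pointer.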
(be_class.c)
Class Structure:
typedef struct bclass {
bcommon_header;
bstring *name; // Class name
bclass *super; // Superclass (single inheritance)
bmap *members; // Instance methods and variables
bmap *nvar; // Native variables
// ... method tables, constructors, etc.
} bclass;
Method Resolution:
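With single inheritance through the `super` pointer, a member lookup can be pictured as a walk up the class chain that stops at the first match. The sketch below assumes that model and uses a plain array in place of the `bmap` member table; `klass`, `member_t`, and `class_find` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    int value;                     /* stands in for a method or variable slot */
} member_t;

typedef struct klass {
    const char *name;
    const struct klass *super;     /* NULL for a root class */
    const member_t *members;
    size_t nmembers;
} klass;

/* Search the class, then each superclass in turn; NULL if nothing matches. */
const member_t *class_find(const klass *c, const char *name) {
    for (; c != NULL; c = c->super) {
        for (size_t i = 0; i < c->nmembers; ++i) {
            if (strcmp(c->members[i].name, name) == 0) {
                return &c->members[i];
            }
        }
    }
    return NULL;
}
```

Because the subclass is searched first, a member defined there shadows the inherited one, which is how overriding falls out of the lookup order.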
(be_module.c)
Module Loading Pipeline:
Module Name → Path Resolution → File Loading → Compilation → Caching → Execution
Module Types:
- .be files compiled to bytecode
- .bec files
- Shared libraries (.so, .dll)
Comparison with Stack-Based VMs:
Stack-based (Python):    Register-based (Berry):
LOAD_FAST 0              # Already in register
LOAD_FAST 1              ADD R0, R1, R2
BINARY_ADD               # Single instruction
STORE_FAST 2
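The single-instruction form can be made concrete with a toy dispatch loop. This is a sketch under assumptions: instructions are stored as plain structs and not as packed 32-bit words, the first operand is taken to be the destination register, and `ins`, `opcode`, and `run` are hypothetical names.

```c
#include <stddef.h>

typedef enum { OP_ADD, OP_RET } opcode;

/* Three-address instruction: op a, b, c */
typedef struct {
    opcode op;
    unsigned char a, b, c;
} ins;

/* Execute until OP_RET and return the value of its register a. */
long run(const ins *code, long *reg) {
    for (const ins *ip = code; ; ++ip) {
        switch (ip->op) {
        case OP_ADD:
            /* operands are read from and written to registers directly */
            reg[ip->a] = reg[ip->b] + reg[ip->c];
            break;
        case OP_RET:
            return reg[ip->a];
        }
    }
}
```

For example, with `reg[1] = 2` and `reg[2] = 3`, the program `{OP_ADD, 0, 1, 2}` followed by `{OP_RET, 0, 0, 0}` leaves 5 in `reg[0]` after one arithmetic dispatch, where the stack-based sequence needs four.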
Advantages:
Compile-time Optimizations:
- 2 + 3 → 5 at compile time
Small Object Optimization:
Buffer Overflow Protection:
Integer Overflow Protection:
Resource Limits:
API Restrictions:
Test Categories:
Unit Tests (51 total):
├── Language Features (15 tests)
│ ├── assignment.be, bool.be, class.be
│ ├── closure.be, function.be, for.be
│ └── vararg.be, cond_expr.be, exceptions.be
├── Data Types (12 tests)
│ ├── list.be, map.be, range.be
│ ├── string.be, int.be, bytes.be
│ └── int64.be, bytes_fixed.be, bytes_b64.be
├── Libraries (8 tests)
│ ├── json.be (9168 lines - comprehensive security tests)
│ ├── math.be, os.be, debug.be
│ └── introspect.be, re.be, time.be
├── Parser/Compiler (6 tests)
│ ├── parser.be, lexer.be, compiler.be
│ └── suffix.be, lexergc.be, reference.be
└── Advanced Features (10 tests)
├── virtual_methods.be, super_auto.be
├── class_static.be, division_by_zero.be
└── compound.be, member_indirect.be
JSON Security Test Suite (10 functions):
Configuration System (berry_conf.h):
// Memory configuration
#define BE_STACK_TOTAL_MAX 2000 // Maximum stack size
#define BE_STACK_FREE_MIN 20 // Minimum free stack
// Feature toggles
#define BE_USE_PERF_COUNTERS 0 // Performance monitoring
#define BE_USE_DEBUG_GC 0 // GC debugging
#define BE_USE_SCRIPT_COMPILER 1 // Include compiler
// Integer type selection
#define BE_INTGER_TYPE 1 // 0=int, 1=long, 2=long long
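A setting such as the integer type selection above is typically consumed by a conditional typedef, so that the integer type, its limits, and its printf conversion change together. The sketch below shows that pattern with hypothetical `SKETCH_*` names; it is not a copy of Berry's headers.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the integer type setting: 0=int, 1=long, 2=long long */
#ifndef SKETCH_INT_TYPE
#define SKETCH_INT_TYPE 1
#endif

#if SKETCH_INT_TYPE == 0
typedef int sketch_int;
#define SKETCH_INT_MAX INT_MAX
#define SKETCH_INT_FMT "%d"
#elif SKETCH_INT_TYPE == 1
typedef long sketch_int;
#define SKETCH_INT_MAX LONG_MAX
#define SKETCH_INT_FMT "%ld"
#elif SKETCH_INT_TYPE == 2
typedef long long sketch_int;
#define SKETCH_INT_MAX LLONG_MAX
#define SKETCH_INT_FMT "%lld"
#else
#error "SKETCH_INT_TYPE must be 0, 1, or 2"
#endif

/* Format a script integer with the conversion that matches its width. */
int sketch_int_format(char *buf, size_t size, sketch_int v) {
    return snprintf(buf, size, SKETCH_INT_FMT, v);
}
```

Keeping the typedef and the format string under the same switch avoids the mismatch where a wider integer is printed with a narrower conversion.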
Platform Abstraction (be_port.c):
(tools/coc/)
Compile-on-Command System:
- .be files to C arrays
Interpreter Core:
Benchmark Characteristics:
Compilation Performance:
(be_api.c)
API Categories:
// VM lifecycle
bvm* be_vm_new(void);
void be_vm_delete(bvm *vm);
// Script execution
int be_loadstring(bvm *vm, const char *str);
int be_pcall(bvm *vm, int argc);
// Stack manipulation
void be_pushnil(bvm *vm);
void be_pushint(bvm *vm, bint value);
bint be_toint(bvm *vm, int index);
// Native function registration
void be_regfunc(bvm *vm, const char *name, bntvfunc f);
Module Registration Pattern:
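/* Native function: doubles its integer argument, returns nil otherwise */
static int my_function(bvm *vm) {
    int argc = be_top(vm);              /* number of arguments passed */
    if (argc >= 1 && be_isint(vm, 1)) {
        bint value = be_toint(vm, 1);
        be_pushint(vm, value * 2);
        be_return(vm);                  /* return the value on top of the stack */
    }
    be_return_nil(vm);                  /* missing or non-integer argument */
}

/* Registration table, terminated by a NULL entry */
static const bnfuncinfo functions[] = {
    { "my_function", my_function },
    { NULL, NULL }
};

int be_open_mymodule(bvm *vm) {
    /* Register every function listed in the table */
    for (const bnfuncinfo *f = functions; f->name != NULL; ++f) {
        be_regfunc(vm, f->name, f->function);
    }
    return 0;
}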
JIT Compilation:
Advanced GC:
Enhanced Sandboxing:
Cryptographic Support:
Berry balances a small implementation against the full functionality of a dynamic language. Its register-based VM, one-pass compiler, and integrated mark-sweep garbage collector keep memory use and execution overhead low enough for embedded systems while retaining the flexibility of a dynamic language. The recent security enhancements, particularly in JSON parsing, show attention to production robustness.
The architecture's key strengths are:
This analysis is a starting point for understanding Berry's implementation, from low-level VM details to high-level language features.