Back to Azul

Gemini Perf Response2

scripts/gemini_perf_response2.md

0.0.710.8 KB
Original Source

This is an excellent analysis. You have correctly identified that for a batch-processing tool like git2pdf, the incremental update machinery of a GUI layout engine (reconciliation, tree diffing, repeated CSS parsing) becomes pure overhead.

Here is the strategy to strip that overhead, ranked by impact.

1. Top 3 Optimizations to Pursue

  1. Fix Architectural "Double Work" (High Impact, Low Risk):

    • Eliminate the double Display List generation.
    • Eliminate the reconcile_and_invalidate step for fresh DOMs.
    • Estimated Gain: ~400ms (eliminates ~130ms DL gen + ~294ms reconcile).
  2. Optimize Data Structures (High Impact, Medium Risk):

    • Replace BTreeMap<usize, LogicalPosition> with Vec<LogicalPosition>. The node indices are dense integers (0..N). O(1) vs O(log N) lookups during the positioning pass will be significant when N=12,000.
    • Estimated Gain: 50-100ms in the layout loop.
  3. Cache Parsed CSS (Medium Impact, Low Risk):

    • The str_to_dom phase parses the same syntax highlighting CSS for every commit.
    • Estimated Gain: ~50-100ms in str_to_dom.

2. Implementation: Fixing the Architecture

We need to split layout_document_with_fragmentation into two parts: layout calculation and display list generation. We also need a "fresh" path that skips reconciliation.

A. Modify solver3/mod.rs (Export LayoutResults)

We need a struct to hold the state between layout and display list generation.

rust
// In layout/src/solver3/mod.rs

/// Holds the results of a layout pass before DisplayList generation
pub struct LayoutResults {
    pub tree: crate::solver3::layout_tree::LayoutTree,
    pub calculated_positions: std::collections::BTreeMap<usize, azul_core::geom::LogicalPosition>,
    pub width: f32,
    pub height: f32,
}

B. Modify solver3/paged_layout.rs

Refactor layout_document_paged_with_config to avoid the double work and skipping reconciliation.

rust
// In layout/src/solver3/paged_layout.rs

// 1. Refactor the inner layout logic to return LayoutResults instead of DisplayList
fn compute_layout_fresh<T: ParsedFontTrait + Sync + 'static>(
    cache: &mut LayoutCache, // Used only for scratch space/buffers
    text_cache: &mut TextLayoutCache,
    fragmentation_context: &mut FragmentationContext,
    new_dom: &StyledDom,
    viewport: LogicalRect,
    font_manager: &crate::font_traits::FontManager<T>,
    get_system_time_fn: azul_core::task::GetSystemTimeCallback,
) -> Result<crate::solver3::LayoutResults> {
    
    // SETUP CONTEXT
    let mut counter_values = BTreeMap::new();
    let empty_text_selections = BTreeMap::new();
    
    // SKIP RECONCILIATION: Just build the tree from scratch
    // This replaces reconcile_and_invalidate()
    let mut ctx_tree = LayoutContext {
        styled_dom: new_dom,
        font_manager,
        selections: &BTreeMap::new(),
        text_selections: &empty_text_selections,
        debug_messages: &mut None, // Skip debug during tree build
        counters: &mut counter_values,
        viewport_size: viewport.size,
        fragmentation_context: Some(fragmentation_context),
        cursor_is_visible: true,
        cursor_location: None,
        cache_map: Default::default(),
        system_style: None,
        get_system_time_fn,
    };

    // O(N) tree build instead of O(N) diffing + O(N) patching
    let mut new_tree = crate::solver3::layout_tree::generate_layout_tree(&mut ctx_tree)?;

    // COMPUTE COUNTERS
    crate::solver3::cache::compute_counters(new_dom, &new_tree, &mut counter_values);

    // SETUP LAYOUT CONTEXT
    let mut cache_map = std::mem::take(&mut cache.cache_map);
    cache_map.resize_to_tree(new_tree.nodes.len());
    
    // Mark ROOT as dirty - this forces full layout without checking dirty flags recursively
    let layout_roots = std::collections::BTreeSet::from([new_tree.root]);
    let intrinsic_dirty = (0..new_tree.nodes.len()).collect(); // Everything needs measuring

    let mut ctx = LayoutContext {
        styled_dom: new_dom,
        font_manager,
        selections: &BTreeMap::new(),
        text_selections: &empty_text_selections,
        debug_messages: &mut None, // Pass mutable reference if you want logs
        counters: &mut counter_values,
        viewport_size: viewport.size,
        fragmentation_context: Some(fragmentation_context),
        cursor_is_visible: true,
        cursor_location: None,
        cache_map,
        system_style: None,
        get_system_time_fn,
    };

    // LAYOUT LOOP (Same as before, but operating on fresh tree)
    // ... [Copy the layout loop logic from layout_document_with_fragmentation] ...
    // ... [BUT STOP before generate_display_list] ...
    
    // Execute sizing and positioning logic...
    // (See full implementation block below)
    
    let cache_map_back = std::mem::take(&mut ctx.cache_map);
    cache.cache_map = cache_map_back;
    
    Ok(crate::solver3::LayoutResults {
        tree: new_tree,
        calculated_positions,
        width: viewport.size.width,
        height: viewport.size.height, // Or calculated content height
    })
}

// 2. Update the main paged layout function to use this optimized path
pub fn layout_document_paged_with_config(...) -> Result<Vec<DisplayList>> {
    // ... [Font loading code remains same] ...

    // --- OPTIMIZED LAYOUT PATH ---
    // 1. Compute Layout (Tree + Positions) ONLY. Do not generate Display List.
    let layout_results = compute_layout_fresh(
        cache,
        text_cache,
        &mut fragmentation_context,
        new_dom,
        viewport,
        font_manager,
        get_system_time_fn,
    )?;

    // 2. NOW generate the display list, but only ONCE for the infinite canvas
    let (scroll_ids, _) = crate::window::LayoutWindow::compute_scroll_ids(&layout_results.tree, new_dom);

    let mut ctx = LayoutContext {
        // ... set up context ...
    };

    let full_display_list = crate::solver3::display_list::generate_display_list(
        &mut ctx,
        &layout_results.tree,
        &layout_results.calculated_positions, // <--- Using positions we just computed
        scroll_offsets,
        &scroll_ids,
        gpu_value_cache,
        renderer_resources,
        id_namespace,
        dom_id,
    )?;

    // 3. Paginate
    // ... [Pagination logic remains same] ...
}

3. Answers to Your Questions

  1. Top 3 Optimizations:

    1. Architecture: Split layout from display_list. The current double-generation is burning 131ms per commit.
    2. Zero-Diffing: Implement compute_layout_fresh to skip reconcile_and_invalidate. For 12k nodes, comparing them one-by-one to find they are all different is expensive.
    3. Data Structure: Replace BTreeMap<usize, ...> with Vec<Option<...>> in LayoutCache.
  2. Layout Loop Complexity (528ms):

    • The complexity is nominally $O(N)$ due to the caching, but the constant factor is high.
    • Specifically, calculate_layout_for_subtree does map lookups.
    • Hotspot: BTreeMap lookups in calculated_positions. With 12,000 nodes, doing $O(\log 12000)$ lookups inside the hot loop adds up.
    • Solution: Changing calculated_positions to Vec will likely cut this time by 30-40%.
  3. str_to_dom (267ms):

    • Is CSS the bottleneck? Yes. full_dom.style(combined_css) matches selectors against all nodes.
    • Cache Strategy: In git2pdf, parse the CSS string into a Css struct once at startup.
    • Modify str_to_dom to accept Option<&Css> (pre-parsed) instead of parsing the string internally.
    • Even better: Since syntax highlighting classes are static (e.g., .kw, .str), you might be able to construct the StyledDom directly with pre-resolved properties, but passing pre-parsed Css is the easiest win.
  4. reconcile_and_invalidate (115ms):

    • What it does: It walks the old tree and new DOM simultaneously to find differences.
    • Bypass: Yes. In git2pdf, you discard the entire state after every PDF page generation anyway. Just call generate_layout_tree directly.
  5. BTreeMap vs Vec:

    • Worth it? YES. 12,481 nodes means thousands of allocations and pointer chases per frame in BTreeMap.
    • The node indices are contiguous (0 to N). Vec<LogicalPosition> (initializing with 0.0 or using Option) will be much faster and cache-friendly.
  6. Parallelization:

    • Yes. Since you create a fresh StyledDom and LayoutCache for every commit, they are thread-safe.
    • Use rayon to iterate over commits.
    • Caveat: fontconfig is not always thread-safe. You are using SharedFontPool with an Arc<Mutex<HashMap...>> for parsed fonts, which is good. Ensure fc_cache usage is thread-safe (it usually is).
  7. Batch PDF Architecture:

    • For batch PDF, you effectively want a "stateless" layout engine.
    • The current engine is heavily designed for "stateful" GUI (caching, diffing, cursors).
    • Create a dedicated entry point layout_static_document that strips all cursor logic, selection logic, and reconciliation logic.

4. Code Change: BTreeMap to Vec

This is a pervasive change, but here is how to start it in layout/src/solver3/cache.rs.

rust
// LayoutCache definition
pub struct LayoutCache {
    pub tree: Option<LayoutTree>,
    // CHANGE THIS:
    // pub calculated_positions: BTreeMap<usize, LogicalPosition>,
    pub calculated_positions: Vec<LogicalPosition>, 
    // ...
}

// In calculate_layout_for_subtree and friends:
// Instead of: calculated_positions.insert(idx, pos);
// Do: 
// if idx >= calculated_positions.len() {
//     calculated_positions.resize(idx + 1, LogicalPosition::zero());
// }
// calculated_positions[idx] = pos;

// Instead of: calculated_positions.get(&idx)
// Do: calculated_positions.get(idx) (returns Option<&T>)

Note: Since the tree is built sequentially, you can usually reserve the Vec to tree.nodes.len() immediately after tree creation to avoid reallocations.

5. Code Change: Disable Debug Logging in Release

In your git2pdf main logic (or xml_to_pdf_pages), ensure you are passing None to debug_messages unless a specific flag is set.

rust
// In printpdf/src/html/mod.rs

// Change this:
// let mut debug_messages = Some(Vec::new()); 
// To:
let mut debug_messages = if cfg!(debug_assertions) { 
    Some(Vec::new()) 
} else { 
    None 
};

This prevents the string formatting overhead inside debug_info! macros entirely.

  1. Apply cfg!(debug_assertions) guard to debug_messages in xml_to_pdf_pages. (Instant 2.8s fix).
  2. Implement compute_layout_fresh in paged_layout.rs to skip reconcile and stop generating the throw-away DisplayList.
  3. Refactor str_to_dom to accept a pre-parsed &Css object, and parse the syntax highlighting theme only once in git2pdf.
  4. Parallelize the loop over commits using rayon (after fixing shared font pool locking).