# smile-deep
The smile-deep module wraps the PyTorch / LibTorch C++ runtime,
giving Java applications GPU-accelerated deep learning without leaving the JVM.
Packages and top-level classes:

- smile.deep.tensor
- smile.deep.layer
- smile.deep.activation
- smile.deep.Loss
- smile.deep.Optimizer
- smile.deep.Model
- smile.deep.metric
- smile.deep.Dataset
- smile.deep.CUDA
- smile.llm
- smile.vision

## Installation
// build.gradle.kts (consumer module)
dependencies {
implementation("com.github.haifengl:smile-deep:6.x.x")
}
The module ships CPU-only LibTorch by default. For GPU support, make sure the
native CUDA libraries are on the java.library.path and that your Bytedeco
pytorch classifier matches your CUDA version (e.g. linux-x86_64-gpu-cuda12.4).
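Once the natives are in place, a quick runtime check (using the CUDA helpers covered later in this document) confirms that the GPU build was actually picked up:

```java
import smile.deep.CUDA;

// Prints false / 0 when only the CPU-only LibTorch binaries are on the path.
System.out.println("CUDA available: " + CUDA.isAvailable());
System.out.println("CUDA devices:   " + CUDA.deviceCount());
```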
## Package layout

smile.deep
├── tensor/ Tensor class, Index, Device, DeviceType, ScalarType, Layout
├── layer/ Layer interface and all built-in layer implementations
├── activation/ ActivationFunction and ~14 activation modules
├── metric/ Accuracy, Precision, Recall, F1Score, Averaging
├── Loss.java Static factory for all standard loss functions
├── Optimizer.java Static factory for SGD, Adam, AdamW, RMSprop
├── Model.java Abstract base class for trainable models
├── Dataset.java Dataset interface
├── DatasetImpl.java, DataSampler.java, SampleBatch.java
└── CUDA.java GPU info helpers
smile.llm
├── tokenizer/ Tokenizer interface + Tiktoken (BPE) implementation
├── llama/ LLaMA-3 transformer: Llama, Transformer, TransformerBlock,
│ Attention, FeedForward, ModelArgs, Tokenizer (llama-specific)
├── Message.java Immutable dialog message (role + content)
├── Role.java system / user / assistant / ipython
├── ChatCompletion.java Inference result record
├── FinishReason.java stop / length / function_call / content_filter
├── PositionalEncoding.java Sinusoidal (original Transformer) PE
└── RotaryPositionalEncoding.java RoPE (used by LLaMA)
smile.vision
├── transform/ Transform interface, ImageClassification pipeline
├── layer/ Vision-specific blocks: MBConv, FusedMBConv,
│ Conv2dNormActivation, SqueezeExcitation, StochasticDepth
├── EfficientNet.java EfficientNet-V2 architecture + pretrained factory methods
├── VisionModel.java Model subclass coupling a LayerBlock with a Transform
├── ImageDataset.java Folder-per-class dataset with background prefetch
└── ImageNet.java 1000-class ImageNet label/folder arrays + utilities
## Tensors (smile.deep.tensor)

Tensor is the central data structure — a multidimensional array backed by a
native LibTorch tensor. It implements AutoCloseable; always close tensors
(or use a scope) when they are no longer needed to avoid native memory leaks.
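Since Tensor is AutoCloseable, try-with-resources is the simplest way to release native memory deterministically:

```java
try (Tensor x = Tensor.zeros(3, 4)) {
    // use x here; the native buffer is released when the block exits
}
```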
// Zeros / ones
Tensor z = Tensor.zeros(3, 4); // shape [3,4], float32
Tensor o = Tensor.ones(2, 3);
// Random
Tensor r = Tensor.rand(5, 5); // uniform [0,1)
Tensor rn = Tensor.randn(5, 5); // standard normal
// From Java arrays
float[] data = {1f, 2f, 3f, 4f};
Tensor t = Tensor.of(data, 2, 2); // shape [2,2]
long[] ldata = {0L, 1L, 2L};
Tensor li = Tensor.of(ldata, 3); // Int64 tensor
// Arange
Tensor ar = Tensor.arange(0, 10, 1); // [0,1,...,9]
// Eye (identity matrix)
Tensor eye = Tensor.eye(4);
smile.deep.tensor.Index provides Python-style index objects:
Tensor t = Tensor.rand(4, 4);
Tensor col1 = t.get(Index.Colon, Index.of(1)); // all rows, column 1 → shape [4]
Tensor row2 = t.get(Index.of(2)); // row 2 → shape [4]
Tensor sub = t.get(Index.Slice(1, 3)); // rows 1–2 → shape [2, 4]
Tensor last = t.get(Index.Ellipsis, Index.of(3)); // last col via ellipsis
Tensor newDim = t.get(Index.None, Index.of(0)); // insert batch dim → shape [1, 4]
// Index with another tensor
int[] rows = {0, 2};
Tensor rowIdx = Tensor.of(rows, 2);
Tensor subset = t.get(rowIdx); // rows 0 and 2 → shape [2, 4]
Tensor a = Tensor.ones(3);
Tensor b = Tensor.ones(3).mul(2.0);
// Non-mutating (returns new tensor)
Tensor sum = a.add(b);
Tensor diff = a.sub(1.0f); // sub(float) or sub(double) — non-mutating
Tensor prod = a.mul(3.0);
Tensor quot = a.div(2.0);
// In-place (trailing underscore — returns 'this')
a.add_(1.0);
a.sub_(0.5f); // sub_(float) — mutates in place
a.mul_(2.0);
a.exp_(); // e^x in place
a.fill_(0.0f);
// Reduction
double s = a.sum().doubleValue();
Tensor argmax = a.argmax(0, false); // index of max along dim 0
Tensor topk2 = a.topk(2, 0, true, true).get0(); // top-2 values
// Shape utilities
long[] shape = a.shape();
int rank = a.dim();
long rows = a.size(0);
Tensor flat = a.view(-1);
Tensor t2d = flat.reshape(3, 1);
Tensor tr = t2d.t(); // transpose
Tensor contig = tr.contiguous(); // force contiguous memory layout
// Type casting
Tensor fp16 = a.to(ScalarType.Float16);
Tensor onCuda = a.to(new Device(DeviceType.CUDA, 0));
Use AutoScope to batch-free many tensors at once:
try (var scope = new smile.util.AutoScope()) {
Tensor.push(scope);
// ... all tensors created here are tracked
Tensor result = computeSomething();
result.retain(); // keep this one after scope exit
Tensor.pop(); // closes all tracked tensors except retained ones
}
For inference loops you can also use Tensor.noGradGuard():
try (var guard = Tensor.noGradGuard()) {
Tensor output = model.forward(input);
// no gradient graph is built → lower memory usage
}
// Set global defaults (affects all subsequent factory calls)
Tensor.setDefaultOptions(new Options()
.dtype(ScalarType.Float32)
.device(Device.ofCPU()));
// Per-tensor override
Tensor t = Tensor.ones(new Options().dtype(ScalarType.Float64), 3, 3);
## Layers (smile.deep.layer)

All layers implement the Layer interface:
public interface Layer extends Function<Tensor, Tensor> {
Tensor forward(Tensor input);
Module asTorch(); // underlying PyTorch Module
Layer to(Device device); // move to GPU
}
Layer provides convenience factories that combine a LinearLayer with an
activation in a single SequentialBlock:
LinearLayer fc = Layer.linear(128, 64); // no activation
SequentialBlock r = Layer.relu(128, 64); // Linear + ReLU
SequentialBlock rd = Layer.relu(128, 64, 0.2); // Linear + ReLU + Dropout(0.2)
SequentialBlock g = Layer.gelu(128, 64);
SequentialBlock s = Layer.silu(128, 64);
SequentialBlock t = Layer.tanh(128, 64);
SequentialBlock sg = Layer.sigmoid(128, 64);
SequentialBlock ls = Layer.logSoftmax(128, 64);
SequentialBlock lk = Layer.leaky(128, 64, 0.01); // LeakyReLU
// Simple conv with kernel 3, stride 1, no padding
Conv2dLayer c1 = Layer.conv2d(3, 32, 3);
// Full control: in, out, kernel, stride, padding, dilation, groups, bias, paddingMode
Conv2dLayer c2 = Layer.conv2d(3, 32, 3, 1, 1, 1, 1, true, "zeros"); // same padding
// String padding ("valid" or "same")
Conv2dLayer c3 = Layer.conv2d(3, 32, 3, 1, "same", 1, 1, true, "zeros");
MaxPool2dLayer mp = Layer.maxPool2d(2); // 2×2 max pooling
AvgPool2dLayer ap = Layer.avgPool2d(2);
AdaptiveAvgPool2dLayer aa = Layer.adaptiveAvgPool2d(1); // output 1×1 (global)
BatchNorm1dLayer bn1 = Layer.batchNorm1d(64);
BatchNorm2dLayer bn2 = Layer.batchNorm2d(32);
// Group Norm — 4 groups over 32 channels
GroupNormLayer gn = Layer.groupNorm(4, 32);
// RMS Norm — normalizes last dimension
RMSNormLayer rms = Layer.rmsNorm(64);
DropoutLayer drop = Layer.dropout(0.3);
EmbeddingLayer emb = Layer.embedding(50000, 256); // vocab=50k, dim=256
EmbeddingLayer emb2 = Layer.embedding(50000, 256, 1.0); // with scale alpha
// Build a small MLP
SequentialBlock mlp = new SequentialBlock(
Layer.relu(784, 256),
Layer.relu(256, 128),
Layer.logSoftmax(128, 10)
);
Tensor output = mlp.forward(input); // or mlp.apply(input)
// Add layers dynamically
SequentialBlock seq = new SequentialBlock();
seq.add(Layer.linear(64, 32));
seq.add(Layer.relu(32, 10));
## Activation functions (smile.deep.activation)

All activations implement ActivationFunction (which extends Layer).
They can be used standalone or placed inside a SequentialBlock:
| Class | Activation |
|---|---|
ReLU | max(0, x) |
LeakyReLU | max(αx, x) |
GELU | Gaussian-error linear unit |
SiLU | x·σ(x) (Swish) |
Tanh | tanh(x) |
Sigmoid | σ(x) |
Softmax | softmax along last dim |
LogSoftmax | log-softmax |
LogSigmoid | log(σ(x)) |
GLU | Gated linear unit |
HardShrink | x if \|x\| > λ, else 0 |
SoftShrink | sign(x)·max(0, \|x\| − λ) |
TanhShrink | x − tanh(x) |
Tensor x = Tensor.randn(8, 16);
ReLU relu = new ReLU(true); // inplace=true
Tensor y = relu.forward(x);
GELU gelu = new GELU();
Tensor z = gelu.forward(x);
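Because every activation is itself a Layer, activation instances can also be dropped directly into a SequentialBlock. A small sketch (the layer sizes are arbitrary):

```java
SequentialBlock net = new SequentialBlock(
    Layer.linear(128, 64),
    new GELU(),               // standalone activation module
    Layer.linear(64, 10),
    new LogSoftmax()
);
Tensor scores = net.forward(Tensor.randn(8, 128));
```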
## Loss functions (smile.deep.Loss)

Loss is a BiFunction<Tensor, Tensor, Tensor>. All standard PyTorch losses
are available as static factories:
Loss l1 = Loss.l1(); // MAE
Loss mse = Loss.mse(); // MSE
Loss bce = Loss.bce(); // Binary cross-entropy (requires sigmoid input)
Loss bceL = Loss.bceWithLogits(); // BCE + sigmoid (numerically stable)
Loss ce = Loss.crossEntropy(); // Softmax cross-entropy (standard classification)
Loss nll = Loss.nll(); // NLL (requires log-softmax input)
Loss sl1 = Loss.smoothL1(); // Huber/smooth-L1 (beta=1)
Loss hub = Loss.huber(0.5); // Huber with explicit delta=0.5
Loss kl = Loss.kl(); // KL divergence
Loss hinge = Loss.hingeEmbedding(); // Hinge embedding
Tensor lossTensor = ce.apply(logits, labels);
double lossVal = lossTensor.doubleValue();
For losses with three arguments:
// Margin ranking and triplet margin
Tensor mrLoss = Loss.marginRanking(input1, input2, target);
Tensor tmLoss = Loss.tripleMarginRanking(anchor, positive, negative);
## Optimizers (smile.deep.Optimizer)

import smile.deep.Optimizer;
var params = model.asTorch().parameters();
Optimizer sgd = Optimizer.sgd(params, 0.01); // lr=0.01
Optimizer sgdM = Optimizer.sgd(params, 0.01, 0.9); // + momentum
Optimizer adam = Optimizer.adam(params, 1e-3);
Optimizer adamW = Optimizer.adamW(params, 1e-3);
Optimizer rms = Optimizer.rmsprop(params, 1e-3);
// Per step
optimizer.zeroGrad();
loss.backward();
optimizer.step();
## Models (smile.deep.Model)

Extend Model to define custom architectures:
public class MyCNN extends Model {
private final Conv2dLayer conv1;
private final LinearLayer fc;
public MyCNN() {
this.conv1 = Layer.conv2d(1, 32, 3);
this.fc = Layer.linear(32 * 13 * 13, 10);
register("conv1", conv1);
register("fc", fc);
}
@Override
public Tensor forward(Tensor input) {
Tensor h = conv1.forward(input);
h = h.relu_();
h = Layer.maxPool2d(2).forward(h);
h = h.view(h.size(0), -1);
return fc.forward(h);
}
}
MyCNN model = new MyCNN();
Optimizer optimizer = Optimizer.adam(model.asTorch().parameters(), 1e-3);
Loss criterion = Loss.crossEntropy();
model.train(
10, // epochs
dataset, // Dataset<SampleBatch>
criterion,
optimizer,
new Accuracy(), // metric to track
testDataset // optional eval dataset
);
The Model.train(...) method handles batching via DataSampler, the epoch loop with
gradient updates, and metric evaluation on the optional test dataset.

## Metrics (smile.deep.metric)

All metrics implement Metric:
public interface Metric {
void update(Tensor output, Tensor target);
double compute();
void reset();
String name();
}
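Since the interface is small, custom metrics are easy to add. Below is a minimal sketch of a running mean-squared-error metric; the class is hypothetical and assumes Tensor.sub(Tensor) and mul(Tensor) overloads analogous to the add(Tensor) call shown in the tensor section:

```java
import smile.deep.metric.Metric;
import smile.deep.tensor.Tensor;

/** Hypothetical example: accumulates squared error across batches. */
public class MeanSquaredError implements Metric {
    private double sumSquares = 0.0;
    private long count = 0;

    @Override
    public void update(Tensor output, Tensor target) {
        try (Tensor diff = output.sub(target);
             Tensor squared = diff.mul(diff)) {
            sumSquares += squared.sum().doubleValue();
            count += output.size(0);
        }
    }

    @Override
    public double compute() {
        return count == 0 ? 0.0 : sumSquares / count;
    }

    @Override
    public void reset() {
        sumSquares = 0.0;
        count = 0;
    }

    @Override
    public String name() {
        return "MSE";
    }
}
```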
Available metrics:
| Class | Description |
|---|---|
Accuracy | # correct / total |
Precision | TP / (TP + FP) |
Recall | TP / (TP + FN) |
F1Score | Harmonic mean of precision and recall |
For multi-class classification pass an Averaging strategy:
Accuracy acc = new Accuracy();
Precision mp = new Precision(Averaging.Macro);
Recall mr = new Recall(Averaging.Micro);
F1Score wf1 = new F1Score(Averaging.Weighted);
F1Score binF1 = new F1Score(); // binary (uses threshold 0.5)
acc.update(output, target); // call once per batch
double result = acc.compute(); // fraction correct
acc.reset(); // clear accumulators
## Datasets (smile.deep.Dataset)

// Create from arrays
float[][] features = ...;
int[] labels = ...;
Dataset<SampleBatch> ds = new DatasetImpl(features, labels);
// Iterate batches manually
DataSampler sampler = new DataSampler(ds, batchSize, /*shuffle=*/true);
for (SampleBatch batch : sampler) {
Tensor x = batch.data();
Tensor y = batch.target();
// ... train step
}
## GPU helpers (smile.deep.CUDA)

boolean available = CUDA.isAvailable();
int count = CUDA.deviceCount();
int current = CUDA.currentDevice();
long reserved = CUDA.memoryReserved(); // bytes reserved by the CUDA allocator
boolean bf16 = Tensor.isBF16Supported(); // Ampere or newer
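A common pattern is to pick CUDA when it is available and fall back to CPU otherwise. A sketch, reusing the Device, Tensor.to, and Layer.to(Device) calls shown earlier:

```java
Device device = CUDA.isAvailable()
        ? new Device(DeviceType.CUDA, 0)   // first GPU
        : Device.ofCPU();

// Tensors and layers both expose to(Device).
Tensor x = Tensor.rand(16, 784).to(device);
SequentialBlock net = new SequentialBlock(Layer.relu(784, 128), Layer.logSoftmax(128, 10));
net.to(device);
```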
## Large language models (smile.llm)

The smile.llm package provides building blocks and a complete LLaMA-3
inference stack on top of LibTorch.
| Type | Kind | Purpose |
|---|---|---|
Role | enum | system, user, assistant, ipython |
Message | record | A single dialog turn — (Role role, String content) |
FinishReason | enum | stop, length, function_call, content_filter |
ChatCompletion | record | Inference result — generated text, token arrays, log-probs, finish reason |
// Build a simple conversation
Message[] dialog = {
Message.system("You are a helpful assistant."),
Message.user("What is the capital of France?")
};
// Inspect a completion
ChatCompletion reply = llama.chat(dialog, 256, 0.6, 0.9, false, 0L, null);
System.out.println(reply.content()); // "The capital of France is Paris."
System.out.println(reply.reason()); // FinishReason.stop
### Tokenizers (smile.llm.tokenizer)

Tokenizer is the encoding/decoding interface. Tiktoken is the BPE
implementation compatible with OpenAI's tiktoken library (used by LLaMA-3):
import smile.llm.tokenizer.Tiktoken;
Tiktoken tok = new Tiktoken(
pattern, // regex splitting pattern
specialTokens, // map of special-token string → rank
ranks, // merged BPE vocabulary (bytes → rank)
bosId, eosId // BOS / EOS token IDs
);
// Encode with BOS+EOS
int[] ids = tok.encode("Hello, world!", true, true);
// Decode back to text
String text = tok.decode(ids);
// Vocabulary size
int vocab = tok.size();
Tiktoken handles special tokens and offers a stricter tryDecode variant that throws
CharacterCodingException on invalid byte sequences.
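A usage sketch (the exact tryDecode signature is an assumption based on the description above):

```java
import java.nio.charset.CharacterCodingException;

try {
    String strict = tok.tryDecode(ids);   // throws on malformed byte sequences
    System.out.println(strict);
} catch (CharacterCodingException e) {
    // fall back to the lenient decode(ids) shown above, or report the error
    System.err.println("Invalid token sequence: " + e.getMessage());
}
```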
### Positional encodings

Two implementations are provided:
| Class | Algorithm | Used by |
|---|---|---|
PositionalEncoding | Sinusoidal (sin/cos, fixed) | Original Transformer |
RotaryPositionalEncoding | RoPE (complex-number rotation) | LLaMA |
// Sinusoidal — precomputes a [maxLen × dim] table once
PositionalEncoding pe = new PositionalEncoding(512, 2048);
Tensor out = pe.forward(embeddingTensor); // adds positional signal
// RoPE — called inside Attention.forward()
RotaryPositionalEncoding rope = new RotaryPositionalEncoding(headDim, maxSeqLen);
### LLaMA-3 (smile.llm.llama)

A full LLaMA-3 inference implementation:
| Class | Role |
|---|---|
ModelArgs | Hyperparameter record; loaded from params.json |
Transformer | Top-level module — embedding + N × TransformerBlock + output projection |
TransformerBlock | Single decoder block: Attention + FeedForward + RMS norms |
Attention | Multi-head (grouped-query) attention with KV-cache and RoPE |
FeedForward | SwiGLU feed-forward network |
Tokenizer (llama) | Thin wrapper around smile.llm.tokenizer.Tokenizer |
Llama | High-level entry point — build(), generate(), chat() |
Loading a checkpoint:
import smile.llm.llama.Llama;
// Loads params.json + *.pt checkpoint(s) from the directory
Llama llama = Llama.build(
"model/Meta-Llama-3-8B-Instruct", // checkpoint dir
"model/Meta-Llama-3-8B-Instruct/tokenizer.model",
/*maxBatchSize=*/ 4,
/*maxSeqLen=*/ 2048,
/*deviceId=*/ (byte) 0 // CUDA:0; use -1 for CPU
);
Text generation (raw token IDs):
int[][] prompts = { llama.tokenizer.encode("Once upon a time", true, false) };
ChatCompletion[] results = llama.generate(
prompts,
/*maxGenLen=*/ 200,
/*temperature=*/ 0.6,
/*topp=*/ 0.9,
/*logprobs=*/ false,
/*seed=*/ 42L,
/*publisher=*/ null // or a SubmissionPublisher<String> for streaming
);
System.out.println(results[0].content());
Chat completion (dialog format):
import smile.llm.Message;
ChatCompletion reply = llama.chat(
new Message[]{
Message.system("Be concise."),
Message.user("Explain RoPE in one sentence.")
},
/*maxGenLen=*/ 128,
/*temperature=*/ 0.7,
/*topp=*/ 0.9,
/*logprobs=*/ false,
/*seed=*/ 0L,
/*publisher=*/ null
);
System.out.println(reply.content());
Streaming output (single prompt only):
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
var publisher = new SubmissionPublisher<String>();
publisher.subscribe(new Flow.Subscriber<>() {
public void onNext(String token) { System.out.print(token); }
// ... other methods
});
int[][] prompt = { llama.tokenizer.encode("Tell me a joke", true, false) };
llama.generate(prompt, 200, 0.8, 0.95, false, 0L, publisher);
publisher.close();
Note: GPU inference requires CUDA libraries on
java.library.path. On Ampere or newer hardware, the model is loaded in BFloat16; on older GPUs, Float16 is used. CPU inference runs in Float32.
## Computer vision (smile.vision)

The smile.vision package provides image classification pipelines built on
top of the smile.deep layer stack.
### Transforms (smile.vision.transform)

Transform is a functional interface that converts one or more
BufferedImage objects into a 4-D [N, C, H, W] float tensor suitable
for a vision model.
import smile.vision.transform.Transform;
import smile.vision.transform.ImageClassification;
// Standard ImageNet preprocessing:
// resize shorter side → 384, centre-crop to 384×384,
// normalize with ImageNet mean/std
Transform t = Transform.classification(384, 384);
// Custom crop / resize / normalisation
Transform custom = new ImageClassification(
/*cropSize=*/ 224,
/*resizeSize=*/ 256,
/*mean=*/ new float[]{0.5f, 0.5f, 0.5f},
/*std=*/ new float[]{0.5f, 0.5f, 0.5f},
/*hints=*/ java.awt.Image.SCALE_SMOOTH
);
// Apply the transform
BufferedImage img = ImageIO.read(new File("cat.jpg"));
try (Tensor batch = t.forward(img)) {
// batch shape: [1, 3, 384, 384]
}
Default ImageNet statistics are available as constants:
float[] mean = Transform.DEFAULT_MEAN; // {0.485f, 0.456f, 0.406f}
float[] std = Transform.DEFAULT_STD; // {0.229f, 0.224f, 0.225f}
The Transform interface also exposes helper default methods:
// Resize keeping aspect ratio (shorter side → size)
BufferedImage resized = transform.resize(image, 256, Image.SCALE_SMOOTH);
// Centre-crop to square
BufferedImage cropped = transform.crop(resized, 224, false); // shallow copy
BufferedImage deep = transform.crop(resized, 224, true); // deep copy
// Convert image array → float32 [N,C,H,W] tensor (values in [0,1])
Tensor tensor = Transform.toTensor(images);
### ImageDataset

ImageDataset implements Dataset<SampleBatch> and reads images from a
folder-per-class directory structure:
root/
dog/
dog001.jpg
dog002.jpg
cat/
cat001.jpg
import smile.vision.ImageDataset;
import smile.vision.transform.Transform;
Transform t = Transform.classification(224, 224);
// targetTransform maps a class-folder name to an integer label
ImageDataset ds = new ImageDataset(
/*batch=*/ 32,
/*root=*/ new File("data/train"),
/*transform=*/ t,
/*targetTransform=*/ ImageNet.INSTANCE::targetTransform
);
for (SampleBatch batch : ds) {
Tensor images = batch.data(); // [32, 3, 224, 224]
Tensor labels = batch.target(); // [32]
}
Image loading runs on a background platform thread and is prefetched into a bounded queue (capacity 100), so the training loop is never blocked waiting for I/O.
### EfficientNet

EfficientNet extends LayerBlock and implements the EfficientNet-V2
architecture. Three pretrained VisionModel variants are available as
static factory methods:
| Factory | Variant | Input size | Parameters |
|---|---|---|---|
EfficientNet.V2S() | EfficientNet-V2-S | 384 × 384 | ~21 M |
EfficientNet.V2M() | EfficientNet-V2-M | 480 × 480 | ~54 M |
EfficientNet.V2L() | EfficientNet-V2-L | 480 × 480 | ~119 M |
import smile.vision.EfficientNet;
// Load pretrained weights from the default path
VisionModel model = EfficientNet.V2S();
// Or specify a custom checkpoint path
VisionModel model = EfficientNet.V2S("checkpoints/efficientnet_v2_s.pt");
// Run inference on one or more images
BufferedImage img = ImageIO.read(new File("dog.jpg"));
try (Tensor logits = model.forward(img)) { // shape [1, 1000]
Tensor probs = logits.softmax(1);
int classIdx = probs.argmax(1, false).intValue();
System.out.println(ImageNet.INSTANCE.labels()[classIdx]);
}
VisionModel.forward(BufferedImage...) automatically applies the model's
associated Transform, so you never need to preprocess images manually.
The EfficientNet constructor accepts an MBConvConfig[] array to define a
custom architecture, giving fine-grained control over each inverted-residual
stage:
MBConvConfig[] config = {
MBConvConfig.FusedMBConv(/*expandRatio=*/1, /*kernel=*/3, /*stride=*/1,
/*inCh=*/24, /*outCh=*/24, /*numLayers=*/2),
MBConvConfig.MBConv(4, 3, 2, 24, 48, 4),
// ... more stages
};
EfficientNet net = new EfficientNet(
config,
/*dropout=*/ 0.2,
/*stochasticDepth=*/ 0.2,
/*numClasses=*/ 1000,
/*lastChannel=*/ 1280,
/*normLayer=*/ null // defaults to BatchNorm2d
);
### ImageNet

ImageNet is an interface with two 1000-element string arrays and a set of
utility methods for mapping between class indices and human-readable labels:
import smile.vision.ImageNet;
// The single concrete implementation
ImageNet inet = ImageNet.INSTANCE;
// Human-readable label strings ("Egyptian cat", "labrador", …)
String[] labels = inet.labels();
// Folder names used in the ILSVRC validation set ("n02124075", …)
String[] folders = inet.folders();
// Look up a label by index
String label = inet.labelOf(282); // e.g. "tiger cat"
// Map a folder name to a label
String name = inet.classify("n02124075");
// Map a folder name to a class index (useful as targetTransform)
int index = inet.targetTransform("n02124075");
## Examples

A complete MLP classifier trained with an explicit loop:

import smile.deep.*;
import smile.deep.layer.*;
import smile.deep.tensor.Tensor;
// 1. Build model
SequentialBlock mlp = new SequentialBlock(
Layer.relu(784, 256),
Layer.relu(256, 128),
Layer.logSoftmax(128, 10)
);
// 2. Optimizer + loss
var params = mlp.asTorch().parameters();
Optimizer optimizer = Optimizer.adam(params, 1e-3);
Loss criterion = Loss.nll();
// 3. Training loop
for (int epoch = 0; epoch < 5; epoch++) {
for (SampleBatch batch : trainSampler) {
optimizer.zeroGrad();
Tensor logp = mlp.forward(batch.data());
Tensor loss = criterion.apply(logp, batch.target());
loss.backward();
optimizer.step();
}
}
A LeNet-style CNN defined as a Model subclass and trained with Model.train:

public class LeNet extends Model {
private final Conv2dLayer conv1 = Layer.conv2d(1, 6, 5);
private final Conv2dLayer conv2 = Layer.conv2d(6, 16, 5);
private final LinearLayer fc1 = Layer.linear(16 * 4 * 4, 120);
private final LinearLayer fc2 = Layer.linear(120, 84);
private final LinearLayer fc3 = Layer.linear(84, 10);
private final MaxPool2dLayer pool = Layer.maxPool2d(2);
public LeNet() {
register("conv1", conv1); register("conv2", conv2);
register("fc1", fc1); register("fc2", fc2); register("fc3", fc3);
}
@Override
public Tensor forward(Tensor input) {
// [N,1,28,28] → [N,6,12,12]
Tensor x = pool.forward(new ReLU(true).forward(conv1.forward(input)));
// → [N,16,4,4]
x = pool.forward(new ReLU(true).forward(conv2.forward(x)));
x = x.view(x.size(0), -1); // flatten
x = new ReLU(true).forward(fc1.forward(x));
x = new ReLU(true).forward(fc2.forward(x));
return new LogSoftmax().forward(fc3.forward(x));
}
}
// Train on MNIST dataset
LeNet lenet = new LeNet();
lenet.train(
10,
mnistTrainDataset,
Loss.nll(),
Optimizer.sgd(lenet.asTorch().parameters(), 0.01, 0.9),
new Accuracy(),
mnistTestDataset
);
## Building and testing

# Build the module (skip tests)
./gradlew :deep:build -x test
# Compile tests only (fast check)
./gradlew :deep:compileTestJava
# Run all tests (requires LibTorch native libraries on java.library.path)
./gradlew :deep:test
# Run a specific test class
./gradlew :deep:test --tests "smile.deep.tensor.TensorTest"
./gradlew :deep:test --tests "smile.deep.metric.MetricTest"
./gradlew :deep:test --tests "smile.deep.LossTest"
./gradlew :deep:test --tests "smile.deep.layer.LayerTest"
./gradlew :deep:test --tests "smile.llm.tokenizer.TiktokenTest"
./gradlew :deep:test --tests "smile.vision.transform.TransformTest"
./gradlew :deep:test --tests "smile.vision.ImageNetTest"
Note: Tests require the PyTorch / LibTorch native libraries to be present. On CI without GPU, the tests run on CPU-only LibTorch. Tests that require CUDA can be tagged
@Tag("gpu")and excluded with-DexcludeTags=gpu.
SMILE — Copyright © 2010–2026 Haifeng Li. GNU GPL v3 licensed.