# Bulk Delete

The `BulkDelete` interface provides an API to perform bulk delete of files/objects
in an object store or filesystem.
The API is designed to match the semantics of the AWS S3 Bulk Delete REST API call, but it is not exclusively restricted to this store. This is why the "provides no guarantees" restrictions do not state what the outcome will be when executed on other stores.
## Interface `org.apache.hadoop.fs.BulkDeleteSource`

The interface `BulkDeleteSource` is offered by a FileSystem/FileContext class if
it supports the API. A default implementation is provided in the base `FileSystem`
class; it returns an instance of `org.apache.hadoop.fs.impl.DefaultBulkDeleteOperation`.
Details of the default implementation are covered in the sections below.
```java
@InterfaceAudience.Public
@InterfaceStability.Unstable
public interface BulkDeleteSource {

  BulkDelete createBulkDelete(Path path)
      throws UnsupportedOperationException, IllegalArgumentException, IOException;
}
```
## Interface `org.apache.hadoop.fs.BulkDelete`

This is the bulk delete implementation returned by the `createBulkDelete()` call.
```java
@InterfaceAudience.Public
@InterfaceStability.Unstable
public interface BulkDelete extends IOStatisticsSource, Closeable {

  int pageSize();

  Path basePath();

  List<Map.Entry<Path, String>> bulkDelete(List<Path> paths)
      throws IOException, IllegalArgumentException;
}
```
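The intended call pattern can be sketched as follows. `BulkDeleteExample` and
`deleteBatch` are hypothetical names, not part of the Hadoop API; the helper
assumes all paths lie under the same base path and keeps every page within
`pageSize()`:

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.BulkDelete;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class BulkDeleteExample {

  /**
   * Hypothetical helper: delete all the given paths under {@code base},
   * one page at a time.
   */
  public static void deleteBatch(FileSystem fs, Path base, List<Path> paths)
      throws IOException {
    // createBulkDelete() comes from BulkDeleteSource, implemented by FileSystem.
    try (BulkDelete bulk = fs.createBulkDelete(base)) {
      int pageSize = bulk.pageSize();
      for (int start = 0; start < paths.size(); start += pageSize) {
        List<Path> page =
            paths.subList(start, Math.min(start + pageSize, paths.size()));
        // Each returned entry is a (path, error string) pair for a failed delete.
        List<Map.Entry<Path, String>> failures = bulk.bulkDelete(page);
        for (Map.Entry<Path, String> failure : failures) {
          throw new IOException(
              "Failed to delete " + failure.getKey() + ": " + failure.getValue());
        }
      }
    }
  }
}
```

The try-with-resources block matters: `BulkDelete` extends `Closeable`, so the
instance should be closed once the batch is complete.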
### `bulkDelete(paths)`

#### Preconditions

```python
if length(paths) > pageSize: throw IllegalArgumentException
```

#### Postconditions

All paths which refer to files are removed from the set of files.

```python
FS'Files = FS.Files - [paths]
```

No other restrictions are placed upon the outcome.
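Because of the precondition above, callers with more paths than the page size
must partition the list before calling `bulkDelete()`. A minimal, self-contained
sketch of that partitioning (the class and method names are illustrative, not
part of the API):

```java
import java.util.ArrayList;
import java.util.List;

public class BulkDeletePaging {

  /**
   * Split a list into pages of at most pageSize elements: the largest
   * batch a single bulkDelete() call is permitted to carry.
   */
  public static <T> List<List<T>> partition(List<T> paths, int pageSize) {
    if (pageSize < 1) {
      throw new IllegalArgumentException("page size must be >= 1");
    }
    List<List<T>> pages = new ArrayList<>();
    for (int i = 0; i < paths.size(); i += pageSize) {
      // Copy the sublist so each page is independent of the source list.
      pages.add(new ArrayList<>(
          paths.subList(i, Math.min(i + pageSize, paths.size()))));
    }
    return pages;
  }
}
```

With a page size of 1 (the default implementation), this degenerates to one
page per path, which is why callers should always query `pageSize()` rather
than assume a larger batch is supported.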
The `BulkDeleteSource` interface is exported by `FileSystem` and `FileContext` storage
clients, and is available for all filesystems via
`org.apache.hadoop.fs.impl.DefaultBulkDeleteSource`. For integration in applications
such as Apache Iceberg to work seamlessly, all implementations of this interface
MUST NOT reject the request, but instead return a `BulkDelete` instance
of size >= 1.
Use the `PathCapabilities` probe `fs.capability.bulk.delete`:

```java
store.hasPathCapability(path, "fs.capability.bulk.delete")
```
The need for many libraries to compile against very old versions of Hadoop
means that most of the cloud-first filesystem API calls cannot be used except
through reflection, and the more complicated the API and its data types are,
the harder that reflection is to implement.
To assist this, the class `org.apache.hadoop.io.wrappedio.WrappedIO` has a few methods
which are intended to provide simple access to the API, especially
through reflection.
```java
public static int bulkDeletePageSize(FileSystem fs, Path path) throws IOException;

public static List<Map.Entry<Path, String>> bulkDelete(FileSystem fs, Path base, Collection<Path> paths)
    throws IOException;
```
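For a library that cannot compile against these classes, the static methods
above can be reached through `java.lang.reflect`. A sketch, using the method
names listed above; the wrapper method and its name are illustrative:

```java
import java.lang.reflect.Method;
import java.util.Collection;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class ReflectiveBulkDelete {

  /**
   * Hypothetical wrapper: invoke WrappedIO.bulkDelete() reflectively, so the
   * calling library needs no compile-time dependency on Hadoop releases
   * that ship the bulk delete API.
   */
  @SuppressWarnings("unchecked")
  public static List<Map.Entry<Path, String>> bulkDeleteViaReflection(
      FileSystem fs, Path base, Collection<Path> paths) throws Exception {
    // Resolve the class at runtime; throws ClassNotFoundException on
    // older Hadoop releases, which the caller can treat as "unsupported".
    Class<?> wrappedIO = Class.forName("org.apache.hadoop.io.wrappedio.WrappedIO");
    Method bulkDelete = wrappedIO.getMethod(
        "bulkDelete", FileSystem.class, Path.class, Collection.class);
    // Static method: the receiver argument to invoke() is null.
    return (List<Map.Entry<Path, String>>) bulkDelete.invoke(null, fs, base, paths);
  }
}
```

The same pattern applies to `bulkDeletePageSize`, with the page size returned
as a boxed `Integer`.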
The default implementation of the `BulkDelete` interface, used by all `FileSystem`
implementations which do not provide their own, is
`org.apache.hadoop.fs.impl.DefaultBulkDeleteOperation`. It fixes the page
size at 1 and calls `FileSystem.delete(path, false)` on the single path in the list.
The S3A implementation is `org.apache.hadoop.fs.s3a.impl.BulkDeleteOperation`, which
implements the multi-object delete semantics of the AWS S3 Bulk Delete API call.
For more details, please refer to the S3A Performance documentation.