<!--- # Licensed to the Apache Software Foundation (ASF) under one # or more contributor license agreements. See the NOTICE file # distributed with this work for additional information # regarding copyright ownership. The ASF licenses this file # to you under the Apache License, Version 2.0 (the # "License"); you may not use this file except in compliance # with the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. -->

Apache Hadoop 0.22.0 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.


  • MAPREDUCE-478 | Minor | separate jvm param for mapper and reducer

Allow map and reduce jvm parameters, environment variables and ulimit to be set separately.

Configuration changes: added mapred.map.child.java.opts, mapred.reduce.child.java.opts, mapred.map.child.env, mapred.reduce.child.env, mapred.map.child.ulimit, and mapred.reduce.child.ulimit; deprecated mapred.child.java.opts, mapred.child.env, and mapred.child.ulimit.
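
A minimal sketch of how a per-task-type setting might be resolved, written in Python for brevity. The helper `child_java_opts` and the fallback to the deprecated key are assumptions for illustration, not Hadoop's actual lookup code.

```python
def child_java_opts(conf, task_type):
    """Return JVM options for a 'map' or 'reduce' task.

    Prefers the per-type key and falls back to the deprecated shared key
    (the fallback order is an assumption made for this sketch).
    """
    if task_type not in ("map", "reduce"):
        raise ValueError(f"unknown task type: {task_type}")
    specific_key = f"mapred.{task_type}.child.java.opts"
    return conf.get(specific_key, conf.get("mapred.child.java.opts", ""))
```

With only the deprecated key set, both task types resolve to the same options; setting mapred.reduce.child.java.opts changes reducers alone.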


  • HADOOP-6344 | Major | rm and rmr fail to correctly move the user's files to the trash prior to deleting when they are over quota.

Trash feature notifies the user of an over-quota condition rather than silently deleting files/directories; deletion can be compelled with "rm -skipTrash".


  • HADOOP-6599 | Major | Split RPC metrics into summary and detailed metrics

Split existing RpcMetrics into RpcMetrics and RpcDetailedMetrics. The new RpcDetailedMetrics has per-method usage details and is available under context name "rpc" and record name "detailed-metrics".


  • MAPREDUCE-927 | Major | Cleanup of task-logs should happen in TaskTracker instead of the Child

Moved task log cleanup into a separate thread in TaskTracker. Added configuration "mapreduce.job.userlog.retain.hours" to specify the time (in hours) for which the user logs are retained after job completion.
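
As a rough illustration of the retention rule, the sketch below decides whether a job's user logs are old enough to purge. The function name and the epoch-second arguments are hypothetical; the real cleanup thread lives in the TaskTracker.

```python
def should_purge_user_logs(job_end_epoch_s, now_epoch_s, retain_hours):
    """True once at least `retain_hours` have passed since the job completed."""
    age_seconds = now_epoch_s - job_end_epoch_s
    return age_seconds >= retain_hours * 3600
```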


  • HADOOP-6730 | Major | Bug in FileContext#copy and provide base class for FileContext tests

WARNING: No release note provided for this change.


  • MAPREDUCE-1707 | Major | TaskRunner can get NPE in getting ugi from TaskTracker

Fixed a bug that caused TaskRunner to hit an NPE while getting the ugi from TaskTracker and then crash, so that the task failed only after the task-timeout period.


  • MAPREDUCE-1680 | Major | Add a metrics to track the number of heartbeats processed

Added a metric to track number of heartbeats processed by the JobTracker.


WARNING: No release note provided for this change.


  • HDFS-1061 | Minor | Memory footprint optimization for INodeFile object.

WARNING: No release note provided for this change.


  • HDFS-1079 | Major | HDFS implementation should throw exceptions defined in AbstractFileSystem

Specific exceptions are thrown from HDFS implementation and protocol per the interface defined in AbstractFileSystem. The compatibility is not affected as the applications catch IOException and will be able to handle specific exceptions that are subclasses of IOException.


  • MAPREDUCE-1558 | Major | specify correct server principal for RefreshAuthorizationPolicyProtocol and RefreshUserToGroupMappingsProtocol protocols in MRAdmin (for HADOOP-6612)

New config: hadoop.security.service.user.name.key. This setting points to the server principal for RefreshUserToGroupMappingsProtocol. The value should be either the NN or the JT principal, depending on whether it is used in DFSAdmin or MRAdmin. The value is set by the application, so no default value is needed.


  • HDFS-708 | Major | A stress-test tool for HDFS.

Does not currently provide anything but uniform distribution. Uses some older deprecated class interfaces (for mapper and reducer). This was tested on 0.20 and 0.22 (locally), so it should be fairly backwards compatible.


  • MAPREDUCE-1354 | Critical | Incremental enhancements to the JobTracker for better scalability

Incremental enhancements to the JobTracker include a no-lock version of JT.getTaskCompletion events, no lock on the JT while doing i/o during job-submission and several fixes to cut down configuration parsing during heartbeat-handling.


Removes JNI calls to get jvm current/max heap usage in ClusterStatus. Any instances of ClusterStatus serialized in a prior version will not be correctly deserialized using the updated class.


  • MAPREDUCE-1773 | Major | streaming doesn't support jobclient.output.filter

Improved console messaging for streaming jobs by using the generic JobClient API itself instead of the existing streaming-specific code.


  • MAPREDUCE-1785 | Minor | Add streaming config option for not emitting the key

Added a configuration property "stream.map.input.ignoreKey" to specify whether to ignore key or not while writing input for the mapper. This configuration parameter is valid only if stream.map.input.writer.class is org.apache.hadoop.streaming.io.TextInputWriter.class. For all other InputWriter's, key is always written.
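
The effect of the option can be pictured with a small sketch. `format_map_input` is a hypothetical helper, and the tab separator is an assumption; it is not the TextInputWriter implementation.

```python
def format_map_input(key, value, ignore_key=False, separator="\t"):
    """Build one line of mapper input, optionally dropping the key."""
    if ignore_key:
        return f"{value}\n"
    return f"{key}{separator}{value}\n"
```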


  • MAPREDUCE-572 | Minor | If #link is missing from uri format of -cacheArchive then streaming does not throw error.

Improved streaming job failure when #link is missing from uri format of -cacheArchive. Earlier it used to fail when launching individual tasks, now it fails during job submission itself.
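
A sketch of the kind of check that can run at submission time, using Python's standard URI parser. The function name and the example URI are hypothetical; the actual validation is done in the streaming Java code.

```python
from urllib.parse import urlparse


def check_cache_archive_uri(uri):
    """Return the link name after '#', or fail fast if it is missing."""
    fragment = urlparse(uri).fragment
    if not fragment:
        raise ValueError(f"-cacheArchive URI is missing '#link': {uri}")
    return fragment
```

Failing here, before any task is launched, mirrors the change described above: the error surfaces once at submission instead of once per task.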


  • HDFS-1096 | Major | allow dfsadmin/mradmin refresh of superuser proxy group mappings

Changed the protocol name (may be used in hadoop-policy.xml) from security.refresh.usertogroups.mappings.protocol.acl to security.refresh.user.mappings.protocol.acl.


  • HADOOP-6787 | Major | Factor out glob pattern code from FileContext and Filesystem

WARNING: No release note provided for this change.


  • MAPREDUCE-1836 | Major | Refresh for proxy superuser config (mr part for HDFS-1096)

Changed the name of the protocol (may be used in hadoop-policy.xml) from security.refresh.usertogroups.mappings.protocol.acl to security.refresh.user.mappings.protocol.acl.


  • MAPREDUCE-1505 | Major | Cluster class should create the rpc client only when needed

Lazily construct a connection to the JobTracker from the job-submission client.
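
The general pattern looks roughly like this. `LazyClient` and its `connect` factory are hypothetical names used only to show lazy construction; they are not the Cluster class API.

```python
class LazyClient:
    """Defers an expensive connection until the first call that needs it."""

    def __init__(self, connect):
        self._connect = connect      # zero-argument factory (hypothetical)
        self._connection = None

    @property
    def connection(self):
        # Create the connection on first access, then reuse it.
        if self._connection is None:
            self._connection = self._connect()
        return self._connection
```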


  • MAPREDUCE-1543 | Major | Log messages of JobACLsManager should use security logging of HADOOP-6586

Adds the audit logging facility to MapReduce. All authorization/authentication events are logged to the audit log. Audit log entries are stored as key=value pairs.


  • MAPREDUCE-1533 | Major | Reduce or remove usage of String.format() usage in CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString()

Incremental enhancements to the JobTracker to optimize heartbeat handling.


  • HDFS-1080 | Major | SecondaryNameNode image transfer should use the defined http address rather than local ip address

WARNING: No release note provided for this change.


Fixed an NPE in streaming that occurs when there is no input to reduce and the streaming reducer sends status updates by writing "reporter:status: xxx" statements to stderr.


  • MAPREDUCE-1829 | Major | JobInProgress.findSpeculativeTask should use min() to find the candidate instead of sort()

Improved performance of the method JobInProgress.findSpeculativeTask() which is in the critical heartbeat code path.
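
The idea behind the change, sketched in Python: when only the best candidate is needed, a single linear scan replaces a full sort. The `progress_rate` field is a hypothetical stand-in for whatever ordering the scheduler uses.

```python
def pick_candidate_sorted(tasks):
    """Sort everything, then take the first element: O(n log n)."""
    return sorted(tasks, key=lambda t: t["progress_rate"])[0]


def pick_candidate_min(tasks):
    """Single pass over the tasks: O(n), same result."""
    return min(tasks, key=lambda t: t["progress_rate"])
```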


  • MAPREDUCE-1887 | Major | MRAsyncDiskService does not properly absolutize volume root paths

MAPREDUCE-1887. MRAsyncDiskService now properly absolutizes volume root paths. (Aaron Kimball via zshao)


  • HADOOP-6835 | Major | Support concatenated gzip files

Processing of concatenated gzip files formerly stopped (quietly) at the end of the first substream/"member"; now processing will continue to the end of the concatenated stream, like gzip(1) does. (bzip2 support is unaffected by this patch.)
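
The difference between the two behaviours can be reproduced with Python's standard library, which is used here only as an illustration and is unrelated to Hadoop's codec classes. The helper names are hypothetical.

```python
import gzip
import zlib


def first_member_only(data):
    """Decode a single gzip member and stop, like the former behaviour."""
    decoder = zlib.decompressobj(zlib.MAX_WBITS | 16)  # 16 selects the gzip container
    return decoder.decompress(data)


def all_members(data):
    """Decode every member of a concatenated stream, like gzip(1)."""
    return gzip.decompress(data)


# Two independently compressed members, concatenated byte-for-byte.
stream = gzip.compress(b"first\n") + gzip.compress(b"second\n")
```

Here `first_member_only(stream)` yields only the first member's bytes, while `all_members(stream)` yields both members in order.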


  • MAPREDUCE-1733 | Major | Authentication between pipes processes and java counterparts.

This jira introduces backward incompatibility. Existing pipes applications MUST be recompiled with new hadoop pipes library once the changes in this jira are deployed.


  • HDFS-1315 | Major | Add fsck event to audit log and remove other audit log events corresponding to FSCK listStatus and open calls

When running fsck, audit log events are no longer logged for listStatus and open. A new event with cmd=fsck is logged, with the ugi field set to the user requesting fsck and the src field set to the fsck path.
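
A small sketch of consuming such an event from a script. The sample line and its whitespace-separated layout are assumptions for illustration; only the cmd, ugi, and src field names come from the note above.

```python
def parse_audit_fields(line):
    """Split a whitespace-separated key=value audit entry into a dict."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


# Hypothetical entry; real log lines may carry extra fields and prefixes.
sample = "ugi=alice cmd=fsck src=/user/alice"
```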


  • MAPREDUCE-1866 | Minor | Remove deprecated class org.apache.hadoop.streaming.UTF8ByteArrayUtils

Removed public deprecated class org.apache.hadoop.streaming.UTF8ByteArrayUtils.


  • HDFS-330 | Trivial | Datanode Web UIs should provide robots.txt

A robots.txt is now in place which will prevent well behaved crawlers from perusing Hadoop web interfaces.


  • HDFS-202 | Major | Add a bulk FIleSystem.getFileBlockLocations

WARNING: No release note provided for this change.


  • MAPREDUCE-1780 | Major | AccessControlList.toString() is used for serialization of ACL in JobStatus.java

Fixes serialization of job-acls in JobStatus to use AccessControlList.write() instead of AccessControlList.toString().


  • HADOOP-6905 | Major | Better logging messages when a delegation token is invalid

WARNING: No release note provided for this change.


  • HADOOP-6693 | Major | Add metrics to track kerberos login activity

A new metric "login" of type MetricTimeVaryingRate is added under the new metrics context name "ugi" and metrics record name "ugi".


  • HDFS-1318 | Major | HDFS Namenode and Datanode WebUI information needs to be accessible programmatically for scripts

resubmit the patch for HDFS1318 as Hudson was down last week.


  • MAPREDUCE-220 | Major | Collecting cpu and memory usage for MapReduce tasks

Collect cpu and memory statistics per task.


  • HDFS-712 | Major | Move libhdfs from mr to hdfs

Moved the libhdfs package to the HDFS subproject.


  • MAPREDUCE-2032 | Major | TestJobOutputCommitter fails in ant test run

Fixes a problem where `TestJobCleanup` leaves behind files that cause `TestJobOutputCommitter` to error out.


Makes AccessControlList a writable and updates documentation for Job ACLs.


<!-- markdown -->
  • Removed aclsEnabled flag from queues configuration files.
  • Removed the configuration property mapreduce.cluster.job-authorization-enabled.
  • Added mapreduce.cluster.acls.enabled as the single configuration property in mapred-default.xml that enables the authorization checks for all job level and queue level operations.
  • To enable authorization of users to do job level and queue level operations, mapreduce.cluster.acls.enabled is to be set to true in JobTracker's configuration and in all TaskTrackers' configurations.
  • To get access to a job, it is enough for a user to be part of one of the access lists, i.e. either job-acl or queue-admins-acl (unlike before, when a user had to be part of both lists).
  • Queue administrators (configured via acl-administer-jobs) of a queue can do all view-job and modify-job operations on all jobs submitted to that queue.
  • ClusterOwner (who started the mapreduce cluster) and cluster administrators (configured via mapreduce.cluster.permissions.supergroup) can do all job level operations and queue level operations on all jobs on all queues in that cluster, irrespective of the job-acls and queue-acls configured.
  • JobOwner (who submitted the job to a queue) can do all view-job and modify-job operations on his/her job, irrespective of job-acls and queue-acls.
  • Since the aclsEnabled flag is removed from queues configuration files, "refresh of queues configuration" will not change mapreduce.cluster.acls.enabled on the fly. mapreduce.cluster.acls.enabled can be modified only when restarting the mapreduce cluster.
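
The access rules in the list above can be summarised in a short sketch. The function and parameter names are hypothetical, and treating disabled ACLs as "allow everything" is an assumption; this is not the JobTracker's authorization code.

```python
def can_access_job(user, job_owner, cluster_admins, queue_admins, job_acl,
                   acls_enabled=True):
    """Decide whether `user` may view or modify a job under the rules above."""
    if not acls_enabled:
        return True  # assumption: no checks when mapreduce.cluster.acls.enabled is false
    if user == job_owner or user in cluster_admins:
        return True  # owner and cluster administrators bypass the ACLs
    # Membership in either list is enough; both are no longer required.
    return user in queue_admins or user in job_acl
```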

  • MAPREDUCE-1517 | Major | streaming should support running on background

Adds -background option to run a streaming job in background.


  • MAPREDUCE-2147 | Trivial | JobInProgress has some redundant lines in its ctor

Removed some redundant lines from JobInProgress's constructor that were re-initializing things unnecessarily.


  • HDFS-1435 | Major | Provide an option to store fsimage compressed

This provides an option to store the fsimage compressed. The layout version is bumped to -25. Users can configure whether the fsimage is compressed and which codec to use. By default the fsimage is not compressed.


  • HADOOP-7005 | Major | Update test-patch.sh to remove callback to Hudson master

N/A


  • HADOOP-6663 | Major | BlockDecompressorStream get EOF exception when decompressing the file compressed from empty file

Fix EOF exception in BlockDecompressorStream when decompressing a previously compressed empty file.


  • HDFS-903 | Critical | NN should verify images and edit logs on startup

Store fsimage MD5 checksum in VERSION file. Validate checksum when loading a fsimage. Layout version bumped.
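
A minimal sketch of the validate-on-load idea: compute a file's MD5 and compare it with a stored value. The helper names are hypothetical, and how the expected digest is read from the VERSION file is deliberately left out.

```python
import hashlib


def md5_hex(path, chunk_size=1 << 20):
    """Stream the file through MD5 so large images need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(path, expected_md5):
    """Raise if the image on disk does not match the recorded checksum."""
    actual = md5_hex(path)
    if actual != expected_md5.lower():
        raise ValueError(f"checksum mismatch: expected {expected_md5}, got {actual}")
    return True
```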


  • HDFS-1457 | Major | Limit transmission rate when transfering image between primary and secondary NNs

Add a configuration variable dfs.image.transfer.bandwidthPerSec to allow the user to specify the amount of bandwidth for transferring image and edits. Its default value is 0 indicating no throttling.
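
One common way to throttle a transfer is to sleep whenever the average rate runs ahead of the limit. The sketch below computes that delay; it is a generic illustration with a hypothetical function name, not the throttler used by the NameNode.

```python
def throttle_delay(bytes_sent, elapsed_seconds, bandwidth_per_sec):
    """Seconds to pause so the average rate stays at or below the limit.

    A limit of 0 (the default described above) means no throttling.
    """
    if bandwidth_per_sec <= 0:
        return 0.0
    minimum_elapsed = bytes_sent / bandwidth_per_sec
    return max(0.0, minimum_elapsed - elapsed_seconds)
```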


  • HDFS-1035 | Major | Generate Eclipse's .classpath file from Ivy config

Added support to auto-generate the Eclipse .classpath file from ivy.


  • MAPREDUCE-1592 | Major | Generate Eclipse's .classpath file from Ivy config

Added support to auto-generate the Eclipse .classpath file from ivy.


  • HADOOP-4675 | Major | Current Ganglia metrics implementation is incompatible with Ganglia 3.1

Support for reporting metrics to Ganglia 3.1 servers


  • MAPREDUCE-1905 | Blocker | Context.setStatus() and progress() api are ignored

Moved the API methods `public Counter getCounter(Enum<?> counterName)` and `public Counter getCounter(String groupName, String counterName)` from org.apache.hadoop.mapreduce.TaskInputOutputContext to org.apache.hadoop.mapreduce.TaskAttemptContext.


  • HADOOP-7013 | Major | Add boolean field isCorrupt to BlockLocation

This patch has changed the serialization format of BlockLocation.


  • HADOOP-6683 | Minor | the first optimization: ZlibCompressor does not fully utilize the buffer

Improve the buffer utilization of ZlibCompressor to avoid invoking a JNI per write request.


  • HDFS-1560 | Minor | dfs.data.dir permissions should default to 700

The permissions on datanode data directories (configured by dfs.datanode.data.dir.perm) now default to 0700. Upon startup, the datanode will automatically change the permissions to match the configured value.
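
The startup behaviour amounts to "compare, then correct". A sketch with a hypothetical helper, assuming a POSIX filesystem:

```python
import os
import stat


def enforce_dir_permissions(path, mode=0o700):
    """Bring a directory's permission bits in line with the configured mode."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current != mode:
        os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)
```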


  • MAPREDUCE-2096 | Blocker | Secure local filesystem IO from symlink vulnerabilities

The TaskTracker now uses the libhadoop JNI library to operate securely on local files when security is enabled. Secure clusters must ensure that libhadoop.so is available to the TaskTracker.


  • HADOOP-7089 | Minor | Fix link resolution logic in hadoop-config.sh

Updates hadoop-config.sh to always resolve symlinks when determining HADOOP_HOME. Bash built-ins or POSIX:2001 compliant commands are now required.


  • HADOOP-6436 | Major | Remove auto-generated native build files

The native build, when run from trunk, now requires autotools, libtool, and openssl dev libraries.


The native build, when run from trunk, now requires autotools, libtool, and openssl dev libraries.


  • HDFS-1582 | Major | Remove auto-generated native build files

The native build, when run from trunk, now requires autotools, libtool, and openssl dev libraries.


  • HADOOP-7134 | Major | configure files that are generated as part of the released tarball need to have executable bit set

I have just committed this to trunk and branch-0.22. Thanks Roman!


  • MAPREDUCE-2054 | Major | Hierarchical queue implementation broke dynamic queue addition in Dynamic Scheduler

Fix Dynamic Priority Scheduler to work with hierarchical queue names


  • MAPREDUCE-1996 | Trivial | API: Reducer.reduce() method detail misstatement

Fix a misleading documentation note about the usage of Reporter objects in Reducers.


  • MAPREDUCE-1159 | Trivial | Limit Job name on jobtracker.jsp to be 80 char long

Job names on jobtracker.jsp should be 80 characters long at most.


Job ACL files now have permissions set to 600 (previously 700).


  • MAPREDUCE-2251 | Major | Remove mapreduce.job.userhistorylocation config

Remove the now defunct property `mapreduce.job.userhistorylocation`.


  • HADOOP-7156 | Critical | getpwuid_r is not thread-safe on RHEL6

Adds a new configuration hadoop.work.around.non.threadsafe.getpwuid which can be used to enable a mutex around this call to workaround thread-unsafe implementations of getpwuid_r. Users should consult http://wiki.apache.org/hadoop/KnownBrokenPwuidImplementations for a list of such systems.
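
The workaround is the classic "serialize calls through a mutex" pattern. A generic sketch follows; the `serialized` wrapper is hypothetical and stands in for the native-code mutex around getpwuid_r.

```python
import threading


def serialized(func, lock=None):
    """Wrap `func` so that only one thread at a time can be inside it."""
    lock = lock or threading.Lock()

    def wrapper(*args, **kwargs):
        with lock:
            return func(*args, **kwargs)

    return wrapper
```

A caller could, for example, wrap a user-lookup function once at startup and route every lookup through the wrapped version.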


  • HDFS-1596 | Major | Move secondary namenode checkpoint configs from core-default.xml to hdfs-default.xml

Removed references to the older fs.checkpoint.* properties that resided in core-site.xml


  • HADOOP-7117 | Major | Move secondary namenode checkpoint configs from core-default.xml to hdfs-default.xml

Removed references to the older fs.checkpoint.* properties that resided in core-site.xml


  • HADOOP-6949 | Major | Reduces RPC packet size for primitive arrays, especially long[], which is used at block reporting

Increments the RPC protocol version in org.apache.hadoop.ipc.Server from 4 to 5. Introduces ArrayPrimitiveWritable for a much more efficient wire format to transmit arrays of primitives over RPC. ObjectWritable uses the new writable for array of primitives for RPC and continues to use existing format for on-disk data.
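
To see why a dedicated array format is smaller, compare tagging every element with its type against writing the type once for the whole array. Both encodings below are illustrative layouts invented for this sketch; neither is the actual ObjectWritable or ArrayPrimitiveWritable wire format.

```python
import struct

TYPE_TAG = b"long"  # hypothetical per-type label


def encode_tagged(values):
    """Length, then a (tag length, tag, 8-byte value) triple per element."""
    out = struct.pack(">i", len(values))
    for value in values:
        out += struct.pack(">H", len(TYPE_TAG)) + TYPE_TAG + struct.pack(">q", value)
    return out


def encode_packed(values):
    """One tag and one length for the whole array, then raw 8-byte values."""
    header = struct.pack(">H", len(TYPE_TAG)) + TYPE_TAG + struct.pack(">i", len(values))
    return header + struct.pack(f">{len(values)}q", *values)
```

In this toy layout the per-element overhead is 6 bytes on top of each 8-byte long, so the packed form approaches a bit over half the tagged size as arrays grow.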


  • HADOOP-7193 | Minor | Help message is wrong for touchz command.

Updated the help for the touchz command.


  • HADOOP-7229 | Major | Absolute path to kinit in auto-renewal thread

When Hadoop's Kerberos integration is enabled, it is now required that either `kinit` be on the path for user accounts running the Hadoop client, or that the `hadoop.kerberos.kinit.command` configuration option be manually set to the absolute path to `kinit`.


  • MAPREDUCE-2410 | Minor | document multiple keys per reducer oddity in hadoop streaming FAQ

Add an FAQ entry regarding the differences between Java API and Streaming development of MR programs.


  • HDFS-1825 | Major | Remove thriftfs contrib

Removed thriftfs contrib component.


Removed contrib related build targets.


  • HADOOP-7192 | Trivial | fs -stat docs aren't updated to reflect the format features

Updated the web documentation to reflect the formatting abilities of 'fs -stat'.


  • HADOOP-7302 | Major | webinterface.private.actions should not be in common

Option webinterface.private.actions has been renamed to mapreduce.jobtracker.webinterface.trusted and should be specified in mapred-site.xml instead of core-site.xml


Configuration option webinterface.private.actions has been renamed to mapreduce.jobtracker.webinterface.trusted


  • HDFS-1948 | Major | Forward port 'hdfs-1520 lightweight namenode operation to trigger lease reccovery'

Adds a method to NameNode/ClientProtocol that allows a forced ("rude") revoke of the lease from the current lease holder.


Confirmed that the problem of finding the ivy file occurs without the patch with ant 1.7, and not with the patch (with either ant 1.7 or 1.8). Other unit tests are still failing the test steps themselves on my laptop, but that is not due to not finding the ivy file.


  • MAPREDUCE-1118 | Major | Capacity Scheduler scheduling information is hard to read / should be tabular format

Add CapacityScheduler servlet to enhance web UI for queue information.