Yahoo! Distribution of Hadoop Change Log

Patches from the following Apache Jira issues have been applied
to this release in the order indicated. This is in addition to
the patches applied from issues referenced in CHANGES.txt.

yahoo-hadoop-0.20.1-3195383008
    HADOOP-6521. Fix backward compatibility issue with umask when applications
    use deprecated param dfs.umask in configuration or use 
    FsPermission.setUMask().  (suresh) 

    MAPREDUCE-1372. Fixed a ConcurrentModificationException in jobtracker.
    (Arun C Murthy via yhemanth)

    MAPREDUCE-1316. Fix jobs' retirement from the JobTracker to prevent memory
    leaks via stale references. (Amar Kamat via acmurthy)  

    MAPREDUCE-1342. Fixed deadlock in global blacklisting of tasktrackers. 
    (Amareshwari Sriramadasu via acmurthy)  

    HADOOP-6460. Reinitializes buffers used for serializing responses in ipc
    server on exceeding maximum response size to free up Java heap. (suresh)

    MAPREDUCE-1100. Truncate user logs to prevent TaskTrackers' disks from
    filling up. (Vinod Kumar Vavilapalli via acmurthy) 

    MAPREDUCE-1143. Fix running task counters to be updated correctly
    when speculative attempts are running for a TIP.
    (Rahul Kumar Singh via yhemanth)

    HADOOP-6151, 6281, 6285,  6441. Add HTML quoting of the parameters to all
    of the servlets to prevent XSS attacks. (omalley)

    MAPREDUCE-896. Fix bug in earlier implementation to prevent
    spurious logging in tasktracker logs for absent file paths.
    (Ravi Gummadi via yhemanth)

    MAPREDUCE-676. Fix Hadoop Vaidya to ensure it works for map-only jobs. 
    (Suhas Gogate via acmurthy) 

    HADOOP-5582. Fix Hadoop Vaidya to use new Counters in
    org.apache.hadoop.mapreduce package. (Suhas Gogate via acmurthy) 

    HDFS-595.  umask settings in configuration may now use octal or 
    symbolic instead of decimal.  Update HDFS tests as such.  (jghoman)

    MAPREDUCE-1068. Added a verbose error message when user specifies an
    incorrect -file parameter. (Amareshwari Sriramadasu via acmurthy)  

    MAPREDUCE-1171. Allow the read-error notification in shuffle to be
    configurable. (Amareshwari Sriramadasu via acmurthy) 

    MAPREDUCE-353. Allow shuffle read and connection timeouts to be
    configurable. (Amareshwari Sriramadasu via acmurthy) 

    HADOOP-6428. HttpServer sleeps with negative values (cos)

    HADOOP-6386. NameNode's HttpServer can't instantiate InetSocketAddress:
    IllegalArgumentException is thrown. (cos)

    HDFS-781. Namenode metrics PendingDeletionBlocks is not decremented. (suresh)
 
    MAPREDUCE-1185. Redirect running job url to history url if job is already 
    retired. (Amareshwari Sriramadasu and Sharad Agarwal via sharad)

    MAPREDUCE-754. Fix NPE in expiry thread when a TT is lost. (Amar Kamat 
    via sharad)

    MAPREDUCE-896. Modify permissions for local files on tasktracker before
    deletion so they can be deleted cleanly. (Ravi Gummadi via yhemanth)
 
    HADOOP-5771. Implements unit tests for LinuxTaskController.
    (Sreekanth Ramakrishnan and Vinod Kumar Vavilapalli via yhemanth)

    MAPREDUCE-1124. Import Gridmix3 and Rumen. (cdouglas)

    MAPREDUCE-1063. Document gridmix benchmark. (cdouglas)

    HDFS-758. Changes to report status of decommissioning on the namenode web
    UI. (jitendra)

    HADOOP-6234. Add new option dfs.umaskmode to set umask in configuration
    to use octal or symbolic instead of decimal. (Jakob Homan via suresh)
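
    Illustration for HADOOP-6234 (not part of the patch): a minimal Python
    sketch of why the base of a umask value matters. The same digits, 022,
    give a different mask when read as octal than when read as decimal. The
    helper name apply_umask is hypothetical and exists only for this example.

```python
def apply_umask(mode: int, umask: int) -> int:
    """Clear the permission bits in mode that are set in umask."""
    return mode & ~umask & 0o777


octal_mask = int("022", 8)     # digits read as octal
decimal_mask = int("022", 10)  # same digits read as decimal: 22 == 0o26

print(oct(apply_umask(0o666, octal_mask)))    # 0o644
print(oct(apply_umask(0o666, decimal_mask)))  # 0o640
```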

    MAPREDUCE-1147. Add map output counters to new API. (Amar Kamat via
    cdouglas)

    MAPREDUCE-1182. Fix overflow in reduce causing allocations to exceed the
    configured threshold. (cdouglas)

    HADOOP-4933. Fixes a ConcurrentModificationException problem that shows up
    when the history viewer is accessed concurrently.
    (Amar Kamat via ddas)

    MAPREDUCE-1140. Fix DistributedCache to not decrement reference counts for 
    unreferenced files in error conditions.
    (Amareshwari Sriramadasu via yhemanth)

    HADOOP-6203. FsShell rm/rmr error message indicates exceeding Trash quota
    and suggests using -skipTrash, when moving to trash fails.
    (Boris Shkolnik via suresh)

    HADOOP-5675. Do not launch a job if DistCp has no work to do. (Tsz Wo
    (Nicholas), SZE via cdouglas)

    HDFS-457. Better handling of volume failure in Data Node storage.
    This fix is a port from hdfs-0.22 to common-0.20 by Boris Shkolnik.
    Contributed by Erik Steffl

    HDFS-625. Fix NullPointerException thrown from ListPathServlet. 
    Contributed by Suresh Srinivas.

    HADOOP-6343. Log unexpected throwable object caught in RPC.  
    Contributed by Jitendra Nath Pandey

yahoo-hadoop-0.20.1-3092118007:

    MAPREDUCE-1186. Fixed DistributedCache to do a recursive chmod on just the
    per-cache directory, not all of mapred.local.dir.
    (Amareshwari Sriramadasu via acmurthy)

    MAPREDUCE-1231. Add an option to distcp to ignore checksums when used with
    the upgrade option.
    (Jothi Padmanabhan via yhemanth)

yahoo-hadoop-0.20.1-3092118006:

    MAPREDUCE-1219. Fixed JobTracker to not collect per-job metrics, thus
    easing load on it. (Amareshwari Sriramadasu via acmurthy)

    HDFS-761. Fix failure to process rename operation from edits log due to
    quota verification. (suresh)

yahoo-hadoop-0.20.1-3092118005:

    MAPREDUCE-1196. Fix FileOutputCommitter to use the deprecated cleanupJob
    api correctly. (acmurthy)

yahoo-hadoop-0.20.1-3092118004:

    HADOOP-6344.  rm and rmr immediately delete files rather than sending
    to trash, despite trash being enabled, if a user is over-quota. (jhoman)

    MAPREDUCE-1160. Reduce verbosity of log lines in some Map/Reduce classes
    to avoid filling up jobtracker logs on a busy cluster.
    (Ravi Gummadi and Hong Tang via yhemanth)

    HDFS-587. Add ability to run HDFS with MR test on non-default queue,
    also updated junit dependency from junit-3.8.1 to junit-4.5 (to make
    it possible to use Configured and Tool to process command line to
    be able to specify a queue). Contributed by Erik Steffl.

    MAPREDUCE-1158. Fix JT running maps and running reduces metrics.
    (sharad)

    MAPREDUCE-947. Fix bug in earlier implementation that was
    causing unit tests to fail.
    (Ravi Gummadi via yhemanth)

    MAPREDUCE-1062. Fix MRReliabilityTest to work with retired jobs
    (Contributed by Sreekanth Ramakrishnan)

    MAPREDUCE-1090. Modified log statement in TaskMemoryManagerThread to
    include task attempt id. (yhemanth)

    MAPREDUCE-1098. Fixed the distributed-cache to not do i/o while
    holding a global lock. (Amareshwari Sriramadasu via acmurthy)

    MAPREDUCE-1048. Add occupied/reserved slot usage summary on
    jobtracker UI. (Amareshwari Sriramadasu via sharad)

    MAPREDUCE-1103. Added more metrics to Jobtracker. (sharad)

    MAPREDUCE-947. Added commitJob and abortJob apis to OutputCommitter.
    Enhanced FileOutputCommitter to create a _SUCCESS file for successful
    jobs. (Amar Kamat & Jothi Padmanabhan via acmurthy) 
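
    Illustration for MAPREDUCE-947 (not part of the patch): a minimal Python
    sketch of how a downstream consumer might test for the _SUCCESS file
    before reading job output. It assumes the output directory is reachable
    through the local filesystem; for output on HDFS a filesystem client
    would be needed instead. The helper name job_output_is_complete is
    hypothetical.

```python
import os
import tempfile

SUCCESS_MARKER = "_SUCCESS"  # file name taken from the entry above


def job_output_is_complete(output_dir: str) -> bool:
    """Return True if output_dir contains the success marker file."""
    return os.path.isfile(os.path.join(output_dir, SUCCESS_MARKER))


with tempfile.TemporaryDirectory() as out:
    before = job_output_is_complete(out)
    # Simulate a successfully committed job with an empty marker file.
    open(os.path.join(out, SUCCESS_MARKER), "w").close()
    after = job_output_is_complete(out)

print(before, after)  # False True
```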

    MAPREDUCE-1105. Remove max limit configuration in capacity scheduler in
    favor of max capacity percentage thus allowing the limit to go over
    queue capacity. (Rahul Kumar Singh via yhemanth)

    MAPREDUCE-1086. Setup Hadoop logging environment for tasks to point to
    task related parameters. (Ravi Gummadi via yhemanth)

    MAPREDUCE-739. Allow relative paths to be created inside archives. 
    (mahadev)

    HADOOP-6097. Multiple bugs w/ Hadoop archives (mahadev)

    HADOOP-6231. Allow caching of filesystem instances to be disabled on a
    per-instance basis (ben slusky via mahadev)

    MAPREDUCE-826.  harchive doesn't use ToolRunner / harchive returns 0 even
    if the job fails with exception (koji via mahadev)

    HDFS-686. NullPointerException is thrown while merging edit log and
    image. (hairong)

    HDFS-709. Fix TestDFSShell failure due to rename bug introduced by 
    HDFS-677. (suresh)

    HDFS-677. Rename failure when both source and destination quota exceeds
    results in deletion of source. (suresh)

    HADOOP-6284. Add a new parameter, HADOOP_JAVA_PLATFORM_OPTS, to
    hadoop-config.sh so that it allows setting java command options for
    JAVA_PLATFORM.  (Koji Noguchi via szetszwo)

    MAPREDUCE-732. Removed spurious log statements in the node
    blacklisting logic. (Sreekanth Ramakrishnan via yhemanth)

    MAPREDUCE-144. Includes dump of the process tree in task diagnostics when 
    a task is killed due to exceeding memory limits.
    (Vinod Kumar Vavilapalli via yhemanth)

    MAPREDUCE-979. Fixed JobConf APIs related to memory parameters to 
    return values of new configuration variables when deprecated 
    variables are disabled. (Sreekanth Ramakrishnan via yhemanth)

    MAPREDUCE-277. Makes job history counters available on the job history
    viewers. (Jothi Padmanabhan via ddas)

    HADOOP-5625. Add operation duration to clienttrace. (Lei Xu 
    via cdouglas)

    HADOOP-5222. Add offset to datanode clienttrace. (Lei Xu via cdouglas)

    HADOOP-6218. Adds a feature where TFile can be split by Record
    Sequence number. Contributed by Hong Tang and Raghu Angadi.

yahoo-hadoop-0.20.1-3041192001

    MAPREDUCE-1088. Changed permissions on JobHistory files on local disk to
    0744. Contributed by Arun C. Murthy.

    HADOOP-6304. Use java.io.File.set{Readable|Writable|Executable} where
    possible in RawLocalFileSystem. Contributed by Arun C. Murthy.
    
yahoo-hadoop-0.20.1-3041192000

    MAPREDUCE-270. Fix the tasktracker to optionally send an out-of-band
    heartbeat on task-completion for better job-latency. Contributed by
    Arun C. Murthy
    Configuration changes:
      add mapreduce.tasktracker.outofband.heartbeat

    MAPREDUCE-1030. Fix capacity-scheduler to assign a map and a reduce task
    per-heartbeat. Contributed by Rahul K Singh.

    MAPREDUCE-1028. Fixed number of slots occupied by cleanup tasks to one
    irrespective of slot size for the job. Contributed by Ravi Gummadi. 

    MAPREDUCE-964. Fixed start and finish times of TaskStatus to be
    consistent, thereby fixing inconsistencies in metering tasks.
    Contributed by Sreekanth Ramakrishnan.

    HADOOP-5976. Add a new command, classpath, to the hadoop 
    script. Contributed by Owen O'Malley and Gary Murry

    HADOOP-5784. Makes the number of heartbeats that should arrive 
    a second at the JobTracker configurable. Contributed by 
    Amareshwari Sriramadasu.

    MAPREDUCE-945. Modifies MRBench and TestMapRed to use 
    ToolRunner so that options such as queue name can be 
    passed via command line. Contributed by Sreekanth Ramakrishnan.

yahoo-hadoop-0.20.0-3006291003

    HADOOP-5420 Correct bug in earlier implementation
    by Arun C. Murthy

    HADOOP-5363 Add support for proxying connections to multiple 
    clusters with different versions to hdfsproxy. Contributed 
    by Zhiyong Zhang

    HADOOP-5780. Improve per block message printed by -metaSave
    in HDFS. (Raghu Angadi)

yahoo-hadoop-0.20.0-2957040010
 
    HADOOP-6227. Fix Configuration to allow final parameters to be set 
    to null and prevent them from being overridden. Contributed by 
    Amareshwari Sriramadasu.

yahoo-hadoop-0.20.0-2957040007

    MAPREDUCE-430 Added patch supplied by Amar Kamat to allow roll
    forward on branch to include externally committed patch.

yahoo-hadoop-0.20.0-2957040006

    MAPREDUCE-768. Provide an option to dump jobtracker configuration in 
    JSON format to standard output. Contributed by V.V.Chaitanya

yahoo-hadoop-0.20.0-2957040004

    MAPREDUCE-834 Correct an issue created by merging this issue with
    patch attached to external Jira.

yahoo-hadoop-0.20.0-2957040003

    HADOOP-6184 Provide an API to dump Configuration in a JSON format. 
    Contributed by V.V.Chaitanya Krishna.

    MAPREDUCE-745  Patch added for this issue to allow branch-0.20 to 
    merge cleanly.

yahoo-hadoop-0.20.0-2957040000

    MAPREDUCE-478 Allow map and reduce jvm parameters, environment
    variables and ulimit to be set separately.

    MAPREDUCE-682 Removes reservations on tasktrackers which are blacklisted.
    Contributed by Sreekanth Ramakrishnan.

    HADOOP-5420 Support killing of process groups in LinuxTaskController
    binary

    HADOOP-5488 Removes the pidfile management for the Task JVM from the 
    framework and instead passes the PID back and forth between the 
    TaskTracker and the Task processes. Contributed by Ravi Gummadi.

    MAPREDUCE-467 Provide ability to collect statistics about total tasks and
    succeeded tasks in different time windows.

yahoo-hadoop-0.20.0.2949784002:

    MAPREDUCE-817. Add a cache for retired jobs with minimal job
    info and provide a way to access history file url

    MAPREDUCE-814. Provide a way to configure completed job history
    files to be on HDFS.

    MAPREDUCE-838 Fixes a problem in the way commit of task outputs
    happens. The bug was that even if commit failed, the task would be
    declared as successful. Contributed by Amareshwari Sriramadasu.

yahoo-hadoop-0.20.0.2902658004:

    MAPREDUCE-809 Fix job-summary logs to correctly record final status of 
    FAILED and KILLED jobs.  
    http://issues.apache.org/jira/secure/attachment/12414726/MAPREDUCE-809_0_20090728_yhadoop20.patch 

    MAPREDUCE-740 Log a job-summary at the end of a job, while
    allowing it to be configured to use a custom appender if desired.
    http://issues.apache.org/jira/secure/attachment/12413941/MAPREDUCE-740_2_20090717_yhadoop20.patch

    MAPREDUCE-771 Fixes a bug which delays normal jobs in favor of
    high-ram jobs.
    http://issues.apache.org/jira/secure/attachment/12413990/MAPREDUCE-771-20.patch

    HADOOP-5420 Support setsid based kill in LinuxTaskController.
    http://issues.apache.org/jira/secure/attachment/12414735/5420-ydist.patch.txt

    MAPREDUCE-733 Fixes a bug where an exception is thrown when a task
    tracker is killed. Instead, it should be caught and processed so that
    the rest of the flow can go through.
    http://issues.apache.org/jira/secure/attachment/12413015/MAPREDUCE-733-ydist.patch

    MAPREDUCE-734 Fixes a bug which prevented hi ram jobs from being
    removed from the scheduler queue.
    http://issues.apache.org/jira/secure/attachment/12413035/MAPREDUCE-734-20.patch

    MAPREDUCE-693  Fixes a bug where, if a job is submitted, the JT is
    restarted (before job files have been written), and the job is
    killed after recovery, the conf files fail to be moved to the
    "done" subdirectory.
    http://issues.apache.org/jira/secure/attachment/12412823/MAPREDUCE-693-v1.2-branch-0.20.patch

    MAPREDUCE-722 Fixes a bug where more slots are getting reserved
    for HiRAM job tasks than required.
    http://issues.apache.org/jira/secure/attachment/12412744/MAPREDUCE-722.1.txt

    MAPREDUCE-683 TestJobTrackerRestart failed because of stale
    filemanager cache (which was created once per jvm). This patch makes
    sure that the filemanager is inited upon every JobHistory.init()
    and hence upon every restart. Note that this won't happen in production
    as upon a restart the new jobtracker will start in a new jvm and
    hence a new cache will be created.
    http://issues.apache.org/jira/secure/attachment/12412743/MAPREDUCE-683-v1.2.1-branch-0.20.patch

    MAPREDUCE-709 Fixes a bug where node health check script does
    not display the correct message on timeout.
    http://issues.apache.org/jira/secure/attachment/12412711/mapred-709-ydist.patch

    MAPREDUCE-708 Fixes a bug where node health check script does
    not refresh the "reason for blacklisting".
    http://issues.apache.org/jira/secure/attachment/12412706/MAPREDUCE-708-ydist.patch

    MAPREDUCE-522 Rewrote TestQueueCapacities to make it simpler
    and avoid timeout errors.
    http://issues.apache.org/jira/secure/attachment/12412472/mapred-522-ydist.patch

    MAPREDUCE-532 Provided ability in the capacity scheduler to
    limit the number of slots that can be concurrently used per queue
    at any given time.
    http://issues.apache.org/jira/secure/attachment/12412592/MAPREDUCE-532-20.patch

    MAPREDUCE-211 Provides ability to run a health check script on
    the tasktracker nodes and blacklist nodes if they are unhealthy.
    Contributed by Sreekanth Ramakrishnan.
    http://issues.apache.org/jira/secure/attachment/12412161/mapred-211-internal.patch

    MAPREDUCE-516 Remove .orig file included by mistake.
    http://issues.apache.org/jira/secure/attachment/12412108/HADOOP-5964_2_20090629_yhadoop.patch

    MAPREDUCE-416 Moves the history file to a "done" folder whenever
    a job completes.
    http://issues.apache.org/jira/secure/attachment/12411938/MAPREDUCE-416-v1.6-branch-0.20.patch

    HADOOP-5980 Previously, task spawned off by LinuxTaskController
    didn't get LD_LIBRARY_PATH in their environment. The tasks will now
    get same LD_LIBRARY_PATH value as when spawned off by
    DefaultTaskController.
    http://issues.apache.org/jira/secure/attachment/12410825/hadoop-5980-v20.patch

    HADOOP-5981 This issue completes the feature mentioned in
    HADOOP-2838. HADOOP-2838 provided a way to set env variables in
    child process. This issue provides a way to inherit tt's env variables
    and append or reset it. So now X=$X:y will inherit X (if there) and
    append y to it.
    http://issues.apache.org/jira/secure/attachment/12410454/hadoop5981-branch-20-example.patch

    HADOOP-5419  Improves the existing M/R framework to let users know
    which queues they have access to, and for what operations. One use
    case: currently there is no easy way to know whether a user has
    access to submit jobs to a queue until submission fails with an
    access control exception.
    http://issues.apache.org/jira/secure/attachment/12410824/hadoop-5419-v20.2.patch

    HADOOP-5643 Added the functionality to refresh jobtracker's node
    list via command line (bin/hadoop mradmin -refreshNodes). The command
    should be run as the jobtracker owner (jobtracker process owner)
    or from a super group (mapred.permissions.supergroup).
    http://issues.apache.org/jira/secure/attachment/12410619/Fixed%2B5643-0.20-final


    HADOOP-2838 Users can now set environment variables using
    mapred.child.env. The following forms are supported:
      X=Y    : set X to Y
      X=$X:Y : append Y to X (X is taken from the tasktracker)
    http://issues.apache.org/jira/secure/attachment/12409895/HADOOP-2838-v2.2-branch-20-example.patch
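
    Illustration for HADOOP-2838 (not part of the patch): a minimal Python
    sketch of the X=Y and X=$X:Y forms described above. Three details are
    assumptions made for this example and are not stated in this change
    log: entries are comma-separated, a reference has the form $NAME, and
    a variable that is unset in the parent environment expands to the
    empty string. The helper name resolve_child_env is hypothetical.

```python
import re


def resolve_child_env(spec: str, parent_env: dict) -> dict:
    """Resolve a comma-separated list of NAME=VALUE pairs.

    Each $NAME reference in VALUE is replaced with NAME's value from
    parent_env, or with the empty string if NAME is not set there.
    """
    child = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue  # ignore empty entries
        name, _, value = pair.partition("=")
        child[name] = re.sub(
            r"\$(\w+)",
            lambda m: parent_env.get(m.group(1), ""),
            value,
        )
    return child


parent = {"X": "/usr/lib"}
print(resolve_child_env("A=B,X=$X:/opt/lib", parent))
# {'A': 'B', 'X': '/usr/lib:/opt/lib'}
```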
    
    HADOOP-5818. Revert the renaming from FSNamesystem.checkSuperuserPrivilege
    to checkAccess by HADOOP-5643.  (Amar Kamat via szetszwo)
    https://issues.apache.org/jira/secure/attachment/12409835/5818for0.20.patch

    HADOOP-5801. Fixes the problem: If the hosts file is changed across restart
    then it should be refreshed upon recovery so that the excluded hosts are
    lost and the maps are re-executed. (Amar Kamat via ddas)
    https://issues.apache.org/jira/secure/attachment/12409834/5801-0.20.patch

    HADOOP-5643. Adds a way to decommission TaskTrackers
    while the JobTracker is running. (Amar Kamat via ddas)
    https://issues.apache.org/jira/secure/attachment/12409833/Fixed+5643-0.20

    HADOOP-5419. Provide a facility to query the Queue ACLs for the
    current user.  (Rahul Kumar Singh via yhemanth)
    http://issues.apache.org/jira/secure/attachment/12409323/hadoop-5419-v20.patch

    HADOOP-5733. Add map/reduce slot capacity and blacklisted capacity to
    JobTracker metrics. (Sreekanth Ramakrishnan via cdouglas)
    http://issues.apache.org/jira/secure/attachment/12409322/hadoop-5733-v20.patch

    HADOOP-5738. Split "waiting_tasks" JobTracker metric into waiting maps and
    waiting reduces. (Sreekanth Ramakrishnan via cdouglas) 
    https://issues.apache.org/jira/secure/attachment/12409321/5738-y20.patch

    HADOOP-4842. Streaming now allows specifying a command for the combiner.
    (Amareshwari Sriramadasu via ddas)
    http://issues.apache.org/jira/secure/attachment/12402355/patch-4842-3.txt

    HADOOP-4490. Provide ability to run tasks as job owners.
    (Sreekanth Ramakrishnan via yhemanth)
    http://issues.apache.org/jira/secure/attachment/12409318/hadoop-4490-br20-3.patch
    https://issues.apache.org/jira/secure/attachment/12410170/hadoop-4490-br20-3.2.patch

    HADOOP-5442. Paginate jobhistory display and added some search
    capabilities. (Amar Kamat via acmurthy)
    http://issues.apache.org/jira/secure/attachment/12402301/HADOOP-5442-v1.12.patch

    HADOOP-3327. Improves handling of READ_TIMEOUT during map output copying.
    (Amareshwari Sriramadasu via ddas)
    http://issues.apache.org/jira/secure/attachment/12399449/patch-3327-2.txt

    HADOOP-5113. Fixed logcondense to remove files for usernames
    beginning with characters specified in the -l option.
    (Peeyush Bishnoi via yhemanth)
    http://issues.apache.org/jira/secure/attachment/12409317/hadoop-5113-0.18.txt

    HADOOP-2898. Provide an option to specify a port range for
    Hadoop services provisioned by HOD.
    (Peeyush Bishnoi via yhemanth)
    http://issues.apache.org/jira/secure/attachment/12409316/hadoop-2898-0.20.txt

    HADOOP-4930. Implement a Linux native executable that can be used to
    launch tasks as users. (Sreekanth Ramakrishnan via yhemanth)
    http://issues.apache.org/jira/secure/attachment/12409402/hadoop-4930v20.patch