update #8

Merged
merged 315 commits on Oct 20, 2014
315 commits
116016b
[SPARK-3582][SQL] not limit argument type for hive simple udf
adrian-wang Sep 23, 2014
3b8eefa
[SPARK-3536][SQL] SELECT on empty parquet table throws exception
ravipesala Sep 23, 2014
e73b48a
SPARK-2745 [STREAMING] Add Java friendly methods to Duration class
srowen Sep 23, 2014
ae60f8f
[SPARK-3481][SQL] removes the evil MINOR HACK
scwf Sep 23, 2014
1c62f97
[SPARK-3268][SQL] DoubleType, FloatType and DecimalType modulus support
gvramana Sep 23, 2014
a08153f
[SPARK-3646][SQL] Copy SQL configuration from SparkConf when a SQLCon…
marmbrus Sep 23, 2014
8dfe79f
[SPARK-3647] Add more exceptions to Guava relocation.
Sep 23, 2014
d79238d
SPARK-3612. Executor shouldn't quit if heartbeat message fails to rea…
sryza Sep 23, 2014
b3fef50
[SPARK-3653] Respect SPARK_*_MEMORY for cluster mode
andrewor14 Sep 23, 2014
729952a
[SPARK-1853] Show Streaming application code context (file, line numb…
mubarak Sep 23, 2014
c429126
[Build] Diff from branch point
nchammas Sep 24, 2014
50f8633
[SPARK-3659] Set EC2 version to 1.1.0 and update version map
shivaram Sep 24, 2014
c854b9f
[SPARK-3634] [PySpark] User's module should take precedence over syst…
davies Sep 24, 2014
bb96012
[SPARK-3679] [PySpark] pickle the exact globals of functions
davies Sep 24, 2014
74fb2ec
[SPARK-3615][Streaming]Fix Kafka unit test hard coded Zookeeper port …
jerryshao Sep 25, 2014
8ca4ecb
[SPARK-546] Add full outer join to RDD and DStream.
staple Sep 25, 2014
b848771
[SPARK-2778] [yarn] Add yarn integration tests.
Sep 25, 2014
c3f2a85
SPARK-2932 [STREAMING] Move MasterFailureTest out of "main" source di…
srowen Sep 25, 2014
9b56e24
[SPARK-3690] Closing shuffle writers we swallow more important exception
epahomov Sep 25, 2014
ff637c9
[SPARK-1484][MLLIB] Warn when running an iterative algorithm on uncac…
staple Sep 25, 2014
0dc868e
[SPARK-3584] sbin/slaves doesn't work when we use password authentica…
sarutak Sep 25, 2014
86bce76
SPARK-2634: Change MapOutputTrackerWorker.mapStatuses to ConcurrentHa…
zsxwing Sep 26, 2014
b235e01
[SPARK-3686][STREAMING] Wait for sink to commit the channel before ch…
harishreedharan Sep 26, 2014
1aa549b
[SPARK-3478] [PySpark] Profile the Python tasks
davies Sep 26, 2014
d16e161
SPARK-3639 | Removed settings master in examples
aniketbhatnagar Sep 26, 2014
ec9df6a
[SPARK-3614][MLLIB] Add minimumOccurence filtering to IDF
rnowling Sep 26, 2014
30461c6
[SPARK-3695]shuffle fetch fail output
adrian-wang Sep 26, 2014
8da10bf
[SPARK-3476] Remove outdated memory checks in Yarn
andrewor14 Sep 26, 2014
0ec2d2e
[SPARK-3531][SQL]select null from table would throw a MatchError
adrian-wang Sep 26, 2014
7364fa5
[SPARK-3393] [SQL] Align the log4j configuration for Spark & SparkSQLCLI
chenghao-intel Sep 26, 2014
f872e4f
Revert "[SPARK-3478] [PySpark] Profile the Python tasks"
JoshRosen Sep 26, 2014
5e34855
[SPARK-3543] Write TaskContext in Java and expose it through a static…
ScrapCodes Sep 27, 2014
a3feaf0
Close #2194.
rxin Sep 27, 2014
e976ca2
Slaves file is now a template.
sarahgerweck Sep 27, 2014
0cdcdd2
[Build]remove spark-staging-1030
scwf Sep 27, 2014
f0eea76
[SQL][DOCS] Clarify that the server is for JDBC and ODBC
marmbrus Sep 27, 2014
d8a9d1d
[SPARK-3675][SQL] Allow starting a JDBC server on an existing context
marmbrus Sep 27, 2014
9e8ced7
stop, start and destroy require the EC2_REGION
jeffsteinmetz Sep 27, 2014
2d972fd
[SPARK-1021] Defer the data-driven computation of partition bounds in…
erikerlandson Sep 27, 2014
436a773
Minor cleanup to tighten visibility and remove compilation warning.
rxin Sep 27, 2014
66107f4
Docs : use "--total-executor-cores" rather than "--cores" after spark…
CrazyJvm Sep 27, 2014
0800881
[SPARK-3676][SQL] Fix hive test suite failure due to diffs in JDK 1.6…
scwf Sep 27, 2014
f0c7e19
[SPARK-3680][SQL] Fix bug caused by eager typing of HiveGenericUDFs
marmbrus Sep 27, 2014
0d8cdf0
[SPARK-3681] [SQL] [PySpark] fix serialization of List and Map in Sch…
davies Sep 27, 2014
5b922bb
[SPARK-3543] Clean up Java TaskContext implementation.
rxin Sep 27, 2014
2482329
[SPARK-3389] Add Converter for ease of Parquet reading in PySpark
laserson Sep 28, 2014
9966d1a
SPARK-CORE [SPARK-3651] Group common CoarseGrainedSchedulerBackend va…
tigerquoll Sep 28, 2014
66e1c40
Minor fix for the previous commit.
rxin Sep 28, 2014
6918012
SPARK-3699: SQL and Hive console tasks now clean up appropriately
willb Sep 28, 2014
1f13a40
[SPARK-3715][Docs]minor typo
WangTaoTheTonic Sep 29, 2014
8e87418
Revert "[SPARK-1021] Defer the data-driven computation of partition b…
rxin Sep 29, 2014
25164a8
SPARK-2761 refactor #maybeSpill into Spillable
Sep 29, 2014
f350cd3
[SPARK-3543] TaskContext remaining cleanup work.
rxin Sep 29, 2014
0dc2b63
[SPARK-1545] [mllib] Add Random Forests
jkbradley Sep 29, 2014
1651cc1
[EC2] Cleanup Python parens and disk dict
nchammas Sep 29, 2014
657bdff
[CORE] Bugfix: LogErr format in DAGScheduler.scala
liyezhang556520 Sep 29, 2014
aedd251
[EC2] Sort long, manually-inputted dictionaries
nchammas Sep 29, 2014
587a0cd
[MLlib] [SPARK-2885] DIMSUM: All-pairs similarity
rezazadeh Sep 29, 2014
dab1b0a
[SPARK-3032][Shuffle] Fix key comparison integer overflow introduced …
jerryshao Sep 29, 2014
e43c72f
Add more debug message for ManagedBuffer
rxin Sep 29, 2014
0bbe7fa
[SPARK-3007][SQL]Add Dynamic Partition support to Spark Sql hive
baishuo Sep 29, 2014
51229ff
[graphX] GraphOps: random pick vertex bug
yingjieMiao Sep 30, 2014
dc30e45
Fixed the condition in StronglyConnectedComponents Issue: SPARK-3635
Sep 30, 2014
210404a
Minor cleanup of code.
rxin Sep 30, 2014
6b79bfb
[SPARK-3613] Record only average block size in MapStatus for large st…
rxin Sep 30, 2014
de700d3
[SPARK-3709] Executors don't always report broadcast block removal pr…
rxin Sep 30, 2014
b167a8c
[SPARK-3734] DriverRunner should not read SPARK_HOME from submitter's…
JoshRosen Sep 30, 2014
b64fcbd
Revert "[SPARK-3007][SQL]Add Dynamic Partition support to Spark Sql h…
pwendell Sep 30, 2014
157e7d0
HOTFIX: Ignore flaky tests in YARN
pwendell Sep 30, 2014
ab6dd80
[SPARK-3356] [DOCS] Document when RDD elements' ordering within parti…
srowen Sep 30, 2014
a01a309
SPARK-3745 - fix check-license to properly download and check jar
shaneknapp Sep 30, 2014
d3a3840
[Build] Post commit hash with timeout messages
nchammas Sep 30, 2014
8764fe3
SPARK-3744 [STREAMING] FlumeStreamSuite will fail during port contention
srowen Sep 30, 2014
6c696d7
Remove compiler warning from TaskContext change.
rxin Sep 30, 2014
d75496b
[SPARK-3701][MLLIB] update python linalg api and small fixes
mengxr Oct 1, 2014
c5414b6
[SPARK-3478] [PySpark] Profile the Python tasks
davies Oct 1, 2014
eb43043
[SPARK-3747] TaskResultGetter could incorrectly abort a stage if it c…
rxin Oct 1, 2014
7bf6cc9
[SPARK-3751] [mllib] DecisionTree: example update + print options
jkbradley Oct 1, 2014
3888ee2
[SPARK-3748] Log thread name in unit test logs
rxin Oct 1, 2014
0bfd3af
[SPARK-3757] mvn clean doesn't delete some files
tsudukim Oct 1, 2014
abf588f
[SPARK-3749] [PySpark] fix bugs in broadcast large closure of RDD
davies Oct 1, 2014
dcb2f73
SPARK-2626 [DOCS] Stop SparkContext in all examples
srowen Oct 1, 2014
6390aae
[SPARK-3755][Core] Do not bind port 1 - 1024 to server in spark
scwf Oct 1, 2014
2fedb5d
[SPARK-3756] [Core]check exception is caused by an address-port colli…
scwf Oct 1, 2014
8cc70e7
[SQL] Kill dangerous trailing space in query string
liancheng Oct 1, 2014
b81ee0b
Typo error in KafkaWordCount example
gasparms Oct 1, 2014
17333c7
Python SQL Example Code
jyotiska Oct 1, 2014
fcad3fa
[SPARK-3746][SQL] Lock hive client when creating tables
marmbrus Oct 1, 2014
d61f2c1
[SPARK-3658][SQL] Start thrift server as a daemon
WangTaoTheTonic Oct 1, 2014
3508ce8
[SPARK-3708][SQL] Backticks aren't handled correctly is aliases
ravipesala Oct 1, 2014
f315fb7
[SPARK-3705][SQL] Add case for VoidObjectInspector to cover NullType
scwf Oct 1, 2014
f84b228
[SPARK-3593][SQL] Add support for sorting BinaryType
gvramana Oct 1, 2014
a31f4ff
[SQL] Made Command.sideEffectResult protected
liancheng Oct 1, 2014
4e79970
Revert "[SPARK-3755][Core] Do not bind port 1 - 1024 to server in spark"
pwendell Oct 1, 2014
45e058c
[SPARK-3729][SQL] Do all hive session state initialization in lazy val
marmbrus Oct 1, 2014
1b9f0d6
[SPARK-3704][SQL] Fix ColumnValue type for Short values in thrift server
scwf Oct 1, 2014
93861a5
SPARK-3638 | Forced a compatible version of http client in kinesis-as…
aniketbhatnagar Oct 2, 2014
29c3513
[SPARK-3446] Expose underlying job ids in FutureAction.
Oct 2, 2014
f341e1c
MAINTENANCE: Automated closing of pull requests.
pwendell Oct 2, 2014
bbdf1de
[SPARK-3371][SQL] Renaming a function expression with group by gives …
ravipesala Oct 2, 2014
6e27cb6
SPARK-1767: Prefer HDFS-cached replicas when scheduling data-local tasks
Oct 2, 2014
5b4a5b1
[SPARK-3706][PySpark] Cannot run IPython REPL with IPYTHON set to "1"…
cocoatomo Oct 2, 2014
82a6a08
[SQL][Docs] Update the output of printSchema and fix a typo in SQL pr…
yhuai Oct 2, 2014
b4fb7b8
Modify default YARN memory_overhead-- from an additive constant to a …
nishkamravi2 Oct 2, 2014
c6469a0
[SPARK-3766][Doc]Snappy is also the default compress codec for broadc…
scwf Oct 2, 2014
5db78e6
[SPARK-3495] Block replication fails continuously when the replicatio…
tdas Oct 2, 2014
127e97b
[SPARK-3632] ConnectionManager can run out of receive threads with au…
tgravescs Oct 2, 2014
8081ce8
[SPARK-3755][Core] avoid trying privileged port when request a non-pr…
scwf Oct 3, 2014
42d5077
[DEPLOY] SPARK-3759: Return the exit code of the driver process
Oct 3, 2014
7de4e50
[SQL] Initilize session state before creating CommandProcessor
marmbrus Oct 3, 2014
1c90347
[SPARK-3654][SQL] Implement all extended HiveQL statements/commands w…
ravipesala Oct 3, 2014
2e4eae3
[SPARK-3366][MLLIB]Compute best splits distributively in decision tree
Oct 3, 2014
f0811f9
SPARK-2058: Overriding SPARK_HOME/conf with SPARK_CONF_DIR
EugenCepoi Oct 3, 2014
9d320e2
[SPARK-3696]Do not override the user-difined conf_dir
WangTaoTheTonic Oct 3, 2014
22f8e1e
[SPARK-2693][SQL] Supported for UDAF Hive Aggregates like PERCENTILE
ravipesala Oct 3, 2014
fbe8e98
[SPARK-2778] [yarn] Add workaround for race in MiniYARNCluster.
Oct 3, 2014
bec0d0e
[SPARK-3007][SQL] Adds dynamic partitioning support
liancheng Oct 3, 2014
6a1d48f
[SPARK-3212][SQL] Use logical plan matching instead of temporary tabl…
marmbrus Oct 3, 2014
a8c52d5
[SPARK-3535][Mesos] Fix resource handling.
brndnmtthws Oct 3, 2014
358d7ff
[SPARK-3775] Not suitable error message in spark-shell.cmd
tsudukim Oct 3, 2014
e5566e0
[SPARK-3774] typo comment in bin/utils.sh
tsudukim Oct 3, 2014
30abef1
[SPARK-3606] [yarn] Correctly configure AmIpFilter for Yarn HA.
Oct 3, 2014
1eb8389
[SPARK-3763] The example of building with sbt should be "sbt assembly…
sarutak Oct 3, 2014
79e45c9
[SPARK-3377] [SPARK-3610] Metrics can be accidentally aggregated / Hi…
sarutak Oct 3, 2014
cf1d32e
[SPARK-1860] More conservative app directory cleanup.
mccheah Oct 3, 2014
32fad42
[SPARK-3597][Mesos] Implement `killTask`.
brndnmtthws Oct 5, 2014
a7c7313
SPARK-1656: Fix potential resource leaks
zsxwing Oct 5, 2014
1b97a94
[SPARK-3007][SQL] Fixes dynamic partitioning support for lower Hadoop…
liancheng Oct 5, 2014
e222221
HOTFIX: Fix unicode error in merge script.
pwendell Oct 5, 2014
79b2108
[Minor] Trivial fix to make codes more readable
viirya Oct 6, 2014
58f5361
[SPARK-3792][SQL] Enable JavaHiveQLSuite
scwf Oct 6, 2014
34b97a0
[SPARK-3645][SQL] Makes table caching eager by default and adds synta…
liancheng Oct 6, 2014
90897ea
[SPARK-3776][SQL] Wrong conversion to Catalyst for Option[Product]
Oct 6, 2014
8d22dbb
SPARK-3794 [CORE] Building spark core fails due to inadvertent depend…
srowen Oct 6, 2014
fd7b155
Rectify gereneric parameter names between SparkContext and Accumulabl…
Oct 6, 2014
c9ae79f
[SPARK-3765][Doc] Add test information to sbt build docs
scwf Oct 6, 2014
20ea54c
[SPARK-2461] [PySpark] Add a toString method to GeneralizedLinearModel
sryza Oct 6, 2014
4f01265
[SPARK-3786] [PySpark] speedup tests
davies Oct 6, 2014
2300eb5
[SPARK-3773][PySpark][Doc] Sphinx build warning
cocoatomo Oct 6, 2014
69c3f44
[SPARK-3479] [Build] Report failed test category
nchammas Oct 6, 2014
70e824f
[SPARK-3627] - [yarn] - fix exit code and final status reporting to RM
tgravescs Oct 7, 2014
d65fd55
[SPARK-3827] Very long RDD names are not rendered properly in web UI
falaki Oct 7, 2014
12e2551
[SPARK-3808] PySpark fails to start in Windows
tsudukim Oct 7, 2014
6550329
[SPARK-3762] clear reference of SparkEnv after stop
davies Oct 7, 2014
bc87cc4
[SPARK-3731] [PySpark] fix memory leak in PythonRDD
davies Oct 7, 2014
553737c
[SPARK-3825] Log more detail when unrolling a block fails
andrewor14 Oct 7, 2014
446063e
[SPARK-3777] Display "Executor ID" for Tasks in Stage page
zsxwing Oct 7, 2014
3d7b36e
[SPARK-3790][MLlib] CosineSimilarity Example
rezazadeh Oct 7, 2014
098c734
[SPARK-3486][MLlib][PySpark] PySpark support for Word2Vec
Ishiihara Oct 7, 2014
b32bb72
[SPARK-3832][MLlib] Upgrade Breeze dependency to 0.10
dbtsai Oct 7, 2014
5912ca6
[SPARK-3398] [EC2] Have spark-ec2 intelligently wait for specific clu…
nchammas Oct 7, 2014
b69c9fb
[SPARK-3829] Make Spark logo image on the header of HistoryPage as a …
sarutak Oct 7, 2014
798ed22
[SPARK-3412] [PySpark] Replace Epydoc with Sphinx to generate Python …
davies Oct 8, 2014
c781843
[SPARK-3836] [REPL] Spark REPL optionally propagate internal exceptions
ahirreddy Oct 8, 2014
35afdfd
[SPARK-3710] Fix Yarn integration tests on Hadoop 2.2.
Oct 8, 2014
7fca8f4
[SPARK-3788] [yarn] Fix compareFs to do the right thing for HDFS name…
Oct 8, 2014
f18dd59
[SPARK-3848] yarn alpha doesn't build on master
sarutak Oct 8, 2014
bc44187
HOTFIX: Use correct Hadoop profile in build
pwendell Oct 8, 2014
b92bd5a
[SPARK-3841] [mllib] Pretty-print params for ML examples
jkbradley Oct 8, 2014
add174a
[SPARK-3843][Minor] Cleanup scalastyle.txt at the end of running dev/…
sarutak Oct 8, 2014
a85f24a
[SPARK-3831] [SQL] Filter rule Improvement and bool expression optimi…
sarutak Oct 9, 2014
a42cc08
[SPARK-3713][SQL] Uses JSON to serialize DataType objects
liancheng Oct 9, 2014
00b7791
[SQL][Doc] Keep Spark SQL README.md up to date
Ishiihara Oct 9, 2014
4ec9319
[SPARK-3707] [SQL] Fix bug of type coercion in DIV
chenghao-intel Oct 9, 2014
e703357
[SPARK-3810][SQL] Makes PreInsertionCasts handle partitions properly
liancheng Oct 9, 2014
3e4f09d
[SQL] Prevents per row dynamic dispatching and pattern matching when …
liancheng Oct 9, 2014
bcb1ae0
[SPARK-3857] Create joins package for various join operators.
rxin Oct 9, 2014
f706823
Fetch from branch v4 in Spark EC2 script.
JoshRosen Oct 9, 2014
9c439d3
[SPARK-3856][MLLIB] use norm operator after breeze 0.10 upgrade
mengxr Oct 9, 2014
b9df8af
[SPARK-2805] Upgrade to akka 2.3.4
avati Oct 9, 2014
86b3929
[SPARK-3844][UI] Truncate appName in WebUI if it is too long
mengxr Oct 9, 2014
13cab5b
add spark.driver.memory to config docs
nartz Oct 9, 2014
14f222f
[SPARK-3158][MLLIB]Avoid 1 extra aggregation for DecisionTree training
chouqin Oct 9, 2014
1e0aa4d
[Minor] use norm operator after breeze 0.10 upgrade
witgo Oct 9, 2014
73bf3f2
[SPARK-3741] Make ConnectionManager propagate errors properly and add…
zsxwing Oct 9, 2014
b77a02f
[SPARK-3752][SQL]: Add tests for different UDF's
vidaha Oct 9, 2014
752e90f
[SPARK-3711][SQL] Optimize where in clause filter queries
Oct 9, 2014
2c88513
[SPARK-3806][SQL] Minor fix for CliSuite
scwf Oct 9, 2014
e7edb72
[SPARK-3868][PySpark] Hard to recognize which module is tested from u…
cocoatomo Oct 9, 2014
ec4d40e
[SPARK-3853][SQL] JSON Schema support for Timestamp fields
Oct 9, 2014
1faa113
Revert "[SPARK-2805] Upgrade to akka 2.3.4"
pwendell Oct 9, 2014
1c7f0ab
[SPARK-3339][SQL] Support for skipping json lines that fail to parse
yhuai Oct 9, 2014
0c0e09f
[SPARK-3412][SQL]add missing row api
adrian-wang Oct 9, 2014
bc3b6cb
[SPARK-3858][SQL] Pass the generator alias into logical plan node
Oct 9, 2014
ac30205
[SPARK-3813][SQL] Support "case when" conditional functions in Spark …
ravipesala Oct 9, 2014
4e9b551
[SPARK-3772] Allow `ipython` to be used by Pyspark workers; IPython s…
JoshRosen Oct 9, 2014
2837bf8
[SPARK-3798][SQL] Store the output of a generator in a val
marmbrus Oct 10, 2014
363baac
SPARK-3811 [CORE] More robust / standard Utils.deleteRecursively, Uti…
srowen Oct 10, 2014
edf02da
[SPARK-3654][SQL] Unifies SQL and HiveQL parsers
liancheng Oct 10, 2014
421382d
[SPARK-3824][SQL] Sets in-memory table default storage level to MEMOR…
liancheng Oct 10, 2014
6f98902
[SPARK-3834][SQL] Backticks not correctly handled in subquery aliases
ravipesala Oct 10, 2014
411cf29
[SPARK-2805] Upgrade Akka to 2.3.4
avati Oct 10, 2014
90f73fc
[SPARK-3889] Attempt to avoid SIGBUS by not mmapping files in Connect…
aarondav Oct 10, 2014
72f36ee
[SPARK-3886] [PySpark] use AutoBatchedSerializer by default
davies Oct 10, 2014
1d72a30
HOTFIX: Fix build issue with Akka 2.3.4 upgrade.
pwendell Oct 10, 2014
0e8203f
[SPARK-2924] Required by scala 2.11, only one fun/ctor amongst overri…
ScrapCodes Oct 11, 2014
81015a2
[SPARK-3867][PySpark] ./python/run-tests failed when it run with Pyth…
cocoatomo Oct 11, 2014
7a3f589
[SPARK-3909][PySpark][Doc] A corrupted format in Sphinx documents and…
cocoatomo Oct 11, 2014
69c67ab
[SPARK-2377] Python API for Streaming
giwa Oct 12, 2014
18bd67c
[SPARK-3887] Send stracktrace in ConnectionManager error replies
JoshRosen Oct 12, 2014
e5be4de
SPARK-3716 [GraphX] Update Analytics.scala for partitionStrategy assi…
NamelessAnalyst Oct 12, 2014
c86c976
[HOTFIX] Fix compilation error for Yarn 2.0.*-alpha
andrewor14 Oct 12, 2014
fc616d5
[SPARK-3121] Wrong implementation of implicit bytesWritableConverter
Oct 13, 2014
b4a7fa7
[SPARK-3905][Web UI]The keys for sorting the columns of Executor page…
witgo Oct 13, 2014
d8b8c21
Add echo "Run streaming tests ..."
giwa Oct 13, 2014
92e017f
[SPARK-3899][Doc]fix wrong links in streaming doc
scwf Oct 13, 2014
942847f
Bug Fix: without unpersist method in RandomForest.scala
omgteam Oct 13, 2014
39ccaba
[SPARK-3861][SQL] Avoid rebuilding hash tables for broadcast joins on…
rxin Oct 13, 2014
49bbdcb
[Spark] RDD take() method: overestimate too much
yingjieMiao Oct 13, 2014
46db277
[SPARK-3892][SQL] remove redundant type name
adrian-wang Oct 13, 2014
2ac40da
[SPARK-3407][SQL]Add Date type support
adrian-wang Oct 13, 2014
56102dc
[SPARK-2066][SQL] Adds checks for non-aggregate attributes with aggre…
liancheng Oct 13, 2014
d3cdf91
[SPARK-3529] [SQL] Delete the temp files after test exit
chenghao-intel Oct 13, 2014
73da9c2
[SPARK-3771][SQL] AppendingParquetOutputFormat should use reflection …
ueshin Oct 13, 2014
e10d71e
[SPARK-3559][SQL] Remove unnecessary columns from List of needed Colu…
gvramana Oct 13, 2014
371321c
[SQL] Add type checking debugging functions
marmbrus Oct 13, 2014
e6e3770
SPARK-3807: SparkSql does not work for tables created using custom serde
chiragaggarwal Oct 13, 2014
9d9ca91
[SQL]Small bug in unresolved.scala
Ishiihara Oct 13, 2014
9eb49d4
[SPARK-3809][SQL] Fixes test suites in hive-thriftserver
liancheng Oct 13, 2014
4d26aca
[SPARK-3912][Streaming] Fixed flakyFlumeStreamSuite
tdas Oct 14, 2014
186b497
[SPARK-3921] Fix CoarseGrainedExecutorBackend's arguments for Standal…
aarondav Oct 14, 2014
9b6de6f
SPARK-3178 setting SPARK_WORKER_MEMORY to a value without a label (m…
bbejeck Oct 14, 2014
7ced88b
[SPARK-3946] gitignore in /python includes wrong directory
tsudukim Oct 14, 2014
24b818b
[SPARK-3944][Core] Using Option[String] where value of String can be …
Shiti Oct 14, 2014
56096db
SPARK-3803 [MLLIB] ArrayIndexOutOfBoundsException found in executing …
srowen Oct 14, 2014
7b4f39f
[SPARK-3869] ./bin/spark-class miss Java version with _JAVA_OPTIONS set
cocoatomo Oct 14, 2014
66af8e2
[SPARK-3943] Some scripts bin\*.cmd pollutes environment variables in…
tsudukim Oct 15, 2014
18ab6bd
SPARK-1307 [DOCS] Don't use term 'standalone' to refer to a Spark App…
srowen Oct 15, 2014
293a0b5
[SPARK-2098] All Spark processes should support spark-defaults.conf, …
witgo Oct 15, 2014
044583a
[Core] Upgrading ScalaStyle version to 0.5 and removing SparkSpaceAft…
prudhvi953 Oct 16, 2014
4c589ca
[SPARK-3944][Core] Code re-factored as suggested
Shiti Oct 16, 2014
091d32c
[SPARK-3971] [MLLib] [PySpark] hotfix: Customized pickler should work…
davies Oct 16, 2014
99e416b
[SQL] Fixes the race condition that may cause test failure
liancheng Oct 16, 2014
2fe0ba9
SPARK-3874: Provide stable TaskContext API
ScrapCodes Oct 17, 2014
7f7b50e
[SPARK-3923] Increase Akka heartbeat pause above heartbeat interval
aarondav Oct 17, 2014
be2ec4a
[SQL]typo in HiveFromSpark
Oct 17, 2014
642b246
[SPARK-3941][CORE] _remainingmem should not increase twice when updat…
liyezhang556520 Oct 17, 2014
e7f4ea8
[SPARK-3890][Docs]remove redundant spark.executor.memory in doc
WangTaoTheTonic Oct 17, 2014
56fd34a
[SPARK-3741] Add afterExecute for handleConnectExecutor
zsxwing Oct 17, 2014
dedace8
[SPARK-3067] JobProgressPage could not show Fair Scheduler Pools sect…
YanTangZhai Oct 17, 2014
e678b9f
[SPARK-3973] Print call site information for broadcasts
shivaram Oct 17, 2014
c351862
[SPARK-3935][Core] log the number of records that has been written
jackylk Oct 17, 2014
803e7f0
[SPARK-3979] [yarn] Use fs's default replication.
Oct 17, 2014
adcb7d3
[SPARK-3855][SQL] Preserve the result attribute of python UDFs though…
marmbrus Oct 17, 2014
23f6171
[SPARK-3985] [Examples] fix file path using os.path.join
adrian-wang Oct 17, 2014
477c648
[SPARK-3934] [SPARK-3918] [mllib] Bug fixes for RandomForest, Decisi…
jkbradley Oct 17, 2014
f406a83
SPARK-3926 [CORE] Result of JavaRDD.collectAsMap() is not Serializable
srowen Oct 18, 2014
05db2da
[SPARK-3952] [Streaming] [PySpark] add Python examples in Streaming P…
davies Oct 19, 2014
7e63bb4
[SPARK-2546] Clone JobConf for each task (branch-1.0 / 1.1 backport)
JoshRosen Oct 19, 2014
d1966f3
[SPARK-3902] [SPARK-3590] Stabilize AsynRDDActions and add Java API
JoshRosen Oct 20, 2014
15 changes: 9 additions & 6 deletions .gitignore
@@ -1,9 +1,12 @@
*~
*.#*
*#*#
*.swp
*.ipr
*.iml
*.iws
.idea/
.idea_modules/
sbt/*.jar
.settings
.cache
@@ -15,11 +18,12 @@ out/
third_party/libmesos.so
third_party/libmesos.dylib
conf/java-opts
conf/spark-env.sh
conf/streaming-env.sh
conf/log4j.properties
conf/spark-defaults.conf
conf/hive-site.xml
conf/*.sh
conf/*.cmd
conf/*.properties
conf/*.conf
conf/*.xml
conf/slaves
docs/_site
docs/api
target/
@@ -50,7 +54,6 @@ unit-tests.log
/lib/
rat-results.txt
scalastyle.txt
conf/*.conf
scalastyle-output.xml

# For Hive
3 changes: 3 additions & 0 deletions .rat-excludes
@@ -19,7 +19,9 @@ log4j.properties
log4j.properties.template
metrics.properties.template
slaves
slaves.template
spark-env.sh
spark-env.cmd
spark-env.sh.template
log4j-defaults.properties
bootstrap-tooltip.js
@@ -58,3 +60,4 @@ dist/*
.*iws
logs
.*scalastyle-output.xml
.*dependency-reduced-pom.xml
12 changes: 12 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,12 @@
## Contributing to Spark

Contributions via GitHub pull requests are gladly accepted from their original
author. Along with any pull requests, please state that the contribution is
your original work and that you license the work to the project under the
project's open source license. Whether or not you state this explicitly, by
submitting any copyrighted material via pull request, email, or other means
you agree to license the material under the project's open source license and
warrant that you have the legal authority to do so.

Please see the [Contributing to Spark wiki page](https://cwiki.apache.org/SPARK/Contributing+to+Spark)
for more information.
78 changes: 16 additions & 62 deletions README.md
@@ -13,16 +13,19 @@ and Spark Streaming for stream processing.
## Online Documentation

You can find the latest Spark documentation, including a programming
guide, on the project webpage at <http://spark.apache.org/documentation.html>.
guide, on the [project web page](http://spark.apache.org/documentation.html).
This README file only contains basic setup instructions.

## Building Spark

Spark is built on Scala 2.10. To build Spark and its example programs, run:
Spark is built using [Apache Maven](http://maven.apache.org/).
To build Spark and its example programs, run:

./sbt/sbt assembly
mvn -DskipTests clean package

(You do not need to do this if you downloaded a pre-built package.)
More detailed documentation is available from the project site, at
["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).

## Interactive Scala Shell

@@ -71,73 +74,24 @@ can be run using:

./dev/run-tests

Please see the guidance on how to
[run all automated tests](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-AutomatedTesting).

## A Note About Hadoop Versions

Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
storage systems. Because the protocols have changed in different versions of
Hadoop, you must build Spark against the same version that your cluster runs.
You can change the version by setting `-Dhadoop.version` when building Spark.

For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop
versions without YARN, use:

# Apache Hadoop 1.2.1
$ sbt/sbt -Dhadoop.version=1.2.1 assembly

# Cloudera CDH 4.2.0 with MapReduce v1
$ sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.2.0 assembly

For Apache Hadoop 2.2.X, 2.1.X, 2.0.X, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions
with YARN, also set `-Pyarn`:

# Apache Hadoop 2.0.5-alpha
$ sbt/sbt -Dhadoop.version=2.0.5-alpha -Pyarn assembly

# Cloudera CDH 4.2.0 with MapReduce v2
$ sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Pyarn assembly

# Apache Hadoop 2.2.X and newer
$ sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly

When developing a Spark application, specify the Hadoop version by adding the
"hadoop-client" artifact to your project's dependencies. For example, if you're
using Hadoop 1.2.1 and build your application using SBT, add this entry to
`libraryDependencies`:

"org.apache.hadoop" % "hadoop-client" % "1.2.1"

If your project is built with Maven, add this to your POM file's `<dependencies>` section:

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>1.2.1</version>
</dependency>


## A Note About Thrift JDBC server and CLI for Spark SQL

Spark SQL supports Thrift JDBC server and CLI.
See sql-programming-guide.md for more information about using the JDBC server and CLI.
You can use those features by setting `-Phive` when building Spark as follows.

$ sbt/sbt -Phive assembly
Please refer to the build documentation at
["Specifying the Hadoop Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version)
for detailed guidance on building for a particular distribution of Hadoop, including
building for particular Hive and Hive Thriftserver distributions. See also
["Third Party Hadoop Distributions"](http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html)
for guidance on building a Spark application that works with a particular
distribution.

## Configuration

Please refer to the [Configuration guide](http://spark.apache.org/docs/latest/configuration.html)
in the online documentation for an overview on how to configure Spark.


## Contributing to Spark

Contributions via GitHub pull requests are gladly accepted from their original
author. Along with any pull requests, please state that the contribution is
your original work and that you license the work to the project under the
project's open source license. Whether or not you state this explicitly, by
submitting any copyrighted material via pull request, email, or other means
you agree to license the material under the project's open source license and
warrant that you have the legal authority to do so.

Please see [Contributing to Spark wiki page](https://cwiki.apache.org/SPARK/Contributing+to+Spark)
for more information.
14 changes: 13 additions & 1 deletion assembly/pom.xml
@@ -141,7 +141,9 @@
<include>com.google.common.**</include>
</includes>
<excludes>
<exclude>com.google.common.base.Optional**</exclude>
<exclude>com/google/common/base/Absent*</exclude>
<exclude>com/google/common/base/Optional*</exclude>
<exclude>com/google/common/base/Present*</exclude>
</excludes>
</relocation>
</relocations>
@@ -347,5 +349,15 @@
</plugins>
</build>
</profile>
<profile>
<id>kinesis-asl</id>
<dependencies>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>${commons.httpclient.version}</version>
</dependency>
</dependencies>
</profile>
</profiles>
</project>
2 changes: 1 addition & 1 deletion bagel/src/test/resources/log4j.properties
@@ -21,7 +21,7 @@ log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.append=false
log4j.appender.file.file=target/unit-tests.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %p %c{1}: %m%n
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n

# Ignore messages below warning level from Jetty, because it's a bit verbose
log4j.logger.org.eclipse.jetty=WARN
8 changes: 7 additions & 1 deletion bin/compute-classpath.cmd
@@ -36,7 +36,13 @@ rem Load environment variables from conf\spark-env.cmd, if it exists
if exist "%FWDIR%conf\spark-env.cmd" call "%FWDIR%conf\spark-env.cmd"

rem Build up classpath
set CLASSPATH=%SPARK_CLASSPATH%;%SPARK_SUBMIT_CLASSPATH%;%FWDIR%conf
set CLASSPATH=%SPARK_CLASSPATH%;%SPARK_SUBMIT_CLASSPATH%

if not "x%SPARK_CONF_DIR%"=="x" (
set CLASSPATH=%CLASSPATH%;%SPARK_CONF_DIR%
) else (
set CLASSPATH=%CLASSPATH%;%FWDIR%conf
)

if exist "%FWDIR%RELEASE" (
for %%d in ("%FWDIR%lib\spark-assembly*.jar") do (
8 changes: 7 additions & 1 deletion bin/compute-classpath.sh
@@ -27,8 +27,14 @@ FWDIR="$(cd "`dirname "$0"`"/..; pwd)"

. "$FWDIR"/bin/load-spark-env.sh

CLASSPATH="$SPARK_CLASSPATH:$SPARK_SUBMIT_CLASSPATH"

# Build up classpath
CLASSPATH="$SPARK_CLASSPATH:$SPARK_SUBMIT_CLASSPATH:$FWDIR/conf"
if [ -n "$SPARK_CONF_DIR" ]; then
CLASSPATH="$CLASSPATH:$SPARK_CONF_DIR"
else
CLASSPATH="$CLASSPATH:$FWDIR/conf"
fi

ASSEMBLY_DIR="$FWDIR/assembly/target/scala-$SCALA_VERSION"

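Both compute-classpath scripts above now prefer SPARK_CONF_DIR over $SPARK_HOME/conf when it is set (see SPARK-2058 in the commit list). A minimal usage sketch of the new behavior; the directory path is only an example:

    # Keep spark-defaults.conf, spark-env.sh, log4j.properties, etc. in a site-specific location
    export SPARK_CONF_DIR=/etc/spark/conf
    # Launchers that go through bin/spark-class (e.g. bin/spark-shell) will now put that
    # directory on the classpath instead of $SPARK_HOME/conf
    ./bin/spark-shell
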
55 changes: 40 additions & 15 deletions bin/pyspark
@@ -50,9 +50,44 @@ fi

. "$FWDIR"/bin/load-spark-env.sh

# Figure out which Python executable to use
# In Spark <= 1.1, setting IPYTHON=1 would cause the driver to be launched using the `ipython`
# executable, while the worker would still be launched using PYSPARK_PYTHON.
#
# In Spark 1.2, we removed the documentation of the IPYTHON and IPYTHON_OPTS variables and added
# PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS to allow IPython to be used for the driver.
# Now, users can simply set PYSPARK_DRIVER_PYTHON=ipython to use IPython and set
# PYSPARK_DRIVER_PYTHON_OPTS to pass options when starting the Python driver
# (e.g. PYSPARK_DRIVER_PYTHON_OPTS='notebook'). This supports full customization of the IPython
# and executor Python executables.
#
# For backwards-compatibility, we retain the old IPYTHON and IPYTHON_OPTS variables.

# Determine the Python executable to use if PYSPARK_PYTHON or PYSPARK_DRIVER_PYTHON isn't set:
if hash python2.7 2>/dev/null; then
# Attempt to use Python 2.7, if installed:
DEFAULT_PYTHON="python2.7"
else
DEFAULT_PYTHON="python"
fi

# Determine the Python executable to use for the driver:
if [[ -n "$IPYTHON_OPTS" || "$IPYTHON" == "1" ]]; then
# If IPython options are specified, assume user wants to run IPython
# (for backwards-compatibility)
PYSPARK_DRIVER_PYTHON_OPTS="$PYSPARK_DRIVER_PYTHON_OPTS $IPYTHON_OPTS"
PYSPARK_DRIVER_PYTHON="ipython"
elif [[ -z "$PYSPARK_DRIVER_PYTHON" ]]; then
PYSPARK_DRIVER_PYTHON="${PYSPARK_PYTHON:-"$DEFAULT_PYTHON"}"
fi

# Determine the Python executable to use for the executors:
if [[ -z "$PYSPARK_PYTHON" ]]; then
PYSPARK_PYTHON="python"
if [[ $PYSPARK_DRIVER_PYTHON == *ipython* && $DEFAULT_PYTHON != "python2.7" ]]; then
echo "IPython requires Python 2.7+; please install python2.7 or set PYSPARK_PYTHON" 1>&2
exit 1
else
PYSPARK_PYTHON="$DEFAULT_PYTHON"
fi
fi
export PYSPARK_PYTHON

@@ -64,11 +99,6 @@ export PYTHONPATH="$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
export OLD_PYTHONSTARTUP="$PYTHONSTARTUP"
export PYTHONSTARTUP="$FWDIR/python/pyspark/shell.py"

# If IPython options are specified, assume user wants to run IPython
if [[ -n "$IPYTHON_OPTS" ]]; then
IPYTHON=1
fi

# Build up arguments list manually to preserve quotes and backslashes.
# We export Spark submit arguments as an environment variable because shell.py must run as a
# PYTHONSTARTUP script, which does not take in arguments. This is required for IPython notebooks.
@@ -88,9 +118,9 @@ if [[ -n "$SPARK_TESTING" ]]; then
unset YARN_CONF_DIR
unset HADOOP_CONF_DIR
if [[ -n "$PYSPARK_DOC_TEST" ]]; then
exec "$PYSPARK_PYTHON" -m doctest $1
exec "$PYSPARK_DRIVER_PYTHON" -m doctest $1
else
exec "$PYSPARK_PYTHON" $1
exec "$PYSPARK_DRIVER_PYTHON" $1
fi
exit
fi
@@ -106,10 +136,5 @@ if [[ "$1" =~ \.py$ ]]; then
else
# PySpark shell requires special handling downstream
export PYSPARK_SHELL=1
# Only use ipython if no command line arguments were provided [SPARK-1134]
if [[ "$IPYTHON" = "1" ]]; then
exec ${PYSPARK_PYTHON:-ipython} $IPYTHON_OPTS
else
exec "$PYSPARK_PYTHON"
fi
exec "$PYSPARK_DRIVER_PYTHON" $PYSPARK_DRIVER_PYTHON_OPTS
fi
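The comments added to bin/pyspark above describe how the driver and executor Python interpreters are now chosen. A brief usage sketch based on those comments; the notebook option is only an illustration:

    # New style (Spark 1.2): run the driver under IPython, e.g. as a notebook,
    # while executors keep using the interpreter resolved for PYSPARK_PYTHON.
    PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" ./bin/pyspark

    # Old style, still honored for backwards-compatibility:
    IPYTHON=1 IPYTHON_OPTS="notebook" ./bin/pyspark
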
2 changes: 1 addition & 1 deletion bin/pyspark2.cmd
@@ -33,7 +33,7 @@ for %%d in ("%FWDIR%assembly\target\scala-%SCALA_VERSION%\spark-assembly*hadoop*
)
if [%FOUND_JAR%] == [0] (
echo Failed to find Spark assembly JAR.
echo You need to build Spark with sbt\sbt assembly before running this program.
echo You need to build Spark before running this program.
goto exit
)
:skip_build_test
2 changes: 1 addition & 1 deletion bin/run-example2.cmd
@@ -52,7 +52,7 @@ if exist "%FWDIR%RELEASE" (
)
if "x%SPARK_EXAMPLES_JAR%"=="x" (
echo Failed to find Spark examples assembly JAR.
echo You need to build Spark with sbt\sbt assembly before running this program.
echo You need to build Spark before running this program.
goto exit
)

4 changes: 2 additions & 2 deletions bin/spark-class
@@ -105,7 +105,7 @@
exit 1
fi
fi
JAVA_VERSION=$("$RUNNER" -version 2>&1 | sed 's/.* version "\(.*\)\.\(.*\)\..*"/\1\2/; 1q')
JAVA_VERSION=$("$RUNNER" -version 2>&1 | grep 'version' | sed 's/.* version "\(.*\)\.\(.*\)\..*"/\1\2/; 1q')

# Set JAVA_OPTS to be able to load native libraries and to set heap size
if [ "$JAVA_VERSION" -ge 18 ]; then
@@ -146,7 +146,7 @@ fi
if [[ "$1" =~ org.apache.spark.tools.* ]]; then
if test -z "$SPARK_TOOLS_JAR"; then
echo "Failed to find Spark Tools Jar in $FWDIR/tools/target/scala-$SCALA_VERSION/" 1>&2
echo "You need to build spark before running $1." 1>&2
echo "You need to build Spark before running $1." 1>&2
exit 1
fi
CLASSPATH="$CLASSPATH:$SPARK_TOOLS_JAR"
2 changes: 1 addition & 1 deletion bin/spark-class2.cmd
@@ -104,7 +104,7 @@ for %%d in ("%FWDIR%assembly\target\scala-%SCALA_VERSION%\spark-assembly*hadoop*
)
if "%FOUND_JAR%"=="0" (
echo Failed to find Spark assembly JAR.
echo You need to build Spark with sbt\sbt assembly before running this program.
echo You need to build Spark before running this program.
goto exit
)
:skip_build_test
5 changes: 3 additions & 2 deletions bin/spark-shell.cmd
@@ -17,6 +17,7 @@ rem See the License for the specific language governing permissions and
rem limitations under the License.
rem

set SPARK_HOME=%~dp0..
rem This is the entry point for running Spark shell. To avoid polluting the
rem environment, it just launches a new cmd to do the real work.

cmd /V /E /C %SPARK_HOME%\bin\spark-submit.cmd --class org.apache.spark.repl.Main %* spark-shell
cmd /V /E /C %~dp0spark-shell2.cmd %*