[SPARK-6657] [PYSPARK] Fix doc warnings #6221

Closed
wants to merge 2 commits
5 changes: 2 additions & 3 deletions python/pyspark/mllib/evaluation.py
@@ -334,11 +334,10 @@ def ndcgAt(self, k):
"""
Compute the average NDCG value of all the queries, truncated at ranking position k.
The discounted cumulative gain at position k is computed as:
sum,,i=1,,^k^ (2^{relevance of ''i''th item}^ - 1) / log(i + 1),
and the NDCG is obtained by dividing the DCG value on the ground truth set.
In the current implementation, the relevance value is binary.
- If a query has an empty ground truth set, zero will be used as ndcg together with
+ If a query has an empty ground truth set, zero will be used as NDCG together with
a log warning.
"""
return self.call("ndcgAt", int(k))
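For reference, the DCG@k in the docstring above is the sum over i = 1..k of (2^{relevance of the i-th item} - 1) / log(i + 1), normalised by the ideal DCG over the ground truth set; with binary relevance the numerator is simply 1 for relevant items. A minimal pure-Python sketch of that computation (illustrative only, not part of this patch):

import math

def ndcg_at(predicted, ground_truth, k):
    """NDCG@k with binary relevance, mirroring the docstring above."""
    if not ground_truth:
        return 0.0  # documented behaviour: empty ground truth yields zero (plus a log warning)
    # DCG@k: relevance is 1 if the i-th predicted item is in the ground truth set, else 0.
    dcg = sum(1.0 / math.log(i + 2)
              for i, item in enumerate(predicted[:k]) if item in ground_truth)
    # Ideal DCG: the first min(k, |ground truth|) positions are all relevant.
    idcg = sum(1.0 / math.log(i + 2) for i in range(min(k, len(ground_truth))))
    return dcg / idcg

print(ndcg_at([1, 6, 2, 7, 3], {1, 2, 3, 4, 5}, k=5))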
12 changes: 6 additions & 6 deletions python/pyspark/mllib/fpm.py
@@ -61,12 +61,12 @@ class FPGrowth(object):
def train(cls, data, minSupport=0.3, numPartitions=-1):
"""
Computes an FP-Growth model that contains frequent itemsets.
- :param data: The input data set, each element
-     contains a transaction.
- :param minSupport: The minimal support level
-     (default: `0.3`).
- :param numPartitions: The number of partitions used by parallel
-     FP-growth (default: same as input data).
+ :param data: The input data set, each element contains a
+     transaction.
+ :param minSupport: The minimal support level (default: `0.3`).
+ :param numPartitions: The number of partitions used by
+     parallel FP-growth (default: same as input data).
"""
model = callMLlibFunc("trainFPGrowthModel", data, float(minSupport), int(numPartitions))
return FPGrowthModel(model)
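A short usage sketch for the train API documented above (assumes a running SparkContext bound to sc; the transactions are illustrative):

from pyspark.mllib.fpm import FPGrowth

# Each element of the input RDD is one transaction (a list of items).
transactions = sc.parallelize([
    ["a", "b", "c"],
    ["a", "b"],
    ["b", "c"],
    ["a", "c"],
])
model = FPGrowth.train(transactions, minSupport=0.5, numPartitions=2)
for itemset in model.freqItemsets().collect():
    print(itemset)  # FreqItemset(items=[...], freq=...)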
1 change: 1 addition & 0 deletions python/pyspark/sql/dataframe.py
@@ -943,6 +943,7 @@ def replace(self, to_replace, value, subset=None):
Columns specified in subset that do not have matching data type are ignored.
For example, if `value` is a string, and subset contains a non-string column,
then the non-string column is simply ignored.
>>> df4.replace(10, 20).show()
+----+------+-----+
| age|height| name|
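To illustrate the subset behaviour described above, a small sketch (assumes a SQLContext bound to sqlContext; the data only loosely mirrors the df4 used in the doctest): replacing a string value while subset also names numeric columns simply skips those columns.

df = sqlContext.createDataFrame(
    [(10, 80, "Alice"), (5, 60, "Bob")], ["age", "height", "name"])
# "age" and "height" are non-string columns, so only "name" is considered.
df.replace("Alice", "Alicia", subset=["age", "height", "name"]).show()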
3 changes: 2 additions & 1 deletion python/pyspark/streaming/kafka.py
@@ -132,11 +132,12 @@ def createRDD(sc, kafkaParams, offsetRanges, leaders={},
.. note:: Experimental
Create a RDD from Kafka using offset ranges for each topic and partition.
:param sc: SparkContext object
:param kafkaParams: Additional params for Kafka
:param offsetRanges: list of offsetRange to specify topic:partition:[start, end) to consume
:param leaders: Kafka brokers for each TopicAndPartition in offsetRanges. May be an empty
map, in which case leaders will be looked up on the driver.
:param keyDecoder: A function used to decode key (default is utf8_decoder)
:param valueDecoder: A function used to decode value (default is utf8_decoder)
:return: A RDD object
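A usage sketch for createRDD as documented above (assumes a SparkContext bound to sc and a Kafka broker reachable at localhost:9092; topic name and offsets are illustrative):

from pyspark.streaming.kafka import KafkaUtils, OffsetRange

# Consume offsets [0, 100) of partition 0 of "topic1".
offset_ranges = [OffsetRange("topic1", 0, 0, 100)]
kafka_params = {"metadata.broker.list": "localhost:9092"}

rdd = KafkaUtils.createRDD(sc, kafka_params, offset_ranges)
print(rdd.take(3))  # (key, value) pairs, decoded with utf8_decoder by default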