WIP: add convenience functions for selecting and sorting #26

keesterbrugge · 2019-01-28T14:07:33Z

Thanks for this library! I've been getting a lot of mileage out of it. Here are some functions that I use that might be a nice addition to your library:

select-cols-regex: function that selects columns using a regular expression
compare-by: function that returns a comparator, making it easy to sort in ascending or descending order per keyword. Unlike the default comparator, it always sorts nil values last.
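To make the intended behaviour concrete, here is a minimal sketch of such a comparator. The call shape `(compare-by :a :asc :b :desc)` is an assumption inferred from the discussion further down in this thread, not the code from the PR itself.

```clojure
;; Sketch only: assumes arguments come as key/direction pairs,
;; e.g. (compare-by :a :asc :b :desc). Not the PR's actual code.
(defn compare-by
  [& key-dir-pairs]
  (let [cmps (for [[k dir] (partition 2 key-dir-pairs)]
               (fn [a b]
                 (let [x (k a)
                       y (k b)]
                   (cond
                     (and (nil? x) (nil? y)) 0
                     (nil? x) 1              ; nil sorts last in either direction
                     (nil? y) -1
                     (= dir :desc) (compare y x)
                     :else (compare x y)))))]
    ;; The first non-zero result decides; 0 means equal on every key.
    (fn [a b]
      (or (first (remove zero? (map #(% a b) cmps))) 0))))

(sort (compare-by :a :asc :b :desc)
      [{:a 1 :b 1} {:a nil :b 3} {:a 1 :b 2}])
;; => ({:a 1, :b 2} {:a 1, :b 1} {:a nil, :b 3})
```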

Cheers!

Say I want to filter columns based on some predicate like a regular expression, whether it is an element of some collection of keywords or some composition of these. In this case it would be nice to have a function like `filter-cols` as it simplifies the code as follows:

From

```clojure
(->> df
     ;; .. some transformation, perhaps new columns are added
     (#(hc/select-cols (filter pred' (hc/cols %)) %)))
```

to

```clojure
(->> df
     ;; .. some transformation, perhaps new columns are added
     (filter-cols pred'))
```

I've also reimplemented `select-cols-regex` using this new function.

keesterbrugge · 2019-01-28T16:50:26Z

I've added the function filter-cols. Say I want to filter columns based on some predicate like a regular expression, their membership of some collection of keywords or some composition of these predicates. In this case it would be nice to have a function like filter-cols as it simplifies the code as follows:
From

(->> df
     ;; change collection of columns in some way, e.g. using (derive-cols ...) 
     (#(select-cols (filter pred' (cols %)) % )))

to

(->> df
     ;; change collection of columns in some way, e.g. using (derive-cols ...) 
     (filter-cols pred'))

I've reimplemented select-cols-regex using this new function
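For readers without the PR diff at hand, a rough sketch of what filter-cols could look like follows. The local `cols` and `select-cols` are simplified stand-ins written for this example (assumptions, not the library's implementations), and the data frame is assumed to be a sequence of maps.

```clojure
;; Stand-ins for the library functions, simplified for illustration.
(defn cols
  "Column names, taken from the first row."
  [df]
  (keys (first df)))

(defn select-cols
  "Keeps only the columns in ks for every row."
  [ks df]
  (map #(select-keys % ks) df))

;; Sketch of the proposed function: the predicate sees column names, not values.
(defn filter-cols
  [pred df]
  (select-cols (filter pred (cols df)) df))

;; A regex predicate covers the select-cols-regex use case.
(->> [{:a1 1 :b 2 :a2 3}]
     (filter-cols #(re-find #"^a" (name %))))
;; => ({:a1 1, :a2 3})
```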

sbelak · 2019-01-28T18:42:04Z

Thanks for this. I really like it. A couple of things:

I'm not 100% sold on select-cols-regex. It feels very specific (I'm guessing this is for messy data where you have col_name_1...n) and offers minimal convenience over the more generic filter-cols.
I think select-cols-by better reflects the semantics, as the fn operates on col names rather than on values like the filter family does.
I love the utility of compare-by, but I'm not entirely sold on the signature (though it might be correct). Two things feel like warts: the mandatory :asc/:desc for each comparator and the fact that we need these "magic" tokens. Have you considered using a combinator that flips 1 <-> -1 instead? Then you'd write something like

(sort (compare-by :a (desc :b)) ...)

Using partition will probably yield cleaner code than loop.
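One way the combinator idea could be sketched is shown below. The names `asc`, `desc`, and the `::comparator` metadata tag are hypothetical choices for this example; the thread does not settle on an implementation.

```clojure
;; Sketch under assumptions: bare keyfns mean ascending, (desc keyfn) flips the sign.
(defn- nil-last
  "Compares x and y with sign applied (1 or -1); nil sorts last either way."
  [sign x y]
  (cond
    (and (nil? x) (nil? y)) 0
    (nil? x) 1
    (nil? y) -1
    :else (* sign (compare x y))))

(defn- tagged
  "Marks f as an already-built comparator so compare-by leaves it alone."
  [f]
  (with-meta f {::comparator true}))

(defn asc [keyfn]
  (tagged (fn [a b] (nil-last 1 (keyfn a) (keyfn b)))))

(defn desc [keyfn]
  (tagged (fn [a b] (nil-last -1 (keyfn a) (keyfn b)))))

(defn compare-by
  [& specs]
  (let [cmps (map #(if (::comparator (meta %)) % (asc %)) specs)]
    ;; The first non-zero comparison decides the order.
    (fn [a b]
      (or (first (remove zero? (map #(% a b) cmps))) 0))))

(sort (compare-by :a (desc :b))
      [{:a 1 :b 1} {:a 1 :b 2} {:a nil :b 5} {:a 0 :b nil}])
;; => ({:a 0, :b nil} {:a 1, :b 2} {:a 1, :b 1} {:a nil, :b 5})
```

Tagging via metadata is only one option; it lets plain keywords and tagged comparators share one argument list without any `:asc`/`:desc` tokens.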

keesterbrugge · 2019-01-31T11:49:25Z

Thanks for your feedback :)

I'll remove select-cols-regex.
I'll think about how to make compare-by a bit cleaner.

Question:
Any reason you defined the function select-cols instead of using clojure.set/project? Is it because you don't want the return type to be a set?

sbelak · 2019-01-31T11:56:25Z

3 reasons. In order of importance:

sets break the ordering of the data
project can only be used with keywords, while select-cols works with any keyfn
while the set functions currently work on non-sets, that's not a guarantee
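The first reason is easy to see at the REPL. In this small illustration, a plain `map` over `select-keys` stands in for select-cols (an assumption made for the example, not the library's code):

```clojure
(require '[clojure.set :as set])

(def rows [{:a 1 :b 2} {:a 1 :b 3}])

;; project returns a set: row order is lost and identical rows collapse.
(def projected (set/project rows [:a]))
;; => #{{:a 1}}

;; A sequence-based selection keeps one output row per input row, in order.
(def selected (map #(select-keys % [:a]) rows))
;; => ({:a 1} {:a 1})
```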

keesterbrugge · 2019-02-05T16:38:27Z

That makes sense.

I've added the function derive-cols* to convey how I'd like the derive-cols function to behave. I don't propose to include it as is.

The benefit of derive-cols* over the current derive-cols is that, by taking the ordering of new-cols into account, you can construct a new column and then use it as the input for the next new column. As a result you can write

(->> [{:a 1 :b 2}{:a 3 :b 10}] 
     (derive-cols* (ordered-map :c [inc :b] 
                                :d [inc :c]))) 
;; => ({:a 1, :b 2, :c 3, :d 4} {:a 3, :b 10, :c 11, :d 12})

or

(->> [{:a 1 :b 2}{:a 3 :b 10}] 
     (derive-cols* [:c [inc :b] 
                    :d [inc :c]]))

instead of

(->> [{:a 1 :b 2}{:a 3 :b 10}] 
     (derive-cols {:c [inc :b]})
     (derive-cols {:d [inc :c]}))

which becomes a bother when you have a long chain of new column derivations that have dependencies on each other.
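A minimal sketch of how such sequential derivation might work is given below. It assumes each column spec has the form `[f & input-keys]` as in the examples above and that the data frame is a sequence of maps; it is an illustration, not the code from the PR.

```clojure
;; Sketch only: folds over the column specs in order, so each new column
;; can read the columns derived before it.
(defn derive-cols*
  [new-cols df]
  (let [steps (if (map? new-cols)
                (seq new-cols)           ; only an ordered map guarantees the order
                (partition 2 new-cols))] ; flat vector of name/spec pairs
    (map (fn [row]
           (reduce (fn [acc [k [f & in-keys]]]
                     ;; Look up inputs in acc so earlier derived columns are visible.
                     (assoc acc k (apply f (map acc in-keys))))
                   row
                   steps))
         df)))

(->> [{:a 1 :b 2} {:a 3 :b 10}]
     (derive-cols* [:c [inc :b]
                    :d [inc :c]]))
;; => ({:a 1, :b 2, :c 3, :d 4} {:a 3, :b 10, :c 11, :d 12})
```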

@sbelak What do you think?

I don't know much about clojure.spec yet. I'll make an attempt to implement derive-cols*, select-cols-by and compare-by in a more coherent fashion with respect to the rest of the lib.

keesterbrugge added 2 commits January 28, 2019 15:02

add function that selects columns based on regex

02ef17d

Add comparator function for custom sorting

0f20d64

keesterbrugge changed the title ~~add function that selects columns based on regex~~ add convenience functions for selecting and sorting Jan 28, 2019

keesterbrugge changed the title ~~add convenience functions for selecting and sorting~~ WIP: add convenience functions for selecting and sorting Jan 28, 2019

keesterbrugge added 2 commits February 5, 2019 17:00

remove fn select-cols-regex

bc4a3ec

remove fn as it doesn't add enough value

add derive-cols* fn that adds columns sequentially

9609273
