Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full text search (case insensitive) with "-contains" #740

Merged
merged 3 commits into from
Dec 11, 2019

Conversation

empireshades
Copy link

@empireshades empireshades commented Nov 12, 2019

Similar to #470 which got stale while waiting for python 2.7 compatibility review, and addresses issue #463 (full text search). This is a pull request to add full title and body (case-insensitive) search functionality using an argument "-S". Structurally similar to the way the tag, date searches work. Searching test has been created as well to verify functionality against the "basic" example journal.

Checklist

  • The code change is tested and works locally.
  • Tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?

@empireshades empireshades changed the title Added case-insensitive searching of entry title and body using a -S argument. Similar to https://github.com/jrnl-org/jrnl/pull/470 which got stale while waiting for python 2.7 compatibility review. Added case-insensitive searching of entry title and body using a -S argument. Nov 12, 2019
@empireshades empireshades mentioned this pull request Nov 12, 2019
@empireshades empireshades changed the title Added case-insensitive searching of entry title and body using a -S argument. Case-insensitive searching of entry title and body using a -S argument. Nov 14, 2019
Copy link
Contributor

@alichtman alichtman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some tests to confirm this functionality works as expected? At a glance, this looks good to me, though. It's a pretty clean implementation of the feature.

@empireshades
Copy link
Author

Can you add some tests to confirm this functionality works as expected? At a glance, this looks good to me, though. It's a pretty clean implementation of the feature.

Done and thanks!

@empireshades empireshades changed the title Case-insensitive searching of entry title and body using a -S argument. Full text search (case insensitive) with “-S” Nov 16, 2019
@wren wren added the enhancement New feature or request label Nov 17, 2019
@wren
Copy link
Member

wren commented Nov 19, 2019

Hi, @empireshades. Thank you for the PR! Thanks for taking the initiative to submit it.

This is looking good. Just a few notes:

  • Can you please change the flag to be -contains instead of -search? All of the flags there are searching (dates, tags, etc). So adding a new flag that is named search doesn't seem intuitive to me. A user typing jrnl -from 2019 -contains "foo bar" feels more natural to me.
  • Can you please add some tests to make sure this plays nicely with the -and, -or, and -not flags? Your implementation should do just fine, but I like tests. :)
  • Can you drop the single-letter flag (-S)? I kinda like it, but all of our other flags are full words, so let's leave it out for now. If we want to add single-letter flags, let's get an issue started to talk about it, and look at it more comprehensively (and make sure all of the flags have single-letter equivalents that don't clash with each other).
  • Can you change the use of lower to casefold. lower can't handle other languages very well, and casefold is much more unicode friendly.
  • Can you move the search_plan.lower (now contains.casefold) to be defined just before the result list (where tagged and excluded are)? The current implementation would be a bit slower in journals with a lot of entries (since we're executing search_plain.lower twice per entry in the journal). Since the search string doesn't change during execution, there's no reason to keep recalculating it.

@wren
Copy link
Member

wren commented Nov 19, 2019

Here's a great post about the pitfalls of lower, and how to use casefold for better results: https://stackoverflow.com/a/29247821/569146

Copy link
Member

@wren wren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really close. I added a few notes for changes.

Thank you!

@empireshades
Copy link
Author

Thanks for the feedback @wren. Please see my comments inline below.

  • Can you please change the flag to be -contains instead of -search? All of the flags there are searching (dates, tags, etc). So adding a new flag that is named search doesn't seem intuitive to me. A user typing jrnl -from 2019 -contains "foo bar" feels more natural to me.

Will do

  • Can you please add some tests to make sure this plays nicely with the -and, -or, and -not flags? Your implementation should do just fine, but I like tests. :)

Will do

  • Can you drop the single-letter flag (-S)? I kinda like it, but all of our other flags are full words, so let's leave it out for now. If we want to add single-letter flags, let's get an issue started to talk about it, and look at it more comprehensively (and make sure all of the flags have single-letter equivalents that don't clash with each other).

A search function should be quick and easy to use, something that a single letter (such as "-c"), would help facilitate. I've been using this fork multiple times per day for the past year and if I had to type out a full word (beyond say 4 characters), I definitely would not have picked it up in the first place. I can start an issue for this but also don't want this to get blocked in more discussion. Thoughts?

  • Can you change the use of lower to casefold. lower can't handle other languages very well, and casefold is much more unicode friendly.

I'm sold. Will do

  • Can you move the search_plan.lower (now contains.casefold) to be defined just before the result list (where tagged and excluded are)? The current implementation would be a bit slower in journals with a lot of entries (since we're executing search_plain.lower twice per entry in the journal). Since the search string doesn't change during execution, there's no reason to keep recalculating it.

Makes sense. Will do.

@wren
Copy link
Member

wren commented Nov 26, 2019

A search function should be quick and easy to use, something that a single letter (such as "-c"), would help facilitate.

Yup, I totally agree. I just want to make sure that we implement for all the switches at the the same time to make sure we have a comprehensive look at the switches (e.g. we have more than one flag that starts with the same letter, etc). In order to make it go quicker, if you want to suggest what each letter should be in the issue itself, it'll expedite the process.

@alichtman
Copy link
Contributor

alichtman commented Nov 26, 2019

What about -g for grep? Even though RE isn't supported, I think it may still be intuitive. Not sure. Have to think about it more.

@wren
Copy link
Member

wren commented Nov 29, 2019

@alichtman We can talk about the single line flags in a separate issue.

For the longer flag, I want to keep grep reserved for if/when we implement actual grep. This PR doesn't have most of the functionality that people expect from grep, so I think contains fits best.

@empireshades
Copy link
Author

@alichtman We can talk about the single line flags in a separate issue.

For the longer flag, I want to keep grep reserved for if/when we implement actual grep. This PR doesn't have most of the functionality that people expect from grep, so I think contains fits best.

If grep gets implemented, I'd be fine with dropping the 'contains' functionality. I pretty much use this as a 'grep -i' replacement.

@empireshades
Copy link
Author

This is really close. I added a few notes for changes.

Thank you!

Please take a look at the updates as I believe I took care of everything you requested except for suggesting single letter flags as a new issue.

@wren
Copy link
Member

wren commented Dec 9, 2019

Hi! I'll take a look at this on Tuesday.

@micahellison micahellison changed the title Full text search (case insensitive) with “-S” Full text search (case insensitive) with "-contains" Dec 11, 2019
@wren wren changed the base branch from master to develop December 11, 2019 04:24
Copy link
Member

@wren wren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thank you! 💯

@wren wren merged commit cb9d546 into jrnl-org:develop Dec 11, 2019
@lock
Copy link

lock bot commented May 21, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot added the 🔒 Outdated label May 21, 2020
@lock lock bot locked as resolved and limited conversation to collaborators May 21, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request 🔒 Outdated
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants