Initial Update
claudemyburgh committed Jul 20, 2024
1 parent c8b2b1a commit 2009e0c
Showing 6 changed files with 254 additions and 95 deletions.
80 changes: 67 additions & 13 deletions README.md
@@ -1,32 +1,86 @@
# Fuzzy Search

[![Latest Version on Packagist](https://img.shields.io/packagist/v/designbycode/fuzzy-search.svg?style=flat-square)](https://packagist.org/packages/designbycode/fuzzy-search)
[![Tests](https://img.shields.io/github/actions/workflow/status/designbycode/fuzzy-search/run-tests.yml?branch=main&label=tests&style=flat-square)](https://github.com/designbycode/fuzzy-search/actions/workflows/run-tests.yml)
[![Total Downloads](https://img.shields.io/packagist/dt/designbycode/fuzzy-search.svg?style=flat-square)](https://packagist.org/packages/designbycode/fuzzy-search)

## Introduction
The Fuzzy Search package provides a simple and efficient way to perform fuzzy searches on a collection of texts using the Levenshtein distance algorithm. This package is useful when you need to search for texts that may contain typos or slight variations.
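As a quick illustration of the idea, the sketch below uses PHP's built-in `levenshtein()` function rather than this package's classes (an assumption made only to keep the snippet self-contained). It shows how few edits separate a typo from the intended word compared with an unrelated one.

```php
<?php
// Built-in levenshtein() counts single-character insertions,
// deletions and substitutions between two strings.
$typoDistance = levenshtein('aple', 'apple');       // one missing "p"
$unrelatedDistance = levenshtein('aple', 'banana'); // mostly different letters

echo $typoDistance, PHP_EOL; // 1
var_dump($typoDistance < $unrelatedDistance); // bool(true)
```

A fuzzy search keeps candidates whose distance stays under a chosen threshold and discards the rest.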

## Installation

To install the Fuzzy Search package, simply require it in your PHP project using Composer:

```bash
composer require designbycode/fuzzy-search
```

## Usage
### Creating a Fuzzy Search Instance
To create a Fuzzy Search instance, pass the array of texts to search and, optionally, a flag for case-insensitive matching (enabled by default):

```php
use Designbycode\FuzzySearch\FuzzySearch;

$texts = ['apple', 'banana', 'orange', 'grape'];
$fuzzySearch = new FuzzySearch($texts, true); // Case-insensitive search
```

### Performing a Fuzzy Search
To perform a fuzzy search, call the `search` method and pass the search query and an optional maximum Levenshtein distance (default: 2):

```php
$query = 'aple';
$maxDistance = 2;
$results = $fuzzySearch->search($query, $maxDistance);
print_r($results); // Output: ['apple']
```

The `search` method returns an array of the matching texts, sorted by ascending Levenshtein distance from the search query, so the closest match comes first.
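The filter-then-sort behaviour described above can be sketched in a few lines. This is a minimal illustration, not the package's implementation: `fuzzyFilter` is a hypothetical helper, and it relies on PHP's built-in `levenshtein()` so that the snippet runs on its own.

```php
<?php
/**
 * Hypothetical helper (not part of the package): keep texts within
 * $maxDistance of $query and order them closest first.
 */
function fuzzyFilter(array $texts, string $query, int $maxDistance = 2): array
{
    $scored = [];
    foreach ($texts as $text) {
        $distance = levenshtein($query, $text); // PHP built-in
        if ($distance <= $maxDistance) {
            $scored[] = ['text' => $text, 'distance' => $distance];
        }
    }

    // Ascending distance, so the best candidate ends up at index 0.
    usort($scored, fn (array $a, array $b): int => $a['distance'] <=> $b['distance']);

    return array_column($scored, 'text');
}

print_r(fuzzyFilter(['apples', 'grape', 'aple', 'apple'], 'aple'));
// 'aple' (0 edits), then 'apple' (1), then 'apples' (2); 'grape' (3) is dropped
```

Because the list is ordered by distance, the first element is always the closest candidate.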

### Getting the Best Match
To get the best match from the search results, call the `getBestMatch` method:
```php
$bestMatch = $fuzzySearch->getBestMatch($results);
echo $bestMatch; // Output: 'apple'
```

## Levenshtein Distance Calculator
The Levenshtein Distance Calculator is a utility class that calculates the Levenshtein distance between two strings. This class is used internally by the Fuzzy Search package.

### Calculating the Levenshtein Distance
To calculate the Levenshtein distance between two strings, call the static `calculate` method:

```php
use Designbycode\FuzzySearch\LevenshteinDistance;

$str1 = 'kitten';
$str2 = 'sitting';
$distance = LevenshteinDistance::calculate($str1, $str2);
echo $distance; // Output: 3
```
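One caveat worth keeping in mind: a calculator that walks strings with `strlen` and byte offsets, as PHP's built-in `levenshtein()` does, counts bytes rather than characters, so a single accented letter encoded in UTF-8 can cost two edits. The following is a minimal sketch of a character-aware variant, assuming UTF-8 input and the `mbstring` extension; `mbLevenshtein` is a hypothetical name and not part of this package.

```php
<?php
/**
 * Hypothetical character-aware Levenshtein distance (assumes UTF-8 input).
 * Uses two rolling rows instead of a full matrix.
 */
function mbLevenshtein(string $a, string $b): int
{
    $source = mb_str_split($a, 1, 'UTF-8');
    $target = mb_str_split($b, 1, 'UTF-8');

    // Row for the empty prefix of $source: j insertions reach $target[0..j).
    $previous = range(0, count($target));

    foreach ($source as $i => $char) {
        $current = [$i + 1]; // deleting the first $i + 1 characters
        foreach ($target as $j => $other) {
            $cost = ($char === $other) ? 0 : 1;
            $current[] = min(
                $previous[$j + 1] + 1, // delete $char
                $current[$j] + 1,      // insert $other
                $previous[$j] + $cost  // substitute or match
            );
        }
        $previous = $current;
    }

    return $previous[count($target)];
}

echo mbLevenshtein("caf\u{00E9}", 'cafe'), PHP_EOL; // 1 (one character differs)
echo levenshtein("caf\u{00E9}", 'cafe'), PHP_EOL;   // 2 (the accented letter is two bytes)
```

For plain ASCII input both functions agree, so the difference only matters when the searched texts contain multibyte characters.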

## Examples
### Example 1: Fuzzy Search with Case-Insensitive Search
```php
$texts = ['Apple', 'Banana', 'Orange', 'Grape'];
$fuzzySearch = new FuzzySearch($texts, true);

$query = 'aple';
$maxDistance = 2;
$results = $fuzzySearch->search($query, $maxDistance);
print_r($results); // Output: ['Apple']
```

### Example 2: Fuzzy Search with Case-Sensitive Search
```php
$texts = ['apple', 'banana', 'orange', 'grape'];
$fuzzySearch = new FuzzySearch($texts, false);

$query = 'Aple';
$maxDistance = 1;
$results = $fuzzySearch->search($query, $maxDistance);
print_r($results); // Output: [] ('Aple' is 2 edits from 'apple' when case matters)
```
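The effect of case on the distance can be checked directly. This sketch again relies on PHP's built-in `levenshtein()` (an assumption for self-containment, not the package's own calculator): folding case first removes one substitution from the count.

```php
<?php
$query = 'Aple';
$text  = 'apple';

// Case-sensitive: "A" vs "a" is a substitution, plus the missing "p".
$sensitive = levenshtein($query, $text);

// Case-insensitive: lowercase both sides before comparing.
$insensitive = levenshtein(strtolower($query), strtolower($text));

echo $sensitive, ' ', $insensitive, PHP_EOL; // 2 1
```

Whether a case-sensitive search returns a match therefore depends on how the maximum distance compares with that extra edit.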

## Testing
87 changes: 77 additions & 10 deletions src/FuzzySearch.php
@@ -2,29 +2,96 @@

namespace Designbycode\FuzzySearch;

/**
* Fuzzy search class using Levenshtein distance algorithm
*/
class FuzzySearch
{
/**
* Array of texts to search
*/
private array $texts;

/**
* Levenshtein distance calculator
*/
private LevenshteinDistance $levenshteinDistance;

/**
* Flag for case-insensitive search
*/
private bool $caseInsensitive;

/**
* Constructor
*
* @param array $texts Array of texts to search
* @param bool $caseInsensitive Flag for case-insensitive search (default: true)
*/
public function __construct(array $texts, bool $caseInsensitive = true)
{
$this->texts = $texts;
$this->levenshteinDistance = new LevenshteinDistance();
$this->caseInsensitive = $caseInsensitive;
}

/**
* Perform fuzzy search
*
* @param string $query Search query
* @param int $maxDistance Maximum Levenshtein distance (default: 2)
* @return array Array of matching texts
*/
public function search(string $query, int $maxDistance = 2): array
{
// Convert query to lowercase if case-insensitive
$query = $this->caseInsensitive ? strtolower($query) : $query;

// Initialize results array
$results = [];

// Iterate through texts
foreach ($this->texts as $text) {
// Convert text to lowercase if case-insensitive
$textLowercase = $this->caseInsensitive ? strtolower($text) : $text;

// Calculate Levenshtein distance
$distance = $this->levenshteinDistance->calculate($query, $textLowercase);

// Check if distance is within max distance
if ($distance <= $maxDistance) {
// Store original text and distance
$results[] = ['text' => $text, 'distance' => $distance];
}
}

// Sort results by distance
usort($results, function ($a, $b) {
return $a['distance'] - $b['distance'];
});

// Return sorted results
return array_column($results, 'text');
}

/**
* Get the best match from the search results
*
* @param array $results Search results
* @return string Best match
*/
public function getBestMatch(array $results): string
{
// search() returns plain strings already sorted by distance,
// so the best match is the first element ('' when there are no results).
return $results[0] ?? '';
}
}
96 changes: 54 additions & 42 deletions src/LevenshteinDistance.php
@@ -1,50 +1,62 @@
<?php

namespace Designbycode\FuzzySearch;

use TypeError;

class LevenshteinDistance
{
/**
* Calculate the Levenshtein distance between two strings.
*
* The Levenshtein distance is a measure of the minimum number of single-character edits
* (insertions, deletions or substitutions) required to change one word into the other.
*
* @param string $str1 The first string.
* @param string $str2 The second string.
* @return int The Levenshtein distance between the two strings.
* @throws TypeError If either argument is not a string.
*/
public static function calculate(mixed $str1, mixed $str2): int
{

if (!is_string($str1)) {
throw new TypeError('Argument 1 passed to LevenshteinDistance::calculate() must be of the type string');
}

if (!is_string($str2)) {
throw new TypeError('Argument 2 passed to LevenshteinDistance::calculate() must be of the type string');
}

// Create a 2D array to store the distances between substrings of $str1 and $str2
$distanceMatrix = [];

// Initialize the first row and column of the matrix
$str1Length = strlen($str1);
$str2Length = strlen($str2);
for ($i = 0; $i <= $str1Length; $i++) {
$distanceMatrix[$i][0] = $i; // Distance from empty string to $str1 substrings
}
for ($j = 0; $j <= $str2Length; $j++) {
$distanceMatrix[0][$j] = $j; // Distance from empty string to $str2 substrings
}

// Iterate through the characters of $str1 and $str2
for ($i = 1; $i <= $str1Length; $i++) {
for ($j = 1; $j <= $str2Length; $j++) {
// Calculate the cost of substitution (0 if characters match, 1 if they don't)
$substitutionCost = ($str1[$i - 1] == $str2[$j - 1]) ? 0 : 1;

// Calculate the minimum distance between the current substrings
$deletionDistance = $distanceMatrix[$i - 1][$j] + 1; // Delete a character from $str1
$insertionDistance = $distanceMatrix[$i][$j - 1] + 1; // Insert a character into $str1
$substitutionDistance = $distanceMatrix[$i - 1][$j - 1] + $substitutionCost; // Substitute a character in $str1

// Choose the minimum distance
$distanceMatrix[$i][$j] = min($insertionDistance, $deletionDistance, $substitutionDistance);
}
}

// Return the Levenshtein distance between the entire strings
return $distanceMatrix[$str1Length][$str2Length];
}
}
5 changes: 0 additions & 5 deletions tests/ExampleTest.php

This file was deleted.

50 changes: 25 additions & 25 deletions tests/FuzzySearchTest.php
@@ -1,31 +1,31 @@
<?php

use Designbycode\FuzzySearch\FuzzySearch;

test('search returns similar matches', function () {
$index = ['apple', 'banana', 'orange', 'grape', 'pear'];
$fuzzySearch = new FuzzySearch($index);
$results = $fuzzySearch->search('aple');
expect($results)->toEqual(['apple']);
});

test('search returns no results for non-existent query', function () {
$index = ['apple', 'banana', 'orange', 'grape', 'pear'];
$fuzzySearch = new FuzzySearch($index);
$results = $fuzzySearch->search('xyz');
expect($results)->toEqual([]);
});

test('search returns exact match', function () {
$index = ['apple', 'banana', 'orange', 'grape', 'pear'];
$fuzzySearch = new FuzzySearch($index);
$results = $fuzzySearch->search('apple');
expect($results)->toEqual(['apple']);
});

test('search is case-insensitive', function () {
$index = ['Apple', 'banana', 'orange', 'grape', 'pear'];
$fuzzySearch = new FuzzySearch($index);
$results = $fuzzySearch->search('aPpLe');
expect($results)->toEqual(['Apple']);
});