Refactor regex validation checks to a dedicated method #64

Merged
merged 3 commits into from
Mar 19, 2024
33 changes: 17 additions & 16 deletions README.md
@@ -159,6 +159,7 @@ columns:
# Of course it's an ultimatum to verify any sort of string data.
# Please, be careful. Regex is a powerful tool, but it can be very dangerous if used incorrectly.
# Remember that if you want to solve a problem with regex, you now have two problems.
+ # But have it your way, then happy debugging! https://regex101.com.
regex: /^[\d]{2}$/

# Checks length of a string including spaces (multibyte safe).
@@ -525,22 +526,22 @@ Schema is invalid: ./tests/schemas/demo_invalid.yml
+-------+------------------+--------------+----- demo_invalid.yml -----------------------------------------------+

(1/1) Invalid file: ./tests/fixtures/demo.csv
- +------+------------------+------------------+----------------------- demo.csv ---------------------------------------------------------------------+
- | Line | id:Column | Rule | Message |
- +------+------------------+------------------+------------------------------------------------------------------------------------------------------+
- | 1 | | filename_pattern | Filename "./tests/fixtures/demo.csv" does not match pattern: "/demo-[12].csv$/i" |
- | 6 | 0:Name | length_min | The length of the value "Carl" is 4, which is less than the expected "5" |
- | 11 | 0:Name | length_min | The length of the value "Lois" is 4, which is less than the expected "5" |
- | 1 | 1:City | ag:is_unique | Column has non-unique values. Unique: 9, total: 10 |
- | 2 | 2:Float | num_max | The number of the value "4825.185", which is greater than the expected "4825.184" |
- | 6 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
- | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
- | 8 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
- | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
- | 9 | 3:Birthday | date_max | The date of the value "2010-07-20" is parsed as "2010-07-20 00:00:00 +00:00", which is greater than |
- | | | | the expected "2009-01-01 00:00:00 +00:00 (2009-01-01)" |
- | 5 | 4:Favorite color | allow_values | Value "blue" is not allowed. Allowed values: ["red", "green", "Blue"] |
- +------+------------------+------------------+----------------------- demo.csv ---------------------------------------------------------------------+
+ +-------+------------------+------------------+----------------------- demo.csv ---------------------------------------------------------------------+
+ | Line | id:Column | Rule | Message |
+ +-------+------------------+------------------+------------------------------------------------------------------------------------------------------+
+ | undef | | filename_pattern | Filename "./tests/fixtures/demo.csv" does not match pattern: "/demo-[12].csv$/i" |
+ | 6 | 0:Name | length_min | The length of the value "Carl" is 4, which is less than the expected "5" |
+ | 11 | 0:Name | length_min | The length of the value "Lois" is 4, which is less than the expected "5" |
+ | 1 | 1:City | ag:is_unique | Column has non-unique values. Unique: 9, total: 10 |
+ | 2 | 2:Float | num_max | The number of the value "4825.185", which is greater than the expected "4825.184" |
+ | 6 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
+ | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
+ | 8 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
+ | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
+ | 9 | 3:Birthday | date_max | The date of the value "2010-07-20" is parsed as "2010-07-20 00:00:00 +00:00", which is greater than |
+ | | | | the expected "2009-01-01 00:00:00 +00:00 (2009-01-01)" |
+ | 5 | 4:Favorite color | allow_values | Value "blue" is not allowed. Allowed values: ["red", "green", "Blue"] |
+ +-------+------------------+------------------+----------------------- demo.csv ---------------------------------------------------------------------+


Found 9 issues in CSV file.
4 changes: 4 additions & 0 deletions csv-blueprint.php
@@ -37,6 +37,10 @@

\date_default_timezone_set('UTC');

+ \set_error_handler(static function ($severity, $message, $file, $line): void {
+ throw new \ErrorException($message, 0, $severity, $file, $line);
+ });
+
(new CliApplication('CSV Blueprint', '@git-version@'))
->registerCommandsByPath(PATH_ROOT . '/src/Commands', __NAMESPACE__)
->setLogo(
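
The handler registered in this hunk promotes every PHP warning or notice to an `\ErrorException`. The following is a minimal sketch of why that matters for the regex rules, assuming PHP 8; `describePattern` is a hypothetical helper written for this illustration and is not part of the PR. A malformed pattern makes `\preg_match` raise a warning, which the handler turns into an exception that a `try`/`catch` can intercept.

```php
<?php

declare(strict_types=1);

// Same idea as the handler in the hunk above: every warning/notice becomes an exception.
\set_error_handler(static function (int $severity, string $message, string $file, int $line): void {
    throw new \ErrorException($message, 0, $severity, $file, $line);
});

// Hypothetical helper (not in the PR): reports whether a pattern compiles.
function describePattern(string $pattern): string
{
    try {
        \preg_match($pattern, 'anything');

        return 'compiled';
    } catch (\ErrorException $exception) {
        // Reached only because the handler above converted the warning.
        return 'rejected';
    }
}

echo describePattern('/^\d{2}$/'), PHP_EOL; // prints "compiled"
echo describePattern('/[a-z/'), PHP_EOL;    // prints "rejected" (unterminated character class)
```

Without the handler, the second call would only emit a warning and `\preg_match` would return `false`, so there would be nothing for a `catch` block to intercept.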
1 change: 1 addition & 0 deletions schema-examples/full.yml
@@ -80,6 +80,7 @@ columns:
# Of course it's an ultimatum to verify any sort of string data.
# Please, be careful. Regex is a powerful tool, but it can be very dangerous if used incorrectly.
# Remember that if you want to solve a problem with regex, you now have two problems.
+ # But have it your way, then happy debugging! https://regex101.com.
regex: /^[\d]{2}$/

# Checks length of a string including spaces (multibyte safe).
4 changes: 3 additions & 1 deletion src/Rules/Cell/IsDomain.php
@@ -16,6 +16,8 @@

namespace JBZoo\CsvBlueprint\Rules\Cell;

+ use JBZoo\CsvBlueprint\Utils;
+
final class IsDomain extends AbstractCellRule
{
protected const HELP_OPTIONS = [
@@ -26,7 +28,7 @@ public function validateRule(string $cellValue): ?string
{
$domainPattern = '/^(?!-)[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,})$/';

- if (\preg_match($domainPattern, $cellValue) === 0) {
+ if (Utils::testRegex($domainPattern, $cellValue)) {
return "Value \"<c>{$cellValue}</c>\" is not a valid domain";
}

4 changes: 3 additions & 1 deletion src/Rules/Cell/IsFloat.php
@@ -16,6 +16,8 @@

namespace JBZoo\CsvBlueprint\Rules\Cell;

+ use JBZoo\CsvBlueprint\Utils;
+
class IsFloat extends AbstractCellRule
{
protected const HELP_OPTIONS = [
@@ -24,7 +26,7 @@ class IsFloat extends AbstractCellRule

public function validateRule(string $cellValue): ?string
{
- if (\preg_match('/^-?\d+(\.\d+)?$/', $cellValue) === 0) {
+ if (Utils::testRegex('/^-?\d+(\.\d+)?$/', $cellValue)) {
return "Value \"<c>{$cellValue}</c>\" is not a float number";
}

4 changes: 3 additions & 1 deletion src/Rules/Cell/IsGeohash.php
@@ -16,6 +16,8 @@

namespace JBZoo\CsvBlueprint\Rules\Cell;

+ use JBZoo\CsvBlueprint\Utils;
+
class IsGeohash extends AbstractCellRule
{
protected const HELP_OPTIONS = [
@@ -24,7 +26,7 @@ class IsGeohash extends AbstractCellRule

public function validateRule(string $cellValue): ?string
{
- if (\preg_match('/^[0-9b-hj-km-np-z]{1,}$/', $cellValue) === 0) {
+ if (Utils::testRegex('/^[0-9b-hj-km-np-z]{1,}$/', $cellValue)) {
return "Value \"<c>{$cellValue}</c>\" is not a valid Geohash";
}

4 changes: 3 additions & 1 deletion src/Rules/Cell/IsInt.php
@@ -16,6 +16,8 @@

namespace JBZoo\CsvBlueprint\Rules\Cell;

+ use JBZoo\CsvBlueprint\Utils;
+
final class IsInt extends AbstractCellRule
{
protected const HELP_OPTIONS = [
@@ -24,7 +26,7 @@ final class IsInt extends AbstractCellRule

public function validateRule(string $cellValue): ?string
{
- if (\preg_match('/^-?\d+$/', $cellValue) === 0) {
+ if (Utils::testRegex('/^-?\d+$/', $cellValue)) {
return "Value \"<c>{$cellValue}</c>\" is not an integer";
}

4 changes: 3 additions & 1 deletion src/Rules/Cell/IsUsaMarketName.php
@@ -16,6 +16,8 @@

namespace JBZoo\CsvBlueprint\Rules\Cell;

+ use JBZoo\CsvBlueprint\Utils;
+
final class IsUsaMarketName extends AllowValues
{
protected const HELP_OPTIONS = [
@@ -24,7 +26,7 @@ final class IsUsaMarketName extends AllowValues

public function validateRule(string $cellValue): ?string
{
- if (\preg_match('/^[A-Za-z\s\'\-\.,]+, [A-Z]{2}$/u', $cellValue) === 0) {
+ if (Utils::testRegex('/^[A-Za-z\s\'\-\.,\(\)]+, [A-Z-]{2,6}$/u', $cellValue)) {
return "Invalid market name format for value \"<c>{$cellValue}</c>\". " .
'Market name must have format "<green>New York, NY</green>"';
}
3 changes: 2 additions & 1 deletion src/Rules/Cell/Regex.php
@@ -25,6 +25,7 @@ final class Regex extends AbstractCellRule
"Of course it's an ultimatum to verify any sort of string data.",
'Please, be careful. Regex is a powerful tool, but it can be very dangerous if used incorrectly.',
'Remember that if you want to solve a problem with regex, you now have two problems.',
+ 'But have it your way, then happy debugging! https://regex101.com',
];

protected const HELP_OPTIONS = [
@@ -42,7 +43,7 @@ public function validateRule(string $cellValue): ?string
return 'Regex pattern is not defined';
}

- if (\preg_match($regex, $cellValue) === 0) {
+ if (Utils::testRegex($regex, $cellValue)) {
return "Value \"<c>{$cellValue}</c>\" does not match the pattern \"<green>{$regex}</green>\"";
}

17 changes: 17 additions & 0 deletions src/Utils.php
@@ -201,4 +201,21 @@ public static function matchTypes(
return isset($mapOfValidConvertions[$expectedType])
&& \in_array($actualType, $mapOfValidConvertions[$expectedType], true);
}
+
+ public static function testRegex(string $regex, string $cellValue): bool
+ {
+ if ($regex === '' || $cellValue === '') {
+ return false;
+ }
+
+ try {
+ if (\preg_match($regex, $cellValue) === 0) {
+ return true;
+ }
+ } catch (\Throwable) {
+ return false;
+ }
+
+ return false;
+ }
}
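
As written in this hunk, `Utils::testRegex()` returns `true` when the value does *not* match the pattern, which is why every call site can use it directly as the "report an error" condition. The sketch below is a condensed standalone copy of that logic, assuming PHP 8 and the global error handler from `csv-blueprint.php`; the name `testRegexSketch` is hypothetical and exists only for this illustration. It also shows two side effects that follow from the code: empty values and malformed patterns both yield `false`, so neither is reported as a violation.

```php
<?php

declare(strict_types=1);

// Mirror the PR's global handler so a malformed pattern throws instead of warning.
\set_error_handler(static function (int $severity, string $message, string $file, int $line): void {
    throw new \ErrorException($message, 0, $severity, $file, $line);
});

// Condensed copy of Utils::testRegex() from the hunk above (hypothetical name).
function testRegexSketch(string $regex, string $cellValue): bool
{
    if ($regex === '' || $cellValue === '') {
        return false; // nothing to check, so no violation is reported
    }

    try {
        // 0 means "no match", the only case that counts as a violation.
        return \preg_match($regex, $cellValue) === 0;
    } catch (\Throwable) {
        return false; // a malformed pattern is swallowed, not reported
    }
}

\var_dump(testRegexSketch('/^-?\d+$/', '42'));  // bool(false): value matches
\var_dump(testRegexSketch('/^-?\d+$/', 'abc')); // bool(true): value does not match
\var_dump(testRegexSketch('/^-?\d+$/', ''));    // bool(false): empty value is skipped
\var_dump(testRegexSketch('/[a-z/', 'abc'));    // bool(false): malformed pattern
```

Because the return value is inverted relative to what the name suggests, a reader may find it easier to think of the method as "violates pattern" rather than "test passed".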
4 changes: 2 additions & 2 deletions src/Validators/CsvValidator.php
@@ -103,14 +103,14 @@ private function validateFile(bool $quickStop = false): ErrorSuite
if (
$filenamePattern !== null
&& $filenamePattern !== ''
- && \preg_match($filenamePattern, $this->csv->getCsvFilename()) === 0
+ && Utils::testRegex($filenamePattern, $this->csv->getCsvFilename())
) {
$error = new Error(
'filename_pattern',
'Filename "<c>' . Utils::cutPath($this->csv->getCsvFilename()) .
"</c>\" does not match pattern: \"<c>{$filenamePattern}</c>\"",
'',
- ColumnValidator::FALLBACK_LINE,
+ Error::UNDEFINED_LINE,
);

$errors->addError($error);
Expand Down
7 changes: 4 additions & 3 deletions src/Validators/Error.php
@@ -30,12 +30,13 @@ public function __construct(

public function __toString(): string
{
+ $columnStr = $this->getColumnName() === '' ? '' : ", column \"{$this->getColumnName()}\"";
+
if ($this->line === self::UNDEFINED_LINE) {
- return "\"{$this->getRuleCode()}\", column \"{$this->getColumnName()}\". {$this->getMessage()}.";
+ return "\"{$this->getRuleCode()}\"{$columnStr}. {$this->getMessage()}.";
}

- return "\"{$this->getRuleCode()}\" at line <red>{$this->getLine()}</red>, " .
- "column \"{$this->getColumnName()}\". {$this->getMessage()}.";
+ return "\"{$this->getRuleCode()}\" at line <red>{$this->getLine()}</red>{$columnStr}. {$this->getMessage()}.";
}

public function getRuleCode(): string
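
The effect of the new `$columnStr` variable is that file-level errors, which have no column name, no longer print an empty `column ""` fragment. The sketch below uses a hypothetical stand-in class `ErrorSketch`, assuming PHP 8; the real `Error` class has more members, and the value `0` for `UNDEFINED_LINE` is an assumption, since the constant's definition is not shown in this diff.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the PR's Error class, reduced to the formatting logic.
final class ErrorSketch
{
    public const UNDEFINED_LINE = 0; // assumed value; not shown in the diff

    public function __construct(
        private string $ruleCode,
        private string $message,
        private string $columnName = '',
        private int $line = self::UNDEFINED_LINE,
    ) {
    }

    public function __toString(): string
    {
        // The column fragment is emitted only when a column name exists.
        $columnStr = $this->columnName === '' ? '' : ", column \"{$this->columnName}\"";

        if ($this->line === self::UNDEFINED_LINE) {
            return "\"{$this->ruleCode}\"{$columnStr}. {$this->message}.";
        }

        return "\"{$this->ruleCode}\" at line <red>{$this->line}</red>{$columnStr}. {$this->message}.";
    }
}

// File-level error: no line and no column fragment.
echo new ErrorSketch('filename_pattern', 'Filename does not match pattern'), PHP_EOL;
// Cell-level error: line and column are both present.
echo new ErrorSketch('length_min', 'Value is too short', '0:Name', 6), PHP_EOL;
```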