Skip to content

Latest commit

 

History

History
108 lines (90 loc) · 3.71 KB

File metadata and controls

108 lines (90 loc) · 3.71 KB

Azure Cognitive Search Python Custom Skill For Dates Extraction

This code is a Python Custom Skill, for Azure Cognitive Search, based on Azure Functions for Python. It extracts the first date from the input string. If you need all of the dates, or the time, change the code as you need.

The Built-in Entity Recognition cognitive skill for dates will return all dates of the document, in multiple formats, as you can see in the image below. If it is not a problem for you, you don't need to use this custom skill.

Dates

Required steps

  1. Follow this tutorial.
  2. Use the Python code below as your init.py file. Customize it with your storage account details, also with your csv file name and target column. As you can see below, my sample csv file target column name is Term. That helps the idea that this code will extract pre-defined terms from the documents content.
  3. Don't forget to add azure.functions and datefinder to your requirements.txt file.
  4. Connect your published custom skill to your Cognitive Search Enrichment Pipeline. Plesae check the section below the code in this file. For more information, click here.

Python Code

The Python code for this skill is here.

Add this skill to your Cogntive Search Enrichment Pipeline

Your skillset will have this extra section below.

 {
            "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
            "name": "First Date",
            "description": "Get the first date detected in a string",
            "context": "/document",
            "uri": "your-Pyhton-Azure-Functions-published-URL",
            "httpMethod": "POST",
            "timeout": "PT30S",
            "batchSize": 1,
            "degreeOfParallelism": null,
            "inputs": [
             {
               "name": "text",
               "source": "/document/content"
             }
                   ],
        "outputs": [
          {
            "name": "text",
            "targetName": "date"
          }
            ],
            "httpHeaders": {}
           }

Sample Input

Use the JSON input below to test your function. Get familiar with the code behavior in the different situations.

The test is a tribute to the most popular football club in the world, Flamengo, from Rio de Janeiro. It was founded in 1895 and has over 45 million fans in Brazil alone. The team was champion in its two most important matches of 2019, the Brazilian championship and the Copa Libertadores of America.

{
    "values": [
      {
        "recordId": "0",
        "data":
           {
            "text": ["Flamengo was founded on November 15th 1895"]
           }
      } ,
        {
        "recordId": "1",
        "data":
           {
            "text": [""]
           }
      } ,    
      {
        "recordId": "2",
        "data":
           {
            "text": ["Flamengo campeão de tudo em 2019!"]
           }
      } 
    ]
}

Expected Output

{
    "values": [{
        "recordId": "0",
        "data": {
            "text": "1895-11-15"
        }
    }, {
        "recordId": "1",
        "errors": [{
            "message": "Could not complete operation for record."
        }]
    }, {
        "recordId": "2",
        "data": {
            "text": "2019-01-31"
        }
    }]
}