Skip to content

Latest commit

 

History

History
140 lines (103 loc) · 8.08 KB

README.md

File metadata and controls

140 lines (103 loc) · 8.08 KB

Delivery and Pickup Services in Berlin

Logo of "Delivery and Pickup Services in Berlin" dataset

This repository contains RDF data on businesses in Berlin that offer delivery and/or pickup of products. The data includes restaurants and other food establishments, but also any other kind of business that had to close its doors due to the current Corona/Covid-19 situation, and is now relying on delivery and pickup services.

The RDF data can be found in data/target/lieferdienste.ttl.

Format and Vocabulary

The RDF data is serialised as Turtle, mainly using terminology from schema.org, as well as some additional terms for things not covered by schema.org.

Source

The data is derived from the GeoJSON version of the dataset Gastronomien, Laden- und andere Geschäfte mit Liefer- und Abholservice, which is published under CC BY 3.0 DE by Berlin's Senate Department for Economics, Energy and Public Enterprises (Senatsverwaltung für Wirtschaft, Energie und Betriebe). The data was collected using a Web form by Technologiestiftung Berlin (TSB).

Target

The following is an example of the target data in Turtle format:

@prefix schema: <https://schema.org/> .
@prefix business: <https://daten.berlin.de/ds/business/> .
@prefix ohspec: <https://daten.berlin.de/ds/opening-hours/> .
@prefix delivery: <https://daten.berlin.de/vocab/delivery/> .

business:b_199
    a schema:FoodEstablishment ;
    delivery:deliveryComment "Ab 50€ Bestellwert liefern wir im Umkreis von 4 km"@de ;
    delivery:deliveryPossible true ;
    delivery:osmIdentifier "2109017886" ;
    delivery:pickupComment "Wiener Schnitzel, Saftgulasch, Kässpätzle"@de ;
    delivery:pickupPossible true ;
    schema:address [
        a schema:PostalAddress ;
        schema:addressCountry "DE" ;
        schema:addressLocality "Berlin" ;
        schema:postalCode "10777" ;
        schema:streetAddress "Motzstraße 34"
    ] ;
    schema:description "Wiener Schnitzel, Saftgulasch, Kässpätzle"@de ;
    schema:geo [
        a schema:GeoCoordinates ;
        schema:latitude 52.49753 ;
        schema:longitude 13.34682
    ] ;
    schema:name "Sissi - Restaurant" ;
    schema:openingHoursSpecification ohspec:oh_278e11cdddf2f42726fb2fac4e1111f72ed9017e, ohspec:oh_2c11fbcd2c781fb19a336fea0c88c00747565eb6, ohspec:oh_5de23a8d9c8abdc8b52dbe39e00f75621bfc112f, ohspec:oh_80ce7a5a1350c9e01de2226d594e9b29b5d04b52, ohspec:oh_94af436a280831d1e04578c68a19807e3db1730c, ohspec:oh_c45231a188cdbdfba02e2879d9a62aca1914de58, ohspec:oh_cb784c0fceb5a1bfcd322a81083be00404bfa9ab ;
    schema:telephone "+49302101801" ;
    schema:url "https://sissi-berlin.de" .

# ...

ohspec:oh_278e11cdddf2f42726fb2fac4e1111f72ed9017e
    a schema:OpeningHoursSpecification ;
    schema:closes "20:30:00" ;
    schema:dayOfWeek schema:Tuesday ;
    schema:opens "17:00:00" .
  • The local part of the business URI (b_2) is minted using the unique_id feature of the source data (the id field is internal to the SimpleSearch tool used to create the dataset and is not stable between the dataset's versions).
  • Each business is either a schema:FoodEstablishment or a schema:LocalBusiness, based on the source data's art field.
  • The URIs of the opening hours specification (ohspec:oh_...) are a hash over a normalized string of the opening hours for a particular day. That means that opening hours specifications can be shared by several businesses.
  • business (namespace for business URIs), ohspec (namespace for opening hours specifications) and delivery (custom properties not covered by schema.org) are just URI namespaces and don't currently resolve to anything.
  • delivery:osmIdentifier is the OpenStreetMap identifier for this business (see Matching Businesses to OSM).

Conversion Process

Conversion is driven by the Makefile.

The target Turtle data can be created as follows:

make data/target/lieferdienste.ttl

This will trigger:

  • creating the neccessary folders,
  • downloading the source data (to data/source),
  • converting the source to N-triples (in data/temp),
  • finding matches between the businesses in the dataset to OpenStreetMap identifiers, and
  • converting the intermediate N-triples to Turtle (in data/target).

Matching Businesses to OSM

Many of the businesses in this dataset also have a representation in OpenStreetMap (OSM). Since OSM is so widely known and used as a reference point in other projects, we try to link to it. This is done as follows:

  • Using the Overpass API, we query OSM for all amenity and shop nodes within the bounding box of Berlin (see bin/get_osm_nodes.rb).
  • For all business entities in the delivery dataset, we find all OSM nodes that are within a range of 100 metres, using the Haversine formula.
  • Why such a wide radius? The geocoder approaches used by the delivery dataset and OSM are quite different, so the same business can often have coordinates that are up to 100 metres apart in both sources.
  • Starting with the closest OSM match, we then compare the names of business A with the name of the potential match B.
  • A complete match (of the normalized name) will be accepted straight away.
  • If no complete match is found, we check if A is contained in B or vice versa (to match things like Pizzaria da Mario and Da Mario).
  • Some manual matches have been included as well.
  • There is a lot of room for improvement here, obviously, but the result is good enough for now.
  • For an overview of how many businesses have successfully received an OSM identifier, see Type Statistics.

Requirements

  • The main conversion script bin/convert_businesses.rb is written in Ruby (>=2.4) and requires the linkeddata gem, as well as the uuid gem.
  • For the matching to OpenStreetMap, the haversine and overpass-api-ruby gems are needed.
  • The script to convert the intermedia N-Triples file to Turtle (bin/to_ttl.sh) is a simple shell script and requires the rapper command to be available. rapper is part of Redland. For Mac OS, there is a brew formula to install Redland. There are also packages for Debian and other distributions.
  • jq is required for handling intermediate JSON files.
  • The arq query tool that comes with Apache Jena (for generating the 'statistics' in the README).

Logo

Type Statistics

type_uri comment type_count
schema:FoodEstablishment total 769
schema:FoodEstablishment with OSM-link 522
schema:LocalBusiness total 479
schema:LocalBusiness with OSM-link 218
schema:Organization all businesses (total) 1248
schema:Organization all businesses (with OSM-link) 740

License

All software in this repository is published under the MIT License. All data in this repository is published under CC BY 3.0 DE.


2020, Knud Möller, BerlinOnline Stadtportal GmbH & Co. KG

Repository: https://github.com/knudmoeller/berlin-pickup-and-delivery

Last changed: 2020-07-29