-
Notifications
You must be signed in to change notification settings - Fork 32
/
Copy pathquickstart.Rmd
203 lines (135 loc) · 9.42 KB
/
quickstart.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
<!--
%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{rsdmx quickstart guide}
-->
# rsdmx quickstart guide
The goal of this document is to get you up and running with rsdmx as quickly as possible.
``rsdmx`` provides a set of classes and methods to read data and metadata documents exchanged through the Statistical Data and Metadata Exchange (SDMX) framework.
## SDMX - a short introduction
The SDMX framework provides two sets of standard specifications to facilitate the exchange of statistical data:
* standard formats
* web-service specifications
SDMX allows to disseminate both **data** (a dataset) and **metadata** (the description of the dataset).
For this, the SDMX standard provides various types of _documents_, also known as _messages_. Hence there will be:
* **data** SDMX-ML _documents_. The two main _document_ types are the ``Generic`` and ``Compact`` ones. The latter aims to provide a more compact XML document. They are other data _document_ types derivating from the ones previously mentioned.
* **metadata** SDMX-ML _documents_. The main metadata _document_ is known a ``Data Structure Definition`` (DSD). As its name indicates, it _describes_ the structure and organization of a dataset, and will generally include all the master/reference data used to characterize a dataset. The 2 main types of metadata are (1) the ``concepts``, which correspond to the _dimensions_ and/or _attributes_ of the dataset, and (2) the ``codelists`` which inventory the possible values to be used in the representation of _dimensions_ and _attributes_.
For more information about the SDMX standards, you can visit the [SDMX website](https://sdmx.org/), or this [introduction by EUROSTAT](https://ec.europa.eu/eurostat/web/sdmx-infospace/trainings-tutorials).
## How to deal with SDMX in R
[rsdmx](https://cran.r-project.org/package=rsdmx) offers a low-level set of tools to read **data** and **metadata** in the SDMX-ML format. Its strategy is to make it very easy for the user. For this, a unique function named ``readSDMX`` has to be used, whatever it is a ``data`` or ``metadata`` document, or if it is ``local`` or ``remote`` datasource.
What ``rsdmx`` does support:
* a SDMX format abstraction library, with focus on the the main SDMX standard XML format (SDMX-ML), and the support of the three format standard versions (``1.0``, ``2.0``, ``2.1``)
* an interface to SDMX web-services for a list of well-known data providers, such as OECD, EUROSTAT, ECB, UN FAO, UN ILO, etc (a list that should grow in a near future!). See it [in action](https://github.com/opensdmx/rsdmx/blob/master/vignettes/quickstart.Rmd#using-the-helper-approach)!
Let's see then how to use ``rsdmx``!
## Install rsdmx
``rsdmx`` can be installed from CRAN or from its development repository hosted in Github. For the latter, you will need the ``remotes`` package and run:
```{r, eval=FALSE, results="hide"}
remotes::install_github("opensdmx/rsdmx")
```
## Load rsdmx
To load rsdmx in R, do the following:
```{r,eval = TRUE,results="hide"}
library(rsdmx)
```
## Read dataset documents
This section will introduce you on how to read SDMX *dataset* documents, either from _remote_ datasources, or from _local_ SDMX files.
### Read _remote_ datasets
#### using the _raw_ approach (specifying the complete request URL)
The following code snipet shows you how to read a dataset from a remote data source, taking as example the [OECD StatExtracts portal](https://data-explorer.oecd.org/): [https://sdmx.oecd.org/public/rest/data/DSD_PRICES@DF_PRICES_N_CP01/GRC......./all/?startPeriod=2020&endPeriod=2020](https://sdmx.oecd.org/public/rest/data/DSD_PRICES@DF_PRICES_N_CP01/GRC......./all/?startPeriod=2020&endPeriod=2020)
```{r, echo = FALSE}
myUrl <- "https://sdmx.oecd.org/public/rest/data/DSD_PRICES@DF_PRICES_N_CP01/GRC......./all/?startPeriod=2020&endPeriod=2020"
dataset <- readSDMX(myUrl)
stats <- as.data.frame(dataset)
```
You can try it out with other datasources, such as from the [**EUROSTAT portal**](https://ec.europa.eu/eurostat): [https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/NAMA_10_GDP/A.CP_MEUR.B1GQ.BE+LU](https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/NAMA_10_GDP/A.CP_MEUR.B1GQ.BE+LU)
The online rsdmx documentation also provides a list of data providers, either from international or national institutions, and [more request examples](https://github.com/opensdmx/rsdmx/wiki#read-remote-datasets).
#### using the _helper_ approach
Now, the service providers above mentioned are known by ``rsdmx`` which let users using ``readSDMX`` with the helper parameters. The list of service providers can be retrieved doing:
```{r,eval = FALSE,results="hide"}
providers <- getSDMXServiceProviders();
as.data.frame(providers)
```
Note it is also possible to add an SDMX service provider at runtime. For registering a new SDMX service provider by default, please contact me!
Let's see how it would look like for querying an ``OECD`` datasource:
```{r, message = FALSE}
sdmx <- readSDMX(providerId = "OECD", resource = "data", flowRef = "DSD_PRICES@DF_PRICES_N_CP01",
key = list("GRC", NULL, NULL, NULL, NULL, NULL, NULL, NULL), start = 2020, end = 2020)
df <- as.data.frame(sdmx)
head(df)
```
It is also possible to query a dataset together with its "definition", handled
in a separate SDMX-ML document named ``DataStructureDefinition`` (DSD). It is
particularly useful when you want to enrich your dataset with all labels. For this,
you need the DSD which contains all reference data.
To do so, you only need to append ``dsd = TRUE`` (default value is ``FALSE``),
to the previous request, and specify ``labels = TRUE`` when calling ``as.data.frame``,
as follows:
```{r, message = FALSE}
sdmx <- readSDMX(providerId = "OECD", resource = "data", flowRef = "DSD_PRICES@DF_PRICES_N_CP01",
key = list("GRC", NULL, NULL, NULL, NULL, NULL, NULL, NULL), start = 2020, end = 2020,
dsd = TRUE)
df <- as.data.frame(sdmx, labels = TRUE)
head(df)
```
For embedded service providers that require a user authentication/subscription key or token,
it is possible to specify it in ``readSDMX`` with the ``providerKey`` argument. If provided,
and that the embedded provider requires a specific key parameter, the latter will be appended
to the SDMX web-request.
Note that in case you are reading SDMX-ML documents with the native approach (with
URLs), instead of the embedded providers, it is also possible to associate a DSD
to a dataset by using the function ``setDSD``. Let's try how it works:
```{r, eval = FALSE,message = FALSE,results="hide"}
#data without DSD
sdmx.data <- readSDMX(providerId = "OECD", resource = "data", flowRef = "DSD_PRICES@DF_PRICES_N_CP01",
key = list("GRC", NULL, NULL, NULL, NULL, NULL, NULL, NULL), start = 2020, end = 2020)
#DSD
sdmx.dsd <- readSDMX(providerId = "OECD", resource = "datastructure", resourceId = "DSD_PRICES")
#associate data and dsd
sdmx.data <- setDSD(sdmx.data, sdmx.dsd)
```
### Read _local_ datasets
This example shows you how to use ``rsdmx`` with _local_ SDMX files, previously downloaded from [EUROSTAT](https://ec.europa.eu/eurostat).
```{r, eval = FALSE}
#bulk download from Eurostat
tf <- tempfile(tmpdir = tdir <- tempdir()) #temp file and folder
download.file("https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Frd_e_gerdsc.sdmx.zip", tf)
sdmx_files <- unzip(tf, exdir = tdir)
#read local SDMX (set isURL = FALSE)
sdmx <- readSDMX(sdmx_files[2], isURL = FALSE)
stats <- as.data.frame(sdmx)
```
By default, ``readSDMX`` considers the data source is remote. To read a local file, add ``isURL = FALSE``.
## Read metadata documents
This section will introduce you on how to read SDMX **metadata** complete ``data structure definition`` (DSD)
### Data Structure Definition (DSD)
This example illustrates how to read a complete DSD using a [OECD StatExtracts portal](https://data-explorer.oecd.org/) data source.
```{r, echo = FALSE}
dsdUrl <- "https://sdmx.oecd.org/public/rest/datastructure/all/DSD_PRICES/latest/?references=children"
dsd <- readSDMX(dsdUrl)
```
``rsdmx`` is implemented in object-oriented way with ``S4`` classes and methods. The properties of ``S4`` objects are named ``slots`` and can be accessed with the ``slot`` method. The following code snippet allows to extract the list of ``codelists`` contained in the DSD document, and read one codelist as ``data.frame``.
```{r,eval = FALSE,results="hide"}
#get codelists from DSD
cls <- slot(dsd, "codelists")
#get list of codelists
codelists <- sapply(slot(cls, "codelists"), function(x) slot(x, "id"))
#get a codelist
codelist <- as.data.frame(slot(dsd, "codelists"), codelistId = "CL_TABLE1_FLOWS")
```
In a similar way, the ``concepts`` of the dataset can be extracted from the DSD and read as ``data.frame``.
```{r,eval = FALSE,results="hide"}
#get concepts from DSD
concepts <- as.data.frame(slot(dsd, "concepts"))
```
## Save & Reload SDMX R objects
It is possible to save SDMX R objects as RData file (.RData, .rda, .rds), to then
be able to reload them into the R session. It could be of added value for users that
want to keep their SDMX objects in R data files, but also for fast loading of large SDMX
objects (e.g. DSD objects) for use in statistical analyses and R-based web-applications.
To save a SDMX R object to RData file:
```{r, eval = FALSE, results="hide"}
saveSDMX(sdmx, "tmp.RData")
```
To reload a SDMX R object from RData file:
```{r, eval = FALSE, results="hide"}
sdmx <- readSDMX("tmp.RData", isRData = TRUE)
```