Added support for symbol language specification in loader and updated readme
zgornel committed Sep 30, 2018
1 parent 8fe5b7d commit ea440d1
Showing 2 changed files with 19 additions and 10 deletions.
19 changes: 11 additions & 8 deletions README.md
@@ -18,17 +18,18 @@ This package is a simple API to **ConceptNetNumberbatch**.

The following examples illustrate some common usage patterns:

-```julia>
+```julia
julia> using Conceptnet, Languages
-path_h5 = download_embeddings(url=CONCEPTNET_HDF5_LINK, localfile="./_conceptnet_/conceptnet.h5");
+file_conceptnet = download_embeddings(url=CONCEPTNET_HDF5_LINK,
+localfile="./_conceptnet_/conceptnet.h5");
# [ Info: Download ConceptNetNumberbatch to ./_conceptnet_/conceptnet.h5...
# % Total % Received % Xferd Average Speed Time Time Time Current
# Dload Upload Total Spent Left Speed
# 100 127M 100 127M 0 0 3646k 0 0:00:35 0:00:35 --:--:-- 4107k
-"./_conceptnet_/conceptnet.h5"
+# "./_conceptnet_/conceptnet.h5"

# Load embeddings
-julia> conceptnet = load_embeddings(path_h5, languages=[Languages.English()])
+julia> conceptnet = load_embeddings(file_conceptnet, languages=:en)
# ConceptNet{Languages.English} (compressed): 1 language(s), 150875 embeddings

julia> conceptnet["apple"] # Get embeddings for a single word
@@ -50,13 +51,13 @@ julia> conceptnet[["apple", "pear", "cherry"]] # Get embeddings for multiple wo

```julia
# Load multiple languages
-julia> conceptnet = load_embeddings(path_h5, languages=[Languages.English(), Languages.French()])
+julia> conceptnet = load_embeddings(file_conceptnet, languages=[:en, :fr])
# ConceptNet{Language} (compressed): 2 language(s), 174184 embeddings

julia> conceptnet["apple"] # fails, language must be specified
# ERROR: ...

-julia> [conceptnet[:en, "apple"] conceptnet[:fr, "poire"]] # languages can be specified also as Languages.English(), Languages.French()
+julia> [conceptnet[:en, "apple"] conceptnet[:fr, "poire"]]
# 300×2 Array{Int8,2}:
# 0 -2
# 0 -2
@@ -103,7 +104,7 @@ julia> # `keys` returns an iterator for all words
- fast for retrieving embeddings of exact matches
- fast for retrieving embeddings of wildcard matches (`xyzabcish` is matched to `######ish`)
- if neither exact or wildcard matches exist, retrieval can be based on string distances (slow, see `src/search.jl`)
-
+- for another package handling word embeddings, check out [Embeddings.jl](https://github.com/JuliaText/Embeddings.jl)


## Installation
@@ -121,5 +122,7 @@ This code has an MIT license and therefore it is free.
## References

[1] [ConceptNetNumberbatch GitHub homepage](https://github.com/commonsense/conceptnet-numberbatch)
+
[2] [ConceptNet GitHub homepage](https://github.com/commonsense/conceptnet5)
-[3] [Embeddings.jl - another embeddings package](https://github.com/JuliaText/Embeddings.jl)
+
+[3] [Embeddings.jl GitHub homepage](https://github.com/JuliaText/Embeddings.jl)
10 changes: 8 additions & 2 deletions src/files.jl
@@ -29,10 +29,16 @@ function load_embeddings(filepath::AbstractString;
keep_words=String[],
languages::Union{Nothing,
Languages.Language,
-Vector{<:Languages.Language}
+Vector{<:Languages.Language},
+Symbol,
+Vector{Symbol}
}=nothing)
-if languages == nothing
+if languages isa Nothing
languages = unique(collect(values(LANGUAGES)))
+elseif languages isa Symbol
+languages = LANGUAGES[languages]
+elseif languages isa Vector{Symbol}
+languages = [LANGUAGES[lang] for lang in languages]
end

if any(endswith.(filepath, [".gz", ".gzip"]))
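
The normalization step this hunk adds to `load_embeddings` can be tried out in isolation. The following is a minimal sketch, not the package's code: `Language`, `English` and `French` are stand-ins for the corresponding Languages.jl types, the two-entry `LANGUAGES` table is an assumed, reduced version of the package's lookup table, and `resolve_languages` is a hypothetical helper name used only for illustration.

```julia
# Stand-in types (assumption): the real code uses Languages.Language and its subtypes.
abstract type Language end
struct English <: Language end
struct French <: Language end

# Assumed, reduced lookup table from language symbols to language objects.
const LANGUAGES = Dict{Symbol,Language}(:en => English(), :fr => French())

# Hypothetical helper mirroring the branch logic of the hunk above.
function resolve_languages(languages)
    if languages isa Nothing
        # No restriction given: use every known language.
        return unique(collect(values(LANGUAGES)))
    elseif languages isa Symbol
        # A single symbol maps to a single language object.
        return LANGUAGES[languages]
    elseif languages isa Vector{Symbol}
        # A vector of symbols maps to a vector of language objects.
        return [LANGUAGES[lang] for lang in languages]
    end
    # Language objects (or vectors of them) pass through unchanged.
    return languages
end

single = resolve_languages(:en)
several = resolve_languages([:en, :fr])
everything = resolve_languages(nothing)
```

As in the hunk, a symbol that is missing from the table raises a `KeyError` during the lookup, so callers see an error for an unknown language instead of a silently empty result.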