diff --git a/README.md b/README.md
index 29ec9f2..93f96c0 100644
--- a/README.md
+++ b/README.md
@@ -14,7 +14,7 @@ There are four key objects within Biblyser:
 
 
 ## Quick start
-Biblyser can either be installed with pip or clones from the Github repository.
+Biblyser can either be installed with pip or cloned from the Github repository.
 
 ```python
 pip install biblyser
@@ -27,7 +27,7 @@ git clone https://github.com/GEUS-Glaciology-and-Climate/Biblyser
 When cloning from the Github repository, you will need to create a conda environment with the required package dependencies by installing the Biblyser's dependencies using pip.
 
 ```python
-pip install pybyliometrics, habanero, scholarly, gender_guesser, pandas, beautifulsoup4
+pip install pybyliometrics, habanero, scholarly, gender_guesser, pandas, numpy
 ```
 
 Try running one of the example scripts from the repository to see that it works. To access the Scopus API through the pybliometrics package, you will need to configure your API key.
@@ -43,7 +43,7 @@ An API key or Insttoken is needed to use the Scopus API. An API key can be gener
 
 After this initial set-up, no editing of the example scripts should be needed - the scripts should run as they are. If they don't, there is likely an issue with your python environment.
 
-## Name.py
+## name.py
 The Name object holds attributes about an individual to aid in searching for associated publications. This can be initialised using an individual's full name, with job title and gender as optional inputs, and additional keyword inputs for Orcid ID, Scopus ID, Google Scholar ID, and h-index.
 
 ```python
@@ -61,7 +61,7 @@ n = Name('Jane Emily Doe',
 
 Various name and initial formats are computed from Name object, which maximise the chance of finding all associated publications. The gender of each name can either be provided during initialisatoin, or guessed using `gender_guesser`. The gender definition is used later on to analyse gender distribution in a **BibCollection**.
 
-## Organisation.py
+## organisation.py
 The Organisation object holds a collection of **Name** objects which represent a group of authors, department, or organisation. The GEUS G&K organisation can be fetched either from the GEUS G&K Pure portal (only retrieves registered authors) or from the staff directory webpage (all G&K members). This information is fed directly into an Organisation object.
 
 ```python
@@ -112,7 +112,7 @@ df = org.asDataFrame()
 ```
 
 
-## Bib.py
+## bib.py
 A Bib object holds the relevant information associated with a single publication, namely:
 
 + DOI
@@ -142,12 +142,12 @@ Bib attributes are populated using the Scopus API provided by [pybliometrics](ht
 
 Authorship of a publication can be queried within the Bib object, including queries by organisation and (guessed) gender.
 
-## BibCollection.py
+## bibcollection.py
 A BibCollection object holds a collection of **Bib** objects, i.e. a database of all associated or selected publications. A BibCollection can be initialised from an **Organisation** (for which the BibCollection will search for all publications linked to each name in the organisation), a list of **Bib** objects, or a list of doi strings.
 
 ```python
 from biblyser.organisation import Organisation
-from biblyser.bibCollection import BibCollection
+from biblyser.bibcollection import BibCollection
 
 
 #BibCollection from an Organisation
@@ -199,7 +199,7 @@ df = bibs.asDataFrame()
 ```
 
 ## Computing gender metrics
-Genders of each author within the Bib object are firstly guessed, and if the guessed gender is not certian then a gender database is used to check if the author and an associated gender exists. This database is an Organisation object, retaining all information about each author's name and gender. If a name is not found in the database then the user is prompted to manually define the gender, and then retains this new addition.
+Genders of each author within the Bib object are firstly guessed, and if the guessed gender is not certian then a gender database is used to check if the author and an associated gender exists. This database is an **Organisation** object, retaining all information about each author's name and gender. If a name is not found in the database then the user is prompted to manually define the gender, and then retains this new addition.
 
 ```python
 import copy
@@ -211,6 +211,25 @@ gdb = copy.copy(org)
 bibs.getAllGenders(gdb)
 ```
 
+The computed gender metrics can be used to determine a diversity index for an individual or organisation. This diversity index is based on the gender and affiliation/country composition in all publication authorships. Generally, this is determined from publications in the last five years, but can be changed as an optional parameter.
+
+```python
+from biblyser.bibcollection import calcDivIdx
+
+calcDivIdx('Penelope How', #Name
+           5, #Years to calculate
+           scopus=True, #Bibs from scopus
+           scholar=False, #from scholar
+           crossref=False, #from crossref
+           check=True) #User check bibs?
+```
+
+An example script for calculating diveristy index is available in the Github repository [here](https://github.com/GEUS-Glaciology-and-Climate/Biblyser/blob/main/biblyser/examples/getDiv.py), which can be run from the command line.
+
+```python
+python getDiv calcDivIdx --name "Penelope How"
+```
+
 ## Further development we are working on
 + Incorporation of other search APIs for publications, such as [Web Of Science](https://pypi.org/project/wos/)
 + Fetch journal impact factor
diff --git a/docs/source/diversityindex.rst b/docs/source/diversityindex.rst
index 547113b..e35a0cf 100644
--- a/docs/source/diversityindex.rst
+++ b/docs/source/diversityindex.rst
@@ -1,4 +1,22 @@
 Diversity Index
 ===============
 
-The diversity index is a metric for evaluating diversity in an individual's co-authorship.
+The computed bibcollection metrics can be used to determine a diversity index for an individual or organisation. This diversity index is based on the gender and affiliation/country composition in all publication authorships. Generally, this is determined from publications in the last five years, but can be changed as an optional parameter.
+
+.. code-block:: python
+
+    from biblyser.bibcollection import calcDivIdx
+
+    calcDivIdx('Penelope How', #Name
+               5, #Years to calculate
+               scopus=True, #Bibs from scopus
+               scholar=False, #from scholar
+               crossref=False, #from crossref
+               check=True) #User check bibs?
+
+An example script for calculating diveristy index is available in the Github repository [here](https://github.com/GEUS-Glaciology-and-Climate/Biblyser/blob/main/biblyser/examples/getDiv.py), which can be run from the command line.
+
+.. code-block:: python
+
+    python getDiv calcDivIdx --name "Penelope How"
+
diff --git a/docs/source/guide.rst b/docs/source/guide.rst
index a33773d..5064baa 100644
--- a/docs/source/guide.rst
+++ b/docs/source/guide.rst
@@ -8,7 +8,7 @@ The Name object holds attributes about an individual to aid in searching for ass
 
 .. code-block:: python
 
-    from Name import Name
+    from biblyser.name import Name
 
     # With fullname string
     n = Name('Jane Emily Doe')
@@ -29,7 +29,7 @@ The Organisation object holds a collection of **Name** objects which represent a
 
 .. code-block:: python
 
-    from Organisation import Organisation, fetchWebInfo
+    from biblyser.organisation import Organisation, fetchWebInfo
 
     def fetchWebInfo(url, parser, fid, classtype, classid):
         '''Get all up-to-date information (e.g. names, titles) from a
@@ -92,7 +92,7 @@ A Bib object can either be initiated from a doi string, a title string, or from
 
 .. code-block:: python
 
-    from Bib import Bib
+    from biblyser.bib import Bib
 
     # Bib object from doi string
     pub = Bib(doi='10.5194/tc-11-2691-2017')
@@ -114,8 +114,8 @@ A BibCollection object holds a collection of **Bib** objects, i.e. a database of
 
 .. code-block:: python
 
-    from Organisation import Organisation
-    from BibCollection import BibCollection
+    from biblyser.organisation import Organisation
+    from biblyser.bibcollection import BibCollection
 
     # BibCollection from an Organisation
     names = ['Penelope How', 'Nanna B. Karlsson', 'Kenneth D. Mankoff']
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 2d5ac96..77c14e2 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -6,6 +6,8 @@
 Biblyser
 ==========
 
+**Biblyser** is an object-oriented Python workflow for computing and analysing bibliometrics for an individual or organisation.
+
 .. toctree::
    :maxdepth: 2
    :caption: Contents:
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
index 3d38c69..6510228 100644
--- a/docs/source/installation.rst
+++ b/docs/source/installation.rst
@@ -4,24 +4,21 @@ Installation
 Quickstart
 ----------
 
-Clone `this repository `_ into your local directory
+Biblyser can either be installed with pip or cloned from `this repository `_ into your local directory.
 
 .. code-block:: python
 
-    git clone https://github.com/GEUS-Glaciology-and-Climate/Biblyser
-
-Create a conda environment with the required package dependencies, either using the environment file provided in the repository.
+    pip install biblyser
 
 .. code-block:: python
 
-    conda env create --file environment.yml
-
+    git clone https://github.com/GEUS-Glaciology-and-Climate/Biblyser
 
-Or by installing the packages into your conda environment with pip
+When cloning the repository, you will need to create a python environment with the required package dependencies, which can be installed with pip. either using the environment file provided in the repository.
 
 .. code-block:: python
 
-    pip install pybyliometrics, habanero, scholarly, gender_guesser, pandas
+    pip install pybyliometrics, habanero, scholarly, gender_guesser, pandas, numpy
 
 
 Scopus API configuration
diff --git a/docs/source/modules.rst b/docs/source/modules.rst
index a562dfa..9fd92a6 100644
--- a/docs/source/modules.rst
+++ b/docs/source/modules.rst
@@ -1,7 +1,7 @@
 Modules
 =======
 
-Name
+name
 ----
 
 .. automodule:: name
@@ -10,7 +10,7 @@ Name
    :show-inheritance:
 
 
-Organisation
+organisation
 ------------
 
 .. automodule:: organisation
@@ -19,7 +19,7 @@ Organisation
    :show-inheritance:
 
 
-Bib
+bib
 ---
 
 .. automodule:: bib
@@ -28,7 +28,7 @@ Bib
    :show-inheritance:
 
 
-BibCollection
+bibcollection
 -------------
 
 .. automodule:: bibcollection
diff --git a/setup.py b/setup.py
index 09f6d35..fe711e5 100644
--- a/setup.py
+++ b/setup.py
@@ -29,8 +29,8 @@
         "Bug Tracker": "https://github.com/GEUS-Glaciology-and-Climate/Biblyser/issues",
     },
     keywords="publications citations academia science bibliometrics",
-# package_dir={"": "Biblyser"},
-# packages=setuptools.find_packages(where="Biblyser"),
+# package_dir={"": "biblyser"},
+    #packages=setuptools.find_packages(where="biblyser"),
     packages=setuptools.find_packages(),
     classifiers=[
         "Programming Language :: Python :: 3",
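
A note on the module renames in this diff: by default Python matches module names case-sensitively on import, so code that still uses the old mixed-case path (`biblyser.bibCollection`) would be expected to fail once only `bibcollection.py` exists. The sketch below illustrates that behaviour with a throwaway package. The package name `demo_pkg`, its contents, and the helper `can_import` are hypothetical stand-ins for illustration, not part of Biblyser.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def can_import(module_name: str) -> bool:
    """Return True if module_name imports, False if it cannot be found."""
    try:
        importlib.import_module(module_name)
        return True
    except ModuleNotFoundError:
        return False


# Build a throwaway package that mirrors the lowercase layout in the diff.
# 'demo_pkg' is a hypothetical stand-in, not the real biblyser package.
tmp_dir = tempfile.mkdtemp()
pkg_dir = Path(tmp_dir) / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "bibcollection.py").write_text("class BibCollection:\n    pass\n")

# Make the temporary directory importable and refresh the import caches.
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

new_path_ok = can_import("demo_pkg.bibcollection")  # lowercase path, as in the diff
old_path_ok = can_import("demo_pkg.bibCollection")  # mixed-case path it replaces
print(new_path_ok, old_path_ok)
```

If the two flags differ as expected, any remaining imports of the mixed-case module names in scripts or docs would need the same update that the README and guide.rst hunks apply.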