A Domain-Independent Ontology Learning Method based on Transfer Learning

This project presents a state-of-the-art ontology learning method that learns ontologies from Web pages using only a small amount of in-domain training data. We introduce transfer learning into ontology learning to address the following problems in current ontology learning methods:

  • 1) Current pattern-based ontology learning methods are domain-independent but perform poorly, because they mainly focus on learning ontologies from text and ignore the semi-structured data on the Web.

  • 2) Current ontology learning methods based on traditional machine learning consider both textual and semi-structured information on the Web, but they are domain-dependent: a large amount of training data has to be labeled by hand to train models for each new domain.

We think introducing transfer learning can solve these two problems at the same time, because:

    1. Like traditional machine learning algorithms, transfer learning algorithms are feature-based, so they can use both textual and semi-structured information in Web pages by defining different features.
    2. Unlike traditional machine learning, transfer learning aims to reuse knowledge learned from other domains. Therefore, when transfer learning algorithms are used to build learning models for a new domain, far less data needs to be labeled than with traditional algorithms.
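
To illustrate the first point, here is a minimal sketch of how one text unit could be described by textual and semi-structured features side by side. The `TextUnit` class, the `extract_features` function and the feature names are hypothetical and chosen only for illustration; the features actually used in this project are those listed in (Defined Feature).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TextUnit:
    """A hypothetical text unit taken from a Web page."""
    text: str      # visible text of the unit
    tag: str       # enclosing HTML tag, e.g. "h2" or "li"
    depth: int     # depth of the unit in the page's tree
    is_link: bool  # whether the unit sits inside an <a> element


def extract_features(unit: TextUnit) -> Dict[str, float]:
    """Map a text unit to one feature vector (as a dict).

    Textual and semi-structured features end up in the same vector,
    so a feature-based learner can use both kinds of information.
    """
    words = unit.text.split()
    n_words = len(words)
    capitalized = sum(1 for w in words if w[:1].isupper())
    return {
        # textual features (illustrative assumptions)
        "n_words": float(n_words),
        "capitalized_ratio": capitalized / n_words if n_words else 0.0,
        # semi-structured features (illustrative assumptions)
        "is_heading": 1.0 if unit.tag in {"h1", "h2", "h3"} else 0.0,
        "is_list_item": 1.0 if unit.tag == "li" else 0.0,
        "is_link": 1.0 if unit.is_link else 0.0,
        "depth": float(unit.depth),
    }


unit = TextUnit(text="Machine Learning", tag="h2", depth=3, is_link=False)
features = extract_features(unit)
```

Because every unit is reduced to the same set of named features, the same learner can be trained on units from any domain, which is what makes knowledge transfer between domains possible.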

To validate our method, we collect data from four domains:

    1. Wiki pages of the Fortune Global 500 firms (abbreviated as CM);
    2. Computer science researchers’ profile pages (abbreviated as P);
    3. Famous computer science conferences’ pages (abbreviated as CF);
    4. Famous computer science journals’ pages (abbreviated as J).

Each domain contains 50 Web sites, some of which are uploaded in (Web pages).

Our method contains 3 steps:

    1. In the first step, we use the VIPS algorithm [1] to segment a Web page into text units, and then build a vision tree that organizes these units according to the visual layout of the page. In the vision tree, the units are stored separately in leaf nodes. Inner nodes reflect structural information among their children, which means the children of an inner node may be similar in visual appearance or semantics. We give a running example to explain this process (Running Example of VIPS Algorithm). We also upload some vision trees in (VIPS Result), stored in Extensible Markup Language (XML).
    2. In the second step, terms, which are the concepts and instances in the corresponding ontology, are recognized from these units by TF-Mnt. For a new domain, TF-Mnt automatically selects a suitable previous domain based on the domain similarity, measured by the correlation coefficient. It then constructs transfer knowledge by combining the knowledge learned from the previous domain with the domain similarity, and domain knowledge by training on a small amount of labeled data in the new domain. To construct TF-Mnt, we first define some features in these four domains (Defined Feature), and then label some data for validation (Labeled Data).
    3. In the third step, is-a and subclass-of relations between concepts and instances are captured by analyzing the visual structure of the Web page (Running Example of Relations Learning). With terms and relations, the ontology of a Web page can finally be constructed by encoding them in the Resource Description Framework (RDF) language (Final Ontologies).
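
The source-domain selection in the second step can be sketched as follows. This is only a rough illustration under stated assumptions: each domain is summarized by a numeric profile vector (for example, average feature values), the Pearson correlation coefficient is used as the domain similarity, and the domain with the highest similarity is selected. The function names, the profile representation and the numbers are hypothetical and made up for illustration; they are not taken from TF-Mnt or from our data.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant vector has no defined correlation; treat as unrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_source_domain(
    target: List[float], sources: Dict[str, List[float]]
) -> Tuple[str, float]:
    """Return the previous domain most similar to the new domain.

    Assumption: similarity is the Pearson correlation between profiles.
    """
    scores = {name: pearson(target, profile) for name, profile in sources.items()}
    best = max(scores, key=lambda name: scores[name])
    return best, scores[best]


# Made-up profile vectors, for illustration only.
new_domain = [0.6, 0.1, 0.3, 0.8]
previous_domains = {
    "source_a": [0.3, 0.3, 0.3, 0.4],
    "source_b": [0.1, 0.9, 0.4, 0.2],
    "source_c": [0.5, 0.2, 0.3, 0.7],
}

best_name, similarity = select_source_domain(new_domain, previous_domains)
print(best_name, round(similarity, 3))
```

The returned similarity could then serve as the weight given to the knowledge transferred from the selected domain, so that a less similar previous domain contributes less to the model for the new domain.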

[1] Deng Cai, Shipeng Yu, Ji-Rong Wen and Wei-Ying Ma. VIPS: a Vision-based Page Segmentation Algorithm. Microsoft Technical Report, 2003.
