Skip to content

MarcoDima/SAtoST

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

																		<p>SAtoST<p>

-- From Suffix Array and Lcp to Suffix Tree Algorithm Implementation

Author: Marco Di Martino

Description

Given a text T, with its suffix array and array lcp, the algorithm creates the suffix tree of the text T. Text and the two arrays are read from the attached files

  • banana.txt
  • mississippi.txt
  • ab.txt

The main() reads one of these files, generates the suffix tree and then prints it to a file called tree.txt. The print of the tree structure is made using a special format that easly makes it possible to copy-and-paste it to the site http://www.webgraphviz.com/ in order to be able to display the tree. Once the tree has been created, the text and the arrays are printed to the standard output together with all nodes of the tree, but for a clearer view of the tree I reccomend the use of webgraphviz.

Tree Structure

Since the length of the text (and so also the number of nodes of the tree) is not known, I decided to use a child-sibiling tree representation. So the tree I create has the following structure:

  • each node is identyfied by its path label and string depth (easy implementation to identify a single node without giving a 'name')
  • for each node is also known (if they are): - node parent - node child - node sibiling Creation and addition are separeted: first creation of the node is made, and second it is added to tree structure. Once I got the tree, the string depth of each node is calculated.

Methods implemented

void string_depth(node_tree * node, int strdpt);
node_tree * create_node(node_tree * node, char * label);
void add_node(node_tree * parent, node_tree * node);
char * substring(char * T, int i, int l);
void generate_suffix_tree(char * T, node_tree * root, int * sa, int * lcp, 	int i, int j, int l);
boolean is_leaf(node_tree * node);
void tree_visit(node_tree * node);
void print(node_tree * node, FILE * file);
void print_tree_on_file(node_tree * node, FILE * file);
void destroy_tree(node_tree * node);

Creation of the Suffix Tree

This is done by the procedure "generate_suffix_tree". This sub gets the text, the arrays, a pointer to the root of the tree and the dimension of the arrays. The generation is done recursively taking every time the minimum value of the current interval of the array and calling recursively a subinterval (i,j). - first check wheter actual interval contains just one element (i==j), if so, a leaf is created. - if not, calculate min value e the position of the min. - Taking the position of the min, a new node is created and added to the tree, but this won't be a leaf because if it was, it would have already been created after the check i==j. With help of the substring method, the path label of this node can be calculated: it will have a path label starting from the text in position suffix array in position pos_min and will have a length given from lcp[pos_min] value. - every time the min is found in the current interval of the array lcp, a recursive call is done to the portion (i,h), where i is the current starting index and h is the index where the min has been found. Once the call ends, index i is updated with the value i=h+1; this is done because the recursive call will have generated all nodes belonging to its interval and so I can just look at the remaining part.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published