Merge pull request #30 from glevita-uc/v2.0

Added code to parse WY and bug fixes for TN and AR
UniCourt · Jan 28, 2022 · 8611834
2 parents 2024eac + 0b3ef0e
Showing 6 changed files with 1,557 additions and 419 deletions.
diff --git a/Readme.md b/Readme.md
@@ -61,6 +61,41 @@ Currently this code supports following states:
    **Original RTF:** https://archive.org/download/gov.co.crs.bulk
 
 
+7. ###Idaho (ID):
+
+   **Code repo:** https://github.com/UniCourt/cic-code-id
+
+   **Code pages:** https://unicourt.github.io/cic-code-id
+
+   **Original files can be found here:** https://archive.org/details/govlaw?and%5B%5D=subject%3A%22idaho.gov%22+AND+subject%3A%222020+Code%22&sin=&sort=titleSorter
+
+
+8. ###Virginia (VA):
+
+   **Code repo:** https://github.com/UniCourt/cic-code-va
+
+   **Code pages:** https://unicourt.github.io/cic-code-va
+
+   **Original RTF:**  https://archive.org/download/gov.va.code/
+
+
+9. ###Vermont (VT):
+
+   **Code repo:** https://github.com/UniCourt/cic-code-vt
+
+   **Code pages:** https://unicourt.github.io/cic-code-vt
+
+   **Original RTF:** https://archive.org/download/gov.vt.code
+
+10. ###Wyoming (WY):
+
+      **Code repo:** https://github.com/UniCourt/cic-code-wy
+
+      **Code pages:** https://unicourt.github.io/cic-code-wy
+
+      **Original RTF:** https://archive.org/details/gov.wy.code/
+
+
 In subsequent months, we intend to add two more features:
 
 1. Extend the code to handle the official codes Colorado and Idaho.

diff --git a/html_parser/ar_html_parser.py b/html_parser/ar_html_parser.py
@@ -92,6 +92,7 @@ def get_class_name(self):
                                                                   and tag.get('class')[0] not in self.tag_type_dict.values()):
             self.tag_type_dict['ol_p'] = [self.tag_type_dict['ol_p'], ol_p_2_tag['class'][0]]
 
+        print(self.tag_type_dict)
         print('updated class dict')
 
     def remove_junk(self):
@@ -307,6 +308,7 @@ def convert_paragraph_to_alphabetical_ol_tags(self):
         prev_chap_id = None
         p_tag = self.soup.find('p', {'class': self.tag_type_dict['ol_p']})
         while p_tag:
+
             set_p_tag = True
             if not re.search(r'\w+', p_tag.get_text()):
                 continue
@@ -548,6 +550,9 @@ def convert_paragraph_to_alphabetical_ol_tags(self):
                 p_tag = p_tag.find_next_sibling(lambda tag: tag.name == 'p' and re.search('.+', tag.get_text()))
         print('ol tags added')
 
+
+
+
     def create_case_notes_nav_tag(self):
         """
             - match and find analysis navigation tag
@@ -916,7 +921,7 @@ def clean_html_and_add_cite(self):
                     ol_num = re.sub(r'\(|\)', '', ol_reg.group())
                     a_id = f'{a_id}ol1{ol_num}'
                 text = re.sub(fr'\s{re.escape(match)}',
-                              f'<cite class="ocar"><a href="{a_id}" target="{target}">{match}</a></cite>', inside_text,
+                              f' <cite class="ocar"><a href="{a_id}" target="{target}">{match}</a></cite>', inside_text,
                               re.I)
                 tag.append(text)
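The last hunk is the AR citation fix. The pattern `\s{re.escape(match)}` consumes the whitespace in front of each citation. Before this change, the generated `<cite>` tag was therefore joined to the preceding word (`See<cite …>`). The patch restores a single space at the start of the replacement string.

The sketch below reproduces the before and after behaviour with the standard `re` module. It also shows one alternative that is not taken from the repository: a lookbehind `(?<=\s)` that requires the whitespace without consuming it, so the original character (space, tab, newline) is kept as-is. The helper names (`_cite`, `cite_pre_patch`, `cite_patched`, `cite_lookbehind`) and the default `target="_self"` are hypothetical. The real parser computes `a_id` and `target` elsewhere.

```python
import re


def _cite(text: str, a_id: str, target: str) -> str:
    # Hypothetical helper: same <cite> markup the AR parser emits.
    return f'<cite class="ocar"><a href="{a_id}" target="{target}">{text}</a></cite>'


def cite_pre_patch(text: str, match: str, a_id: str, target: str = "_self") -> str:
    # Pre-patch behaviour: r'\s' consumes the whitespace before the citation
    # and the replacement does not put it back, so <cite> sticks to the
    # previous word. Unlike the original call, flags= is passed by keyword.
    return re.sub(fr'\s{re.escape(match)}',
                  lambda m: _cite(match, a_id, target), text, flags=re.I)


def cite_patched(text: str, match: str, a_id: str, target: str = "_self") -> str:
    # The commit's fix: re-insert one space ahead of the <cite>.
    # Whatever whitespace preceded the citation (tab, newline) becomes ' '.
    return re.sub(fr'\s{re.escape(match)}',
                  lambda m: ' ' + _cite(match, a_id, target), text, flags=re.I)


def cite_lookbehind(text: str, match: str, a_id: str, target: str = "_self") -> str:
    # Alternative (assumption, not the repo's code): (?<=\s) checks for the
    # preceding whitespace without consuming it, so it survives unchanged.
    # m.group(0) keeps the casing actually found in the text under re.I.
    return re.sub(fr'(?<=\s){re.escape(match)}',
                  lambda m: _cite(m.group(0), a_id, target), text, flags=re.I)


if __name__ == "__main__":
    sample = "See 4-88-101 now"
    for fn in (cite_pre_patch, cite_patched, cite_lookbehind):
        print(fn.__name__, fn(sample, "4-88-101", "s1"))
```

The original call has one more issue worth checking. It passes `re.I` as the fourth positional argument of `re.sub`, whose signature is `re.sub(pattern, repl, string, count=0, flags=0)`. That makes it `count=2` (the integer value of `re.I`), so the substitution is case-sensitive and stops after two matches. The sketch passes `flags=re.I` by keyword. It also uses a function as the replacement, so any backslashes in `match` are inserted literally instead of being parsed as template escapes.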