Merge pull request #30 from glevita-uc/v2.0
Added code to parse WY and bug fixes for TN and AR
glen-uc authored Jan 28, 2022
2 parents 2024eac + 0b3ef0e commit 8611834
Showing 6 changed files with 1,557 additions and 419 deletions.
35 changes: 35 additions & 0 deletions Readme.md
@@ -61,6 +61,41 @@ Currently this code supports following states:
**Original RTF:** https://archive.org/download/gov.co.crs.bulk


7. ###Idaho (ID):

**Code repo:** https://github.com/UniCourt/cic-code-id

**Code pages:** https://unicourt.github.io/cic-code-id

**Original files can be found here:** https://archive.org/details/govlaw?and%5B%5D=subject%3A%22idaho.gov%22+AND+subject%3A%222020+Code%22&sin=&sort=titleSorter


8. ###Virginia (VA):

**Code repo:** https://github.com/UniCourt/cic-code-va

**Code pages:** https://unicourt.github.io/cic-code-va

**Original RTF:** https://archive.org/download/gov.va.code/


9. ###Vermont (VT):

**Code repo:** https://github.com/UniCourt/cic-code-vt

**Code pages:** https://unicourt.github.io/cic-code-vt

**Original RTF:** https://archive.org/download/gov.vt.code

10. ###Wyoming (WY):

**Code repo:** https://github.com/UniCourt/cic-code-wy

**Code pages:** https://unicourt.github.io/cic-code-wy

**Original RTF:** https://archive.org/details/gov.wy.code/


In subsequent months, we intend to add two more features:

1. Extend the code to handle the official codes of Colorado and Idaho.
7 changes: 6 additions & 1 deletion html_parser/ar_html_parser.py
@@ -92,6 +92,7 @@ def get_class_name(self):
and tag.get('class')[0] not in self.tag_type_dict.values()):
self.tag_type_dict['ol_p'] = [self.tag_type_dict['ol_p'], ol_p_2_tag['class'][0]]

print(self.tag_type_dict)
print('updated class dict')

def remove_junk(self):
@@ -307,6 +308,7 @@ def convert_paragraph_to_alphabetical_ol_tags(self):
prev_chap_id = None
p_tag = self.soup.find('p', {'class': self.tag_type_dict['ol_p']})
while p_tag:

set_p_tag = True
if not re.search(r'\w+', p_tag.get_text()):
continue
@@ -548,6 +550,9 @@ def convert_paragraph_to_alphabetical_ol_tags(self):
p_tag = p_tag.find_next_sibling(lambda tag: tag.name == 'p' and re.search('.+', tag.get_text()))
print('ol tags added')




def create_case_notes_nav_tag(self):
"""
- match and find analysis navigation tag
@@ -916,7 +921,7 @@ def clean_html_and_add_cite(self):
ol_num = re.sub(r'\(|\)', '', ol_reg.group())
a_id = f'{a_id}ol1{ol_num}'
text = re.sub(fr'\s{re.escape(match)}',
- f'<cite class="ocar"><a href="{a_id}" target="{target}">{match}</a></cite>', inside_text,
+ f' <cite class="ocar"><a href="{a_id}" target="{target}">{match}</a></cite>', inside_text,
re.I)
tag.append(text)
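
One detail in this hunk may be worth a second look: `re.I` is passed as the fourth positional argument of `re.sub`, and that position is `count`, not `flags`. The sketch below uses a made-up sample string (not data from the parser) to show the difference. Whether case-insensitive matching was actually intended in `clean_html_and_add_cite` is an assumption.

```python
import re

# Hypothetical sample text; the citation string "1-2-301" is made up for illustration.
sample = "see 1-2-301, SEE 1-2-301, see 1-2-301, see 1-2-301"

# Positional: the fourth argument of re.sub is `count`, so re.I (integer value 2)
# limits the call to two case-sensitive replacements.
positional = re.sub(r"see", "cf.", sample, re.I)

# Keyword: flags=re.I enables case-insensitive matching with no replacement limit.
keyword = re.sub(r"see", "cf.", sample, flags=re.I)

print(positional)
print(keyword)
```

If case-insensitive matching is what the call is meant to do, passing `flags=re.I` by keyword would be the usual way to express it.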
