
Laboratory work #3, Vladislava Tsvetkova - 22FPL2 #165

Closed
wants to merge 140 commits into from

Conversation

Vladays

@Vladays Vladays commented Nov 23, 2023

No description provided.

Vladays and others added 30 commits September 15, 2023 11:03
@@ -26,7 +24,7 @@ def __init__(self, end_of_word_token: str) -> None:
end_of_word_token (str): A token denoting word boundary
"""
self._end_of_word_token = end_of_word_token
self._storage = {end_of_word_token: 0}
self._storage = {'_': 0}
Contributor

What is '_'?
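A minimal sketch of what the comment points at, assuming this is the __init__ shown in the diff above: the constructor already receives end_of_word_token, so the literal '_' should not be hard-coded.

    def __init__(self, end_of_word_token: str) -> None:
        self._end_of_word_token = end_of_word_token
        # reuse the parameter instead of the literal '_'
        self._storage = {end_of_word_token: 0}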

return token[0]

return None
return self._storage[element]
Contributor

Are you sure?
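The concern here appears to be that indexing the storage without a membership check can raise KeyError. A guarded sketch, assuming the method is get_id and that missing elements should yield None:

    # assumes: from typing import Optional
    def get_id(self, element: str) -> Optional[int]:
        if not isinstance(element, str) or element not in self._storage:
            return None
        return self._storage[element]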

for token in element:
if token.isalpha():
self._put(token)
if token in (' ', self._end_of_word_token):
Contributor

You create it on each iteration. Why?

Contributor

Create it once before the loop.
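A sketch of the suggestion, with the tuple built once outside the loop (the name special_tokens is illustrative, and the body after the check is not shown in the diff, so it is elided):

    special_tokens = (' ', self._end_of_word_token)  # built once, not per iteration
    for token in element:
        if token.isalpha():
            self._put(token)
        if token in special_tokens:
            ...  # body as in the original method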

@@ -1 +1 @@
10
0
Contributor

?

Author

Changed.

@@ -105,13 +104,9 @@ def get_token(self, element_id: int) -> Optional[str]:
"""
if not isinstance(element_id, int):
return None
if element_id not in self._storage.values():
return None
Contributor

Where is the implementation?

Contributor

You removed it. Right now only the checks are left in the function.
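For reference, a minimal sketch of a get_token that keeps both the type check and an actual reverse lookup, assuming _storage maps token -> id:

    # assumes: from typing import Optional
    def get_token(self, element_id: int) -> Optional[str]:
        if not isinstance(element_id, int):
            return None
        # the loop doubles as the membership check
        for token, identifier in self._storage.items():
            if identifier == element_id:
                return token
        return None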

Author

It's here now.

print(greedy_text_generator.run(51, 'Vernon'))
n_gram_language_model = NGramLanguageModel(encoded[:100], 7)
print(n_gram_language_model.build())
greedy_text_generator = GreedyTextGenerator(n_gram_language_model, text_processor)
Contributor

Add a build() call, please.
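A sketch of the intended order of calls, using the names from the snippet above: build() fills the model before the generator uses it.

    n_gram_language_model = NGramLanguageModel(encoded[:100], 7)
    n_gram_language_model.build()
    greedy_text_generator = GreedyTextGenerator(n_gram_language_model, text_processor)
    print(greedy_text_generator.run(51, 'Vernon'))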


@@ -272,6 +260,7 @@ class NGramLanguageModel:
_encoded_corpus (tuple): Encoded text
"""


Contributor

Remove the extra blank line.

max_freq_tokens = [token for token, freq in tokens.items() if freq == max_freq]
max_freq_tokens = sorted(max_freq_tokens, reverse=True)
encoded_prompt += (max_freq_tokens[0],)
best_predictions = [token for token, freq in next_tokens.items() if freq == max(next_tokens.values())]
Contributor

You can calculate the max only once, not inside the loop.
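A sketch of the suggested change: compute the maximum once, then filter.

    max_freq = max(next_tokens.values())  # computed a single time
    best_predictions = [token for token, freq in next_tokens.items() if freq == max_freq]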


for n_gram in set(n_grams):
number_of_n_grams = n_grams.count(n_gram)
context_count = len([context for context in n_grams
Contributor

Use Counter.
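One reading of this comment (an assumption) is to pre-count n-grams with collections.Counter instead of calling count() and building lists inside the loop; a sketch, assuming the context of an n-gram is its first n-1 elements:

    from collections import Counter

    n_gram_counts = Counter(n_grams)
    context_counts = Counter(n_gram[:-1] for n_gram in n_grams)
    for n_gram, number_of_n_grams in n_gram_counts.items():
        context_count = context_counts[n_gram[:-1]]
        ...  # probability computation as in the original method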
