Duplicated words in word cloud #17
Comments
Hi, thanks for reaching out. Text preprocessing is not (yet?) part of this library. You haven't shown that part of your code (i.e., where […]).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var text = "negative negative negative text quite quite quite quite quite containing words";

// Count how many times each whitespace-separated word occurs.
var freqs = new Dictionary<string, int>();
var whitespaces = new Regex(@"\s+");
foreach (var word in whitespaces.Split(text))
{
    if (!freqs.TryGetValue(word, out var freq))
    {
        freq = 0;
    }
    freqs[word] = freq + 1;
}

// One entry per distinct word, weighted by its frequency.
var entries = freqs.Select(p => new WordCloudEntry(p.Key, p.Value));
var wordCloud = new WordCloudInput(entries)
{
    // ...
};
```

That code tokenizes text into words (separating them by whitespace), counts their frequency (populating […]) […]

Let me know if that helps.
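Since preprocessing happens outside the library, here is a slightly more defensive variant of the counting step, as a sketch only. `CountWords` is a hypothetical helper, not part of the library. Dropping empty tokens and folding case are assumptions about what a caller might want, not something the thread asks for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: split on whitespace, drop empty tokens,
// fold case, and count occurrences of each distinct word.
static Dictionary<string, int> CountWords(string text) =>
    Regex.Split(text, @"\s+")
        .Where(w => w.Length > 0)             // leading/trailing whitespace yields "" tokens
        .GroupBy(w => w.ToLowerInvariant())   // assumption: "Quite" and "quite" are the same word
        .ToDictionary(g => g.Key, g => g.Count());

var freqs = CountWords("negative negative negative text quite quite quite quite quite containing words");

// Each distinct word is a key exactly once, so the input handed to the
// word cloud cannot contain the same word twice.
foreach (var pair in freqs)
{
    Console.WriteLine($"{pair.Key}: {pair.Value}");
}
```

For the sample text from this issue, this gives five distinct words: `negative` (3), `text` (1), `quite` (5), `containing` (1), and `words` (1). If the rendered cloud still shows a word twice with input like this, the duplication happens after preprocessing.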
Hello, and thanks for the quick reply! We figured out what our issue was. In our example we generated both an SVG and a PNG image, so we basically did both step 5 and step 6 of the README (https://github.com/knowledgepicker/word-cloud/blob/master/README.md) in the same request. We guess that they share state in some way, and hence we get two of everything? Just thought it might be interesting for others who end up with the same thing.
You're right, […]
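To illustrate the kind of shared state guessed at above, here is a toy model. It is not the library's code: `placed` and `Arrange` are hypothetical stand-ins for a layout that remembers what it has already positioned. The point is only that arranging the same words twice against the same state doubles every item, while arranging once and reusing the result for both outputs does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy stand-in for a stateful layout (hypothetical, NOT the library's API):
// every call adds the given words to the same backing list.
var placed = new List<string>();
List<string> Arrange(IEnumerable<string> items)
{
    placed.AddRange(items);
    return placed.ToList();   // snapshot of everything placed so far
}

var words = new[] { "negative", "quite", "text" };

// Problem: arranging once per output format against the same state.
var forSvg = Arrange(words);  // 3 items
var forPng = Arrange(words);  // 6 items: every word appears twice

// Workaround in this toy model: start from fresh state, arrange once,
// and feed the same result to both renderers.
placed.Clear();
var arranged = Arrange(words);
var svgItems = arranged;
var pngItems = arranged;

Console.WriteLine($"{forSvg.Count} {forPng.Count} {pngItems.Count}"); // prints "3 6 3"
```

With the real library, the equivalent would presumably be to arrange once per request and reuse the result, or to create fresh layout objects for each output. That is an assumption based on the behavior described in this thread.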
Hello and thanks for a great lib!
We noticed that when our cloud contains only a few words, the resulting image (or SVG representation) contains duplicated words. For instance, the base text "negative negative negative text quite quite quite quite quite containing words" comes out like this after grouping:
![image](https://user-images.githubusercontent.com/99714343/206002354-75d59745-e124-4845-bc46-574262cfdf65.png)
Our setup is basically copied from the examples:
We don't really notice this problem with a larger number of words, but is this the expected behavior for word clouds with only a few words in them?
Thanks in advance!