This repository contains a dataset of 84249 Swahili sentences. It can be a valuable resource for various natural language processing (NLP) tasks. The dataset was sourced from a public repository.
Note: If you have information about the source or licensing details, please reach out or submit a pull request.
- Number of Sentences: 84249
- Language: Swahili
- File Format: CSV
- Data Structure: One sentence per row
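The properties listed above can be checked with a short script. The following is a minimal sketch using only the Python standard library. It assumes the file is named `swahili_sentences.csv` and has a single header column called `sentence`, as the preview in this README suggests. The function name `summarize_sentences` is hypothetical and not part of this repository.

```python
import csv


def summarize_sentences(path, column="sentence"):
    """Return basic counts for a one-sentence-per-row CSV file.

    Assumes a header row containing `column` (an assumption based on
    the preview in this README, not a documented schema).
    """
    total = 0
    empty = 0
    total_words = 0
    with open(path, newline="", encoding="utf-8") as f:
        # Note: csv.DictReader skips completely blank lines.
        for row in csv.DictReader(f):
            text = (row.get(column) or "").strip()
            total += 1
            if not text:
                empty += 1
            # Whitespace tokenization; adequate for a rough word count.
            total_words += len(text.split())
    return {
        "rows": total,
        "empty_rows": empty,
        "mean_words": total_words / total if total else 0.0,
    }
```

If the file matches the description above, `summarize_sentences("swahili_sentences.csv")["rows"]` should come out at 84249. A different value would suggest a different header layout or blank lines in the file.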
This dataset can be used for a variety of applications, including:
- Language detection
- NLP research
- Language model training
- Content filtering and moderation
- Cross-lingual research
- Educational purposes
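As an illustration of the language model training use case, the sketch below splits a list of sentences into training and validation sets and builds a simple word-frequency vocabulary. It is only a starting point under stated assumptions: the helper names `train_val_split` and `build_vocab` are hypothetical, and whitespace tokenization is assumed because the preview shows lowercased text without punctuation. A real training pipeline would likely use a subword tokenizer instead.

```python
import random
from collections import Counter


def train_val_split(sentences, val_fraction=0.1, seed=0):
    """Shuffle a copy of `sentences` and return (train, val) lists."""
    items = list(sentences)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_val = int(len(items) * val_fraction)
    return items[n_val:], items[:n_val]


def build_vocab(sentences, min_count=1):
    """Count whitespace-separated tokens, keeping those seen at least `min_count` times."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return {tok: c for tok, c in counts.items() if c >= min_count}
```

The vocabulary would normally be built from the training split only, so that the validation sentences remain unseen during preparation.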
"""
Load and preview dataset.
"""
import pandas as pd
df = pd.read_csv('swahili_sentences.csv')
df.head()
| | sentence |
|---|---|
| 0 | mkutano wa biashara je ungependa kupata mualiko maalum kuhudhuria kwenye ... |
| 1 | kadiri ya hesabu yake hao mwaka walikuwa milioni lakini watu wa nje ... |
| 2 | jina linatokana na neno la kilatini scio yaani najua kwa maana pana ... |
| 3 | historia ya scientology inaendana kabisa na maisha ya mwanzilishi ... |
| 4 | kisha kupata umaarufu wa muda mfupi akaelekea upande wa roho aliyoiona ... |
If you use this dataset in your research or applications, please consider citing this repository. A proper citation will be added here once available.
If you have more data to add to the dataset, ideas on how to improve it, or any questions, please feel free to open an issue or submit a pull request. All contributions are greatly appreciated.