Datasets
Details on the popular datasets, covering various NLP problems, that wild-nlp supports.
Base class
CoNLL 2003
class wildnlp.datasets.conll.CoNLL(*args, **kwargs)
Bases: wildnlp.datasets.base.Dataset
The CoNLL-2003 shared task data for language-independent named entity recognition. For details see: https://www.clips.uantwerpen.be/conll2003/ner/
apply(aspect, apply_to_ne=False)
Parameters:
- aspect – transformation function
- apply_to_ne – if False, the transformation is not applied to named entities. If True, the transformation is applied only to named entities.
Returns: modified dataset in the following form:
[{tokens: array(<tokens>), pos_tags: array(<pos_tags>), chunk_tags: array(<chunk_tags>), ner_tags: array(<ner_tags>)}, ...]
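The effect of the apply_to_ne switch can be illustrated with a minimal sketch. It does not use wildnlp itself: apply_aspect is a hypothetical helper, the aspect is assumed to be a plain per-token function, plain lists stand in for arrays, and a token is assumed to be a named entity when its NER tag is anything other than "O" (the convention of the CoNLL-2003 annotation scheme). The library's own implementation may differ.

```python
from typing import Callable, Dict, List

Sample = Dict[str, List[str]]


def apply_aspect(data: List[Sample],
                 aspect: Callable[[str], str],
                 apply_to_ne: bool = False) -> List[Sample]:
    """Hypothetical helper mirroring the documented apply_to_ne semantics."""
    modified = []
    for sample in data:
        new_tokens = []
        for token, ner_tag in zip(sample["tokens"], sample["ner_tags"]):
            is_entity = ner_tag != "O"  # assumption: "O" marks non-entity tokens
            # Entities are transformed only when apply_to_ne is True,
            # all other tokens only when it is False.
            if is_entity == apply_to_ne:
                token = aspect(token)
            new_tokens.append(token)
        # Copy the sample so the input data is left unchanged.
        modified.append({**sample, "tokens": new_tokens})
    return modified


sample = {
    "tokens": ["EU", "rejects", "German", "call"],
    "pos_tags": ["NNP", "VBZ", "JJ", "NN"],
    "chunk_tags": ["I-NP", "I-VP", "I-NP", "I-NP"],
    "ner_tags": ["I-ORG", "O", "I-MISC", "O"],
}

outside = apply_aspect([sample], str.upper)                     # skips entities
entities = apply_aspect([sample], str.lower, apply_to_ne=True)  # entities only
print(outside[0]["tokens"])
print(entities[0]["tokens"])
```

With apply_to_ne=False only "rejects" and "call" are upper-cased; with apply_to_ne=True only "EU" and "German" are lower-cased.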
load(path)
Reads a CoNLL dataset file and loads it into an internal data structure of the following form:
[{tokens: array(<tokens>), pos_tags: array(<pos_tags>), chunk_tags: array(<chunk_tags>), ner_tags: array(<ner_tags>)}, ...]
Parameters:
- path – path to a file with CoNLL data
Returns: None
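To make the target structure concrete, here is a rough sketch of how CoNLL-2003 text maps onto it. It assumes the usual layout of the shared-task files (four whitespace-separated columns per token, blank lines between sentences, -DOCSTART- marker lines between documents). parse_conll is a hypothetical stand-in rather than the wildnlp loader, and it uses plain lists instead of arrays.

```python
from typing import Dict, Iterable, List

KEYS = ("tokens", "pos_tags", "chunk_tags", "ner_tags")


def parse_conll(lines: Iterable[str]) -> List[Dict[str, List[str]]]:
    """Hypothetical parser: one dictionary per sentence, one list per column."""
    data = []
    current = {key: [] for key in KEYS}
    for line in lines:
        line = line.strip()
        if line.startswith("-DOCSTART-"):
            continue  # document boundary marker, carries no token
        if not line:
            # A blank line closes the current sentence, if there is one.
            if current["tokens"]:
                data.append(current)
                current = {key: [] for key in KEYS}
            continue
        columns = line.split()
        if len(columns) != len(KEYS):
            raise ValueError(f"expected {len(KEYS)} columns, got: {line!r}")
        for key, value in zip(KEYS, columns):
            current[key].append(value)
    if current["tokens"]:  # the file may not end with a blank line
        data.append(current)
    return data


raw = """-DOCSTART- -X- O O

EU NNP I-NP I-ORG
rejects VBZ I-VP O

Peter NNP I-NP I-PER
Blackburn NNP I-NP I-PER
"""

parsed = parse_conll(raw.splitlines())
print(len(parsed), parsed[0]["tokens"])
```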
save(data, path)
Saves data in the CoNLL format.
Parameters:
- data – list of dictionaries in the following form: [{tokens: array(<tokens>), pos_tags: array(<pos_tags>), chunk_tags: array(<chunk_tags>), ner_tags: array(<ner_tags>)}, ...]
- path – path to save the file to. If the file exists, it will be overwritten.
Returns: None
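The reverse direction can be sketched in the same spirit. write_conll below is a hypothetical helper, not the library's save method; it assumes one token per line with space-separated columns and a blank line after each sentence, and it opens the file in "w" mode so that an existing file is overwritten, as described above.

```python
import os
import tempfile
from typing import Dict, List

KEYS = ("tokens", "pos_tags", "chunk_tags", "ner_tags")


def write_conll(data: List[Dict[str, List[str]]], path: str) -> None:
    """Hypothetical writer: one token per line, blank line after each sentence."""
    with open(path, "w", encoding="utf-8") as handle:  # "w" overwrites
        for sample in data:
            # zip over the four columns yields one row per token.
            for row in zip(*(sample[key] for key in KEYS)):
                handle.write(" ".join(row) + "\n")
            handle.write("\n")


data = [{
    "tokens": ["EU", "rejects"],
    "pos_tags": ["NNP", "VBZ"],
    "chunk_tags": ["I-NP", "I-VP"],
    "ner_tags": ["I-ORG", "O"],
}]

out_path = os.path.join(tempfile.mkdtemp(), "sample.conll")
write_conll(data, out_path)
write_conll(data, out_path)  # second call replaces the first file's content

with open(out_path, encoding="utf-8") as handle:
    written = handle.read()
print(written)
```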
SNLI
class wildnlp.datasets.snli.SNLI(*args, **kwargs)
Bases: wildnlp.datasets.base.Dataset
The SNLI dataset supporting the task of natural language inference. For details see: https://nlp.stanford.edu/projects/snli/
SQuAD
class wildnlp.datasets.squad.SQuAD
Bases: wildnlp.datasets.base.Dataset
The SQuAD dataset. For details see: https://rajpurkar.github.io/SQuAD-explorer/
IMDB
class wildnlp.datasets.imdb.IMDB(*args, **kwargs)
Bases: wildnlp.datasets.base.Dataset
The IMDB dataset of movie reviews for sentiment analysis. The dataset consists of 50,000 reviews in two classes, negative and positive. Each review is stored in a separate text file. For details see: http://ai.stanford.edu/~amaas/data/sentiment/
load(path)
Loads the IMDB dataset.
Parameters:
- path – path to a single file, a directory containing review files, or a list of paths to such directories.
Returns: None
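The three accepted forms of path can be illustrated with a small sketch. collect_review_files is a hypothetical helper, not part of wildnlp. It assumes review files carry a .txt extension and that directories are not searched recursively. The temporary directory layout and file names exist only for the demonstration.

```python
import os
import tempfile
from typing import List, Sequence, Union


def collect_review_files(path: Union[str, Sequence[str]]) -> List[str]:
    """Hypothetical helper resolving a file, a directory, or a list of directories."""
    if isinstance(path, (list, tuple)):
        files: List[str] = []
        for directory in path:
            files.extend(collect_review_files(directory))
        return files
    if os.path.isdir(path):
        # Assumption: every review is a .txt file directly inside the directory.
        return sorted(
            os.path.join(path, name)
            for name in os.listdir(path)
            if name.endswith(".txt")
        )
    return [path]  # a single review file


# Build a tiny placeholder layout to exercise all three cases.
root = tempfile.mkdtemp()
for label, names in (("pos", ["0_9.txt", "1_7.txt"]), ("neg", ["0_3.txt"])):
    os.makedirs(os.path.join(root, label))
    for name in names:
        with open(os.path.join(root, label, name), "w", encoding="utf-8") as f:
            f.write("placeholder review text")

pos_dir = os.path.join(root, "pos")
neg_dir = os.path.join(root, "neg")

single = collect_review_files(os.path.join(pos_dir, "0_9.txt"))
one_dir = collect_review_files(pos_dir)
both_dirs = collect_review_files([pos_dir, neg_dir])
print(len(single), len(one_dir), len(both_dirs))
```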