Aspects
Functions that can be applied to sentences to corrupt them in a controlled way. The corrupted sentences can then be used to test the robustness of NLP models.
Base class
class wildnlp.aspects.base.Aspect
Base, abstract class. All the aspects must implement the __call__ method.
__call__(sentence)
We want to directly call objects of the Aspect class for easy chaining. This function will be applied to sentences.
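The callable-object pattern described above can be sketched as follows. This is a minimal illustration and not the library's code: SketchAspect, Uppercase and DropVowels are hypothetical names invented for this example.

```python
from abc import ABC, abstractmethod


class SketchAspect(ABC):
    """Hypothetical stand-in for an abstract aspect base class."""

    @abstractmethod
    def __call__(self, sentence: str) -> str:
        """Return a corrupted copy of the sentence."""


class Uppercase(SketchAspect):
    """Toy aspect (not part of wildnlp): upper-cases the sentence."""

    def __call__(self, sentence: str) -> str:
        return sentence.upper()


class DropVowels(SketchAspect):
    """Toy aspect (not part of wildnlp): removes ASCII vowels."""

    def __call__(self, sentence: str) -> str:
        return "".join(c for c in sentence if c.lower() not in "aeiou")


# Because instances are callable, they chain like ordinary functions.
result = DropVowels()(Uppercase()("Text to corrupt"))
```

Leaving __call__ abstract means a subclass that forgets to implement it cannot be instantiated, so the mistake surfaces at construction time.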
Utility functions
wildnlp.aspects.utils.compose(*functions)
Chains multiple aspects into a single function.
Parameters: functions – One or more callable objects.
Returns: chained function
Example:
    from wildnlp.aspects.utils import compose
    from wildnlp.aspects import Swap, QWERTY

    composed_aspect = compose(Swap(), QWERTY())
    modified_text = composed_aspect('Text to corrupt')
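To show what chaining callables into a single function can look like, here is a minimal sketch built on functools.reduce. The name compose_sketch is hypothetical, and the left-to-right application order is an assumption; this page does not state the order that compose uses.

```python
from functools import reduce
from typing import Callable


def compose_sketch(*functions: Callable[[str], str]) -> Callable[[str], str]:
    """Chain callables left to right (the order is an assumption)."""

    def chained(sentence: str) -> str:
        # Feed the output of each callable into the next one.
        return reduce(lambda text, fn: fn(text), functions, sentence)

    return chained


# Any callables work, including plain functions and aspect-like objects.
shout_then_strip = compose_sketch(str.upper, str.strip)
cleaned = shout_then_strip("  text to corrupt ")
```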
Articles
class wildnlp.aspects.articles.Articles(swap_probability=0.5, seed=42)
Bases: wildnlp.aspects.base.Aspect
Randomly removes articles or swaps them for incorrect ones.
Caution
Uses random numbers, default seed is 42.
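A rough sketch of the idea follows. It assumes that swap_probability is the chance of swapping an article rather than removing it; this page does not spell out the parameter's meaning, so treat that reading as a guess. The function name and article list are hypothetical, and the sketch lower-cases swapped articles for simplicity.

```python
import random

ARTICLES = ("a", "an", "the")


def corrupt_articles_sketch(sentence, swap_probability=0.5, seed=42):
    """Swap each article for a wrong one, or drop it (assumed semantics)."""
    rng = random.Random(seed)
    out = []
    for token in sentence.split():
        if token.lower() not in ARTICLES:
            out.append(token)
        elif rng.random() < swap_probability:
            # Swap: pick any article other than the original one.
            wrong = [a for a in ARTICLES if a != token.lower()]
            out.append(rng.choice(wrong))
        # Otherwise the article is removed by appending nothing.
    return " ".join(out)


only_removed = corrupt_articles_sketch("The cat sat on a mat", swap_probability=0.0)
only_swapped = corrupt_articles_sketch("The cat sat on a mat", swap_probability=1.0)
```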
Characters removal
class wildnlp.aspects.remove_char.RemoveChar(char=None, words_percentage=50, characters_percentage=10, seed=42)
Bases: wildnlp.aspects.base.Aspect
Randomly removes characters from words.
Note
You may specify white space as the character to be removed, but it will be processed differently.
Caution
Uses random numbers, default seed is 42.
__init__(char=None, words_percentage=50, characters_percentage=10, seed=42)
Parameters:
- words_percentage – Percentage of words in a sentence that should be transformed. If greater than 0, at least one word is always transformed.
- characters_percentage – Percentage of characters in a word that should be transformed. If greater than 0, at least one character is always transformed.
- char – If specified, only that character will be randomly removed. The specified character can also be a white space.
- seed – Random seed.
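The "at least one" rule for both percentages can be sketched as below. The helper names are hypothetical, rounding down is an assumption, and the char option is left out for brevity; this is an illustration of the rule, not the library's implementation.

```python
import math
import random


def how_many(total: int, percentage: float) -> int:
    """Number of items to transform; at least one when percentage > 0."""
    if total == 0 or percentage <= 0:
        return 0
    # Rounding down is an assumption; the result is clamped to [1, total].
    return min(total, max(1, math.floor(total * percentage / 100)))


def remove_chars_sketch(sentence, words_percentage=50,
                        characters_percentage=10, seed=42):
    """Drop random characters from a random subset of words."""
    rng = random.Random(seed)
    words = sentence.split()
    chosen = rng.sample(range(len(words)), how_many(len(words), words_percentage))
    for i in chosen:
        word = words[i]
        drop = set(rng.sample(range(len(word)),
                              how_many(len(word), characters_percentage)))
        words[i] = "".join(c for j, c in enumerate(word) if j not in drop)
    return " ".join(words)


original = "one two three four"
corrupted = remove_chars_sketch(original)
```

With four short words, 50% selects two of them, and 10% of each word rounds down to zero, so the rule bumps it to one character per selected word.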
Characters swapping
class wildnlp.aspects.swap.Swap(transform_percentage=100, seed=42)
Bases: wildnlp.aspects.base.Aspect
Randomly swaps two characters within a word, excluding punctuation. The two characters picked may be identical, in which case the word does not change: for example, swapping the two t's in letter yields letter again.
Caution
Uses random numbers, default seed is 42.
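A minimal sketch of such a swap, assuming that "excluding punctuation" means punctuation characters are never chosen as swap positions. The function name is hypothetical, and transform_percentage is not modeled here.

```python
import random
import string


def swap_two_chars(word: str, rng: random.Random) -> str:
    """Swap two randomly chosen non-punctuation characters of a word."""
    positions = [i for i, c in enumerate(word) if c not in string.punctuation]
    if len(positions) < 2:
        return word  # nothing to swap
    i, j = rng.sample(positions, 2)
    chars = list(word)
    chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)


rng = random.Random(42)
swapped = swap_two_chars("letter,", rng)
```

If the two positions drawn hold the same character, the output equals the input, which is the no-op case described above.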
Digits2Words
class wildnlp.aspects.digits2words.Digits2Words
Bases: wildnlp.aspects.base.Aspect
Converts numbers into words. Handles floating-point numbers as well.
All numbers will be converted.
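To illustrate the kind of conversion involved, here is a deliberately limited sketch that handles only numbers whose integer part is below 100. The function names, the "point" wording for decimals, and the hyphenated tens are all assumptions; the library's actual output format is not shown on this page.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def small_int_to_words(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def number_to_words(match: re.Match) -> str:
    """Spell out one matched number, reading decimals digit by digit."""
    whole, _, fraction = match.group().partition(".")
    words = small_int_to_words(int(whole))
    if fraction:
        words += " point " + " ".join(ONES[int(d)] for d in fraction)
    return words


def digits_to_words_sketch(sentence: str) -> str:
    # Only numbers with one or two integer digits are matched in this sketch.
    return re.sub(r"\b\d{1,2}(?:\.\d+)?\b", number_to_words, sentence)


converted = digits_to_words_sketch("I have 3 cats and 2.5 dogs")
```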
Misspelling
class wildnlp.aspects.misspelling.Misspelling(use_homophones=False, seed=42)
Bases: wildnlp.aspects.base.Aspect
Misspells words appearing in the Wikipedia list of commonly misspelled English words (default): https://en.wikipedia.org/wiki/Commonly_misspelled_English_words
Tip
You can use homophones instead: https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/Homophones
If a word has more than one common misspelling, the replacement is selected randomly.
All words that have any misspellings listed will be replaced.
Caution
Uses random numbers, default seed is 42.
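The lookup-and-replace behavior can be sketched with a tiny hand-written dictionary. MISSPELLINGS and misspell_sketch are hypothetical; the real aspect draws on the Wikipedia lists linked above, and its tokenization and case handling may differ from this simplification.

```python
import random

# Tiny hypothetical excerpt standing in for the Wikipedia list.
MISSPELLINGS = {
    "definitely": ["definately", "definatly"],
    "separate": ["seperate"],
}


def misspell_sketch(sentence: str, seed: int = 42) -> str:
    """Replace every listed word with one of its misspellings."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        options = MISSPELLINGS.get(word.lower())
        # Several candidates: pick one at random; none: keep the word.
        out.append(rng.choice(options) if options else word)
    return " ".join(out)


misspelled = misspell_sketch("I definitely want a separate room")
```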
Punctuation
class wildnlp.aspects.punctuation.Punctuation(char=', ', add_percentage=0, remove_percentage=100, seed=42)
Bases: wildnlp.aspects.base.Aspect
Randomly adds or removes specified punctuation marks. The implementation guarantees that an added mark is never placed directly next to an original one and that a removed mark is never replaced by a new one.
With default settings all occurrences of the specified punctuation mark will be removed.
Example:
Sentence, have a comma.
Possible transformations:
- Sentence have, a comma.
- Sentence, have, a, comma.
Impossible transformations:
- Sentence,, have a comma.
Caution
Uses random numbers, default seed is 42.
__init__(char=', ', add_percentage=0, remove_percentage=100, seed=42)
Parameters:
- char – Punctuation mark that will be removed or added to sentences.
- add_percentage – Max percentage of white spaces in a sentence to be prepended with punctuation marks.
- remove_percentage – Max percentage of existing punctuation marks that will be removed.
- seed – Random seed.
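One way to honor the guarantee about original marks is to add a mark only before a white space that is not already preceded by one. The sketch below follows that idea; the function name, the single-character "," default, and rounding down are assumptions, not the library's implementation.

```python
import random


def punctuation_sketch(sentence, char=",", add_percentage=0,
                       remove_percentage=100, seed=42):
    """Remove some existing marks and add new ones before white spaces."""
    rng = random.Random(seed)
    marks = [i for i, c in enumerate(sentence) if c == char]
    spaces = [i for i, c in enumerate(sentence) if c == " "]
    # A space is eligible only if no original mark (kept or removed)
    # directly precedes it, so marks are never doubled or re-added.
    eligible = [i for i in spaces if i > 0 and sentence[i - 1] != char]

    to_remove = set(rng.sample(marks, len(marks) * remove_percentage // 100))
    n_add = min(len(eligible), len(spaces) * add_percentage // 100)
    to_add = set(rng.sample(eligible, n_add))

    out = []
    for i, c in enumerate(sentence):
        if i in to_remove:
            continue  # drop this mark
        if i in to_add:
            out.append(char)  # prepend a mark to this white space
        out.append(c)
    return "".join(out)


all_removed = punctuation_sketch("Sentence, have a comma.")
all_added = punctuation_sketch("Sentence, have a comma.",
                               add_percentage=100, remove_percentage=0)
```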
QWERTY
class wildnlp.aspects.qwerty.QWERTY(words_percentage=1, characters_percentage=10, seed=42)
Bases: wildnlp.aspects.base.Aspect
Simulates errors made while writing on a QWERTY-type keyboard. Characters are swapped with their neighbors on the keyboard.
Caution
Uses random numbers, default seed is 42.
__init__(words_percentage=1, characters_percentage=10, seed=42)
Parameters:
- words_percentage – Percentage of words in a sentence that should be transformed. If greater than 0, at least one word is always transformed.
- characters_percentage – Percentage of characters in a word that should be transformed. If greater than 0, at least one character is always transformed.
- seed – Random seed.
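The neighbor substitution can be sketched with a small adjacency map. NEIGHBORS is a partial, hypothetical table covering only a few keys, and qwerty_typo replaces a single character per word; the percentage parameters above are not modeled here.

```python
import random

# Partial, hypothetical neighbor map; a full one would cover every key.
NEIGHBORS = {
    "q": "wa", "w": "qes", "e": "wrd", "r": "etf", "t": "ryg",
    "a": "qsz", "s": "awdx", "d": "sefc", "o": "ipl", "n": "bmh",
}


def qwerty_typo(word: str, rng: random.Random) -> str:
    """Replace one character that has known neighbors with one of them."""
    positions = [i for i, c in enumerate(word) if c.lower() in NEIGHBORS]
    if not positions:
        return word  # no character with a known neighbor
    i = rng.choice(positions)
    replacement = rng.choice(NEIGHBORS[word[i].lower()])
    return word[:i] + replacement + word[i + 1:]


rng = random.Random(42)
typo = qwerty_typo("test", rng)
```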
Sentiment words masking
class wildnlp.aspects.sentiment_masking.SentimentMasking(char='*', use_positive=False, seed=42)
Bases: wildnlp.aspects.base.Aspect
This aspect reflects attempts made by Internet users to mask profanity or hate speech in online forums to evade moderation. It masks negative words (or positive ones, for completeness) from the Opinion Lexicon by replacing a single random character with, for example, an asterisk: http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
All words that are listed will be transformed.
Caution
Uses random numbers, default seed is 42.
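A minimal sketch of the masking step, using a three-word set as a hypothetical stand-in for the negative part of the Opinion Lexicon. The function name is invented for illustration, and whitespace tokenization is a simplification.

```python
import random

# Hypothetical stand-in for the negative part of the Opinion Lexicon.
NEGATIVE_WORDS = {"awful", "hate", "terrible"}


def mask_sentiment_sketch(sentence, char="*", seed=42):
    """Mask one random character in every listed word."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        if word.lower() in NEGATIVE_WORDS:
            i = rng.randrange(len(word))
            word = word[:i] + char + word[i + 1:]
        out.append(word)
    return " ".join(out)


masked = mask_sentiment_sketch("I hate this awful weather")
```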