text_utils module
- class text_utils.CharByCharSynhthetizer(rnn, char_init, encode_lambda, onehot_encoder, decode_lambda, ts, n_step, path_out)
Bases: object
Synthesize text character by character from a trained RNN, using a one-hot encoder.
- sample(lenght, p)
Weighted sampling of the next character based on the RNN predictions.
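The weighted sampling described above can be sketched as follows. This is a minimal illustration and not the module's implementation: the helper name sample_next_index and the use of a NumPy random generator are assumptions, and p is assumed to hold one probability per character index.

```python
import numpy as np


def sample_next_index(p, rng):
    """Draw one index i with probability p[i] (hypothetical helper)."""
    p = np.asarray(p, dtype=float).ravel()
    # Renormalize so that small floating-point drift does not break the draw.
    p = p / p.sum()
    return int(rng.choice(p.size, p=p))


rng = np.random.default_rng(0)
# With all the probability mass on index 1, the draw is deterministic.
idx = sample_next_index([0.0, 1.0, 0.0], rng)
```

Sampling from the predicted distribution, rather than always taking the most likely character, is what lets repeated calls produce varied text.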
- class text_utils.OneHotEncoder(length)
Bases: object
One-hot encoder class.
- length
The length of the one-hot encoding.
- Type
int
- __init__(layers)
Constructor.
- __call__(x, encode=True)
Encode a sequence of integers into a sequence of one-hot encoded vectors, or decode a sequence of one-hot encoded vectors into a sequence of integers.
- __repr__()
Return the string representation of the class.
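To illustrate the two directions that __call__ describes, here is a minimal sketch of one-hot encoding and decoding. The function names and the array layout (one row per encoded integer) are assumptions made for this example; the actual OneHotEncoder may arrange its vectors differently.

```python
import numpy as np


def one_hot_encode(indices, length):
    """Turn integers into rows that hold a single 1 (hypothetical helper)."""
    indices = np.asarray(indices, dtype=int)
    encoded = np.zeros((indices.size, length), dtype=int)
    # Set position indices[k] to 1 in row k.
    encoded[np.arange(indices.size), indices] = 1
    return encoded


def one_hot_decode(vectors):
    """Recover the integers as the position of the 1 in each row."""
    return np.argmax(np.asarray(vectors), axis=1)


encoded = one_hot_encode([0, 2], length=3)
decoded = one_hot_decode(encoded)
```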
- text_utils.add_eol_to_text(text, eol='.')
- text_utils.char_to_idx(char, chars)
Convert a char to its index in the encoder NumPy array.
- Parameters
char (str) – A char.
chars (np.ndarray) – All chars.
- Returns
The index representation of char, as a scalar array of shape ().
- Return type
np.ndarray
Notes
None
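One way to perform such a lookup is sketched below. The helper char_to_index is hypothetical and only assumes that chars is a one-dimensional array of single characters; raising an error for an unknown character is likewise an assumption, since the documented behavior for that case is not stated.

```python
import numpy as np


def char_to_index(char, chars):
    """Return the position of char in chars (hypothetical helper)."""
    matches = np.flatnonzero(chars == char)
    if matches.size == 0:
        raise ValueError(f"{char!r} is not in chars")
    # A NumPy integer scalar, whose shape is ().
    return matches[0]


chars = np.array(["a", "b", "c"])
idx = char_to_index("b", chars)
```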
- text_utils.decode(encoding, chars)
Decode a sequence of indices into a sequence of chars based on the encoder.
- Parameters
encoding (np.ndarray) – The sequence of index representation of the chars, of shape (n_chars,)
chars (np.ndarray) – All chars.
- Returns
decoding – The sequence of chars, of shape (n_chars,)
- Return type
np.ndarray
Notes
None
- text_utils.encode(decoding, chars)
Encode a sequence of chars into a sequence of indices based on the encoder.
- Parameters
decoding (np.ndarray) – The sequence of chars, of shape (n_chars,)
chars (np.ndarray) – All chars.
- Returns
encoding – The sequence of index representation of the chars, of shape (n_chars,)
- Return type
np.ndarray
Notes
None
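The relationship between encoding and decoding can be shown with a round trip. The sketch below uses hypothetical helpers (encode_chars, decode_indices) and assumes every character of the input appears exactly once in chars; it is not the module's own code.

```python
import numpy as np


def encode_chars(decoding, chars):
    """Map each char to its index in chars (hypothetical helper)."""
    lookup = {c: i for i, c in enumerate(chars.tolist())}
    return np.array([lookup[c] for c in np.asarray(decoding).tolist()], dtype=int)


def decode_indices(encoding, chars):
    """Map each index back to its char by indexing into chars."""
    return chars[np.asarray(encoding, dtype=int)]


chars = np.array(["a", "b", "c"])
decoding = np.array(list("cab"))
encoding = encode_chars(decoding, chars)
restored = decode_indices(encoding, chars)
```

Decoding an encoded sequence should give back the original characters, which makes this pair easy to check in a unit test.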
- text_utils.give_emoji_free_text(text)
- text_utils.idx_to_char(idx, chars)
Convert an index to the corresponding char in the encoder NumPy array.
- Parameters
idx (int) – The index representation of a char.
chars (np.ndarray) – All chars.
- Returns
The char.
- Return type
str
Notes
None
- text_utils.limit_text_length(df, col_name, max_length=140)
- text_utils.make_decoded_dataset(dataset)
Decode a dataset of strings into a list of lists of characters.
- Parameters
dataset (list) – A list of strings (contexts), possibly of varying length.
- Returns
decoded_dataset – A list of lists (contexts) where a context is a list of characters.
- Return type
list
Notes
None
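As a small illustration of the input and output shapes described above, splitting each string into its characters could look like this. The function name decode_dataset is hypothetical and the sketch only mirrors the documented behavior.

```python
def decode_dataset(dataset):
    """Split every string (context) into a list of its characters."""
    return [list(context) for context in dataset]


# Contexts may have different lengths.
decoded = decode_dataset(["hi", "bye"])
```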
- text_utils.make_encoded_dataset(decoded_dataset, chars)
Encode a dataset of lists of characters into lists of integers.
- Parameters
decoded_dataset (list) – A list of lists (contexts) where a context is a list of characters.
chars (np.ndarray) – All chars.
- Returns
encoded_dataset – A list of lists (contexts) where a context is a list of integers. An integer corresponds to its index in chars.
- Return type
list
Notes
None
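The mapping from characters to indices over a whole dataset can be sketched as below. The helper encode_dataset is hypothetical; it assumes chars contains every character that occurs in the dataset, and it builds a dictionary once so that each lookup is constant time.

```python
import numpy as np


def encode_dataset(decoded_dataset, chars):
    """Replace every character by its index in chars (hypothetical helper)."""
    lookup = {c: i for i, c in enumerate(chars.tolist())}
    return [[lookup[c] for c in context] for context in decoded_dataset]


chars = np.array(["b", "e", "h", "i", "y"])
encoded = encode_dataset([["h", "i"], ["b", "y", "e"]], chars)
```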
- text_utils.make_one_hot_encoded_dataset(encoded_dataset, onehot_encoder)
One-hot encode a dataset of lists of integers into a list of one-hot encoded vectors.
- Parameters
encoded_dataset (list) – A list of lists (contexts) where a context is a list of integers. An integer corresponds to its index in chars.
onehot_encoder (OneHotEncoder) – A one-hot encoder initialized with chars (all unique characters in the dataset).
- Returns
onehot_encoded_dataset – A list of one-hot encoded vectors (contexts). The index of 1s in the vectors corresponds to the index of the character in chars.
- Return type
list
Notes
None
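A sketch of one-hot encoding a whole dataset follows. To stay self-contained it takes the encoding length directly instead of a OneHotEncoder instance, and it stores one row per character; both choices, like the name one_hot_dataset, are assumptions for illustration only.

```python
import numpy as np


def one_hot_dataset(encoded_dataset, length):
    """One-hot encode every context of integers (hypothetical helper)."""
    identity = np.eye(length, dtype=int)
    # Row k of the identity matrix is the one-hot vector for index k.
    return [identity[np.asarray(context, dtype=int)] for context in encoded_dataset]


onehot = one_hot_dataset([[2, 0], [1]], length=3)
```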
- text_utils.synthetize(rnn, eol, chars, onehot_encoder, ts, path_out)
- text_utils.unique_characters(data)
Get the list of unique characters in a dataset.
- Parameters
data (list) – A list of strings. The strings may be of different lengths.
- Returns
The list of unique characters in all of the strings in data.
- Return type
np.ndarray
Notes
None
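Collecting the unique characters of a dataset can be sketched as follows. The helper unique_chars is hypothetical, and returning the characters in sorted order is an assumption; the documentation above does not state the order used by unique_characters.

```python
import numpy as np


def unique_chars(data):
    """Return the distinct characters across all strings, sorted."""
    return np.array(sorted(set("".join(data))))


chars = unique_chars(["abc", "cab", "d"])
```

A fixed order matters because the position of each character in this array is what the index and one-hot encodings refer to.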