File I/O
main
data frame reader
-
class
nlper.file_io.dataframe_reader.FileReader(path: str, allowed_extensions: Sequence = ('.jsonl', '.jl')) Extracts raw data files into pandas data frames. Starts by fetching file paths and names.
- Parameters
path (str) – Path to folder with raw data files
allowed_extensions (sequence) – Allowed file extensions
-
read_json_lines_files() → Dict[str, pandas.core.frame.DataFrame] Reads raw JSON Lines files into pandas data frames and stores them in a dict with the file name as key.
Example output:
{ 'BBC' : pd.DataFrame(...), 'CNN' : pd.DataFrame(...) }
- Returns
Dictionary with file names and data frames
- Return type
dict
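As a rough illustration of the behavior described above, the sketch below collects files with the allowed extensions from a folder and loads each one with `pandas.read_json(..., lines=True)`. It is not the nlper implementation: the function name `read_json_lines_folder` is hypothetical, and using the file name without its extension as the dictionary key is an assumption based on the example output.

```python
from pathlib import Path
from typing import Dict, Sequence

import pandas as pd


def read_json_lines_folder(
    path: str, allowed_extensions: Sequence[str] = (".jsonl", ".jl")
) -> Dict[str, pd.DataFrame]:
    """Hypothetical helper: load every JSON Lines file in `path` into a data frame."""
    frames: Dict[str, pd.DataFrame] = {}
    for file_path in sorted(Path(path).iterdir()):
        # Skip directories and files whose extension is not allowed.
        if not file_path.is_file() or file_path.suffix not in allowed_extensions:
            continue
        # Assumption: the key is the file name without its extension, e.g. "BBC".
        frames[file_path.stem] = pd.read_json(file_path, lines=True)
    return frames
```

Calling `read_json_lines_folder("raw_data")` on a folder containing `BBC.jsonl` and `CNN.jl` would give a dict with the keys `'BBC'` and `'CNN'`.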
data frame writer
-
class
nlper.file_io.dataframe_writer.FileWriter(path: str, output_type: str = 'pickle') Saves pandas data frames to files. Currently supports the CSV and Pickle formats.
- Parameters
path (str) – Path to the folder where files are saved
output_type (str) – Format of saved files
-
merge_dataframes() → pandas.core.frame.DataFrame Merges multiple data frames into a single one.
- Returns
Merged data frame
- Return type
pd.DataFrame
-
resolve_output_format_type_and_save(data: pandas.core.frame.DataFrame, name: str) → None Resolves the output format type and saves a single data frame. Currently supports only the Pickle and CSV file types.
- Parameters
data (pd.DataFrame) – Data frame to save.
name (str) – Name under which to save the data frame
-
save_dataframe(name: str) → None Calls the method that resolves the output format and saves a single data frame.
- Parameters
name (str) – Name of data frame to save
-
save_dataframes(name: str) → None Calls the method that resolves the output format and saves multiple data frames.
- Parameters
name (str) – Name of the data frames to save
-
save_file(data: Any, name: str, merge_data: Any = None, output_type: str = None) → str Resolves how to save all data frames to files based on the passed arguments. If merge_data is set to True, all data frames are merged into a single one.
- Parameters
data (dict, pd.DataFrame) – Dictionary with file names as keys and data frames as values, or a single data frame.
name (str) – Name of output file(s)
merge_data (bool, optional) – Whether to merge multiple data frames into one
output_type (str, optional) – Format in which to save the data frame(s); if not specified, the one passed to the __init__ method is used.
- Returns
File saving location
- Return type
str
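The dispatch described for `save_file` could look roughly like the sketch below. This is a minimal stand-in under stated assumptions, not the nlper code: the function `save_frames`, its `folder` argument, the `.pkl` extension, the `<name>_<key>` naming of unmerged files, and the use of `pandas.concat` for merging are all hypothetical.

```python
from pathlib import Path
from typing import Dict, Union

import pandas as pd


def save_frames(
    data: Union[Dict[str, pd.DataFrame], pd.DataFrame],
    name: str,
    folder: str,
    merge_data: bool = False,
    output_type: str = "pickle",
) -> str:
    """Hypothetical stand-in for the dispatch described above."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)

    def _save(frame: pd.DataFrame, stem: str) -> None:
        # Only the two documented formats are handled.
        if output_type == "csv":
            frame.to_csv(out_dir / f"{stem}.csv", index=False)
        elif output_type == "pickle":
            frame.to_pickle(out_dir / f"{stem}.pkl")
        else:
            raise ValueError(f"Unsupported output type: {output_type}")

    if isinstance(data, pd.DataFrame):
        _save(data, name)
    elif merge_data:
        # Assumption: "merging" means stacking rows with pandas.concat.
        _save(pd.concat(list(data.values()), ignore_index=True), name)
    else:
        # Assumption: each frame is saved under "<name>_<key>".
        for key, frame in data.items():
            _save(frame, f"{name}_{key}")
    # Return the saving location, mirroring the documented return value.
    return str(out_dir)
```

Raising `ValueError` for an unknown format is a design choice of this sketch; the source does not say how unsupported formats are handled.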
file type resolver
-
class
nlper.file_io.file_type_resolver.FileTypesResolver Readers for the supported file types.
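The source does not describe how the resolver works internally. One common way to organize such a component is a registry keyed by file extension, sketched below. Everything here is an assumption for illustration: the `READERS_BY_EXTENSION` mapping, the `resolve_reader_name` function, and the pairing of extensions with the reader class names from the reader module.

```python
from pathlib import Path

# Hypothetical registry: maps a file extension to the name of a reader class
# listed in the reader module. The actual mapping in nlper may differ.
READERS_BY_EXTENSION = {
    ".csv": "CsvReader",
    ".html": "HtmlReader",
    ".json": "JsonReader",
    ".txt": "TextReader",
}


def resolve_reader_name(filepath: str) -> str:
    """Return the reader name registered for the file's extension."""
    extension = Path(filepath).suffix.lower()
    try:
        return READERS_BY_EXTENSION[extension]
    except KeyError:
        # Unknown extensions are rejected explicitly rather than guessed.
        raise ValueError(f"Unsupported file type: {extension!r}") from None
```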
reader
-
class
nlper.file_io.reader.CsvReader
-
class
nlper.file_io.reader.HtmlReader
-
class
nlper.file_io.reader.JsonReader
-
class
nlper.file_io.reader.Reader
-
open_file(filepath: str) → Any Safely opens and returns the file specified by the file path.
- Parameters
filepath (str) – File path to open
- Returns
Opened file or FileNotFoundError
- Return type
any
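The documented return value, "Opened file or FileNotFoundError", can be read in more than one way. The sketch below takes it literally and returns the exception object instead of raising it when the file is missing. This interpretation, the UTF-8 text mode, and the standalone function form are assumptions, not confirmed nlper behavior.

```python
from typing import IO, Union


def open_file(filepath: str) -> Union[IO[str], FileNotFoundError]:
    """Hypothetical sketch: return an open text handle, or the error if the file is missing."""
    try:
        return open(filepath, encoding="utf-8")
    except FileNotFoundError as error:
        # Assumption: the error is returned rather than re-raised,
        # following the documented "Opened file or FileNotFoundError".
        return error
```

With this reading, a caller would check the result with `isinstance(result, FileNotFoundError)` before using it, and close the handle when done.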
-
-
class
nlper.file_io.reader.TextReader
writer
-
class
nlper.file_io.writer.CsvWriter
-
class
nlper.file_io.writer.JsonWriter
-
class
nlper.file_io.writer.PickleWriter
-
class
nlper.file_io.writer.Writer
-
static
create_dir(directory: str) → None Creates a directory for split data frame parts if it does not exist.
- Parameters
directory (str) – Directory to create
-
write(path: str, file: Any) → None Safely writes a file to the specified location.
- Parameters
path (str) – Path to save file
file (any) – File to save
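A minimal sketch of how `create_dir` and a pickle-flavored `write` might fit together, using only the standard library. The function `write_pickle` is hypothetical, and creating the parent folder before writing is an assumption about what "safely" means here.

```python
import os
import pickle
from typing import Any


def create_dir(directory: str) -> None:
    """Create the directory (and parents) if it does not already exist."""
    os.makedirs(directory, exist_ok=True)


def write_pickle(path: str, file: Any) -> None:
    """Hypothetical pickle-based `write`: ensure the parent folder exists, then serialize."""
    parent = os.path.dirname(path)
    if parent:
        create_dir(parent)
    # The context manager closes the handle even if pickling fails.
    with open(path, "wb") as handle:
        pickle.dump(file, handle)
```

Passing `exist_ok=True` makes `create_dir` safe to call repeatedly, which matches the "if it does not exist" wording.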
-
static