Dataset

SQLDataset

Creating a SQLDataset from a table in a db

Use SQLDataset to create datasets backed by a SQL database. A SQLDataset must be provided with either a sqlalchemy.engine.Connectable or a valid connection string.

When writing load_training_data() and load_prediction_data(), both methods must accept a connection as an argument; the SQLDataset provides this connection at runtime.

FileDataset

Creating a FileDataset from a csv file

When we create a FileDataset, we need to specify the location of our data files; this location is available in the self.file_path attribute. A more elaborate example of using this dataset can be found at ../notebooks/Titanic Demo.ipynb.
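A minimal sketch of the same idea for files, assuming a CSV file read with Python's csv module: the path is stored on self.file_path and the loading methods read from it. ToyFileDataset and TitanicData here are hypothetical illustrations, and the column names pclass, age and survived were chosen for the example; this is not ML Tooling's FileDataset implementation.

```python
import csv
import tempfile
from pathlib import Path


class ToyFileDataset:
    """Hypothetical stand-in for FileDataset: it only remembers the file location."""

    def __init__(self, path):
        self.file_path = Path(path)

    def _read_rows(self):
        # Read every CSV row as a dict keyed by the header line.
        with self.file_path.open(newline="") as f:
            return list(csv.DictReader(f))


class TitanicData(ToyFileDataset):
    @staticmethod
    def _features(row):
        # Convert the raw string fields into typed features.
        return {"pclass": int(row["pclass"]), "age": float(row["age"])}

    def load_training_data(self):
        rows = self._read_rows()
        x = [self._features(r) for r in rows]
        y = [int(r["survived"]) for r in rows]
        return x, y

    def load_prediction_data(self, idx):
        # Return the features of a single row, selected by position.
        return [self._features(self._read_rows()[idx])]


# Usage: write a tiny CSV and point the dataset at it.
csv_path = Path(tempfile.mkdtemp()) / "titanic.csv"
csv_path.write_text("pclass,age,survived\n1,29.0,1\n3,22.0,0\n")

data = TitanicData(csv_path)
x, y = data.load_training_data()
```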

Once a Dataset is correctly defined, you can use all the methods defined in Dataset.

Copying Datasets

If you have two datasets defined, you can copy data from one into the other. For example, copying a SQLDataset into a FileDataset backed by titanic.csv reads the data from the SQL database and writes it to that CSV file.

A common use case for this is moving data from a central datastore into a local one, or keeping two database tables in sync.
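Conceptually, copying a SQL-backed dataset into a file means reading every row from the database and writing it out as CSV. The sketch below assumes Python's built-in sqlite3 and csv modules; copy_table_to_csv is a hypothetical helper for illustration, not ML Tooling's copy mechanism, which may work quite differently.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def copy_table_to_csv(conn, table, csv_path):
    """Copy every row of `table` into a CSV file, header included.

    Returns the number of data rows written.
    """
    # Table names cannot be bound as SQL parameters, so only pass trusted names.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    header = [col[0] for col in cursor.description]
    n_rows = 0
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in cursor:  # stream rows instead of loading them all at once
            writer.writerow(row)
            n_rows += 1
    return n_rows


# Usage: an in-memory "central" database copied to a local CSV file.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE titanic (pclass INTEGER, age REAL, survived INTEGER)")
source.executemany("INSERT INTO titanic VALUES (?, ?, ?)", [(1, 29.0, 1), (3, 22.0, 0)])

target = Path(tempfile.mkdtemp()) / "titanic.csv"
copied = copy_table_to_csv(source, "titanic", target)
```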

Demo Datasets

To test your model on one of the demo datasets from scikit-learn's Dataset loading utilities, use the load_demo_dataset() function:

>>> from ml_tooling.data import load_demo_dataset
>>>
>>> bostondata = load_demo_dataset("boston")
>>> # Remember to set up a train-test split!
>>> bostondata.create_train_test()
<BostonData - Dataset>
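The comment in the example is a reminder that the data should be split into training and test sets before the model is evaluated. As an illustration of what such a split involves, here is a plain-Python sketch that shuffles row indices with a fixed seed and holds out a fraction for testing. split_indices, test_size and seed are hypothetical names, and this is not how ML Tooling implements create_train_test().

```python
import random


def split_indices(n_rows, test_size=0.25, seed=42):
    """Shuffle row indices and hold out a fraction of them for testing."""
    if not 0 < test_size < 1:
        raise ValueError("test_size must be between 0 and 1")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = max(1, round(n_rows * test_size))
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(20)
```

Fixing the seed means repeated runs produce the same split, so results stay comparable between experiments.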