Welcome to ANAI Documentation

What is ANAI?

ANAI is an Automated Machine Learning Python Library that works with tabular data. It is intended to save time when performing data analysis. It will assist you with everything right from the beginning i.e Ingesting data using the inbuilt connectors, preprocessing, feature engineering, model building, model evaluation, model tuning and much more.

Getting started

Let's get started

1) Python venv:
   pip install anai

Available Modelling Techniques

1) Classification

2) Regression

Basic Example

import anai ai = anai.run(filepath="data/iris.csv", target="class", predictor="lr")

Arguments

df : Pandas DataFrame
    DataFrame to be used for modelling.
target : str
    Target Column Name 
filepath : str
    Filepath of the dataframe to be loaded.
df_kwargs : dict
    Keyword arguments for the dataframe loading function. Only used if filepath is not None.
except_columns : list, optional
    List of Columns to be excluded from the dataset
predictor : list
            Predicting models to be used
params : dict
            dictionary containing parameters for model
test_size: float or int, default=.2
            If float, should be between 0.0 and 1.0 and represent
            the proportion of the dataset to include in
            the test split.
            If int, represents the absolute number of test samples.
cv_folds : int
        No. of cross validation folds. Default = 10
pca : bool
    if True will apply PCA on Train and Validation set. Default = False
lda : str
    if True will apply LDA on Train and Validation set. Default = False
pca_kernel : str
        Kernel to be use in PCA. Default = 'linear'
n_components_lda : int
        No. of components for LDA. Default = 1
n_components_pca : int
        No. of components for PCA. Default = 2
smote : Bool,
    Whether to apply SMOTE. Default = True
k_neighbors : int
    No. of neighbors for SMOTE. Default = 1
verbose : boolean
    Verbosity of models. Default = False
exclude_models : list
    List of models to be excluded when using predictor = 'all' . Default = []
path : list
    List containing path to saved model and scaler. Default = None
    Example: [model.pkl, scaler.pkl]
random_state : int
    Random random_state for reproducibility. Default = 42
tune : boolean
        when True Applies Optuna to find best parameters for model
        Default is False
optuna_sampler : Function
    Sampler to be used in optuna. Default = TPESampler()
optuna_direction : str
    Direction of optimization. Default = 'maximize'
    Available Directions:
        maximize : Maximize
        minimize : Minimize
optuna_n_trials : int
    No. of trials for optuna. Default = 100
metric : str,
    metric to be used for model evaluation. Default = 'r2' for regressor and 'accuracy' for classifier
suppress_task_detection: Bool 
    Whether to suppress automatic task detection. Default = False
task : str
    Task to be used for model evaluation. Default = None
    Only applicable when suppress_task_detection = True
    Available Tasks:
        classification : Classification
        regression : Regression

Return

 ai : regression or classification object
    Returns a regression or classification object