| Download

teste

Project: Jupyter01

Path: Segunda Avaliação.ipynb

Views: ¹⁹

Kernel: Python 3 (Ubuntu Linux)

Dataset Mushroom

Disciplina: Tópicos Especiais em Sistemas de Informação

Por: Mateus Oliveira e Daniel Farias

Questão 1

Dados sobre o Dataset

Nome: Mushroms Database

Descrição: Registros de cogumelos retirados do guia de campo da sociedade Audubon para cogumelos norte americanos (Audubon Society Field Guide to North American Mushrooms) (1981). G. H. Lincoff (Pres.), Alfred A. Knopf

Volume de dados: 8.224 registros

Principais colunas (TOTAL DE 8):

poison: edible=e, poisonous=p

refere-se à toxicidade do cogumelo

cap-shape: bell=b,conical=c,convex=x,flat=f,knobbed=k,sunken=s

refere-se à forma do "chapeu" do cogumelo

cap-color: brown=n,buff=b,cinnamon=c,gray=g,green=r,pink=p,purple=u,red=e,white=w,yellow=y

refere-se à cor do cogumelo

bruises? bruises=t,no=f

refere-se se o cogumelo possui ou não manchas

odor: almond=a,anise=l,creosote=c,fishy=y,foul=f,musty=m,none=n,pungent=p,spicy=s

refere-se ao cheiro do cogumelo

ring-type: cobwebby=c,evanescent=e,flaring=f,large=l,none=n,pendant=p,sheathing=s,zone=z

essa coluna refere-se à forma do anel do cogumelo

population: abundant=a,clustered=c,numerous=n,scattered=s,several=v,solitary=y

refere-se à população do cogumelo

habitat: grasses=g,leaves=l,meadows=m,paths=p,urban=u,waste=w,woods=d

refere-se ao habitat do cogumelo

DADOS GERAIS DO DATASET

Número de atributos: 22 colunas (todos nominalmente valorados)

0. poison: edible=e, poisonous=p ----> esse foi criado no dataset como parte da base de dados

1. cap-shape: bell=b,conical=c,convex=x,flat=f,knobbed=k,sunken=s

2. cap-surface: fibrous=f,grooves=g,scaly=y,smooth=s

3. cap-color: brown=n,buff=b,cinnamon=c,gray=g,green=r,pink=p,purple=u,red=e,white=w,yellow=y

4. bruises?: bruises=t,no=f

5. odor: almond=a,anise=l,creosote=c,fishy=y,foul=f,musty=m,none=n,pungent=p,spicy=s

6. gill-attachment: attached=a,descending=d,free=f,notched=n

7. gill-spacing: close=c,crowded=w,distant=d

8. gill-size: broad=b,narrow=n

9. gill-color: black=k,brown=n,buff=b,chocolate=h,gray=g,green=r,orange=o,pink=p,purple=u,red=e,white=w,yellow=y

10. stalk-shape: enlarging=e,tapering=t

11. stalk-root: bulbous=b,club=c,cup=u,equal=e,rhizomorphs=z,rooted=r,missing=?

12. stalk-surface-above-ring: fibrous=f,scaly=y,silky=k,smooth=s

13. stalk-surface-below-ring: fibrous=f,scaly=y,silky=k,smooth=s

14. stalk-color-above-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o,pink=p,red=e,white=w,yellow=y

15. stalk-color-below-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o,pink=p,red=e,white=w,yellow=y

16. veil-type: partial=p,universal=u

17. veil-color: brown=n,orange=o,white=w,yellow=y

18. ring-number: none=n,one=o,two=t

19. ring-type: cobwebby=c,evanescent=e,flaring=f,large=l,none=n,pendant=p,sheathing=s,zone=z

21. population: abundant=a,clustered=c,numerous=n,scattered=s,several=v,solitary=y

22. habitat: grasses=g,leaves=l,meadows=m,paths=p,urban=u,waste=w,woods=d

Campos em branco: 2480 (denotados por "?"), todos do atributo #11.

Objetivo do Trabalho Sobre o Dataset

Avaliar características de cogumelos (cor, cheiros, toxicidade...)
Relacionar algumas características do cogumelo com a sua toxicidade
O resultado esperado após a execução dos algoritmos é a definição se um cogumelos com determinadas características pode ser venenoso ou não.

In [2]:

## 0 - Pré-requisitos para análise
import time
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB, BernoulliNB, MultinomialNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis, LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score, train_test_split
from scipy.stats import bayes_mvs
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, ExtraTreesClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, precision_score, recall_score
from sklearn.neighbors import NearestNeighbors
from sklearn import linear_model
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, train_test_split
import numpy, itertools, warnings
# Ignorando Possiveis Warnings:
warnings.filterwarnings('ignore')
#from pyod.models.knn import KNN

df = pd.read_csv('mushrooms.csv')
new_df = df.copy()

KEEP = ("poison", "cap-shape", "cap-color", "bruises", "odor", "ring-type", "population", "habitat")
METHODS = ('mnb', 'bnb', 'rfc', 'gnb', 'svm', 'tree', 'etc', 'qda', 'lda', 'ab', 'ovo', 'ovr') #'knn', 'regr',
METHODS_DICT = {'mnb':'MultinomialNB','bnb':'BernoulliNB','rfc':'RandomForestClassifier','gnb':'GaussianNB',
                'svm':'LinearSVC','knm':'NearestNeighbors','tree':'DecisionTreeClassifier','regr':'LinearRegression',
                'etc':'ExtraTreesClassifier','qda':'QuadraticDiscriminantAnalysis','lda':'LinearDiscriminantAnalysis',
                'ab':'AdaBoostClassifier','ovo':'OneVsOneClassifier','ovr':'OneVsRestClassifier'}

SUBPLOT_NUMBER = 432

In [7]:

def _algorithm_builder(method = 'mnb'):
    if method == 'mnb':
        return MultinomialNB()
    elif method == 'bnb':
        return BernoulliNB()
    elif method == 'rfc':
        return RandomForestClassifier(n_estimators=100, max_depth=2,random_state=0)
    elif method == 'gnb':
        return GaussianNB()
    elif method == 'svm':
        return LinearSVC()
    elif method == 'knn':
        return NearestNeighbors(n_neighbors=2, algorithm='ball_tree')
    elif method == 'tree':
        return DecisionTreeClassifier()
    elif method == 'regr':
        return linear_model.LinearRegression()
    elif method == 'etc':
        return ExtraTreesClassifier(n_estimators=10, max_depth=None,min_samples_split=2, random_state=0)
    elif method == 'qda':
        return QuadraticDiscriminantAnalysis(priors=None, reg_param=0.0,store_covariance=False,
                                                                      store_covariances=None, tol=0.0001)
    elif method == 'lda':
        return LinearDiscriminantAnalysis(solver='lsqr', shrinkage='auto')
    elif method == 'ovo':
        return OneVsOneClassifier(LinearSVC(random_state = 0))
    elif method == 'ovr':
        return OneVsRestClassifier(LinearSVC(random_state = 0))
    elif method == 'ab':
        return AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),algorithm="SAMME",n_estimators=200)

def cleanup(new_df):
    for c in df.columns:
        print("Uniques da coluna: %s" % c)
        lista = new_df[c].unique()
        
        if(len(lista) == 2)
            new_df[c].apply(lambda x: (0,1)[x == lista[0]])
        if(len(lista) == 1)
            new_df.drop(c, axis = 1, inplace=True)
        
    return new_df
    
def show_uniques(df = df):
    for c in df.columns:
        print("Uniques da coluna: %s" % c)
        for u in df[c].unique():
            print(u)

def _get_data(new_df):
    y = df['poison'].apply(lambda x: (0,1)[x == 'p'])
    x = pd.get_dummies(new_df)
    return {'x': x, 'y': y}
    
def apply_algorithms(df = df, method = 'mnb'):
    input_data = _get_data(df)
    manager = _algorithm_builder(method)
    k = 10
    scores = cross_val_score(manager, input_data['x'], input_data['y'], cv = k)
    return scores

def plot_scatter(df = df, headers = KEEP):
    pd.plotting.scatter_matrix(df[list(headers)], figsize=[40, 40])
    plt.show()
    
def matriz_confusao(df = df, method = 'mnb'):
    input_data = _get_data(df)
    manager = _algorithm_builder(method)
    x_treino, x_teste, y_treino, y_teste = train_test_split(input_data['x'], input_data['y'], test_size=0.25, random_state=0)
    y_pred = manager.fit(x_treino, y_treino).predict(x_teste)

    # Compute confusion matrix
    print("Matriz de confusão para: %s" % method)
    cm1 = confusion_matrix(y_teste, y_pred)
    print(cm1)

def get_dataframe_by_headers(df = df, headers = KEEP): 
    new_df = df.copy()
    for i in new_df:   
        if(i not in headers):
            new_df.drop(i, axis = 1, inplace=True)
    return new_df

def find_best_combination(df = df):
    df_headers = list()
    combinations = list()
    
    for i in df:
        df_headers.append(i)
    
    #headers = set(df_headers)
    
    new_keep = df_headers[1::]
    
    for L in range(0, len(new_keep)+1):
        for subset in itertools.combinations(new_keep, L):
            if(len(subset) <= len(new_keep)):
                combinations.append(subset)
    return set(combinations)

def get_best_headers(df, method, all = False):
    index = 0
    # Bests
    bests = list()
    
    headerset = (KEEP,)
    
    if(all):
        headerset = find_best_combination(df)
    
    temp_dict = {}
    temp_dict['method'] = method
    maximum = 0
    index = 1
    print('Trying method %s' % method)
    total = len(headerset)
    for i in headerset:
        temp_df = get_dataframe_by_headers(df, i)
        result = apply_algorithms(temp_df, method)
        score = numpy.mean(result)
        #print('%d/%d - %.5f - %s' % (index, total, score, i))
        index += 1
        if(score > maximum):
            temp_dict['score'] = score
            temp_dict['header'] = i
            maximum = score
    
    return temp_dict

def plot_chart(df = df):
    input_data = _get_data(df)
    manager = _algorithm_builder(method)
    
def get_correlation(df = df):
    input_data = _get_data(df)
    #manager = _algorithm_builder(method)
    correlations = input_data['x'].corr()
    return correlations

def plot_correlation(df = df, xlabels = [''] , ylabels = ['']):
    correlations = get_correlation(df)
    fig = plt.figure()
    ax = fig.add_subplot(122)
    cax = ax.matshow(correlations, vmin=-1, vmax=1)
    fig.colorbar(cax)
    limit = len(df.columns)-1
    ticks = numpy.arange(0,limit,1)
    #ax.set_xticks(df.columns)
    #ax.set_yticks(ticks)
    if xlabels:
        ax.set_xticklabels(xlabels)
    if ylabels:
        ax.set_yticklabels(ylabels)
    plt.show()
    print(limit)
    
def avaliacao_metrica(df = df, method = 'mnb'):
    input_data = _get_data(df)
    manager = _algorithm_builder(method)
    x_treino, x_teste, y_treino, y_teste = train_test_split(input_data['x'], input_data['y'], test_size=0.25, random_state=0)
    y_pred = manager.fit(x_treino, y_treino).predict(x_teste)
    inicio = time.time()
    print("Acurácia: ",accuracy_score(y_teste, y_pred))
    print("Precisão: ",precision_score(y_teste, y_pred, average='macro') )
    print("Recall: ",recall_score(y_teste, y_pred, average='micro') )
    fim = time.time()
    print("tempo de execução: ",fim - inicio)
    
def plota_cor(filtro = df, label="cor dos cogumelos"):
    df_colors = filtro
    p_brown = df_colors[df_colors['cap-color'] == 'n']['cap-color'].count()
    p_buff = df_colors[df_colors['cap-color'] == 'b']['cap-color'].count()
    p_cinnamon = df_colors[df_colors['cap-color'] == 'c']['cap-color'].count()
    p_gray = df_colors[df_colors['cap-color'] == 'g']['cap-color'].count()
    p_greener = df_colors[df_colors['cap-color'] == 'r']['cap-color'].count()
    p_pink = df_colors[df_colors['cap-color'] == 'p']['cap-color'].count()
    p_purple = df_colors[df_colors['cap-color'] == 'u']['cap-color'].count()
    p_red = df_colors[df_colors['cap-color'] == 'e']['cap-color'].count()
    p_white = df_colors[df_colors['cap-color'] == 'w']['cap-color'].count()
    p_yellow = df_colors[df_colors['cap-color'] == 'y']['cap-color'].count()

    labels = ('Marrom', 'couro', 'canela', 'cinza', 'verde', 'rosa', 'roxo', 'vermelho', 'branco', 'amarelo' )
    sizes = [p_brown, p_buff, p_cinnamon, p_gray, p_greener, p_pink, p_purple, p_red, p_white, p_yellow, ]
    colors = ['brown', '#ff00ff', 'black', 'gray', 'green', 'pink', 'purple', 'red', 'white', 'yellow']

    # Plot
    plt.pie(sizes, labels=labels, colors=colors,
    autopct='%1.1f%%', shadow=True, startangle=180)
    print(label)
    plt.axis('equal')
    plt.show()
    
def plota_cheiro(filtro = df, label="cheiro dos cogumelos", ):
    dframe = filtro

    o_almond = dframe[dframe['odor'] == 'a']['odor'].count()
    o_anise = dframe[dframe['odor'] == 'l']['odor'].count()
    o_creosote = dframe[dframe['odor'] == 'c']['odor'].count()
    o_fishy = dframe[dframe['odor'] == 'y']['odor'].count()
    o_foul = dframe[dframe['odor'] == 'f']['odor'].count()
    p_musty = dframe[dframe['odor'] == 'm']['odor'].count()
    p_none = dframe[dframe['odor'] == 'n']['odor'].count()
    p_pungent = dframe[dframe['odor'] == 'p']['odor'].count()
    p_spicy = dframe[dframe['odor'] == 's']['odor'].count()

    labels = ('amêndoa', 'anis', 'cresoto', 'duvidoso', 'falta', 'mofado', 'sem cheiro', 'pungente', 'picante')
    sizes = [o_almond, o_anise, o_creosote, o_fishy, o_foul, p_musty, p_none, p_pungent, p_spicy]
    colors = ['brown', '#ff00ff', 'orange', 'gray', 'green', 'pink', 'purple', 'red', 'white', 'yellow']

    # Plot
    plt.pie(sizes, labels=labels, colors=colors,
    autopct='%1.1f%%', shadow=True, startangle=180)
    print(label)
    plt.axis('equal')
    plt.show()

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-7-39c9486487c1> in <module>()
     35     return {'x': x, 'y': y}
     36 
---> 37 def apply_algorithms(df = df, method = 'mnb'):
     38     input_data = _get_data(df)
     39     manager = _algorithm_builder(method)
NameError: name 'df' is not defined

In [6]:

def plota_populacao(filtro=df, label="populacao dos cogumelos"):
    dframe = filtro

    p_abundant = dframe[dframe['population'] == 'a']['population'].count()
    p_clustered = dframe[dframe['population'] == 'c']['population'].count()
    p_numerous = dframe[dframe['population'] == 'n']['population'].count()
    p_scattered = dframe[dframe['population'] == 's']['population'].count()
    p_several = dframe[dframe['population'] == 'v']['population'].count()
    p_solitary = dframe[dframe['population'] == 'y']['population'].count()

    labels = ('abudante', 'agrupado', 'numeroso', 'espalhado', 'varios', 'solitario')
    sizes = [p_abundant, p_clustered, p_numerous, p_scattered, p_several, p_solitary]
    colors = ['brown', 'blue', 'gray', 'green', 'pink', 'red']

    # Plot
    plt.pie(sizes, labels=labels, colors=colors,
    autopct='%1.1f%%', shadow=True, startangle=90)
    print(label)
    plt.axis('equal')
    plt.show()

def plota_habitat(filtro=df, label="habitat dos cogumelos"):
    dframe = filtro
    h_grasses = dframe[dframe['habitat'] == 'g']['habitat'].count()
    h_leaves = dframe[dframe['habitat'] == 'l']['habitat'].count()
    h_meadows = dframe[dframe['habitat'] == 'm']['habitat'].count()
    h_paths = dframe[dframe['habitat'] == 'p']['habitat'].count()
    h_urban = dframe[dframe['habitat'] == 'u']['habitat'].count()
    h_waste = dframe[dframe['habitat'] == 'w']['habitat'].count()
    h_woods = dframe[dframe['habitat'] == 'd']['habitat'].count()


    labels = ('grama', 'folhas', 'campo', 'estradas', 'urbano', 'lixo', 'árvores')
    sizes = [h_grasses, h_leaves, h_meadows, h_paths, h_urban, h_waste, h_woods]
    colors = ['green', 'brown', 'gray', 'yellow', 'pink', 'red', 'blue']

    # Plot
    plt.pie(sizes, labels=labels, colors=colors,
    autopct='%1.1f%%', shadow=True, startangle=90)
    print(label)
    plt.axis('equal')
    plt.show()
    
def plota_shape(filtro=df, label="formato do chapéu dos cogumelos"):

    dframe = df

    s_bell = dframe[dframe['cap-shape'] == 'b']['cap-shape'].count()
    s_conical = dframe[dframe['cap-shape'] == 'c']['cap-shape'].count()
    s_convex = dframe[dframe['cap-shape'] == 'x']['cap-shape'].count()
    s_flat = dframe[dframe['cap-shape'] == 'f']['cap-shape'].count()
    s_knobbed = dframe[dframe['cap-shape'] == 'k']['cap-shape'].count()
    s_sunken = dframe[dframe['cap-shape'] == 's']['cap-shape'].count()

    labels = ('sino', 'cônico', 'convexo', 'plano', 'taça', 'afundado')
    sizes = [s_bell, s_conical, s_convex, s_flat, s_knobbed, s_sunken]
    colors = ['green', 'brown', 'gray', 'yellow', 'pink', 'blue']

    # Plot
    plt.pie(sizes, labels=labels, colors=colors,
    autopct='%1.1f%%', shadow=True, startangle=90)
    print(label)
    plt.axis('equal')
    plt.show()
    
def show_dist(df = df):
    for i in df.columns:
        plt.title('Histograma de: %s' % i)
        plt.grid(True)
        df[i].hist()
        plt.show()

def print_dummies(df = df):
    x = pd.get_dummies(df)
    print(x)

def crosstab(df = df, method = 'mnb'):
    input_data = _get_data(df)
    manager = _algorithm_builder(method)
    x_treino, x_teste, y_treino, y_teste = train_test_split(input_data['x'], input_data['y'], test_size=0.25, random_state=0)
    output = pd.crosstab(y_teste, modelot.predict(x_teste), rownames=['Real'], colnames=['Predito'], margins=True)
    print(output)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-6-762f20c8acb6> in <module>()
----> 1 def plota_populacao(filtro=df, label="populacao dos cogumelos"):
      2     dframe = filtro
      3 
      4     p_abundant = dframe[dframe['population'] == 'a']['population'].count()
      5     p_clustered = dframe[dframe['population'] == 'c']['population'].count()
NameError: name 'df' is not defined

In [0]:

# 2 - a)

# plotagem de gráficos e análise dos mesmos

## com base nos dados mais fáceis de se obter por uma pessoa com pouco conhecimento específico sobre o assunto

# cogumelos comestiveis vs venenosos 

poisonous = df[df['poison'] == 'p']['poison'].count()
edible = df[df['poison'] == 'e']['poison'].count()

labels = ('Venenosos (%s)'%poisonous, 'Comestíveis (%s)'%edible)
sizes = [poisonous, edible]
colors = ['red', 'green']

# Plot
plt.pie(sizes, labels=labels, colors=colors,
autopct='%1.1f%%', shadow=True, startangle=10)
print("Cogumelos venenosos e comestíveis:")
plt.axis('equal')
plt.show()

In [5]:

# cores dos cogumelos 

plota_cor(filtro=df, label = "cor dos cogumelos")

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-5-a01a6f8ed278> in <module>()
      1 # cores dos cogumelos
      2 
----> 3 plota_cor(filtro=df, label = "cor dos cogumelos")

NameError: name 'plota_cor' is not defined

In [0]:

# cores dos cogumelos não-venenosos 

plota_cor(df[df['poison'] == 'e'], label = "cor dos cogumelos não venenosos")

In [0]:

# cores dos cogumelos venenosos 

plota_cor(df[df['poison'] == 'p'], label = "cor dos cogumelos venenosos")

In [0]:

# cheiro dos cogumelos 

plota_cheiro(filtro = df, label = "cheiro dos cogumelos")

In [0]:

# cheiro dos cogumelos venenosos

plota_cheiro(filtro = df[df['poison'] == 'p'], label = "cheiro dos cogumelos venenosos")

In [0]:

# cheiro dos cogumelos não-venenosos

plota_cheiro(filtro = df[df['poison'] == 'e'], label = "cheiro dos cogumelos venenosos")

In [0]:

# população dos cogumelos

plota_populacao(filtro=df, label="população dos cogumelos")

In [0]:

# população dos cogumelos venenosos

plota_populacao(filtro=df[df['poison'] == 'p'], label="população dos cogumelos venenosos")

In [0]:

# população dos cogumelos não venenosos

plota_populacao(filtro=df[df['poison'] == 'e'], label="população dos cogumelos não venenosos")

In [0]:

# habitat dos cogumelos

plota_habitat(filtro = df, label="habitat dos cogumelos")

In [0]:

# habitat dos cogumelos não venenosos

plota_habitat(filtro = df[df['poison'] == 'e'], label="habitat dos cogumelos não venenosos")

In [0]:

# habitat dos cogumelos venenosos

plota_habitat(filtro = df[df['poison'] == 'p'], label="habitat dos cogumelos venenosos")

In [0]:

# formato do chapeu dos cogumelos

plota_shape(filtro=df, label="formato dos cogumelos")

In [0]:

# formato do chapeu dos cogumelos nao venenosos

plota_shape(filtro=df[df['poison'] == 'e'], label="formato dos cogumelos não venenosos")

In [0]:

# formato do chapeu dos cogumelos venenosos

plota_shape(filtro=df[df['poison'] == 'p'], label="formato dos cogumelos não venenosos")

In [0]:

# matriz de correlação 

print("Matriz de Correlação")
plot_correlation()

In [0]:


print("Starting up...")

METHODS = ('mnb','rfc', 'gnb', 'tree', 'lda', 'ab', 'qda') #'svm','knn','regr', 

aftermaths = list()

for j in METHODS:
    print()
    aftermaths.append(get_best_headers(df, j)['score'])
print("\nfim")
print(aftermaths)

In [0]:

#t = numpy.arange(0.0, 2.0, 0.01)
#s = 1 + numpy.sin(2*numpy.pi*t)
plt.plot(METHODS, aftermaths)

plt.xlabel('Algoritmo')
plt.ylabel('Percentagem de acerto')
plt.title('Testes com algoritmos diferentes')
plt.grid(True)
plt.show()

In [0]:

for j in METHODS:
    print ("\nALGORITMO %s" %METHODS_DICT[j])
    avaliacao_metrica(df, j)

In [0]:

for j in METHODS:
    matriz_confusao(df, j)

In [0]:

plot_correlation(get_dataframe_by_headers(), ylabels = ["poison", "cap-shape", "cap-color", "bruises", "odor", "ring-type", "population", "habitat"])

In [0]:

plot_correlation(get_dataframe_by_headers(headers = ('poison', 'bruses')), ylabels = ["poison", "cap-shape"], xlabels = ["poison", "cap-shape"])

In [0]:

show_dist(get_dataframe_by_headers())

In [0]:

show_dist()

In [0]:

#plot_scatter()

print_dummies()

In [0]:

print_dummies(get_dataframe_by_headers())

In [122]:

for j in METHODS:
    print ("\nALGORITMO %s" %METHODS_DICT[j])
    avaliacao_metrica(get_dataframe_by_headers(), j)

ALGORITMO MultinomialNB
Acurácia:  0.9985228951255539
Precisão:  0.9985902255639098
Recall:  0.9985228951255539
tempo de execução:  0.0039520263671875

ALGORITMO RandomForestClassifier
Acurácia:  1.0
Precisão:  1.0
Recall:  1.0
tempo de execução:  0.0023920536041259766

ALGORITMO GaussianNB
Acurácia:  1.0
Precisão:  1.0
Recall:  1.0
tempo de execução:  0.0035805702209472656

ALGORITMO DecisionTreeClassifier
Acurácia:  1.0
Precisão:  1.0
Recall:  1.0
tempo de execução:  0.0032019615173339844

ALGORITMO LinearDiscriminantAnalysis
Acurácia:  1.0
Precisão:  1.0
Recall:  1.0
tempo de execução:  0.004513978958129883

ALGORITMO AdaBoostClassifier
Acurácia:  1.0
Precisão:  1.0
Recall:  1.0
tempo de execução:  0.003444194793701172

ALGORITMO QuadraticDiscriminantAnalysis
Acurácia:  1.0
Precisão:  1.0
Recall:  1.0
tempo de execução:  0.004183053970336914

In [0]:

#KEEP = ("poison", "cap-shape", "cap-color", "bruises", "odor", "ring-type", "population", "habitat")
crosstab()

In [0]:

show_uniques()

clean_df = cleanup()