[Maki TANAKA] - Fab Futures - Data Science

1: Introduction¶

Assignment:¶

  • Select and document a data set to analyze
  • Connect to a JupyterLab server and become familiar with the user interface

Initial settings¶

First, I changed a few settings to set up "my page" in Jupyter Notebook.

Adding my name¶

At first, the front page shows the placeholder "[Your-Name-Here]".

So I replaced it with my name in theme/index.html.j2.

1-1. Jupyter¶

This is my first time using Jupyter Notebook as a documentation tool.

The start page is shown first.

To start editing, select a file in the sidebar, or click "Python 3" under "Notebook" to open a new page.

Each block is called a "cell", and there are three cell types: "Code", "Markdown", and "Raw". Choose the type, then start writing.
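The three cell types can also be seen in the notebook file itself, since an .ipynb file is JSON with a list of cells. The following is only a minimal sketch using the standard library: the helper name `make_cell` and the cell contents are hypothetical, and the structure is a simplified version of the notebook format (version 4), not a complete or validated file.

```python
import json


def make_cell(cell_type, source):
    """Return a minimal notebook cell of the given type (hypothetical helper)."""
    if cell_type not in {"code", "markdown", "raw"}:
        raise ValueError(f"unknown cell type: {cell_type}")
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells additionally carry an execution counter and their outputs.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


# A simplified notebook with one cell of each type.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        make_cell("markdown", "# 1: Introduction"),
        make_cell("code", "print('hello')"),
        make_cell("raw", "text passed through unchanged"),
    ],
}

# Round-trip through JSON, as if the notebook had been saved and reopened.
notebook_json = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(notebook_json)["cells"]]
print(cell_types)
```

Only code cells are executed by the kernel; Markdown cells are rendered as formatted text, and Raw cells are kept as they are.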

1-1-2. Use Jupyter Notebook in a local environment¶

Since we heard that the web-based Jupyter Notebook would be closed after this course, I decided to use Jupyter in my local environment too.

0. Preparation

  • I made a directory for working on this course.

  • Pull all the data from GitLab into the local environment.

  • Click the square button to copy the repository's SSH address, then run this command in the terminal:
    % git clone "the address from ssh"

All data are now in the local environment.

My computer is a Mac, and I installed Jupyter as follows:

  1. Check that Python is installed: % python --version

  2. Activate conda: % conda activate base

  3. Install Jupyter: % conda install jupyter

    -> I installed several libraries afterwards, as they were needed.

Once Jupyter was installed, I ran this command in the terminal:
% jupyter lab
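Before launching, the installation can also be checked from Python itself. This is a small sketch, not part of the original steps: the function name `check_environment` is hypothetical, and the package names `jupyterlab` and `notebook` are an assumption about what `conda install jupyter` provides in a given environment.

```python
import importlib.util
import sys


def check_environment(packages=("jupyterlab", "notebook")):
    """Report the Python version and which top-level packages can be found."""
    found = {
        name: importlib.util.find_spec(name) is not None
        for name in packages
    }
    return {"python": sys.version.split()[0], "packages": found}


report = check_environment()
print("Python", report["python"])
for name, ok in report["packages"].items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```

If a package is reported as missing, it is likely that a different Python is active than the one in the conda environment where Jupyter was installed.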

1-2. Dataset¶

I chose the wine dataset on Kaggle. It sounds really interesting and is a good start for me, as I love wine!

The description of this dataset on Kaggle is as follows:

About Dataset
The data was used with many others for comparing various classifiers. In a classification context, this is a well posed problem with "well behaved" class structures. A good data set for first testing of a new classifier, but not very challenging.
These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.
The attributes are:

  • Alcohol
  • Malic acid
  • Ash
  • Alcalinity of ash
  • Magnesium
  • Total phenols
  • Flavanoids
  • Nonflavanoid phenols
  • Proanthocyanins
  • Color intensity
  • Hue
  • OD280/OD315 of diluted wines
  • Proline

For Each Attribute: All attributes are continuous

No statistics available, but suggest to standardise variables for certain uses (e.g. for use with classifiers which are NOT scale invariant)

NOTE: 1st attribute is class identifier (target)(1-3)

Acknowledgements:
This dataset is also available from Kaggle & UCI machine learning repository, https://archive.ics.uci.edu/dataset/109/wine
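Two points in the description, the class identifier in the first column and the advice to standardise the variables, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows in `SAMPLE` are made-up values with three features instead of thirteen (not real measurements from the dataset), and the function names `load_rows` and `standardise` are hypothetical.

```python
import csv
import io
import statistics

# Made-up rows for illustration only: class label first, then three features.
SAMPLE = """\
1,14.0,1.5,2.4
1,13.0,2.0,2.2
2,12.0,1.0,2.0
3,13.0,3.5,2.6
"""


def load_rows(text):
    """Split each row into its class label (1st column) and its features."""
    labels, features = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        labels.append(int(row[0]))
        features.append([float(value) for value in row[1:]])
    return labels, features


def standardise(features):
    """Rescale every column to mean 0 and (population) standard deviation 1."""
    columns = list(zip(*features))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        # A constant column (stdev 0) is mapped to 0.0 to avoid dividing by zero.
        [(v - m) / s if s else 0.0 for v, m, s in zip(row, means, stdevs)]
        for row in features
    ]


labels, features = load_rows(SAMPLE)
scaled = standardise(features)
print(labels)
print([round(value, 3) for value in scaled[0]])
```

Standardising matters here because the attributes have very different ranges, so without it a distance-based classifier would be dominated by the attribute with the largest numbers.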