Hands-on Data Science with Julia

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2021/09/24/manning.html

Introduction

Warning! The post includes (self)promotion.

Recently with Łukasz Kraiński we have published with Manning
the Hands-on Data Science with Julia liveProject.

In this post I want to discuss the idea behind this format and what you can
expect inside.

Why liveProject format?

If you are reading this post most likely you know that I write a lot on Julia
Slack, Julia Discourse, StackOverflow [julia] tag, give various tutorials,
write package documentation, and finally I do weekly updates to this blog.

So how is liveProject different format so that I have decided to give it a try
instead of doing e.g. a new JuliaAcademy course?

First, the content is project oriented. This means that you get a business
description of the challenge and should try to finish the task yourself. If
something is challenging to finish then you have three levels of support:
hints, partial solution, and finally full solution for every task. It might
seem as not very significant, but I believe that trying to do something on ones
own (and getting only as much help as really required) is superior to just
reading through some example codes.

The second valuable option is that on the platform you can compare your
implementation with other solutions to the same problem and also discuss it
with other coders or mentors.

All this means that although the projects are marked as intermediate level
content (since such experience is required to do the tasks on your own) even if
one is a beginner the material is still useful, with a twist that in this case
you probably will have to rely more on the provided help to be able to finish
the tasks.

I will see how it works out and maybe write a post with conclusions after some
time of giving the liveProject format a try.

What is inside?

The Hands-on Data Science with Julia liveProject is divided into five
parts. Here I will focus on commenting Julia packages are featured in
them (all packages are using their latest versions as for the time of writing
this post; I list them incrementally as they are introduced in consecutive
projects):

  1. data prepossessing: Arrow.jl, Chain.jl, CSV.jl, DataFrames.jl, FreqTables.jl,
    Plots.jl, StatsBase.jl;
  2. clustering (k-means, DBSCAN): Clustering.jl, Distances.jl;
  3. dimensionality reduction (PCA, t-SNE, UMAP): Conda.jl, MultivariateStats.jl,
    PyCall.jl;
  4. predictive modelling – regression problems (random forest, GLM):
    DecisionTree.jl, GLM.jl, HypothesisTests.jl;
  5. predictive modelling – classification problems (XGBoost): ROCAnalysis.jl,
    XGBoost.jl.

Conclusions

As opposed to standard teaching materials liveProjects are a paid content and
that is why I have issued a (self)promotion warning at the beginning of this
post. However, a good thing is that the first project on
data preprocessing is available for free so it is easy to check if you
like the content that we have prepared.