# 2020 project for the data mining course
* [link to assignment](http://www.cs.uu.nl/docs/vakken/mdm/assignment1-2020.pdf)
* [link to article datasets part 2](https://www.st.cs.uni-saarland.de/softevo/bug-data/eclipse/promise2007-dataset-20a.pdf)


This project is a from-scratch implementation of a classification tree, based on
the theory we learned in the course. We were awarded a 10/10 for the coding
part and an 8.5/10 for the report. The code was said to be very readable, and we
avoided performance bottlenecks by using numpy.

The concepts of impurity reduction and the Gini index were used to construct an
algorithm that computes the "best split" at each step. We were also required to
implement ensembling methods.
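
To give an idea of what a "best split" search looks like, here is a minimal
sketch for a single numeric feature. It is not the code from this repository:
the function names `gini` and `best_split` are made up for illustration, and it
assumes binary class labels coded as 0/1 and candidate thresholds placed halfway
between consecutive distinct feature values.

```python
import numpy as np


def gini(labels):
    """Gini index of a 0/1 label vector, computed as p * (1 - p)."""
    if labels.size == 0:
        return 0.0
    p = labels.mean()  # fraction of class-1 samples
    return p * (1.0 - p)


def best_split(x, y):
    """Return (threshold, impurity_reduction) for one numeric feature.

    Hypothetical helper: returns (None, 0.0) when no split reduces impurity.
    """
    values = np.unique(x)  # sorted distinct feature values
    if values.size < 2:
        return None, 0.0

    # Assumption: candidate thresholds are midpoints between neighbours.
    candidates = (values[:-1] + values[1:]) / 2.0
    parent = gini(y)

    best_t, best_gain = None, 0.0
    for t in candidates:
        left = y[x <= t]
        right = y[x > t]
        # Impurity reduction: parent impurity minus the size-weighted
        # impurity of the two children.
        weighted = (left.size * gini(left) + right.size * gini(right)) / y.size
        gain = parent - weighted
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain
```

A tree builder would typically call such a function for every feature at a
node, keep the feature with the largest impurity reduction, and recurse on the
two resulting subsets.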