| author | mike <mike1994vink@gmail.com> | 2021-03-17 17:34:31 +0100 |
|---|---|---|
| committer | mike <mike1994vink@gmail.com> | 2021-03-17 17:34:31 +0100 |
| commit | 5ab1872bc99282722726c65142a28b87aacaca5c (patch) | |
| tree | adf0c8e80b67170cfd8b549bced36118cc14f8b2 /README.md | |
| parent | cee6b4d208b207345f82cc58b614a91071443fa3 (diff) | |
change(readme)
Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 28 |
1 files changed, 8 insertions, 20 deletions
@@ -1,25 +1,13 @@
-# 2020_data_mining_assignments
+# 2020 project for the data mining course

 * [link to assignment](http://www.cs.uu.nl/docs/vakken/mdm/assignment1-2020.pdf)
 * [link to article datasets part 2](https://www.st.cs.uni-saarland.de/softevo/bug-data/eclipse/promise2007-dataset-20a.pdf)

-## Part1, tree algorithm/implementation:
-- [X] [tree_grow](https://github.com/Vinkage/2020_data_mining_assignments/blob/e650ad27d13b392f5b6535906e36176cb0777650/assignment1.py#L321-L406) function that follows the [pseudocode in the slides](./media/tree_grow_pseudo_code.png)
-- [X] adapt tree_grow for n_feat: a few lines specifying that the columns passed to exhaustivesplitsearch are picked at random from x
-- [X] tree_grow_b, bootstrap version of tree_grow that constructs a list of trees by sampling rows from x with replacement
-- [X] [tree_pred function](https://github.com/Vinkage/2020_data_mining_assignments/blob/da8ca975fb9d11d3801fef66344736e675734c42/assignment1.py#L77-L103) with efficient conditional branches
-- [X] tree_pred_b, a function that uses a list of trees to make a prediction for the rows of a data array x
-- [X] Figure out how we want to compute the confusion matrix (scipy?)
-- [X] Test prediction of single tree on pima indians data with nmin 20 and minleaf 5, check with confusion matrix in [link to assignment](http://www.cs.uu.nl/docs/vakken/mdm/assignment1-2020.pdf)
-- [X] Test prediction on [educode](https://uu.educode.nl/login/?next=/submissions/646/feedback/95016)
+This project is a from-scratch implementation of a classification tree using
+the theory we learned in the course. We were awarded a 10/10 for the coding
+part and 8.5/10 for the report. The code was said to be very readable, and we
+also avoided time bottlenecks using NumPy.

-
-## Part2, data analysis:
-- [ ] Collect datasets from the literature
-- [ ] Describe the datasets, and explore/plot/format them where needed
-
-### Official steps
-
-
-## The report
-
+The concepts of impurity reduction and the Gini index were used to construct an
+algorithm that computes the "best split" at each step. We were also required to
+implement ensembling methods.
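
The README text added in this commit says the tree picks the "best split" at each step by impurity reduction under the Gini index. A minimal sketch of that idea for a single numeric feature and binary 0/1 labels is given below. The function names `gini` and `best_split` are hypothetical and are not taken from the repository's `assignment1.py`. The sketch assumes the two-class convention where the Gini index is p(1 - p) (some texts use 2p(1 - p), which ranks splits identically), and it borrows only the `minleaf` idea from the removed checklist.

```python
import numpy as np


def gini(y):
    """Gini index of a binary 0/1 label vector, using p * (1 - p)."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)  # fraction of class-1 labels
    return float(p * (1.0 - p))


def best_split(x, y, minleaf=1):
    """Return (threshold, impurity_reduction) for one numeric feature.

    Candidate thresholds are midpoints between consecutive distinct
    sorted values of x. Returns (None, 0.0) when no candidate leaves
    at least `minleaf` rows on both sides or reduces the impurity.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    values = np.unique(x)  # sorted distinct values
    candidates = (values[:-1] + values[1:]) / 2.0
    parent = gini(y)
    best_thr, best_red = None, 0.0
    for thr in candidates:
        left = y[x <= thr]
        right = y[x > thr]
        if len(left) < minleaf or len(right) < minleaf:
            continue
        # Impurity of the children, weighted by their share of the rows
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        reduction = parent - child
        if reduction > best_red:
            best_thr, best_red = float(thr), float(reduction)
    return best_thr, best_red


thr, red = best_split([1, 2, 3, 4], [0, 0, 1, 1])
print(thr, red)
```

A full tree would repeat this search over every candidate feature at each node and recurse on the two resulting subsets. An ensemble, under the same assumptions, would grow several such trees on bootstrap samples of the rows.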
