Commit
1 parent 9006197, commit 37d1646
Showing 16 changed files with 90 additions and 10 deletions.
File renamed without changes.
Binary file modified (BIN, +48.1 KB, 160%): data/real_streams_gt/INSECTS-incremental_imbalanced_norm.png
Binary file modified (BIN, +3.67 KB, 150%): data/real_streams_gt/clf_INSECTS-abrupt_imbalanced_norm.npy
Binary file not shown.
Binary file modified (BIN, +3.67 KB, 150%): data/real_streams_gt/clf_INSECTS-gradual_imbalanced_norm.npy
Binary file not shown.
Binary file modified (BIN, +3.67 KB, 150%): data/real_streams_gt/clf_INSECTS-incremental_imbalanced_norm.npy
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
# Drift annotation procedure | ||
|
||
Real concept drifts are associated with a change (usually a decrease) in the classification quality achieved by a classifier. If the classifier was trained on data from a concept other than the current one, its recognition quality should decrease, because the classifier is not *familiar* with the current data distribution. If an increase in quality is observed instead, it can be suspected that the data distribution is close to the previous concept and that fewer samples fall in the areas where the classes overlap, which is also related to a change in concept. | ||
|
||
A human expert marked the locations of drifts based on the classification quality of three classifiers, Gaussian Naive Bayes (GNB), Multilayer Perceptron (MLP), and Extreme Learning Machine (ELM), evaluated in the Test-Then-Train experimental protocol. For every chunk of data, the classifiers were first used for inference and quality evaluation, and then trained on that new portion of data. Such a protocol should allow for the most accurate determination of real concept drifts at the beginning of stream processing, in particular a clear identification of the change between the first and second concept. Training the classifiers with subsequent portions of data, especially in the case of MLP, which *forgets* the previous data distributions, should enable the identification of further concept changes. | ||
|
||
By default, partial fitting of the MLP performs only one iteration of weight optimization per call, so the recognition quality of the MLP is lower at the beginning of stream processing and increases later if the concept remains stable. | ||
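The warm-up effect of a single optimization step per chunk can be illustrated with a toy model. The sketch below is an assumption-laden stand-in: it uses a hand-written logistic regression instead of an MLP, a synthetic stable concept, and hypothetical names (`make_chunk`, `one_gradient_step`), so it only shows the general mechanism and not the behavior of any particular library.

```python
import math
import random


def make_chunk(rng, n):
    """Balanced 2-D chunk from one stable concept: class means at (-1,-1) and (1,1)."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        centre = 1.0 if label == 1 else -1.0
        X.append([rng.gauss(centre, 0.3), rng.gauss(centre, 0.3)])
        y.append(label)
    return X, y


def predict(w, b, X):
    return [1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0 for x in X]


def one_gradient_step(w, b, X, y, lr=0.1):
    """A single full-batch gradient step on the logistic loss."""
    gw, gb = [0.0, 0.0], 0.0
    for x, t in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
        gw[0] += (p - t) * x[0]
        gw[1] += (p - t) * x[1]
        gb += p - t
    n = len(y)
    return [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n], b - lr * gb / n


rng = random.Random(0)
w, b = [0.0, 0.0], 0.0  # untrained model
scores = []
for _ in range(10):
    X, y = make_chunk(rng, 100)
    predictions = predict(w, b, X)  # test first
    scores.append(sum(p == t for p, t in zip(predictions, y)) / len(y))
    w, b = one_gradient_step(w, b, X, y)  # then exactly one optimization step
print([round(s, 2) for s in scores])
```

The untrained model scores at chance level on the first chunk and improves once it has taken a few steps. On harder data the rise would be more gradual, which is why a low initial MLP score should not by itself be read as a drift.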
|
||
It should be emphasized that the processed streams were first divided into chunks and then pruned of the chunks containing samples of only a single class. The identified drift moments are therefore specific to the transformed streams used in the experiments and should not be used as unambiguous drift moments in the original streams for the purposes of other studies. | ||
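The chunking and pruning step can be sketched as below. The function names, the chunk size, and the choice to drop an incomplete final chunk are assumptions made for illustration; the actual preprocessing may differ.

```python
def split_into_chunks(X, y, chunk_size):
    """Cut a stream into consecutive chunks; an incomplete tail is dropped."""
    n_chunks = len(y) // chunk_size
    return [
        (X[i * chunk_size:(i + 1) * chunk_size], y[i * chunk_size:(i + 1) * chunk_size])
        for i in range(n_chunks)
    ]


def prune_single_class(chunks):
    """Keep only chunks whose labels contain at least two distinct classes."""
    return [(X, y) for X, y in chunks if len(set(y)) > 1]


# Toy stream: 12 samples, chunk size 4; the middle chunk holds only class 0.
y = [0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1]
X = [[float(i)] for i in range(len(y))]

chunks = split_into_chunks(X, y, chunk_size=4)
kept = prune_single_class(chunks)

# Original chunk indices that survive pruning.
kept_original_indices = [i for i, (_, labels) in enumerate(chunks) if len(set(labels)) > 1]
print(len(chunks), len(kept), kept_original_indices)  # 3 2 [0, 2]
```

After pruning, the second chunk of the transformed stream corresponds to the third chunk of the original one. This index shift is why the annotated drift positions only apply to the transformed streams.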
|
||
Below we present the classification results for the processed streams as a scatter plot (top row) and a plot (bottom row). The quality obtained by GNB is marked in blue, MLP in gold, and ELM in red. The x-axis shows the identified moments of drift, determined based on changes in classification quality. | ||
|
||
### Electricity | ||
![electricity](data/real_streams_gt/electricity.png) | ||
|
||
### Covtype | ||
![covtype](data/real_streams_gt/covtypeNorm-1-2vsAll-pruned.png) | ||
|
||
### Poker | ||
![poker](data/real_streams_gt/poker-lsn-1-2vsAll-pruned.png) | ||
|
||
### Insect abrupt | ||
![insect-abrupt](data/real_streams_gt/INSECTS-abrupt_imbalanced_norm.png) | ||
|
||
### Insect gradual | ||
![insect-grad](data/real_streams_gt/INSECTS-gradual_imbalanced_norm.png) | ||
|
||
### Insect incremental | ||
![insect-incremental](data/real_streams_gt/INSECTS-incremental_imbalanced_norm.png) |