-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Work on the simplification of multi-relation edges in networks #255
Conversation
PR se-sic#250 introduces the 'simplify.multiple.relations' configuration option, which can alter the edge-simplification algorithm for multi-networks. Correspondingly, the edge-attribute handling does need adjustment to work with this configuration option. Disambiguate and sort the 'relation' edge-attribute when simplifying multi-networks with the 'simplify.multiple.relations' option set. This works towards se-sic#251. Signed-off-by: Maximilian Löffler <[email protected]>
As the 'relation' edge-attribute potentially holds a list of relation sources, it needs to be unlisted first before it can be disambiguated. This works towards se-sic#251. Signed-off-by: Maximilian Löffler <[email protected]>
As the 'relation' edge-attribute can only be singular values if 'simplify.multiple.relations' is not set, we can set the previously introduced lambda as default. This works towards se-sic#251. Signed-off-by: Maximilian Löffler <[email protected]>
Signed-off-by: Maximilian Löffler <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR @MaLoefUDS. I have not performed a detailed review yet, but I had a quick look at the tests (without checking them in detail, I just skimmed them).
I have some questions targeting the second test ("Simplify author-network with relation = c('cochange', 'mail') using both algorithms"):
Is there a reason why you call the simplify.network
function manually and pass the simplify.multiple.relations
parameter manually to the simplify.network
function? Isn't it possible to configure this via the recently introduced network-conf parameter simplify.multiple.relations
(see PR #250) in combination with using simplify = TRUE
in the network conf?
Maybe I overlook something or this does not belong here, but I would like to see the interplay of network conf and automatic simplification in the network builder also tested properly. If I am not mistaken, this should be easy to adjust in the test. As the automatic simplification calls the simplify.network
function internally, it should not be necessary to test this function separately, as there is already the basic test above. So, just calling get.author.network
on the network builder should be enough to derive the actual network that is compared to the expected network.
And a second idea: simplify.networks
(the plural version) also uses the the new simplify.multiple.relations
parameter. Do we already have basic tests for the plural variant? If so, would it be easily adjustable to also test the new parameter in the plural version, to make sure that the parameter is correctly passed to the single version inside?
(In addition, I spotted already two minor issues, see the comments below.)
Yeah, the only reason I did not use the network conf to set the simplification parameters were a lack of knowledge on my side 😅. I just tried it out and I discovered something interesting. Creating a network, then simplifying it does not yield the same result as creating a network with the I am not sure if that is an actually intended behavior but I guess its not and it needs further investigation. Also, we should decide on which is favored, |
This is really interesting and I'd guess that this behavior is not intended. In general, I've assumed that both ways of simplifying the network should end up in the same network, since the network-construction function of the NetworkBuilder actually calls the simplify function. So, maybe there is something additional going on in the NetworkBuilder after the simplification function is called? That's the only potential difference that comes to my mind.
Indeed, this needs further investigation. Could you please provide concrete examples of which edge attributes are affected and how their usual values look like? Also checking if there are some additional statements in the NetworkBuilder that could make a difference could be helpful. Actually, I don't know yet whether |
I have investigated the issue and I think I found a bug in To get everyone on board, the pipeline for creating networks with multiple relations is as follows:
So when we create a network with According to its documentation This is the end of my bug report and now I will use this comment to answer some of your other questions I have not answered yet:
We have exactly 0 tests that test this function, however, adding tests similar to the ones I already added should not be too difficult.
Every mutually exclusive edge attribute between two relations is affected. For example, in mail edges, we have the
After revisiting the R documentations of |
Another problem just came to my mind, but I have not verified it yet. According to the network creation pipeline in my last comment, I do not think |
Indeed, thanks for spotting this! Unfortunately, I just changed the call to a simplification function that is called inside the network creation but before the different networks are merged, my bad. I guess the problem could be fixed when adding the following statements at two specific places: if (network.conf$get.value("simplify")) {
net = simplify.network(net,
simplify.multiple.relations = network.conf$get.value("simplify.multiple.relations"))
} (1) After Line 758 in 56ff0b3
(2) After Line 820 in 56ff0b3
This way, the networks should be simplified after the different networks are combined. My previous change in line 1312, should not affect anything, so I assume we can keep this as I have implemented it? It should not make any difference, if I am not mistaken. |
Puh, that's really interesting. Thanks for the thorough investigations. In general, I agree with you. However, we need to put some more efforts into analyzing how these So, further investigations are necessary before we can decide on how to deal with this problem on our side and whether there really is a bug that we should report to |
Let me concretize my findings: Consider an arbitrary I have also checked the R documentation[doc] regarding |
To be honest, I don't get the last sentence here. What's the relation between
True. Or, more colloquially speaking: |
Regarding the misunderstanding: In my first comment about the "bug" I made it sound like that if one entry of any column is a list then any missing values will be In this case, I actually believe |
Signed-off-by: Maximilian Löffler <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments on your last two commits, see below.
For the last commit: Could you please adjust the commit message - the commit only cleans up the tests, but the commit message suggests that the actual implementation (not the tests) was changed.
* Enable 'simplify.multiple.relations' config parameter to work properly. Previously, networks would be simplified before merging datasources, i.e., before edges could even have multiple relations * Circumvent undocumented behavior in 'plyr::rbind.fill' that occurs when merging already simplified networks (see PR#255) Signed-off-by: Maximilian Löffler <[email protected]>
Signed-off-by: Maximilian Löffler <[email protected]>
To get an overview of the status of this PR: The only part that is still missing in this PR are tests for the "plural version" |
Yes, the tests are on my bucketlist for today. I don't remember anything that we did talk about but not yet implemented it in this PR ^^. |
I was just wondering, since Edit: Nevermind I just realized that these parameters directly map to |
I agree that there is no need for "elaborate" |
Signed-off-by: Maximilian Löffler <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments below–I only performed a "syntactical" review, not yet a "semantic" one, that is, I did not yet think about the test scenarios yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm done with my review now. Looks good to me.
I've a few suggestions on how to improve the tests with regard to other attributes etc. If you think my suggestions are beneficial improvements to the tests and easy to realize, I would appreciate it if you could extend the tests a little bit.
If you have good arguments why my suggestions are unnecessary, I am also open for them 😄
Signed-off-by: Maximilian Löffler <[email protected]>
Signed-off-by: Maximilian Löffler <[email protected]>
Signed-off-by: Maximilian Löffler <[email protected]>
Im unsure whether the |
just because it's not an explicitly showcased feature now doesn't mean it has to stay that way 😉
We don't really need an update, but I still think it can be beneficial to demonstrate the new feature there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only two small comments regarding the NEWS.md:
Regarding the Update (1/2): When plotting, we decide the color of an edge based on the relation of said edge, as seen here. For multi-relational edges we need to think about what we can do. We can probably take the first or the last element in the list to obtain a color that is already in use for other edge relations. I am new to ggplot, so I don't really know if we can just concat all elements in the relations list and get a new color from that. Update (2/2): I tried concating the relations and it seems to work now, I will finish up a commit so you can have a look. |
Great! First of all, thanks for spotting this bug! Nobody would ever have thought of fixing the plotting functionality to deal with edges that represent multiple relations. I've said it many times already: Playing around with the
Thanks for solving the problem. I really like your fix. I've tried plotting networks with multiple different combinations of relations on single edges, and all the different combinations get different edge colors, as intended. |
Concat all relations of an edge to a single string to determine the edges color by, since ggplot2 does not accept lists as an identifier. Signed-off-by: Maximilian Löffler <[email protected]>
Signed-off-by: Maximilian Löffler <[email protected]>
Signed-off-by: Maximilian Löffler <[email protected]>
Prerequisites
showcase.R
with respect to my changes.dev
.Description
Adjust edge attribute handling slightly to make it compatible with the
simplify.multiple.relations
configuration option introduced in PR #250. In addition, provide more tests for network simplification.Changelog
Added:
Fixed:
get.data.sources.from.relations
when confronted with networks that include edges that stem from multiple relations (e.g., have been simplified usingsimplify.multiple.relations
)Changed:
relation
edge-attribute when simplifying throughigraph