-
Notifications
You must be signed in to change notification settings - Fork 206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Multi Agent Beat Tracker #333
base: main
Are you sure you want to change the base?
[WIP] Multi Agent Beat Tracker #333
Conversation
…ted histogram processor
…and add numpy-compatible slicing access
This variant is slower than the previous one but is more flexible.
65a0c80
to
0b9de3c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't had the chance to test everything, thus only some general comments.
madmom/features/beats.py
Outdated
---------- | ||
max_agents : int | ||
Max number of agents. | ||
tempi : int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be in line with other beat trackers, this should be num_tempi
.
Additionally, all these parameters should be available in __init__
, otherwise they cannot be controlled. If they are optional (i.e. initialised with default values) the should be marked accordingly.
Child agents will inherit a percentage of their parents. | ||
|
||
""" | ||
# TODO: Should we use @classmethod to set all values from outside? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@classmethod is probably not the right choice for this. Usually these parameters are just added to __init__
.
Since you documented them here (as numpydoc requires the init-params to be documented at class-level), these should be available to __init__
. Vice versa, the params of init must be documented here as well.
madmom/features/beats.py
Outdated
TEMPI = 3 | ||
INDUCTION_TIME = 2. | ||
|
||
class Agent(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather make this a class at module level. Although it is only used by MultiAgentBeatTrackingProcessor
it is IMHO cleaner to move this class one level up. I'd rename it to BeatTrackingAgent(Processor) or something like that, maybe add a leading underscore if it is not usable otherwise.
CORRECTION_FACTOR = 0.25 | ||
INHERIT_SCORE_FACTOR = 0.9 | ||
|
||
# used for normalizing the score, set from outside |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these are set from outside, normal class variables are just fine (lowercase without the leading _).
interval=self.interval + int(error * 0.5), | ||
prediction=self.prediction + int(error * 0.5), | ||
beats=self.beats) | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be done in less lines of code with a simple loop.
Maybe a stupid question, but why is the error always added and not subtracted as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, since all three objects created here have different intervals and predictions the only loop-thingy coming to my mind right now would be something like that:
predictions = [self.prediction + error, self.prediction + error, self.prediction + int(error * 0.5)]
intervals = [self.interval, self.interval + error, self.interval + int(error * 0.5)]
for p, i in zip(predictions, intervals):
self.__class__(prediction=p, interval=i, ...)
Is that would you mean? I also want to test whether all three child-agents are really necessary to make the results better (they are used like that in the paper).
Regarding the error variable: It can also be a negative number (if the max event occurred before the prediction), so adding should work ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, something like this, but you could even simplify it to:
prediction_errors = [1, 1, 0.5]
interval_errors = [0, 1, 0.5]
for p, i in zip(prediction_errors, interval_errors):
self.__class__(prediction=self.prediction + int(p * error),
interval=self.interval + int(i * error), ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thx!
# penalize agent for not detecting the beat | ||
self.score -= (abs(error) / frames) * normalization * act | ||
# return new child agents | ||
return new_agents |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could probably be refactored a bit by moving common code outside the if/else clause and then decide later on whether new agents need to be created or not.
madmom/features/beats.py
Outdated
**kwargs) | ||
self.tempo_estimator = tempo_estimator | ||
self.agents = [] | ||
# TODO: Not sure if thats nice? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's not 😏 . I think these values should be passed to the agents when instantiating them.
# induct agents after induction time has passed | ||
if self.counter == self.INDUCTION_TIME * self.fps: | ||
# create histogram of induction window | ||
histogram = self.tempo_estimator.interval_histogram(buffer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure, but shouldn't activation
and not the buffer be passed to the tempo estimator? In the latter case you compute the histogram for more than the last activation which is probably not what you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, thats a good point and actually a larger topic ;)
Well, the concept for both offline and online is the same: Always use the first N seconds to estimate the tempo. Then introduce the agents (for offline at the beginning of the song, for online after those N seconds). From that point in time there is no more tempo estimation going on and the agents are themselves responsible to adapt to the changing speed of the music. That's why in the code the online tempo estimator is not used at all, only the offline tempo estimator is used once after those N seconds (Btw. thats also done in a similar way in all papers I found). That also means that for online the first N seconds don't have agents and therefore cannot produce beat predictions which probably leads to a worse result. For that reason in some papers they just ignore those first N induction seconds when evaluating - but I think that would be unfair to all the other tracking algorithms which are capable of producing predictions from the beginning.
One other problem we are facing here is that the online networks tend to give worse predictions in (approx.) the first 5 seconds of a song. And that is exactly the section used to estimate the base-tempi for the agents. So I already thought about how agents could get their tempo updated during runtime but it's difficult because each agent has its own tempo already from the beginning (since we are using multiple tempo estimations) and updates it on the fly.
Also connected to that issue is the handling of abrupt tempo changes. Some solutions I found use a kind of "overseer" module which is only looking for sudden changes and starts another agent induction when necessary. I don't know yet whether we will need something like this since it also introduces additional complexity.
So yeah, right now for offline I get 90% and for online 80% F-Score for the Ballroom set (didn't play around with the parameters yet). If we have some more ideas I will try them out ofc. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, tempo estimation at the very beginning is indeed a problem. Relying on those estimations is thus probably not a very good idea. How about simply starting a fixed number of agents at the beginning of the song and initialise them with tempi distributed uniformly on a logarithmic tempo scale? Soon afterwards the "bad" agents will be killed/die anyways, right?
For example for downbeat tracking I limit the number of modelled tempi to 60 in order to prevent an excessively bloated state space. 60 tempi is enough to cover the whole tempo range from 40..250bpm. Please note, that the downbeat tracker is not able to use any other tempi.
I guess the agents are able to adapt their tempo quite quickly, so maybe this is a good workaround/extension and would render the need for tempo estimation completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah interesting idea! I will test that (maybe also for offline) ;)
# return beat(s) | ||
return np.array(beats_) / self.fps | ||
|
||
def induct_agents(self, activations, interval, start=0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably make start a required parameter. Guess it is less error-prone then.
madmom/features/beats.py
Outdated
# if no maxima could be found just use max value | ||
if len(maxima) == 0: | ||
maxima = np.array([activations.argmax()]) | ||
# pick N maxima indizes where activation is highest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx
492a4d5
to
3776eec
Compare
3776eec
to
b4d5381
Compare
b2eaafa
to
dcdbb8b
Compare
dcdbb8b
to
1c640e5
Compare
1c640e5
to
be97b36
Compare
This PR adds another Beat Tracker which is based on Multiple Agents. It already works for offline and online but some improvements are necessary. Ideas and suggestions are highly welcome :)
Remaining TODOs before this pull request can be merged