-
Notifications
You must be signed in to change notification settings - Fork 206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow selection of channel when loading / creating audio as mono #409
Allow selection of channel when loading / creating audio as mono #409
Conversation
* Allow `remix` to choose a channel, and just copy results from that channel if required * `ffmpeg` version - use `pan` audio filter (see [section in manual](https://trac.ffmpeg.org/wiki/AudioChannelManipulation)). * Propagate this parameter through all the audio loading layers * Add tests around all aspects of this This fixes CPJKU#211.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, very useful addition!
Besides the specific details, it would be nice if you could reformat the calling arguments to fill all lines up to 79 characters and then add a line break (I have not checked if some arguments would fit the line before, I just looks like this might be the case).
@@ -510,6 +516,9 @@ class Signal(np.ndarray): | |||
num_channels : int, optional | |||
Reduce or expand the signal to `num_channels` channels, or 'None' | |||
to return the signal with its original channels. | |||
channel : int, optional | |||
When reducing a signal to `num_channels` of 1, use this channel, | |||
or 'None' to return the average across all channels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the nicest and most descriptive docstring of channel
in the whole module. Would be nice if you replace the others with this one (and adapt if needed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I liked this one. IIRC I used a this different description because a lot of them just pass through, so don't really do the logic themselves, but I can propagate this through everywhere
Thanks!
Yes, I tried both. Personally I have a slight preference for logical grouping (number of channels and channel) when there are lots of arguments, but I'll change this back where it fits |
It would be nice if you could add the same channel selection mechanism to I have this code lying around somewhere, not sure how much changes happened to that class in the meantime... class Stream(object):
"""
A Stream handles live (i.e. online, real-time) audio input via PyAudio.
Parameters
----------
sample_rate : int
Sample rate of the signal.
num_channels : int, optional
Number of channels. Per default, only the first input channel is used.
Additionally, the following literals are supported:
- 'left' : use only the left (i.e. first) channel
- 'right' : use only the right (i.e. second) channel
- 'mono' : down-mix all input channels to mono
dtype : numpy dtype, optional
Data type for the signal.
frame_size : int, optional
Size of one frame [samples].
hop_size : int, optional
Progress `hop_size` samples between adjacent frames.
fps : float, optional
Use given frames per second; if set, this computes and overwrites the
given `hop_size` value (the resulting `hop_size` must be an integer).
queue_size : int
Size of the FIFO (first in first out) queue. If the queue is full and
new audio samples arrive, the oldest item in the queue will be dropped.
Notes
-----
Stream is implemented as an iterable which operates on a blocking queue.
The queue is filled by PyAudio and iterating over the stream returns the
live audio signal frame by frame. It blocks when no new data is available.
"""
QUEUE_SIZE = 1
def __init__(self, sample_rate=SAMPLE_RATE, num_channels=NUM_CHANNELS,
dtype=np.float32, frame_size=FRAME_SIZE, hop_size=HOP_SIZE,
fps=FPS, queue_size=QUEUE_SIZE, **kwargs):
# import PyAudio here and not at the module level
import pyaudio
try:
from Queue import Queue # Python 2
except ImportError:
from queue import Queue # Python 3
# set attributes
self.sample_rate = sample_rate
# channel indicates which channel to select if there are more than one
self.channel = None
if num_channels in (None, 1):
self.num_channels = 1
elif num_channels == 'left':
self.num_channels = 2
# get the first channel
self.channel = 0
elif num_channels == 'right':
self.num_channels = 2
# get the second channel
self.channel = 1
# wanted dtype (PyAudio is set to float32 always)
self.dtype = dtype
if frame_size:
self.frame_size = int(frame_size)
if fps:
# use fps instead of hop_size
hop_size = self.sample_rate / float(fps)
if int(hop_size) != hop_size:
raise ValueError(
'only integer `hop_size` supported, not %s' % hop_size)
self.hop_size = int(hop_size)
# set up a queue
self.queue_size = queue_size
self.queue = Queue(maxsize=self.queue_size)
# init PyAudio for recording
self.pa = pyaudio.PyAudio()
self.stream = self.pa.open(rate=self.sample_rate,
channels=self.num_channels,
format=pyaudio.paFloat32, input=True,
frames_per_buffer=self.hop_size,
stream_callback=self.callback, start=True)
# create a buffer
self.buffer = BufferProcessor(self.frame_size)
# frame index counter
self.frame_idx = 0
# PyAudio flags
self.paComplete = pyaudio.paComplete
self.paContinue = pyaudio.paContinue
def __iter__(self):
return self
def __next__(self):
return self.queue.get()
next = __next__
def callback(self, data, *args):
"""
Callback function called by PyAudio every time a new audio chunk is
ready.
Parameters
----------
data : str
Audio data represented as a string.
Returns
-------
data : numpy array
Audio data as numpy array.
flag : PortAudio callback flag
Indicate whether to continue the stream.
"""
# get the data from the stream
data = np.fromstring(data, 'float32').astype(self.dtype, copy=False)
num_channels = 1
# multiple channels are interleaved, reshape
if self.num_channels > 1:
np.reshape(data, (-1, self.num_channels))
num_channels = self.num_channels
# extract the wanted channel
if self.channel is not None:
data = data[:, self.channel]
num_channels = 1
# buffer the data
data = self.buffer(data)
# wrap it as a Signal (including the start position)
# TODO: check float/int hop size
start = (self.frame_idx * float(self.hop_size) / self.sample_rate)
signal = Signal(data, sample_rate=self.sample_rate, dtype=self.dtype,
num_channels=num_channels, start=start)
# if the queue if full erase the first item from the queue
if self.queue.qsize() >= self.queue_size:
# TODO: how to warn if we missed a frame? Raise an error,
# or is a simple warning enough? Maybe a counter...
import warnings
warnings.warn('dropping frame...')
self.queue.get()
# put the signal into the queue
self.queue.put(signal, block=False)
# increment the frame index
self.frame_idx += 1
# return data and indicate that the PyAudio stream should continue
# TODO: returning the data is pointless, since this callback is only
# used for recording, not for playback
return data, self.paContinue
def is_running(self):
return self.stream.is_active()
def close(self):
self.stream.close()
# TODO: is this the correct place to terminate PyAudio?
self.pa.terminate() I can also add this if you prefer me to do that. |
I totally agree, but I'd prefer to have it consistent throughout the codebase. |
Your comment above made me rethink the whole channel selection mechanism. How about deprecating
Whenever a single channel is selected, The last option is not of great usage as of now, since STFT can only operate on mono signals (see #371), but it would be nice to be able to have it this way in the future. |
Interesting, I've not used |
AIUI these are slightly convention-driven
Actually I thought about something like this, but started to scope creep way out of the original #211. But whilst the discussion is going I guess, some thoughts I'd had 😄 ... If you have multiple channels to select, you eventually need to start mapping source to destination (e.g. downmixing 5.1, ffmpeg itself can do some crazy stuff here) but it's ugly and perhaps other libraries do it better. But if we concentrated on the simple cases, yes... Explicit channels
Mono downmix
Hmm yes |
Ok, I'd vote to keep it simple and stupid for now. No fancy channel selection mechanisms, no multi-channel support or whatsoever. Literals can be added in a separate PR, as can be fancy downmixing options (i.e. which channels to select for downmixing). But with the design as it is now, all of these options are possible in future without breaking anything. |
* Update docstrings to indicate behaviour when omitting `num_channels` * Make ffmpeg versions support this same behaviour
@superbock Think that's everything... |
Very nice work, thanks again! |
Changes proposed in this pull request
remix
to choose a channel, and just copy results from that channel if requiredffmpeg
version - usepan
audio filter (see section in manual).This fixes #211.