In [8]:
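import glob
import os

# IPython.core.display is deprecated for these names; import them once from IPython.display.
from IPython.display import HTML, display
from nussl import jupyter_utils, AudioSignal
from wand.image import Image as WImage

# Narrow the notebook container so figures and audio players line up.
display(HTML("<style>.container { width:65% !important; font-size:1em;}</style>"))

def _embed_audio(path):
    """Load the audio file at `path` and embed a player in the notebook."""
    a = AudioSignal(path)
    jupyter_utils.embed_audio(a, display=True)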

HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')
Out[8]:

Figure 4.3

Performance of the best bootstrapped network ($\alpha=2$) versus the direction of arrival primitive used to train it, evaluated on the test set. Each dot is one mixture in the test set. Points on the red line perform equally under both approaches. Points to the right of the line are mixtures where the primitive out-performed the bootstrapped network; points to the left are mixtures where the bootstrapped network out-performed the primitive.
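The reading of the scatter plot can also be summarized numerically by counting how many mixtures fall on each side of the diagonal. The following is a minimal sketch, assuming the per-mixture scores for both approaches are available as two equal-length arrays; the function name `compare_to_diagonal` and its arguments are hypothetical and not part of this notebook or of nussl.

```python
import numpy as np


def compare_to_diagonal(primitive_scores, bootstrap_scores):
    """Count mixtures on each side of the equal-performance line.

    Hypothetical helper: both inputs hold one score per test mixture,
    in the same order (primitive on the x-axis, bootstrapped on the y-axis).
    """
    primitive_scores = np.asarray(primitive_scores, dtype=float)
    bootstrap_scores = np.asarray(bootstrap_scores, dtype=float)
    if primitive_scores.shape != bootstrap_scores.shape:
        raise ValueError("Both score arrays must have the same shape.")

    # Positive difference: the point lies above/left of the diagonal.
    diff = bootstrap_scores - primitive_scores
    return {
        "bootstrap_better": int(np.sum(diff > 0)),
        "primitive_better": int(np.sum(diff < 0)),
        "equal": int(np.sum(diff == 0)),
    }
```

Calling it with the two score arrays returns a dictionary with the three counts, which can be reported alongside the figure.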

In [6]:
img = WImage(filename='../bootstrap_vs_primitive_doa.png')
img.transform(resize='x500')
img
Out[6]:

Below is an example of the bootstrapped model, the direction of arrival primitive, and a model trained from ground truth, all run on the same mixture.

In [13]:
print('Mixture')
_embed_audio('bootstrap_doa/mix.mp3')
print('\n')

print('Separation via direction of arrival')
_embed_audio('bootstrap_doa/spcl_0.mp3')
_embed_audio('bootstrap_doa/spcl_1.mp3')
print('\n')

print('Separation via bootstrapped model')
_embed_audio('bootstrap_doa/bs_dpcl_0.mp3')
_embed_audio('bootstrap_doa/bs_dpcl_1.mp3')
print('\n')

print('Separation via model trained from ground truth')
_embed_audio('bootstrap_doa/gt_dpcl_1.mp3')
_embed_audio('bootstrap_doa/gt_dpcl_0.mp3')
print('\n')
Mixture

Separation via direction of arrival

Separation via bootstrapped model

Separation via model trained from ground truth

Figure 4.7

Performance of bootstrapping compared to other methods. The bootstrapped model significantly out-performs primitive clustering, which was used to train it, but falls short of the model trained from ground truth in terms of SDR and SIR.
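For reference, SDR (source-to-distortion ratio) and SIR (source-to-interference ratio) are commonly defined as in the BSS Eval framework. The definitions below assume that framework; the caption does not state which variant of the metrics was used, so treat this as a reminder of the usual form rather than the exact evaluation performed here. An estimate $\hat{s}$ is decomposed into a target component and interference, noise, and artifact error terms.

```latex
\hat{s} = s_{\text{target}} + e_{\text{interf}} + e_{\text{noise}} + e_{\text{artif}}

\mathrm{SDR} = 10 \log_{10}
  \frac{\lVert s_{\text{target}} \rVert^2}
       {\lVert e_{\text{interf}} + e_{\text{noise}} + e_{\text{artif}} \rVert^2}
\qquad
\mathrm{SIR} = 10 \log_{10}
  \frac{\lVert s_{\text{target}} \rVert^2}
       {\lVert e_{\text{interf}} \rVert^2}
```

Under these definitions, higher values are better for both metrics: SDR penalizes all error in the estimate, while SIR penalizes only leakage from the other sources.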

In [14]:
WImage(filename='../bootstrap_pcl_comparison.pdf')
Out[14]:

Below is an example separation of the same mixture by primitive clustering, by the bootstrapped model that was trained from it, and by a model trained from ground truth. All separations use binary masking, which makes the effect of separation easier to hear and the methods easier to compare.
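Binary masking assigns each time-frequency bin of the mixture entirely to one source. The following is a minimal NumPy sketch of that idea, assuming each method yields a soft mask per source with shape `(n_sources, n_freq, n_time)`; the function names are hypothetical, and this is not the code used to produce the audio examples.

```python
import numpy as np


def binary_masks_from_soft(soft_masks):
    """Turn soft masks into binary masks by winner-take-all per bin.

    Assumption: soft_masks has shape (n_sources, n_freq, n_time).
    Each time-frequency bin is assigned to the source whose soft
    mask value is largest there.
    """
    soft_masks = np.asarray(soft_masks, dtype=float)
    winner = np.argmax(soft_masks, axis=0)
    n_sources = soft_masks.shape[0]
    return np.stack([winner == k for k in range(n_sources)])


def apply_mask(mixture_stft, mask):
    """Keep only the mixture bins selected by a (boolean) mask."""
    return np.asarray(mixture_stft) * mask
```

Because every bin goes to exactly one source, the binary masks sum to one at each bin, so the masked estimates add back up to the mixture spectrogram.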

In [15]:
print('Mixture')
_embed_audio('bootstrap_pcl/mix.mp3')
print('\n')

print('Separation via primitive clustering (PCL)')
_embed_audio('bootstrap_pcl/pcl_bg.mp3')
_embed_audio('bootstrap_pcl/pcl_fg.mp3')
print('\n')

print('Separation via bootstrapped model (B:YT+TR)')
_embed_audio('bootstrap_pcl/bs_dpcl_bg.mp3')
_embed_audio('bootstrap_pcl/bs_dpcl_fg.mp3')
print('\n')

print('Separation via model trained from ground truth (GT)')
_embed_audio('bootstrap_pcl/gt_dpcl_bg.mp3')
_embed_audio('bootstrap_pcl/gt_dpcl_fg.mp3')
print('\n')
Mixture

Separation via primitive clustering (PCL)

Separation via bootstrapped model (B:YT+TR)

Separation via model trained from ground truth (GT)