24.7.09

Unscented Kalman Filter for Brain-Machine Interfaces

Brain-machine interfaces (BMIs) have applications that span multiple fields. Accurately decoding the firing of neurons into specific movements of an artificial actuator is the ultimate translation job. An efficient, accurate, and reliable BMI relies on the seamless integration of multiple steps, from optimized neuron-probe interaction to an actuator that can transform the recorded signals into a physiologically meaningful movement. The first example of a BMI that leapt to my mind is the work on neurally controlled robotic limb prostheses, such as those developed by Professor Yoky Matsuoka of the University of Washington.

figure 1: Prof. Matsuoka demonstrating one of her robotic hands

However, the article I will be discussing today focuses on a different aspect of BMIs: developing a computational model to decode neuronal activity. Recovering a signal with the highest possible signal-to-noise ratio (SNR) is a common dynamic systems and controls problem, so one might assume that a variety of relatively straightforward filters could be used to 'clean up' the signal, so to speak. However, the challenge with some BMIs is that you not only want the highest SNR possible, but also need minimal computation time, which is crucial to real-time actuation of any device. A robotic prosthesis may have the most accurate recreation of brain activity to work from, but it is next to useless if the computation time is infeasible. My e-journal club is shown below:

Introduction

Unscented Kalman Filter for Brain-Machine Interfaces by Z. Li et al. (PLoS ONE, 2009) attempts to develop a model that packs the highest SNR per computational punch. The researchers compare the unscented Kalman filter, a quadratic model with history state variables, to the standard Kalman and Wiener filters. They tested filter performance in BMI environments controlled by rhesus monkeys. Performance was assessed by comparing the SNR and the correlation coefficient between the position of the BMI-controlled cursor and the target position.
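For anyone who wants to see what those two scores look like in practice, here is a minimal Python sketch. The function names, the made-up position values, and the SNR definition (variance of the reference position over the mean squared error, in decibels) are my own assumptions for illustration, so check the paper for the exact definitions it uses.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def snr_db(reference, estimate):
    """SNR in dB, assumed here to be var(reference) / mean squared error."""
    n = len(reference)
    mean_ref = sum(reference) / n
    var_ref = sum((r - mean_ref) ** 2 for r in reference) / n
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / n
    return 10.0 * math.log10(var_ref / mse)


# Hypothetical numbers: where the cursor should be vs. where a decoder put it.
target = [0.0, 1.0, 2.0, 3.0, 4.0]
decoded = [0.1, 0.9, 2.2, 2.8, 4.1]
print("correlation:", pearson_r(target, decoded))
print("SNR (dB):", snr_db(target, decoded))
```

Both numbers go up as the decoded trace hugs the target more closely, but they are not interchangeable: correlation ignores a constant offset or scale error, while this SNR definition penalizes both.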

Definitions

The two standard filters used for comparison are, once again, the Kalman and Wiener filters. Kalman filters are recursive filters that estimate the state of a linear dynamic system. Using information from the n-1 state, a Kalman filter predicts the n state and then corrects that prediction with the newest measurement. The advantage is that it is possible to predict a future state of the system without knowing much about it. This, however, might not be the optimal prediction. For the statisticians out there, applying a Kalman filter to a dynamic system is similar to applying least squares to a regression. Since it is a linear model that needs so little information, this filter is relatively fast.
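To make the recursion concrete, here is a minimal sketch of a scalar (one-variable) Kalman filter in Python. The model parameters a, q, h, r and the sample measurements are illustrative assumptions, not anything from the paper; real BMI decoders use matrix versions of the same two steps.

```python
def kalman_1d(measurements, a=1.0, q=0.01, h=1.0, r=1.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter (all parameter values are illustrative).

    Assumed model:
        x[n] = a * x[n-1] + process noise with variance q
        z[n] = h * x[n]   + measurement noise with variance r
    """
    x, p = x0, p0  # current state estimate and its variance
    estimates = []
    for z in measurements:
        # Predict: push the n-1 estimate forward through the linear model.
        x_pred = a * x
        p_pred = a * p * a + q
        # Update: blend the prediction with the new measurement.
        gain = p_pred * h / (h * p_pred * h + r)
        x = x_pred + gain * (z - h * x_pred)
        p = (1.0 - gain * h) * p_pred
        estimates.append(x)
    return estimates


# Hypothetical noisy readings of a quantity that is really about 1.0.
noisy = [1.2, 0.8, 1.1, 0.9, 1.3, 0.7, 1.0, 1.1]
print(kalman_1d(noisy))
```

Each step only needs the previous estimate and its variance, which is why the filter is cheap to run in real time.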

Wiener filters can also be used on linear dynamic systems, like Kalman filters. However, a Wiener filter actually requires some understanding of the target signal and noise. Using a Wiener filter is analogous to choosing a best-fit statistical curve for data you already know.
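As far as I understand it, in BMI work a Wiener filter is usually fit as a linear regression from recent neural activity to movement. Here is a toy sketch of that idea with a single input and only two taps (the current and the previous sample). The function name, the two-tap choice, and the numbers are my assumptions for illustration; a real decoder would use many neurons and many more taps.

```python
def fit_two_tap_filter(x, y):
    """Least-squares weights (w0, w1) for y[t] ~ w0 * x[t] + w1 * x[t-1]."""
    # Accumulate the 2x2 normal equations over t = 1 .. len(x) - 1.
    s00 = s01 = s11 = b0 = b1 = 0.0
    for t in range(1, len(x)):
        cur, prev = x[t], x[t - 1]
        s00 += cur * cur
        s01 += cur * prev
        s11 += prev * prev
        b0 += cur * y[t]
        b1 += prev * y[t]
    # Solve [[s00, s01], [s01, s11]] @ [w0, w1] = [b0, b1] by Cramer's rule.
    det = s00 * s11 - s01 * s01
    w0 = (b0 * s11 - s01 * b1) / det
    w1 = (s00 * b1 - s01 * b0) / det
    return w0, w1


# Hypothetical training data: a firing rate and the movement it "causes".
rate = [3.0, 5.0, 2.0, 8.0, 6.0, 1.0, 7.0, 4.0]
move = [0.5 * rate[0]] + [
    0.5 * rate[t] + 0.25 * rate[t - 1] for t in range(1, len(rate))
]
w0, w1 = fit_two_tap_filter(rate, move)
print(w0, w1)
```

The point to notice is that the weights are computed from data where both the input and the desired output are already known, which is the "data you already know" part.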

The 10th order unscented Kalman filter is a quadratic model, unlike the Wiener and Kalman filters. It also has a 'history' of past states that helps influence its performance. This history is larger than a 1st order Kalman filter's reliance on the n-1 state only. However, this necessitates 'training' of the model prior to use to achieve optimal performance.
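The 'unscented' part refers to the unscented transform: instead of linearizing a nonlinear function, you push a handful of carefully chosen 'sigma points' through it and recompute the mean and variance from the results. Below is a minimal one-dimensional sketch in Python with a quadratic function. The function names, the kappa value, and the input numbers are my assumptions, and this is only the core transform, not the paper's 10th order filter.

```python
import math


def unscented_transform_1d(mean, var, f, kappa=2.0):
    """Propagate a scalar mean and variance through f using 3 sigma points."""
    n = 1  # state dimension
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    ys = [f(p) for p in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


def quadratic(x):
    """A stand-in nonlinear (quadratic) model."""
    return x * x


ut_mean, ut_var = unscented_transform_1d(3.0, 0.5, quadratic)
print("plug-in estimate f(mean):", quadratic(3.0))
print("unscented mean, variance:", ut_mean, ut_var)
```

For a Gaussian input, the true mean of x squared is mean squared plus variance, so simply plugging the mean into the function underestimates it. The sigma points pick up that extra term, which is why this approach suits a quadratic model.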

Methods

Li et al. trained two rhesus macaques to use hand-held joysticks to track visual targets with a computer cursor. The monkeys were implanted with multielectrode arrays in several cortical areas, which were not identical between the two animals. Monkey C's implants were in M1, PMd, PP, and SMA in the right hemisphere. Monkey G's implants were bilateral, in M1, primary somatosensory cortex (S1), and PMd. The average number of neurons recorded was 142 (range: 146).

figure 2: Most of the brain regions used in this study

The unscented Kalman filter was applied to data collected from the monkeys over 16 sessions, ranging from 9 to 25 minutes. During these sessions, the monkeys manually moved the joystick to shift an onscreen cursor until they hit a target position. There were also six online, closed-loop BMI experiments with the same monkeys. The monkeys were required to hold the joystick as a sign of active participation during these experiments; data recorded during any other behavior was discarded. Monkeys C and G received fruit-juice snacks for their participation.

Results

The 10th order unscented Kalman filter demonstrated a statistically significant improvement in performance over the Wiener and Kalman filters, as well as other methods. The 10th order filter outperformed both when applied to previously recorded data and during the live BMI sessions. The SNR and the correlation coefficient between cursor and target position were higher on average for the proposed filter than for the Wiener and Kalman filters. The differences were more pronounced during online performance than during offline reconstruction.

Discussion

I found this paper to be really interesting, and challenging. Unlike a Boy Scout, I was not prepared: not a single one of my dynamic systems and control books was to be found. They're probably on the 'used' shelf back in university going 'haha sucker, you probably wish you hadn't sold us back now!!!!' However, I was a little surprised by the comparisons they used. While the 1st order Kalman and Wiener filters may be 'standard', I didn't find it in any way surprising that a quadratic model with memory states would outperform them for such a complex task. I would have liked to see a comparison between the 10th order unscented Kalman filter and another quadratic model, or at least something similar. I didn't see a clear description of the computational cost of this method, or of how that would increase with the complexity of the task. A BMI capable of 'catching' a target is obviously valuable for someone who is paraplegic; this can open an entire world to them via a computer screen. However, one would hope to use BMIs for intricate physical tasks, such as operating a robotic prosthesis. I imagine the amount of training required for the model 'history' in order to have passable execution of even mundane daily (physical) tasks would be enormous. I also wondered how well such a model can handle a sharp change in a 'periodic' behavior, which is not unusual in a real-life context. Unfortunately, I did not have time to really go through the modeling equations to answer this convincingly for myself, so it remains an open question.


This is my submission for the title of Most Nefarious against Toaster. It is late, but I pinky swear that yesterday I had literally no time to read, let alone analyze, this paper. I have also not looked at Toaster's post and I'm kind of scared what his brilliant Nordic brain has come up with. Shit.

You have 72 hr (actually less than that, whatever) to vote between us. You have 100 pts to divvy up however you like (as long as it adds up!). Get crackin'!

**Point allocation clarification: Post the points you are awarding to each blogger on their respective post. Ex: If you had decided I clearly defeated Toaster (obviously), you would post 100 pts in my comments and 0 pts in his. We won't come after you if you don't vote our way. Probably.

figure 3: Everyone knows a pinky-swear is serious business if it's a Mad Scientist

17 comments:

Ambivalent Academic said...

Where do we vote?

Hermitage said...

@AA: In our comments. If you had decided to split your points 50/50 you would post 50pts in my comments and 50pts in his. Ezy pzy!

Ambivalent Academic said...

65

Cath@VWXYNot? said...

Cross-posted and modified from my comment on Toaster's blog:

Hmm, this is really tough because the two summaries are so different. Toaster's is more of a lay summary while yours is more technical and would possibly be more useful to someone who is actually in the field. (Is anyone else giggling at the Wiener Filter? Just me?) In the context of a journal club attended by people in similar fields, I think your approach might be more appropriate, BUT I have absolutely no background in this area and understood Toaster's summary better.

30

Sorry :(

Toaster said...

I have also not looked at Toaster's post and I'm kind of scared what his brilliant Nordic brain has come up with. Shit.



Mwah-ha-ha-ha-ha-HA-HA-HAHAHA!!! Toaster appreciates the compliment.

I'll quibble with you over the paper once voting is closed.

Toaster Sunshine said...

Also, why didn't you Research Blog your post?

Hermitage said...

@Toaster: What is this Research Blogs shenanigans of which you speak?

Hermitage said...

@VWXYNot?: That's right! You should be scared *evil cackle*

Toaster Sunshine said...

Researchblogging.org aggregates blog posts that talk about original research from your RSS feed. It then lists them as they come in on their homepage, and a Twitter feed, to get those posts out to a wider audience than you might otherwise have. It's what that little green check mark is at the top of all my really sciencey posts, and it generates tracking code for the citations to see where what goes.

coturnix said...

You should apply for ResearchBlogging.org and, once accepted, republish this post with the icon code and citation code you get from their site. Once you do that, your post is more visible, it is considered more trustworthy than usual, and it is in the running for the PLoS ONE Blog Pick Of The Month award: http://everyone.plos.org/tag/blog-pick-of-the-month/

PhizzleDizzle said...

ok, so here i am. brain machine interfaces indeed!

so....i am almost afraid to vote the way i am going to vote because i imagine hermie's physical ass-kicking abilities exceed toaster's. and because chick on chick fighting is vicious :).

But, I am going with 40.

Because I found Toaster's storyline more readable, in the sense that in the end, I could summarize the paper in one sentence better than after this one. The point that I was missing (which I realized afterwards is in this entry but not spoon fed to me), is that yes, of course the quadratic model is going to be more accurate than linear models, but the reason the paper is contributory is because it bridges the simplicity of computation from the linear models and the complicated SSNR model while getting significantly better results than the linear model.

However, here is also an interesting experiment: I wonder if part of the reason I synthesized Toaster's post better was because I read it 2nd.

AA, Cath - whose did you read first? Interesting side study.

erk said...

70

mainly for the sensible critics at the end.

Cath@VWXYNot? said...

I read this one first, and then Toaster's. But I realised halfway through reading Toaster's post that maybe it seemed that I understood it more because I'd already read this one, and that that wasn't fair. So I came back and re-read this one, and then Toaster's, so I read each one twice.

Toaster Sunshine said...

@Phizzle:
Toaster does not kick ass wantonly, he only rains his fury down upon the heads of those who have truly earned it, and voting against him would not have been a sufficient aggravation for him to do so. Hermie, on the other hand, may be a wee bit less stable...ergo a comparison of our "ass-kicking capacities" would suffer from great disparities in sample sizes.

Zheng Li said...

Congratulations on the contest win and thank you for reading and writing about our work. I was delighted that people outside my field took an interest, and who doesn't like free publicity? Your discussion had so many good questions that I wanted to share some of my thoughts.

>However, I was a little surprised with the comparisons they used. While the 1st Order Kalman and Wiener filters may be 'standard', I didn't find it in any way surprising that a quadratic model with memory states would outperform them for such a complex task. I would have liked to see a comparison between the 10th order unscented Kalman filter and another quadratic model, or at something similar.

You are correct that the improvement was not surprising. I suppose the surprising part is that such models have not been used earlier for BMI prediction. There has been mention of similar quadratic models in the literature, but only for modeling data, not for prediction. We built upon that work and put it together with an efficient filter capable of non-linear operation for this paper.

I have compared, though anecdotally and unpublished, the quadratic model presented in this paper (with the square root term) with a quadratic model that consists of each feature, its square, and all pairs of products (I'll call it the vanilla model). The vanilla model was slightly more accurate, though the difference may or may not be significant. We chose to publish with the model with square root terms because it has a firm basis in prior work in the field. I also looked at 3rd order models (similar to the 2nd order vanilla model in form). Again, anecdotally, they offer another slight improvement, but the advantage seems to disappear when combined with using history in the model. In other words, the higher order terms may be capturing some of the same information as the history. 4th order and higher models did not seem to improve at all. Of course, take this all as non-rigorous commentary.

>I didn't see a clear description of computational cost for this method, and how that would increase with complexity of the task.

At the end of the results section is a brief report of the actual execution speeds of our MATLAB and C++ implementations. The discussion of the computational complexity of the filters did not make the final edit, due to the journal's probable audience. The two most time-consuming operations in the UKF are the computation of the sigma points and the inversion of the predicted observation covariance matrix, both of which are performed using Cholesky decomposition (in our implementation). The standard Cholesky algorithms are O(n^3), where n is the size of the state space (history taps times feature dimensions) in the former and the number of observations (neurons) in the latter. In our data, the latter was typically larger, up to 240. Though, for a realistic robotic arm with something like 20 degrees of freedom and, say, 10 history taps, the n for the state space is comparable.

With regards to computational increase due to the complexity of the task, my guess is that (within the UKF framework presented in this paper) a more complex task would require more history to model, and so a filter with a larger state would perform better, increasing the cost of the sigma point calculation. However, if we believe the neural tuning model's maximum time offset is independent of the behavioral task, a modification can be made to divorce the state space needed for the transition (movement) model with the state space used by the observation (neural tuning) model, avoiding this increase.

Zheng Li said...

>A BMI capable to 'catching' a target is obviously valuable for someone who is paraplegic, this can open an entire world to them via computer screen. However, one would hope to use BMIs for intricate physical tasks, such as operating a robotic prosthesis. I imagine the amount of training required for the model 'history' in order to have passable execution of even mundane daily (physical) tasks would be enormous. I also wondered how well such a model can handle a sharp change in a 'periodic' behavior, which is not unusual in a real-life context.

A multi-joint prosthesis would require many more dimensions in the state space. The modeling of the movement would also need to be far more sophisticated. This paper describes how to deal with one stereotypical task. In the future, I hope to solve the problem of learning and switching between options in a library of such tasks, giving the prosthesis user a repertoire of practiced actions, not unlike the "muscle memory" people have with their biological limbs. This switching between stereotypical tasks is one example of a sharp change in a periodic behavior. More generally, the BMI can fall back on a "generic" model of movement if it can’t understand what the user is trying to do.

I'm also interested in, and working on, ways to minimize training on the part of the prosthesis user, by moving that training burden to the algorithms. Hopefully we can avoid lengthy training to obtain passable execution. Another approach is to off-load some of the control of the prosthesis to an artificial intelligence and have the user issue high-level commands, like “pick up the cup”.

Cheers

linear actuator said...

Actuators are really needed for complex tasks in the high-power market. If the load requires accurate positioning, high-power actuators have the advantage among others...