Category Archives: Machine Learning

Using Weka’s Data-Mining and Machine-Learning Java API algorithms from within C#

Sometimes you don't really want to implement ML algorithms from scratch; you just want to use them! Unfortunately, not every algorithm is available in every language. For example, you can find an Information Gain / ID3 implementation written in C#, but you can't use it with numeric features in a multi-class problem. The next best thing is to use Weka's IG, which is written in Java (like all of Weka). If you dig deep enough in the Weka wiki, you will find that it is possible to use weka.jar from within C#.
I have recently updated the wiki to reflect the proper workflow, and I managed to use a meta-classifier with an SVM (SMO), using IG as the attribute evaluator and Ranker as the search method, in a 10-fold cross-validation scenario.

Details are here: http://weka.wikispaces.com/IKVM+with+Weka+tutorial
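To make concrete what an IG evaluator combined with a Ranker search computes, here is a minimal Python sketch. It is not Weka's code: each feature is scored on its own by information gain, IG = H(class) - H(class | feature), and the features are then sorted by that score. It assumes the feature values are already discrete; numeric features would have to be binned first, which this sketch does not do. The function names and toy data are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG = H(class) - H(class | feature) for one discrete feature."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)  # labels of samples sharing this value
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns: Dict[str, Sequence[Hashable]],
                  labels: Sequence[Hashable]) -> List[Tuple[str, float]]:
    """Ranker-style search: score each feature independently, then sort by IG, best first."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    labels = ["L", "L", "R", "R", "F", "F"]       # a three-class problem
    columns = {
        "noise": [0, 1, 0, 1, 0, 1],              # says nothing about the class
        "informative": [0, 0, 1, 1, 2, 2],        # determines the class exactly
    }
    for name, score in rank_features(columns, labels):
        print(f"{name}: {score:.3f}")
    # informative: 1.585
    # noise: 0.000
```

A Ranker only orders the features; deciding how many of the top-ranked ones to keep is a separate threshold or count you choose.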

C# support for the Shogun Toolbox (SVM)

Part of my brain research and machine learning work involves using SVMs for training and/or testing in real time. Given the huge amount of temporal brain data that we acquire in real time, we need a large-scale SVM implementation. A typical SVM implementation can handle thousands of samples with thousands of features, and a cutting-edge PC may let you scale that up roughly tenfold. Shogun can process far more data: the Shogun team reports training on up to 10 million samples and testing on up to 7 billion samples.

A few months ago Shogun did not support C#. Daniel Korn and I made some progress on an interface file that would make Shogun work with C#. Halfway through the process, one of the Shogun participants in Google Summer of Code took over the interface programming and finished it. The new release includes a C# DLL that anyone can use, with support for many data types. I still need to find out whether it supports sparse lists/dictionaries, which would keep the memory footprint small enough for any algorithm. The new DLL also comes with many C# example files that we created to test the new interface.
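Why sparse lists/dictionaries matter for memory can be shown in a few lines of plain Python. This is a minimal sketch assuming a dictionary-of-non-zeros representation; it is not Shogun's C# API, and the helper names are hypothetical. A sample with a handful of non-zero features out of many stores only those entries, and a linear classifier's decision value w·x + b only needs the indices the two vectors share.

```python
from typing import Dict, Sequence

# A sparse vector: feature index -> non-zero value. Memory grows with the
# number of non-zero entries, not with the total number of features.
SparseVector = Dict[int, float]


def to_sparse(dense: Sequence[float]) -> SparseVector:
    """Keep only the non-zero entries of a dense vector."""
    return {i: v for i, v in enumerate(dense) if v != 0.0}


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Dot product that only touches indices present in both vectors."""
    if len(a) > len(b):
        a, b = b, a  # loop over the smaller dictionary
    return sum(v * b[i] for i, v in a.items() if i in b)


def linear_score(w: SparseVector, bias: float, x: SparseVector) -> float:
    """Decision value w.x + bias of a linear classifier; its sign is the predicted class."""
    return sparse_dot(w, x) + bias


if __name__ == "__main__":
    x = to_sparse([0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 4.0, 0.0, 0.0])
    w = {0: 1.0, 5: 2.0, 9: -1.0}
    print(x)                         # {5: 3.0, 7: 4.0}
    print(linear_score(w, -1.0, x))  # 5.0
```

The same idea underlies the svmLight model format further down this page, where each support vector is written as index:value pairs for its non-zero features only.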

Shogun 1.1.0 can be downloaded from the Shogun website.

Our credits appear in the NEWS section of the release and on the project's front page.

Features:
  • New dimensionality reduction algorithms: Diffusion Maps, Kernel Locally Linear Embedding, Kernel Local Tangent Space Alignment, Linear Local Tangent Space Alignment, Neighborhood Preserving Embedding, Locality Preserving Projections.
  • Various performance improvements for dimensionality reduction methods (BLAS, alignment formulation of the LLE, ...)
  • Automatic k determination mode for the Locally Linear Embedding dimension reduction method, based on reconstruction error.
  • ARPACK and SUPERLU integration.
  • Introduce the concept of Converters that can embed (arbitrary) feature types into different feature types.
  • LibSVM is now pthread-parallelized.
  • Create modshogun.dll for csharp.
  • Various new c# examples (thanks Daniel Korn and Ori Cohen).
  • Dimensionality reduction examples application is introduced.

Controlling an avatar by thought using fMRI

Cohen O. 1,3
Mendelsohn A. 2
Drai D. 1
Malach R. 2
Friedman D. 1

1 The Interdisciplinary Center Herzliya
2 Weizmann Institute of Science
3 Bar-Ilan University

We are carrying out a number of studies in which subjects control the movement of an avatar by thought alone, using real-time fMRI. In the first stage of our study, subjects were able to control the right- and left-hand movements of an avatar using motor imagery, without difficulty and with high accuracy. Building on this, we are now running a study with three classes: right-hand imagery to turn right, left-hand imagery to turn left, and feet imagery to move forward.
In the first part of the experiment we instruct the subject to imagine left-hand, right-hand, or walking movements upon a predefined cue, and we manually define regions of interest (ROIs) with a GLM analysis, by contrasting hand-motor and primary leg-motor regions. In the second phase (baseline) we compute the mean and standard deviation of each ROI's signal. Finally, the subjects are instructed to move the avatar by motor imagery, according to auditory cues. At this preliminary stage we use a simple classification scheme: at each TR (repetition time, i.e. each newly acquired fMRI volume) the system calculates the z-score of each ROI relative to the mean and standard deviation obtained during the baseline, and chooses the ROI with the highest z-score. The classification result is fed into the virtual reality system, which moves the avatar accordingly.
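The per-TR classification scheme described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the system's actual code: each ROI is assumed to deliver one averaged value per TR, the ROI names and the ROI-to-command mapping are hypothetical, and the baseline spread is the sample standard deviation.

```python
import statistics
from typing import Dict, Sequence, Tuple

# Hypothetical ROI names: each ROI is labeled by the imagery it responds to,
# mapped to the avatar command used in the three-class study.
COMMANDS = {"right_hand": "turn right", "left_hand": "turn left", "feet": "move forward"}


def baseline_stats(baseline: Dict[str, Sequence[float]]) -> Dict[str, Tuple[float, float]]:
    """Mean and sample standard deviation of each ROI's signal during the baseline phase."""
    return {roi: (statistics.mean(values), statistics.stdev(values))
            for roi, values in baseline.items()}


def classify_tr(sample: Dict[str, float], stats: Dict[str, Tuple[float, float]]) -> str:
    """Z-score each ROI's value at the current TR against its baseline; return the ROI with the highest z."""
    z_scores = {roi: (sample[roi] - mean) / sd for roi, (mean, sd) in stats.items()}
    return max(z_scores, key=z_scores.get)


if __name__ == "__main__":
    stats = baseline_stats({
        "right_hand": [200.0, 204.0, 196.0, 200.0],
        "left_hand": [100.0, 102.0, 98.0, 100.0],
        "feet": [50.0, 51.0, 49.0, 50.0],
    })
    roi = classify_tr({"right_hand": 210.0, "left_hand": 101.0, "feet": 50.5}, stats)
    print(roi, "->", COMMANDS[roi])  # right_hand -> turn right
```

Normalizing by each ROI's own baseline makes ROIs with different raw signal levels comparable, which is why the raw values themselves are never compared directly.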
Our preliminary results indicate that subjects can learn to control an avatar using motor imagery at better-than-chance levels with very little training. Future work will include real-time support vector machine (SVM) classifiers combined with feature-reduction methods. Eventually, we aim to allow subjects to perform simple tasks in a virtual environment.

svmLight: a Python Script that Computes the Weight Vector of a Linear SVM from the Model File

While working on my thesis I had to extract the feature weights from an SVM model. Thorsten Joachims had published a Perl script for this, but since I was using Python, I rewrote his script in Python, and he graciously put a download link to it on his website.

You can find the original Perl script here: http://www.cs.cornell.edu/people/tj/svm_light/svm_light_faq.html
and the Python script here: http://www.cs.cornell.edu/people/tj/svm_light/svm2weight.py.txt
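For reference, this is the quantity the script accumulates. With a linear kernel, the SVM weight vector is a weighted sum of the support vectors. In an svmLight model file, each support-vector line after the "threshold b" header line starts with the product alpha_i * y_i, followed by that vector's index:value pairs, so every weight is:

```latex
w_j = \sum_{i \in \mathrm{SV}} (\alpha_i y_i)\, x_{ij}
```

Here x_ij is the value of feature j in support vector i; a feature missing from a line is zero and adds nothing to w_j.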

This script gives you the weights of all features, which becomes very useful later on: you can use them to eliminate features systematically, as follows:
  • After training on all current features, keep the K% of features with the highest SVM weights and the K% with the lowest (most negative) SVM weights, and discard the rest
  • Retrain on the kept features and iterate

You may find that a subset of your features gives better prediction results than the full set.
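The elimination steps above can be sketched in Python as follows. This is a minimal sketch, not part of the original script: `train` is a hypothetical callable standing in for "train a linear SVM on these features and return their weights" (for example by running svm_learn and then the script above), and the loop simply shrinks the feature set until nothing more is dropped. In practice you would also measure accuracy (e.g. with cross-validation) at each round and keep the best subset.

```python
from typing import Callable, Dict, List, Sequence

Weights = Dict[int, float]  # feature index -> SVM weight


def select_extreme_features(weights: Weights, k_percent: float) -> List[int]:
    """Keep the K% of features with the highest weights and the K% with the lowest (most negative)."""
    ranked = sorted(weights, key=weights.get)           # ascending by weight
    n = max(1, int(len(ranked) * k_percent / 100.0))   # how many to keep at each end
    return sorted(set(ranked[:n]) | set(ranked[-n:]))   # union also handles overlap when K% > 50


def eliminate_features(features: Sequence[int],
                       train: Callable[[List[int]], Weights],
                       k_percent: float,
                       max_rounds: int) -> List[int]:
    """Retrain on the surviving features and repeat until no feature is dropped."""
    current = sorted(features)
    for _ in range(max_rounds):
        weights = train(current)  # hypothetical: trains a linear SVM, returns its weights
        selected = select_extreme_features(weights, k_percent)
        if selected == current:   # nothing was dropped: stop early
            break
        current = selected
    return current


if __name__ == "__main__":
    # Stand-in for real training: fixed weights restricted to the current features.
    fake = {1: 0.5, 2: -0.9, 3: 0.1, 4: 0.02, 5: -0.3, 6: 0.8, 7: 0.0, 8: 0.01, 9: -0.05, 10: 0.2}
    print(eliminate_features(list(fake), lambda feats: {f: fake[f] for f in feats}, 20, 5))  # [2, 6]
```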

* If you use this script in a publication or commercial product, please credit me.

# Compute the weight vector of linear SVM based on the model file
# Original Perl Author: Thorsten Joachims (thorsten@joachims.org)
# Python Version: Ori Cohen (orioric@gmail.com)
# Call: python svm2weight.py svm_model
# (Python 3; each support-vector line holds alpha*y followed by index:value pairs)

import sys
from operator import itemgetter


def sortbyvalue(d, reverse=True):
    '''Sort a dictionary's items by value (PEP 265 idiom); largest first by default.'''
    return sorted(d.items(), key=itemgetter(1), reverse=reverse)


def sortbykey(d, reverse=False):
    '''Sort a dictionary's items by key (PEP 265 idiom); ascending by default.'''
    return sorted(d.items(), key=itemgetter(0), reverse=reverse)


def get_file():
    """
    Tries to extract a filename from the command line.  If none is present, it
    assumes the file to be svm_model (default svmLight output).  If the file
    exists, it returns it, otherwise it prints an error message and ends
    execution.
    """
    if len(sys.argv) < 2:
        # Assume the file is svm_model (default svmLight output)
        print("Assuming file as svm_model")
        filename = 'svm_model'
    else:
        filename = sys.argv[1]

    try:
        return open(filename, "r")
    except IOError:
        print("Error: The file '%s' was not found on this system." % filename)
        sys.exit(1)


if __name__ == "__main__":
    w = {}              # feature index -> accumulated weight
    printOutput = True
    with get_file() as f:
        for i, line in enumerate(f):
            if i == 1:
                # Header line 1 holds the kernel type; 0 means linear
                if int(line.split('#')[0]) != 0:
                    print('Not linear Kernel!')
                    printOutput = False
                    break
            elif i == 10:
                # Header line 10 holds the threshold b; support vectors follow it
                if line.find('threshold b') == -1:
                    print("Parsing error!")
                    printOutput = False
                    break
            elif i > 10:
                # Support-vector line: "<alpha*y> <index>:<value> ... #<comment>"
                features = line.split('#')[0].split()
                if not features:
                    continue
                alpha = float(features[0])
                for p in features[1:]:
                    a, v = p.split(':')
                    # w[a] = sum over support vectors of (alpha*y) * value
                    w[int(a)] = w.get(int(a), 0.0) + alpha * float(v)

    # If you need to sort the features by value and not by feature ID, use this line instead:
    # ws = sortbyvalue(w)
    ws = sortbykey(w)
    if printOutput:
        for (feature, weight) in ws:
            print(feature, ':', weight)