Part of my brain research and machine learning endeavors is using SVMs for training and/or testing in real time. Given the huge amount of temporal brain data that we acquire in real time, there is a need for a large-scale SVM algorithm. Usually an SVM can handle thousands of samples with thousands of features, and with a cutting-edge PC you may be able to scale that up about tenfold. But using Shogun you can process more data: the Shogun team reports up to 10 million training samples and up to 7 billion test samples.
A few months back, Shogun did not support C#. Daniel Korn and I made some progress on creating an interface file for Shogun that would make it work with C#. Halfway through the process, one of the Shogun participants in Google Summer of Code took over the interface programming and finished it. With this new release there is a C# DLL that everyone can use, and it supports many data types. I still need to figure out whether it supports sparse lists/dictionaries, so that the memory footprint will be small enough for any algorithm. Together with the new DLL, there are many C# example files that we created to test the new interface/DLL.
Our credits appear in the NEWS section of the release and on the front page.
Features:
New dimensionality reduction algorithms: Diffusion Maps, Kernel Locally Linear Embedding, Kernel Local Tangent Space Alignment, Linear Local Tangent Space Alignment, Neighborhood Preserving Embedding, Locality Preserving Projections.
Various performance improvements for dimensionality reduction methods (BLAS, alignment formulation of the LLE, ..)
Automatic k determination mode for the Locally Linear Embedding dimensionality reduction method, based on reconstruction error.
ARPACK and SUPERLU integration.
Introduce the concept of Converters that can embed (arbitrary) feature types into different feature types.
LibSVM is now pthread-parallelized.
Create modshogun.dll for csharp.
Various new C# examples (thanks Daniel Korn and Ori Cohen).
Dimensionality reduction examples application is introduced.
Cohen O., 1, 3
Mendelsohn A., 2
Drai D., 1
Malach R., 2
Friedman D., 1
1 The Interdisciplinary Center Herzliya
2 Weizmann Institute of Science
3 Bar Ilan University
We are carrying out a number of studies in which subjects control the movement of an avatar by thought alone, using real-time fMRI. In the first stage of our study, three subjects were able to control the right- and left-hand movements of an avatar using motor imagery, without difficulty and with high accuracy. We are therefore running a study involving three classes: right-hand imagery to turn right, left-hand imagery to turn left, and feet imagery to move forward.
In the first part of the experiment we instruct the subject to imagine left-hand, right-hand, or walking movements upon a predefined cue, and we manually define regions of interest (ROIs) using a GLM analysis, by contrasting hand-motor and primary leg-motor regions. In the second phase (baseline) we compute the mean and standard deviation of each ROI. Finally, the subjects are instructed to move the avatar according to auditory cues, using motor imagery. At this preliminary stage we use a simple classification scheme: at each TR the system calculates the z-score of each ROI relative to the mean and standard deviation obtained during the baseline, and chooses the ROI with the highest z-score. The classification result is fed into the virtual reality system, which moves the avatar accordingly.
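A minimal sketch of the per-TR classification step described above, assuming each ROI's signal is already available as plain Python numbers (one value per TR). The ROI names and the action mapping are illustrative, not taken from our actual pipeline:

```python
from statistics import mean, stdev

# Hypothetical ROI names, mapped to the avatar action each imagery class drives.
ACTIONS = {'right_hand': 'turn right', 'left_hand': 'turn left', 'feet': 'move forward'}

def baseline_stats(baseline):
    """baseline: {roi: [signal values recorded during the baseline phase]} -> {roi: (mean, std)}."""
    return {roi: (mean(values), stdev(values)) for roi, values in baseline.items()}

def classify_tr(sample, stats):
    """sample: {roi: signal at the current TR}. Returns the ROI with the highest z-score."""
    z = {roi: (sample[roi] - mu) / sigma for roi, (mu, sigma) in stats.items()}
    return max(z, key=z.get)
```

At run time, `ACTIONS[classify_tr(sample, stats)]` would be the command passed to the virtual reality system for that TR. Because each ROI is normalized by its own baseline, an ROI with a large but noisy raw signal does not automatically win.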
Our preliminary results indicate that subjects can learn to control an avatar using motor imagery at better-than-chance levels with very little training. Future work will include support vector machine (SVM) classifiers, with methods for feature reduction, in real time. Eventually we aim to allow subjects to perform simple tasks in a virtual environment.
About 15 years ago I wrote an unpacker for Dune 2 .pak files (in Pascal). I wanted the .voc files (Creative Voice files). I decided to take a couple of hours to figure out the format of these PAK files (again) and write a Python script to unpack them. It's probably useless these days, but maybe someone can use it.
import struct
import os

def unPAK(filename):
    print "unPAKING:", filename
    fnlist = []
    filesize = os.path.getsize(filename)
    f = open(filename, "rb")
    # the first 4 bytes are the offset of the first file's data, i.e. the header size
    size = f.read(4)
    beginData = struct.unpack('<i', size)[0]
    print beginData
    # rest of the header: NUL-terminated names, each followed by the next file's offset
    data = f.read(beginData - 4)
    # splitting on NUL also drops the zero high-order bytes of each offset and the final
    # zero offset; caveat: this breaks if an offset has a zero byte in a low-order position
    data = filter(None, data.split('\x00'))
    print data
    fnlist.append((data[0], beginData))
    for i in range(1, len(data)):
        if (i % 2) == 1:
            print i, data[i]
            # pad the truncated offset back to 4 bytes (little-endian)
            pos = struct.unpack('<i', data[i].ljust(4, '\x00'))
            fnlist.append((data[i+1], pos[0]))
    print fnlist
    # every file runs until the next file's offset
    for i in range(1, len(fnlist)):
        total = int(fnlist[i][1]) - int(fnlist[i-1][1])
        print "saving file: ", fnlist[i-1][0], "total bytes", total
        fdata = f.read(total)
        f2 = open(fnlist[i-1][0], "wb")  # binary mode, otherwise Windows mangles the data
        f2.write(fdata)
        f2.close()
    # last file: runs until the end of the archive
    print filesize
    last = len(fnlist) - 1
    total = filesize - int(fnlist[last][1])
    print "saving file: ", fnlist[last][0], "total bytes", total
    f2 = open(fnlist[last][0], "wb")
    f2.write(f.read(total))
    f2.close()
    print fnlist
    f.close()

unPAK("DUNE.PAK")
unPAK("ENGLISH.PAK")
unPAK("FINALE.PAK")
unPAK("HARK.PAK")
unPAK("HERC.PAK")
unPAK("INTRO.PAK")
unPAK("INTROVOC.PAK")
unPAK("MENTAT.PAK")
unPAK("MERC.PAK")
unPAK("ORDOS.PAK")
unPAK("SCENARIO.PAK")
unPAK("SOUND.PAK")
unPAK("VOC.PAK")
#unPAK("ATRE.PAK")
#unPAK("XTRE.PAK")
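Splitting the header on NUL bytes, as the script above does, silently breaks when an offset contains a zero byte in a low-order position. A more robust option is to walk the index entry by entry. The following Python 3 sketch assumes the layout the script implies: a table of [4-byte little-endian offset][NUL-terminated file name] entries, ending with a zero offset (or when the first file's data begins), where each file runs until the next file's offset. `read_pak_index` and `unpak` are hypothetical helper names:

```python
import os
import struct

def read_pak_index(blob):
    """Return [(name, offset, size), ...] for a PAK archive held in memory (assumed layout)."""
    entries = []
    pos = 0
    header_end = None
    while header_end is None or pos < header_end:
        (offset,) = struct.unpack_from('<I', blob, pos)
        pos += 4
        if offset == 0:            # zero offset terminates the index
            break
        if header_end is None:     # the first file's data starts right after the index
            header_end = offset
        end = blob.index(b'\x00', pos)
        entries.append((blob[pos:end].decode('ascii'), offset))
        pos = end + 1
    result = []
    for i, (name, offset) in enumerate(entries):
        # each file runs until the next file's offset, the last one until the end of the archive
        next_offset = entries[i + 1][1] if i + 1 < len(entries) else len(blob)
        result.append((name, offset, next_offset - offset))
    return result

def unpak(path, out_dir='.'):
    """Extract every file of the PAK archive at `path` into the existing directory `out_dir`."""
    with open(path, 'rb') as f:
        blob = f.read()
    for name, offset, size in read_pak_index(blob):
        with open(os.path.join(out_dir, name), 'wb') as out:
            out.write(blob[offset:offset + size])
```

For example, `unpak("VOC.PAK", "voc")` would extract the voice files into an existing `voc` directory. Reading the whole archive into memory is fine here, since the original PAK files are small.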
A year ago I was looking for a job, and Facebook was a valid choice. But before they would contact you, you had to solve their puzzles. These puzzles vary in difficulty, and I managed to solve quite a few. I am publishing some of my solutions here so that people can understand a little bit about Thrift.
The first one is Simon Says (the easiest), and it is only there to get you going with Thrift's networking protocol.
#!/usr/bin/env python
import sys
from random import randrange
sys.path.append('../gen-py')
import SimonSays
from ttypes import *

from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

# Make socket
transport = TSocket.TSocket('thriftpuzzle.facebook.com', 9030)
#transport = TSocket.TSocket('localhost', 9090)
# Buffering is critical. Raw sockets are very slow
transport = TTransport.TBufferedTransport(transport)
# Wrap in a protocol
protocol = TBinaryProtocol.TBinaryProtocol(transport)
# Create a client to use the protocol encoder
client = SimonSays.Client(protocol)
# Connect!
transport.open()

# Player one or two (two is run via command line)
if len(sys.argv) <= 1:
    emailAddress = 'blah@gmail.com'
    gameID = client.registerClient(emailAddress)
    print 'gameID =', gameID
else:
    joinstatus = False
    emailAddress = 'blah2@gmail.com'
    try:
        gameID = int(sys.argv[1])
        print gameID, 'from command argument'
        joinstatus = client.join(gameID, emailAddress)
    except DuplicateEmailException:
        print "bad email"
    if joinstatus == False:
        print "Failed to join game: " + str(gameID)
        sys.exit(1)

endT = False
while endT == False:
    listC = client.startTurn()
    print 'list', listC
    for color in listC:
        print 'c', color
        res = client.chooseColor(color)
        print res
    endT = client.endTurn()

print client.winGame()
# Close!
transport.close()
The second one is a little harder: the classic Battleship game. If you look around the web hard enough, you will find all the best tactics to win.
import sys
from random import randrange
sys.path.append('../gen-py')
import Battleship2
#from battleship2 import Battleship2
from ttypes import *

from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

class Player:
    attackPattern = []
    attackNextCellIndex = 0
    attackPatternLength = 0
    emailAddress = ''
    hitLocations = {}
    state = 0  # 0 is search, 1 is try to sink via stack
    hitStack = []
    roundRobin = [(0, -1), (0, 1), (-1, 0), (1, 0)]
    currentRobin = roundRobin[:]

    def __init__(self, client, emailAddress):
        self.client = client
        self.placePieces()
        self.emailAddress = emailAddress
        print 'finished placing', emailAddress
        self.createPattern()

    def placePieces(self):
        # keep trying random positions/orientations until the server accepts each piece
        for piece in xrange(1, 6):
            while not self.client.placePiece(piece,
                    Coordinate(randrange(10), randrange(10)), bool(randrange(2))):
                pass

    def createPattern(self):
        # checkerboard search pattern: cells where i and j have the same parity
        for i in range(0, 10):
            for j in range(0, 10):
                if j % 2 == 0:
                    if i % 2 == 0:
                        self.attackPattern.append((i, j))
                else:
                    if i % 2 == 1:
                        self.attackPattern.append((i, j))
        print self.attackPattern
        self.attackPatternLength = len(self.attackPattern)

    def changeStateHit(self):
        self.state = 1

    def changeStateSearch(self):
        self.state = 0

    def resetRobin(self):
        self.currentRobin = self.roundRobin[:]

    def addHitStack(self, result, x, y):
        if result == 6:  # result 6 is treated as a hit
            self.hitStack.append((x, y))
            self.changeStateHit()

    def attackCell(self, x, y):
        result = self.client.attack(Coordinate(x, y))
        if result in [1, 2, 3, 4]:
            print 'sunk ship=>', result
        return result

    def attackNext(self):
        print self.attackNextCellIndex, 'my turn, i am gonna kill this guy'
        x, y = self.attackPattern[self.attackNextCellIndex]
        if self.attackNextCellIndex < self.attackPatternLength:
            self.attackNextCellIndex += 1
        result = self.attackCell(x, y)
        print 'attacking', x, y, 'result:', result
        self.addHitStack(result, x, y)

    def attackFirstInStack(self):
        if len(self.hitStack) == 0:
            self.changeStateSearch()
        else:
            (x, y) = self.hitStack[0]
            if len(self.currentRobin) == 0:
                # we shot all the surrounding cells: reset the neighbours and drop this hit
                self.resetRobin()
                self.hitStack = self.hitStack[1:]
            else:
                foundNextCell = False
                while foundNextCell == False and len(self.currentRobin) > 0:
                    # get the next neighbour to try (for now, simply the last one in the list)
                    (a, b) = self.currentRobin.pop()
                    # only fire if the cell is inside the grid
                    if a+x >= 0 and a+x <= 9 and b+y >= 0 and b+y <= 9:
                        result = self.attackCell(x+a, y+b)
                        # if result == 6:
                        #     self.resetRobin()
                        self.addHitStack(result, x+a, y+b)
                        foundNextCell = True
                # may need to put code that knows the orientation of the ship
                # if there are two shots next to each other

    def hitNext(self):
        if self.state == 0:
            # keep searching
            self.attackNext()
        else:
            # attack known areas
            self.attackFirstInStack()

# Make socket
transport = TSocket.TSocket('thriftpuzzle.facebook.com', 9031)
#transport = TSocket.TSocket('localhost', 9090)
# Buffering is critical. Raw sockets are very slow
transport = TTransport.TBufferedTransport(transport)
# Wrap in a protocol
protocol = TBinaryProtocol.TBinaryProtocol(transport)
# Create a client to use the protocol encoder
client = Battleship2.Client(protocol)
# Connect!
transport.open()

# Player one or two (two is run via command line)
if len(sys.argv) <= 1:
    emailAddress = 'blah@gmail.com'
    gameID = client.registerClient(emailAddress)
    print 'gameID =', gameID
else:
    joinstatus = False
    emailAddress = 'blah2@gmail.com'
    try:
        gameID = int(sys.argv[1])
        print gameID, 'from command argument'
        joinstatus = client.join(gameID, emailAddress)
    except DuplicateEmailException:
        print "bad email"
    if joinstatus == False:
        print "Failed to join game: " + str(gameID)
        sys.exit(1)

# create a player
player = Player(client, emailAddress)

totalMoves = 0
while totalMoves < 1000:
    totalMoves += 1
    try:
        isTurn = client.isMyTurn()
    except GameOverException:
        break
    if isTurn:
        player.hitNext()

print client.winGame()
# Close!
transport.close()
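The Player class above combines a checkerboard "search" sweep with a "target" phase that fires at the four neighbours of every hit. The following is a simplified, server-free Python 3 sketch of that same hunt/target idea, so it can be tried locally: the ship positions are passed in as a set of cells instead of coming from the Thrift server, and it assumes every ship is at least two cells long (a one-cell ship off the checkerboard would never be found). `parity_pattern` and `play` are hypothetical names, not part of the puzzle's API:

```python
# Neighbour offsets tried around a hit (the same offsets as roundRobin above).
NEIGHBOURS = [(0, -1), (0, 1), (-1, 0), (1, 0)]

def parity_pattern(size=10):
    """Checkerboard cells: any ship at least 2 cells long covers at least one of them."""
    return [(i, j) for i in range(size) for j in range(size) if i % 2 == j % 2]

def play(ship_cells, size=10):
    """Hunt/target against locally known ship cells; returns the number of shots fired."""
    remaining = set(ship_cells)
    shot = set()
    search = parity_pattern(size)   # "hunt" order
    targets = []                    # cells next to earlier hits ("target" phase)
    shots = 0
    while remaining:
        cell = targets.pop() if targets else search.pop(0)
        if cell in shot:
            continue
        shot.add(cell)
        shots += 1
        if cell in remaining:       # a hit: queue its in-bounds, not-yet-shot neighbours
            remaining.discard(cell)
            x, y = cell
            for dx, dy in NEIGHBOURS:
                nx, ny = x + dx, y + dy
                if 0 <= nx < size and 0 <= ny < size and (nx, ny) not in shot:
                    targets.append((nx, ny))
    return shots
```

For example, `play({(0, 0), (0, 1), (5, 3), (6, 3), (7, 3)})` sinks a two-cell and a three-cell ship. The parity sweep halves the number of cells the hunt phase has to try (50 instead of 100), and the target phase then flood-fills along each ship it touches.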
While working on my thesis, I had to get the feature weights from an SVM model. Thorsten Joachims had published a Perl script, but I was using Python, so I rewrote his script in Python, and he graciously put a download link to it on his website.
This script gives you the weights of all the features. This is incredibly useful later on, because you can systematically eliminate features as follows:
After training on all current features, keep the K% of features with the highest SVM weights and the K% with the lowest (most negative) SVM weights, and drop the rest.
Iterate.
You will notice that you can get better prediction results with only a subset of your features.
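A minimal Python 3 sketch of that elimination loop, assuming the weights come back as a {feature_id: weight} dictionary (like the w dictionary the svm2weights.py script builds). `train_and_get_weights` is a hypothetical callback standing in for "train a linear SVM on these features and return its weights":

```python
def select_extreme_features(weights, k_percent):
    """Keep the k_percent of features with the highest weights and the
    k_percent with the lowest (most negative) weights."""
    ranked = sorted(weights, key=weights.get)          # feature ids, ascending by weight
    n = max(1, int(len(ranked) * k_percent / 100.0))   # how many to keep at each end
    return sorted(set(ranked[:n]) | set(ranked[-n:]))

def iterative_selection(features, train_and_get_weights, k_percent, min_features):
    """Retrain on the surviving features until at most min_features remain,
    or until an iteration no longer removes anything."""
    current = sorted(features)
    while len(current) > min_features:
        weights = train_and_get_weights(current)       # hypothetical: {feature_id: weight}
        kept = select_extreme_features(weights, k_percent)
        if len(kept) >= len(current):                  # nothing left to drop
            break
        current = kept
    return current
```

In practice you would also record the validation accuracy at each iteration and keep the subset that scored best, rather than simply the last one.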
* If you use the svm2weights.py script in your publication or commercial product, please credit me.
# Compute the weight vector of linear SVM based on the model file
# Original Perl Author: Thorsten Joachims (thorsten@joachims.org)
# Python Version: Ori Cohen (orioric@gmail.com)
# Call: python svm2weights.py svm_model

import sys
from operator import itemgetter

try:
    import psyco
    psyco.full()
except ImportError:
    print 'Psyco not installed, the program will just run slower'

def sortbyvalue(d, reverse=True):
    ''' proposed in PEP 265, using the itemgetter this function sorts a dictionary by value'''
    return sorted(d.iteritems(), key=itemgetter(1), reverse=reverse)

def sortbykey(d, reverse=False):
    ''' proposed in PEP 265, using the itemgetter this function sorts a dictionary by key'''
    return sorted(d.iteritems(), key=itemgetter(0), reverse=reverse)

def get_file():
    """
    Tries to extract a filename from the command line. If none is present, it
    assumes file to be svm_model (default svmLight output). If the file
    exists, it returns it, otherwise it prints an error message and ends
    execution.
    """
    # Get the name of the data file
    if len(sys.argv) < 2:
        # assume file to be svm_model (default svmLight output)
        print "Assuming file as svm_model"
        filename = 'svm_model'
        #filename = sys.stdin.readline().strip()
    else:
        filename = sys.argv[1]
    try:
        f = open(filename, "r")
    except IOError:
        print "Error: The file '%s' was not found on this system." % filename
        sys.exit(1)
    return f

if __name__ == "__main__":
    f = get_file()
    i = 0
    lines = f.readlines()
    printOutput = True
    w = {}
    for line in lines:
        if i > 10:
            # support vector line: "alpha*y index:value index:value ... #"
            features = line[:line.find('#')-1]
            comments = line[line.find('#'):]
            alpha = features[:features.find(' ')]
            feat = features[features.find(' ')+1:]
            for p in feat.split(' '):  # Changed the code here.
                a, v = p.split(':')
                if not (int(a) in w):
                    w[int(a)] = 0
            for p in feat.split(' '):
                a, v = p.split(':')
                w[int(a)] += float(alpha)*float(v)
        elif i == 1:
            # line 1 holds the kernel type; 0 means linear
            if line.find('0') == -1:
                print 'Not linear Kernel!\n'
                printOutput = False
                break
        elif i == 10:
            # line 10 holds the threshold b; the support vectors follow it
            if line.find('threshold b') == -1:
                print "Parsing error!\n"
                printOutput = False
                break
        i += 1
    f.close()

    # if you need to sort the features by value and not by feature ID then use this line instead:
    #ws = sortbyvalue(w)
    ws = sortbykey(w)
    if printOutput == True:
        for (feature, weight) in ws:
            print feature, ':', weight
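For Python 3 (where print statements, iteritems and psyco are gone), the core of the computation above, w = sum over support vectors of (alpha_i * y_i) * x_i, can be sketched as follows. Like the script, it assumes a linear-kernel SVMlight model whose first 11 lines are header lines and whose remaining lines each start with alpha_i*y_i followed by index:value pairs and a trailing # comment. `svmlight_weights` is a hypothetical name:

```python
from collections import defaultdict

def svmlight_weights(model_lines, header_lines=11):
    """Return {feature_index: weight} for a linear SVMlight model given as a list of lines."""
    w = defaultdict(float)
    for line in model_lines[header_lines:]:
        body = line.split('#', 1)[0].split()   # drop the trailing comment, split on whitespace
        if not body:
            continue
        alpha_y = float(body[0])               # first token is alpha_i * y_i
        for token in body[1:]:
            index, value = token.split(':')
            if index == 'qid':                 # skip a query id token, if present
                continue
            w[int(index)] += alpha_y * float(value)
    return dict(w)
```

Usage: `sorted(svmlight_weights(open('svm_model').readlines()).items())` gives the same (feature, weight) pairs the script prints, sorted by feature ID.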
I like using dictionaries in Python; sometimes I abuse them and load an obscene amount of data into them. What is good about Python dictionaries (hash maps in other languages) is that you can read or write any entry in O(1) on average, and since they live in memory, they are very fast.
One of my machine learning algorithms relied solely on dictionary manipulations. I kept adding data into dictionary records, so you can imagine what happened after I had loaded roughly 2GB of data with only 3GB of RAM. A dictionary lives entirely in your process's memory, so in practice you are limited by the amount of free RAM; once the OS starts swapping, everything slows to a crawl.
Of course, you could always restructure your data and rewrite your entire algorithm around some other, more efficient data structure, but time is money and I didn't want to reinvent the wheel.
I knew Python has a solution for almost everything; I just had to look for it. I was aware of cPickle, but I wanted something that behaves like a dictionary yet uses hard drive space. It wasn't long before I found out about shelve. So which one did I need?
A Google search led me to this page, where the poster gives the gist of it:
Use Shelve when you have a large amount of data that you only need small parts of at a time.
Use cPickle when you have data that you want to access all at once.
Basically between Shelve and cPickle, you are trading disk-access speed for in-memory-access speed.
It felt like shelve was what I needed. So what is shelve?
A “shelf” is a persistent, dictionary-like object. The difference with “dbm” databases is
that the values (not the keys!) in a shelf can be essentially arbitrary Python objects
— anything that the pickle module can handle. This includes most class instances,
recursive data types,and objects containing lots of shared sub-objects.
The keys are ordinary strings.
To make a long story short, I managed to solve my problem with good speed and reasonable memory consumption. The trick is to open the shelf like so:
shelve.open(name,'n',writeback=True)
Now, remember that with writeback=True you are still reading data into RAM, in the form of the shelf's cache, until you call dict.sync() or dict.close(); at that point everything is written to the hard drive and the cache is cleared. (If you then start putting more data into the dict, it will again accumulate every dict[key] = value you use from that point on, until you sync() again.)
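A minimal Python 3 sketch of this pattern, assuming string keys whose values are lists you keep appending to: the shelf is opened with writeback=True, and sync() is called every `sync_every` writes so the cache is flushed to disk before it eats all your RAM. `fill_shelf` and `sync_every` are illustrative names, and the right interval depends on your data and memory:

```python
import os
import shelve
import tempfile

def fill_shelf(path, pairs, sync_every=10000):
    """Append each (key, value) pair to the list stored under `key` in a new shelf."""
    db = shelve.open(path, 'n', writeback=True)   # 'n': always start with a new, empty shelf
    try:
        for count, (key, value) in enumerate(pairs, 1):
            db.setdefault(key, []).append(value)  # in-place mutation persists thanks to writeback
            if count % sync_every == 0:
                db.sync()   # write every cached entry to disk and empty the cache
    finally:
        db.close()          # close() also syncs whatever is still cached

if __name__ == '__main__':
    demo_path = os.path.join(tempfile.mkdtemp(), 'demo')
    fill_shelf(demo_path, ((str(i % 3), i) for i in range(10)), sync_every=4)
    with shelve.open(demo_path) as db:
        print(dict(db))
```

Syncing on a simple write counter is predictable but ignores how large each value is; counting bytes or checking memory usage would be alternatives.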
A shelf is not 100% compatible with your old dictionary code; there is one change you need to take care of. The following, taken from the Python documentation, explains it:
# as d was opened WITHOUT writeback=True, beware:
d['xx'] = range(4)  # this works as expected, but...
d['xx'].append(5)   # *this doesn't!* -- d['xx'] is STILL range(4)!

# having opened d without writeback=True, you need to code carefully:
temp = d['xx']      # extracts the copy
temp.append(5)      # mutates the copy
d['xx'] = temp      # stores the copy right back, to persist it

# or, d=shelve.open(filename,writeback=True) would let you just code
# d['xx'].append(5) and have it work as expected, BUT it would also
# consume more memory and make the d.close() operation slower.
Ultimately, this trick combines using most of your RAM with writing to the hard drive only when you really must, and this speeds things up a lot.
Do you see that delay up there? It's another one of those dirty tricks that needs better code to replace it. I needed it because if you start putting more data into the dict without waiting around 60 seconds, it will sync every second until the first dict.sync() call has finished and Python has freed up all the memory the process is using, giving it back to the OS. This usually takes 10-60 seconds on my system. Honestly, this was a workaround to make sure everything works, but I wouldn't mind some suggestions on fixing it.
Bottom line: in some cases this is almost as fast as just using RAM (unless, obviously, you have enough RAM to begin with).
Also note that in my case, a dict that took 2GB of RAM took just 750MB as a shelve DB.
When you are finished writing to the shelf, you might want to make sure the cache is clean by calling sync(). If you are going to read dict items, remember that every item you read is also placed in the cache, so again you might fill your memory to the limit, and sync() must be used again.
There is one thing I have always wanted to implement here: systematic caching that manages the K dictionary records you access more than any other keys. These K records would be kept in memory even when you run out of memory and sync everything to the DB. I suspect this would boost speed a little, as the records you always need would stay in RAM and always be accessible, while the ones you don't need would be dumped into the DB.
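This idea is not implemented above; the following Python 3 sketch is one way it might look, assuming string keys. `HotKeyShelf` is a hypothetical class: it counts accesses with collections.Counter, keeps values in a plain dict between syncs, and on sync() writes everything to a regular (non-writeback) shelf, then evicts all but the K most-accessed keys from RAM:

```python
import os
import shelve
import tempfile
from collections import Counter

class HotKeyShelf:
    """Keep the K most frequently accessed records in RAM across syncs; the rest live on disk."""

    def __init__(self, path, k=1000):
        self.db = shelve.open(path, 'c')   # plain shelf, no writeback: caching is handled here
        self.k = k
        self.hits = Counter()              # access count per key
        self.ram = {}                      # in-memory values, possibly modified since last sync

    def __getitem__(self, key):
        self.hits[key] += 1
        if key not in self.ram:
            self.ram[key] = self.db[key]   # load from disk on first access
        return self.ram[key]

    def __setitem__(self, key, value):
        self.hits[key] += 1
        self.ram[key] = value

    def sync(self):
        """Flush every in-memory record to disk, then keep only the K hottest keys in RAM."""
        for key, value in self.ram.items():
            self.db[key] = value
        self.db.sync()
        hot = {key for key, _ in self.hits.most_common(self.k)}
        self.ram = {key: value for key, value in self.ram.items() if key in hot}

    def close(self):
        self.sync()
        self.db.close()

def demo():
    """'a' is used more often than 'b', so only 'a' stays in RAM after sync()."""
    path = os.path.join(tempfile.mkdtemp(), 'hot')
    s = HotKeyShelf(path, k=1)
    s['a'] = [1]
    s['b'] = [2]
    s['a'].append(3)
    s.sync()
    in_ram = sorted(s.ram)
    s.close()
    with shelve.open(path) as db:
        return in_ram, db['a'], db['b']
```

One caveat of this design: the access Counter itself grows with every distinct key, so for very large key sets you would want to decay or prune the counts periodically.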
I use Unity for research purposes, and 90% of my code is not Unity code. Using Unity as the debugger became quite tedious: as some of you know, Unity compiles the code on its own and then runs it, so debugging usually means printing console output to see if something works. I had to figure out how to debug outside of Unity, preferably in Visual Studio, while still being able to use Unity as the rendering engine when I needed it. Consider the following: every line you write has nothing to do with Unity, except when you need a character to move. Obviously you would like to debug the whole thing without going into Unity, and only later use Unity to move the character. The following is one method of bypassing Unity3D as a debugger by using Visual Studio.
Using VC#, we create another solution outside of the Unity project directory; this solution/project is the one we will work on and debug from.
So basically:
unity3d_directory\game
unity3d_directory\testGame <= this is our project
Then we need to add the following to our solution's references (the DLLs are in the Unity installation directory):
the actual “game” solution file
UnityEngine.dll
UnityEditor.dll
UnityScript.dll
Then add the DLLs to the references of the actual Unity game solution (they may already be there).
This takes care of a bunch of errors related to the code not finding the Unity API.
Then, for every class that inherits from MonoBehaviour, we add a preprocessor directive: inside Visual Studio (in the Debug configuration, where DEBUG is defined) the class does not inherit from MonoBehaviour, and outside, in Unity, it does.
Before we continue, we need to understand a little bit about how Unity interacts with a class:
It first runs the “Awake()” function; we can think of Awake() as the constructor. Note that a real constructor will always be run when Unity loads a class (even before it runs it), so it is better to use Awake() inside Unity.
After all Awake() methods have run, Unity runs the “Start()” function, only once, before the first “Update()”. This is also a kind of constructor, but it runs after Awake().
Then Unity calls “Update()” every frame.
So basically, in VC# we want to call the constructor instead of Awake() or Start() (I use either one of them), while making sure we can still use this code inside Unity (remember that we don't want constructors inside Unity).
So here is a typical way of enabling only the constructor inside VC# and only the Awake/Start function inside Unity. What you see here is a preprocessor directive telling Visual Studio to ignore the “Start()” line when we are in debug mode, which gives us the constructor, and telling Unity to ignore the constructor line and treat this function as Start(). Cool, isn't it?
using UnityEngine;

public class MyClass
#if !DEBUG
    : MonoBehaviour
#endif
{
    // This function behaves as Start() inside Unity and as a constructor inside VC#
#if !DEBUG
    public void Start()
#else
    public MyClass()
#endif
    {
        // code goes here
    }
}
So we have handled our constructor/Awake/Start problem. Now we address another problem: running the actual code. Your class should have an Update() function that runs your code, and this will be the main function of each class.
using UnityEngine;

public class MyClass
#if !DEBUG
    : MonoBehaviour
#endif
{
    // This function behaves as Start() inside Unity and as a constructor inside VC#
#if !DEBUG
    public void Start()
#else
    public MyClass()
#endif
    {
        // code goes here
    }

    public void Update()
    {
        // here goes the main code of our class functionality
    }
}
Now let's say that we want a function that writes to the Unity console when inside Unity, and writes text to the DOS window when inside VC#.
using System;
using UnityEngine;

public class MyClass
#if !DEBUG
    : MonoBehaviour
#endif
{
    // This function behaves as Start() inside Unity and as a constructor inside VC#
#if !DEBUG
    public void Start()
#else
    public MyClass()
#endif
    {
        // code goes here
    }

    public void Update()
    {
        // here goes the main code of our class functionality
    }

    public static void print(string s)
    {
#if !DEBUG
        Debug.Log(s);          // Unity
#else
        Console.WriteLine(s);  // C#
#endif
    }
}
And what about a function that prints to DOS or to the Unity GUI?
using System;
using UnityEngine;

public class MyClass
#if !DEBUG
    : MonoBehaviour
#endif
{
    // This function behaves as Start() inside Unity and as a constructor inside VC#
#if !DEBUG
    public void Start()
#else
    public MyClass()
#endif
    {
        // code goes here
    }

    public void Update()
    {
        // here goes the main code of our class functionality
    }

    public static void print(string s)
    {
#if !DEBUG
        Debug.Log(s);          // Unity
#else
        Console.WriteLine(s);  // C#
#endif
    }

    private void PrintToGUI()
    {
#if !DEBUG
        guiText.text = "Unity";          // legacy GUIText component (older Unity versions)
#else
        Console.WriteLine("Visual C#");
#endif
    }
}
Typically you would attach this script to a Unity object, but because we are using VC#, we need to emulate how Unity runs these classes. From our testGame project we create instances of these classes, and our constructors initialize them (inside Unity, Unity would run Awake() and then Start() when the object is instantiated). Then we create a loop that runs all the Update() functions.
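namespace testGame
{
    class Test
    {
        static void Main(string[] args)
        {
            // The constructor stands in for Awake()/Start();
            // then Update() is called over and over, like once per frame in Unity.
            MyClass mc = new MyClass();
            while (true)
            {
                mc.Update();
            }
        }
    }
}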
The purpose of this blog is to share some of the things I have been doing in the last few years in the field of machine learning research (using Python and C#), as well as application development tricks and using Unity3D for research purposes.