Blogs

Improving the extraction of Wikipedia data

I am happy to share some recent performance results of a new parser for Wikipedia data dumps that I have developed over the past 2 months.

The new parser is also written in Python, as was its predecessor, which is included in WikiXRay. However, this new parser comes with notable improvements in speed and accuracy:

Polynomial Regression

Polynomial regression is a form of linear regression that fits a non-linear relationship between X and Y. Basically, we have to add new features to the final equation. But which features? It’s simple: from the X1 feature we can derive new features such as X1^2 or X1^3. If we have several input features (X1, X2, X3), we can also add new features such as X1*X2 or X1^2*X3^2. So, the polynomial regression model is:
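For a single feature and degree 3, that model would typically take the form y = θ0 + θ1*X1 + θ2*X1^2 + θ3*X1^3. The θ symbols are an assumption here, since the excerpt stops before the formula. The feature expansion itself can be sketched in a few lines of Python. This is only an illustration: the names `polynomial_features` and `predict` are hypothetical and do not come from the original post, and the coefficients in the example are made up.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_features(x, degree=2):
    """Expand inputs (X1, X2, ...) into a bias term plus every product
    of the inputs up to the given total degree."""
    features = [1.0]  # bias term
    for d in range(1, degree + 1):
        # Each combination of indices picks the inputs to multiply,
        # e.g. (0, 0) -> X1^2 and (0, 1) -> X1*X2.
        for combo in combinations_with_replacement(range(len(x)), d):
            features.append(prod(x[i] for i in combo))
    return features


def predict(theta, x, degree=2):
    """The model is still linear in theta: a weighted sum of the
    expanded features."""
    return sum(t * f for t, f in zip(theta, polynomial_features(x, degree)))


# Two inputs, degree 2: [1, X1, X2, X1^2, X1*X2, X2^2]
sample = polynomial_features([2.0, 3.0], degree=2)

# Made-up coefficients: y = 1 + 0.5 * X1^2
value = predict([1.0, 0.0, 0.0, 0.5, 0.0, 0.0], [2.0, 3.0], degree=2)
```

A term such as X1^2*X3^2 has total degree 4, so it would only appear with `degree=4` or higher. In practice, the coefficients would be fitted with ordinary linear regression on the expanded features.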

Playing with machine learning: Linear Regression

For the past two months I have been researching machine learning and its algorithms. The goal is to find a good unsupervised clustering algorithm to analyze Android applications and predict which application you want to install or use at a particular time. The first step is to learn and understand the theory of machine learning. For this, I began studying Stanford's Machine Learning course. It’s a great, practical course with videos and material that help in understanding the classes.

Links collection about software forges: status, criticism and new ideas

During the last two years it has been quite common to hear about new software forges. I'm not going to talk about the proliferation of forges in this post; what I would like to discuss is what the next step in collaborative development environments is. To get the big picture, I spent some hours looking for scientific and "informal" publications. I now think I have a good starting point, and it would be great if you could offer feedback or even improve it.

KESI: our first component for the ALERT project

As part of the work of URJC (LibreSoft) in the ALERT project, whose aim is to increase the efficiency of developers in libre software projects, we are about to present the first iteration of the Knowledge Extractor for Structured Information, aka KESI. This component with a very complicated name has a simple mission: to gather information from source code repositories and from issue/bug tracking systems and to send it to the rest of the components of the ALERT platform.

Arduino & Android & ADK

At the last Google I/O, a new feature based on Android and Arduino was released. The ADK allows communication between Android and Arduino over a USB connection. Google has developed a library called “USB Accessory” for Android and Arduino. This library helps us send and receive data through the USB interface. Using the USB accessory, we can control the Arduino board.

New challenges about Android, AR and Arduino

During the next months I will work on three interesting topics. The first is the creation of a new Android-based architecture that allows applications to communicate and integrate. The second is the port of ARviewer to the iPhone platform using PhoneGap. And the last one is the very well-known Arduino and its possibilities with Android USB Host.

Bicho 0.9 is coming soon!

During the last months we’ve been working to improve Bicho, one of our data mining tools. Bicho gets information from remote bug/issue tracking systems and stores it in a relational database.

How Bicho works

Bicho