Gregorio Robles :: My PhD Thesis


My PhD Thesis

In February 2006 I defended my PhD thesis entitled Software Engineering Research on Libre Software: Data Sources, Methodologies and Results at the Universidad Rey Juan Carlos.

The defense committee for my PhD thesis was composed of:


I wrote it under the supervision (really, with the invaluable guidance) of Jesús M. González Barahona.

To obtain the European mention (Doctor Europaeus), I wrote part of the thesis in English (in fact all of it, with a long summary in Spanish in an appendix) and had to certify research stays of over three months at other European research centers (in my case, Vienna, AT, and Maastricht, NL).

Keywords
  • Libre (free/open source) software
  • Mining software repositories
  • Versioning systems
  • Empirical software engineering
  • Software maintenance and evolution
  • Volunteer-driven software development
  • Social aspects of software development

Download

Software Engineering Research on Libre Software: Data Sources, Methodologies and Results (PDF, 5.8 MB, in English)

Abstract

With the appearance and spread of the Internet, new ways of developing software have arisen that make use of telematic tools, follow flexible methodologies and incorporate third-party contributions. A paradigmatic example of software development with these characteristics is the libre (free/open source) software phenomenon; of special interest are those projects that are large both in number of participants and in software size.

Although at first sight these new environments are less controllable than traditional ones (development is generally geographically distributed, there is no company behind the development taking the lead, traditional hierarchical structures are not followed, and external contributions are hard to predict), we have access to a great deal of information: the software product itself and many of the by-products created during the development process (communication archives, bug-tracking systems and versioning systems, among others). These data sources are usually publicly available on the Internet, so we can perform exhaustive analyses on large amounts of data (much of which is hard to obtain in traditional, industrial environments).

The goal of this thesis is to identify the data sources that libre software projects offer publicly, to present some methodologies for analyzing these sources and the data that can be extracted from them, and to show the results obtained by applying these methodologies. Our intention is to better understand the libre software phenomenon in particular, but also software creation processes in general, since the knowledge acquired need not be specific to libre software and could be applied to many other development environments.

Thus, we start this thesis with a description of the data sources publicly available on the Internet and of the data that can be extracted from them. Afterwards, several methods, depending on the source, are used to obtain information from the data and to filter out noise. Finally, several methodologies are presented and applied to the data obtained from the libre software projects selected as case studies. The methodologies range from classical to novel ones. Among the classical ones, we analyze the growth of software systems as is done in software evolution research, and we apply social network analysis, a technique from the social sciences. In both cases, the contribution of this thesis has been to apply them to libre software projects. Regarding novel methodologies, we propose the archaeological analysis of software systems, which aims to determine what remains from previous versions; the generalization of software evolution studies to file types other than source code (for instance, documentation, translation or user interface files, among others); and the study of the evolution of volunteer participation and of the regeneration of the leading "core" group. In addition, a series of tools has been created to automate, at least partially, the whole process. These tools make it possible to reuse these methodologies on other projects.

Among the main contributions of this thesis is that it presents the first exhaustive analysis of a large number of software projects; moreover, the proposed methodologies and the tools developed allow more projects to be studied in the near future. We have also shown that technical analysis should be complemented with socio-technical analysis to fully understand the development process and many of the technical issues of (libre) software projects.

Publications related to/originated from this thesis

Journals and book chapters

  • Beyond Source Code: The Importance of Other Source Artifacts in Software Development (a Case Study)
    Gregorio Robles, Jesús M. González Barahona and Juan Julián Merelo Guervós
    Journal of Systems and Software. Elsevier. September 2006
    Volume 79 Issue 9. Pages 1233-1248. ISSN: 0164-1212.

  • Applying Social Network Analysis Techniques to Community-driven Libre Software Projects
    Luis López, Gregorio Robles, Jesús M. González Barahona and Israel Herraiz
    International Journal of Information Technology and Web Engineering
    Volume 1 Issue 3. Pages 27-48. ISSN: 1554-1045. IDEA Group. July-September 2006

  • Contributor Turnover in Libre Software Projects
    Gregorio Robles and Jesús M. González Barahona
    In Book: "Open Source Systems" edited by Ernesto Damiani, Brian Fitzgerald, Walt Scacchi, Marco Scotto and Giancarlo Succi
    IFIP. Pages 270-283. ISBN 0387342257. May 2006

  • Analyzing the Anatomy of two GNU/Linux distributions: Methodology and case studies (Red Hat and Debian)
    Jesús M. González Barahona, Gregorio Robles, Miguel Ortuño, Luis Rodero, José Centeno, Vicente Matellán, Eva Castro and Pedro de las Heras
    In Book: "Free/Open Source Software Development" edited by Stefan Koch
    Idea Group. Pages 27-58. ISBN: 1-59140-369-3. July 2004

Conferences and workshops

  • Empirical Software Engineering Research on Free/Libre/Open Source Software
    Gregorio Robles
    International Conference on Software Maintenance (ICSM 2006)
    Philadelphia (US), September 2006. IEEE Computer Society. In press.

  • Mining Large Software Compilations over Time: Another Perspective of Software Evolution (Best-paper award)
    Gregorio Robles, Jesús M. González-Barahona, Martin Michlmayr and Juan José Amor
    International Workshop on Mining Software Repositories (MSR 2006)
    Shanghai (China), May 2006. IEEE Computer Society. Pages 3-9. ISBN: 1-59593-085-X.

  • Entry Patterns in Global Distributed Software Projects
    Israel Herraiz, Gregorio Robles, Jesús M. González Barahona, Juan José Amor and Teófilo Romera
    First International Workshop on Global Software Development for the Practitioner
    Shanghai (China), May 2006. IEEE Computer Society. Pages 3-6. ISBN 1-59593-085-X.

  • Evolution and Growth in Large Libre Software Projects
    Gregorio Robles, Juan José Amor, Jesús M. González-Barahona and Israel Herraiz
    8th International Workshop on Principles of Software Evolution
    Lisbon (Portugal), September 2005. IEEE Computer Society. Pages 165-174. ISBN: 0-7695-2349-8.

  • Evolution of volunteer participation in libre software projects: evidence from Debian
    Gregorio Robles, Jesús M. González Barahona and Martin Michlmayr
    1st International Conference on Open Source Systems
    Genoa (Italy), July 2005. Pages 100-107. ISBN: 88-7544-048-4.

  • Developer identification methods for integrated data from various sources
    Gregorio Robles and Jesús M. González Barahona
    2nd Workshop on Mining Software Repositories (MSR)
    St. Louis (USA), May 2005. IEEE Computer Society. Pages 106-110. ISBN: 1-59593-123-6.

  • Self-organized development in libre software projects: a model based on the stigmergy concept
    Gregorio Robles, Juan Julián Merelo Guervós and Jesús M. González Barahona
    6th International Workshop on Software Process Simulation and Modeling (ProSim 2005)
    St. Louis (MO, USA), May 2005. ISBN: 3-8167-6761-3

  • Executable source code and non-executable source code: analysis and relationships
    Gregorio Robles and Jesús M. González Barahona
    4th Workshop on Source Code Analysis and Manipulation (SCAM 2004)
    Chicago (IL, USA), September 2004. IEEE Computer Society. Pages 149-157. ISBN: 0-7695-2144-4

  • Remote analysis and measurement of libre software systems by means of the CVSAnalY tool
    Gregorio Robles, Stefan Koch and Jesús M. González Barahona
    2nd ICSE Workshop on Remote Analysis and Measurement of Software Systems (RAMSS '04)
    Edinburgh (Scotland, UK), May 2004. IEEE Computer Society. Pages 51-56. ISBN: 0-86341-423-3.

  • Community structure of modules in the Apache project
    Jesús M. González Barahona, Luis López and Gregorio Robles
    4th Workshop on Open Source Software Engineering
    Edinburgh (Scotland, UK), May 2004. IEEE Computer Society. Pages 43-47. ISBN: 0-86341-423-0.

  • Applying Social Network Analysis to the Information in CVS Repositories
    Luis López, Jesús M. González Barahona and Gregorio Robles
    1st Workshop on Mining Software Repositories (MSR)
    Edinburgh (Scotland, UK), May 2004. IEEE Computer Society. Pages 101-105. ISBN: 0-86341-432-X.

  • GlueTheos: Automating the Retrieval and Analysis of Data from Publicly Available Repositories
    Gregorio Robles, Jesús M. González Barahona and Rishab Aiyer Ghosh
    1st Workshop on Mining Software Repositories (MSR)
    Edinburgh (Scotland, UK), May 2004. IEEE Computer Society. Pages 28-31. ISBN: 0-86341-432-X.

Posters

  • An Empirical Approach to Software Archaeology
    Gregorio Robles, Jesús M. González Barahona and Israel Herraiz
    21st International Conference on Software Maintenance
    Budapest (Hungary), September 2005. ICSM 2005 Poster Proceedings. Pages 47-50. ISBN: 963-460-981-3

License

(c) 2005 Gregorio Robles. This work is licensed under a Creative Commons Attribution-ShareAlike license.