In February 2006 I defended my PhD thesis, entitled "Software Engineering Research on Libre Software: Data Sources, Methodologies and Results", at the Universidad Rey Juan Carlos.
The defense committee for my PhD thesis was composed of:
- Manuel Hermenegildo (Universidad Politécnica de Madrid, ES) acting as the president,
- Brian Fitzgerald (University of Limerick, IE),
- Daniel M. Germán (University of Victoria, CA),
- Stefan Koch (Wirtschaftsuniversität Vienna, AT),
- and Antonio Fernández-Anta from my home university acting as the secretary.
I wrote it under the supervision (really, with the invaluable guidance) of Jesús M. González Barahona.
To obtain the European mention (Doctor Europaeus), I had to write part of the thesis in English (in fact all of it is in English, with a long summary in Spanish in an appendix) and to certify stays of over three months in other European research centers (which in my case were in Vienna, AT, and Maastricht, NL).
- Libre (free/open source) software
- Mining software repositories
- Versioning system
- Empirical software engineering
- Software maintenance and evolution
- Volunteer-driven software development
- Social aspects of software development
Software Engineering Research on Libre Software: Data Sources, Methodologies and Results (PDF, 5.8 MB)
With the emergence and spread of the Internet, new ways of developing software have arisen that make use of telematic tools, follow flexible methodologies and incorporate third-party contributions. A paradigmatic example of software development with these characteristics is the phenomenon of libre (free/open source) software; of special interest are those projects that are large in number of participants and in software size.
Although at first sight these new environments are less controllable than traditional ones (because development is generally done in a geographically distributed way, there is no company behind the development that takes the lead, traditional hierarchical structures are not followed, and external contributions are hard to predict), we have access to much information: the software product itself and many of the by-products that are created during the development process (communication archives, bug-tracking systems and versioning systems, among others). These data sources are usually publicly available on the Internet, so we can perform exhaustive analyses on a large amount of data (much of which is hard to obtain in traditional, industrial environments).
The goal of this thesis is to identify the data sources that libre software projects offer publicly, to present some methodologies for the analysis of these sources and of the data that we can extract from them, and to show the results that have been obtained by applying these methodologies. Our intention is, in particular, to understand the libre software phenomenon better, but also software creation processes in general, since the acquired knowledge need not be specific to libre software and could be applied to many other development environments.
Thus, we will start in this thesis with a description of the data sources publicly available on the Internet and the data that we can extract from them. Afterwards, several methods, which depend on the source, will be used to obtain information from the data and to filter out interference. Finally, several methodologies will be presented and applied to the data obtained from libre software projects that have been selected as case studies. The methodologies will range from classical to novel ones. Among the classical ones, we will perform an analysis of the growth of software systems, as known from software evolution, and we will apply social network analysis, a technique from the field of social sciences. In both cases, the contribution of this thesis has been to apply them to libre software projects. Regarding novel methodologies, we propose the archaeological analysis of software systems with the aim of determining what remains from previous versions, the generalization of software evolution to file types other than source code (for instance, documentation, translation or user interface files, among others) and the study of the evolution of volunteer participation and the regeneration of the leading "core" group. Also, a series of tools has been created to automate, at least partially, the whole process. These tools make it possible to reuse these methodologies on other projects.
Among the main contributions of this thesis, we can state that this is the first exhaustive analysis of a large number of software projects, although the proposed methodologies and the tools that have been developed will allow more projects to be studied in the near future. In addition, we have shown that technical analysis should be complemented with socio-technical analysis to fully understand the development process and many of the technical issues of (libre) software projects.
Journals and book chapters
- Beyond Source Code: The Importance of Other Source Artifacts in Software Development (a Case Study)
Gregorio Robles, Jesús M. González Barahona and Juan Julián Merelo Guervós
Journal of Systems and Software. Elsevier. September 2006
Volume 79 Issue 9. Pages 1233-1248. ISSN: 0164-1212.
- Applying Social Network Analysis Techniques to Community-driven Libre Software Projects
Luis López, Gregorio Robles, Jesús M. González Barahona and Israel Herraiz
International Journal of Information Technology and Web Engineering
Volume 1 Issue 3. Pages 27-48. ISSN: 1554-1045. IDEA Group. July-September 2006
- Contributor Turnover in Libre Software Projects
Gregorio Robles and Jesús M. González Barahona
In Book: "Open Source Systems" edited by Ernesto Damiani, Brian Fitzgerald, Walt Scacchi, Marco Scotto and Giancarlo Succi
IFIP. Pages 270-283. ISBN 0387342257. May 2006
- Analyzing the Anatomy of two GNU/Linux distributions: Methodology and case studies (Red Hat and Debian)
Jesús M. González Barahona, Gregorio Robles, Miguel Ortuño, Luis Rodero, José Centeno, Vicente Matellán, Eva Castro and Pedro de las Heras
In Book: "Free/Open Source Software Development" edited by Stefan Koch
Idea Group. Pages 27-58. ISBN: 1-59140-369-3. July 2004
Conferences and workshops
- Empirical Software Engineering Research on Free/Libre/Open Source Software
Gregorio Robles
International Conference on Software Maintenance (ICSM 2006)
Philadelphia (US), September 2006. IEEE Computer Society. In press.
- Mining Large Software Compilations over Time: Another Perspective of Software Evolution (Best-paper award)
Gregorio Robles, Jesús M. González-Barahona, Martin Michlmayr and Juan José Amor
International Workshop on Mining Software Repositories (MSR 2006)
Shanghai (China), May 2006. IEEE Computer Society. Pages 3-9. ISBN: 1-59593-085-X.
- Entry Patterns in Global Distributed Software Projects
Israel Herraiz, Gregorio Robles, Jesús M. González Barahona, Juan José Amor and Teófilo Romera
First International Workshop on Global Software Development for the Practitioner
Shanghai (China), May 2006. IEEE Computer Society. Pages 3-6. ISBN 1-59593-085-X.
- Evolution and Growth in Large Libre Software Projects
Gregorio Robles, Juan José Amor, Jesús M. González-Barahona and Israel Herraiz
8th International Workshop on Principles in Software Evolution
Lisbon (Portugal), September 2005. IEEE Computer Society. Pages 165-174. ISBN: 0-7695-2349-8.
- Evolution of volunteer participation in libre software projects: evidence from Debian
Gregorio Robles, Jesús M. González Barahona and Martin Michlmayr.
1st International Conference on Open Source Systems, Genoa (Italy). Pages 100-107. ISBN: 88-7544-048-4. July 2005
- Developer identification methods for integrated data from various sources
Gregorio Robles and Jesús M. González Barahona
2nd Workshop on Mining Software Repositories
St. Louis (USA), May 2005. IEEE Computer Society. Pages 106-110. ISBN: 1-59593-123-6.
- Self-organized development in libre software projects: a model based on the stigmergy concept
Gregorio Robles, Juan Julián Merelo Guervós and Jesús M. González Barahona
6th International Workshop on Software Process Simulation and Modeling (ProSim 2005)
St. Louis (MO, USA), May 2005. ISBN: 3-8167-6761-3
- Executable source code and non-executable source code: analysis and relationships
Gregorio Robles and Jesús M. González Barahona
4th Workshop on Source Code Analysis and Manipulation (SCAM 2004)
Chicago (IL, USA), September 2004. IEEE Computer Society. Pages 149-157. ISBN: 0-7695-2144-4
- Remote analysis and measurement of libre software systems by means of the CVSAnalY tool
Gregorio Robles, Stefan Koch and Jesús M. González Barahona
2nd ICSE Workshop on Remote Analysis and Measurement of Software Systems (RAMSS '04)
Edinburgh (Scotland, UK), May 2004. IEEE Computer Society. Pages 51-56. ISBN: 0-86341-423-3.
- Community structure of modules in the Apache project
Jesús M. González Barahona, Luis López and Gregorio Robles
4th Workshop on Open Source Software Engineering
Edinburgh (Scotland, UK), May 2004. IEEE Computer Society. Pages 43-47. ISBN: 0-86341-423-0.
- Applying Social Network Analysis to the Information in CVS Repositories
Luis López, Jesús M. González Barahona and Gregorio Robles
1st Workshop on Mining Software Repositories (MSR)
Edinburgh (Scotland, UK), May 2004. IEEE Computer Society. Pages 101-105. ISBN: 0-86341-432-X.
- GlueTheos: Automating the Retrieval and Analysis of Data from Publicly Available Repositories
Gregorio Robles, Jesús M. González Barahona and Rishab Aiyer Ghosh
1st Workshop on Mining Software Repositories (MSR)
Edinburgh (Scotland, UK), May 2004. IEEE Computer Society. Pages 28-31. ISBN: 0-86341-432-X.
Posters
- An Empirical Approach to Software Archaeology
Gregorio Robles, Jesús M. González Barahona and Israel Herraiz
21st International Conference on Software Maintenance
Budapest (Hungary), September 2005. ICSM 2005 Poster Proceedings. Pages 47-50. ISBN: 963-460-981-3
(c) 2005 Gregorio Robles. This work is licensed under a Creative Commons Attribution-ShareAlike license.