Workshop on Public Data about Software Development (WoPDaSD 2007) (June 2007)
Co-located with the Third International Conference on Open Source Systems.
Motivation
In recent years, and especially thanks to the huge availability of data about software development that can be obtained from libre (free, open source) projects, the research community has started to produce, use and exchange large data sets. These data sets have to be retrieved, cleaned and described, and can then be published for use by other groups. Their availability allows for the decoupling of research activities (some groups can focus on data retrieval and preliminary analysis, while others can devote themselves to more in-depth analysis without bothering with data retrieval), the reproducibility of research results, and even collaboration (and competition) in the analysis of data.
All this activity is being presented in several workshops and conferences, but a single place to exchange experiences does not exist yet. We propose this workshop as such a place, where researchers in the field can discuss specifically these kinds of data sets: how they are retrieved, how they can be analyzed and mined, how they can be exchanged and complemented, etc.
Main goals
The goal of this workshop is to foster the analysis of publicly available data sources about software development and the exchange of data between different research groups.
The workshop is aimed specifically at two different kinds of studies:
- Analysis of some data collections about software development (provided by the organizers, see below).
  The analysis should show a methodology for exploring any of those data sets (or better, for relating both), searching for some specific result in the area of software development, and its application to the actual data sets. The study can be in the field of software engineering, economics, sociology, human resources, and others.
- Retrieval process and exchange formats of publicly available data collections about software development.
  The data collections presented should be publicly available, be based themselves on public data (so that other groups can reproduce the data collection process), and be related to the field of software development. This includes, but is not limited to, data from source control systems, bug tracking systems, mailing lists, websites, source and binary code, quality assurance systems, etc. Although any kind of data collection can be considered, those including information about a large number of projects will be considered especially appropriate.
Detailed description
Following the goals described above, the workshop will accept papers about two specific issues:
- Analysis of two data collections about libre software development: FLOSSMole and CVSAnaly-SF.
  These collections, already available to any researcher, are offered for the analysis. The studies submitted should detail how they have been used, which part of the information has been considered, and how they have been validated, filtered and/or post-processed (if that is the case). The description should be detailed enough to let any other research group reproduce the study.
- Studies about the data retrieval and preparation for public consumption of data sets in the same realm, which could be proposed for analysis in future editions of the workshop.
FLOSSMole
FLOSSMole (formerly OSSmole) is a set of tools for gathering data (metrics) about the development of free/libre/open source projects. The FLOSSMole project also publishes the resulting analyses of FLOSS projects, and accepts data donations from other research groups. It offers this workshop a complete set of data gathered from the SourceForge development platform and the Freshmeat announcement system.
More information can be obtained from http://ossmole.sourceforge.net
CVSAnaly-SF
CVSAnalY is a tool created by the Libre Software Engineering Group at the Universidad Rey Juan Carlos that extracts statistical information from CVS (and, recently, Subversion) repository logs and transforms it into SQL database formats. It has been used to retrieve information for all projects that have an active CVS system at SourceForge. This data set is publicly offered to be analyzed in this workshop.
More information can be obtained from http://libresoft.urjc.es/Data
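As a rough illustration of the kind of transformation described above (repository log to SQL table), the following minimal Python sketch parses a hand-written excerpt in the style of `cvs log` output and loads it into an in-memory SQLite table. It is not CVSAnalY's code or schema: the sample log text, the `commits` table, and the `parse_revisions` and `load_into_sqlite` helpers are all assumptions made for this example.

```python
import re
import sqlite3

# Hand-written sample in the style of `cvs log` output (an assumption for
# illustration; real logs carry more fields and per-file headers).
SAMPLE_LOG = """\
revision 1.2
date: 2007/03/01 10:15:00;  author: alice;  state: Exp;  lines: +10 -2
Fix parser bug
----------------------------
revision 1.1
date: 2007/02/20 09:00:00;  author: bob;  state: Exp;
Initial import
"""

# Hypothetical pattern: a revision line followed by its date/author line.
REVISION_RE = re.compile(
    r"^revision (?P<rev>[\d.]+)\n"
    r"date: (?P<date>[^;]+);\s+author: (?P<author>[^;]+);",
    re.MULTILINE,
)


def parse_revisions(log_text):
    """Return (revision, date, author) tuples found in the log text."""
    return [
        (m.group("rev"), m.group("date"), m.group("author"))
        for m in REVISION_RE.finditer(log_text)
    ]


def load_into_sqlite(rows):
    """Store the parsed rows in an in-memory SQLite table named `commits`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE commits (revision TEXT, date TEXT, author TEXT)")
    conn.executemany("INSERT INTO commits VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


conn = load_into_sqlite(parse_revisions(SAMPLE_LOG))
# Count commits per author, the sort of statistic such a database supports.
commits_per_author = conn.execute(
    "SELECT author, COUNT(*) FROM commits GROUP BY author ORDER BY author"
).fetchall()
print(commits_per_author)
```

Once the log entries are in a table, questions such as "how many commits per author" become plain SQL queries, which is what makes a shared database dump convenient for groups that did not perform the retrieval themselves.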
Target audience
The target audience is composed of research groups interested in empirical software engineering and quantitative studies of software development processes and methods. This includes not only software engineers, but also researchers from other fields who might use the data for economic, social and other studies.
Submissions
We solicit short position papers (3 pages) and research papers (6 pages). Short papers are expected to discuss controversial issues in the field, or to describe interesting or thought-provoking ideas that are not yet fully developed, while full papers are expected to describe new research results and to have a higher degree of technical rigor than short papers. The papers must be in ACM 2-column format. Authors may indicate their intent to submit a paper by April 16th 2007 (the title of the paper and the abstract should be sent by e-mail to grex@gsyc.escet.urjc.es). The full paper should be sent as PDF to grex@gsyc.escet.urjc.es by April 25th 2007 (extended to May 4th 2007!). Notification of acceptance will be sent by May 26th 2007. The final version of the paper is due on June 5th 2007. Accepted papers will be published as part of the WoPDaSD proceedings (which will be available on the Internet).
Program
Place (see details in the OSS 2007 website):
- Jean Holland Theatre
- Main Building (beside cafeteria)
- University of Limerick
Schedule:
- 9:00 Registration and shake-hands session
- 9:15 "Opening: WoPDaSD presentation", Megan Conklin (motivation of the workshop, last edition, proposals for this year, schedule, mention of the sponsoring projects, etc.).
- 9:45 "On the convenience of research repositories with public information about software development", Jesus M. Gonzalez-Barahona
- 10:30 Break
- 11:00 First round of (4) reviewed presentations (20 min. each, including clarification questions)
  - "Studying Production Phase SourceForge Projects: An Exploratory Analysis Using cvs2mysql and SFRA+" (PDF), Daniel P. Delorey, Charles D. Knutson, Alex MacLean
  - "Working with Open Source Development Data" (PDF), Matthijs den Besten, Hela Masmoudi, Jean-Michel Dalle
  - "A Preliminary Analysis of Publicly Available FLOSS Measurements: Towards Discovering Maintainability Trends" (PDF), Ioannis Samoladas, Stamatia Bibi, Ioannis Stamelos, Sulayman Sowe, Ignatios Deligiannis
  - "Using FLOSSmole Data in Determining Business Readiness Ratings" (PDF), Ashutosh Das, Anthony I Wasserman
- 12:20 Discussion about issues raised by this first round of presentations
- 13:00 Lunch
- 14:00 Second round of (3) reviewed presentations (20 min. each, including clarification questions)
  - "Programming Language Trends in Open Source Development: An Evaluation Using Data from All Production Phase SourceForge Projects" (PDF), Daniel P. Delorey, Charles D. Knutson, C. Giraud-Carrier
  - "Understanding the KDE Social Structure through Mining of Email Archive" (PDF, Appendix), Matthias Studer, Nicolas S. Müller, Gilbert Ritschard
  - "Simulation of the temporal evolution of OSS projects" (PDF), Ioannis Antoniades, Michalis Kontoyiannis, and Ioannis Stamelos
- 15:00 Discussion about issues raised by this second round of presentations
- 15:30 Break
- 16:00 "ResearchFriendly: friendly to researchers, friendly to developers", Gregorio Robles
- 16:30 "Panel: the future of research based on public information about software development": Walt Scacchi, Greg Madey, Megan Conklin, Gregorio Robles
- 17:15 Open discussion and future of the workshop
- 18:00 Closing
Challenge
In this edition, a specific challenge is proposed to contributors, in addition to regular papers. The topic of the challenge is "data visualization". To participate, contributors should send a short paper (2 pages of text, up to 4 pages of figures) visualizing the data in any of the datasets offered (FLOSSMole, CVSAnaly-SF, or both).
The text in the paper should explain the visualization technique used, and its possible applications. The images in the paper should be the visualization images themselves, or snapshots of them. Visualization techniques that help to answer interesting questions, to better understand the data, or to find relationships in it (including relating data in both datasets) are encouraged.
Each paper will undergo a thorough review, and accepted challenge papers will be published as part of the workshop proceedings. Authors of selected challenge papers will be invited to give a presentation at a special session at the workshop. Deadlines for challenge papers will be the same as for regular papers.
Important Dates
- Intent to submit: 16th April 2007 (not mandatory, only for organizational purposes)
- Deadline for submission: 25th April 2007 (extended to May 4th 2007!)
- Paper notification: 26th May 2007
- Camera-ready paper due: 5th June 2007
- Workshop date: 14th June 2007
Organizing Committee
- Jesus M. Gonzalez-Barahona (Universidad Rey Juan Carlos, Spain)
- Megan Conklin (Elon University, USA)
- Gregorio Robles (Universidad Rey Juan Carlos, Spain)
Program Committee
- Kevin Crowston (Syracuse University, USA)
- Jean-Christophe Deprez (CETIC, Belgium)
- Roberto Di Cosmo (University of Paris VII, France)
- Daniel M. Germán (University of Victoria, Canada)
- Stefan Koch (Wirtschaftsuniversität Vienna, Austria)
- Ken Krugler (Krugle, Inc., USA)
- Bart Massey (Portland State University, USA)
- Sandro Morasca (Università dell'Insubria, Italy)
- Francesco Rullani (Copenhagen Business School, Denmark)
- Walt Scacchi (University of California at Irvine, USA)
- Diomidis Spinellis (Athens University of Economics and Business, Greece)
- Giancarlo Succi (Free University of Bozen-Bolzano, Italy)
- Tony Wasserman (Carnegie Mellon West, USA)
- Dawid Weiss (Poznan University of Technology, Poland)
- Jim Whitehead (University of California at Santa Cruz, USA)
- Thomas Zimmermann (Universität des Saarlandes, Germany)
Sponsoring projects
Several research projects sponsor this workshop (although it is open to anyone who registers). Some of these projects are funded in part by the European Commission, under the Information Society Technologies (IST) research programme of the Sixth Framework Programme. A list of the IST projects in the area of Software Technologies is available from http://cordis.europa.eu/ist/st/projects.htm.
Further information
Should you need further information, you can email jgb @ gsyc.escet.urjc.es.
