
WoPDaSD 2010: 5th Workshop on Public Data about Software Development

WoPDaSD, co-located with OSS, will be held in Notre Dame (IN, USA) on June 2nd. It will be a place to discuss topics related to the retrieval, analysis and mining of public data about software development, and to present research results. Topics include: how these large datasets about FLOSS software development are retrieved, how they can be analyzed and mined, how they can be published, exchanged and extended, which lessons we are learning from their use, and which results are being obtained from their analysis.

Co-located with The 6th International Conference on Open Source Systems

 

June 2nd, 2010. Notre Dame, IN, USA

 

Introduction | Goals | Detailed Description | Target Audience | Submissions | Publication | Dates | Registration | Organizing and Program Committees

WoPDaSD 2006, 2007, 2008 and 2009

 

Introduction

  Projects such as FLOSSmole and FLOSSMetrics are compiling huge quantities of data about libre (free, open source) software development. The availability of these data in formats suitable for analysis by third parties is enabling researchers to focus on the study of the data, and not on data retrieval activities. This is fortunate, since data retrieval from software development repositories is becoming more and more complex, especially when reliable and detailed information from many projects is needed.

  The use for research purposes of this kind of data, compiled by teams external to the researcher, is posing new problems. Annotation of data, exchange formats, traceability and privacy are becoming issues to be addressed. In addition, working with FLOSS projects to ease access to their data, and showing them how that can benefit their activities, is also of increasing importance.

  Despite these open issues, the use of these open datasets is helping researchers in many ways: results are easier to reproduce; massive analysis (based on data from hundreds or even thousands of projects) is possible; results can be obtained more quickly; and the data become available to research communities with little experience in retrieving data from software repositories.

  Studies and research results based on this kind of dataset have already been presented in workshops, conferences and journals, but the focus is rarely on how to benefit from the datasets, or on the problems derived from their use. In addition, the details of how to use the datasets for different purposes, or specific results from their analysis, are not published elsewhere.

  This workshop is once again (for the fifth year in a row) a place to discuss all these topics, and to present research results developed with these ideas in mind: how these large datasets about FLOSS software development are retrieved, how they can be analyzed and mined, how they can be published, exchanged and extended, which lessons we are learning from their use, and which results are being obtained from their analysis.

Goals

  The goal of this workshop is to foster the production and analysis of publicly available data sources about software development and the exchange of data between different research groups. The workshop is aimed at the following kinds of studies (although other related studies could also be considered):

  • Results based on the analysis of large datasets about software development. This refers mainly to research conducted on FLOSSmole or FLOSSMetrics data, but also on other similar open source datasets. The analysis should present a methodology for exploring the projects, and should also explain any "odd" things that appear in the dataset. For instance, a company-driven project can show different behavior than a community-driven project. The study can be in the field of software engineering, economics, sociology, human resources, and others.
  • Retrieval process and exchange formats of publicly available data collections about software development. The data collections presented should be publicly available, be based on public data (so that other groups can reproduce the data collection process), and be related to the field of software development. This includes, but is not limited to, data from source code control systems, bug tracking systems, mailing lists, websites, source and binary code, quality assurance systems, etc. Although any kind of data collection can be considered, those including information about a large number of projects will be considered especially appropriate.
  • Data mining activities and new retrieval tools. Working with a huge quantity of data adds complexity to storage and analysis. Data mining techniques are welcome in this section, provided that papers include some conclusions about a specific set of projects. Again, this analysis should present a methodology for exploring the data and explain the whole process. Cross-analysis of datasets, particularly of those provided by the organizers (the FLOSSmole and FLOSSMetrics databases), is especially welcome. New tools developed to obtain data from several data sources, such as forums, wikis, bug tracking systems and others, also fit perfectly here.
  • Usage of public datasets about software development by new research communities, which until now did little empirical research in this area because they lacked the expertise needed to retrieve information directly from the repositories, but are now empowered by the availability of these datasets. Research results produced by these communities, use cases, problems found, etc. are possible contributions to the workshop.

Detailed Description

  Following the goals described above, the workshop will consider papers about two specific issues (not taking into account the development of new data mining tools, which will also be considered):

  • Analysis of data collections about libre software development: FLOSSmole, FLOSSMetrics, and other similar collections. These collections, already available to any researcher, are offered for analysis by third parties (see below). The studies submitted should detail how the collections have been used, which part of the information has been considered, and how the data have been validated, filtered and/or post-processed (if that is the case). The description should be detailed enough to let any other research group reproduce the study.
  • Studies about the retrieval and preparation for public consumption of other datasets in the same realm, which could be proposed for analysis in future editions of the workshop.

FLOSSmole
  FLOSSmole is a set of tools for gathering data (metrics) about the development of free/libre/open source projects. The FLOSSmole project also publishes the resulting analyses of FLOSS projects, and accepts data donations from other research groups. It offers researchers an extensive set of data gathered from the SourceForge development platform and the Freshmeat directory, as well as from Rubyforge, Objectweb, Free Software Foundation, GitHub and Savannah. More information can be obtained from http://flossmole.org
FLOSSMetrics
  FLOSSMetrics maintains a database with data from thousands of projects. Currently, the project is working on the retrieval of data, with information now available for about 3,000 projects (mainly retrieved from CVS, SVN and git repositories, but also mailing lists and issue tracking systems). These results are publicly available at http://melquiades.flossmetrics.org

Target Audience

  The target audience consists of research groups interested in empirical software engineering and quantitative studies of software development processes and methods. This includes not only software engineers, but also researchers from other fields who might use the data for economic, social and other studies.

Submissions

  We solicit short position papers (3 pages) and research papers (6 pages). Short papers are expected to discuss controversial issues in the field, or describe interesting or thought-provoking ideas that are not yet fully developed, while research papers are expected to describe new research results, and to have a higher degree of technical rigor than short papers. Papers must be in ACM 2-column format.
  An Easychair account has been set up to handle the submission process: http://www.easychair.org/conferences/?conf=wopdasd2010
  For any further information, or any problem related to the Easychair account, please contact jgb_at_libresoft_dot_es, megan_at_elon_dot_edu or dizquierdo_at_libresoft_dot_es.

Important dates

  • Intent to submit: 20th March 2010 (not mandatory, only for organizational purposes)
  • Deadline for submission: 25th March 2010
  • Paper notification: 8th April 2010
  • Camera-ready paper due: 5th May 2010
  • Workshop date: 2nd June 2010

Publication of selected papers

  The papers presented in the workshop will be published on-line, as Proceedings of the WoPDaSD.

  Authors of selected papers will be invited to submit extended, improved versions for a special issue of the International Journal of Open Source Software and Processes (IJOSSP).

Registration

  Registration for WoPDaSD will be free for those registering for the whole OSS Conference. Those interested in attending only the workshop will be able to register for a small fee. Details are available on the OSS website.

Organizing and Program Committee

Organizing Committee

  • Jesus M. Gonzalez-Barahona (Universidad Rey Juan Carlos, Spain)
  • Megan Squire (Elon University, USA)
  • Daniel Izquierdo-Cortazar (Universidad Rey Juan Carlos, Spain)

Program Committee

  • Kevin Crowston (Syracuse University, USA)
  • Jean-Christophe Deprez (CETIC, Belgium)
  • Justin Erenkrantz (Apache, USA)
  • Juan Fernández-Ramil (Open University, UK)
  • Daniel M. German (University of Victoria, Canada)
  • Charles D. Knutson (Brigham Young University, USA)
  • Stefan Koch (Wirtschaftsuniversitat Vienna, Austria)
  • Bart Massey (Portland State University, USA)
  • Sandro Morasca (Universita dell'Insubria, Italy)
  • Gregorio Robles (Universidad Rey Juan Carlos, Spain)
  • Francesco Rullani (Copenhagen Business School, Denmark)
  • Walt Scacchi (University of California at Irvine, USA)
  • Tony Wassermann (Carnegie Mellon Silicon Valley, USA)
  • Jim Whitehead (University of California at Santa Cruz, USA)
  • Thomas Zimmermann (University of Calgary, Canada)

 
