Week 3 summary

The past week mainly revolved around finalizing what a complete reference implementation would look like. Since the GSoC project effectively involves building an entire project, designing each component and the overall structure takes a lot of time. Nonetheless, I am having a great time!

Tasks for Week 3

  • Finish analysis and notebook for CodeChange
  • Create pull request for CodeChanges
  • Create structure and layout for non-pandas implementation
  • Create pull request for OpenIssuesAge
  • Create pull request for Issue Response Time

Summary for week 3

  • Finish analysis and notebook for CodeChange (#162)

    This was probably the easiest task. The only delays came from the constant iterations we went through, but they were a necessary part of arriving at the best possible implementation and project structure.

  • Create pull request for CodeChanges (#162)
    This is where the majority of the action took place. Apart from several small changes and improvements, I:

    • changed the layout of the implementations
    • redesigned the SourceCode class
    • re-implemented _flatten_data
    • fixed a few bugs that caused the metric to be calculated incorrectly
    • improved documentation and updated README
    • redesigned plotting of charts (not pushed yet)
    • renamed modules and classes

    The new structure looks like this:

      implementations/
      ├── code_df (contains the scripts)
      ├── data.json
      ├── notebooks_df (contains all notebooks)
      └── README.md
    
    
  • Create structure and layout for non-pandas implementation (#9)
    This was not a high-priority task this week. Still, I created a rough draft of the layout and opened a pull request in my project tracker; the rest will be done later. The idea, as of now, is that it would closely mirror the “pandas” implementation, but this idea may be scrapped later.

  • Create pull request for OpenIssuesAge (#10)
    Again, the constant iterations, though necessary, caused some problems and delays. For now, I have opened a draft pull request in my project tracker. In this week’s meeting, my mentors and I decided to finish the commit-related implementations first, followed by the other categories. OpenIssuesAge, IssueResponseTime and other issue-related metrics will be on hold until then.

  • Create pull request for Issue Response Time
    Just like OpenIssuesAge, the implementation is complete, but it is on hold until we finish all commit-related metrics.


Meeting details

Today’s weekly IRC meeting with my mentors is summarized below.

Agenda

  • flatten_data implementation
  • Notebooks required for all modules or not
  • source code check

Summary

  • flatten_data implementation
    This method has caused far more trouble than it should have :/. Its main goal is to “flatten” the nested dictionaries fetched by Perceval, keeping only the required fields. As you can probably guess, it can be implemented in several ways. In the end, we decided to go with the following:

      class Metric (not called)
          _flatten_data() 
    
      class Commit (inherits from Metric)
          _flatten_data(), which performs the following:
              + clean the dataframe produced by the Metric class
              + remove commits outside the given (since, until) period
              + remove commits judged to be outside the source code
    

    Previously, only the cleaning was done in a method similar to _flatten_data, called _clean_commit; the date check and source code check happened in the Commit class’s __init__ method. Moving all three steps into _flatten_data helped clean up the code, making it easier to follow.
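    The split between Metric._flatten_data and Commit._flatten_data can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the project's code: the real implementation works on pandas DataFrames, whereas here each item is a small dict loosely modelled on a Perceval git item (`commit`, `CommitDate`, `files`), with dates assumed to be already parsed into `datetime` objects. The `is_source_code` hook, the flattened field names, treating `since` as inclusive and `until` as exclusive, and reading "outside the source code" as "touches no source code files" are all hypothetical choices made for illustration.

```python
from datetime import datetime


class Metric:
    """Base class for metrics (hypothetical sketch; only subclasses are used)."""

    def __init__(self, items, since=None, until=None):
        self.raw_items = items
        self.since = since
        self.until = until
        # Flattening happens once, when the metric object is created
        self.data = self._flatten_data()

    def _flatten_data(self):
        # Generic step: keep only the nested "data" payload of each item
        return [item["data"] for item in self.raw_items]


class Commit(Metric):
    """Commit-based metric: cleaning, period filter and source code filter."""

    def __init__(self, items, since=None, until=None, is_source_code=None):
        # Hypothetical hook; defaults to "every file is source code" (Naive).
        # It must be set before super().__init__, which calls _flatten_data.
        self.is_source_code = is_source_code or (lambda path: True)
        super().__init__(items, since, until)

    def _flatten_data(self):
        rows = []
        for data in super()._flatten_data():
            # 1. clean: keep only the fields this metric needs
            row = {
                "hash": data["commit"],
                "date": data["CommitDate"],
                "files": [f["file"] for f in data["files"]],
            }
            # 2. drop commits outside the period (since inclusive, until exclusive)
            if self.since is not None and row["date"] < self.since:
                continue
            if self.until is not None and row["date"] >= self.until:
                continue
            # 3. drop commits touching no source code files
            #    (commits with an empty file list are dropped as well)
            if not any(self.is_source_code(path) for path in row["files"]):
                continue
            rows.append(row)
        return rows


# Hypothetical items, loosely modelled on Perceval git items
items = [
    {"data": {"commit": "a1", "CommitDate": datetime(2019, 6, 1),
              "files": [{"file": "src/app.py"}]}},
    {"data": {"commit": "b2", "CommitDate": datetime(2019, 5, 1),
              "files": [{"file": "src/app.py"}]}},
    {"data": {"commit": "c3", "CommitDate": datetime(2019, 6, 2),
              "files": [{"file": "docs/index.md"}]}},
]
commits = Commit(items, since=datetime(2019, 5, 15),
                 is_source_code=lambda path: not path.startswith("docs/"))
print([row["hash"] for row in commits.data])  # ['a1']
```

    Here b2 is dropped by the period check and c3 by the source code check. Keeping the generic step in the base class means every metric subclass can reuse it and only add its own filters on top.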

  • Notebooks required for all modules or not
    It is a good idea to have notebooks for every module. Just like the reference implementations, these notebooks will explain the purpose and usage of the code in their corresponding Python scripts.

  • source code check
    One of the most important tasks of a class that computes metric values is to define what counts as source code. Currently, we have three algorithms:

    • Naive (all files are considered to be a part of source code)
    • FolderExclude (exclude files based on their location)
    • ExtensionExclude (exclude files based on their extension)
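    A minimal sketch of how the three algorithms might look. The class names come from the list above; the shared constructor argument (an exclude list), the `check(file_path)` signature, and the matching rules (path prefixes relative to the repository root for FolderExclude, case-insensitive extensions for ExtensionExclude) are assumptions, not the project's actual code.

```python
import os


class Naive:
    """Every file counts as source code."""

    def __init__(self, exclude_list=None):
        # Accepted (and ignored) so all algorithms share one constructor shape
        self.exclude_list = exclude_list or []

    def check(self, file_path):
        return True


class FolderExclude:
    """Exclude files under any listed folder (assumed relative to repo root)."""

    def __init__(self, exclude_list):
        # Normalise "docs" and "docs/" to the same prefix
        self.folders = [folder.rstrip("/") + "/" for folder in exclude_list]

    def check(self, file_path):
        return not any(file_path.startswith(folder) for folder in self.folders)


class ExtensionExclude:
    """Exclude files whose extension is listed (case-insensitive, dot optional)."""

    def __init__(self, exclude_list):
        self.extensions = {ext.lower().lstrip(".") for ext in exclude_list}

    def check(self, file_path):
        _, ext = os.path.splitext(file_path)
        return ext.lower().lstrip(".") not in self.extensions
```

    Giving all three the same constructor and `check` method is what makes them interchangeable, which matters for the choice between the two calling styles discussed next.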

    We had a choice between two different ways to call these checks:

    1. The user passes the name of the Algorithm class as a string to IsSourceCode, which looks the class up in the globals() dictionary and calls its check method. Something like this:

       class IsSourceCode:
           def __init__(self, ..., algorithm_class):
               # algorithm_class is a string here, e.g. "FolderExclude"
               self.algorithm_obj = globals()[algorithm_class](source_code_exclude_list)

           def check(self, ...):
               return self.algorithm_obj.check(...)
      
    2. The user passes the Algorithm class directly instead of a string. The advantage is transparency: the caller has to explicitly provide every Algorithm class it uses, whereas in the first approach a string is enough.

       class IsSourceCode:
           def __init__(self, ..., algorithm_class):
               # algorithm_class is the class itself, e.g. FolderExclude
               self.algorithm_class = algorithm_class

           def check(self, ...):
               return self.algorithm_class.check(...)
      

      Ultimately, we went with No. 2 for now, due to its slight advantage in clarity. For issues and pull requests, we decided to implement this later, when we work on those metrics.
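    The two calling styles can be compared side by side in a runnable sketch. Assumptions: `IsSourceCodeByName` is a hypothetical name for option 1, a single stand-in `ExtensionExclude` replaces the real algorithm classes, and in option 2 the passed class is instantiated with the exclude list (mirroring option 1) rather than having `check` called on the class itself.

```python
import os


class ExtensionExclude:
    """Stand-in algorithm: exclude files by (exact) extension."""

    def __init__(self, exclude_list):
        self.extensions = set(exclude_list)

    def check(self, file_path):
        return os.path.splitext(file_path)[1] not in self.extensions


# Option 1 (hypothetical name): the algorithm is given as a string
class IsSourceCodeByName:
    def __init__(self, algorithm_name, source_code_exclude_list):
        # Raises KeyError if no class with that name is in this module's globals
        self.algorithm_obj = globals()[algorithm_name](source_code_exclude_list)

    def check(self, file_path):
        return self.algorithm_obj.check(file_path)


# Option 2: the caller passes the algorithm class itself
class IsSourceCode:
    def __init__(self, algorithm_class, source_code_exclude_list):
        self.algorithm_obj = algorithm_class(source_code_exclude_list)

    def check(self, file_path):
        return self.algorithm_obj.check(file_path)


by_name = IsSourceCodeByName("ExtensionExclude", [".md"])
by_class = IsSourceCode(ExtensionExclude, [".md"])
print(by_name.check("README.md"), by_class.check("README.md"))  # False False
print(by_name.check("app.py"), by_class.check("app.py"))  # True True
```

    Both behave the same once constructed. The difference is in failure modes and readability: with option 1, a misspelled string only surfaces at runtime as a `KeyError` from the `globals()` lookup, and only classes defined or imported into that module can be found; with option 2, the reader sees exactly which class is used, and a misspelled class name fails with a `NameError` where the caller references it.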

  • Tasks for the next week

    • Implement the latest suggested changes and get CodeChanges merged.

    • After that, start working on the next commit-related metrics in wg-evolution.

The log for this meeting can be found here.

The next meeting will be on June 24th.
For older GSoC posts, please click here.