Week 1 summary

This past week was the first of the twelve weeks of Google Summer of Code. In case you are not familiar with my project, it is to create reference implementations of several metrics, specifically those of the Evolution Working Group. These metrics are being implemented using Jupyter notebooks, pandas and Matplotlib.

Every week, I’ll summarize my tasks for that week, any problems I encountered, my tasks for the next week, as well as the meeting with my mentors held at the end of the week.

Tasks for week 1

  • Create an issue in the Evolution Working Group repo regarding the new format of reference implementations.
  • Create an implementation of the Code_Changes-git metric following the new structure and push it to my project-tracker repository.

Summary

  • Issue regarding new format (#153)
    In case you missed the new format (it was discussed in the last post), it involves a single class in a notebook, which will be imported as part of a module for use in various projects. I opened the issue, and it can be found here.

  • Create the new implementation (pandas) for the Code_Changes-git metric
    The existing implementation has two parts: the class and the external analysis. For the new implementation, all I had to do was move the analysis into the class.

    The two most important methods, since they directly compute the metric, are compute and compute_timeseries. compute calculates a single value of the metric per repository, while compute_timeseries calculates it over a regular interval such as a month or a week. Using compute_timeseries therefore gives you a DataFrame with one row per “period”, each holding the metric value for that period.

    The pseudo-code for compute_timeseries is below:

    compute_timeseries(period)
      for each repository in Perceval_data:
          group the repository's data by year, then by period, into a DataFrame df
    
          create a separate DataFrame all_periods containing every "period"-sized interval from the start date to the end date of the data
    
          do an outer merge of all_periods with df and add the result to a dictionary, keyed by repository
    
      return the dictionary of DataFrames, one per repository
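A minimal pandas sketch of both methods follows. It is not the actual Code_Changes-git code: the class name CodeChanges, the "created_date" column, and the assumption that each repository's commits are already loaded into their own DataFrame are all hypothetical. compute returns one commit count per repository, while compute_timeseries follows the pseudo-code: it buckets commits into periods (a pandas Period already encodes the year plus the month or week), builds all_periods with pd.period_range, and outer-merges the two. Filling empty periods with 0 after the merge is my own choice here.

```python
import pandas as pd


class CodeChanges:
    """Hypothetical, simplified stand-in for the Code_Changes-git metric class."""

    def __init__(self, commits_by_repo):
        # Assumed input: {repository URL: DataFrame with a datetime "created_date" column}
        self.commits_by_repo = commits_by_repo

    def compute(self):
        # One value per repository: the total number of commits.
        return {repo: len(df) for repo, df in self.commits_by_repo.items()}

    def compute_timeseries(self, period="M"):
        # period is a pandas period alias, e.g. "M" (month) or "W" (week).
        results = {}
        for repo, df in self.commits_by_repo.items():
            if df.empty:
                results[repo] = pd.DataFrame(columns=["period", "count"])
                continue
            # A Period such as 2019-01 covers both the year and the month/week.
            periods = df["created_date"].dt.to_period(period)
            counts = (
                df.groupby(periods)
                .size()
                .rename("count")
                .rename_axis("period")
                .reset_index()
            )
            # Every period between the first and last commit, including empty ones.
            all_periods = pd.DataFrame(
                {"period": pd.period_range(periods.min(), periods.max(), freq=period)}
            )
            merged = all_periods.merge(counts, on="period", how="outer")
            # Periods without commits come out of the merge as NaN; report them as 0.
            merged["count"] = merged["count"].fillna(0).astype(int)
            results[repo] = merged
        return results
```

For example, three commits in January and March 2019 give a monthly series with a 0 row for February, so gaps in activity stay visible in plots.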
    


Meeting details

Today, I had the regular weekly meeting with my mentors about my progress. It is summarized below.

Agenda

  • Change in new implementation structure
  • Tasks for the next week

Summary

  • Change in new implementation structure
    My mentors (Jesus, Valerio and Pranjal) and I felt that creating a parent class for the entire module of Python scripts generated from the reference implementations would be a good idea. This was discussed in the latest weekly IRC meeting. The class will hold all of the existing reusable functionality, such as reading data from Perceval and converting it to a pandas DataFrame. Creating it is part of my tasks for the coming week.

  • Tasks for the next week

    • Create the root class (#154) as discussed. All metrics will inherit reusable code from it, which will also make it easier to create a module of all the implementations.

    • Modify the Code_Changes-git implementation (#155) in my project tracker according to the latest changes, namely a root class that the metric class will inherit from. Once it is approved, it will be added to the Evolution Working Group repository.

    • Create a reference implementation of Code_Changes-git without pandas (#156). The metrics can be implemented without pandas, and since some pandas code can be somewhat complex, this idea was introduced. Not much has been discussed yet; the discussion will take place in issue #156 linked above.

    • Create implementations for the next few metrics (#157) based on my proposal timeline. The metrics to be worked on will most likely be:

The log for this meeting can be found here.
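A minimal sketch of the root class idea from the meeting follows, assuming Perceval items are plain dicts with an "origin" key (the repository URL) and the commit fields under "data". The class names Metric and CodeChangesGit, the flatten and from_json_lines helpers, and the one-JSON-item-per-line file format are all hypothetical, not the code planned in #154. The parent class does the loading and DataFrame conversion once; each metric only says which fields it needs and how to compute itself.

```python
import json

import pandas as pd


class Metric:
    """Hypothetical root class: loading/conversion code shared by all metrics."""

    def __init__(self, items):
        # items: list of Perceval-style dicts (assumed "origin" and "data" keys)
        self.df = pd.DataFrame([self.flatten(item) for item in items])

    @classmethod
    def from_json_lines(cls, path):
        # Assumes one JSON-encoded item per line in the file at `path`.
        with open(path, encoding="utf-8") as f:
            items = [json.loads(line) for line in f if line.strip()]
        return cls(items)

    def flatten(self, item):
        # Each metric picks the fields it needs from a raw item.
        raise NotImplementedError

    def compute(self):
        raise NotImplementedError


class CodeChangesGit(Metric):
    """Hypothetical metric subclass: only metric-specific logic lives here."""

    def flatten(self, item):
        return {"repo": item["origin"], "hash": item["data"]["commit"]}

    def compute(self):
        # Number of commits per repository.
        if self.df.empty:
            return {}
        return self.df.groupby("repo").size().to_dict()
```

With this split, adding another metric means overriding flatten and compute only, while the Perceval-reading code stays in one place.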

The next meeting will be on June 7th.