Monday, August 21, 2017

Time flies! We are in the final week already!

Since my last blog post, I have been working on adding a module to the Stingray library that would let us run blocks of code in parallel, evenly distributed across a machine's cores.
I do believe that it would be a great asset, once it is done right.

Unfortunately, the first challenge is performing multiprocessing in Python. I had always heard about the ugly Python monster named the "GIL" (Global Interpreter Lock), but I never understood why some people hate it so much. Not anymore: whatever strategy I came up with to work around the GIL, it always came back and hit me.

We tried using some external libraries to help us. The downside is that whenever there is an error, you must dig deep inside the external library's code to figure out what is causing the problem, not to mention that debugging a multi-threaded program is already a challenge.
However, my mentor suggested using Python's built-in multiprocessing module, which works around the GIL by creating multiple Python processes.
It achieved good results, and it is great that I can customize it as much as I want without much limitation. Of course it has a downside: no errors are handled automatically, so all kinds of exceptions and errors must be handled explicitly.
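
To give a rough idea of what "handling errors explicitly" can look like, here is a minimal sketch. It is not the actual Stingray module: the names `safe_call`, `run_parallel` and the toy `reciprocal` workload are made up for illustration, and only `multiprocessing.Pool` and `os.cpu_count` come from the standard library. The idea is that each worker catches its own exception and reports it back, so one bad input does not abort the whole batch.

```python
import multiprocessing as mp
import os


def safe_call(args):
    """Run func(item) and return (ok, result_or_error_message)."""
    func, item = args
    try:
        return True, func(item)
    except Exception as exc:  # report the failure instead of aborting the batch
        return False, f"{type(exc).__name__}: {exc}"


def reciprocal(x):
    """Toy workload (hypothetical); raises ZeroDivisionError for x == 0."""
    return 1.0 / x


def run_parallel(func, items, processes=None):
    """Map func over items, using one worker process per core by default."""
    processes = processes or os.cpu_count() or 1
    with mp.Pool(processes=processes) as pool:
        # pool.map splits the items into chunks and hands them to the workers.
        return pool.map(safe_call, [(func, item) for item in items])


if __name__ == "__main__":
    for ok, value in run_parallel(reciprocal, [1, 2, 0, 4]):
        print("ok    " if ok else "failed", value)
```

The functions sent to the workers are defined at module level so that they can be pickled, and the `if __name__ == "__main__":` guard keeps child processes from re-running the script on platforms that start workers by re-importing the main module.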

My goal is to have this module tested and ready before the final deadline. If I make it in time, I believe it will have a great impact on Stingray. I also plan to continue developing it after GSoC ends.

GSoC is a really great experience that teaches you a lot, and not just on a technical level. Working on a huge project with developers from all across the world helps your work mature enough to meet their standards.

Monday, August 7, 2017

They say debugging is like investigating a crime scene where you're the murderer.

We have been planning to integrate a Python distributed computing library named "Dask" into our Stingray library. After planning, I started implementing it step by step. Everything was moving smoothly and I was almost done with the very first working example. I ran the default tests in the library and all tests passed except for a single one!

It didn't look so bad, as 47 tests had passed and only a single one failed, yet I have never been so wrong. The failing test was the beginning of a three-day nightmare! For three days straight I made no progress; all I did was debug continuously, trying to figure out what was causing the bug.

At the end of the second day I started to feel powerless and stuck, so I asked my mentor for help. He responded immediately and asked me to write him a failing test so he could investigate it himself.
Shortly after, he joined me in the debugging fiesta. We spent the rest of the day investigating.
By the third day I had almost lost hope and was about to postpone the debugging and begin a new task, but my mentor told me otherwise, so I went back to debugging and suddenly, there it was! A single word of code was causing all this mess. Literally all I had to do was remove a single word!

This was by far the most challenging bug I've faced during GSoC. Now that I've learned how much trouble a single word of code can cause, I hope I never face a similar bug again!

Saturday, July 22, 2017

Week 3 - Submitting Pull Requests

Hello again!

It's been almost two months since the GSoC coding phase began, and I can say that I've learned a lot so far, including good coding practices and conventions. They may seem like small things, but they sure do make a difference.

The two pull requests that I mentioned in my last blog post were reviewed and merged last week, and it felt great to have an impact. I am looking forward to submitting more pull requests and learning a lot in the upcoming days.

Sunday, July 9, 2017

Third blog: Getting into the action

So far in my project I have been analyzing the existing code and studying how the algorithms behave under large test data.
After I finished analyzing the first class, most of the results were acceptable, except for two main class methods that produced shocking results. I shared the results with the community and came up with a fast solution that could improve them.

It is very exciting to actually start writing code, implement my ideas in the Stingray library, and have an impact. After I finished the implementation, the results improved more than I expected, as this graph shows.



The blue solid curve shows the old implementation, and the yellow dashed curve shows the new one. What used to take around 700 seconds is now done in under 20 seconds.

I've made my first complete, ready-to-merge pull request, and I am hoping that it gets merged into the main Stingray repo. I am still looking to have more impact on Stingray, and I am sure there is still a lot to learn through my project.

Monday, June 26, 2017

Week 2 blog

I studied algorithms and data structures last year in college, and I usually practice by solving problems on online websites. In all my previous projects, the data sizes I worked with weren't large enough to reveal how differently various algorithms behave. So far in GSoC I have had the chance to experiment and see for myself how behavior differs between algorithms, and it is a great experience to apply what I've studied and get results that match. This is a sample of the results I've obtained. If I plotted the time taken to execute a certain logic using an algorithm having the time complexity of O(N):

Tuesday, June 13, 2017

Hello, my name is Omar Hammad and this is my first blog post with the Python Software Foundation.
My GSoC project is with Stingray. Stingray is a Python library that helps astronomers perform timing analysis to study black holes and track their activity. (How cool is that?)
My project is about optimizing the Python code.
Optimization can come in any form: time or memory efficiency, or even code neatness and structure.
So far my project is going great and I am already learning a lot. For example, when I used to write programs (outside of GSoC), I generally did not pay much attention to how my program would behave under a large number of inputs. Then, when I was running some tests in my GSoC project, my machine would freeze completely. I thought the freeze was caused by an algorithm with high time complexity in the code. What I learned is that when a program uses a large data set and needs more memory than the RAM has available, the operating system starts swapping memory to disk, which is extremely slow compared to RAM.
So I guess I had to learn the hard way to pay attention to how much memory my program uses, and I must always keep that in mind to create scalable, efficient programs. I'm really looking forward to learning more and more throughout the summer and having an impact on Stingray.
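
One simple way to keep an eye on memory is the standard library's `tracemalloc` module. The snippet below is only a small sketch and has nothing to do with Stingray's code: `peak_bytes` and the two `sum_squares_*` functions are hypothetical names used for illustration. It compares the peak memory of building a full list against streaming the same values through a generator.

```python
import tracemalloc


def peak_bytes(func, *args):
    """Return the peak number of bytes traced by tracemalloc while func runs."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    finally:
        tracemalloc.stop()
    return peak


def sum_squares_list(n):
    """Builds the whole list in memory before summing it."""
    return sum([i * i for i in range(n)])


def sum_squares_stream(n):
    """Feeds values to sum() one at a time, so nothing piles up."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    n = 1_000_000
    print("list:  ", peak_bytes(sum_squares_list, n), "bytes")
    print("stream:", peak_bytes(sum_squares_stream, n), "bytes")
```

Both functions return the same number, but the list version has to hold every element at once, which is exactly the kind of habit that can push a program into swap when the input gets large.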