Monday, August 21, 2017

Time flies! We are in the final week already!

Since my last blog post, I have been working on adding a module to the Stingray library that would enable us to run blocks of code in parallel, evenly distributed across a machine's cores.
I do believe it will be a great asset once it is done right.

Unfortunately, the first challenge is performing multiprocessing in Python. I had always heard about the ugly Python monster named the "GIL" (Global Interpreter Lock), but I never understood why some people hate it so much. Now I do: every strategy I came up with to work around the GIL failed, and it always came back to hit me.

We tried using some external libraries to help us. The downside is that whenever there is an error, you must dig deep inside the external library's code to figure out what is causing the problem, not to mention that debugging a multi-threaded program is already a challenge.
However, my mentor suggested using Python's built-in multiprocessing module, which works around the GIL by creating multiple Python processes, each with its own interpreter.
It actually achieved good results, and being able to customize it as much as I want without much limitation is great. Of course it has a downside: no errors are handled automatically, so every kind of exception must be handled explicitly.
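To make the idea concrete, here is a minimal sketch of that approach. It is not the actual Stingray module: the names `square`, `safe_call`, and `run_parallel` are hypothetical and exist only for illustration. It spreads work over a pool of worker processes and catches exceptions inside each worker, so one failure is reported instead of taking down the whole run.

```python
import multiprocessing as mp
import traceback


def square(x):
    """Hypothetical work function; rejects negative input to trigger an error."""
    if x < 0:
        raise ValueError(f"negative input: {x}")
    return x * x


def safe_call(args):
    """Run func(item) and return (ok, payload) instead of raising.

    On failure the payload is the formatted traceback as a string,
    which can always be sent back to the parent process.
    """
    func, item = args
    try:
        return True, func(item)
    except Exception:
        return False, traceback.format_exc()


def run_parallel(func, items, processes=None):
    """Map func over items using worker processes.

    processes=None lets the pool pick a default based on the machine's cores.
    Returns (results, errors), each in the order of the input items.
    """
    with mp.Pool(processes=processes) as pool:
        outcomes = pool.map(safe_call, [(func, item) for item in items])
    results = [payload for ok, payload in outcomes if ok]
    errors = [payload for ok, payload in outcomes if not ok]
    return results, errors


if __name__ == "__main__":
    results, errors = run_parallel(square, [1, 2, -3, 4])
    print(results)      # squares of the valid inputs
    print(len(errors))  # number of inputs that raised
```

Returning the traceback as text is a deliberate choice in this sketch: strings always survive the trip between processes, while some exception objects cannot be pickled.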

My goal is to have this module tested and ready before the final deadline. If I make it in time, I believe it will have a great impact on Stingray. I also plan to continue developing it even after GSoC ends.

GSoC is a really great experience, and it teaches you a lot, not just on a technical level. Working on a huge project with developers from all across the world helps your work mature enough to meet the standards.

Monday, August 7, 2017

They say debugging is like investigating a crime scene where you're the murderer.

We have been planning to integrate a Python distributed computing library named "Dask" into our Stingray library. After planning, I started implementing it step by step. Everything was moving smoothly and I was almost done with the very first working example. I ran the library's default tests, and all of them passed except for a single one!

It didn't look so bad, since 47 tests had passed and only a single test failed, yet I have never been so wrong. The failing test was the beginning of a three-day nightmare! For three days straight I made no progress; all I did was debug continuously, trying to figure out what was causing the bug.

At the end of the second day I started to feel powerless and stuck, so I asked my mentor for help. He responded immediately and asked me to write him a failing test so he could investigate it himself.
Shortly afterwards, he joined me in the debugging fiesta. We spent the rest of the day investigating.
By the third day I had almost lost hope and was about to postpone the debugging and begin a new task, but my mentor advised otherwise, so I went back to debugging, and suddenly, there it was! A single word of code was causing all this mess. Literally all I had to do was remove a single word!

This was by far the most challenging bug I've faced during GSoC. I've learned how much trouble a single word of code can cause, and I hope I never face a similar bug again!