Playing with Data

In the last couple of months I’ve started to experiment with data. I won’t call it big data, because my datasets are rather small: visitors to our holiday apartments, or feedback from EclipseCon Europe attendees. But like a real guy I didn’t want to make do with MySQL; I wanted to play with the new toys. So I installed MongoDB.

Anyway, it turned out to be more difficult than I thought: cleaning up the data and getting it all into the same shape was harder than I expected. But help was just around the corner, coming from Udacity: a course on Data Wrangling, just what I needed, and at just the right time. And forgive me if I don’t bore you with all the details I had to clean up.
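The post doesn’t go into the cleanup details, so the following is only a hypothetical Python sketch of what “getting the data into the same shape” can mean before loading it into MongoDB. The field names, date formats and country aliases are invented for illustration and are not taken from the actual datasets.

```python
from datetime import datetime

# Hypothetical date formats found in the raw data; extend as new ones turn up.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")

# Hypothetical spelling variants, mapped to one canonical country name.
COUNTRY_ALIASES = {
    "de": "Germany",
    "deutschland": "Germany",
    "germany": "Germany",
    "nl": "Netherlands",
    "holland": "Netherlands",
    "netherlands": "Netherlands",
}


def parse_date(raw):
    """Try each known format and return an ISO date string, or None."""
    raw = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_visitor(record):
    """Bring one raw visitor record (a dict) into a single consistent shape."""
    # Different (hypothetical) sources capitalize the field names differently.
    country = (record.get("country") or record.get("Country") or "").strip()
    arrival = record.get("arrival") or record.get("Arrival") or ""
    return {
        # Unknown countries are kept as written; empty ones become None.
        "country": COUNTRY_ALIASES.get(country.lower(), country or None),
        "arrival": parse_date(arrival),
        "nights": int(record.get("nights") or 0),
    }
```

Once every record has the same shape, the list can be handed to MongoDB in one go, for example with pymongo’s `insert_many`.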

I’m certainly not a big data expert yet. But I’m starting to understand how to approach my problems and what I can do with my information. And visualizing that information is even more fun. Now you can guess what that heatmap shows 🙂

Heatmap Example (based on Google Maps)

 

Screen Scraping

After my last post I was actually contacted by two people asking for more current information on the website I had built. In particular, they were interested in the conditions of the winter sports facilities we have in the region (ski lifts, cross-country trails).

I looked around, and the only information available was on the websites of the facility operators: no central place where all the data was collected and made available. Since I had never done screen scraping before, I wasn’t really sure what to do.

Reading up on Stack Overflow and other resources, I learned that I had to fetch an HTML page, turn it into a DOM object and find the right places holding the right information for each facility (closed, open, good conditions, red.gif, green.gif). Looking around, I found a nice helper library that served me very well for my first version: for every web page I wanted data from, I wrote a little PHP script to capture it.
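The first version used PHP and an unnamed helper library. Purely as an illustration of the idea, here is a small Python sketch using the standard library’s `html.parser`, which is event-based rather than a full DOM. The page layout it expects (one `<tr>` per facility, with a `green.gif` or `red.gif` status icon) is an assumption; real operator pages will differ and need their own rules.

```python
from html.parser import HTMLParser

# Hypothetical mapping from status icon file names to a readable status.
ICON_STATUS = {"green.gif": "open", "red.gif": "closed"}


class LiftStatusParser(HTMLParser):
    """Collect (facility name, status) pairs from table rows.

    Assumes (hypothetically) that each facility is one <tr> whose text is
    the facility name and which contains a status icon like green.gif.
    """

    def __init__(self):
        super().__init__()
        self.results = []
        self._in_row = False
        self._text = []
        self._status = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new row starts: reset the collected name and status.
            self._in_row, self._text, self._status = True, [], None
        elif tag == "img" and self._in_row:
            src = dict(attrs).get("src") or ""
            icon = src.rsplit("/", 1)[-1]  # file name without the path
            self._status = ICON_STATUS.get(icon, self._status)

    def handle_data(self, data):
        if self._in_row and data.strip():
            self._text.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "tr" and self._in_row:
            # Only keep rows that had both a name and a known status icon.
            if self._text and self._status:
                self.results.append((" ".join(self._text), self._status))
            self._in_row = False


def scrape_status(html):
    """Return a list of (facility name, status) tuples found in the HTML."""
    parser = LiftStatusParser()
    parser.feed(html)
    parser.close()
    return parser.results
```

Rows without a recognized icon, such as table headers, are simply skipped.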

This worked well for the first facility, whose website was quite responsive. The second one caused more trouble with response times: now I had a 6-second wait before my page was displayed. That wasn’t really acceptable, because I still have two more places to scrape.

gersfeld-ski

So I spent Saturday afternoon making it work asynchronously. That turned out to be quite easy: I kept my PHP scripts but converted them into functions that could be called via AJAX and return JSON data. From there it took only a couple more minutes, and I was done. The page itself displays quickly again, and since the scraped information doesn’t appear in the initially visible part of the browser window, it can afford to take a little longer. Even scrolling down right away is fun: I enjoy watching the data show up!
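The change is only described in words here (PHP functions called via AJAX, returning JSON). A minimal Python sketch of the server-side half might look like this; the `/status/<facility>` URL, the `SCRAPERS` registry and the facility ids are hypothetical, and each scraper is assumed to be a function returning `(name, status)` pairs.

```python
import json
from http.server import BaseHTTPRequestHandler

# Hypothetical registry: facility id -> function returning [(name, status), ...].
SCRAPERS = {}


def status_response(facility, scrapers):
    """Return (HTTP status code, JSON body) for one facility."""
    scraper = scrapers.get(facility)
    if scraper is None:
        return 404, json.dumps({"error": "unknown facility"})
    try:
        lifts = scraper()
    except Exception as exc:
        # A slow or broken operator page only affects its own section.
        return 502, json.dumps({"facility": facility, "error": str(exc)})
    return 200, json.dumps({
        "facility": facility,
        "lifts": [{"name": name, "status": status} for name, status in lifts],
    })


class StatusHandler(BaseHTTPRequestHandler):
    """Answers AJAX requests such as GET /status/<facility> with JSON."""

    def do_GET(self):
        prefix = "/status/"
        if self.path.startswith(prefix):
            code, body = status_response(self.path[len(prefix):], SCRAPERS)
        else:
            code, body = 404, json.dumps({"error": "not found"})
        data = body.encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

The page is served without waiting for any operator site; the browser then requests each facility separately, so a slow site only delays its own section. For local testing the handler could be started with `HTTPServer(("", 8000), StatusHandler).serve_forever()` from `http.server`.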

Ch’ti

So tonight I was heading to northern France to present at the Java User Group in Lille. My idea was to fly into Brussels and drive from there to meet the folks around 18:00 at the meeting location of the Ch’ti JUG to talk about Eclipse and such.

Now it turns out that this was a really bad idea. The plane I took from Berlin to Brussels started late. The pilot’s excuse was that his cabin crew had miscounted the number of people on board. So they unloaded some luggage, then found out that the passengers were actually on the plane, and then reloaded the luggage.

All this took some time, so we took off about 45 minutes after the planned departure time. Oh well, I thought, good thing I had planned for some extra time.

It turns out that I hadn’t factored in the traffic jams around Brussels. They ate up all the buffer I had planned. But there was still a chance! My little TomTom navigation app on the iPhone was telling me that I would be only 3 minutes late. Little did it know!

Just 20 kilometers before Lille, my Ford rental car gave up. No comment, it said.

So what’s left: I can only apologize to the folks in Lille. If they still want me, I’ll be back!

Visiting Ancient Sites

Last week I had to go to Italy for an Eclipse meeting in Naples and then another one in Florence. Flying into Rome on Sunday and the May holiday on Tuesday gave us a chance to visit a couple of places before, between and after the business meetings.

 

It started off on Monday with a visit to the Forum Romanum. Having studied Latin for many years at school, I find the place somewhat familiar, and it’s always fun to visit. What’s more, I like seeing ancient things a lot more than religious sites, which are usually overloaded with the symbolism of a belief I’m not really keen on. That’s actually true of the pagan religions of ancient times as well, but there I can ignore it.

Next was the Colosseum, and this time we actually decided to stand in line and pay the €7 fee to get in. It took us about an hour to walk around, and that was time well spent! We saw a lot and learned a lot. The Colosseum has an on-site walk that explains the facility, its different building stages, and a day in the life of a Roman attending the games.

It is pretty amazing to wander around this place and imagine that down in the arena people were fighting for their lives, while up on the seats families were having their meals warmed over open fires, playing with their children and doing beauty maintenance. That, at least, is what can be deduced from the items found in the sewers.

Next stop was Naples, where we had May 1st to visit both Pompeii and Herculaneum. Both were destroyed by the eruption of Mount Vesuvius in AD 79. While I had visited Pompeii before, it was very interesting to see Herculaneum, a much smaller site that is also less frequented by tourists. And you actually get a chance to stroll through the adjoining modern town and have a normal coffee or beer 🙂

Unlike Pompeii, Herculaneum offers a good look at the structures and buildings, as they were not as badly destroyed. The picture below shows a Roman fast-food restaurant.

On this part of the trip my Latin lessons came back to me again, and I’m looking forward to reading the letters Pliny the Younger sent to his friends describing the events of AD 79.

As a side note: we stayed in an old and quiet hotel near Pozzuoli called Delle Therme. Completely outdated, but it has a lot of charm if you can live with ancient beds :-) And the best part: we ran into a photo shoot and got the actress to pose for us in her ’60s outfit.

Next stop was Eclipse Day Florence. The event was very well organized, and the line-up of speakers was great. But I suppose that’s easier when you can offer a location like Florence.

Before this post gets too long: Florence is great, and I will go back to take a couple of pictures there!

My traveling companion Mike was visiting Italy for the first time ever. When we parted in Rome, he told me that he was really impressed by the country.

And guess what: the food was great everywhere we went, so my scales were the only ones that didn’t appreciate Italy.

Open Source Think Tank (Thursday)

Day One is turning out to be very interesting, despite my being pretty tired from jet lag.

Earlier today there was a panel discussion on how communities should be managed and treated, with a lot of insight from very experienced community managers, followed by an introduction to the GENIVI project and initial workshop work. While most of the questions we are supposed to work on have obvious and simple answers (“YES”, “NO”), developing the reasoning as a group reveals a wide range of experience and opinion.

The keynote of the day came from Chris Vein, who is CTO for Innovation in the Executive Office of the President. The most interesting talk was on Open Governance: what the different departments are doing to innovate the way they serve (and want to serve in the future) the individual citizens of the country. He gave a couple of examples, from NASA to the Food and Drug Administration, of how open source empowers government agencies. I hope the talk will be available for public consumption soon. It will certainly give other organizations an idea of how far OSS has already spread throughout the governments of the world.

Now I’m listening to the next case study, presented by the U.S. Department of Veterans Affairs: an open source project comprising a full-blown system for large hospitals.

Looking forward to more!

This Tor Node Is Causing You Grief

In the last couple of days I’ve had strange visitors on our holiday apartment site. They came through the Tor network and were trying to create user accounts. Apparently they knew it was a Drupal site, because they had the right URL and everything.

I had never heard of Tor before, so I’m quite amazed to see visitors from the dark side of the internet. On the other hand, it’s a pain in the butt to delete the 30+ users they create on a normal day. Does anybody out there have experience with these types of attacks?
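One possible approach, offered only as a hedged idea: the Tor Project publishes lists of exit-node addresses. Assuming such a list has been saved locally with one IP address per line, and assuming the IP each account registered from has been recorded somewhere (the `signup_ip` field below is hypothetical, not a Drupal field), a small Python script could flag the accounts to review or delete.

```python
import ipaddress


def load_exit_nodes(lines):
    """Parse one IP address per line, skipping blanks and '#' comments."""
    nodes = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            nodes.add(ipaddress.ip_address(line))
        except ValueError:
            continue  # ignore malformed entries
    return nodes


def is_tor_signup(remote_addr, exit_nodes):
    """True if the address is a known exit node; invalid addresses are False."""
    try:
        return ipaddress.ip_address(remote_addr) in exit_nodes
    except ValueError:
        return False


def suspicious_users(users, exit_nodes):
    """Names of accounts registered from a Tor exit node.

    Each user is a dict with hypothetical "name" and "signup_ip" keys.
    """
    return [u["name"] for u in users if is_tor_signup(u["signup_ip"], exit_nodes)]
```

Comparing `ip_address` objects rather than raw strings means IPv4 and IPv6 entries are handled the same way. Exit lists change often, so such a list would need regular refreshing.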

FRA Airport

So here I am, ready to board my flight to Washington for EclipseCon 2012. It will be interesting: a new venue and a new city (I’ve never stayed in the area before). I have a free day tomorrow, so I should check Anne Jacko’s About Washington page tonight. My current favorite is the Smithsonian, but I’m not sure yet: there’s a whole bunch of other stuff that looks interesting. I’m just not sure what security will look like after the events in Toulouse earlier this week.

One of my favorite talks will be the Ford/BugLabs keynote. While the example in the press release sounds rather uninteresting, I believe the experiment can produce a whole bunch of cool gadgets for the car. One issue will be how much they open up the car’s infrastructure for tinkering: that is where the interesting information is! A cricket radio doesn’t really interest me that much: why wouldn’t I just use the normal radio?

As with every Eclipse conference, I don’t expect to attend a lot of sessions. But I’ll certainly try to attend as many sessions as possible on safety-critical tooling. And I hope to catch up on what is happening in CDT, as much as I can actually understand it.

Other than that, many chats, many people, many beers.

Chess In Paris

Just came back from Paris, where I’ve spent the last two days. We were invited to present Polarsys to Chess, an Artemis-funded research project.

I’ve come across quite a few of these projects lately, some of them funded by the EU, some by Artemis, ITEA or some other agency that I knew or didn’t know. And all of them seem to be doing more or less the same thing. At least that’s what it looks like to me. When I ask, they try to explain to me what makes them so different.

Anyway, many of them seem to spend considerable time on stuff that others have already built or are in the process of building: some platform components here, some persistence frameworks there. And neither I nor they know whether they could use the other projects’ results, because they simply don’t see them.

If my assessment is right, then we are looking at a huge waste of taxpayers’ money here. How could this be stopped? Really simply: the funding agencies just need to tell them to develop in open source. That creates visibility as well as accountability. Or is that a problem?

The Shared Pasture And The Butcher

The small village in the mountains had a problem. While the farmers had to work hard in the fields, they had no space left to put their cows out during the day.

So they decided to turn some of the village’s public wasteland into a pasture where they could all graze their cows during the day before taking them home to the farms.

It was a brilliant solution: not only did they all have shorter walks every day getting the cows in and out, they could also share the cost of a boy to watch the cows.

They also agreed to get together regularly to see what they needed to do to keep things going, and to share the work around the pasture as well as the cost of the boy.

All worked well, and the farmers were quite happy. But after some time the village butcher came along and said he would like the cows to be treated differently to make the meat leaner. But since he didn’t own any cows, he didn’t really feel that he should contribute to the shared effort of the others.

The farmers talked about it for a while and concluded that the butcher should pay his share for the common, just as they did. Even though he was not directly using the pasture, he still benefited from the improvements, and he also wanted to have a say in how the pasture was used.

They went back to the butcher, and after some discussion he understood that helping to improve the common would be to his business’s advantage.

Everybody in the village lived happily ever after.

Also check out Elinor Ostrom’s work.