Getting rid of annoyances... the hard way
Lately I have been tinkering around with code in shotwell, which is the photo manager I use to sort pictures I click. As a result I built an svn snapshot and set about making my changes. On firing up the freshly built app, I saw an "Updating libraries..." progress bar pulsing away for quite some time. It just would not go away. I didn't wait long enough for it to finish and closed the app to modify my code further. When I started the app again, the progress bar came up again and would not leave.
This had been happening for a few days and I kept ignoring it because I was working on a different part of the code (porting shotwell to cairo). I finally decided to find out what was going on today and I started shotwell in debug mode, which is simply:
SHOTWELL_LOG=1 SHOTWELL_LOG_FILE=:console: ./shotwell
... and it struck me that I had been lazy the very first time and did not bother to set the default directory for shotwell to scan. Shotwell simply assumed it to be my home directory and that was it. Scanning my home directory obviously was taking very long since it has a little over 30GB of data. Anyway, I moved my photos to a different location but it turns out that shotwell cannot read them anymore despite pointing it to the new location. I will have to modify its database to update the file paths.
So the unique thing about shotwell is that it does not modify the photo files. It stores all edits in a database, keyed by the file path. So all I had to do was to edit this database (which is $HOME/.shotwell/data/photo.db, an sqlite3 database) and change the file paths to the new location.
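For the case where the database is still intact, the path change described above could be sketched roughly like this. This is only a sketch under assumptions: the table and column names (PhotoTable, filename) are the ones used by the script later in this post, the function name and both prefixes are made up for illustration, and you should work on a copy of the database.

```python
import sqlite3


def rewrite_prefix(conn, old_prefix, new_prefix):
    """Replace old_prefix at the start of PhotoTable.filename with new_prefix.

    Assumes a table PhotoTable with a text column filename. Returns the
    number of rows that were changed.
    """
    with conn:  # commits on success, rolls back if the UPDATE fails
        cur = conn.execute(
            "UPDATE PhotoTable "
            "SET filename = ? || substr(filename, ?) "
            "WHERE substr(filename, 1, ?) = ?",
            (new_prefix, len(old_prefix) + 1, len(old_prefix), old_prefix),
        )
    return cur.rowcount


# Small demonstration on a throwaway in-memory database
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE PhotoTable (id INTEGER PRIMARY KEY, filename TEXT)")
demo.execute("INSERT INTO PhotoTable (filename) VALUES ('/home/me/old/a.jpg')")
rewrite_prefix(demo, "/home/me/old/", "/home/me/photos/")
print(demo.execute("SELECT filename FROM PhotoTable").fetchall())
demo.close()
```

Passing the values as parameters means file names with apostrophes need no escaping, and comparing with substr() rather than LIKE avoids surprises with % and _ in paths. Keep the trailing slash on both prefixes so that /home/me/old does not also match /home/me/older.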
Now here's the big warning: NEVER do such changes when you're having a conversation with someone else or are distracted by anything else.
I ended up writing an incorrect query... and then cancelling it midway, thus leaving the database in a pretty messy state. So in the end the only reliable information I had was the name of the image files. I then hacked up a little script to correct this:
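#!/bin/sh
# Run this from the directory that contains photo.db
# Set your photo path here
photopath="$HOME/photos"

# Double any single quotes so that a name is safe inside an SQL string
sqlquote() { printf '%s' "$1" | sed "s/'/''/g"; }

sqlite3 photo.db "select filename from PhotoTable" | while IFS= read -r file; do
    name=$(basename "$file")
    # Take the first match if the same name exists more than once
    fullname=$(find "$photopath" -name "$name" | head -n 1)
    if [ -n "$fullname" ]; then
        echo "Updating $fullname"
        sqlite3 photo.db "update PhotoTable set filename='$(sqlquote "$fullname")' where filename='$(sqlquote "$file")'"
    fi
done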
So those who want to move their shotwell photo databases to another location, here's one (fairly roundabout) way to do this. Enjoy!
FOSS.in day 2 - The Fedora miniconf day
I concluded my previous post yesterday, on my way to FOSS.in to attend day two. Before I left, I saw Gopal saying w00t! to Pradeepto on twitter and wondered what was going on. Turns out, Pradeepto was coming to FOSS.in!
When I arrived at the venue, I found that we had (once again) captured a table in the Expo area. We spread out the DVDs and buttons and waited for people to come over. Saleem was obsessed with spreading the DVDs in a perfect line and arranging the buttons to form the 'f' of Fedora. Of course, people were constantly picking the buttons from the 'f' and ruining his design. Fedora goodies were a hit on the first day and we had quite a few people picking them up on the second day too. There were a few people asking for 64-bit DVDs; we might have to note that for future conferences and hopefully by then the flash nonsense will sort itself out. To me, that seems to be the only hurdle right now for a completely trouble-free installation of a 64-bit system.
We had a few people coming over asking for help with their installations and some of us helped out till lunch time. I then met Pradeepto and Sujith along with a bunch of KDE folks for lunch. Post-lunch, we had the Fedora miniconf.
The miniconf was not very well timed for me, since I wanted to attend artagnon's talk on the git object store. Rahul and Amit Shah kicked off the proceedings and Amit gave the first talk of the session. Rahul announced the workouts -- the Fedora for Kids spin by Aditya Patawari and Fedora Bugzapping by /me. I went up to the first floor and waited for prospective contributors. Two people came in, fish_sticks and jijo. I walked them through setting up their Fedora account, bugzilla account and the sign-up for bugzapping. I also walked them through triaging bugs and the different scenarios they would face. They then started setting up virtual machines so that they could reproduce bugs in emacs. In the end, we could not get much done at all, since neither of them could get their virtual machines up and running in the time we had. Both were running Ubuntu -- it's kinda hard reproducing Fedora bugs in Ubuntu ;)
While this was going on, I was also discussing packaging with some college students from Vellore. Their college had a CMS that students had designed and they were presenting it in the FOSS.in expo. I think it was Pragyaan CMS; I can't remember the name for sure. I volunteered to help them get started in packaging it for Fedora. I did not want to do it myself since I personally do not have any use for a CMS. They were again trying to set things up to try out a small test rpm package using a tutorial. That was kinda hard going though since they too were using Ubuntu.
In the end the best I could do was point them to some documentation and give some tips on how they could get started and where they could get help. I think they finally joined Aditya's workout, which seemed to be a very crowded affair -- possibly the most popular workout in FOSS.in. I'm not sure how it went though, since I was busy talking to Philip and artagnon later. Oh, I had another person approach me for bugzapping when I had wrapped up and picked up my stuff -- he was a Fedora user for a change ;) We could not really sit down and work on something at that point since it was a little late, so we talked about how he could sign up and start contributing. I hope he finally does sign up for bugzapping.
The day ended with a security talk by James Morris. I made a quick exit after that since I had promised my wife that I would have dinner with her.
I will not be attending day 3, since I have a few things to wrap up before I leave for a week long holiday in north India tomorrow. It's unfortunate that I will be missing Aanjhan's closing keynote as a result. Like the last FOSS.in, I will remember this event for friends -- all the people you get to meet only once a year as well as new people you meet and exchange ideas with. It was definitely not as crowded as last year though. Too bad this will probably be the last FOSS.in, but I hope there is a similar conference somewhere similarly accessible next year.
FOSS.in Day 1
So I made it to what apparently is the last FOSS.in. It started late as usual, so I did not miss much by reaching late. There was a bit of confusion about big grown men like myself coming in as student delegates, but that was cleared up fairly quickly when one of the organizers explained to the volunteers that miniconf speakers were to be admitted as student delegates. I am to conduct a workout on bugzapping in Fedora today (day 2).
The beginning of FOSS.in was definitely less emphatic than before. Kishore Bhargava had replaced Atul Chitnis for the kick-off ceremony. That was quickly compensated for, however, by a really interesting talk by Danese on Wikipedia and its technical architecture. There was talk of an Indian mirror. That will be a very cool thing if it materializes. I'm sure one of the IITs can set aside bandwidth and hardware for them.
The Wikipedia talk was followed by two completely technical talks: one on page cache optimization in VMs by Balbir Singh and the other by Lennart Poettering on his latest interest -- systemd. Balbir Singh's talk was quite interesting; he explained the problem of page cache duplication between guests and hosts in a VM environment. He went on to explain a few possible approaches to solving this. Lennart's talk didn't give anything new for me since I had been following the Fedora devel discussions.
Me, Shreyank and Rahul Sundaram then spent some time pulling artagnon's leg. We went back to the main hall in time to see Philip Tellis finish his talk. Rahul then followed that with his talk on failures in Fedora and what we learned from them.
We finally moved on for our Fedora dinner. Dimitris Glezos wanted to have kababs and we were told of a place where we could get good kababs. That was good, except for the fact that we didn't know where it was. After running around Forum mall looking for the place, we finally settled for Firangi Pani inside Forum mall. All that most of us ended up having was drinks and starters (kababs of course). Lots of them. That was accompanied by Lennart and Dimitris discussing how they could get translations from Transifex into systemd with all of the commit log intact. We'll probably hear more about it once they actually reach a conclusion on that.
Now I'm sitting in the bus writing this post and preparing for the workout that I will be doing today. I hope I get a few bugzappers interested today.
Back to the OS class: Memory Allocation
A lot of us in India learn OS concepts from textbooks. The concepts really go directly from the textbooks into examination papers for most, including me. We would grab on to key phrases like "semaphores", "paging", "segmentation", "stack" and so on, and never really stop to wonder how this all is *really* implemented. Some of us aren't interested since all we care about is getting a well paying job while others find goofing around to be a more attractive alternative. Either way, very few really get it.
Four years since I finished college, six since I last took an OS class, and I can say now that I finally got it. Somewhat.
Recently there was a very interesting case I hit upon, which got me wondering how memory allocation was managed by the workhorse of memory allocation, the malloc() function. The malloc function is implemented in the glibc library for Linux (and also for other Unix systems if you want). Its major responsibility (along with its friends, free, mallopt, etc.) is to do accounting of memory allocated to a process. The actual *giving* and *taking* of memory is done by the kernel.
Now the fun part is that on a typical *nix system, there are two ways to request memory from the OS. One is using the brk() system call and the other is by using the mmap() system call. So which one should glibc use to allocate memory? The answer is, both. What malloc does is that it uses brk() to allocate memory for small requests and mmap() for large requests.
So what is the difference between mmap and brk you ask? Well, every process has a block of contiguous memory called the data area. The brk() system call simply moves one end of the data area outwards and hence increases its size, allocating memory to the process. To free, all it does is move the same end back. This operation is quite fast most of the time. On the other hand, the mmap system call picks up a completely different portion of memory and maps it into the address space of the process, so that the process can see it. Additionally, the kernel has to put zeroes in every page it hands out, so that it does not end up leaking information of some old process to this one. Since malloc gives mmapped memory back to the kernel as soon as it is freed, this cost is paid again for every such allocation, while memory obtained using brk can be reused without asking the kernel. This makes mmap quite slow in comparison.
So why have mmap at all if it is so slow? The reason is the inherent limitation that brk() has due to the fact that it only grows one way and is always contiguous. Take a case where I allocate 10 objects using brk and then free the one that I had allocated first. Despite the fact that the location is now free, it cannot be given back to the OS since it is locked in by the other 9 objects. One way that malloc works around this is by trying to reuse these free spaces. But what if the size of the object I am about to allocate next is larger than any of these freed "holes"? Those holes remain and the process ends up using more memory than it really needs. This is "external fragmentation", since the wasted space lies between the allocated objects.
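To make the 10 objects case concrete, here is a toy model of a heap that can only grow and shrink at its top end. It is a sketch and not how glibc does its accounting: the class name BrkHeap and its methods are invented for this illustration, holes are reused whole instead of being split, and neighbouring holes are never merged.

```python
class BrkHeap:
    """Toy heap that grows and shrinks only at its top end, like brk()."""

    def __init__(self):
        # Each block is [size, in_use]; the lowest address comes first
        self.blocks = []

    def alloc(self, size):
        # Reuse the smallest free hole that is big enough (best fit)
        holes = [b for b in self.blocks if not b[1] and b[0] >= size]
        if holes:
            best = min(holes, key=lambda b: b[0])
            best[1] = True
            return best
        # No hole fits, so grow the top of the heap
        block = [size, True]
        self.blocks.append(block)
        return block

    def free(self, block):
        block[1] = False
        # Only free blocks at the very top can go back to the OS
        while self.blocks and not self.blocks[-1][1]:
            self.blocks.pop()

    def footprint(self):
        """Bytes the process holds, free holes included."""
        return sum(size for size, _ in self.blocks)

    def in_use(self):
        """Bytes that are actually allocated to objects."""
        return sum(size for size, used in self.blocks if used)


heap = BrkHeap()
objects = [heap.alloc(100) for _ in range(10)]
heap.free(objects[0])   # leaves a hole at the bottom of the heap
heap.alloc(200)         # too big for the hole, so the heap grows
print(heap.footprint(), heap.in_use())
```

The gap between footprint() and in_use() is the memory lost to holes. Freeing the 200 byte block at the top shrinks the heap again, while the hole at the bottom stays until something small enough comes along to fill it.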
So to minimize the effect of this fragmentation, glibc uses brk() only for small objects. Larger objects are allocated with mmap(). A threshold was set at 128KB, so objects smaller than it are allocated using brk and anything larger is allocated using mmap. The assumption is that smaller object requests would come more often, so the little fragmentation is worth the improvement in speed. Oh, and as for the reuse of the memory holes, it does the "best fit algorithm" -- remember that phrase? ;)
But with more recent versions of glibc (around 2006 actually, so not *very* recent), this threshold limit is dynamic. glibc now tries to adjust to the memory allocation pattern of your program. If it finds that you are allocating larger objects and freeing them soon, it will then increase the threshold, expecting you to allocate larger objects and free them more often. There is of course an upper limit of 32 MB to this. So anything larger than 32 MB will *always* be allocated using mmap. This is quite awesome since it speeds up malloc quite a bit. But it obviously comes with the price of potentially larger memory holes.
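The sliding threshold can be sketched as a small model. This is my simplified reading of the behaviour and not glibc's code: the class and function names are made up, the page size and the per-chunk overhead are assumed values, and the 32 MB ceiling is the one that applies on 64-bit systems.

```python
PAGE = 4096                              # assumed page size
HEADER = 16                              # assumed bookkeeping overhead per chunk
DEFAULT_THRESHOLD = 128 * 1024           # the initial 128KB threshold
THRESHOLD_MAX = 32 * 1024 * 1024         # the 32 MB upper limit


def mmapped_chunk_size(request):
    """Size of the mapping for a request: overhead added, rounded up to pages."""
    return -(-(request + HEADER) // PAGE) * PAGE


class ThresholdModel:
    """Toy model of a brk/mmap threshold that adapts to freed mmapped chunks."""

    def __init__(self):
        self.threshold = DEFAULT_THRESHOLD

    def malloc(self, size):
        # Requests at or above the current threshold go to mmap
        return "mmap" if size >= self.threshold else "brk"

    def free(self, size, how):
        # Freeing an mmapped chunk that is bigger than the threshold
        # raises the threshold to that chunk's size, up to the ceiling
        if how == "mmap":
            chunk = mmapped_chunk_size(size)
            if self.threshold < chunk <= THRESHOLD_MAX:
                self.threshold = chunk


model = ThresholdModel()
one_mib = 1024 * 1024
how = model.malloc(one_mib)       # first 1 MiB request uses mmap
model.free(one_mib, how)          # freeing it raises the threshold
print(how, model.malloc(one_mib), model.threshold)
```

After the first free, the same 1 MiB request is served from the brk'ed heap. A 64 MiB request would still use mmap and would leave the threshold alone, because its chunk is larger than the ceiling.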
There is so much more to this, like the actual details of the way accounting of the brk'ed memory is done, obstacks, arenas. The fun seems to be only beginning.
Yahoo chat room support now in libyahoo2 trunk!
Yahoo chat room support is now in trunk. Many thanks to Kai (Kay) Zhang for this effort, which he made as part of his Fedora Summer Coding project. The code still needs more testing, so I also need to start working on the chat room implementation in ayttm.
I expected git-svn to commit the patchset along with the history in Kay's git repository, but unfortunately that did not happen. So those who want to see the history of commits for this may take a look at his libyahoo2 github repository.
libyahoo2 to get chat rooms soon
Kay has been working on the libyahoo2 project as part of the Fedora Summer Coding initiative. He's been working on chat room support and things are looking quite good so far. Kay has finished working on the core functionality of logging in, joining and leaving rooms; only the chat room list functionality is remaining. He was a little shy of interacting with libyahoo2 upstream earlier, but he has been working on it since.
Looking at the pace of the project so far, we could have Kay's code merged into upstream very soon. Following this, me or Ray van Dolson will do a release in Fedora.
GCC Workshop at GRC, IIT Bombay
Mustafa (my manager) knew about my current fascination with learning the _real_ basics of computers (electronics, kernel, compilers, etc.), so he arranged for me to attend the GCC workshop held by IIT Bombay this week. My first reaction was that I would be wasting my time there since I didn't know the first thing about compiler theory and was a novice at best in assembly language programming. He said it wouldn't hurt to try. So I had to try.
I got a chance on the day before the workshop to read up some about things like IRs, RTLs, etc. It was enough that I would not be completely lost from day 1. But I did not attend day 1 at all, thanks to the countrywide strike that crippled all public transport (and some people too). So I spent that day working and also trying to cover what would otherwise have been covered in day 1 -- the various phases of compilation, passes, gray box probing to find out some more about intermediate outputs, etc. I got hold of last year's slides, so it was a little easier.
So I finally made it to days 2, 3 and 4. The first thing I noticed during the lecture sessions was that the professor really knew his stuff. He was well acquainted with the internal layout of gcc and was able to explain it well enough that I really _got_ it. Overall I come out of the sessions today with much more knowledge about gcc than I could ever gain in 4 days on my own. Here are some observations that I made during the course of this session.
The professor really knew his stuff. I say this again so that it does not look like I am ignoring that. There are also a lot of really talented individuals at the GRC, who are doing some pretty interesting research based on gcc. The trouble though is that there seem to have been no efforts whatsoever to share these ideas upstream.
One such idea is the Generic Data Flow Analyzer (GDFA). It is a patch to gcc that provides a data flow analyzer, which can be used to find and eliminate dead code or unused variables. It adds a gimple pass to the compilation sequence and intends to replace the current dead code elimination and unused variable elimination passes with the same code called with different parameters. While the idea is pretty interesting, the sad thing is that there are no signs of an attempt to push this idea upstream. All I could find was an announcement to the gcc mailing list, but no request for comments or for inclusion of the patch.
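To give a feel for what a data flow analysis does, here is a tiny dead assignment eliminator for straight-line code. It is a generic textbook-style illustration and has nothing to do with how GDFA or gcc implement their passes: the statement format (a target plus a list of operands) and the function name are invented for this sketch, and there is no control flow at all.

```python
def eliminate_dead_assignments(code, live_out):
    """Drop assignments whose result is never used.

    code is a list of (target, operands) tuples in program order and
    live_out is the set of variables that are needed after the last
    statement. One backward pass tracks which variables are live.
    """
    live = set(live_out)
    kept = []
    for target, operands in reversed(code):
        if target in live:
            kept.append((target, operands))
            live.discard(target)      # this statement defines the target
            live.update(operands)     # and needs its operands before it
        # otherwise nothing later reads the target, so the statement is dead
    kept.reverse()
    return kept


program = [
    ("a", ["x", "y"]),   # a = x + y
    ("b", ["a"]),        # b = a * 2
    ("c", ["x"]),        # c = x - 1, never used afterwards
    ("d", ["b"]),        # d = b + 1
]
print(eliminate_dead_assignments(program, live_out={"d"}))
```

Only the assignment to c is dropped, since d depends on b and b depends on a. A real pass also has to deal with branches, loops and statements with side effects.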
This is only one of many more ideas that are brewing in the GRC in the minds of some very talented people. But one felt that these ideas were being used only to get degrees and nothing was being done to actually test their feasibility in live production level code. It would be nice to see some of these ideas actually presented upstream with a genuine interest in getting them incorporated.
To conclude, it is a pretty good session for those who want to get started with learning compilers and gcc internals.
Yahoo chat future and libyahoo2
Philip had recently posted on libyahoo2-users that Yahoo is planning to open its instant messaging platform API to the public. It has been delayed a bit since then, but it is surely due.
So how does this change things for libyahoo2 or any other FOSS implementations? For one, the fun of digging through binary data and trying to make sense of it will be gone ;) But on a serious note, we can hope to have some more consistency in behaviour and support will definitely improve. I'm not very keen about the fact that the support will be over HTTP, but I guess it works well for them. For now we can only wait for their announcement before we know what the entire thing looks like. If it is anything like the messages that the current official yahoo! messengers send, then it's only really a wrapper around their old pain of a protocol. But this does not really use JSON, so it is likely that they're writing a fresh implementation. In any case, there is still time for it and in that time, we have some decent work going on on the libyahoo2 code base.
In other news, Kai Zhang has been working on implementing chat room support for libyahoo2 as his Fedora Summer Coding project. His code can be found here. Other than the brief comments in the git logs, everything seems to be quite ok. A bulk of the feature set is already in, so that is pretty good progress. Once the entire feature set is completed and tested, I will have them included in the main libyahoo2 source tree. Following that will be a release and a rebase on Fedora. This will be a good rebase compared to the ugly one the last time around, where I broke all API compatibility in an effort to revamp the authentication support.
Hacking on assembly code: Dynamic memory allocation on stack?
So I started dabbling with assembly language programming a couple of days ago. This was the next logical step in the "going lower down" move I have been doing ever since I started writing programs in Visual Basic some years ago (there, I admitted it). Since then I went through C#, Java, C++, C and now finally assembly. And it is fun to watch a program die in so many innovative ways. It is helping me understand the internals of a program much better.
One of the first things I learnt about assembly programming was that I needed to use completely different syscall numbers and instructions for x86_64 as compared to i386. For example, the syscall number for exit on i386 is 1 while on x86_64 it is 60. Same goes for write -- 4 on i386 and 1 on x86_64. I spent half an hour trying to figure out why my program was calling fstat on x86_64 while a similar program built with --32 would work fine.
Crossing all these hurdles, I finally wrote a slightly more complicated (but still useless) program than a hello world. This is a program that takes in an integer string through the command line, converts it to an integer, converts it back to string and prints it back out. Pretty useful huh :)
Now for the interesting part in the code. I always thought of dynamic memory allocation as something you can only do through the OS using the brk() and/or mmap() syscalls. Generally we do this indirectly through malloc() and friends. But what I ended up doing in my program is allocating memory on the stack on the fly. Here's the code snippet:
    # Make room for one byte, then store the trailing newline
    decq %rsp
    movb $0x0a, (%rsp)
next_digit:
    movq $0, %rdx
    divq %rdi
    addq $0x30, %rdx
    # Hack since we cannot 'push' a byte
    decq %rsp
    movb %dl, (%rsp)
The complete code along with the makefile is at the end of this post. You can build it if you have an x86_64 installation. What I do above is simply:
- Read a digit from the number
- Move the stack pointer ahead to make room for a byte
- Store the ascii representation of that number into that byte
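For those who do not read assembly, the two conversions in the program can be mirrored in Python roughly as follows. This is only a sketch of the same algorithm: the function names follow the labels in my code, a bytearray that grows at the front stands in for the stack growing downwards, and like the assembly it does no validation of the input.

```python
def get_num(text):
    """Turn a string of ASCII digits into an int, one digit at a time."""
    value = 0
    for ch in text:
        # Multiply what we have by 10 and add the next digit
        value = value * 10 + (ord(ch) - 0x30)
    return value


def print_num(value):
    """Build the printable form of value back to front."""
    out = bytearray(b"\n")                   # the trailing newline goes in first
    while True:
        value, digit = divmod(value, 10)     # quotient and remainder, like divq
        out.insert(0, digit + 0x30)          # new byte goes in front of the rest
        if value == 0:
            break
    return bytes(out)


print(print_num(get_num("1234")))
```

The loop always runs at least once, so 0 prints as a single digit, just as the cmp and jne at the end of next_digit do it.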
I could not use the push instruction itself, since in 64-bit mode it can only push 16 or 64 bit stuff on to the stack (with pushw and pushq). If you push a single byte value, it will be stored in one of the above sizes, not in just 1 byte. What I wanted was to create a string on the fly without limiting myself to a fixed size array, so this seemed to be the only approach. While this works, I still need to find out a few more things about this:
- Is it safe?
- If it is safe, then is there a similar way to do this in C without embedding assembly code? This would be really cool, especially in usage scenarios such as the above. Admitted that the above scenario is pretty useless in itself, but I'm sure there must be similar examples out there that are at least a little more useful.
The code:
    .section .data
usage:
    .ascii "Usage: printnum-64 <the number>\n"
    usagelen = . - usage

    .section .text
    .globl _start

# Convert a string representation of an integer into an int
    .type _get_num, @function
_get_num:
    push %rbp
    movq %rsp, %rbp
    movq 0x10(%rbp), %rdx
    mov $0x0, %rcx
    mov $0x0, %rax
nextchar:
    # Iterate through the string
    movb (%rdx), %cl
    cmp $0x0, %rcx
    je call_done

    subq $0x30, %rcx
    imulq $0xa, %rax
    addq %rcx, %rax
    incq %rdx
    jmp nextchar

# Convert a number into a printable string
    .type _print_num, @function
_print_num:
    push %rbp
    movq %rsp, %rbp
    movq 0x10(%rbp), %rax
    movq $0x0a, %rdi

    # Hack since we cannot 'push' a byte and move %rsp by
    # only 1. push always stores a whole 16 or 64 bit value.
    # Make room first so that the saved %rbp sitting at
    # (%rsp) is not overwritten.
    decq %rsp
    movb $0x0a, (%rsp)

next_digit:
    movq $0, %rdx
    divq %rdi
    addq $0x30, %rdx

    # Hack since we cannot 'push' a byte
    decq %rsp
    movb %dl, (%rsp)
    cmp $0x0, %rax
    jne next_digit
    # The string now runs from %rsp up to, but not including, %rbp
    movq %rsp, %rbx
    movq %rbp, %rcx
    subq %rsp, %rcx
    push %rcx
    push %rbx
    push $0x01
    call _write
    jmp call_done

# Wrap around the write system call
    .type _write, @function
_write:
    push %rbp
    movq %rsp, %rbp
    movq 0x10(%rbp), %rdi
    movq 0x18(%rbp), %rsi
    movq 0x20(%rbp), %rdx
    movq $0x01, %rax
    syscall
    jmp call_done

# I always do this when I am done with a function call
call_done:
    movq %rbp, %rsp
    pop %rbp
    ret

# Program Entry point
_start:
    # Command line arguments:
    # The parameter list is:
    # argc: The number of arguments
    # argv: The addresses of all arguments one after the other
    # They can be popped out one by one
    pop %rax
    cmp $0x2, %rax
    jne error

    # Pop out the first arg since it is the program name, but
    # keep the second so that it can be fed into the next function
    pop %rax
    call _get_num
    push %rax
    call _print_num
    jmp exit

error:
    push $usagelen
    push $usage
    push $0x2
    call _write
    movq $0xff, %rax
exit:
    movq %rax, %rdi
    movq $60, %rax
    syscall
The makefile:
# Recipe lines must begin with a tab character
32:
	as --32 $(target).s -o $(target).o
	ld -melf_i386 $(target).o -o $(target)

64:
	as $(target).s -o $(target).o
	ld $(target).o -o $(target)
If you save the source as foo.s, you can build it with:
make target=foo 64
Lots and lots of work
The past week has been quite hectic, with a lot of juggling between different things I have been wanting to do. So here's what I had on my mind:
- I have been looking to learn more about compilers. I goofed off in college and missed out on the same course that was taught twice. I always understood enough to fool my teachers into thinking I knew it all, but not enough to really know it all. Or some for that matter. So now I want to make up for it.
- I had not touched ayttm and libyahoo2 for quite a while. So I wanted to do something there
- Kushal had asked me if I could package libraw for Fedora because some random app needed it. He asked me because I knew autotools and I could autotoolize the project before I package it.
- Rahul pointed out this cool little command line audio player called gst123. I had been looking to write something like this for some time now but I just could not wrap my head around gstreamer. I tried it and immediately fell in love. I just had to package it for Fedora.
- Work at my day job. Lots and lots of work.
- Work at my day job. Lots and lots of work. Yes, it is worth mentioning twice
And so here's what I actually ended up doing over the week:
- I had bought 3 books to study compilers. They're just lying there since I haven't had enough time to actually start studying.
- Nothing on ayttm and libyahoo2. Not enough time
- Packaged libraw and submitted for a package review. There is no activity on that bug report yet, but there was some action before it. Libraw upstream does not like autotools, so I had to hand-write a configure script to detect stuff. I also looked up and tried out the app for which Kushal wanted me to package libraw. The app is Shotwell, a photo management program. And it is good; I'm starting to use it for my photographs now. I'm glad I decided to package libraw for it.
- I packaged gst123. The package has been approved and I have already submitted an update for F-13. I did this while on a bus from Pune to Mumbai :D
gst123 is a really cool app, try it out. It might not play internet radio streams right out of the box (my use case), but you can easily pipe/grep/cut your way to getting it to work. Here's how I play the radio stream from Absolute Radio:
gst123 "$(curl -s 'http://network.absoluteradio.co.uk/core/audio/ogg/live.pls?service=vcbb' | grep File1 | cut -d '=' -f 2-)"
See, it's so easy!
- Oh yeah, work at my day job. Lots and lots of work.
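If you would rather not depend on grep and cut, the playlist parsing step of the gst123 one-liner could be sketched in Python along these lines. It is only a sketch: the function name is made up, the sample playlist uses an example.com address that is not a real stream, and it assumes the usual key=value layout of .pls files.

```python
def first_stream_url(pls_text, key="File1"):
    """Return the value of the first key=value line in a .pls playlist."""
    for line in pls_text.splitlines():
        # Split on the first '=' only, so a URL containing '=' stays whole
        name, sep, value = line.strip().partition("=")
        if sep and name.strip().lower() == key.lower():
            return value.strip()
    return None


sample = """[playlist]
NumberOfEntries=1
File1=http://example.com/live.ogg?service=demo
Title1=Hypothetical station
Length1=-1
"""
print(first_stream_url(sample))
```

Unlike grep File1, the exact comparison on the key does not also pick up File10 and friends in a longer playlist.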