Tuesday, March 10, 2009

Bug Hunting

Over the last few weeks I have been a good Solaris citizen and reported several bugs against Solaris Nevada (I have provided links where the CR is available online):

6803634: virt-install panics in module mac with Jumbo frames enabled
This bug made the system panic when installing a domU that used a network with jumbo frames enabled. It has now been fixed.

6805659: Phantom volume in ZFS pool
This bug causes phantom device links to be left behind after a zvol has been destroyed.

6815540: Live upgrade is too picky with datasets in non root pools
This one is related to Live Upgrade. Live Upgrade keeps track of all filesystems used on a host, and if any one of them has been removed, lumount and ludelete stop working. Until this is fixed, deleting an old boot environment a few weeks after an upgrade will fail if any filesystem has been removed in the meantime.

6815701: snv_109 hangs in boot with SATA enabled on GeForce 8200
My storage node at home hangs when booting snv_109 if SATA is enabled in the BIOS.

As you probably know, Solaris Nevada is the development branch of Solaris. It's not production software, so when you use it you are part of the test process.

Friday, March 6, 2009

Upcoming ZFS features

ZFS is constantly evolving and some upcoming things have caught my attention.

The first one is mentioned in a blog entry by Matthew Ahrens which deals with the implementation of the new scrub code. The new code fixes the issue that has forced scrubs to restart every time a snapshot is taken. More interestingly, he mentions that this lays the foundation for what is arguably the most requested ZFS feature of all: vdev removal, or pool shrinking! This will probably also pave the way for block rewrite, so that existing data can be rewritten with current dataset settings such as compression or block size.

The second one is CR 6667683: "need a way to select an uberblock from a previous txg". This will add the ability to fall back to an earlier überblock in case of an error so serious that the pool has become unusable. This problem was highlighted in a long discussion on zfs-discuss. Jeff began to work on this problem, and in early February he stated the timeframe for a fix as "weeks, not months". In the best of worlds this should never be an issue, since ZFS is designed to have an always-consistent on-disk format. I think it is very rare, but it can still happen with badly behaved hardware which ignores flush-cache commands while stating otherwise.
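To make the idea a bit more concrete, here is a toy model in Python of what "selecting an uberblock from a previous txg" could mean. It is only a sketch and not ZFS code: the Uberblock class, its valid flag (a stand-in for "the checksum verifies") and the max_txg parameter are all names I made up for illustration. Normally the newest valid überblock wins; the fallback simply caps how new a candidate may be.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Uberblock:
    """Hypothetical, simplified uberblock: not the real on-disk layout."""
    txg: int      # transaction group that wrote this uberblock
    valid: bool   # stand-in for "checksum verifies"


def select_uberblock(ring: Iterable[Uberblock],
                     max_txg: Optional[int] = None) -> Optional[Uberblock]:
    """Return the valid uberblock with the highest txg.

    If max_txg is given, uberblocks newer than that txg are ignored,
    which models falling back to an earlier state of the pool.
    Returns None when no candidate qualifies.
    """
    candidates = [
        ub for ub in ring
        if ub.valid and (max_txg is None or ub.txg <= max_txg)
    ]
    return max(candidates, key=lambda ub: ub.txg, default=None)


# Example ring: txg 102 is damaged, txg 103 is the newest.
ring = [
    Uberblock(txg=100, valid=True),
    Uberblock(txg=101, valid=True),
    Uberblock(txg=102, valid=False),
    Uberblock(txg=103, valid=True),
]

newest = select_uberblock(ring)                  # normal case
fallback = select_uberblock(ring, max_txg=102)   # roll back past txg 103
```

In this model the normal selection picks txg 103, while capping at txg 102 skips the damaged entry and lands on txg 101. How the real fix will expose this choice to the administrator is not something the CR title tells us.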

The third one has been a long time coming: the dataset encryption project, which has been pushed back even further due to other changes in ZFS. The current target is now snv_120 (July), which means it will not make it into OSOL 2009.06.

I also noticed that user/group quotas seem to be coming in the near future: CR 6501037, "want user/group quotas on ZFS", has been updated to be fixed in snv_113.