
Welcome to the Home of Ron Hackett

bobsobol = Ron Hackett


21 April

Archive Images and Photos

The majority of this post is a copy of an Email I sent to a family friend when I was asked to "digitally restore" some faded and damaged photographs depicting family history which they had scanned for preservation.

The images were quite damaged before the scan, but the Email I received containing 3 Jpegs of these images showed far more corruption from the image data compression than from anything else, and they were stored at a resolution which could be comfortably viewed on screen without reduction.

Admittedly, any higher resolution would only allow you to zoom in on the grainy scratched surface of the image. But it would have lessened the compression errors, and given me a chance to see clean areas between the scratches and grain.

I hope this information will be useful to anyone trying to make digital copies of ageing images (photos, snaps, art work etc) for their preservation.

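[Illustration: JpegArtifact2, the same 64 x 64 test pattern saved as PNG (left) and Jpeg (right), enlarged to 256 x 256]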

The illustration above is of two identical images at a resolution of 64 pixels by 64 pixels, depicting a pattern deliberately designed to show up the errors in Jpeg compression.

The left half was saved as a 0.4 kilobyte truecolour PNG image (32-bit, just so there is no argument that the PNG is storing a palettised 6-colour image, whereas Jpeg will only store grayscale or truecolour). The right half was stored as a 2 kilobyte Jpeg, five times the storage space. Each was then blown up to 256 x 256 pixels and placed alongside the other, so you don't have to strain your eyes to see them and the damage caused by Jpeg compression.

Cleaning images is about looking for patterns that are clear from the original, and removing what is out of place. This is made almost impossible by the mess that Jpeg compression makes of regular patterns in images.

In fact, if the image is clean and natural (which the above example clearly is not), Jpeg does a pretty good job of only removing the patterns our eyes don't notice... the trouble is, when the image is damaged (either by age or by equipment inaccuracy) those areas our brain ignores as regular patterning are exactly what you need to recreate an estimate of the damaged area.

The Jpeg compression technique (block-based Discrete Cosine Transform coding, which cuts the image into small blocks and coarsely quantises the frequency components our eyes are least sensitive to) is also used, in related forms, in DV camcorders (Motion Jpeg), DVD videos (Mpeg) and DVB (h264 / mpeg4). At least in those forms you have multiple frames with which to smooth out the errors caused by this lossy (messy) compression. Analogue film, video and broadcast also introduce artifacts from analogue compression, but these are of an entirely different nature.
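To make the "lossy (messy)" part concrete, here is a minimal sketch of the core step baseline Jpeg applies to each 8 x 8 block: a 2-D Discrete Cosine Transform, quantisation of the coefficients, and the inverse transform. It assumes a single uniform quantisation step (the hypothetical `step` parameter) rather than the per-frequency quantisation tables, colour conversion and entropy coding a real encoder uses, so treat it as an illustration, not a Jpeg implementation. A block containing only two values (a hard diagonal edge) comes back with many in-between values.

```python
import math

N = 8  # baseline Jpeg works on 8 x 8 blocks


def _scale(k):
    # Normalisation that makes the transform orthonormal (exactly invertible).
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(pos, freq):
    return math.cos((2 * pos + 1) * freq * math.pi / (2 * N))


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block, indexed [u][v]."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)]
            for u in range(N)]


def idct2(coeffs):
    """Inverse of dct2, returning pixels indexed [x][y]."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)]
            for x in range(N)]


def lossy_roundtrip(block, step):
    """Transform, round every coefficient to a multiple of `step`, transform back."""
    coeffs = dct2(block)
    quantised = [[round(c / step) * step for c in row] for row in coeffs]
    pixels = idct2(quantised)
    # Back to 8-bit values, clamped as a decoder would.
    return [[min(255, max(0, round(p))) for p in row] for row in pixels]


# A hard diagonal edge made of just two values.
edge = [[255 if x > y else 0 for y in range(N)] for x in range(N)]
restored = lossy_roundtrip(edge, step=40)
print("values before:", sorted({p for row in edge for p in row}))
print("values after: ", sorted({p for row in restored for p in row}))
```

With a vanishingly small `step` the round trip gives back the original block exactly. It is the rounding of the coefficients that manufactures the extra shades around the edge.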

PNG, TGA and TIFF all compress (some optionally) reasonably well, most of the time, without needing to modify the image in order to make it compress better, and frequently compress an image that has previously been compressed as a Jpeg worse (bigger file) than the raw image before Jpeg compression.

If you consider the images above, the one on the left contains many identical, repeating patterns of pixels, which can be grouped together and replaced with a single <insert here> marker to reduce storage space. The image on the right has destroyed the identical nature of those blocks of pixels, and with it their lossless compressibility. I also mentioned that there were no more than 6 different colours among the pixels in the original image, but Jpeg compression has clearly blended, merged and deformed those colours, producing many, many more shades and hues.
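You can get a rough feel for the lossless-compression point by running Python's `zlib` over a regular six-colour pattern, and then over the same pattern with small random errors added. `zlib` implements Deflate, the compressor inside PNG, though the sketch skips PNG's per-row filtering. The random nudges are an assumed stand-in for Jpeg artifacts, not real Jpeg output, and the 64 x 64 stripe pattern is hypothetical.

```python
import random
import zlib

SIZE = 64
PALETTE = (0, 51, 102, 153, 204, 255)  # six hypothetical grey levels


def six_colour_pattern():
    """A 64 x 64 greyscale image of regular diagonal stripes in six colours."""
    return bytes(
        PALETTE[((x // 4) + (y // 4)) % len(PALETTE)]
        for y in range(SIZE)
        for x in range(SIZE)
    )


def add_noise(pixels, amount=3, seed=42):
    """Nudge every pixel by a small random amount (a crude stand-in for lossy artifacts)."""
    rng = random.Random(seed)
    return bytes(min(255, max(0, p + rng.randint(-amount, amount))) for p in pixels)


clean = six_colour_pattern()
noisy = add_noise(clean)

for name, img in (("clean", clean), ("noisy", noisy)):
    print(name, "colours:", len(set(img)), "deflated bytes:", len(zlib.compress(img, 9)))
```

The clean pattern collapses to a handful of back-references. The noisy one has far more distinct values and far fewer exact repeats, so Deflate has much less to work with.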

There are very few really good reasons to use Jpeg. Here are a few :-

  1. My mobile phone only stores images as Jpeg. (common, sad to say)
  2. My web host only accepts Jpeg images. (not common, but all sorts of weirdness happens with free hosts)
  3. My web browser (or my target audience's) is really, really old and can only handle Jpegs, Gifs and the antiquated AOL ART files. (And then, only for full colour images)
  4. I really can't afford all the memory cards I would need to store the 500 shots I want to take of this once in a lifetime event in raw format. (And you are willing to sacrifice quality for quantity)
  5. My digital camera is so obscure, I can't find a program that can read its raw files. (O'Rly? £20 in Tesco will get you a 1.5 MPixel camera which is pretty standard)

Additionally there are a number of times when you should never use any form of image compression which sacrifices image quality for storage space:-

  1. For medical photography. (Is that a tumour or a Jpeg artifact?)
  2. For scientific photography. (Is there a star in that constellation? IDK, could be just compression noise.)
  3. For historical archiving. (Have you restored and preserved ageing film? Or have you destroyed the detail it once had with compression noise? Hang on... did Hitler shave for this picture, or has Jpeg just eaten his moustache?)

We are making great strides in restoring film and video footage. With motion mapping you can average out noise and enhance detail, you can even restore lost focus, (sometimes) or interpolate the motion, or resolution between frames.

Consumer demand for High Definition (HD DVD or Blu-ray) reproductions of Frank Capra's "It's a Wonderful Life", and NASA's increasing frustration with cut-backs forcing astronomical observations back to Earth-bound measurements, are driving us to produce ever more accurate photography by increasing our data set to weed out anomalies.

A very nice technique for non-videographers / cinematographers is called HDR (High Dynamic Range) imaging. The process involves taking two pictures in succession, either at different exposures or with and without artificial lighting of the subject / scene (with and without flash, for example). Where you would previously have had to accept an exposure in the middle, which lost some very bright detail and some very dark detail, you can now combine the two images to produce a very natural looking composite which contains great detail in both the very bright and the very dark areas.
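The "combine the two images" step can be sketched as a simple per-pixel exposure fusion. Each pixel from each exposure is weighted by how close it is to mid-grey, so clipped highlights and crushed shadows contribute little. This is a swapped-in, deliberately simple technique, not a full HDR radiance-map merge with tone mapping. The Gaussian weighting and its `mid`/`spread` values are assumptions, and both exposures are assumed to be aligned greyscale pixel lists.

```python
import math


def well_exposedness(value, mid=127.5, spread=64.0):
    """Gaussian weight: pixels near mid-grey score close to 1, clipped ones much lower."""
    return math.exp(-((value - mid) ** 2) / (2 * spread ** 2))


def fuse_exposures(dark, bright):
    """Blend two aligned greyscale exposures pixel by pixel, favouring well-exposed values."""
    fused = []
    for d, b in zip(dark, bright):
        wd, wb = well_exposedness(d), well_exposedness(b)
        fused.append(round((wd * d + wb * b) / (wd + wb)))
    return fused


# Hypothetical pixels: shadows are crushed in `dark`, highlights clipped in `bright`.
dark = [5, 40, 120]
bright = [60, 170, 255]
print(fuse_exposures(dark, bright))
```

Production tools blend at several scales to avoid halos. A plain per-pixel blend like this one only shows the core idea.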

But sometimes, a single image is all you have. You can't fire an x-ray machine at a man 100 times and average out the differences just to find out what's wrong with him, because the radiation would kill him anyway. You can't go back in time and ask Jesus if he'd mind wiping his brow on a few more shrouds just so we can get a better look at his features either.

If you only have one image to work with, get as much detail out of it as you can. If it's fuzzy or grainy, that doesn't matter. The fuzz and grain are probably in some way related to detail you might still be able to make some sense of. If not now, then just wait a while. New ideas are popping up all the time.

Of course C.S.I.s getting a perfect mug shot of a felon off of a hub cap in the distance on a re-used VHS recording from a security camera is complete science fiction. There are too many analogue compression artifacts in that too, and sadly the poor alignment of scan lines in VHS media makes many motion compensation tricks pretty worthless... for now.
Interestingly, if they used Betamax tape, or film the result would be much more plausible, even than a modern DV tape.  O_O

When trying to digitally preserve your aging snaps and photographs, please please please:-

  1. Scan big. (Laser printers start at 1200dpi at the bottom end of the range, don't go below that if you can help it)
  2. Scan more than once. (You can combine multiple scans to remove artifacts introduced by the scanner mechanism)
  3. Scan at more than one brightness and / or contrast setting. (You can increase the range of luminance in the image that way, so that increasing contrast after the fact does not produce banding)
  4. Never, ever, ever compress using lossy (or, in the case of Gif, colour-reducing) methods (Jpeg, Mpeg, Motion Jpeg, JPEG 2000, Gif etc) until you are sure you have finished working on an image, and want to distribute it to people who may not have anything other than a digital photo frame or mobile phone on which to view it.
  5. Try to keep a backup of the uncompressed image somewhere. The Jpeg reign of terror has to end sometime soon, and it will be nice to have a clean copy to convert to whatever we standardise on after that. ;)
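Point 2 can be sketched as a per-pixel median across several scans. A dust speck or scanner glitch that appears in only one scan is outvoted by the others. The sketch assumes the scans are already perfectly aligned and flattened to equal-length lists of pixel values. Real scans usually need registering first.

```python
from statistics import median


def stack_scans(scans):
    """Combine aligned scans of the same photo by taking the per-pixel median.

    Each scan is a flat sequence of pixel values; all scans must be the same length.
    """
    if len({len(s) for s in scans}) != 1:
        raise ValueError("scans must all have the same number of pixels")
    return [median(values) for values in zip(*scans)]


# Three hypothetical scans of a 4-pixel strip; each has one stray defect.
scans = [
    [100, 120, 140, 160],
    [100, 255, 140, 160],  # dust speck on the glass
    [101, 120, 0, 161],    # sensor dropout
]
print(stack_scans(scans))  # [100, 120, 140, 160]
```

With an even number of scans, `statistics.median` averages the middle two values, so an odd number of scans gives the cleanest vote.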

18 January

Blue Track Mouse? Don't want not stinking Blue Track Mouse!

Today, I received this communication from Microsoft TechNet... where normally I only get the newsletters, which are also becoming very boring and very ad / spam-ish.


Play the Tech Timeline* game for your chance to win an Explorer Mouse and test kit
Mouse technology has come a long way since the clunky roller ball of the 1980s. Our experience has vastly improved; first with powerful optical tracking, and then with the precision of laser. But now – in a move that T3 magazine has called 'a revolutionary step forward in mouse technology' – we've gone a step further, and combined the two: introducing BlueTrack technology.
New Explorer Mouse is a force to be reckoned with. But don't take our word for it – put it to the test and be one of the first to experience its benefits. Play our Tech Timeline* game for your chance to win one of thirty Explorer Mouse and test kits.
Microsoft BlueTrack technology – bringing total reliability to your mobile lifestyle

Microsoft has developed BlueTrack – the world's most advanced tracking technology – and added it to the new Explorer Mouse. Remarkably, it works on virtually any surface**. So, whether you're working on your living room carpet or a park bench on a sunny day, you can be confident your mouse will keep performing at its best.
BlueTrack technology works by emanating the light off the surface it's moving over. High-angle, imaging optics generate an exact replica of the surface, enabling it to respond instantly to your hand movement, wherever you are.
What's more, the Explorer Mouse is distinctive, with a chrome trim, glowing blue-light effect and curved-for-comfort surface, so it always stands out from the crowd. In the words of Computer Shopper, 'Microsoft's hardware team has a history of introducing small innovations that quietly change the way we use our PC'.
*Terms & Conditions
**Except glass

lol. Now I must protest. I use (and have used since 1998) a Logitech TrackMan Marble FX. This "mouse" predates optical and laser technology, "works on any surface" including glass, and (most importantly) allows me to move the pointer without moving my hand at all.

In fact, its greatest let-down (aside from the fact that it also predates the "scroll wheel", and requires custom drivers for the 3rd and 4th buttons which do not support modern operating systems, since more than 2 buttons was also very rare when it was made) is that it was only ever available with a PS/2 connector, and as such is very difficult to attach to a Mac... and to some of the more recent laptops.

What's worse is that the first test in this "game" is to install Microsoft Silverlight... Err, no! There are many people who, for various reasons, cannot install Silverlight. The Internet is for everyone, and I will not promote a web based technology which ensures that pages using it are only accessible to people using one company's products.

You cannot load a Silverlight page on your Mac, iPhone, Android phone, PS3 or (without a hell of a lot of faffing around, and good luck) Linux / BSD machine. You might as well make an entire web site out of HyperCard stacks. At least enough was known about HyperCard that third parties could read its stacks on something other than a Mac.

The same problem arose with RealPlayer, and is happening again with Flash in the hands of Adobe. Under Macromedia, the source code to the latest Flash player plug-in was always available to the public. That source hasn't been updated since Adobe took over. So now, SVG animations play more universally than anything that requires a Flash plug-in above version 8, especially as Adobe will not (cannot?) release a version of the plug-in that works on 64-bit systems other than the Apple Mac. (No 64-bit Linux or Windows versions exist.)

If 64-bit Windows users are forced to use Silverlight for rapid multi-media and Mac users to stick with Flash, with X-Windows based POSIX users (including non-Windows mobile devices) running only SVG, the World Wide Web is fractured. It is no standard at all.

And back to the "mouse" issue, IMHO, we do not need more reliable "mouse" technologies, what we need is less reliance on a "mouse" like device to operate the pointer.

I have used tablet PCs, and find the touch-screen stylus method of mouse control very convenient... but playing WoW on a desktop setup with my arm outstretched, pressing a stylus into the screen to guide my character, would not be very beneficial... and in tense raid times I think I'm likely to skewer the LCD trying to kill an instance end-game boss.

I quite like the touch screen methods of using an iPhone or PDA, especially Apple's two-finger approach. Though anyone from my home land who thinks giving Apple two fingers is funny had better cover their smirk in my presence. ;)

Graphics tablets have never really taken off, and shining lasers into people's eyes to track the location on the screen they are focusing on is not great for the health of one's retinas.

Touch pads seem to be a love-hate thing. I've used good ones and bad ones... and I still feel that, for the portability of the laptop (that is, when using it without a desk to hand), the touch pad still exceeds the practicality of any mouse. But my beloved trackball still wins out.

I've seen pointers which are similar to laser pens for presentations, where the on-screen pointer moves to wherever you point the puck in your hand, and this can be used whilst wandering the auditorium or lecture hall. Unfortunately you can't easily do the same for a keyboard, and it's hard to operate a keyboard with a puck in one hand anyway.

You see, the thing about a touch pad, graphics tablet, touch screen or trackball, is that you can use them, even on a duvet cover, or waterbed. You don't need any surface, solid or otherwise to perform your pointing operations. The tablet, touch pad and touch screen provide their own surfaces, and the trackball can be held in the hand and rolled with fingers alone. It's a little like playing "round and round the garden" upside down.

For the time being, I'm happy to stick with my trackball. But I think the days of the "mouse" should surely be numbered... and I wish people would stop trying to re-invent the wheel. I have seen no mouse which is better than the roller ball versions which came with Classic Macs, Amigas and as expansions to the old 8-bit computers. My AMX Mouse on my ZX Spectrum was just as effective as any modern laser tracking mouse, and just as flawed. The tracking technology is not the problem... it is the design for an analogue input device.

I have seen drivers which allow one to use an analogue joystick to control the mouse pointer, and this was pretty instinctive, and did not require any surface to mount the joystick on. (a lap is usually sufficient)

I remember that the mouse on my Atari ST was quite flaky, and so I often resorted to using the Alt and cursor keys to move the pointer... just as I often resort to the "Mouse Keys" accessibility feature in Windows. It's not accurate enough for art and design work, but it's often easier than taking your hand off the keyboard just to click Send in some idiotically written instant messenger which doesn't assign a key on the keyboard to the Send button... like "Return" would be nice. I'm sure many keyboard lovers have come across programs which use a point and click control which cannot be activated with the keyboard, when the main program is all about typing something.

At any rate, I think that any technology which claims to revolutionise the state-of-the-art mouse, at this point in time, is rather like a state-of-the-art hammer. As in, it's just a hammer. There are better ways of banging things into other things, and though many can do things a hammer alone cannot, few are as versatile as a hammer... even so, it's still just a hammer, and, for short term use, a lead or stone hammer is just as effective as a carbon-fibre super-hammer.

For long term use, my uber trackball is still going strong. It's accurate, it's precise, it's fast, it's got more buttons than modern generic drivers can cope with, and it's so outdated and unsupported... was it really necessary for it to be so good in the first place? Either way... when it finally breaks, or PS/2 ports stop being fitted in computers, I will have to cry long and hard. There will probably be great expense involved in its funeral.

Come back Apple Mighty-Mouse, all is forgiven?...

DUDE!!! IT'S JUST A FREAKING MOUSE!!!

12 November

ESTsoft Corp. ALZip 6.7, Iceows and Software Patching

Okay, so I'm going to recommend (somewhat unlike me) a commercial program, that is, one which is neither free (as in free beer) nor free (as in free speech). Well, actually, version 6.7 of ALZip is the last free-beer release of this nifty little program, and I recommend you download it and give it a try right now.

I don't think I'd ever be prepared to "pay" for a Zip program, and if I were to be convinced, it would have to be something like Iceows (formerly ArjFolder) but with true integration with the desktop explorer (a Windows Shell Namespace Extension) à la Zip Folders (ZipFldr.dll) or Cabinets (CabView.dll). Iceows is the closest I have seen to this (surely the most transparent) approach to archive handling in Windows. The only difference between Iceows archive folders and Zip Folders or Cabinet views is that the toolband object in the browse window is only a close facsimile of the Windows Explorer toolbar, not the real one.

If you are any good with Namespace Extensions and fancy having a go at this, I encourage you to try. ZipFldr.dll and CabView.dll can be transported from one version of Windows to another, and they acquire the visual look and feel of the host system, not to mention the user's folder view preferences. The toolbar (an Explorer band object) is specific to Iceows views, and looks worse and worse the further we get from Win95, where it was originally developed. Nor is there an x64 version of it, so it cannot integrate into Windows XP x64 or Vista x64 editions. Development seems completely stalled, and Iceows is ©1998-2003 Raphaël et Béatrice Mounier closed source freeware... as in free beer. Drink it once and then it's gone.

Back to ALZip 6.7, and why I should be recommending a program that will (if you ever need to update from version 6.7) cost you money. Well, ALZip is far simpler to use than many of the other commercial archivers (WinZip, WinRAR, WinAce, StuffIt etc.) and far more presentable than the typical OSS Windows GUI archivers (7Zip, PeaZip etc.). The most important factors, for me at least, are the context menu functionality and the number of archive formats supported. ALZip supports 7z, RAR, ACE, ARJ, TGZ and BZ2 as well as Zips and Cabs. It's easy to select which formats it should be associated with before you even run the program. The Explorer right-click extension is compatible with both the x64 and the x86 (32-bit) versions of Windows Explorer. The main user interface is still Win32, but since that isn't required to integrate with the 64-bit Explorer, it will run happily under Windows' built-in 32-bit compatibility layer (WoW64). (You won't even notice unless you check the task in Task Manager.)

It's not too intrusive for a commercial product, but it does contain a number of banner ads for ALTools in its user interface. These don't connect to the Internet, and there is no spyware and no nagging about unregistered versions. One of these I found particularly annoying, but if you find this too, you can always use my little patch (obtainable from the right) which removes the banner from the Create, Add and Extract progress dialogues. This is a simple resource hack, and modifies no code in the program. So I'm not redistributing modified versions of their code (I hope), only my own code to modify theirs.

This patch is therefore my own work. I give it to you freely, but if you use it, you may be breaching the terms of the licence you have with ESTsoft Corp. If they ask me to take my patch down, I will replace it with instructions to allow you to recreate the modification yourself... This is just information, given freely and without warranty. I can also assure them that I will not make any such information available for future versions of ALZip, for which free use is no longer available. So if you are not interested in paying for a GUI archive program, I recommend that you get a copy of version 6.7 before ESTsoft decide to take it down (as is their prerogative).

Hacking

As a technical side note, a few things become obvious as you start rooting around in the code of ALZip. The first thing which stands out is that the program is written in a reasonably recent version of Borland (now Embarcadero CodeGear) Delphi, which means it could quite easily be ported to Linux, OS X or .NET with current versions of Delphi or the Open Source Lazarus Project. (I just love linking to them ;) )

The next point one notices is that the entire project UI is built with, and skinned using, the BusinessSkinForm component from Almediadev. This product, in itself, is not free, which not only suggests that ALZip will never be OSS, but also explains why they are going to have to start charging for it. Hey, it's only about 20 bucks.

The final point I will make, for anyone attempting to customize their own copy for their own personal use without the aid of my patch, is that you will find that the AZMain.dll file (which is used largely like an overlay in a DOS program rather than a true Windows Dynamic Link Library) is packed with ASPack. Quite why people insist on this nonsense is quite beyond me... really. If you search your favourite web search engine for UnASPack you will probably find that Aaron's Homepage has a copy really quickly. It will turn the 303k DLL into a whopping 1.5Meg DLL, revealing the Delphi nature of the library, a number of initialisation routines, and stream dispatchers for handling files and archives. It will also load more quickly and use fewer system resources during startup, notwithstanding the roughly fivefold increase in the hard disk space it occupies.
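If you want to check whether a DLL is packed before reaching for an unpacker, one quick heuristic is to list its PE section names. The sketch below reads the section table with Python's `struct`. It assumes ASPack leaves sections named `.aspack` and `.adata`; that is a commonly seen signature, not a guarantee. `looks_aspacked` is a hypothetical helper name.

```python
import struct
import sys

# Assumed ASPack signature: section names it commonly adds to a packed file.
ASPACK_SECTIONS = {".aspack", ".adata"}


def pe_section_names(data: bytes) -> list[str]:
    """List the section names of a Windows PE (EXE/DLL) image."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew field
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    coff = pe_offset + 4  # the COFF file header follows the signature
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    (optional_size,) = struct.unpack_from("<H", data, coff + 16)
    table = coff + 20 + optional_size  # section table follows the optional header
    names = []
    for i in range(num_sections):
        entry = table + 40 * i  # each section header is 40 bytes, 8-byte name first
        names.append(data[entry:entry + 8].rstrip(b"\0").decode("ascii", "replace"))
    return names


def looks_aspacked(data: bytes) -> bool:
    """Heuristic only: True if any section name matches the assumed ASPack names."""
    return any(name in ASPACK_SECTIONS for name in pe_section_names(data))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            image = f.read()
        print(path, pe_section_names(image), "ASPack?", looks_aspacked(image))
```

Run it as `python sections.py AZMain.dll` (the script name is arbitrary). A packed library typically shows a short list of oddly named sections, while the unpacked copy shows the usual Delphi code, data and resource sections.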

Aside from that, unpacking this DLL (the only packed one) gets you very little, unless you want to Olly it (step through it in OllyDbg) to determine how to interface with the sub-archive type handling libraries: AzCDImage, Az7z, unacev2 and unrar4.

If you really want to get rid of the final banner in the main browser UI, I would look at the YN_BannerCreate function in ALBanner.dll. This library doesn't use DFMs compiled into the resource area of the executable image, but rather manipulates VCL components programmatically. So you will have to do a little Ollying, with a DFM / VCL eye. The object you are looking for is probably TYNBannerPanel.

However, the images are encoded within that library as GIF streams, so you could always take the easy route out, and simply overwrite them with small, single colour GIF images (which will encode to almost nothing and thus not overwrite anything vital). The TGifStream component should stop once the GIF image has been rendered to the DC (Device Context) of the Panel and ignore anything beyond that.
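The "easy route" relies on the fact that a GIF stream is self-delimiting: it ends at a trailer byte (0x3B), and a decoder that stops there never looks at the bytes after it. The sketch below finds embedded GIF87a/GIF89a streams in a binary by walking their block structure, then overwrites one in place with a smaller GIF. The helper names are hypothetical, and this is not the author's patch. The scan assumes the signature bytes don't also occur by coincidence elsewhere in the file; if they do, `gif_length` may raise. Work on a copy.

```python
# A commonly used minimal 1 x 1 GIF89a image (43 bytes).
TINY_GIF = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\xff\xff\xff\x00\x00\x00"
            b"!\xf9\x04\x01\x00\x00\x00\x00"
            b",\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;")


def _skip_sub_blocks(data: bytes, pos: int) -> int:
    """Skip a chain of GIF data sub-blocks; return the offset just past the 0 terminator."""
    while True:
        size = data[pos]
        pos += 1
        if size == 0:
            return pos
        pos += size


def gif_length(data: bytes, start: int) -> int:
    """Walk a GIF stream that begins at `start` and return its length in bytes."""
    pos = start + 6                      # "GIF87a" or "GIF89a"
    packed = data[pos + 4]               # flags byte of the logical screen descriptor
    pos += 7
    if packed & 0x80:                    # global colour table present
        pos += 3 * (2 ** ((packed & 0x07) + 1))
    while True:
        marker = data[pos]
        pos += 1
        if marker == 0x3B:               # trailer: end of the stream
            return pos - start
        if marker == 0x21:               # extension: label byte, then sub-blocks
            pos = _skip_sub_blocks(data, pos + 1)
        elif marker == 0x2C:             # image descriptor (9 bytes after the marker)
            packed = data[pos + 8]
            pos += 9
            if packed & 0x80:            # local colour table present
                pos += 3 * (2 ** ((packed & 0x07) + 1))
            pos = _skip_sub_blocks(data, pos + 1)  # skip LZW code size, then data
        else:
            raise ValueError(f"unexpected GIF block 0x{marker:02X} at offset {pos - 1}")


def find_gifs(blob: bytes) -> list[tuple[int, int]]:
    """Return (offset, length) for each GIF stream embedded in `blob`."""
    found, pos = [], 0
    while True:
        hits = [i for i in (blob.find(b"GIF87a", pos), blob.find(b"GIF89a", pos)) if i != -1]
        if not hits:
            return found
        start = min(hits)
        length = gif_length(blob, start)
        found.append((start, length))
        pos = start + length


def overwrite_gif(blob: bytearray, start: int, length: int, replacement: bytes) -> None:
    """Write a smaller GIF over an existing one; trailing old bytes are left untouched."""
    if len(replacement) > length:
        raise ValueError("replacement GIF is larger than the original")
    blob[start:start + len(replacement)] = replacement
```

Because the replacement ends with its own trailer, the leftover bytes of the original image sit after it unread, which is the behaviour the paragraph above relies on.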

My entire point here is that the puny protection doesn't make this program secure from tampering (though I wouldn't dream of doing it with later versions), and also that it doesn't really warrant the expected future cost to users. I do realise that the tools used to rapidly develop such a nice UI require income to support their licence, but... now that we have it, I for one wouldn't care if it were written in Lazarus with its VCL equivalents. If it were, it would take on the appearance of my system too, not something that looks half Windows XP and half SUSE Linux.

A Note to the Developers at ESTsoft

ESTsoft: I wish you all the best, your code is great. Unfortunately, I don't think I'll be buying a copy, and I fear this will be the reaction of many private individuals. If you can't turn a profit, please consider releasing your code as OSS. It's too nice to waste; it's just not worth the cost of your development tools. I fear that your product, while a great program and useful tool, is just too expensive to produce for the value it brings users.

02 August

Microsoft has fun at my expense! (RTF Specification Version 1.9.1)

The beauty of Rich Text files

On the 19th of March 2008 Microsoft released the latest incarnation of the Rich Text Format specification, version 1.9.1, to go with their new Office 2007... Interesting, because Office '08 is available on the Apple Mac but the document doesn't mention that, and only shows images from Word 2003 on the PC.

Anyway... I'm working on a program to replace WordPad, not because I don't like WordPad but because I do. I just think that, since it hasn't changed since Windows 95 (and then not much from Write on Windows 3.x), it could do with a little more... especially in the way of Internet applications: reading and editing HTML, blog posts etc. So I'm writing my own, keeping the UI as much like WordPad (or Write) in its default configuration as possible, keeping the code size small and the startup time fast, and ensuring that my replacement can do everything that the original can.

Access to the Rich Edit component (riched32.dll or riched20.dll) is a really quick way of maintaining a simple word processor, and it's mostly what has allowed WordPad to remain as current as it is. Every time Microsoft updates Rich Edit, WordPad gets that update automatically, because it's really just a user interface to that library. Of course, as the Rich Edit component gets new features (not just improvements to existing ones), WordPad falls behind... and Rich Edit is notoriously bug-ridden from an API point of view. Interfaces that are documented as fixed and working fine don't actually do anything, and ones which are documented as having bugs under specific circumstances don't present those bugs under those exact same circumstances. It's more a documentation issue than a code one, though... and, as previously stated, the library is now very, very old.
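This approach means passing Rich Text to and from Rich Edit and touching it at a low level, so the basic escaping rules from the RTF specification are worth having to hand. The sketch below escapes `\`, `{` and `}` with a backslash. It writes each non-ASCII character as `\uN` (a signed 16-bit UTF-16 code unit) followed by a single fallback character, matching the `\uc1` declaration. The function names and the one-font header are illustrative assumptions, not part of Rich Edit's API.

```python
def rtf_escape(text: str) -> str:
    """Escape plain text for use in an RTF document body."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)            # RTF special characters
        elif ch == "\n":
            out.append("\\par\n")            # paragraph break; the newline itself is ignored by readers
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit value; characters beyond the BMP become
            # two UTF-16 surrogate code units. "?" is the fallback character
            # shown by readers that don't understand \u.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append(f"\\u{unit}?")
    return "".join(out)


def minimal_rtf(text: str) -> str:
    """Wrap escaped text in a minimal RTF document with one font."""
    header = r"{\rtf1\ansi\deff0{\fonttbl{\f0 Times New Roman;}}\uc1 "
    return header + rtf_escape(text) + "}"


print(minimal_rtf("Caf\u00e9 {menu}\nPrice: \u20ac3"))
```

Saving the printed output as a `.rtf` file should give a two-paragraph document that WordPad can open, though it is worth checking against each Rich Edit version you test with.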

So I'm using Rich Edit to get my program up and running. I'm passing Rich Text between it and my filters and modifying it at quite a low level. I'm testing my program with the versions of riched32.dll that came with Windows 95, the ones that come with Vista, the ones in the latest SP for NT4 and the compatibility libraries that are packaged with Wine (just to be safe). I'm getting Rich Text files from Macs, from NeXT, from Linux and from word processors of every calibre.

"I really ought to know about the state of play with the format itself, and have some idea how to re-implement or replace this library should the need arise in my program," I thought... So I asked the inventor of the standard (Microsoft) what their present documentation on this open standard for information interchange is.

I received a docx file... (though it seems they have a .doc up now too)

Okay... I've had docx files before, I've got translation filters for O2k-2k3 and for OOo. They work, not great, but they work. Oh no! Not on this they don't. The tables are a complete mess and, though I can read the words... making sense of the document is a nightmare. It's rather like a complex scientific journal with all the diagrams thrown away, run through a cheap 1980s OCR program and turned into UNIX ASCII text file without any formatting, only worse. There is formatting, it just doesn't resemble the original formatting of the document in any way shape or form.
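The words themselves are the easy part, because a docx is a ZIP archive whose main body lives in `word/document.xml`. The sketch below uses only Python's standard library to pull out plain paragraph text. It ignores table structure, fields, headers and footers, which is exactly the layout the filters mangle, so it is a rescue tool rather than a converter. The function name is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source) -> list[str]:
    """Return the plain text of each paragraph (w:p) in word/document.xml.

    `source` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return ["".join(t.text or "" for t in para.iter(f"{W_NS}t"))
            for para in root.iter(f"{W_NS}p")]


# Usage, with a hypothetical file name:
# for line in docx_paragraphs("RTF-Spec-1.9.1.docx"):
#     print(line)
```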

I don't want Office 2007, I don't use the copies of Office 97, 2000 or 2003 that I legally own. I much prefer to use Open Office, or Word Perfect Office or anything other than Microsoft Office. I've said it before, I'll say it again, it's not that MS Office is bad, I just don't like it. I know they spend a lot of time and effort on getting their UI right, but I'm happy with WordPad, I'm happy with a DOS command prompt and bash scripts, it's just who I am. I'll spend hours replacing Explorer with little third party desktops, Icon widgets, launch bars and file browsers, I like things to look good, but I need to be able to make them look and act the way I want, not the way your panel of testers say is ergonomically correct for the majority. There is no "I" in democracy, and "I" want "My PC" to work how "I" want, not how the majority vote it should, it's not their PC, they can get their own.

So now that I've used Microsoft's Live Writer software to write and upload my whinge about their best seller onto their servers, they can shoot me for stealing their software (for about a day): I installed the 90-day trial of Office 2007 in a virtual machine, read this file and promptly removed it again by going back one snapshot. About as legal as I could get away with, and way too much effort just to read a document that is supposed to enable free transfer of information between diverse systems. (It's only not really legal because it wasn't my copy of the trial, and because I reverted the drive rather than uninstalling, so I could theoretically install it again sometime down the road.)

So how Open is Microsoft's Open XML file format? Not very it seems. I can read an ODT file anywhere I can read Google, which is even on a Phoney Praystation! Yet I can't read a docx even on my Microsoft Vista PC with Office 03... and if reports are to be believed, it will be hard to read them on versions of Office yet to come if MS are to implement their own ISO standard format which isn't compatible with the existing docx at all.

Sigh.

Office 2k7 Hate (my hate, you can love it all you like)

While on the MSO chat (and yes, I know I was going to bring you W2.0 Spreadsheets next; I will get there, I promise), I have to add my tuppence-worth on Clippy and the Ribbon. Yes, I hated Mr. Clippit (otherwise known as Clippy the paperclip Office Assistant), though I will miss Paws (the Cat) and Albert (the Genius). More importantly, I miss menus and toolbars. I only wanted to load this file and save it as something I could use... I ended up spending the day loading XP and Office 07, and boxing with a constantly changing blue ribbon... Every time I found an option I wanted to use, I'd move the cursor to where I wanted to apply that tool or effect and... hey! Where'd it go? Everything's changed!

Blue Ribbons should remain fiendishly tasty and reasonably priced chocolate wafer snacks and stay the hell off my PC is all I can say. You want a revolutionary new design? Try putting the toolbars down the sides instead of up the top... have you tried using O2k7 on a widescreen display? Very popular these days... not very practical for word processing, but with OOo I can pin all my toolbars and property pages, document navigation and defined paragraph formats on the sides, maximizing the vertical document editing space, and making practical use of the extra screen width. OOo wins!!! The Office Suite of the future! Hooray!

Okay... so I managed to get O2k7 running in my VM (ugly as it is, at 1024x576 I could get about 3 lines of 8 point text in at page width before the ribbon, and had 3 pixel high text at 80% document view where I could at least see a whole paragraph.), and loaded the docx. Hooray! The page numbers matched the pages, and the tables had columns below them that actually related to the column headers.

So I saved my document out as a .doc, and a .odt, and an rtf, and a PDF and an XPS. "That, I should be able to do something with" I thought, and ditched 2k7 like the shallow painted tart it is.

Getting Something Useful

Reading any of these documents in anything else was quite a trial however. WordPad seemed to do the best job with the Rich Text file. But of course it doesn't support document links, page breaks and a myriad of other features that are actually quite useful in a 278 page document.

The PDF and XPS are fine for reading, but the document was locked against editing and copy and paste, so copying the source code would be a matter of printing and re-typing. That's not very practical either. The .doc file read back about as well as the docx via translation filters, and it turns out (after much re-working of the internals of the file, trying to maintain the layout and feel) that most of what is wrong with it is that it has been written by someone who has no idea how to use a professional document editing tool like Word. (Or rather, it appeared to have been worked on by several someones, at least one of whom had a very good idea how to manage a large document in a decent word processor, but sadly they weren't in charge of managing the consistency of the document.)

The tables were messed up because they were full of zero-width columns that had been created partway down the table by splitting cells; rather than removing the unnecessary columns, someone had just shifted them along until they met the boundary of the next and/or previous one. Fields had (at some stage) been used to create the page numbers in the contents section, but they had since been converted to constants, and links were made to bookmarks named like _toc1354375138, which sat at the same point as a decently named and perfectly linkable heading.
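A minimal Python sketch of the clean-up for those zero-width columns, using a deliberately simplified, hypothetical table model (a list of grid-column widths, plus each row's cells as `(start column, span)` pairs). This is not how Word or OOo store tables internally; it only shows the remapping: drop every grid column whose width is 0, then recompute each cell's start and span against the columns that remain.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # hypothetical model: (first grid column, number of grid columns spanned)

def drop_zero_width_columns(widths: List[int], rows: List[List[Cell]]):
    """Remove grid columns of zero width and remap every cell's start and span."""
    # new_index[i] = how many non-zero-width columns come before grid column i
    new_index = []
    count = 0
    for w in widths:
        new_index.append(count)
        if w > 0:
            count += 1
    new_index.append(count)  # sentinel, so a cell's right edge can be looked up too

    new_rows = []
    for row in rows:
        new_row = []
        for start, span in row:
            new_start = new_index[start]
            new_span = new_index[start + span] - new_start
            # A cell that covered only zero-width columns disappears here; a real
            # converter would first merge any text it held into a neighbouring cell.
            if new_span > 0:
                new_row.append((new_start, new_span))
        new_rows.append(new_row)
    return [w for w in widths if w > 0], new_rows

widths, rows = drop_zero_width_columns(
    [1000, 0, 2000, 0, 1500],
    [[(0, 2), (2, 3)], [(0, 1), (1, 1), (2, 2), (4, 1)]],
)
print(widths, rows)
```

Working on the shared grid, rather than row by row, keeps every row's spans adding up to the same number of columns, which is what makes the columns line up with their headers again.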

I know I've taken word processing courses, and am IT literate enough to get around these things... I know that many of my colleagues in programming and system maintenance haven't and/or aren't, but surely Microsoft could get a secretarially trained document specialist to collate the information from the techies?

Anyway. I reworked this document in OOo Writer, in AbiWord and in a little gem known as Jarte (which reads both the .doc and .docx formats as well as .rtf, with the right filters, but sadly goes the way of 2k7 in UI design), and now I have the document in a form that is instantly usable by almost anybody.

One Document to Rule them all...

So, before I upset Microsoft again by republishing their hard work in an edited form, here are some interesting details about this document.

Size (one of the reasons Microsoft cite for the switch to docx):-

| (Source) Format | Initial size | Simple recompress | Advanced compression |
|---|---|---|---|
| Word .docx | 0.98M | Already PK Deflated | |
| OOo .odt | 0.74M | Already PK Deflated | |
| Word .doc | 11.9M | 1.75M (PKZip) | 0.68M (WinRAR) |
| Word RTF Export | 55.8M | 1.89M (PKZip) | 1.08M (WinRAR) |
| Word PDF Export | 7.33M | 3.06M | |
| OOo PDF Export | 11.1M | 1.92M | |
| Word XPS Export | 4.59M | Already PK Deflated | |

Okay... so docx is a lot smaller than a .doc... but not all that much smaller than the zipped .doc, and a .docx won't zip any further because it's already a .zip file, just like an .odt.

MS Office makes smaller PDFs, but it used JPEG compression on images even against my wishes, and made a horrible PDF which compressed worse than the originally larger OOo version.

By horrible, I mean the navigation is just every possible link to a location, listed down the left-hand side with no levels whatsoever. OOo made a PDF with a pull-out navigation tree that mimicked the contents of the document.

RTF actually zips quite nicely. I'd say a PDF in a Zip is a pretty good binary distribution form.

XPS files are pretty big, and not as easy to navigate as PDFs. I don't really see what Microsoft is trying to achieve here... other than that it is a plain-text XML format in a Zip, just like .odt and .docx files, so it doesn't need decompiling to edit the way a PDF does; a simple unzip will do.

The formats which aren't already zipped (or compiled binary) actually pack down smaller than most of the zipped XML formats... so we're really not saving any space at all; in fact, we're losing it. You can't usefully RAR, ACE or 7-Zip a zip; it just doesn't shrink any further. (Most modern PDFs should be Flate compressed, the same as a zip, though how thoroughly is up to the creator.)
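To see why recompressing an already-zipped file gains nothing, here is a small Python experiment (a sketch, not a benchmark). It uses zlib's Deflate, the same algorithm used inside .docx, .odt and .xps packages, on synthetic text-like data, then deflates the result a second time. For comparison it also runs lzma, the algorithm behind 7-Zip, on the raw data. The sample data is made up, so the exact sizes mean nothing; only the pattern matters.

```python
import lzma
import random
import zlib

def sample_text(n_words: int, seed: int = 1) -> bytes:
    """Synthetic, compressible 'document' text: random words from a small vocabulary."""
    vocab = ["table", "column", "page", "heading", "formula", "font", "style", "link"]
    rng = random.Random(seed)
    return " ".join(rng.choice(vocab) for _ in range(n_words)).encode("ascii")

data = sample_text(50_000)
once = zlib.compress(data, 9)        # roughly what each member of a .docx/.odt already is
twice = zlib.compress(once, 9)       # "zipping the zip": Deflate run over Deflate output
xz = lzma.compress(data, preset=9)   # a stronger method applied to the raw data instead

print(f"raw            {len(data):>8} bytes")
print(f"deflate        {len(once):>8} bytes")
print(f"deflate twice  {len(twice):>8} bytes")
print(f"lzma on raw    {len(xz):>8} bytes")
```

The second Deflate pass typically comes out about the same size as the first, or slightly larger, because compressed output has almost no redundancy left to find. Any real gain has to come from compressing the uncompressed data with a stronger method in the first place, which is the pattern the WinRAR figures for the .doc and RTF show.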

Also of interest: I have discovered that if I unzip a .docx, .odt or .xps and pack it back up with ALZip (which isn't the best Zip program by any stretch, but it's cute, small, fast and very easy to use), the files become smaller... and changing the resulting zip's extension back to .docx, .odt or .xps leaves them perfectly readable at their new, smaller size.
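The same repack can be scripted. Below is a minimal Python sketch using the standard `zipfile` module; the function name `repack` is my own. It copies every member of a ZIP-based document into a new archive at the highest Deflate level, keeping the member order. One detail matters for .odt files: OpenDocument packages expect the `mimetype` member to come first and to be stored uncompressed, so the sketch copies that member with `ZIP_STORED`. Whether the result is actually smaller depends on how well the original was compressed.

```python
import os
import tempfile
import zipfile

def repack(src: str, dst: str, level: int = 9) -> None:
    """Copy a ZIP-based document (.docx, .odt, .xps) into `dst` at a chosen Deflate level.

    Member order is preserved. An ODF 'mimetype' member is written uncompressed
    (ZIP_STORED), because OpenDocument readers expect it that way.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename == "mimetype":
                zout.writestr(info.filename, data, compress_type=zipfile.ZIP_STORED)
            else:
                zout.writestr(info.filename, data,
                              compress_type=zipfile.ZIP_DEFLATED, compresslevel=level)

if __name__ == "__main__":
    # Build a throwaway stand-in package (not a complete, valid .odt) whose
    # content is stored uncompressed, then repack it and compare sizes.
    workdir = tempfile.mkdtemp()
    before = os.path.join(workdir, "before.odt")
    after = os.path.join(workdir, "after.odt")
    with zipfile.ZipFile(before, "w") as z:
        z.writestr("mimetype", b"application/vnd.oasis.opendocument.text",
                   compress_type=zipfile.ZIP_STORED)
        z.writestr("content.xml", b"<text:p>Hello</text:p>" * 2000,
                   compress_type=zipfile.ZIP_STORED)
    repack(before, after)
    print(os.path.getsize(before), "->", os.path.getsize(after))
```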

I've tried to get this document to open legibly in as many readily available packages as possible: Atlantis Nova, Angel Writer, QJot etc., all of which I consider in some way to be competitors for my upcoming WordPad replacement.

Most struggle with the tables. Some, most notably AbiWord, struggle with the sheer size of the document. Jarte reads the whole file, but stops counting the pages when they reach 59, and only saves that many pages. Angel Writer copes best with the formatting, but doesn't implement pages or wrap to the ruler, so you can't really treat it like a markup for a paper document. WordPad copes best overall, but again, page breaks just don't happen, as it has no idea what a "page" is till you hit print preview... but at least it knows what the ruler is. The math functions are very new in Rich Text, and most programs either ignore them or turn them into WMF objects inserted into the document. Saving from MSO to an .odt file removes them altogether, replacing them with the plain text of the variables and few or no math symbols.

The main editing I did was in OOo, my favourite of all. Taking full advantage of the package and its different (broader, ISO-standard) OpenDocument features required considerable effort.

OpenDocument Text files implement math based largely on MathML, the older, open W3C standard, whereas Microsoft's Open XML documents are based on their own proprietary markup (Office Math Markup Language, or OMML).
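To get a feel for the two markups, here is a Python sketch that builds the fraction a/b both ways with `xml.etree.ElementTree`. The first is MathML: an `mfrac` holding two `mi` elements. The second follows the shape OMML uses for a fraction as I understand it: an `m:f` containing `m:num` and `m:den`, each holding a text run. Both are bare fragments; real documents wrap them in more structure and formatting properties, which I have left out here.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
OMML_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
ET.register_namespace("", MATHML_NS)   # serialize MathML with a default namespace
ET.register_namespace("m", OMML_NS)    # and OMML with the usual "m:" prefix

def mathml_fraction(num: str, den: str) -> str:
    """<math><mfrac><mi>num</mi><mi>den</mi></mfrac></math>"""
    math = ET.Element(f"{{{MATHML_NS}}}math")
    frac = ET.SubElement(math, f"{{{MATHML_NS}}}mfrac")
    ET.SubElement(frac, f"{{{MATHML_NS}}}mi").text = num
    ET.SubElement(frac, f"{{{MATHML_NS}}}mi").text = den
    return ET.tostring(math, encoding="unicode")

def omml_fraction(num: str, den: str) -> str:
    """<m:f><m:num><m:r><m:t>num</m:t></m:r></m:num><m:den>...</m:den></m:f>"""
    frac = ET.Element(f"{{{OMML_NS}}}f")
    for part, text in (("num", num), ("den", den)):
        run = ET.SubElement(ET.SubElement(frac, f"{{{OMML_NS}}}{part}"), f"{{{OMML_NS}}}r")
        ET.SubElement(run, f"{{{OMML_NS}}}t").text = text
    return ET.tostring(frac, encoding="unicode")

print(mathml_fraction("a", "b"))
print(omml_fraction("a", "b"))
```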

Apparently, Word (prior to 2007) couldn't store math layouts natively at all. So I'm guessing the Math Markup tool that I used to use in Word 97 was simply embedding an OLE object. I'm sure I used to do something like this in Lotus AmiPro too back in the 90s, but I know it's something the LaTeX people have whinged and whined about for years, so I guess I'm not all that surprised.

From what I can see, OOo's Math markup works out in such a way that you could almost execute it, though you might have to strip a few fluffy formatting bits out here and there; they make no difference to the function of the formula at all, just make it look neater. Microsoft's is much more like laying out a user interface or a web page. It would never run as code, but the presentation description is quite exact, giving exact measurements in twips and the like. This smells of fluff to me, and doesn't make for a very transportable language at all. Nobody (that I know of) other than Microsoft uses a twip as a measurement... and when you're looking at a hard-copy document, surely a point or a fraction of an inch would be more helpful.
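For reference, a twip is a twentieth of a point, which makes it 1/1440 of an inch (RTF also uses twips for its measurements), so converting to units a hard-copy reader would recognise is simple arithmetic. A small Python sketch, using exact fractions so nothing gets rounded:

```python
from fractions import Fraction

TWIPS_PER_POINT = 20                                  # "twentieth of a point"
POINTS_PER_INCH = 72                                  # desktop-publishing (PostScript) point
TWIPS_PER_INCH = TWIPS_PER_POINT * POINTS_PER_INCH    # 1440
MM_PER_INCH = Fraction(254, 10)                       # 25.4 mm, exactly

def twips_to_points(twips: int) -> Fraction:
    return Fraction(twips, TWIPS_PER_POINT)

def twips_to_inches(twips: int) -> Fraction:
    return Fraction(twips, TWIPS_PER_INCH)

def twips_to_mm(twips: int) -> Fraction:
    return twips_to_inches(twips) * MM_PER_INCH

# A 240-twip gap is a 12-point gap; 1440 twips is one inch, or 25.4 mm.
print(twips_to_points(240), twips_to_inches(1440), float(twips_to_mm(1440)))
```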

Anyway, what I can agree with is some of the fantastic ways to align formula elements in Microsoft's format. In OOo, the best means of doing this (according to the help) seems to be to align to some edge or other, and pad with one of two relative-width white-space items, or a phantom object. Microsoft use phantoms too: you can give them zero width or zero height, while the other dimension stays the same size it would be if you included the code / function the phantom stands in for, which it isn't going to display. That may not sound like much use, but if I have a word "fortytwo" and I want to line up the word "ant" to one side of it and the word "dog" to the other, but don't want to see the word "fortytwo" just yet, I can use a phantom of it to measure how long that word would be in the present font and style, and align "ant" and "dog" to that phantom without displaying "fortytwo".
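The phantom idea boils down to measuring text without drawing it. Here is a minimal Python sketch with an invented font: the advance widths below are hypothetical numbers, not taken from any real font (a real program would ask the font or the OS for them). "ant" is placed, the phantom "fortytwo" is measured and its width reserved, and "dog" is placed after the gap, so drawing the real word in that gap later moves nothing.

```python
# Hypothetical advance widths in arbitrary font units (not from a real font).
ADVANCE = {c: 500 for c in "abcdefghijklmnopqrstuvwxyz"}
ADVANCE.update({"f": 300, "i": 250, "l": 250, "r": 350, "t": 300, "w": 700})

def text_width(s: str) -> int:
    """Width of a string without drawing it: the sum of its glyph advances."""
    return sum(ADVANCE[c] for c in s)

def place_around_phantom(left: str, phantom: str, right: str, x0: int = 0):
    """Return x positions for `left`, the phantom's gap, and `right`.

    The phantom is only measured, never drawn, so the layout doesn't change
    when the hidden text is later displayed in that gap.
    """
    left_x = x0
    phantom_x = left_x + text_width(left)
    right_x = phantom_x + text_width(phantom)
    return left_x, phantom_x, right_x

print(place_around_phantom("ant", "fortytwo", "dog"))
```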

The Win32 API has a similar function in its repertoire (GetTextExtentPoint32, for example). When arranging user interface components that appear and disappear as they become relevant (like a ribbon), but must line up regardless of the user's preferred font and screen DPI, whether they are visible or not (so they don't move around like the ribbon), it is essential to know how wide or tall a string will render in a given font at a given DPI without having to draw it just to measure its bounding box.

In math, it's more useful to have the brackets on one side of a formula line up with those on the other side, even though the balance of glyphs within them may vary greatly, so it's clear that you are balancing an equal or equivalent equation, regardless of any variance in glyph ink weight. When you write a mathematical formula, your artist's eye automatically does this (even if you're a mathematician and not an artist), but for a computer it's not instinctively clear, and since it's logically irrelevant, it can get it seriously wrong.

Regardless, I couldn't find anything listed in the possibilities for 2007 Math Markup that I couldn't do in OOo Writer, except knowing and setting exactly how many twips might be between one glyph and another. Many things that had separate methods in MSO were just different parameters to the same method in OOo, and some needed cheaty workarounds, like manually shifting the size of individual symbols relative to the whole formula, to get the same basic look.

Some features Microsoft considers part of the math, OOo treats as object decoration: boxes around formulas, for example. Once a formula is composed in OOo, it is a graphical object on the page, just like a graph or a photo. So, just like a graph or a photo, it can have a border, and you can control its justification, its position relative to the anchor point and the way words and paragraphs wrap around it. Microsoft seems to treat a formula as a paragraph, not an object on the page, so you define its distance from things, its alignment and its borders within the mathematical paragraph. So maybe all the talk about not having millions of ways to do the same basic thing any more in Office 2007 was just smoke and mirrors after all. (Don't get upset, I know that's out of context and they were talking about UI design, not file formatting and underlying code.)

Once I had made the alterations necessary to make the formulas work correctly in OOo, and display as they did (plus or minus the odd twip), I had already put considerable effort into making a maintainable .odt file. So I went ahead and saved the source for the example RTF reader to a folder and zipped it up, applied a common font to the code (because it was irregular and all over the place from various edits) and took the liberty of applying standard Scintilla syntax highlighting to it. I know most of this won't print, but it makes it easier to read on screen. I re-aligned some of the comments here and there too.

Between fixing the empty half-columns and broken tables, messing with formulas and this, that and the other, the page numbers were now skewed, and, as I say, the TOC was no longer linked to page numbers via fields (though you can see it was at some time). So I rebuilt the Contents page using OOo's locked contents object, and configured it to maintain the same formatting Microsoft had used.

Sharing the Fruits of my Labours

Then I moved on to creating a clean PDF from all this. I had the one Word created, and its hotlinks did work, but as I say, the side index (or bookmarks) it exported were a complete mess, and the XPS doesn't even seem to maintain a document navigator. My new PDF was quite a bit bigger, but I know that OOo creates complete, clean PDFs that aren't optimized for downloading, so I ran it through a compressor and was amazed at the difference. The document even loaded in a flash compared to the MS export, and had its beautiful TOC at the side. So I tried the compressor on the MS document too (which I will keep as a reference to the original formatting of the document). The result was less impressive, but that may be because JPEGs don't re-compress as well as lossless images. Even so, the load time is a great improvement, and the size decrease is not inconsiderable.

I'm not sure if you can do this with XPS files, but a PDF can have other files attached to it, just like an email can. I wanted to use this to attach the zip I made of the example source files, so I attached it to the page where the source code starts, using another little PDF tool I downloaded.

Now you have the choice of reading this document in two flavours of PDF (I recommend my OOo reconditioned version, unless you are a stickler for authenticity), as an XPS or as a Rich Text file, and the PDFs have a zip attached containing all the source and a makefile for building your very own Rich Text Format file reader.

Links to these files in my public SkyDrive can be found here, and will remain here until Microsoft take them down, or ask me to do so for them. Personally, I hope they don't take offence at my redistribution... In fact, they can give me a job. ;) PS. The Zip contains the Rich Text export from Word 2007.

02 February

Mirohoo (Microsoft Yahoo bid)

Okay... I know I've got other posts to put up, but I had to comment on this while it's fresh.

Microsoft bid $44.6bn this Friday to take over Yahoo from its shareholders, and it's all over the news.

Their stated aim in this takeover? "Today this market is increasingly dominated by one player. Together, Microsoft and Yahoo! can offer a competitive choice while better fulfilling the needs of customers and partners." Everyone is pointing to the "dominated by one player" part as a reference to Google's recent acquisition of DoubleClick, and to the googlesyndication advertising ring (AdSense and AdWords) and Google Analytics... which actually benefit Google's search database as well, hence why a Google search is far more relevant than a Yahoo one in most cases.

TBH, I was a big Excite fan... I moved easily to AltaVista, but haven't used it over Google in some considerable time... largely because it now uses the Yahoo engine and database (why Yahoo still keep the AltaVista page I'm not entirely sure). I used to use AllTheWeb... but in all honesty, its database has become seriously out of date, and the specific FTP and Gopher searches are no longer available in the usable way they used to be. So Google is indeed the only real player in the search engine field. Microsoft's Live! search outstrips a Yahoo search (IMHO), and their HotBot is nearly as good as a DogPile search. So what do they think Yahoo will gain them over Google?

Let's look at the three companies side by side, to see what they offer:-

Yup... I used a Google Sheet. Well, I still don't even have the Beta of Office Live! I am promised... so...

So... from this we can see that Microsoft would gain only 6 points in competition with Google. They could also gain 10 services that neither they nor Google currently have. However, we could potentially lose 34 services from Yahoo which Microsoft already have.

Looking more closely, Microsoft has always been keen to leave avatars to others... so I think they will just kill that if they take over Yahoo. I don't really see why they'd want Yahoo Notes, unless they want to expand Live! Lists. Advanced searches are not really in their interest, as narrowing your search criteria narrows their ability to throw marketing at you, and on that point, Microsoft are not very good at marketing anyone but Microsoft. I don't think they will pick up all the companies who currently place ads with Yahoo, because those companies probably go to Yahoo precisely because they are themselves Microsoft and/or Google competitors. Microsoft's Live! search technology, while no Google or Spotlight, is as good as, if not better than, Yahoo's engine, so they can't want that, and we should also remember that we will be losing AltaVista as well as Yahoo.

If we lose Yahoo, we lose Yahoo Widgets (formerly Konfabulator), which spawned Apple's Dashboard Widgets... and for what? That code can't be reused in the Vista Sidebar or Live! Gadgets. It's written for Konfabulator's own JavaScript engine, not .NET / Avalon / WPF.

If Microsoft want any of that lot... it's probably Yahoo's mobile technology (but IMHO their market placing would kill it, even if they had it) and Babel Fish. There was a time when Babel Fish was a huge asset on the Web... now it has many competitors, and some are based on considerably better (faster and more accurate) engines than Babel Fish.

I can't see MS carrying on with GeoCities, and, Bill Gates's recent philanthropy aside, if MS wanted a Microsoft for Good they could do it without buying Yahoo.

No... I don't think this is about increasing consumer quality and choice. I don't think it's about Microsoft beating Google, or even Google beating Microsoft. It's about Microsoft kicking Yahoo while they are down. It's about one less competitor for Microsoft.

Oddly enough... I think it could well become more about one less competitor for Google, which will place Google in an even better position to make Microsoft's stranglehold on PCs irrelevant.

If Microsoft wanted to go up against Google, they should start selling Microsoft Linux, and buy out ThinkFree Office, SlideShare and/or Wordsmith. They should stop working on Silverlight and concentrate on technologies closer to AJAX and XUL, which can operate across multiple platforms.

I know Microsoft have this mindset that if it's not Windows-only it needs to be bought out, and versions for other platforms killed off, or just stamped on till it dies, but the Web is changing how we look at applications. It really won't matter whether you're using a Windows, Linux, Apple, BSD, Sun, Xbox or PlayStation machine to run your applications... just so long as it can get online and run Web 2.0 XHTML, AJAX, Java, JavaScript and Flash.

MS Office will not stand up to that unless it steps up to it. Windows will not last in its present incarnations... not even Vista. Wine (on Linux) and ReactOS are already more compatible with legacy Win32 applications than 64-bit versions of Windows; in Wine's case, even on 64-bit versions of Linux or BSD. .NET is an arse when compared with present Sun offerings, and despite the open-source, cross-platform Mono equivalent, it just doesn't port as easily or as well. WPF and IE7 are too little, too late.

Yes, I use Windows XP. I own Vista, but I don't use it. I don't see the point in using it, as opposed to SUSE, PC-BSD or Mac OS. If I had the cash, or I could freely run it on any hardware, I'd use Mac OS: I could run both Wine and Windows XP proper, and get all the benefits of true Unix and Mac-only applications. Oh yeah, and IEEE 1394 works properly, with ease, without having to boot the thing, keep winding tapes back and forth, and reboot the computer just to remove a drive with a filesystem that isn't 30 years old. If I wanted the best value-for-money modern OS, I would probably use Ubuntu or PC-BSD.

The stand-alone Desktop PC is becoming as irrelevant as the Mainframe computer, at which point, your choice of OS, and your choice of browser becomes just that... your choice. It makes little or no difference, in the long run, with your use of the computer / terminal. The applications you use, the things you do, games you play etc will all have to be on-line services designed to fit any box (with sufficient processing power and audio visual capabilities and user input) you care to connect with.

Even after the beating Sun took from Microsoft over the Java court cases, I think Sun (especially after their recent acquisition of MySQL, and given the prolific use of OpenOffice on every non-Windows platform) and Google (with their cross-platform, in-browser technology and well-known brand) are far better suited to be a threat to MS than Yahoo, who are simply a service provider lightening the load on Microsoft's servers and competition. To kill off (buy out) Yahoo is something close to political suicide IMHO. It gains them little and loses them a fair chunk of a market they need to move into, if they are to continue their success story.

I wouldn't normally mind the idea of Microsoft going down the swanee, but taking users' choices for the future with them is offensive to me in the extreme. And in all honesty... this new emerging market they need to move into is one I think they are well positioned to serve users well in. One I would like to see them succeed at, but one which I think they are going about entirely the wrong way.

I can only urge Yahoo shareholders not to take this offer, for the sake of Yahoo, the sake of Microsoft, and the sake of the Internet community. I also urge users of the Internet to take up Yahoo's defence, should they defy the man from MS publicly.

Okay. Rant over. Please feel free to comment.

 

Ron

Occupation
Location
Interests
Trained IT Tech Support / Network Admin. Casual graphic design, media artist.
Hello, found your site very interesting, and didn't want to lose where you was so added you as a friend, hope that's ok?
3 Aug.
Good Windows Live Space if I do say so myself, look forward to further updates  Marty.
14 June