Daniel O'Connor: January 2006

Tuesday, January 31, 2006

Patterns Patterns Patterns: Filtering Results

Say you have a bookmarks site, like del.icio.us. You've got squintillions of results for a search, and your boss gets a feature request for "being able to organise the information by the time the sun rose when the bookmark was entered".

Your boss, being your boss, decides that this would make a fine feature. You get stuck with it.

The first step is to take advantage of your kick ass indexing (you've done that, haven't you?). You grab a whole lot of search result IDs; we don't care about the rest of it at the moment.


function search($db, $filters) {

  //First we get a hell of a lot of id-only results. You do use PEAR::DB don't you?
  //getCol() gives us a flat array of ids; getAll() would hand back an array of rows
  $sql = "SELECT id FROM tags WHERE someOverlyBroadCriteria = true";
  $tags = $db->getCol($sql);

  foreach ($filters as $filter) {
      $tags = $filter->search($db, $tags);
  }

  //We've done all of our filtering, now actually retrieve the information
  //and pass off responsibility to some rendering component
  $sql = sprintf("SELECT * FROM tags WHERE id IN (%s)", implode(', ',$tags));
  return $db->query($sql);
}

We've passed in an array of Filter objects ($filters), each of which has a search() method defined. The search() method takes in an array of IDs and returns one on the other side.


class EskimoFilter implements iFilter {
  function search($db, $input) {
      $sql = sprintf("SELECT id
                           FROM tags
                           WHERE id IN (%s)
                           AND tag LIKE 'eskimo%%'",
                     implode(', ',$input));
      //getCol() returns a flat array of ids, ready for the next filter
      return $db->getCol($sql);
  }
}

The beauty of object oriented programming is that no one cares what's on the inside so long as it works nicely with the others on the outside. So, adding more filters into this workflow is pretty simple: you just have to be careful to pay attention to the order in which you do it.
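Here's a rough sketch of why the order matters. The post never shows the iFilter interface, so the one below is a guess, and EvenIdFilter and FirstNFilter are made-up stand-ins that work on plain arrays so the thing runs without a database.

```php
<?php
// Hypothetical interface: the post refers to iFilter but never defines it.
interface iFilter {
    // Takes an array of ids, returns the subset that passes the filter.
    public function search($db, $input);
}

// Made-up filter: keeps only even ids.
class EvenIdFilter implements iFilter {
    public function search($db, $input) {
        return array_values(array_filter($input, function ($id) {
            return $id % 2 === 0;
        }));
    }
}

// Made-up filter: keeps only the first $limit ids.
class FirstNFilter implements iFilter {
    private $limit;

    public function __construct($limit) {
        $this->limit = $limit;
    }

    public function search($db, $input) {
        return array_slice($input, 0, $this->limit);
    }
}

// Same loop as in search(), minus the SQL on either side.
function runFilters($db, array $ids, array $filters) {
    foreach ($filters as $filter) {
        $ids = $filter->search($db, $ids);
    }
    return $ids;
}

$ids = array(1, 2, 3, 4, 5, 6);
$evenThenFirst = runFilters(null, $ids, array(new EvenIdFilter(), new FirstNFilter(2)));
$firstThenEven = runFilters(null, $ids, array(new FirstNFilter(2), new EvenIdFilter()));

print implode(', ', $evenThenFirst) . "\n"; // 2, 4
print implode(', ', $firstThenEven) . "\n"; // 2
```

Same two filters, different order, different answer. A "first N" style filter is the usual culprit, so it probably belongs at the end of the chain.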

You can easily store customised filtersets in a database on a user by user basis - and next time someone whinges that they need to look at search results created by users with only 7 pet penguins, you don't panic.

Whoever said design patterns weren't useful, eh?

Nina Starr / Loxton Murder Update

Updates: Everything, including James Hall is sentenced.

When I reached home I discovered something very surprising: if the web was powered by gossip, I'd have nothing impressive to tell people about what I learned in the day.

Donshae informs me of the arrest and charging of a 19 year old male (I know the name, I just can't remember it) who was supposedly heavily stoned when he murdered Nina Starr.
The Advertiser confirms it but withholds the name, also releasing absolutely no details about how Nina Starr actually died.

Rumor (possibly unfounded) has it that the "particularly vicious" injuries inflicted upon Nina are a polite euphemism for an axe blow or several to the face.

What's more, I mention this whole affair to my friends and discover just how small the world is: Chloe's father went to school with Nina, Donshae went to school with the accused, Jason knows Nina's relatives, and our neighbours are aghast that it's all happened in the sleepy community of Loxton.

As chance would have it, I have photographs of the area in which the body was found: unwittingly, I'd snapped away at some of the locations. I'm not sure if it would be wise to post them at this time...

Monday, January 30, 2006

My Weekend and Murder

Updates: Everything, including James Hall is sentenced.

This weekend I went away with my housemates to the community of Loxton. I'd spent the trip up tapping away on my laptop, trying to think of mystery & intrigue to write about, and work into some fashion of ARG.

Life is in fact a lot stranger than fiction, it turns out. Donshae, who grew up in the area, and Chloe, who lived about 40km away, were shocked when we heard about the death of a woman and the discovery of her body near the local caravan park.

The local caravan park is the main swimming area of Loxton, right on the river Murray. It's where everyone goes - so it must have been very, very upsetting to find the body afloat in its waters.

Initially it appeared to be an accidental death, but police later declared it a major crime and appealed for information in the case. We got back to Donshae's family home and were told that no-one knew who she was - police as of yet had no way to identify the woman.
Rumor had it that she had suffered a large amount of blunt force trauma to the face, which I am willing to believe seeing as it took so long to identify her.

Later, we found out that she was in her 50s, and had only been wearing a pair of socks and two necklaces - the necklaces were in the Sunday Mail. No one thought she was a local, figuring that we'd know if someone was missing, wouldn't we? But unfortunately, that wasn't the case - 53 year old local resident and supermarket worker, Nina Starr, met an unfortunate end.

Initial speculation that she had been shot was unfounded. The post mortem results should be available over the coming day or week, so I may keep you all posted.

Meanwhile, Google News carries the related stories.

Friday, January 27, 2006

Copyfight & IE

US courts rule that google's caching features are fair use, and I discover embedding IE in firefox tabs, the extension.

It's a touch warm, lately, isn't it?

Tuesday, January 24, 2006

All Exceptions Created Equal...

All exceptions are created equal. But what if you have a very good and pressing reason to serialize one to save it in a database? For instance, you had a collection of checks you wanted to run against some data, and save the results?

Ah, that's easy! serialize() was made for that! So off you trot, you make a few changes to the code, add a blank line here, a blank line there, and suddenly your code can't find the exact matches of the serialized object in the database.

HUH? What's going on? You're the exact same Exception I just threw three minutes ago, and you've sporadically broken?

It took me a while to twig. When you create an exception ($e = new Exception("foo");), it's shiny and new and listens when you do equality comparisons (==).

But things go awry: you throw a new Exception from your filter, catch it, and serialize it. You haven't remembered that...



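try {
    throw new Exception("foo"); //Line 1
} catch (Exception $e) {
    throw new Exception("foo"); //Line 10
}

will result in two different traces. One saying "I was created on line 1", the other "I was created on line 10": PHP stamps the file, line and trace onto an exception when it's constructed, and all of that goes along for the ride when you serialize it...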

Fuck oath, hello stupid coder. You've been racking your brains wondering why every time you go off and edit a different bit of code it serializes differently; and there you have it.
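If you want to see it for yourself, here's a minimal sketch that needs nothing beyond stock PHP. The variable names are mine; the point is only that two exceptions with the same message and code still differ in the line they were created on.

```php
<?php
// Two exceptions with identical message and code, created on different lines.
$first = new Exception("foo");

$second = new Exception("foo");

// The payload we care about matches...
var_dump($first->getMessage() === $second->getMessage());
var_dump($first->getCode() === $second->getCode());

// ...but the line each one was created on does not, and that line
// (along with the file and trace) is part of what serialize() stores.
var_dump($first->getLine() === $second->getLine());
```

Add a blank line above either `new` and the stored line number moves with it, which is why an unrelated edit can change the serialized form.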

How the hell do I fix it? Going to __sleep() on the job actually helps.



<?php
class DumbException extends Exception {
    /**
     * Cleanup anything we need before serialisation
     *
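     * @return  string[]    An array of member variable names to serialize
     * @see     http://php.planetmirror.com/manual/en/language.oop5.magic.php
     */
    public function __sleep() {
        //'string' is private to Exception and can't be reached from a subclass;
        //message and code are the protected members we actually want to keep
        return array('message','code');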
    }

    /**
     * Compare against another DumbException for equality.
     *
     * Since two exceptions can be !== because the trace / line / file
     * information is different, we need to do this.
     */
    public function cmp(DumbException $e) {
        return (serialize($e) == serialize($this));
    }
}

print '<pre>';
$a = new DumbException();
$b = new DumbException();


try {
    try {
        throw $b;
    } catch (Exception $e) {
        throw $a;
    }
} catch (Exception $e) {

    var_dump($a === $b);
    var_dump($a == $b);
    var_dump($b === $e);
    var_dump($b == $e);
    var_dump($a === $e);
    var_dump($a == $e);

    var_dump($b->cmp($e));
    var_dump($a->cmp($e));
}
print '</pre>';
?>
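If you'd rather not touch serialization at all, another option (a different technique from the __sleep() trick above, and only a sketch) is to build a lookup key from the bits of the exception that don't move when the code moves. exceptionFingerprint() is a made-up helper, not anything PHP ships with.

```php
<?php
// Hypothetical helper: builds a comparison key from the parts of an
// exception that don't depend on where in the file it was created.
function exceptionFingerprint(Exception $e) {
    return md5(get_class($e) . '|' . $e->getCode() . '|' . $e->getMessage());
}

$a = new Exception("check failed", 42);

$b = new Exception("check failed", 42);

$other = new Exception("something else", 42);

// Created on different lines, so the raw objects carry different line info...
var_dump($a->getLine() === $b->getLine());

// ...but the fingerprints only look at class, code and message.
var_dump(exceptionFingerprint($a) === exceptionFingerprint($b));
var_dump(exceptionFingerprint($a) === exceptionFingerprint($other));
```

The fingerprint is a fixed-length string, so it can sit in an indexed database column and be matched exactly, whatever happens to the line numbers.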

Firebug

Firebug just made it into my must-install list of extensions for firefox. I love getting to work and forgetting I installed something the other day, waiting to restart firefox - so I start off the day with a new toy.

FireBug is a new tool that aids with debugging Javascript, DHTML, and Ajax. It is like a combination of the Javascript Console, DOM Inspector, and a command line Javascript interpreter.

Other fun features:

* XMLHttpRequest Spy - Ever wonder what all them newfangled Ajax websites are up to? Watch the requests fly by in the console!

* One web page, one console - Tired of slogging through a zillion errors in the JavaScript Console trying to find the one you want? The FireBug console is built into the bottom of the browser, and only shows you errors and log messages that came from the page you're looking at.

* JavaScript Error Status Bar Indicator - It's a sin that Firefox doesn't include this by default, like IE does. When there is an error in the page, the status bar will let you know with a big red blob.

* Logging for web pages - Sick and tired of "alert debugging"? Jealous of all your C programmer buddies with their fancy printf? Now you can log text and objects to the FireBug console from any web page. See my website for more info on this.

Monday, January 23, 2006

Slashrabble

"VNUNet reports that the Photocasting feature in Apple's iPhoto application violates core XML and RSS standards. Perhaps the worst part is that, in many cases, this isn't even a case of 'embrace and extend', but just plain doing it wrong. Dave Winer, essentially the creator of RSS, says, 'It's pretty bad. There are lots of errors, the date formats are wrong, there are elements that are not in RSS that aren't in a namespace.'"

HAH.

The RSS-DEV group went on to produce RSS 1.0[5] in December 2000 based on a draft proposal of amendments to the specification presented by Tristan Louis[6]. Like RSS 0.9 (but not 0.91) this was based on the RDF specifications, but was more modular, with many of the terms coming from standard metadata vocabularies such as Dublin Core.

Nineteen days later, Winer released by himself RSS 0.92[7], a minor and supposedly compatible set of changes to RSS 0.91 based on the same proposal. In April 2001, he published a draft of RSS 0.93[8] which was almost identical to 0.92. A draft RSS 0.94 surfaced in August, reverting the changes made in 0.93, and adding a type attribute to the description element.

In September 2002, Winer released a final successor to RSS 0.92, known as RSS 2.0 and emphasizing "Really Simple Syndication" as the meaning of the three-letter abbreviation. The RSS 2.0 spec removed the type attribute added in RSS 0.94 and allowed people to add extension elements using XML namespaces. Several versions of RSS 2.0 were released, but the version number of the document model was not changed.

-Wikipedia

Does anyone else get annoyed at this less than crafty attack on Apple by someone who is rather pro-Microsoft?

Phing

Phing looks pretty useful. Note to self; check it out from CVS and tinker.

FeedTagger: Localised News

Feedtagger just got revamped - more focused on localised news & particularly on Australian cities. I like it, because it's like a vamped up AdelaideIndex which I can add links to.

Check it out for Adelaide or your own town.

Friday, January 20, 2006

The AttentionTrust Bandwagon

Reactions to the concept of AttentionTrust appear mixed: ranging from bullshit to confusion.

I like the idea. I've seen it before. The difference between what I saw before and what's out there is all in the message.

Menow:
* Describe what you are doing
* Share it with other applications
* End user benefits from software being more aware of what is being focused on.

AttentionTrust / Root.net:
* Record all your interaction with the webbrowser.
* Hock it to a vendor.
* ????
* Profit!

I don't like it in the slightest: the pitch is all wrong, wrong, wrong!

If I were pitching AttentionTrust, I'd do it like this.

The world is a chaotic place, and the world wide web is even more so. We're assaulted with more and more information, we're pestered by irrelevant advertising, and at the end of the day we still have to be productive. RSS, blogging, podcasting, email: all of these things scream out for my attention, each clamouring for more-more-more of my time.

If my instant messaging client flashes one more time when I'm busy, I'll kill someone. If someone asks me "what r u up to" one more time in a poor attempt at human communication, I'll break down into tears.

Instead, I've now got AttentionTrust. It's a plugin for firefox that will help me minimize the day to day issues I face and provide me only with relevant information. I can opt-in to allow the data of whatever it is I am up to at this very second to be collected; and if I so choose, I can give this to a bunch of webservices that will help me work better.

For instance: imagine looking for a book on amazon; and having a service notify me that there's in fact a range of cheaper copies of the same book on barnes & noble - productive and helpful.

Imagine my instant messaging client being able to directly share whatever it is I am looking at with a buddy - "help me find out about Africa"; and I make my browsing history available via gaim / jabber - two heads are better than one.

Imagine being able to have a calendar / task planning application that takes into account just how much time I spend googling things.

Imagine having a nagging application that tells me I'm blogging too much and I should get back to work... (Robby, who sits behind me usually fills this function).

I'd like all of those uses. Just don't mention money or spyware!

It's akin to being a pizza oven manufacturer who's comparing their product to a Holocaust-era, Nazi-made concentration camp oven: it does not make the reader feel easy about buying your idea if it makes them feel queasy.

Painful Test

Mensa test, via Digg.

Before I started blatantly cheating I got to 13 under my own steam, a further 3 with abstract hints from someone else who got them, then I just resorted to google.

Hint: If you aren't American, you won't get some of them. If you are American, you won't get others.

CiteULike

Hublog points me to CiteULike today, which looks damned spiffy. It knows if two URIs refer to the same object in some cases. I wonder if it would be useful to create a generic web service that unified urls:
i.e., it resolves a urn:isbn: URI to an amazon / barnes & noble / etc style URL.

Applications of this:
* Ecommerce; identifying products by EAN/UPC barcode if it exists
* Books
* Geo coordinates in URLs to place names (on wikipedia, perhaps)

... it normalises URIs that it knows about, which mean that you can post http://www.hubmed.org/display.cgi?uids=16099373 or http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=16099373&dopt=Abstract or info:pmid/16099373 or even http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6SYV-4FX23FY-3&_coverDate=08%2F31%2F2005&_alid=356223062&_rdoc=1&_fmt=&_orig=search&_qd=1&_cdi=4844&_sort=d&view=c&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=0af86d8bdaa4e097bc8ac51186461586 and they'll all be identified as the same object (the first three will be linked by PMID and all four will be linked by DOI). The same applies to books, which are linked by their ISBN.
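To get a feel for what that normalising involves, here's a rough sketch covering just the PMID cases. normalisePmid() is a made-up helper; the hosts and query parameter names are lifted from the example URLs in the quote, and none of this claims to be how CiteULike actually does it. The ScienceDirect URL is left out because linking it needs a DOI lookup.

```php
<?php
// Hypothetical helper: maps the PubMed-style identifiers quoted above to a
// single info:pmid/ URI, or returns null for anything it doesn't recognise.
function normalisePmid($uri) {
    // Already in canonical form?
    if (preg_match('#^info:pmid/(\d+)$#', $uri, $m)) {
        return 'info:pmid/' . $m[1];
    }

    $host  = parse_url($uri, PHP_URL_HOST);
    $query = parse_url($uri, PHP_URL_QUERY);
    if (!is_string($host) || !is_string($query)) {
        return null;
    }
    parse_str($query, $params);

    // Which query parameter carries the PMID on each known host
    // (taken from the example URLs, not from any official list).
    $pmidParam = array(
        'www.hubmed.org'       => 'uids',
        'www.ncbi.nlm.nih.gov' => 'list_uids',
    );
    $param = $pmidParam[$host] ?? null;
    $value = ($param !== null) ? ($params[$param] ?? null) : null;

    if (is_string($value) && ctype_digit($value)) {
        return 'info:pmid/' . $value;
    }
    return null; // unknown URI: leave it to the caller
}

$uris = array(
    'http://www.hubmed.org/display.cgi?uids=16099373',
    'http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=16099373&dopt=Abstract',
    'info:pmid/16099373',
);
foreach ($uris as $uri) {
    print normalisePmid($uri) . "\n";
}
```

A generic service would presumably be a big table of rules like $pmidParam, one per site and identifier scheme, with the same idea applied to ISBNs and barcodes.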

Thursday, January 19, 2006

Having Kittens: PHP-Java bridge & JGAP

Route planning in PHP: sucky to the utmost. You have to go and implement a nice genetic algorithm to get all of your coordinates and sort them by the most optimal A to Z ordering.

God help us all; whatever are we to do? I don't want to have to implement and test something like that! It makes my mind hurt!

So, before we all panic, there's a solution.
PHP-Java bridge
Installation on Windows

It's really painless to install: you don't even need a servlet engine.

What next?
Install JGAP.
Read Javadocs.

Hey presto: you've got a very powerful genetic algorithm framework at your fingertips. What else can you apply it to?
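For comparison, here's the kind of thing you can knock together in plain PHP before reaching for JGAP. To be clear, this is not a genetic algorithm: it's a greedy nearest-neighbour ordering, a much simpler heuristic that makes no promise of finding the best route. The function names and sample points are mine. It's handy as a baseline to check that the fancy solution is actually doing better.

```php
<?php
// Straight-line distance between two array(x, y) points.
function distance(array $a, array $b) {
    return sqrt(pow($a[0] - $b[0], 2) + pow($a[1] - $b[1], 2));
}

// Total length of a route visiting the points in the given order.
function routeLength(array $route) {
    $total = 0.0;
    for ($i = 1; $i < count($route); $i++) {
        $total += distance($route[$i - 1], $route[$i]);
    }
    return $total;
}

// Greedy ordering: start at the first point and keep hopping to the
// closest point not yet visited.
function nearestNeighbourRoute(array $points) {
    if (empty($points)) {
        return array();
    }
    $route = array(array_shift($points));
    while (!empty($points)) {
        $current = end($route);
        $bestKey = null;
        $bestDistance = INF;
        foreach ($points as $key => $candidate) {
            $d = distance($current, $candidate);
            if ($d < $bestDistance) {
                $bestDistance = $d;
                $bestKey = $key;
            }
        }
        $route[] = $points[$bestKey];
        unset($points[$bestKey]);
    }
    return $route;
}

$stops = array(array(0, 0), array(10, 0), array(1, 0), array(5, 0));
$route = nearestNeighbourRoute($stops);

printf("as given: %.1f, reordered: %.1f\n", routeLength($stops), routeLength($route));
```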

Where Am I + Ubuntu

A Geo enabled desktop sound useful? I think so!

Wednesday, January 18, 2006

Laptop Hunting in Adelaide & Trust

Being the savvy consumer that I am, I thought that there was no better way to purchase a laptop than to perhaps shop around on ebay.

I saw a laptop that was more or less what I wanted, and I discovered to my delight the seller was based in Adelaide.

Quick as a flash I sent off a message asking where they were located; if I could come in and browse some of the laptops they had on hand. I would prefer to hand over a large wad of cash in person and carry off a laptop there and then as opposed to trusting them to deliver it to me.

The response:

Although we do have a affiliated store in Adelaide we don not allow preinspection of the ebay notebooks and the store has completely different stock, pickup is available once purchased

Best Regards
Stuart

All I wanted was a laptop of some description, I wasn't overly picky - I don't care if it's the ebay one or something similar - and you're telling me that you would rather not have my business; that you'd rather I didn't come into your store and you'd rather not talk to me about purchasing a laptop from you?

What a fantastic business model.

I can't for the life of me even find the "associated Adelaide shopfront". Why is it that people are afraid to link their online ebay identities to their actual business?

It's evident that the feedback rating system of ebay is shockingly poor: 99.3% positive ratings - but I get the distinct and uneasy impression that if I were to deal with this merchant in person I'd be left with a very sour taste in my mouth and wouldn't part with my money.

People aren't interested in quality metadata: ratings like "excellent item AAAAAAAAAA" show that people are just getting into a positive feedback circlejerk: these consumers aren't smart (or at least don't care about written language, which gives me a poor impression of their intellect), and I don't know them, so why should I trust them?

And communications with the seller in question show that poor bastard doesn't know how to deal with people: we're just a meal ticket and a positive feedback rating; he doesn't want to deal with us in person.

If I were building an ecommerce trust rating system; here's how I'd do it.
* An application is downloaded that hooks into your email accounts, instant messaging accounts and IRC logs. It finds and identifies the level of communication between yourself and individuals, and ranks them according to the number of messages and approximate length. gaim already does this with "order buddy list by log size", putting the people important to you above all else.
* You get presented with names, and you get to link them between each other. This should be as easy as dragging from A to B.
* Groups / specialised topics are created. "Friends". "Coworkers". "Linux geeks". Concepts are linked to each other as best as possible, and trust ratings are given to the groups on specific subjects. I'd trust linux geeks & coworkers to tell me about hardware, but not friends.
* In some fashion, if at all possible, the application communicates with my contacts and gets any details they feel like providing: ebay usernames, username @ site.com records in firefox, etc.
* When I look on ebay / amazon, I query my network of contacts for information about whatever I'm looking at - seller, product, etc. (SPARQL + jabber?)
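The first bullet is the easiest bit to prototype. A rough sketch, assuming the logs have already been parsed into one record per message (the input format, names and tie-break rule are all invented for illustration):

```php
<?php
// Hypothetical input: one entry per logged message, with the contact's name
// and the message length in characters. Real logs would need parsing first.
$messages = array(
    array('contact' => 'alice', 'length' => 120),
    array('contact' => 'bob',   'length' => 40),
    array('contact' => 'alice', 'length' => 300),
    array('contact' => 'carol', 'length' => 15),
    array('contact' => 'bob',   'length' => 60),
    array('contact' => 'alice', 'length' => 80),
);

// Tally message count and total length per contact.
function tallyContacts(array $messages) {
    $stats = array();
    foreach ($messages as $message) {
        $name = $message['contact'];
        if (!isset($stats[$name])) {
            $stats[$name] = array('count' => 0, 'length' => 0);
        }
        $stats[$name]['count']++;
        $stats[$name]['length'] += $message['length'];
    }
    return $stats;
}

// Rank by message count, breaking ties on total length (an arbitrary choice).
function rankContacts(array $messages) {
    $stats = tallyContacts($messages);
    uasort($stats, function ($a, $b) {
        if ($a['count'] !== $b['count']) {
            return $b['count'] - $a['count'];
        }
        return $b['length'] - $a['length'];
    });
    return array_keys($stats);
}

print implode(', ', rankContacts($messages)) . "\n";
```

The ranking is the trivial part; the hard work is in the later bullets, where names have to be linked across accounts and trust attached to groups.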

How hard would it be to create a suite of components to perform all of these tasks? Extensions in firefox to expose data in there, data mungers to talk to MSN, gaim, etc or parse the logs (or maybe hijack google desktop search), gaim to communicate with my close contacts, a decent triplestore to handle it all (one that *doesn't* get integrated into my browser, I don't need a bloaty browsing experience).

My eyes are agog

Denial of Perspective makes my eyes tilt.

I've wanted to do something like this but could never get the math behind it down on paper right - perhaps it's a challenge to do to further my knowledge of python.

Tuesday, January 17, 2006

Upcoming + Greasemonkey

I've started tinkering on a greasemonkey script to grab highlighted text and turn it into an event on upcoming.

So far I've only managed to:
* Get the highlighted text
* Attack it with eleventy billion types of regexp to find dates & extract them
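To give a flavour of the regexp attack: the real script would be JavaScript, since that's what greasemonkey runs, but to stay in step with the other snippets on this page here's the same idea sketched in PHP. extractDates() and its three patterns are illustrative only and nowhere near exhaustive.

```php
<?php
// Hypothetical helper: pulls out dates written as "17 January 2006",
// "January 17, 2006" or "2006-01-17". Results are grouped by pattern,
// not by their position in the text.
function extractDates($text) {
    $month = '(?:January|February|March|April|May|June|July|August|'
           . 'September|October|November|December)';
    $patterns = array(
        '/\b\d{1,2} ' . $month . ' \d{4}\b/',  // 17 January 2006
        '/\b' . $month . ' \d{1,2}, \d{4}\b/', // January 17, 2006
        '/\b\d{4}-\d{2}-\d{2}\b/',             // 2006-01-17
    );

    $found = array();
    foreach ($patterns as $pattern) {
        if (preg_match_all($pattern, $text, $matches)) {
            $found = array_merge($found, $matches[0]);
        }
    }
    return $found;
}

$text = "Launch party on 17 January 2006, moved from 2006-01-10.";
print_r(extractDates($text));
```

Every new date format means another pattern, which is how you end up with eleventy billion of them, and why the NLP and statistical approaches are worth a look.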

Todo:
* Look at spellchecking in firefox. Use this to eliminate common english words from highlighted text, thus making it easier to identify people & place names. Spellbound might help.
* Look heavily into NLP and statistical approaches to extracting information from free text.
* Do a quick and easy UI to generate a "smart" description of an event.

If I don't have code up in the next two weeks, heckle me, blogosphere.

Monday, January 16, 2006

QOTD

"It's like the Catholic Church charging for the air you breathe because their CEO invented the Universe," he said. "That's the stupidity of the situation." - Matthew Tutaki, in reference to ecommerce patent litigation threats against Australian businesses

Why do you hate Apache?

Why I hate Apache.

What are some of your favourite gripes about... CSS, Javascript, gaim, firefox, and more?

A year of me

43things has a roundup of how my 2005 was 84% worth living.

None too shabby!

Saturday, January 14, 2006

Why Google Pack Rocks

It's a GUI version of apt or rpm for regular users. That's why it rocks. It's not about the products on offer, but that you can download and install a bit of software autonomously.

Imagine: Gazillions of useful applications with a one click install for a new system - you dash through a department store of applications and say "this one, this one, not that one" - and it's all done for you.

Simplicity, and lots of it :)

I just want to know when we open source developers can start installing our software via it - imagine a sourceforge.net version, you grab the latest installs of Clamwin, Filezilla, gaim, and much more at the drop of a hat.

New System

I just got myself a new laptop, it's an Acer Extensa 2304LCi.

Here's a list of "vital" stuff to install.

Firefox
Adblock
Yubnub search extension
Filezilla
OpenOffice
gaim
TortoiseCVS
TortoiseSVN
Google Desktop Sidebar
Java 1.5
Clamwin
Google Pack

Friday, January 13, 2006

Son of MultiSelect

As ongoing detimmification takes place, another mighty blow is struck against custom HTML controls coded by the one we call... Tim.

Son of Multiselect is a more usable, all browsers, gracefully failing custom control that kind of just stole my heart a little.

Embedding it in HTML_QuickForm is a piece of piss too.



require_once('HTML/QuickForm.php');
require_once('HTML/QuickForm/select.php');

$GLOBALS['HTML_QUICKFORM_ELEMENT_TYPES']['mselect'] = 
   array('HTML/QuickForm/mselect.php','HTML_QuickForm_mselect');

class HTML_QuickForm_mselect extends HTML_QuickForm_select {


// {{{ toHtml()
/**
* Returns the SELECT in HTML
*
* @since     1.0
* @access    public
* @return    string
*/
function toHtml()
{
   if (!defined('HTML_QUICKFORM_MSELECT_EXISTS')) {
       $js = '<script type="text/javascript" 
                  src=\'multiselect.js\'></script>' . "\n";
       define('HTML_QUICKFORM_MSELECT_EXISTS', true);
   } else {
       $js = "";
   }
          
   return $js . parent::toHTML();
} //end func toHtml

// }}}

} //end class HTML_QuickForm_mselect

Using it?



  public static function MultipleSelect( $name, $elements,
                                             $params, $default = array()) {

      ob_start();
      require_once( 'HTML/QuickForm/mselect.php' );
      $a = @HTML_Quickform::createElement( 'mselect', $name, 
                                             "", $elements, $params );
      $a->setMultiple(true);
    
      $a->setValue( $default );
      print $a->toHtml();

      return ob_get_clean();
  }

Tada: Pretty, isn't it?



print CLS_html::MultipleSelect("control[]",
                                   array("one", "two", "three"),
                                   array("size" => 6), array(1));

The IT Crowd

The IT Crowd looks promising: note to self, watch it or go home!

Thursday, January 12, 2006

Trust & Reputation

iKarma is a trust & reputation site. My first impressions include:

The front page doesn't show me how it works

It's not even trying to be web 2.0, it's web 1.0 in its feel

Nowhere near enough in place to try to maintain accuracy. If I were to launch a hate campaign against a business, or promote myself falsely, there's little to stop me.

Where's the FOAF?

What possible use is it for me unless it's got web services / FOAF?

How does it tie in to existing trust systems (Ebay seller ratings, for instance)?

Opinity.com appears to be a much better offering - validation, communities, and more.

Where's the FOAF / web services?

Wednesday, January 11, 2006

Inflector - A bit of Rails comes to PEAR / PHP

Text::Inflector excites me greatly. It pluralises English words and all sorts. Take a squint at it!

Monday, January 09, 2006

Why we don't have a Semantic Web

Bug #273342:

After trolling the web a bit, the only thing I can find that RDF calendar is
useful for is if you want to mix calendar data with other RDF data in order to
do some sort of querying on the combined dataset. I'm having a hard time
imagining an end-user ever wanting to do this, so this sure seems out of scope
for a calendar client to me. I vote we cvs remove that file instead.

Friday, January 06, 2006

Norman Walsh: Annotations & Links

Norman Walsh points us to a few good firefox annotation extensions.

A New Blog: Seen Objects

Martin Kenny is a product manager for Maxamine International and also a local Adelaide blogger. Be sure to check out his SeenObjects photoblog, there are some impressive shots in there.

Seen here: I think this is ~~my local~~ a random HappyWash.

Thursday, January 05, 2006

Image Searching, ALife & Self Organising Maps (Search By Sketch / Retrievr)

To expand on this post:
What if you could tell a computer what you were looking for, in general, and have it find it for you?
What if you could sketch a picture and have matches found for you based on your input?

Here's how: Query images by signature, query images by overall similarity, and a real implementation: Retrievr.

I prefer approach number 2; which is harder to implement but probably a lot more effective.

I wonder if there are any Java implementations out there that could be bootstrapped into PHP nicely.

Math, Anyone?

Math_Derivative looks hopeful: the ability to use math rather than PHP for certain functions means you can rapidly prototype math heavy applications; then optimize as needed into specific functions.

One to keep an eye on, for sure.