Flex: cvt (sieve it)

CVT – Cluster Visualisation Tool (pronounced sieve-it) is a Flex application for the visualisation of search results returned from the Carrot2 search clustering engine.

Here we can see the results returned from sticking my name into the search. Note that the application also has a ‘site’ input box which lets you target a particular site.

[Image: cvt]

Selecting a cluster changes the focus of the graph to show the target cluster and any related search clusters which share common urls.

[Image: cvt]
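To make the "related clusters" idea concrete, here is a rough sketch of how clusters sharing common URLs could be found. It is not the CVT source: the class `RelatedClusters`, the method `relatedTo` and the example URLs are made up for illustration, and it assumes each cluster is simply a label mapped to the set of URLs it contains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not part of CVT or Carrot2): finds clusters sharing a URL with a target. */
final class RelatedClusters {

    /**
     * @param clusters cluster label -> URLs of the results in that cluster
     * @param target   label of the selected cluster
     * @return labels of the other clusters that share at least one URL with the target
     */
    static List<String> relatedTo(Map<String, Set<String>> clusters, String target) {
        List<String> related = new ArrayList<String>();
        Set<String> targetUrls = clusters.get(target);
        if (targetUrls == null) {
            return related; // unknown cluster: nothing is related
        }
        for (Map.Entry<String, Set<String>> entry : clusters.entrySet()) {
            if (entry.getKey().equals(target)) {
                continue; // skip the selected cluster itself
            }
            // disjoint() is true when the two sets have no URL in common
            if (!Collections.disjoint(entry.getValue(), targetUrls)) {
                related.add(entry.getKey());
            }
        }
        return related;
    }

    public static void main(String[] args) {
        // Example data only; the URLs are placeholders.
        Map<String, Set<String>> clusters = new LinkedHashMap<String, Set<String>>();
        clusters.put("Flex", new HashSet<String>(Arrays.asList("a.example/1", "b.example/2")));
        clusters.put("Java", new HashSet<String>(Arrays.asList("b.example/2", "c.example/3")));
        clusters.put("Genealogy", new HashSet<String>(Arrays.asList("d.example/4")));
        System.out.println(relatedTo(clusters, "Flex")); // prints [Java]
    }
}
```

In this example "Java" is related to "Flex" because both contain `b.example/2`, while "Genealogy" shares nothing and is left out.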

Of course it would be useless without being able to view the results page. Here the HTML page is rendered directly inside the application using the iframe component.

[Image: cvt]

The client is written with Flex and flexvizgraphlib. The back end is a Java component which utilises the Carrot2 library. The underlying search is from Yahoo. I would have also liked to use Google but the Java API has been phased out.

This application was written initially as an example of using GDS (Granite Data Services) and remoting Java objects. I am not sure where GDS stands now that BlazeDS has been released by Adobe, but kudos to the GDS guys for doing this anyway. I will continue to use it for the time being.

After getting the initial version working I stuck in a search for Carrot2 and visualisation and noticed that there is already a very similar tool.

[Image: carrot3]

This is written in Java and you can reach it here. I like this but I think my tool is better.

You can download a Tomcat deployment for CVT here.

How to run cvt

These instructions apply to Windows. Mac users will need to apply their technical skillz to get it to run. Shouldn’t be too hard (is JDK 1.6 available for the Mac?).

1) Download the Tomcat image here.

2) If you haven’t already got the Java 1.6 JDK you will need to download it. I think you can use the JRE but I haven’t tried it (yet). I am using “jdk1.6.0_01” so your best bet is this or later.

3) Install the JDK (or JRE I guess) on your machine and then edit the batch file ‘startTomcat.bat’ to set the JAVA_HOME variable to point to your installation. E.g. mine is:

JAVA_HOME=C:\Program Files\java\jdk\jdk1.6.0_01

4) Run the ‘startTomcat.bat’ batch file.

If all is well you should see a DOS box appear with some messages flying past. You may get an exception message about “Persistence”; ignore it, as it relates to persistence of the server app image. If you already have Tomcat running then you will get error messages. Kill the other Tomcat instance and rerun the CVT one.

Go to the URL

http://localhost:8080/granite-search-web-server/search-viewer/SearchViewer.html

or click here.

You should see the cvt application in your default web browser.

5) As a final step you should really register for a Yahoo search API id.

As of now the registration page is returning a 404 error for me. If it comes back up I will give further instructions, but for now the default id is the one bundled with the Carrot2 API. In theory you should edit the yahoo-search.xml file under the Tomcat deployment (in the WEB-INF/classes directory). I haven’t a clue if this application will be popular, so we will see whether this causes problems or not, i.e. the id might reach its search result limit and stop working.
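Step 4 mentions that an already running Tomcat will cause error messages. If you want to check for that before running the batch file, something like the following sketch should do. It assumes the bundled Tomcat listens on port 8080 (taken from the URL above); the class `PortCheck` and the method `isPortInUse` are just names I picked for the example.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical helper: reports whether a local TCP port is already taken. */
final class PortCheck {

    /**
     * Tries to bind a server socket to the port. If the bind fails, some other
     * process (for example another Tomcat instance) is most likely using it.
     */
    static boolean isPortInUse(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return false; // bind succeeded, so the port was free; socket is closed on exit
        } catch (IOException e) {
            return true;  // could not bind: treat the port as in use
        }
    }

    public static void main(String[] args) {
        int port = 8080; // assumption: the port used in the CVT URL
        if (isPortInUse(port)) {
            System.out.println("Port " + port + " is in use. Stop the other Tomcat before starting CVT.");
        } else {
            System.out.println("Port " + port + " looks free.");
        }
    }
}
```

A failed bind can have other causes too (such as missing permissions), so treat a "port in use" answer as a hint rather than proof.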

Tips on using cvt

  • Site input box
  • Remove the “http://” bit at the start of the URL. Some site URLs will not work and I am unsure why. The application should really return a ‘no results found’ message; this is my next task.

  • Graph Display
  • The ‘Link Length’ slider only applies when the autofit checkbox is OFF. If the graph is too small and you want to expand it then deselect autofit and adjust this slider up and down. Toggle the autofit to see the graph switch between the two states.

  • Click – Double Click
  • Clicking on an item in the datagrid will automatically focus the graph on the chosen cluster.

    The same effect can be reproduced by double-clicking on the blue ‘cluster’ nodes in the graph.

    To return to the overview double click on the ‘Clusters’ node i.e. the ‘EYE’.

    Clicking on a star will render the page within the ‘Browse’ tab. You can perform a text search using your web browser by clicking on the HTML page to focus it and then searching in the normal way.

  • Separation Slider
  • This should only be of any use when you have a cluster focussed. Bumping it up to 2 will have the effect of displaying all the clusters with the selected cluster as the focussed item. Be wary of double-clicking on the EYE as you will get every referenced URL displayed and the graph will be unmanageable.
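The separation slider behaviour described above can be pictured as a breadth-first walk that stops after a fixed number of links from the focussed node. The sketch below is only an illustration of that idea, not how flexvizgraphlib actually implements it. The class `SeparationFilter`, its methods and the tiny example graph (an EYE node linked to two clusters, each linked to its URLs) are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: which nodes are visible within N links of the focussed node. */
final class SeparationFilter {

    /** Adds an undirected link between two nodes. */
    static void link(Map<String, List<String>> graph, String a, String b) {
        neighbours(graph, a).add(b);
        neighbours(graph, b).add(a);
    }

    private static List<String> neighbours(Map<String, List<String>> graph, String node) {
        List<String> list = graph.get(node);
        if (list == null) {
            list = new ArrayList<String>();
            graph.put(node, list);
        }
        return list;
    }

    /** Breadth-first search returning every node within maxSeparation links of focus. */
    static Set<String> visibleNodes(Map<String, List<String>> graph, String focus, int maxSeparation) {
        Map<String, Integer> distance = new LinkedHashMap<String, Integer>();
        Deque<String> queue = new ArrayDeque<String>();
        distance.put(focus, 0);
        queue.add(focus);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            int d = distance.get(node);
            if (d >= maxSeparation) {
                continue; // at the limit: keep the node but do not expand it
            }
            List<String> next = graph.get(node);
            if (next == null) {
                continue; // node has no links
            }
            for (String n : next) {
                if (!distance.containsKey(n)) {
                    distance.put(n, d + 1);
                    queue.add(n);
                }
            }
        }
        return new LinkedHashSet<String>(distance.keySet());
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new LinkedHashMap<String, List<String>>();
        link(graph, "EYE", "Flex");
        link(graph, "EYE", "Java");
        link(graph, "Flex", "url1");
        link(graph, "Flex", "url2");
        link(graph, "Java", "url2");
        link(graph, "Java", "url3");

        System.out.println(visibleNodes(graph, "Flex", 1)); // prints [Flex, EYE, url1, url2]
        System.out.println(visibleNodes(graph, "Flex", 2)); // prints [Flex, EYE, url1, url2, Java]
        System.out.println(visibleNodes(graph, "EYE", 2));  // prints [EYE, Flex, Java, url1, url2, url3]
    }
}
```

With a cluster focussed, a separation of 2 pulls in the other clusters but not their unrelated URLs. Focussing the EYE at the same setting reaches every URL, which is why the graph gets out of hand.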

Tips on searches

  • Searching for a person’s name
  • This is only my experience, but the clustering algorithm is excellent at making links between specific topics. For instance, if I put in the name of a friend and they have enough presence on the web, they will appear in the search. What’s more, it is possible to see which clusters share links with that person by making their cluster the focus of the graph. This sort of thing is possible with a bog standard search engine, but CVT is a cool way to visualise it.

  • Searching a particular web site
  • As noted previously there is no feedback to say if a site has returned results or not. You will just see that the graph area has not changed. Don’t forget that not all sites make their pages available to search engines (in this case Yahoo).

  • Genealogy
  • Potentially this is a wonderful tool for making connections between people. I have had a play with this but the problem is that genealogy specific sites don’t normally make their data crawlable by the Yahoo search engine (why would they?). This means that my results from this have been sketchy at best.

  • Input box history
  • I used the history input control from Sho Kuwamoto. I couldn’t get it to work from his swc and had to create a new library project from scratch. This uses the Flash ‘SharedObject’ area, which means that potentially it could fill up. You can mitigate this by right-clicking on the combobox area when you input a previous search and selecting the option to reset the history.
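On the site search tips: since the ‘site’ box wants the URL without the “http://” bit, the input could be tidied up before it is used. The following is a minimal sketch of that idea only. `SiteInput` and `normalise` are hypothetical names, and CVT itself does not do this (you have to remove the prefix by hand).

```java
/** Hypothetical helper: tidies up what the user types into the 'site' box. */
final class SiteInput {

    /**
     * Trims surrounding whitespace and strips a leading "http://" or "https://"
     * (matched case-insensitively). Anything else is left untouched.
     */
    static String normalise(String raw) {
        if (raw == null) {
            return "";
        }
        return raw.trim().replaceFirst("(?i)^https?://", "");
    }

    public static void main(String[] args) {
        System.out.println(normalise("http://www.example.com"));      // prints www.example.com
        System.out.println(normalise("  HTTP://example.org/blog  ")); // prints example.org/blog
        System.out.println(normalise("example.net"));                 // prints example.net
    }
}
```

Stripping "https://" as well goes slightly beyond the tip, which only mentions "http://", but it seemed the natural thing to include.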

Anyway, this is a first cut and I still have to put stuff in like ‘history’ and better error messages. I am getting pretty good at Flex now, and this application is a cumulative one, drawing on lots I have learned from my previous work on flexTraffic and others, including the use of Cairngorm.

Post some comments or email me.
