July 30, 2004
What I want in a Blog Search Tool
There's been an ongoing discussion about search tools, specifically those used to track blogs and see who is referencing what. Last month Judith was talking about Feedster and today Jason is taking on Technorati. Since I promised the Scotts at Feedster that I'd put together a wishlist, which I keep putting off, I thought just thinking out loud would be a good way to start.
To make sense of what I'm looking for, I should explain how I currently use these tools. Obviously I have this blog, seanbonner.com, and I also have blogging.la. Then there's my wife Caryn's art offshoot, art.blogging.la, and the art gallery we run together, sixspace. Then there's Metroblogging and the ever-growing list of city metblogs. There's also a handful of blogs run by other people that I contribute to. Every day I want to know who's talking about these sites, and for some of them I want to know several times a day.
At this point there's not one tool that will answer this question for me, so I use a search cocktail. I have a group of RSS feeds that I generated at Feedster that are scanned every time my aggregator runs. This is great for finding sites that are talking about specific keywords without linking to them. Then in my browser I have a bookmark folder full of Technorati searches that I open in tabs several times a day. This is great for finding blogs linking to my sites. I also use several other search tools, including Google and its "link:" searches. Between all these I think I find *most* of the content I'm looking for, but every once in a while someone points out something that slipped through the cracks.
While I've perfected the way this all works to the point where I don't have to think much about it on a regular basis, I'd be lying if I said it didn't take up too much time, or that I didn't wish it was easier. I know several people at both companies and I know they are working hard, so this isn't a criticism of what they are doing. I love what they are doing. This is really more a list of things I wish they could add to their already full plates. And trust me, I know some of this is impossible.
- Functional : I want the site to work. Obviously this is priority number one, because if it doesn't work, the results it isn't bringing up don't matter. Actually giving back results, and not timing out or stalling or whatever, is the thing I want most. I'd sacrifice everything else on this list if I knew the tool would work every time I used it.
- Speed : This isn't as big a deal to me as it is to some other people. Since I actually do things during the day besides sitting at my laptop clicking reload on a search string, finding out that someone wrote something 45 minutes ago is as good as finding out they wrote it 5 minutes ago. At least it is to me. When I search, I generally want to know who has been talking about this stuff "today", so anything within the last hour is really fast. Under 10 minutes is really cool, but not really any more useful than the last hour. As long as it's accurate and relatively up to date, I'm happy.
- Thoroughness : I don't want to use two, three, or four different tools to find all the links. I want one. Every time I search on Feedster I find something Technorati didn't turn up a minute earlier, and vice versa. Whoever can give me all the info first will be my god.
- Smarts : I want the searches to be smart. I want to find out who is linking to seanbonner.com without sorting through 900 links to pages that are actually *on* seanbonner.com. I want to see who is talking about the Chicago Metblog without the results including everything ever posted *to* the Chicago Metblog. I don't know if an "exclude posts from this domain" option is the answer, but someone needs to figure this one out.
- Crosschecked RSS : I want an RSS feed for the search that works. Both sites offer these, but they are both flawed. Here are some of the problems I have: every time my aggregator runs, several posts are marked as new. They are not new. They are old. I've seen them several times in the last few days, and nothing has changed. The posts haven't been updated, commented on, trackbacked, or anything. I've asked the authors, "Hey, did you change anything on this post?" and they haven't. Sometimes I'm actually subscribed to the feeds on those sites, and those feeds know the post hasn't been updated, but the search feed thinks it has. It gets worse: sometimes it thinks ALL the posts are new, and even worse than that, it lists each one as several separate new posts. So in one search feed there will be 10 entries that really just point to the same three posts, with the same titles and everything. I want something that knows it has already given me that info.
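To make the "Smarts" and "Crosschecked RSS" wishes a little more concrete, here is a minimal Python sketch of what a search feed could do before handing results to an aggregator. It is not how Feedster or Technorati actually work. The entry fields (`link`, `title`, `content`) and the names `SearchFeedFilter`, `is_self_link`, `normalize_link`, and `fingerprint` are all hypothetical. It drops results hosted on the searched domain or its subdomains. It also remembers each post's permalink alongside a hash of its text, so an unchanged post is reported only once, even when it shows up several times under slightly different URLs.

```python
import hashlib
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercased hostname of url with any leading 'www.' removed."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_self_link(url, target_domain):
    """True if url lives on target_domain or one of its subdomains."""
    host = host_of(url)
    target = host_of("http://" + target_domain)
    return host == target or host.endswith("." + target)


def normalize_link(url):
    """Collapse trivial permalink variants (scheme, 'www.', trailing slash, #fragment)."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    query = "?" + parts.query if parts.query else ""
    return host_of(url) + path + query


def fingerprint(entry):
    """Hash of the post's title and text; it changes only if the post itself changes."""
    text = entry.get("title", "") + "\n" + entry.get("content", "")
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class SearchFeedFilter:
    """Hypothetical post-processing step for one saved search, e.g. 'seanbonner.com'."""

    def __init__(self, target_domain):
        self.target_domain = target_domain
        self.seen = {}  # normalized permalink -> fingerprint last reported

    def filter(self, entries):
        """Return only entries that are off-site and either new or genuinely changed."""
        fresh = []
        for entry in entries:
            link = entry["link"]
            if is_self_link(link, self.target_domain):
                continue  # a post *on* the site, not a post *about* it
            key = normalize_link(link)
            fp = fingerprint(entry)
            if self.seen.get(key) == fp:
                continue  # already reported with identical content
            self.seen[key] = fp
            fresh.append(entry)
        return fresh


search = SearchFeedFilter("seanbonner.com")
batch = [
    {"link": "http://seanbonner.com/archives/001.html", "title": "Own post", "content": "..."},
    {"link": "http://example.org/post/1", "title": "Re: wishlist", "content": "Agreed."},
    {"link": "http://www.example.org/post/1/", "title": "Re: wishlist", "content": "Agreed."},
]
print(len(search.filter(batch)))  # 1: self-link dropped, duplicate collapsed
print(len(search.filter(batch)))  # 0: nothing changed since the last run
```

The sketch keys on the post's own permalink rather than on whatever ID a search engine assigns to each hit. That choice is what would collapse "10 entries pointing at three posts" into three. It only catches aliases that differ by scheme, "www.", a trailing slash, or a fragment. Unrelated hostnames that serve the same site would still slip through.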
That's what I want at the moment. I'll add more as I think of them.
Adam Greenfield just wants someone at Technorati to respond to his e-mails.
Posted by sean on July 30, 2004 05:02 PM
Who said the What now?
Posted by: Ruth on July 30, 2004 05:16 PM
you are a mind reader! i have been composing an eerily similar list today... sheesh, you beat me to it sean! i'll have to refer to yours and note my diffs and/or adds... best... j.
Posted by: judith on July 30, 2004 05:17 PM
As one of the "Feedster Scotts," I can only say:
a) great job thinking
b) we understand
c) keep watching; things are getting better at Feedster all the time.
And Judith -- I'd love to see your list.
Thank you.
Scott
Posted by: Scott Johnson of Feedster on July 30, 2004 06:13 PM
This is definitely a very good starting point, but it's a difficult proposition. I had this issue at blogrolling.com, where the spec for URLs is basically too soft. You can point umpteen URLs at a single site via wildcards, and even though it's the same content for the most part, it will fubar the search engines. If you set something like www.sbdc and sbdc and foo.sbdc and bar.sbdc all going to the same site, there is a lot of processor-intensive fuzzy matching that has to happen behind the scenes, and it's expensive. Also, if you are smart enough to know that an engine is reading multiple URLs, you can write code on your end that takes the first part of the domain as a random seed and generates a shitload of bogus text, so the external engines read them as disparate sites while you keep all your relative links intact. It's a sticky situation at best. People like to talk about how email is broken, but after being in the space I think URIs are in much more dire need of an overhaul.
Posted by: Jason D- on July 30, 2004 06:16 PM
And I think that Dave Sifry was as classy as could be on JC's site, and I think that Feedster still does a better job, even though their jobs are actually different. I find a LOT more relevant results on Feedster. Even though I thought Scott Johnson was from Friendster the first night I met him ;-) Sorry 'bout that, Scott. Blame it on Halley :-P Had to clarify that before this next bit. My bet in this game at this point is on Feedster. They have much better uptime and better relevance. What I'm not sure about is whether they have a live API like Technorati's. Technorati's backend is probably hurt by their open API and the extra traffic it brings. While I LOVE the open API, I think sites like theirs should charge a small fee to access it. That's what stopped me from opening the Blogrolling system to an open API: I just couldn't afford to handle the extra load from people just fucking around and taking my cycles from the people who paid to get them. Anyway, back to work. Great thread!
Posted by: Jason D- on July 30, 2004 06:21 PM
"This is definitely a very good starting point but it's a difficult proposition"
I don't doubt that at all; this is definitely an ideal-world kind of list, not really a "please turn this option on for me by tomorrow" list.
Posted by: Sean Bonner on July 30, 2004 06:51 PM