Jesse's Journal: August 2008

Be a coder like me

Computers have been good to me. No doubt about that. My programming skills just landed me an awesome job at a company that's totally out of my league considering my lack of formal education. When I tell people, they often express a longing to learn to program too. It's not as hard as you think, and at the risk of losing my job to one of you, I'd like to share an excellent tool for learning to code. It's called AppJet.com, and it's a server-side JavaScript framework and hosting system. If that sentence didn't make your head explode, you might just be able to learn to code. So don't wait for me to explain it, just go. Trust me, it's easier than you think. But don't tell my boss!

Posted by email from Jesse's posterous

New clothes for my first day of school!

Typed with my thumbs

Office decorum

Dawn

560 mission st

OpenTaste - Technical

One of my main goals for Agglodex is to enable it to export the interests that it collects from your aggregated lifefeed, so that users can allow other sites to consume their attention data in order to make better recommendations. For this reason I have been following APML closely as they develop the spec. They have long struggled to make the format expressive enough to entice third parties to adopt it. But before the standard even had the chance to catch on, another group has started a second format for representing what it claims is even more than "attention data". OpenTaste is an OpenID/OAuth-inspired standard for exporting general preferences in any format. Definitely worth my consideration!

http://www.opentaste.net/technical.html

Different needs for different feeds

Feeds feeds feeds, seems like that's all I'm dealing with anymore. Feeds from different sources: my feeds, my friends' feeds, strangers' feeds, and sites' feeds. Feeds of different content types: blog feeds, music feeds, video feeds, photo feeds, link feeds, activity feeds, event feeds, news feeds, message feeds. Big feeds, little feeds, interesting feeds, and boring feeds. So much data coming down so many pipes, what are we supposed to do with it all!?

The problem is that feeds come in all different types and yet we try to manage them all with the same tools. Feedreaders try to help us manage these multitudinous data channels but are quickly overwhelmed by the static. Perhaps a better understanding of what we're dealing with could help us organize them better.

Strangers' feeds

These are feeds that are produced by people you don't know very well. Oftentimes they are composed of entries from a variety of authors. Popular blogs like BoingBoing, ReadWriteWeb, and TechCrunch fall into this category. Even if you know the guys who write them, the entries are targeted at the world at large, not little old you, and so you can never really tell how interesting they will be. Good sources of these are Digg, Del.icio.us, and other social news sites, since the stories will be pre-sorted for interestingness, but they also happen to be the most chatty.

These are what feedreaders were built for. All you have to do to manage them is add them to your feedreader, categorize, and read at your leisure. You might skip a few days and return to find your feed reader is packed with unread posts, but since none of the entries was specifically meant to be read by you, you can usually just "mark all as read" and start fresh.

Update feeds

Some sites produce feeds to let you know about changes. Flickr gives you a feed of your most recent comments, and Dreamhost provides a feed of their hosting status to alert customers about downtime. These feeds serve the purpose that email has dutifully served for years. They are most often produced by a website, not a human, but sometimes, as in the case of comment and discussion feeds, they are made by people. The main difference between these and Strangers' feeds is that they are guaranteed to be relevant to you. Why else would the site have made a feed specially for you!?

Putting these feeds in a feed reader can cause problems. What if an important item gets lost in the fray? Someone sends you a direct message on Twitter, or a comment on Disqus? You can put all these in a folder in your news reader, but then what if you miss something because of the millions of entries that show up in Digg's Top Stories feed?

The best way to deal with Update feeds is the same tool that you've been using to get these updates all along: your trusty mail client. Most mail clients have allowed you to subscribe to feeds since the early days of RSS. Mail.app has a decent one, as does Thunderbird. Putting all of your Update feeds in your mail client will ensure that you check them frequently, and most mail clients even alert you when you get a new item. Sure, news readers can do this too, but your Update feeds will likely be a drop in the bucket compared to the firehose of news articles that quickly starts flowing into your news reader.

Friends' Feeds

Strangers' feeds are meant for the world at large, and Update feeds are meant for you and you primarily, but Friends' feeds lie somewhere in between. They are produced by people you know, so it's a safe bet you'll want to see some of the things in them, but likely not all of them. If you have many friends who like to use social media, you'll have quite a few entries to look through. However, depending on your friends' editorial skills, they might take longer to browse because you'll have to follow links and find out what they are about for yourself. There's likely to be a bunch of boring stuff in these, but if you want other people to look at what you make, you have to do due diligence and consume their content too. It might even make you better friends!

There are three kinds of Friend feeds. The first are individual feeds from individual sites. Your friend will have as many of these as they have website accounts, and there's no way to keep up with whether or not they are even updating them anymore, so they are largely junk.

The next type of feed is the "my friends" feed that social websites often offer. These feeds will give you all the content that your friends on that site are making. If any of the social networks would allow a truly open social graph (I'm looking at you, Facebook and MySpace) you'd be able to get all your friends' content on any new site you joined. Unfortunately, each site has its own "friends list", and you're going to have a hard time "friending" all the people you know on every site you ever use.

The last type of feed that your friends might make is what you might call an aggregated feed or lifestream. I'll talk about these more in a second, but they are basically a collection of all the content that your friend makes on lots of different sites. These are a great way to subscribe to your friends' content. The only catch is that they have to actually build one for you to subscribe to. With more and more sites offering users ways to generate content and syndicate it with feeds, and many excellent lifestreaming applications available, this will soon be the preferable way of keeping up with what your friends are doing online.

So it should be pretty clear what these feeds are and why you want to read them, so how do you manage them? To tell the truth, I don't know yet. When I put them in my feedreader they cause a lot of noise and irritation, since they are rarely self-explanatory. The best way I've found to consume these feeds is with FriendFeed. If your friends are on FriendFeed, you can easily find them and immediately get access to all their feeds. Trouble is, not everyone you know is going to use FriendFeed. For those few of your friends who refuse to get a FriendFeed account, you can create "imaginary friends", but you have to manually add all their accounts on every site that you know they use. FriendFeed also has quirks that make it less than ideal in my opinion, and I just hate having to build yet another friends list.

My Feeds

The last type of feed--not to mention my favorite, being that I'm shamelessly self-interested--is the kind that we generate ourselves. I'm not just talking about your blog, but your Flickr photos, your YouTube favorites, your Delicious links, your Tweets, and every other feed that is full of stuff that you made. These feeds are of no interest to you because you already know what's in them, but your friends and family might already be subscribed to a few of them. The trouble is that unless you're as famous as Kevin Rose, nobody's going to go to the trouble of finding all those feeds, let alone subscribing to each and every one.

The trick to managing these feeds is to "aggregate" them into one all-encompassing "lifestream". There are many ways to do this. If you're a do-it-yourselfer like me, you might opt to build a Yahoo Pipe, or use SimplePie to mash together your feeds. If you don't want to get your hands dirty, there are myriad services that will do it for you: Tumblr.com, Soup.io, Plaxo Pulse, FriendFeed.com, and Swurl.com are some examples. Some do the job better than others. Tumblr, for example, only allows you to add a certain number of feeds, and FriendFeed truncates entries. Most of these further complicate the issue by having content creation features of their own, which means one more feed to add to your collection.
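For the do-it-yourself route, the core of aggregation is just merging entries from several feeds into one list sorted by date. Here is a minimal sketch in Python, assuming the feeds have already been fetched and parsed into (published, title) pairs. The function name build_lifestream and the sample data are made up for illustration, and this is not how Yahoo Pipes or SimplePie work internally.

```python
from datetime import datetime

def build_lifestream(feeds):
    """Merge several feeds into one newest-first list of entries.

    feeds maps a source name to a list of (published, title) tuples.
    The shape of this input is an assumption made for the sketch.
    """
    merged = [
        {"source": source, "published": published, "title": title}
        for source, entries in feeds.items()
        for published, title in entries
    ]
    # Newest entries first, the way a lifestream is usually displayed.
    merged.sort(key=lambda entry: entry["published"], reverse=True)
    return merged

# Hypothetical sample data standing in for parsed feeds.
feeds = {
    "blog": [(datetime(2008, 8, 7, 9, 0), "Different needs for different feeds")],
    "photos": [
        (datetime(2008, 8, 11, 8, 30), "Dawn"),
        (datetime(2008, 8, 6, 18, 0), "Test photo"),
    ],
}

for entry in build_lifestream(feeds):
    print(entry["published"].date(), entry["source"], entry["title"])
```

A real aggregator would also have to fetch and parse each feed and cope with missing dates, which this sketch skips.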

But is mashing your content together all there is to be done with those feeds you spend all your time at work building? I don't think so, and that's why I built Agglodex.com, a site that not only aggregates all the feeds you make, but also analyzes the entries you create to determine what you're interested in. It can use that information to recommend similar users, and entries you might like to see. It can even share your interests with other sites using a new type of feed called APML. You could give such a feed to a site like Digg and it could show you articles that you might be interested in, or give it to Netflix and get better movie recommendations.

Conclusion

I hope I've helped to illuminate the world of feeds in all their varieties, and given you some ideas about how to organize them all. I truly love feeds for liberating all the user generated content on the web from the confines of the sites where it's made. I've always seen the immense potential of all that data, and have explored it for years. Agglodex.com is the outcome of years of ideas, and research. In the few months that I've been tinkering with it I've been so excited to see if people will use it. Now it's ready for users. So please sign up and add some of your feeds.

.Mac Reader: jesse h's upcoming shows

feed://sonicliving.com/user/367/cal.xml

I hate math

Agglodex (http://www.agglodex.com) is humming along, eating feeds, and tagging entries. The storage class is happily keeping all the data. Now comes the hard part: using that data.

My first task is to calculate how similar two users are so I can display a list of similar users on each profile. To accomplish this I have a collection called storage.relations that links two users, and has fields for their similarity. A cron job periodically loads a user and compares them to as many other users as it can, storing the results of the analysis in the collection. The similarities will be based on the two users' terms. Each time an entry appears in a user's feed, it is analyzed for significant terms, which are stored in storage.terms. Each time a term is identified, the interest.count for that user and term is incremented, along with the term.count, which counts how many times a term is used sitewide.
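The counting described above can be sketched in a few lines of Python. This is only an illustration with in-memory dictionaries standing in for the storage collections. The names term_counts, interest_counts, and record_terms are made up for the sketch, and the step that picks out "significant" terms is assumed to have already happened.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory stand-ins for the stored collections.
term_counts = Counter()                 # term -> sitewide count (term.count)
interest_counts = defaultdict(Counter)  # user -> term -> count (interest.count)

def record_terms(user, terms):
    """Increment both counters for every significant term found in one entry."""
    for term in terms:
        interest_counts[user][term] += 1
        term_counts[term] += 1

# Two made-up users, each with one analyzed entry.
record_terms("alice", ["google", "iphone", "google"])
record_terms("bob", ["google", "feeds"])

print(interest_counts["alice"]["google"])  # alice's interest.count for "google"
print(term_counts["google"])               # sitewide term.count for "google"
```

Each user then ends up with a mapping of term to count, which is the shape the similarity sketches further down assume.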

My first attempt used Euclidean distance. This algorithm loops over the interests that two users share, takes the difference between their interest.count values, squares it, and sums all the squares. I then divide 1 by 1 plus the square root of the sum, which gives me a number between 0 and 1, where 1 is complete similarity. This worked okay.
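As a rough Python sketch of that calculation, assuming each user's interests are available as a plain dictionary of term to count (the function name and the sample users are made up, and returning 0 when nothing is shared is my own assumption):

```python
import math

def euclidean_similarity(a, b):
    """Similarity in (0, 1] from the Euclidean distance over shared terms.

    a and b map term -> interest count. Only terms both users have are compared.
    """
    shared = set(a) & set(b)
    if not shared:
        return 0.0  # assumption: no shared terms means no similarity
    sum_of_squares = sum((a[term] - b[term]) ** 2 for term in shared)
    return 1 / (1 + math.sqrt(sum_of_squares))

# Made-up users: they share "google" (3 vs 3) and "iphone" (1 vs 4).
alice = {"google": 3, "iphone": 1, "feeds": 5}
bob = {"google": 3, "iphone": 4, "music": 2}

print(euclidean_similarity(alice, bob))  # 1 / (1 + sqrt(0 + 9)) = 0.25
```

Notice that "feeds" and "music" never enter the sum, because only shared terms are looped over.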

The second attempt used the Pearson correlation coefficient. This one was more complicated, and yielded a stranger score from -1 to 1, where 1 meant that the users' interest.count values rise and fall together perfectly across the terms they share.
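Here is a sketch of the Pearson version in Python, again assuming interests are dictionaries of term to count. The function name and sample data are invented for the example, and treating an undefined correlation (no shared terms, or no variation in the counts) as 0 is an assumption of the sketch.

```python
import math

def pearson_similarity(a, b):
    """Pearson correlation of two users' counts over the terms they share."""
    shared = sorted(set(a) & set(b))
    n = len(shared)
    if n == 0:
        return 0.0  # assumption: nothing shared, so nothing to correlate
    xs = [a[term] for term in shared]
    ys = [b[term] for term in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denominator = math.sqrt(var_x * var_y)
    if denominator == 0:
        return 0.0  # assumption: flat counts give an undefined score, use 0
    return covariance / denominator

# Made-up users: bob's counts are exactly double alice's.
alice = {"google": 1, "iphone": 2, "feeds": 3}
bob = {"google": 2, "iphone": 4, "feeds": 6}

print(pearson_similarity(alice, bob))  # 1.0
```

Bob's counts are double Alice's and the score still comes out as 1, because Pearson measures whether the counts move together rather than whether they are equal.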

My third attempt is to use the Tanimoto coefficient, which is an extension of the Jaccard index, but I'm failing miserably. The Jaccard index is a way of calculating similarity between two sets of binary data, like two questionnaires of yes/no questions. However, my 'interests' have a 'count' and are therefore more like vectors, I think. Does anyone know how I could apply the Tanimoto coefficient to my users?

One problem I have with these algorithms is that they don't take into account all the terms that the users *don't* share. If two users each have hundreds of interests but happen to share an interest in "Google" or "iPhone", is what they have in common more important than what they don't?
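One possible way to apply Tanimoto to counts, offered as a sketch rather than a definitive answer, is the vector form of the coefficient: the dot product of the two users' count vectors divided by the sum of their squared lengths minus that dot product. The Python below assumes interests are dictionaries of term to count. The function name and sample users are hypothetical.

```python
def tanimoto_similarity(a, b):
    """Vector form of the Tanimoto coefficient: A.B / (|A|^2 + |B|^2 - A.B).

    a and b map term -> interest count. A missing term counts as 0.
    """
    dot = sum(a[term] * b[term] for term in set(a) & set(b))
    norm_a = sum(count * count for count in a.values())
    norm_b = sum(count * count for count in b.values())
    denominator = norm_a + norm_b - dot
    if denominator == 0:
        return 0.0  # assumption: two users with no interests score 0
    return dot / denominator

# Made-up users with yes/no style counts: 1 shared term out of 3 in total.
alice = {"google": 1, "iphone": 1}
bob = {"google": 1, "feeds": 1}

print(tanimoto_similarity(alice, bob))  # 1 / (2 + 2 - 1), one third
```

With counts of only 0 or 1 this gives the same answer as the Jaccard index (shared terms divided by all terms). The terms the users don't share also enter the calculation, since they make the squared lengths in the denominator bigger and pull the score down.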

I suck at math, so any help I can get here would be much appreciated! Thanks in advance!

I wonder how this thing works

Oh boy, yet another way to spam all my blogs at once. This one's called Posterous.com and it lets me email things to it. That includes photos apparently, as many as I can fit in an email. All of these publishing aggregators, like Ping.fm, are really making it hard not to get duplicates in my content stream. Agglodex is gonna need some way of eliminating dupes. I think FriendFeed might already be doing so.

Jesse's Journal

Thursday, August 14, 2008

Be a coder like me

Monday, August 11, 2008

New clothes for my first day of school!

Office decorum

Dawn

560 mission st

OpenTaste - Technical

Thursday, August 07, 2008

Different needs for different feeds

.Mac Reader: jesse h's upcoming shows

Wednesday, August 06, 2008

I hate math

I wonder how this thing works
