ToDo or Toodledo. That is the Question. Again.

Fil Salustri, Sunday, May 30, 2010

One of my most popular posts at my other blog is a comparison of ToDo and ToodleDo for the iPhone. The original post was written a while ago, and both apps have had several significant revisions since then. So, I'm refreshing the post here: I've gone over the presentation, updated the information to reflect the current versions of these two apps, and tweaked the data to reflect my most current thinking.

I like PDAs because they help me manage the things I have to do – and I’m all about the todo lists. I don’t know if I’ve become dependent on lists because I have a bad memory, or if my memory is failing because I use lists for everything. Still, there it is.

Over the past year or so, a number of task manager apps have come out for my beloved iPhone, and I’ve been trying most of them. It’s surprising how I keep coming back to the same two apps, and equally surprising (to me) that after months of playing around with them, I still can’t quite decide which one I prefer.

The two apps are Appigo’s ToDo, and ToodleDo for the iPhone. Both cost only a few dollars, and both are very well-rated by the public at large.

So, I figured, let's use some design analysis tools to evaluate the two apps, and see what the numbers say.

I’m going to use two tools: pairwise comparison, and a weighted decision matrix. These tools aren’t only useful for analyzing designs – they’re basic decision-making tools, and they’ve always done right by me to evaluate designs, conceptual or otherwise.

Both tools depend on having a good set of criteria against which the two apps will be compared. You might not know what decision to make, but you need to know how you’ll know you’ve made the right one. In our case here: How do I know when I’ve found a good task manager app?

The formal term for what I’m doing here is qualitative, multi-criterion decision-making. It generally involves four tasks, which in my case are:

Figure out criteria that apply to any “best” task manager.
Rank the criteria by importance, because some criteria will affect my decision more than others.
Develop a rating scale to rate each app.
Rate the apps with the rating scale and the weights.

Here are my criteria, in no particular order of importance, based on years of using other task management tools:

Fast. No long delays when telling the app to do something.
Easy. Minimal clicking (e.g. not having to hit “accept” or "save" for everything, or burrow into deeply nested forms and subforms).
Start dates. A task shouldn't appear on any standard task list until its start date (if given).
Due dates. Obviously, but not mandatory on all tasks.
Repeats. Repeating tasks at regular intervals.
Priorities. At least three levels of priority for tasks.
Sync. Easy syncing to some fairly robust remote service, using standard formats, that lets me access my tasks from other devices.
Groups. Group tasks by tag or folder or project or whatever.
Sorting. Multiple ways to sort tasks.
Hotlist. Some overview page showing only near-term, important tasks; preferably customizable in terms of how I define "important."
Restart. Picks up next time I run it where I left off last time (oddly, not every iPhone app does this).
Recovery. Be able to uncheck tasks that were accidentally checked off.
Subtasks. Treat a single task as if it were a group/project/folder.
Checklists. A degenerate case of a task is just an item in a checklist. Not every "task" really deserves all the attributes. Checklists that can be used as templates (i.e. copied over and over again) would be even better.
Conditional deadlines. Due dates based on due dates of other items (e.g. task B is due two weeks after task A is completed).
Backlinks. Given a task, one-tap access to the group/project/folder in which the task lives.

Oddly, not a single iPhone app I’ve checked out so far meets all my requirements. In particular, I’ve not heard of an app that even tries to meet the last two requirements. I say “oddly” because I don’t think these requirements are excessive or bizarre, and I do think they'd be immensely useful. Still, there it is.

Next, we have to develop weights to assign relative importance to the criteria. The word relative is key here; we’re not going to say that one criterion is certainly and universally more important than any other. What I want is to know how important each is with respect to the others and my own experience. Remember, one size never fits all.

This is where pairwise comparison comes in. Details on how this works are given in another web page (it isn’t hard). The chart below shows just the end result. Each cell holds the criterion that I thought was the more important of the pair given by that cell’s row and column. Since it doesn’t make sense to compare something to itself, and since these comparisons are symmetric (comparing A and B is the same as comparing B and A), I only need to fill in a little less than half of the whole chart. If you’re thinking this took a long time, you’d be wrong. It took me about 30 minutes to fill in the whole thing.

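[Pairwise comparison chart not reproduced. Its rows and columns were the sixteen criteria: Fast, Easy, Start Dates, Due Dates, Repeats, Priorities, Sync, Groups, Sorting, Hotlist, Restart, Recovery, Subtasks, Checklists, Cond. Deadlines, Backlinks.]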

This leads to the following weights:

Fast	2.46%
Easy	6.56%
Start Dates	4.10%
Due Dates	11.48%
Repeats	11.48%
Priorities	4.10%
Sync	5.74%
Groups	9.84%
Sorting	9.84%
Hotlist	4.10%
Restart	3.28%
Recovery	4.10%
Subtasks	6.56%
Checklists	4.92%
Cond. Deadlines	8.20%
Backlinks	3.28%

So this tells me, for instance, that having due dates and repeating tasks are the two most important criteria. Task grouping and sorting are a close second. And so on.

The point of this process is that the human mind is not good at juggling a bunch of variables, but it is very good at comparing one thing against another. Take the trivial case of choosing between three alternatives, A, B, and C. If you prefer A to B, and B to C, then you should accept the logic that A is the most preferred item. To do otherwise just isn’t rational. That’s exactly what pairwise comparison does. And there’s good evidence that this technique actually works.
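
For the curious, here is a minimal sketch in Python of how a pairwise tally can be turned into weights. It assumes that a criterion's weight is simply its share of the "wins," which may differ in detail from the procedure described on the other page. The function names, the three criteria, and the preference order are hypothetical and are not taken from the chart in this post.

```python
from itertools import combinations


def pairwise_weights(criteria, prefer):
    """Turn pairwise preferences into percentage weights.

    prefer(a, b) must return whichever of a or b matters more.
    Assumption: a criterion's weight is its share of all the "wins".
    """
    wins = {c: 0 for c in criteria}
    pairs = list(combinations(criteria, 2))  # every unordered pair, once
    for a, b in pairs:
        wins[prefer(a, b)] += 1
    return {c: 100.0 * n / len(pairs) for c, n in wins.items()}


# Hypothetical preference order, most important first
# (made up for illustration; not the chart from this post).
order = ["Due dates", "Sync", "Fast"]


def prefer(a, b):
    """Pick whichever criterion comes first in the hypothetical order."""
    return a if order.index(a) < order.index(b) else b


weights = pairwise_weights(["Fast", "Sync", "Due dates"], prefer)
for name, pct in weights.items():
    print(f"{name:10} {pct:6.2f}%")

# Sixteen criteria need 16 * 15 / 2 comparisons.
n_pairs = len(list(combinations(range(16), 2)))
print(n_pairs)  # 120
```

With this simple tally, a criterion that never wins a comparison ends up with a weight of zero. If that seems too harsh, one option is to add one to every tally before converting to percentages.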

The next step is to choose a rating scale. This scale will be used to rate each app with respect to each criterion.
There’s a variety of scales I could use, and a great deal of research into qualitative measurement scales has been done. The scale that works best for me – and seems to be the most general – is a five-point scale from -2 to +2, where 0 means “neutral,” -2 means “horrible,” +2 means “excellent,” and -1 and +1 are in-between values. If you prefer something a little finer, you can use a 7-point scale from -3 to +3. I think it’s important to have a zero value to indicate neutrality, and I find it meaningful to have negative numbers stand for bad things and positive numbers for good things.

It’s interesting to note that in some industries (e.g. aerospace), I’ve noticed a tendency to use an exponential scale – something like (0, 1, 3, 9). This is because aerospace people tend to be extremely conservative (for reasons both technical and otherwise), so they tend to underrate the goodness of things. This scale inflates any reasonable rating to make up for that conservatism.

But I’m neither an aerospace engineer nor particularly conservative, so I’ll use the -2 to +2 scale.

Now we can do the weighted decision matrix. The gory details are given elsewhere. The weights come from the pairwise comparison above. In a decision matrix, we rate each alternative relative to some well-defined reference or base item. We need a reference because we need a fixed point against which to measure things. For this comparison, I'll use the task manager that I am actually using these days, Pocket Informant for the iPhone, as the reference.

I worked up a weighted decision matrix comparing ToodleDo to ToDo. Here it is:

		Ref (PI)		ToodleDo		ToDo
	Wgt	R	S	R	S	R	S
Fast	2.46	0	0	0	0	0	0
Easy	6.56	0	0	-1	-6.56	1	6.56
Start Dates	4.10	0	0	0	0	-2	-8.2
Due Dates	11.48	0	0	1	11.48	1	11.48
Repeats	11.48	0	0	1	11.48	1	11.48
Priorities	4.10	0	0	-1	-4.1	1	4.1
Sync	5.74	0	0	0	0	0	0
Groups	9.84	0	0	0	0	0	0
Sorting	9.84	0	0	1	9.84	0	0
Hotlist	4.10	0	0	1	4.1	1	4.1
Restart	3.28	0	0	0	0	0	0
Recovery	4.10	0	0	0	0	0	0
Subtasks	6.56	0	0	0	0	0	0
Checklists	4.92	0	0	0	0	1	4.92
Cond. Deadlines	8.20	0	0	0	0	0	0
Backlinks	3.28	0	0	0	0	0	0
Total	100.04		0		26.24		34.44

This table might not look like much, but it tells a bit of a story. The column marked Wgt is the weight of that criterion taken from the pairwise comparison. Each of the three apps gets two columns. The R column is the rating I gave it; PI is the reference, so it gets zeros in every category. That way, if another app does better than the reference, it gets a positive rating, and if it does worse than the reference, it gets a negative rating. The S column is the actual score, which is the rating multiplied by the weight for that criterion. The numbers at the bottom of the S columns are just the arithmetic sums of the individual scores.
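
The arithmetic is simple enough to check in a few lines of Python. This is only a sketch: the weights and ratings are copied from the table above, while the dictionary layout and the helper name `total_score` are just one convenient, made-up way to organize them.

```python
# Each row: criterion -> (weight in percent, ToodleDo rating, ToDo rating).
# The reference app (PI) is rated 0 on everything, so it is left out.
matrix = {
    "Fast": (2.46, 0, 0),
    "Easy": (6.56, -1, 1),
    "Start Dates": (4.10, 0, -2),
    "Due Dates": (11.48, 1, 1),
    "Repeats": (11.48, 1, 1),
    "Priorities": (4.10, -1, 1),
    "Sync": (5.74, 0, 0),
    "Groups": (9.84, 0, 0),
    "Sorting": (9.84, 1, 0),
    "Hotlist": (4.10, 1, 1),
    "Restart": (3.28, 0, 0),
    "Recovery": (4.10, 0, 0),
    "Subtasks": (6.56, 0, 0),
    "Checklists": (4.92, 0, 1),
    "Cond. Deadlines": (8.20, 0, 0),
    "Backlinks": (3.28, 0, 0),
}


def total_score(rows, column):
    """Sum weight * rating for one app (column 1 = ToodleDo, 2 = ToDo)."""
    return sum(row[0] * row[column] for row in rows.values())


toodledo_total = round(total_score(matrix, 1), 2)
todo_total = round(total_score(matrix, 2), 2)
print(toodledo_total, todo_total)  # 26.24 34.44
```

Note that the weights add up to 100.04 rather than exactly 100 because each one was rounded to two decimal places.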

If you look at the ratings for ToDo, you see that it’s a bit better than ToodleDo on some points, and a bit worse on others. But the +1's don’t actually cancel out the -1's because of the weights. Taken together, the criteria on which ToDo beat ToodleDo carry more weight for me than the ones on which it lost. That makes ToDo noticeably better than ToodleDo.
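
To see where the gap comes from, it helps to look only at the criteria on which the two apps were rated differently. The sketch below uses the weights and ratings from the decision matrix; the variable names are just for illustration.

```python
# Rows where the ratings differ: (weight, ToodleDo rating, ToDo rating),
# copied from the decision matrix.
differing = {
    "Easy": (6.56, -1, 1),
    "Start Dates": (4.10, 0, -2),
    "Priorities": (4.10, -1, 1),
    "Sorting": (9.84, 1, 0),
    "Checklists": (4.92, 0, 1),
}

# Positive gap: ToDo is ahead on that criterion; negative: ToodleDo is ahead.
gaps = {
    name: round(weight * (todo - toodledo), 2)
    for name, (weight, toodledo, todo) in differing.items()
}
net = round(sum(gaps.values()), 2)

for name, gap in gaps.items():
    print(f"{name:12} {gap:+.2f}")
print(f"{'Net':12} {net:+.2f}")
```

ToDo gains 13.12, 8.20, and 4.92 points on Easy, Priorities, and Checklists, and gives back 8.20 and 9.84 on Start Dates and Sorting. The net of 8.20 is exactly the difference between the two totals, 34.44 and 26.24.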

It's interesting to note that this version has me preferring ToDo over ToodleDo, whereas my original post had it the other way around. This is because of all the updates to both apps since I first compared them. Even though there are some things about ToodleDo that really turn my crank, ToDo is the better app, because it does better on things that I think are more important.

And that jibes nicely with my intuition. I started with ToDo, then switched to ToodleDo (just before I did my first comparison). But now, given the improvements to ToDo, it's taken the lead again. If it weren't for the decision matrix, I'd only have a "gut feeling" telling me which was better. But now, having done the comparison twice, I understand and can explain why I preferred one, then the other, then the one again.

One might ask, then, why I'm using Pocket Informant since both ToodleDo and ToDo beat PI. The answer is simple: appointments. PI integrates appointments sync'd with Google Calendar right into the app. That is an absolute must-have for me: it's just too useful for me to have my appointments and tasks all available under one roof, so to speak. If I'd've added appointments as a criterion, both ToDo and ToodleDo would have lost to PI.

Back during my first comparison, I ran into a problem with ToodleDo that - though it has been corrected since - remains noteworthy with respect to doing these kinds of comparisons.

The problem was this: ToodleDo used to generate the next in a series of repeating events only when it sync'd with the ToodleDo service. ToDo, on the other hand, handled repeating events internally.

This was a problem for me when I travelled. I had gone to Berlin for a conference. And I didn’t have a data plan for my iPhone (that’s a whole separate story), so I couldn’t sync either app. But that meant ToodleDo couldn’t roll repeating items over properly. So before I went to Berlin, I sync’d up ToDo and used it while I was gone. And when I came back I switched back to ToodleDo. I did that whenever I travelled.

Does the evaluation consider that? No it doesn’t, because I didn’t. The evaluation is only as good as the evaluator. When I evaluated the two apps, I was nestled snugly at home, WiFi at the ready – and sync’ing either ToDo or ToodleDo was a non-issue. If I’d've done the evaluation in Berlin, I’m sure I’d've gotten different numbers, because the repeating events problem would have been right there in my face, irritating the hell out of me.

So this underscores a limit with the evaluation method – indeed, a limit with any method: it’s only as good as the situation you’re in when you use it. Some people might say a method is only as good as the information you use, but it’s more than that. My situation, in this case, includes me, my goals (at the time), my experiences, all the information I have handy, constraints, and anything else that can possibly influence my decisions at the time.

The problem, then, is that a method depends on the situation when it’s used. But that situation may be different for the person doing the evaluation than for the person(s) who will have to live with the decision being made. Indeed, it’s virtually guaranteed that the situations will be different, if for no other reason than the implications of a decision will only occur later.

Does this put the kibosh on these kinds of methods?

Not at all. It just means that we must be vigilant and diligent in their application. If I did the evaluation in Berlin, ToDo would have won, because in that situation, ToodleDo would have scored poorly on repeating events. This is as it should be. That means that in the two different situations, the method worked. The problem is that in any one given situation, there’s no way to take into account any other situations.
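
Here is a small Python sketch of how much a single situation-dependent rating can move a total. It borrows the totals and the Repeats weight from the current decision matrix purely for illustration, since the numbers from the original comparison aren't reproduced in this post. The "Berlin" rating of -2 is a hypothetical value, not a rating that was actually assigned, and `rescore` is a made-up helper.

```python
def rescore(total, weight, old_rating, new_rating):
    """Adjust a weighted total after one criterion's rating changes."""
    return total + weight * (new_rating - old_rating)


REPEATS_WEIGHT = 11.48  # weight of the Repeats criterion, from the matrix
TOODLEDO_HOME = 26.24   # ToodleDo's total as rated at home
TODO_HOME = 34.44       # ToDo's total; unaffected, it repeats without syncing

# Hypothetical: with no way to sync, suppose ToodleDo's Repeats rating
# dropped from +1 to -2 (an assumed value, for illustration only).
toodledo_berlin = round(
    rescore(TOODLEDO_HOME, REPEATS_WEIGHT, old_rating=1, new_rating=-2), 2
)
print(toodledo_berlin)  # -8.2
print(round(TODO_HOME - toodledo_berlin, 2))  # 42.64
```

Under that assumption, one rating on one heavily weighted criterion is enough to push ToodleDo below the reference app. Same method, same weights, different situation, very different answer.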

Happily, there is fruitful and vigorous research concerned exactly with this. Some people call it situated cognition; others call it situated reasoning. We’ve not yet figured out how to treat situations reliably, but I think it’s only a matter of time before we do.

In the meantime, there is at least one other possible way to treat other situations. A popular technique to help set up a design problem is the use case (or what I call a usage scenario). These are either textual or visual descriptions of the interactions involved in using the thing you’ll design. They can be quite complex and detailed. Usage scenarios try to capture a specific situation other than the one that includes the designers during the design process. So it’s at least possible that usage scenarios could help designers evaluate designs and products better.

One final caveat: this evaluation is particular to me. It is unlikely that anyone will agree completely with my evaluation, because their situations are different from mine. So I’m not saying ToDo “is better” than ToodleDo. I’m just saying it seems to be better for me.

As they say: your mileage may vary.

COMMENTS

Tini, October 26, 2010 at 9:15 PM
That has to be the best reasoned analysis of HOW to compare the various to do apps available on the market that I've seen in days of research. I've tried ToDo and ToodleDo, but have recently been researching 2Do and Pocket Informant due to personal recommendations from others. I've been considering suggesting RTM to my mother, whose one technological feat is learning to use the iPhone, and who has very different requirements for a to do list. Most reviews simply discuss the reviewers' opinions and compare the features. Well done.

Fil Salustri, October 26, 2010 at 11:21 PM
Thanks for the kind words. Pocket Informant is very robust, but I prefer having an "action list" like Things by Cultured Code, or Taska, than the way PI does things. I've yet to take a serious look at 2Do, but I don't like the look and feel. Highly unscientific, but there's only so many hours in a day. RTM, similarly, doesn't support the action list approach I like so much.

There's always subjectivity in these "reviews." That's the advantage of writing about HOW to do things. So it's most gratifying to read your acknowledgement of that.

Thanks.
Fil

Flyguybc, February 7, 2014 at 3:03 PM
Wow, Filippo, awesome analysis! I have been using PI now for about 6 months and really like it. It is complicated, which I think truly does turn some people off.
I have never used either ToodleDo or ToDo but I see a lot of references to people that use either of these programs alongside PI. My simple question is, with not having used either program, what would be the advantages of using them with PI?

Thanks

The Trouble with Normal...
