Discussion:
moving mail between folders is intermittently failing
support
2010-05-01 02:36:40 UTC
We have users whose folders are between 100 MB and 800 MB in size.
Most of those users are using Outlook but some are using Thunderbird.

Lately (and seemingly suddenly) the users are encountering trouble in that
they will move one or more items from the inbox to another folder and get
inconsistent results:

* sometimes the move is honored, with the item appearing in the target
folder and disappearing from the source folder.
* sometimes the move is partly honored in that the item appears in the
target folder, but is NOT deleted from the source folder.
* and sometimes the move is not honored at all, despite the client email
program thinking for a moment that it was (before seeing moments later
that it wasn't.)

The server is FreeBSD 5.4 using imap2006e, I think. I'll upgrade to
imap2007, whatever's current in the FreeBSD ports tree, to see if it helps.

(1) I ensured that /var/mail and /usr/tmp were permission 1777 as
suggested, but only just now, so I don't know yet whether that will help.
Things used to work just fine without these changes; the only things that
may have changed are (a) the users' mail folders have gotten larger,
(b) the versions of Outlook they are using are new. Thunderbird (the
latest in general) also suffers the failed transfers, though, and only
"suddenly" now for no apparent reason.

(2) I see no reference to debugging on the website. I thought I could
perhaps turn on a flag for imap in inetd to be able to track requests to
try to see what the server was being told to do, but I see no mention of
any such facility for either imap or c-client.

==

How can I debug why moving items from one folder to another fails or is
inconsistent?

Thanks.
Mark Crispin
2010-05-01 06:51:33 UTC
Comments on your message:

[1] There is no such operation as "move". What you think is a "move" is
actually implemented by two separate operations:
    copy from source to destination
    delete from source
and optionally a third:
    expunge (permanently remove deleted messages from) source

[2] Your report of "sometimes the move is partly honored in that the item
appears in the target but is not deleted from the source" is explained as
the first operation completing successfully, but not the second operation.

[3] Your report of "sometimes the move is not honored at all" is explained
as the first operation never taking place.
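
A minimal sketch of the sequence described in [1] to [3], assuming a
connection object with the same uid() and expunge() methods as Python's
imaplib.IMAP4 and a source mailbox that is already selected. The function
name and its return values are made up for illustration. (A single MOVE
command was only standardized later, in RFC 6851.)

```python
def move_message(conn, uid, destination):
    """Emulate a "move" as COPY, then STORE the Deleted flag, then EXPUNGE.

    `conn` is any object with imaplib.IMAP4-style uid() and expunge()
    methods. Returns the name of the last step that completed.
    """
    # Step 1: copy from source to destination.
    status, _ = conn.uid("COPY", uid, destination)
    if status != "OK":
        return "nothing"      # the "not honored at all" case
    # Step 2: mark the source message as deleted.
    status, _ = conn.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
    if status != "OK":
        return "copied"       # the "partly honored" case
    # Step 3 (optional): permanently remove messages flagged as deleted.
    # Note that EXPUNGE removes every flagged message, not only this one.
    status, _ = conn.expunge()
    return "expunged" if status == "OK" else "deleted"
```

If the client drops the connection between step 1 and step 2, the copy
stays in the destination and the original stays in the source, which
matches the "partly honored" symptom.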

[4] UW IMAP uses /tmp, not /usr/tmp.

[5] It sounds like you are using traditional UNIX mailbox format, a.k.a.
"mbox format"; a flat-file containing all the messages, one after another,
with each message preceded with a line starting with "From ". If this is
the case: traditional UNIX mailbox format was designed in a time when a
"very large mailbox" was between 100 KB and 800 KB in size. It does not
scale well to sizes of 100 MB to 800 MB.
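
To illustrate why the flat file scales poorly, here is a simplified sketch
of the format as described above. It splits only on lines starting with
"From " and ignores the finer rules real mbox parsers apply (such as
escaped "From " lines inside bodies). Both function names are
hypothetical.

```python
def split_mbox(text):
    """Split flat mbox text into messages at lines starting with "From "."""
    messages, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith("From ") and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages


def expunge_from_mbox(text, index):
    """Remove message `index`; return (new_text, bytes_rewritten)."""
    messages = split_mbox(text)
    kept = messages[:index] + messages[index + 1:]
    # Everything after the removed message changes position in the file,
    # so all of it has to be written out again.
    rewritten = sum(len(m.encode()) for m in messages[index + 1:])
    return "".join(kept), rewritten
```

Deleting a message near the start of an 800 MB file therefore means
rewriting close to 800 MB, and just finding a message means scanning the
file from the top.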

In particular, doing things with mailboxes in the hundreds of MB in that
format takes a while. The authors of Outlook and Thunderbird are victims
of a computer science course mindset which, starting in the 1980s, taught
their pupils that all protocols are (or should be) stateless. Thus, they
believe that IMAP is like HTTP; that when a server fails to respond
immediately, that means that the correct remedial action is to disconnect
and try again, or just disconnect and assume that everything happened
anyway.

These developers simply can not grasp the concept of a stateful protocol
like IMAP (HTTP is stateless) in which certain operations (such as
manipulations of an 800MB flat file) might take a while; and that being a
stateful protocol IMAP is guaranteed to respond if you wait long enough.
Unfortunately, years of repeated attempts to educate these developers
proved to be utterly futile. When you talk about "state", you get vacant
stares.

Thus, there is no hope of a working version of crapware such as Outlook or
Thunderbird. The only thing that you can do is see what you can do on the
server to not unduly stress the limitations of the crapware.

One way to accomplish this is to switch mailbox formats to mix format.
I can't imagine how anyone can tolerate traditional UNIX mailbox format
with a 100 MB mailbox, much less larger. mix format was designed to
accommodate such large mailboxes, and does in a second or two
what can take minutes with traditional UNIX mailbox format.

[6] What you describe is typical of scaling problems. It seems to "work"
until you reach a critical point. Very likely, the mailboxes got big
enough that operations now tend to take long enough for Thunderbird to
decide to give up.

[7] Both Thunderbird and Outlook have a way to record protocol
negotiations. The c-client library also has a mechanism to record
protocol negotiations, at the client (not server) level.

You want to do client-based telemetry, not server-based, due to the size
of the protocol logs. IMAP protocol logs are HUGE, and on anything beyond
a small test system you will quickly be buried under the weight of the
logs. On a small test system, you can easily set up a script for server
logs using "tee", at the cost of giving up SSL capability.
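
The "tee" idea can be sketched in a few lines: copy the byte stream to
its real destination and write the same bytes to a log. This shows only
the copying step, for one direction of traffic; how it would be wired
between inetd and imapd is not covered here, and the function name is
hypothetical. It sees plaintext only, which is why SSL has to be given
up.

```python
def tee_copy(source, sink, log):
    """Copy lines from `source` to `sink`, writing a copy of each to `log`.

    All three are binary file-like objects. Returns the byte count.
    """
    total = 0
    while True:
        line = source.readline()   # IMAP is largely line-oriented
        if not line:               # end of stream
            break
        sink.write(line)
        log.write(line)            # the "tee": same bytes, second destination
        total += len(line)
    return total
```

Every command and every response passes through the log, so the log
grows as fast as the mail traffic itself.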

You quickly find out that the server logs are much less useful than you
thought. I went through this exercise on my new IMAP server.
-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
UCTC Sysadmin
2010-05-05 21:33:08 UTC
Mark Crispin
2010-05-05 22:09:20 UTC
Post by UCTC Sysadmin
I don't know what other formats are available, moreover the "mbox" format
I thought was actually the canonical format of a file containing multiple
email messages and is used as well by POP.
It depends upon which POP server you use. The POP server in the IMAP
package (UW or Panda) uses the same internal library as the IMAP server,
and thus supports all the alternative formats.

There's a file called formats.txt in the documentation that talks a little
bit about it.
Post by UCTC Sysadmin
After reading about the Cyrus IMAP server I decided not to allow "it" to
determine to me what I could and couldn't do without it; uw-imap "plays
with others".
UW and Panda are probably the only servers that do this strictly, although
Dovecot comes a lot closer than most. Just about every IMAP server wants
to use its own preferred format which is never mbox format.

Not even UW. The default format is mbox, but that never was preferred.
Post by UCTC Sysadmin
If they can understand the difference between TCP and UDP, they can
understand state.
I wouldn't be so certain. If you look at a lot of applications these
days, they blat out a query, and read an answer. If they don't get an
answer in what they consider to be a suitable time, they drop the
connection and try again.

Put another way, they treat TCP just like UDP, only with this strange and
annoying "connection" stuff that they don't understand why it's there.

Ironically, the original design for IMAP was for it to be UDP based. I
was talked into making it be stateful and TCP based.
Post by UCTC Sysadmin
Well, with Thunderbird at least the avenue is (still?) open to muck with
the source. Perhaps when I've nothing else to do, try to rewrite their
IMAP support to behave properly and send them the code, or (ha ha) offer
my IMAP client as a "plugin" for Thunderbird (ha ha.)
I would, if I were paid to do so. The problem is, it doesn't seem as if
there is any market for email clients (and hence funding) these days.
Post by UCTC Sysadmin
I'm averse to storing user mail in a noncompatible format (meaning the POP
server can't play) but may have to, for these users at least (I'll put them
in their own sandbox.) If POP supports the mix format, then that objection
vanishes.
Which POP server? If it's qpopper, then it probably does not support mix.
If it's ipop3d (bundled with imapd), then it does.
Post by UCTC Sysadmin
Argh, obviously the question becomes how does one convert existing mail
folders to mix format ... or is there a magic cookie such that the "mix
handler" knows to bow out, and or is there any automatic (blind) way
that upon first use the program can do a conversion in the background?
All I'd need is a filter to convert "mbox" to "mix", and stop the users
getting in until that is done.
There's a program called mixcvt that will do that.

-- Mark --

Oswald Buddenhagen
2010-05-06 07:22:19 UTC
Post by Mark Crispin
If they can understand the difference between TCP and UDP, they can
understand state.
I wouldn't be so certain. If you look at a lot of applications these
days, they blat out a query, and read an answer. If they don't get an
answer in what they consider to be a suitable time, they drop the
connection and try again.
Put another way, they treat TCP just like UDP, only with this strange and
annoying "connection" stuff that they don't understand why it's there.
you may rail about the stupidity of those developers as much as you
want. apparently unlike you, they live in the real world where the only
guarantee which tcp provides is that the data stream is intact - *if* it
arrives. timeouts as a workaround for shortcomings of the tcp/ip stack
under real-world conditions are a perfectly reasonable approach, just
like keep-alive signals in higher-level stateful protocols to prevent
timeouts. if you want to do better, you can use imap directly over ATM
or ISDN or other really stateful protocols. but even these sometimes
just go deaf, because they run with crappy real-world system software,
too.
Dan White
2010-05-06 08:19:24 UTC
Post by Oswald Buddenhagen
Post by Mark Crispin
If they can understand the difference between TCP and UDP, they can
understand state.
I wouldn't be so certain. If you look at a lot of applications these
days, they blat out a query, and read an answer. If they don't get an
answer in what they consider to be a suitable time, they drop the
connection and try again.
Put another way, they treat TCP just like UDP, only with this strange and
annoying "connection" stuff that they don't understand why it's there.
you may rail about the stupidity of those developers as much as you
want. apparently unlike you, they live in the real world where the only
guarantee which tcp provides is that the data stream is intact - *if* it
arrives. timeouts as a workaround for shortcomings of the tcp/ip stack
under real-world conditions are a perfectly reasonable approach, just
like keep-alive signals in higher-level stateful protocols to prevent
timeouts. if you want to do better, you can use imap directly over ATM
or ISDN or other really stateful protocols. but even these sometimes
just go deaf, because they run with crappy real-world system software,
too.
And what do you do if the server takes longer than you think it should to
respond to a query? Do you assume that it's a networking issue or a slow
server?

What if, while the server is chugging away at one query, the user of your
application clicks on another message while your application is waiting for
a response from the server? Do you send another query? hang the interface
until the first completes? Kill the imap connection and forget the first
task? Cache and retry?

Treating an IMAP *session* like a stateless http request is doomed to
repeat history. RFC 2683 covers some of this ground.
--
Dan White
Timo Sirainen
2010-05-06 11:20:16 UTC
Post by Dan White
And what do you do if the server takes longer than you think it should to
respond to a query? Do you assume that it's a networking issue or a slow
server?
Instead of fighting clients with this, I solved it by having server send

* OK Hang in there..

about every 15 seconds during long running commands. Clients seem to be
happy with it and not disconnect.
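
A rough sketch of that approach, not the actual server code: run the slow
command in a worker thread and emit an untagged OK line each time an
interval passes without it finishing. The function name and parameters
are hypothetical; `send_line` stands in for whatever writes a line to
the client.

```python
import threading


def run_with_keepalive(work, send_line, interval=15.0,
                       message="* OK Hang in there.."):
    """Run work() in a thread; call send_line(message) every `interval`
    seconds until it finishes. Returns work()'s result (None if work()
    raised an exception)."""
    result = {}

    def target():
        result["value"] = work()

    worker = threading.Thread(target=target)
    worker.start()
    while True:
        worker.join(timeout=interval)   # wait up to one interval
        if not worker.is_alive():
            break
        send_line(message)              # still busy: reassure the client
    return result.get("value")
```

Commands that finish within the first interval never produce a keepalive
line, so most sessions see no extra traffic.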
Mark Crispin
2010-05-06 18:19:34 UTC
Post by Timo Sirainen
Instead of fighting clients with this, I solved it by having server send
* OK Hang in there..
about every 15 seconds during long running commands. Clients seem to be
happy with it and not disconnect.
Unfortunately, the mobile device guys and gals then get unhappy with you
for chewing up their battery. It's not as bad as making them transmit,
but it still causes a wakeup. They aren't placated when you tell them
that it's to work around stupid clients. They get unhappy even with an
untagged OK every two minutes during IDLE to quell a NAT timeout.

As well they should be.

The problem with going down the path of kludges and workarounds for broken
entities is that it's a never-ending process. You are faced both with
something else that is even more broken, and some innocent party that was
fine until you broke it with your workaround.

-- Mark --

Timo Sirainen
2010-05-06 18:28:05 UTC
Post by Mark Crispin
Post by Timo Sirainen
Instead of fighting clients with this, I solved it by having server send
* OK Hang in there..
about every 15 seconds during long running commands. Clients seem to be
happy with it and not disconnect.
Unfortunately, the mobile device guys and gals then get unhappy with you
for chewing up their battery. It's not as bad as making them transmit,
but it still causes a wakeup. They aren't placated when you tell them
that it's to work around stupid clients. They get unhappy even with an
untagged OK every two minutes during IDLE to quell a NAT timeout.
I think that's a different issue. They're unhappy when an idling device
gets woken up (constantly). The "Hang in there" messages are sent only
when client has requested some command that takes >15 seconds. Most of
the users/clients never see those messages at all.
Mark Crispin
2010-05-06 18:46:20 UTC
Post by Timo Sirainen
I think that's a different issue. They're unhappy when an idling device
gets woken up (constantly). The "Hang in there" messages are sent only
when client has requested some command that takes >15 seconds. Most of
the users/clients never see those messages at all.
You're right. It isn't quite the same, and mobile devices are less likely
to run afoul of server head-pats every 15 seconds during long-running
commands.

But there still are issues, even if the device is already quite awake.
On mobile devices where you pay per packet (rather than per KB or MB),
head-pats add packets on top of IMAP's already excessive chattiness.

-- Mark --

Oswald Buddenhagen
2010-05-06 19:46:46 UTC
Post by Mark Crispin
Post by Timo Sirainen
I think that's a different issue. They're unhappy when an idling device
gets woken up (constantly). The "Hang in there" messages are sent only
when client has requested some command that takes >15 seconds. Most of
the users/clients never see those messages at all.
You're right. It isn't quite the same, and mobile devices are less likely
to run afoul of server head-pats every 15 seconds during long-running
commands.
15 seconds are possibly too aggressive anyway. hardly any timeout is set
below a minute, two minutes being typical for many things. dunno about
TB's settings in particular.
the "clean workaround" would include an imap extension which would let
the client decide how often it wants to see keepalives.
a mobile provider may also provide an imap proxy for their clients, so
the over-the-air connection could be optimized for somewhat more defined
QoS characteristics. though that may be in direct conflict with the
provider's interests.
Post by Mark Crispin
On mobile devices where you pay per packet (rather than per KB or MB),
that depends on the provider. mobile data flatrates are also becoming
common nowadays.
Mark Crispin
2010-05-06 20:24:00 UTC
Post by Dan White
And what do you do if the server takes longer than you think it should to
respond to a query? Do you assume that it's a networking issue or a slow
server?
Crapware assumes that it is a network issue that somehow is utterly
irrecoverable in TCP, yet magically goes away if you tear down the TCP
connection and establish a new one.

Guess what happens when thousands of pieces of crapware are all doing the
same thing at the same time. It gets ever more entertaining as additional
crapware (with longer timeouts but not long enough) join in the
festivities. Typically, the original event that triggered the problem has
long since resolved itself.

It's very much like a minor traffic slowdown that escalates into a traffic
jam that escalates into a mass freeway collision. And no matter how many
times you attempt to educate people about maintaining a safe following
distance and safe lane changes, they still do the same bad things.
Post by Dan White
Treating an IMAP *session* like a stateless http request is doomed to
repeat history. RFC 2683 covers some of this ground.
Yup. The stateless religion was absolute dogma in CS classes starting in
the late 1970s/early 1980s. As a result, many of the young'uns simply
have no clue how to think otherwise.

Yet, over and over again, the same young'uns end up beating their heads
against a wall in an attempt to re-implement state. When they talk about
the problem they are trying to solve, they confuse it with "cache". When
a young'un talks about "keeping the cache synchronized", I listen
carefully. More times than not, s/he's trying to keep state but doesn't
know how to do it.

The whole basis of the stateless dogma 30 years ago was the belief that it
is less inefficient to jump through hoops to attain state in a stateless
world than to maintain a stateful world if you didn't care about state.
I first became aware of it with PARC's Woodstock File System which was the
ideological predecessor of NFS. The WFS paper became the manifesto of the
stateless ideology; yet only a few read it and even fewer noticed that it
rather coyly defined out of the problem space all the cases in which
statelessness did less well.

Basically, statelessness requires accepting the following on faith:
. State is unimportant.
. State is expensive to acquire and maintain.
. If state is important, it can be easily and cheaply acquired on
top of a stateless infrastructure.
. If state can not be easily and cheaply acquired in a stateless
infrastructure, you can get the equivalent effect easily and
cheaply through synchronization.
. Synchronization is magically atomic, so you don't have to worry
about the fact that it is stateless.
. If synchronization is not atomic, and things change in the middle
of the synchronization, it is alright since you'll notice the
issue the next time you synchronize.

There is more to the faith of statelessness, but this ought to be enough
to see the overall picture.

The bottom line is that IMAP is a stateful, not a stateless, protocol; and
that "treating an IMAP session like a stateless HTTP request" is doomed to
failure.

-- Mark --

Oswald Buddenhagen
2010-05-06 21:46:31 UTC
Post by Mark Crispin
Crapware assumes that it is a network issue that somehow is utterly
irrecoverable in TCP, yet magically goes away if you tear down the TCP
connection and establish a new one.
hmmm ... why might they do such an obviously nonsensical thing ... wait,
maybe because it's how reality actually works? ever heard about
connection-tracking packet filters and routers? ill-tempered transparent
proxies? dial-up disconnects?
Post by Mark Crispin
Guess what happens when thousands of pieces of crapware are all doing
the same thing at the same time.
funny, how establishing some reasonable common guidelines for handling
loss of state in the standard could alleviate these problems to a
significant degree. unfortunately, the creator doesn't even acknowledge
the problem. oh, well. tough luck, i guess.
Dan White
2010-05-06 22:34:35 UTC
Post by Oswald Buddenhagen
funny, how establishing some reasonable common guidelines for handling
loss of state in the standard could alleviate these problems to a
significant degree. unfortunately, the creator doesn't even acknowledge
the problem. oh, well. tough luck, i guess.
loss of tcp connection != loss of state.

In my own review of several IMAP RFCs, it's clear that connection problems
have been anticipated and several options have been standardized, such as
UIDVALIDITY and CONDSTORE (RFC 4551), among others, which allow a
client to quickly resynchronize its state with the server in the face of
networking issues.
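
As a sketch of what such a resynchronization decision might look like on
the client side, assuming the client cached the mailbox's UIDVALIDITY and
HIGHESTMODSEQ values before the connection dropped. The dictionary keys
and the returned action names are hypothetical.

```python
def plan_resync(cached, server):
    """Decide how a reconnecting client refreshes its cache.

    `cached` (or None) and `server` are dicts with "uidvalidity" and
    "highestmodseq" keys. Returns (action, modseq_or_None).
    """
    if cached is None or cached["uidvalidity"] != server["uidvalidity"]:
        # UIDs no longer refer to the same messages: start over.
        return ("discard_cache", None)
    if server["highestmodseq"] > cached["highestmodseq"]:
        # Ask only for what changed, for example:
        #   UID FETCH 1:* (FLAGS) (CHANGEDSINCE <modseq>)
        return ("fetch_changed_since", cached["highestmodseq"])
    return ("up_to_date", None)
```

CONDSTORE on its own reports changed messages, not expunged ones, so a
client following this plan still has to detect removals separately.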
--
Dan White
Oswald Buddenhagen
2010-05-06 23:14:46 UTC
Post by Dan White
loss of tcp connection != loss of state.
loss of the state we are talking about here. it's all based on mark's
postulation that a tcp connection is reliable.
Post by Dan White
In my own review of several IMAP RFCs, it's clear that connection problems
have been anticipated and several options have been standardized,
i suggest you verify the chronology of events. and compare the names on
particular rfcs.
Mark Crispin
2010-05-06 23:47:57 UTC
Post by Oswald Buddenhagen
loss of the state we are talking about here. it's all based on mark's
postulation that a tcp connection is reliable.
TCP connections are reliable. Run, don't walk, to your nearest technical
bookstore and read about network layering.

Crappy software implementations may be unreliable.
Post by Oswald Buddenhagen
Post by Dan White
In my own review of several IMAP RFCs, it's clear that connection problems
have been anticipated and several options have been standardized,
i suggest you verify the chronology of events. and compare the names on
particular rfcs.
Ah, sophomores who read a little and think that they understand all.

If you actually read through the history of IMAP RFCs, you would have read
RFC 1733 and learned the purpose of IMAP synchronization. You would have
also noticed when UIDs were introduced, and by whom. Next, you would have
learned the purpose of CONDSTORE.

But it's so much easier to jump to conclusions.

Before you mention QRESYNC, you should first see if anyone actually uses
it. The mobile device world barely stifled a yawn over the entire
LEMONADE effort, and no major commercial implementations are doing
anything about it. It turned out, as predicted years earlier, to be a
solution in search of a problem.

-- Mark --

Brian Hayden
2010-05-07 00:36:49 UTC
Post by Mark Crispin
Post by Oswald Buddenhagen
loss of the state we are talking about here. it's all based on mark's
postulation that a tcp connection is reliable.
TCP connections are reliable. Run, don't walk, to your nearest technical
bookstore and read about network layering.
TCP connections are more reliable than UDP; that does not mean they are
"reliable", full stop.

I agree that most ugly, stupid software too quickly resorts to dumping a
connection. But it sounds like you're just arguing the other, equally
wrong, extreme. Software that doesn't take into account that TCP
connections often do either fail completely or stall for so long as to
constitute a failure to, you know, a normal person who's trying to get
something done interactively, is just wrong. Patience is usually but not
always a virtue.

-Brian
Mark Crispin
2010-05-07 02:17:19 UTC
Post by Brian Hayden
TCP connections are more reliable than UDP; that does not mean they are
"reliable", full stop.
You are using the wrong definition of reliable.

Reliable does not mean "does not fail".

UDP has no provision for reliability, ordering or data integrity. TCP
does all of these. Regardless of what happens in the link layer, TCP
delivers a sequential octet stream without duplication or missing data.
The only thing that terminates that stream is a session disconnect.
There is no such thing as "failure" in TCP.

What you think of as being "failure" are all application layer concepts:

[1] The application received a session disconnect (FIN) from TCP. This is
completely an application concept; TCP considers this to be a completely
normal shutdown of the session.

[2] The application received a session reset (RST) from TCP. This
indicates that the application attempted to communicate with a TCP peer
that does not exist. This is what most people (mistakenly) call a "TCP
failure".

[3] The application unilaterally decides that a failure has occurred.

Now, [1] and [2] generally indicate the demise of the peer, with [1] being
the normal and expected result of a mutually-agreed upon demise. [2] is
not supposed to happen with debugged implementations, except when a
link-level disconnect outlasts a FIN-wait.
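
From the application's side, the three cases can be told apart roughly as
follows. This is a sketch using Python's standard socket module; the
function name and the returned labels are made up for illustration.

```python
import socket


def classify_read(sock, timeout):
    """Read once from `sock` and report what the application observed."""
    sock.settimeout(timeout)
    try:
        data = sock.recv(4096)
    except socket.timeout:
        # Nothing arrived in time. TCP reported no error; treating this
        # as a "failure" is the application's own decision (case [3]).
        return "no data yet"
    except ConnectionResetError:
        return "reset (RST)"            # case [2]
    if data == b"":
        return "orderly close (FIN)"    # case [1]
    return "data"
```

Only the first branch involves a judgment call by the application; the
other outcomes are reported by the TCP stack itself.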
Post by Brian Hayden
TCP
connections often do either fail completely or stall for so long as to
constitute a failure
Now we come into myth vs. reality.

Many people have observed that web browsers seem to fail or stall, and
that hitting the refresh button seems to fix it. Because the web browser
uses HTTP over TCP, they falsely conclude that this is an attribute of TCP
and thus the equivalent of the refresh button is appropriate for other
protocols.

This conclusion is completely and utterly false; and is where stateful vs.
stateless comes in.

A web browser typically has multiple HTTP sessions in progress, each one
of which is charged with resolving a different URI as these are
encountered in the page. The whole idea is not to serialize the rendering
of the web page; the fate of a JPEG being loaded is independent of any
other piece.

A web server, in turn, is obliged to turn around requests as quickly as
possible. It poots data to the session and expects steady progress;
otherwise it abandons the session. The browser is somewhat more patient
but it too abandons the session if steady progress is not forthcoming.

Now comes the important part: The server routinely abandons sessions, or
refuses to initiate them, for load based reasons.

This is alright, because HTTP is stateless and drops are expected in HTTP.
There is no reason to believe that immediate retry will not succeed.

IMAP, on the other hand, is a stateful protocol. A server does not drop
IMAP sessions except under specification defined conditions:
[a] The server received a TCP-level FIN or RST from the client.
[b] Negotiated session disconnect (LOGOUT command).
[c] Server crash.
[d] 30 minute client inactivity timeout.

If your IMAP connection "stalls", there is no reason to believe that
disconnecting, creating a new connection, and retrying the operation will
in any way help the situation.

At best, it is a waste of effort; you destroyed your session state that
now has to be rebuilt to get you back to where you were before.

More typically, it is futile; the underlying problem impacts your new
session just as your old session was impacted. As in the best case, you
wasted effort; and now are worse off because you gave up all the data in
your state which you could have used.

In the worst case, it is harmful; not only is it futile, but it has also
made a bad situation (e.g., server overload) worse.
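
The alternative to reconnecting is simply to keep reading until the
tagged response for the outstanding command arrives. A minimal sketch,
assuming a binary file-like `stream` wrapped around the connection; it
ignores literals and continuation requests, and the function name is
hypothetical.

```python
def read_until_tagged(stream, tag):
    """Read IMAP response lines until the one that starts with `tag`.

    Returns (status, untagged_lines). Raises EOFError if the stream ends
    first, which is the session-disconnect case.
    """
    untagged = []
    prefix = tag.encode() + b" "
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("connection closed before tagged response")
        line = line.rstrip(b"\r\n")
        if line.startswith(prefix):
            # e.g. b"A003 OK COPY completed" -> status "OK"
            status = line[len(prefix):].split(b" ", 1)[0].decode()
            return status, untagged
        # Untagged "* ..." lines are progress, not failure.
        untagged.append(line.decode("utf-8", "replace"))
```

The session state built up so far (selected mailbox, cached UIDs and
flags) stays usable once the tagged response arrives.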

"But, but," you protest, "what if some router went out and came back up?"

TCP recovers from that; or would recover if you let it. Please read up on
the subject of TCP retransmissions and their algorithms including
backoffs. These old guys who came up with TCP 30+ years ago knew what
they were doing.

Now, you may feel that the standard for TCP retransmission algorithms may
need adjusting to reflect modern-day networking. You may be surprised to
learn that I agree; and that the backoffs to avoid swamping the 56 kbps links
on ARPAnet need updating for modern Ethernet and wireless link layers.

So hop to it. Get involved with the standards development process. Do
the experiments to work out what are suitable retransmission and backoff
algorithms in the modern world. RFC 2988 is nearly 10 years old; and in
particular sections 2.4 and 2.5 are probably completely obsolete and
should be changed to something quite different.
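
For reference, the timer computation in RFC 2988 can be sketched as
below. The constants follow the RFC (alpha 1/8, beta 1/4, K = 4, initial
RTO of 3 seconds, 1 second floor from rule 2.4, optional cap of at least
60 seconds from rule 2.5). The class name and the 0.1 second clock
granularity are assumptions made for this example.

```python
class RtoEstimator:
    """Retransmission timeout per RFC 2988 (all times in seconds)."""
    ALPHA, BETA, K = 1 / 8, 1 / 4, 4

    def __init__(self, granularity=0.1, min_rto=1.0, max_rto=60.0):
        self.g = granularity
        self.min_rto = min_rto    # rule (2.4): round RTO up to 1 second
        self.max_rto = max_rto    # rule (2.5): optional cap, at least 60 s
        self.srtt = None
        self.rttvar = None
        self.rto = 3.0            # rule (2.1): RTO before any measurement

    def _clamp(self, value):
        return min(self.max_rto, max(self.min_rto, value))

    def sample(self, r):
        """Fold in one round-trip time measurement `r`."""
        if self.srtt is None:                      # rule (2.2)
            self.srtt, self.rttvar = r, r / 2
        else:                                      # rule (2.3)
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - r))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = self._clamp(self.srtt + max(self.g, self.K * self.rttvar))
        return self.rto

    def backoff(self):
        """Retransmission timer expired: double the RTO (rule 5.5)."""
        self.rto = self._clamp(self.rto * 2)
        return self.rto
```

With a 100 ms round trip the computed value is well under a second, so
the 1 second floor decides the timeout, which is the sort of behavior
the two rules named above impose on a fast modern link.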

Don't duplicate TCP's functionality in the application layer (in a FAR
less efficient and effective manner).

Simple solutions to complex problems backfire.

-- Mark --

Brian Hayden
2010-05-07 02:43:10 UTC
Post by Mark Crispin
Reliable does not mean "does not fail".
Coincidentally, nobody said it did. Interesting!

This is another fun historical dissertation, at whose core is: "change the
RFCs, and until then maintain some righteous anger."
Post by Mark Crispin
[1] The application received a session disconnect (FIN) from TCP. This is
completely an application concept; TCP considers this to be a completely
normal shutdown of the session.
[2] The application received a session reset (RST) from TCP. This
indicates that the application attempted to communicate with a TCP peer
that does not exist. This is what most people (mistakenly) call a "TCP
failure".
[3] The application unilaterally decides that a failure has occurred.
Now, [1] and [2] generally indicate the demise of the peer, with [1] being
the normal and expected result of a mutually-agreed upon demise. [2] is
not supposed to happen with debugged implementations, except when a
link-level disconnect outlasts a FIN-wait.
That is quite a sleight of hand there. It makes your pats on the heads of
the "young'ns" look even sillier. You've oversimplified [2] to the point
where it edges from "oversimplified" to "misleading."

-Brian
Mark Crispin
2010-05-07 02:59:04 UTC
Post by Brian Hayden
Post by Mark Crispin
Reliable does not mean "does not fail".
Coincidentally, nobody said it did. Interesting!
If that was not your meaning in claiming that "TCP is not reliable", then
you don't know what you are talking about.

TCP is most certainly reliable.
Post by Brian Hayden
This is another fun historical dissertation, at whose core is: "change the
RFCs, and until then maintain some righteous anger."
I go to the trouble to teach you how things actually work, and you respond
with a typical nihilistic Gen-X retort.

The righteous thing is to follow the specifications; and if you think that
the specifications are incorrect then work to get them changed. You're
the one who seems to be angrily insisting that the specifications
shouldn't be followed.

And then to make stupid statements such as "TCP is not reliable".
Post by Brian Hayden
That is quite a sleight of hand there. It makes your pats on the heads of
the "young'ns" look even sillier. You've oversimplified [2] to the point
where it edges from "oversimplified" to "misleading."
Pshaw. So you want to bring up Linux's "half-duplex" close behavior, eh?

That's irrelevant to RST in IMAP sessions.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
Brian Hayden
2010-05-07 03:06:40 UTC
Permalink
Post by Mark Crispin
I go to the trouble to teach you how things actually work, and you respond
with a typical nihilistic Gen-X retort.
Was that what you said, or was it not? It quite clearly was, so the retort
was typically nothing (particularly indicative of a generation to which I
do not belong).
Post by Mark Crispin
The righteous thing is to follow the specifications; and if you think that
the specifications are incorrect then work to get them changed. You're
the one who seems to be angrily insisting that the specifications
shouldn't be followed.
Oh? I'd be quite curious for you to teach me where it was that I said such.
Post by Mark Crispin
And then to make stupid statements such as "TCP is not reliable".
Or stupid statements such as, "You're talking about failures at the
application layer, but you shouldn't try to address it in the application"?
Post by Mark Crispin
Pshaw. So you want to bring up Linux's "half-duplex" close behavior, eh?
You may project anything you wish...
Post by Mark Crispin
That's irrelevant to RST in IMAP sessions.
... so that you may be comfortable in refuting it.

-Brian
Paul Vixie
2010-05-07 12:59:02 UTC
Permalink
Date: Thu, 6 May 2010 19:59:04 -0700 (PDT)
...
I go to the trouble to teach you how things actually work, and you respond
with a typical nihilistic Gen-X retort.
The righteous thing is to follow the specifications; and if you think that
the specifications are incorrect then work to get them changed. You're
the one who seems to be angrily insisting that the specifications
shouldn't be followed.
And then to make stupid statements such as "TCP is not reliable".
...
i am a boomer not a genx'er, but i believe that i have earned the respect of
many genx'ers. let me speak intergenerationally when i say that tcp as i
experience it in the field is unreliable. tcp in my house and isc's office
works fine. and i regularly sleep my laptop for several days and then wake
it up with no loss of tcp state (that is, i don't even need "screen" at home
unless i'm switching from laptop to desktop or similar.)

to reiterate, tcp as i experience it in the field is unreliable. in the field
i am using other peoples' networks. i cope with this unreliability in the
usual way, i restart my clients (including KDE Kontact/KMail) quite often.
Paul Vixie
2010-05-07 12:47:34 UTC
Permalink
Date: Thu, 6 May 2010 19:17:19 -0700 (PDT)
...
If your IMAP connection "stalls", there is no reason to believe that
disconnecting, creating a new connection, and retrying the operation will
in any way help the situation.
...
i was with you right up until that point. i regularly use other people's
access networks (hotel or airport wireless for example) and they regularly
flow-limit and rate-limit me. fairly often my only recourse, with ssh or
imap tcp sessions, is to abandon one without ceremony and start another.

in ssh i manage this by using the "screen" utility so that my shells and
editors stay running while i'm in between active connections. i've had
"screen" state last almost a year, several times. often only a system
upgrade/reboot will cost me my true "session state" for ssh.

if imap had something akin to "screen", i would be most pleased and i would
use it. since it doesn't and since i am not in control of the proxies and
NATs i traverse in my travels, i find imap's heavyweight statefulness to
be out of touch or perhaps even anachronistic.

not imap's fault, not uw-imapd's fault... but in this case the first mover
in "workarounds that hurt rule obeyers" was the invisible hand of the market.
Mark Crispin
2010-05-06 23:26:35 UTC
Permalink
Post by Dan White
loss of tcp connection != loss of state.
Correct. There's a further equation:

loss of network connectivity != loss of TCP session != loss of state

There is no reason to lose a TCP session because of a short-term loss of
network connectivity.

Although it is a myth that TCP/IP was designed to survive a nuclear war
(although I heard it c.1983 at the Internet engineering meeting in
Oberpfaffenhofen), an important design goal of TCP/IP is to be robust at
the Internet and Transport layers in the face of link layer outages.

Specifically, a link/network layer outage is NOT supposed to trigger a
disconnect at the internet/transport layers (TCP/IP), much less the data
layers (session, presentation, application).

I frequently amaze the young'uns with demonstrations of sessions that live
past a physical disconnect of the network. I'll even hibernate one of the
boxes and remove its battery. They act as if some magic trick has been
performed, rather than seeing TCP/IP working the way that it was designed
to work with software that follows the specifications.
Post by Dan White
In my own review of several IMAP RFCs, it's clear that connection problems
have been anticipated and several options have been standardized, such as
with uidvalidity and condstore (rfc4551) (among others), which allow a
client to quickly resynchronize its state with the server in the face of
networking issues.
Yes. But it's also important not to lose sight of the desirability of not
losing state to begin with.

The IMAP state resynchronization facilities are best seen as a means to
reacquire state after an intentional disconnect: the user exited his email
client, shut down his laptop, etc.

As error recovery, they are properly the last resort rather than the first
action to be taken. Above all else, state should not be glibly tossed out
in the assumption that error recovery will resynchronize.
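
As a rough illustration of the resynchronization Dan White mentions, a client that cached UIDVALIDITY and (with CONDSTORE, RFC 4551) HIGHESTMODSEQ might decide what to do on reconnect roughly as follows. This is a sketch only: the dictionary layout and the function name `resync_plan` are assumptions made for the example, not anything taken from c-client or from a particular mail client.

```python
def resync_plan(cached, server):
    """Decide how to bring a cached mailbox view up to date after reconnecting.

    `cached` and `server` are dicts with "uidvalidity" and, where CONDSTORE
    is available, "highestmodseq" (a hypothetical layout for this sketch).
    """
    if cached is None or cached["uidvalidity"] != server["uidvalidity"]:
        # UIDs no longer refer to the same messages: the cache is useless.
        return "discard-cache"
    if cached.get("highestmodseq") is None or server.get("highestmodseq") is None:
        # No CONDSTORE data on one side: fall back to refetching all flags.
        return "refetch-all-flags"
    if server["highestmodseq"] > cached["highestmodseq"]:
        # Ask only for what changed, e.g.
        #   UID FETCH 1:* (FLAGS) (CHANGEDSINCE <cached highestmodseq>)
        return "fetch-changedsince"
    return "in-sync"
```

As far as I know, CONDSTORE on its own does not report expunged messages, so a real client would still have to reconcile its UID list separately.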

There's too much of the thinking of "my computer is giving me a problem,
so I'll reboot it. I don't want to wait for a shutdown, so I'll just pull
out the power plug."

It may be that rebooting is necessary when you have a problem. It may
even be that pulling out the power plug is necessary. But those are last
resorts. They are not routine measures, much less the very first thing
that you try!

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
Oswald Buddenhagen
2010-05-06 23:32:17 UTC
Permalink
ever heard about connection-tracking packet filters and routers?
ill-tempered transparent proxies?
Tell us all about the connection-tracking packet filters and routers
that existed in 1986.
who cares what existed back then? we are talking about *today*. but you
are still arguing as if the internet looked the same as 25 years ago.
dial-up disconnects?
Dial-up disconnects did exist in 1986. Back in 1986, as long as the
remedial action (restoring the dial-up session) was taken in a reasonable
amount of time, all current TCP/IP sessions resumed as if the disconnect
never happened.
apparently you missed the invention of dynamic ip assignment.
No, it's not impossible. It just requires everybody to cooperate and do
things the right way.
if you don't realize how that is living in a dream world, then i really
can't help you. specifically, many (if not most) of the problems we are
facing are caused directly or indirectly by those who purposefully play
against the rules. i'd expect an NRA-campaigner to understand the
concept of defense and the price one has to pay for it.
Mark Crispin
2010-05-07 00:37:52 UTC
Permalink
Post by Oswald Buddenhagen
who cares what existed back then? we are talking about *today*. but you
are still arguing as if the internet looked the same as 25 years ago.
"Who cares about anything from the past? There is nothing to be learned
from past experience, and nothing done in the past applies today."

http://www.generationaldynamics.com/cgi-bin/D.PL?d=ww2010.i.java080701

Those who refuse to learn from the past end up repeating it.
Post by Oswald Buddenhagen
Dial-up disconnects did exist in 1986. Back in 1986, as long as the
remedial action (restoring the dial-up session) was taken in a reasonable
amount of time, all current TCP/IP sessions resumed as if the disconnect
never happened.
apparently you missed the invention of dynamic ip assignment.
"I don't know how to do it, therefore it's impossible."

I regularly do what other people believe is impossible.

It is not exceptionally difficult to resume IP connectivity after
disconnect for a dynamic IP. It just requires an engineer with the wit to
recognize that it is desirable to be able to do that, and the imagination
to work out how to do it.
Post by Oswald Buddenhagen
No, it's not impossible. It just requires everybody to cooperate and do
things the right way.
if you don't realize how that is living in a dream world, then i really
can't help you.
It is not a dream world to build things so that they work properly with
other things that work properly.

Nor is it a dream world to require that other things work properly.
Apple and Microsoft are both quite strict in the enforcement of their
rules; and have no hesitation to break things for those who violate them.

It is only with open standards that we find rule breaking as the norm; and
interestingly the same agents that vigorously enforce their rules seem to
have little problem with breaking rules to hurt competitors.
Post by Oswald Buddenhagen
specifically, many (if not most) of the problems we are
facing are caused directly or indirectly by those who purposefully play
against the rules.
The solution to problems caused by violation of the rules is not to
violate the rules further.

And if it turns out to be infeasible to get the rule violator fixed, the
workaround must not cause adverse impact to rule obeyers.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
Oswald Buddenhagen
2010-05-07 07:30:26 UTC
Permalink
Post by Mark Crispin
It is not exceptionally difficult to resume IP connectivity after
disconnect for a dynamic IP. It just requires an engineer with the wit to
recognize that it is desirable to be able to do that, and the imagination
to work out how to do it.
yes, indeed. i could open a vpn to my private server with a static ip.
now, that's a solution which sounds like something most users would be
capable or even willing to do. riiiiight ...

and before you claim that it would be soooo trivial to let the dialup
reconnect assign the same ip: no, it won't. german consumer adsl-providers
typically forcibly disconnect sessions every 24 hours and assign a new
ip, with the explicit purpose to thwart users' attempts to violate their
TOS by running servers (of course, they call it "for technical reasons").
Post by Mark Crispin
Apple and Microsoft are both quite strict in the enforcement of their
rules; and have no hesitation to break things for those who violate them.
yes. and if such a standard gets obsolete, they deliver a new one or
become obsolete themselves.
Post by Mark Crispin
It is only with open standards that we find rule breaking as the norm;
yes, because it is cheaper in the short run. what do you expect? real
world, dude.
Post by Mark Crispin
specifically, many (if not most) of the problems we are facing are
caused directly or indirectly by those who purposefully play against
the rules.
The solution to problems caused by violation of the rules is not to
violate the rules further.
given the choice between something that works poorly and something which
doesn't work at all, most people will choose the former. how surprising
...
Post by Mark Crispin
And if it turns out to be infeasible to get the rule violator fixed, the
workaround must not cause adverse impact to rule obeyers.
i find it impossible not to think about your NRA association when you
make such statements ...
support
2010-05-06 10:09:36 UTC
Permalink
Post by Oswald Buddenhagen
Post by Mark Crispin
If they can understand the difference between TCP and UDP, they can
understand state.
I wouldn't be so certain. If you look at a lot of applications these
days, they blat out a query, and read an answer. If they don't get an
answer in what they consider to be a suitable time, they drop the
connection and try again.
Well, that's not paying attention to what protocol (TCP) they're using, or
the protocol (IMAP) needs an adjustment perhaps (I don't know what the
protocol is, so forgive me) for an out-of-band inquiry, as in "are you
still alive?" Or if they were (able) to send another message down the
pipe, saying "forget it and goodbye" so that at least the IMAP client
knows to go away immediately.
My IMAP jams and I have to kill processes to free it up, if I do work from
home and then return to office processes which have been idle for many
hours or over a weekend, the mail reader (Thunderbird in my case) seems to
return a "nothing new" from the processes despite having changed the
folder contents at home. If it's Thunderbird's problem, I can't fix it
from Thunderbird, although I haven't killed TB to try that hypothesis
(duh.)

Naively, "how hard could it be to rewrite IMAP handling in Thunderbird?"
Doesn't seem like a massive project, overall. From what I "hear" here I
estimate the protocol should be smart enough for me as a client to
asynchronously clear, reset or signal a connection to the IMAP client. If
that's not true, it requires bookkeeping on the client side to know
whether a wait was anomalous, and if the same client can't be talked to
during a "long local file operation", I wouldn't so quickly blame TB. The
user never considers how long things might take on the server, though I've
learned to be patient to let big moves finish. Sometimes I never get an
"ACK" for operation done (large move seems stuck) and have to kill or
retry something to resync TB to the server's actual state.
Doing stupid things to obtain certainty (in real time for the user) is
expedient; were the people who wrote TB that clueless? If IMAP provides
the right feedback hooks, the TB authors should have availed themselves of
them to manage remote processes more smartly.

I had looked for a timeout flag for the IMAP client in inetd to set a
timeout for an idle connection, is there one? I'd've been using it by now
if there were, so I think the answer is no.
Post by Oswald Buddenhagen
Post by Mark Crispin
Put another way, they treat TCP just like UDP, only with this strange and
annoying "connection" stuff that they don't understand why it's there.
That's "not understanding the difference between TCP and UDP."
Tell them they're "misusing the IP API" and they might get it.
Didn't they ever wonder why the network calls and handling were
different if the protocols "were the same"?
Post by Oswald Buddenhagen
you may rail about the stupidity of those developers as much as you
want. apparently unlike you, they live in the real world where the only
guarantee which tcp provides is that the data stream is intact - *if* it
arrives. timeouts as a workaround for shortcomings of the tcp/ip stack
under real-world conditions are a perfectly reasonable approach, just
like keep-alive signals in higher-level stateful protocols to prevent
timeouts. if you want to do better, you can use imap directly over ATM
or ISDN or other really stateful protocols. but even these sometimes
just go deaf, because they run with crappy real-world system software,
too.
Mark Crispin
2010-05-06 21:42:24 UTC
Permalink
Post by Oswald Buddenhagen
you may rail about the stupidity of those developers as much as you
want. apparently unlike you, they live in the real world where the only
guarantee which tcp provides is that the data stream is intact - *if* it
arrives.
"Mr. Newton, your notions about gravity are of no use to those of us in
the real world where heavier objects fall faster than lighter objects."

An earmark of a sham argument is "we live in the real world".
Post by Oswald Buddenhagen
timeouts as a workaround for shortcomings of the tcp/ip stack
under real-world conditions are a perfectly reasonable approach,
"Tailgating and cutting across multiple lanes of traffic as a workaround
for the shortcomings of highways under real-world conditions are a
perfectly reasonable approach."

It is important to grasp that an action that may seem to be beneficial in
empirical testing may in fact be quite harmful, not just to other agents
but also ultimately to oneself; and furthermore that the benefits are
more illusory than real.

In the highway example, there are agents called "police officers" whose
function includes issuing traffic tickets, thus inflicting lesser harm (a
fine and points on one's driving license) as a pedagogical attempt to
avoid the greater harm of a mass collision.

The benefit -- and demerit -- of open standards is that there are no
police officers. Open standards depend upon voluntary compliance.
Closed standards, through their patents and licenses, can (and do!)
enforce compliance.

It's also important to grasp that what one thinks are "shortcomings" may
in fact be there for a reason; and that when you do something bad to
"workaround a shortcoming" you may be sabotaging something important.
Post by Oswald Buddenhagen
just
like keep-alive signals in higher-level stateful protocols to prevent
timeouts.
The difference is that in higher level stateful protocols, both the
timeouts and the keepalives are defined in the specification, along with
the rules that both must follow.

The problem is with ad-hoc actions based upon incomplete or misunderstood
empirical evidence.
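
For IMAP, the specified version of this looks roughly as follows: RFC 3501 says that a server's inactivity autologout timer, if it has one, must be at least 30 minutes, so a client that wants to stay connected sends something harmless such as NOOP before then. A minimal sketch, assuming a connection object with a `noop()` method (Python's `imaplib.IMAP4` has one); the 29-minute interval and the helper names are choices made for this example.

```python
import time

AUTOLOGOUT_MINIMUM = 30 * 60  # seconds; minimum server autologout per RFC 3501
KEEPALIVE_INTERVAL = 29 * 60  # seconds; assumed safety margin for this sketch


def keepalive_due(last_command_at, now, interval=KEEPALIVE_INTERVAL):
    """True once the session has been idle for at least `interval` seconds."""
    return now - last_command_at >= interval


def maybe_noop(conn, last_command_at, now=None):
    """Send NOOP if the session has been idle too long.

    Returns the timestamp of the most recent command, so the caller can
    pass it back in on the next call.
    """
    if now is None:
        now = time.monotonic()
    if keepalive_due(last_command_at, now):
        conn.noop()  # resets the server's inactivity timer
        return now
    return last_command_at
```

The contrast with an ad-hoc timeout is that both sides here follow a rule written into the protocol, instead of one side guessing at how long is "too long".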

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
Oswald Buddenhagen
2010-05-06 22:11:07 UTC
Permalink
Post by Mark Crispin
Post by Oswald Buddenhagen
you may rail about the stupidity of those developers as much as you
want. apparently unlike you, they live in the real world where the only
guarantee which tcp provides is that the data stream is intact - *if* it
arrives.
"Mr. Newton, your notions about gravity are of no use to those of us in
the real world where heavier objects fall faster than lighter objects."
An earmark of a sham argument is "we live in the real world".
you want rhetorics? how about that one: an earmark of a denialist
argument is "but it is the *law* [never mind that the constraints make
it not applicable in the given case]". let me illustrate that:
Post by Mark Crispin
Post by Oswald Buddenhagen
timeouts as a workaround for shortcomings of the tcp/ip stack
under real-world conditions are a perfectly reasonable approach,
"Tailgating and cutting across multiple lanes of traffic as a workaround
for the shortcomings of highways under real-world conditions are a
perfectly reasonable approach."
"not everything which looks like an analogy is actually a turd ..." err,
wait, i think that went differently ...

i'll be way more impressed when you actually propose something that
works on a large scale instead of insisting on your idealized worldview
and insulting the intelligence of everyone who tries to solve very real
problems.
Mark Crispin
2010-05-06 22:41:54 UTC
Permalink
Post by Oswald Buddenhagen
i'll be way more impressed when you actually propose something that
works on a large scale instead of insisting on your idealized worldview
and insulting the intelligence of everyone who tries to solve very real
problems.
Problems are not solved by violating specifications.

Problems are not solved by prattling "I am in the real world", "I am
trying to solve very real problems", "your idealized world" when what you
really mean is that you are too lazy to work out how to solve problems AND
comply with specifications.

It takes hard work, and time, to understand a problem well enough to
design, build, and deploy the correct solution. It is tempting to throw
together a kludge that covers up the symptom, and let the problems caused
by the kludge be Someone Else's Problem.

And that is how crapware is produced.

If you think that you can create something better than IMAP, hop to it.
Judging from what I've seen of KDE, I'm not particularly worried.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
Linda Walsh
2010-05-03 01:50:09 UTC
Permalink
Post by support
We have users whose folders are between 100 MB and 800 MB in size.
Most of those users are using Outlook but some are using Thunderbird.
---
Not to negate anything Mark has stated about large email folders. I
generally keep my mboxes below 100MB except for archives, so I don't have
a great deal of experience with mboxes of such size; I can only compare
performance on the same machine. I've generally tuned my mboxes
for the size of my 'server machine'. On older machines, 10-20MB was the
limit of my comfort level mbox size. My current server can handle up to
100MB or more without much inconvenience, but I wouldn't want to go much
beyond that for my purposes.


What I can speak about is improving performance on a particular
piece of hardware. My current distro, SuSE11.2, sad to say, gave me a
large hiccup when I last upgraded from 11.1->11.2. One aspect of this
hiccup was that imap-gw was dropped as a default inclusion (not
a major hangup, I've built it before, could again). They tried to set
me up with cyrus -- which seemed rather unwieldy in comparison, not to
mention, I don't like the idea of giving up my 'mbox' format (works with
too many text searches when I'm logged into my server). So I looked
around. I know there could be those who might think me a traitor for
giving up so quickly, but not having tried anything other than imap-gw
for the past ...geez, 5-10 years, I might be a bit naive about the
landscape.

What I stumbled upon was 'dovecot' (goog: 'dovecot imap'). It
supports the same formats as imap-gw (mbox, and mail format, the
differences between which I never fully grokked), as well as the format
used by cyrus. It includes built-in search capabilities that have enabled my
mozilla-Tbird on Windows-client, -based searches, even on unretrieved
bodies, to go VERY fast -- near the speeds of me logged in using grep.
Its ability to handle multiple mboxes and connections is also
impressive, as is its attention to security.

Under 'imap-gw', I generally used 2-4 connections max, to imap-gw,
otherwise, I ended up with too many server-dropped messages (I have
~70-80 mboxes that are checked as active receive points for new mail --
not just an Inbox). So now I run with about 16 connections
concurrently, and my email checks don't begin to bog down in my imap
server, I could use more threads, but with my disk-subsystem, it doesn't
make sense (I'd need more 2nd-level-read spindles offered from
RAID50/60).

It uses separate threads for each connection, and seems to have no
problem putting the load on the disk. In short, it looks like 'dovecot'
might be able to be recommended, as an imap-gw compatible replacement if
imap-gw isn't working for you, as is. imap-gw is a workhorse with an
excellent track record, so I wouldn't move if you don't need to.

Dovecot, though, seems to handle larger numbers of concurrent
connections more reliably than I experienced with imap-gw (but I'm not a
typical email customer). That said, with 800MB mboxes, you need a good
wide first level RAID (0, 5, 6), and for multiple clients, or high
numbers of connections, a good RAID50/60 with which I believe the
2nd-level '0' goes straight to number of concurrent I/O's it can support
in parallel.

No matter the client -- the backend of the client is vital. I use xfs
(no-barriers with battery backed-up UPS for the server), to get the best
performance for read/write on large files.

Hope this isn't taken as stepping on anyone's fingers -- and maybe Mark
could look at dovecot and give a thumbs up/down for anyone else -- I'd
trust his judgment on imap servers/clients over mine in a heartbeat. My
experience is often 'unique', to my setup and workload. Just wanted to
share it as I sorta got pushed to try something else and didn't want to
give up my mbox format. Dovecot is what I ended up with.



Linda Walsh
Mark Crispin
2010-05-03 02:41:30 UTC
Permalink
Dovecot is a good server. It is one of only two (the other being Panda
IMAP) that fully passes IMAP compliance testing:
http://imapwiki.org/ImapTest/ServerStatus
[UW IMAP flunks two of the tests...it hasn't been updated in 2 years.]

The main concern that I have with using Dovecot for traditional UNIX
mailbox files ("mbox") is that Dovecot gives up some of the aggressive
compatibility with ancient/stupid mbox practices for performance.

UW (and Panda) try damn hard to be compatible with even the most ancient
stupid things that people do with mbox files; and take a considerable
performance hit for doing so.

UW and Panda assume the worst about mbox. They assume that NFS is probably
in the picture, that you may well be farting around in some ancient 1980s
mbox tool at the same time that the IMAP server is trying to do something;
and thus they have to go through extreme checks to make sure that your mbox
file doesn't get trashed.

This aggressive support for worst case was there for a reason. That worst
case actually existed once upon a time. I hope that it is forever
extinct, but people tend to do crazy things... Oh well, mankind will
probably survive even though it refuses to take my advice... :)

The issues to be aware of in Dovecot are:

[1] Access to the mbox files via NFS; a true idiocy but nonetheless one
that UW itself insisted upon doing for years (over my repeated and
vigorous objections). I don't think that Dovecot tries to make an NFS
back end work right. I wouldn't blame him for not doing so; most of the
UW IMAP performance slowdown with mbox files is code to make NFS work (for
a half-assed "sort of" definition of "work").

[2] Dovecot runs multi-threaded (which itself requires OS support), and
the threads exchange state information. Among other things, this allows
significant performance benefits and multiple read-write access to the
same mbox format mailbox...as long as Dovecot is the only consumer of the
mbox file. Once again, UW IMAP did not have the luxury of being able to
assume that.

The nice thing about mix format is that there was no need to be compatible
with ancient idiocies; and as a result mix is so much faster. Even
without the threading, mix is probably faster than Dovecot on mbox because
even Dovecot has to do some mbox operations the hard way.

Nonetheless, if you really need mbox format, and are sure that you won't
be running dinoware and/or doing stupid things like access via NFS, then
Dovecot is definitely an option to consider.

If you want to use maildir format, I would go further and say that Dovecot
is the ONLY choice; do not use an unsupported third-party driver in UW and
especially do not use Courier.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
Linda Walsh
2010-05-03 04:05:52 UTC
Permalink
Post by Mark Crispin
UW (and Panda) try damn hard to be compatible with even the most ancient
stupid things that people do with mbox files; and take a considerable
performance hit for doing so.
Probably from having read your rantings on the topic before, I'd not do
any of those things...but memories, while rarely ever lost, are sometimes
depointerized through lack of use, so anything's possible.
Post by Mark Crispin
[2] Dovecot runs multi-threaded (which itself requires OS support), and
the threads exchange state information. Among other things, this allows
significant performance benefits and multiple read-write access to the
same mbox format mailbox...as long as Dovecot is the only consumer of the
mbox file. Once again, UW IMAP did not have the luxury of being able to
assume that.
---
Seems to be fine with me using the 'mbox.lock' locking files to
gain exclusive access. I believe there was a compatibility setting somewhere.
Post by Mark Crispin
The nice thing about mix format is that there was no need to be compatible
with ancient idiocies; and as a result mix is so much faster. Even
without the threading, mix is probably faster than Dovecot on mbox because
even Dovecot has to do some mbox operations the hard way.
---
I think the current version of dovecot does support 'mix' (folders and
messages in same?), but I didn't test it -- didn't want to screw up my
working mail store. Maybe I'll eventually feel braver, out of curiosity.
Post by Mark Crispin
Nonetheless, if you really need mbox format, and are sure that you won't
be running dinoware and/or doing stupid things like access via NFS, then
Dovecot is definitely an option to consider.
If you want to use maildir format, I would go further and say that Dovecot
is the ONLY choice; do not use an unsupported third-party driver in UW and
especially do not use Courier.
Doesn't Cyrus use maildir format, or is Cyrus=Courier? One dir, many little
files? Just seemed like a mess to me. But my 80+ active mailboxes might
seem a mess to some. No reason to NFS -- the IMAP server should be
where the source files were and use it to mitigate access...using NFS and
IMAP... two means to access same read/write share would almost inevitably
lead to a mess. I'm still trying to synchronize everything between smb and
local views of regular files, and end up with observable quirks.

One of the reasons that drew me to Dovecot was that my OS does support
threads, so I wanted to use things that provide multi-thread usage to
better parallelize my workload -- it's the only way I'll ever do a
better job of processor utilization.

Thanks for the appraisal -- makes me feel like I wasn't crazy for moving the
direction I did, given my hardware/software setup.

-linda
Mark Crispin
2010-05-03 04:51:26 UTC
Permalink
Post by Linda Walsh
Seems to be fine with me using the 'mbox.lock' locking files to
gain exclusive access. I believe there was a compatibility setting somewhere.
The .lock file is a delivery lock; to prevent more than one agent from
writing (= appending) to the mbox at the same time. It doesn't
synchronize between agents which hold state on the mbox.

The main issue is if any other mail reading program is consuming the mbox.
If Dovecot is the only consumer you will be OK. But if you have other
consumers (including Pine, Alpine, elm, /usr/ucb/mail, UW IMAP, etc.)
accessing the mbox while Dovecot is doing its thing there may be a
problem.
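
To make the "delivery lock" point concrete, here is a minimal Python sketch of dot-locking: whoever manages to create `<mbox>.lock` exclusively gets to append. The names `dotlock` and `deliver` are invented for this example, and the plain `O_EXCL` create is an assumption of the sketch; traditional delivery agents often use a `link()`-based variant to cope with NFS.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def dotlock(mbox_path, retries=10, delay=0.5):
    """Hold <mbox_path>.lock for the duration of the with-block."""
    lock_path = mbox_path + ".lock"
    for _ in range(retries):
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            time.sleep(delay)
            continue
        os.close(fd)
        break
    else:
        raise TimeoutError(f"could not create {lock_path}")
    try:
        yield lock_path
    finally:
        os.unlink(lock_path)


def deliver(mbox_path, message_bytes):
    """Append one message to the mbox while holding the dot-lock."""
    with dotlock(mbox_path):
        with open(mbox_path, "ab") as f:
            f.write(message_bytes)
```

The lock only serializes writers that honor it. It tells a reader that is holding parsed state about the file nothing about what changed, which is the gap described above.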
Post by Linda Walsh
I think the current version of dovecot does support 'mix' (folders and
messages in same?), but I didn't test it -- didn't want to screw up my
working mail store. Maybe I'll eventually feel braver, out of curiosity.
I wasn't aware of Dovecot supporting mix. As far as I know, Dovecot only
supports maildir (its preferred format) and mbox.
Post by Linda Walsh
Doesn't Cyrus use maildir format, or is Cyrus=Courier?
No to both.

Cyrus format is a completely different format, with more in common with
netnews than maildir.

What may have confused you is that both Cyrus and maildir put each message
into a separate file. However, Cyrus does extra stuff to make it scale a
bit better.

Maildir, in turn, does extra stuff to be NFS-safe at the cost of not being
at all amenable for IMAP. Dovecot actually implements a modified version
of maildir which is not NFS-safe...
Post by Linda Walsh
One dir, many little files? just seemed like a mess to me. But my 80+
active mailboxes might seem a mess to some.
Some people have many more mailboxes than that.
Post by Linda Walsh
No reason to NFS -- the IMAP server should be where the source files
were and use it to mitigate access...using NFS and IMAP... two means to
access same read/write share would almost inevitably lead to a mess.
Well, then, you are more sensible than a great many people! ;)
Post by Linda Walsh
I'm still trying to
synchronize everything between smb and local views of regular files, and end
up with observable quirks.
Hey, if you really want fun and laughter, try synchronizing SMB, NFS, and
local files. Simultaneously! ;)
Post by Linda Walsh
One of the reasons that drew me to Dovecot was that my OS does support
threads, so I wanted to use things that provide multi-thread usage
to better parallelize my workload -- it's the only way I'll ever do a
better job of processor utilization.
I think that you may have mistaken what Dovecot's multi-threading does.

The multi-threading allows multiple simultaneous read/write access to an
mbox format mailbox, as long as Dovecot is the only consumer of the mbox
file (and you don't want to violate that assumption). It does this by
exchanging semaphores between the threads, which run in the same process;
otherwise there are no such semaphores with mbox format.

You get the same level of service in UW IMAP and Panda IMAP using mix
format. mix has its own equivalent semaphores and does not need to be
multi-threaded.

My new IMAP server at Messaging Architects is in fact also multi-threaded,
but it doesn't need the threading for semaphore exchange. It uses an
expanded form of mix that has metadata and stubbing (which I call "virtual
mailboxes" and am quite happy with/proud of). Right now, we're just using
the stubbing for user quarantines, in which the per-user quarantine
mailbox has stubbing pointers into the global quarantine which contains
the actual messages. The other extension in mix is that it is
clustered(!).
Post by Linda Walsh
Thanks for the appraisal -- makes me feel like I wasn't crazy for moving the
direction I did, given my hardware/software setup.
Yes, Dovecot is a reasonable server; and as I said in a previous message
Dovecot and Panda IMAP are the only two servers which are tested to be
fully compliant.

I haven't had my new MA server tested yet, mostly because there are
some known issues in the underlying storage architecture that need to be
resolved first. I expect that it will eventually test fully-compliant as
well.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
Timo Sirainen
2010-05-03 08:32:14 UTC
Permalink
A bit too much misinformation here so I'll have to reply :)
Post by Mark Crispin
The main issue is if any other mail reading program is consuming the mbox.
If Dovecot is the only consumer you will be OK. But if you have other
consumers (including Pine, Alpine, elm, /usr/ucb/mail, UW IMAP, etc.)
accessing the mbox while Dovecot is doing its thing there may be a
problem.
Dovecot allows non-Dovecot programs to access mbox files. As long as they use compatible read/write locks, there aren't any corruption problems. The only potential problem is that flag changes and such may not be noticed immediately, but there are also settings to make Dovecot read/write the mbox state more aggressively (so worse performance). But the default behavior is actually pretty much the same as uw-imap's.
Post by Mark Crispin
I wasn't aware of Dovecot supporting mix. As far as I know, Dovecot only
supports maildir (its preferred format) and mbox.
There's an upcoming mix-inspired mailbox format, "mdbox" (multi-dbox). There is also "dbox" (single-dbox), which uses compatible mail files but stores only a single mail per file.
Post by Mark Crispin
Maildir, in turn, does extra stuff to be NFS-safe at the cost of not being
at all amenable for IMAP. Dovecot actually implements a modified version
of maildir which is not NFS-safe...
Many people are using Dovecot with NFS, but you're right, it's not entirely safe because I assumed I could flush NFS caches as necessary, but that didn't turn out to work as well as I expected.
Post by Mark Crispin
I think that you may have mistaken what Dovecot's multi-threading does.
There is no multi-threading in Dovecot! Multiple processes, sure, but it's single-threaded everywhere. (But there is initial support for handling multiple client connections in a single process (in a single thread).)
Post by Mark Crispin
The multi-threading allows multiple simultaneous read/write access to an
mbox format mailbox, as long as Dovecot is the only consumer of the mbox
file (and you don't want to violate that assumption). It does this by
exchanging semaphores between the threads, which run in the same process;
otherwise there are no such semaphores with mbox format.
IPC is done only via filesystem.
_______________________________________________
Imap-uw mailing list
Imap-***@u.washington.edu
http://mailman2.u.washington.edu/mailman/listinfo/imap-uw
Mark Crispin
2010-05-03 15:32:31 UTC
Permalink
For what it's worth: Timo is the author of Dovecot. His comments about
its implementation should be considered authoritative. Mine are based
upon memory and second-hand/third-hand information.

What is important - and what I will/do comment upon - is whether or not
another server is compliant with the specification. Dovecot is compliant.

I guess that the threaded semaphores stuff was in Communigate Pro. Linda's
comment about threading obviously confused me.

So, if I read you correctly, shared mbox access doesn't communicate flag
changes? You don't use an external index file to avoid having to re-read
the entire file? Do you allow shared expunge?

Did you ever test it over NFS and SMB (and NFS and SMB simultaneously)?
That's the kind of crap that I had to support when I did the code in UW
IMAP. I hope that nobody has to support such nonsense ever again.
-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
Timo Sirainen
2010-05-03 16:09:20 UTC
Permalink
Post by Mark Crispin
For what it's worth: Timo is the author of Dovecot. His comments about
its implementation should be considered authoritative. Mine are based
upon memory and second-hand/third-hand information.
What is important - and what I will/do comment upon - is whether or not
another server is compliant with the specification. Dovecot is compliant.
I guess that the threaded semaphores stuff was in Communigate Pro. Linda's
comment about threading obviously confused me.
Yes, probably.
Post by Mark Crispin
So, if I read you correctly, shared mbox access doesn't communicate flag
changes? You don't use an external index file to avoid having to re-read
the entire file? Do you allow shared expunge?
I use an external index file to avoid re-reading entire file again and
allow shared expunge. But there are all kinds of tricks to get better
performance and ability to use non-Dovecot software to access the
mboxes:

1) If mbox file's mtime and size match what is stored in index, index is
assumed to be up to date and mbox file isn't even opened until
necessary.

2) If mtime changes but size doesn't, assume that someone else wrote
flag changes to messages -> re-read the entire mbox file.

3) If file size decreases, assume expunged messages -> re-read entire
mbox file.

4) If file size increases, assume a new message was appended -> try to
read it. If the reading fails (no valid From_-line at expected offset),
re-read the entire mbox file. If reading succeeds, enable "dirty flag",
because it's not known if there could have been also other changes.
Whenever entire mbox file is re-read, the dirty flag is cleared.

5) Whenever reading a message from cached offset, verify that there's a
valid From_-line. If dirty flag is set, verify also that X-UID: header
is for expected message. If either fails, re-read the mbox file.

6) Whenever SELECTing mbox file and dirty flag is set, optionally either
re-read mbox file (default) or just open it and keep the dirty flag.

7) Writing flag changes (and other header updates) to the mbox file is
optionally delayed (default), until mailbox is closed or messages are
expunged or CHECK is run. This is same as with UW-IMAP I think. If
non-Dovecot MUA changes flags during this session and Dovecot also
notices those changes (due to above checks), the changes that don't
conflict with internal unwritten flag changes are applied to index.

Also as long as Dovecot is the only thing modifying the mbox file, state
is shared via index files, so 1) check always succeeds and the
performance stays good.

As long as the only changes are appends by (non-Dovecot) MDA, only 1)
and 4) can happen and there are again no problems. If non-Dovecot MUA
does other changes, Dovecot might not always notice the changes
immediately, but it never causes corruption.

Some small details above are probably incomplete.
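
As a rough illustration of checks 1) to 4) and the From_-line test in 5), here
is a sketch assuming a hypothetical cached record (IndexRecord) holding the
last seen mtime, size, and dirty flag. The names and return values are
invented for this example and this is not Dovecot's actual code.

```python
import os
from dataclasses import dataclass


@dataclass
class IndexRecord:
    """Hypothetical cached state about one mbox file."""
    mtime: int
    size: int
    dirty: bool = False


def starts_with_from_line(path, offset):
    """Check that a From_ line begins at the given byte offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(5) == b"From "


def plan_sync(path, rec):
    """Return 'uptodate', 'reread' or 'append', following checks 1) to 4)."""
    st = os.stat(path)
    mtime, size = int(st.st_mtime), st.st_size
    if mtime == rec.mtime and size == rec.size:
        return "uptodate"      # 1) index is trusted, mbox is not opened
    if size == rec.size:
        action = "reread"      # 2) same size, new mtime: assume flag changes
    elif size < rec.size:
        action = "reread"      # 3) file shrank: assume expunged messages
    elif starts_with_from_line(path, rec.size):
        action = "append"      # 4) new mail starts at the old end of file
    else:
        action = "reread"      # 4) no From_ line at the expected offset
    rec.mtime, rec.size = mtime, size
    # A full re-read clears the dirty flag; a plain append sets it.
    rec.dirty = action == "append"
    return action
```

A caller would run plan_sync() before serving a request and only parse the
mbox when the answer is not 'uptodate'.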

Filesystems are also beginning to support micro/nanosecond mtime
resolution (well, I guess everything except ext2/ext3 does nowadays), so
saving the timestamp in nanosecond resolution could also help notice
external changes more reliably. But I haven't bothered to add support
for that.
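
To make the idea concrete, a small sketch of comparing against a cached
nanosecond timestamp. It assumes Python's os.stat(), which exposes
st_mtime_ns; whether the sub-second part is meaningful depends on the
filesystem, and the function name is made up for this example.

```python
import os


def mtime_changed(path, cached_mtime_ns):
    """Compare against a cached nanosecond timestamp, not whole seconds."""
    return os.stat(path).st_mtime_ns != cached_mtime_ns
```

With whole-second timestamps, two writes within the same second look
identical unless the size also changes; the finer value narrows that window.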
Post by Mark Crispin
Did you ever test it over NFS and SMB (and NFS and SMB simultaneously)?
That's the kind of crap that I had to support when I did the code in UW
IMAP. I hope that nobody has to support such nonsense ever again.
I don't know about SMB, but some people are using it over NFS and I
haven't heard complaints for a while. It should work pretty well as long
as fcntl locking is used, because it reliably also clears NFS caches
(hoping of course that nfs.lockd itself doesn't break).
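
A minimal sketch of taking an fcntl lock on a whole mbox file, using Python's
fcntl.lockf(), which wraps fcntl-style locking. The helper name is
hypothetical, and the effect on NFS caches is as described above, not
something this sketch verifies.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def locked_mbox(path, exclusive=True):
    """Open an mbox and hold an fcntl (POSIX) lock on the whole file."""
    with open(path, "r+b") as f:
        fcntl.lockf(f, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        try:
            yield f
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

Every program touching the file has to use the same kind of lock for this to
mean anything; a dotlock-only writer would not be blocked by it.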
Mark Crispin
2010-05-03 17:17:06 UTC
Permalink
Post by Timo Sirainen
1) If mbox file's mtime and size match
Perhaps that works better today than 20 years ago. Back then, you could
not trust mtime to reflect reality in any reasonable way particularly when
NFS was involved. The most common circumstance is that mtime simply
wasn't updated. This happened even with local files. The explanation
that I got at the time was that it was somehow "inefficient" to keep the
mtime reliably up to date.

UW imapd doesn't trust mtime for any purpose, and takes a big hurt for
that.
Post by Timo Sirainen
3) If file size decreases, assume expunged messages -> re-read entire
mbox file.
This is reasonable if you have UIDs in the file (which of course is the
case today) since that allows you to resynchronize nicely. In fact,
that's exactly how mix resynchronization works.

Back in the day, there was no good way for UW imapd to resynchronize in
this case. Oh, it could have done an MD5 checksum of each message, but...
Post by Timo Sirainen
7) Writing flag changes (and other header updates) to mbox file are
optionally delayed (default), until mailbox is closed or messages are
expunged or CHECK is run. This is same as with UW-IMAP I think.
Yes.
Post by Timo Sirainen
Also as long as Dovecot is the only thing modifying the mbox file, state
is shared via index files, so 1) check always succeeds and the
performance stays good.
Those three things: being able to trust mtime, index files, and the
ability to resynchronize, are the big things in Dovecot. For various
reasons those very things weren't feasible in the day (now nearly 20 years
ago) when UW IMAP's mbox code was first written; and when it became
feasible it was done in more modern formats (first mbx, then mix).
Post by Timo Sirainen
Filesystems are also beginning to support micro/nanosecond mtime
resolution
Long overdue!!
Post by Timo Sirainen
It should work pretty well as long
as fcntl locking is used, because it reliably also clears NFS caches
(hoping of course that nfs.lockd itself doesn't break).
That's a big hope; and in my experience a futile one. I used to be able
to tell when SUN broke my test for NFS (and thus not use fcntl locks) when
I would get reports of cluster-wide hangs on Solaris boxes.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
Timo Sirainen
2010-05-03 22:57:52 UTC
Permalink
Post by Mark Crispin
Post by Timo Sirainen
1) If mbox file's mtime and size match
Perhaps that works better today than 20 years ago. Back then, you could
not trust mtime to reflect reality in any reasonable way particularly when
NFS was involved. The most common circumstance is that mtime simply
wasn't updated. This happened even with local files. The explanation
that I got at the time was that it was somehow "inefficient" to keep the
mtime reliably up to date.
UW imapd doesn't trust mtime for any purpose, and takes a big hurt for
that.
It's still true that NFS usually has attribute caching enabled, and mtime doesn't necessarily update (I think default is something like max. wait of 60 seconds after change). But that's what I tried to prevent with my attempts to force NFS clients to flush their caches. This mtime flushing actually works pretty easily in all modern OSes: just open and close the file and then stat/fstat. Other types of NFS cache flushes work less well. And people don't like to disable the caching, since it increases load by 10x in the NFS server.
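
The open/close/stat sequence mentioned above could look like this sketch. The
function name is made up, and whether the attribute cache is really refreshed
depends on the NFS client.

```python
import os


def stat_after_reopen(path):
    """Open and close the file, then stat it, as described for NFS clients."""
    fd = os.open(path, os.O_RDONLY)
    os.close(fd)
    return os.stat(path)
```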
Post by Mark Crispin
Post by Timo Sirainen
3) If file size decreases, assume expunged messages -> re-read entire
mbox file.
This is reasonable if you have UIDs in the file (which of course is the
case today) since that allows you to resynchronize nicely. In fact,
that's exactly how mix resynchronization works.
Back in the day, there was no good way for UW imapd to resynchronize in
this case. Oh, it could have done an MD5 checksum of each message, but...
Yeah, I actually also fall back to MD5 of a few specific headers if X-UID: headers haven't been written to disk yet.
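
A sketch of such a header-based fingerprint. Which headers are hashed is not
stated here, so the FINGERPRINT_HEADERS tuple below is purely an assumption
for illustration, as is the function name.

```python
import hashlib
from email import message_from_bytes

# Hypothetical choice of headers; the post does not say which ones are used.
FINGERPRINT_HEADERS = ("Received", "Date", "Message-ID")


def header_fingerprint(raw_message, names=FINGERPRINT_HEADERS):
    """Return an MD5 hex digest over selected headers of one message."""
    msg = message_from_bytes(raw_message)
    digest = hashlib.md5()
    for name in names:
        for value in msg.get_all(name, []):
            digest.update(name.lower().encode("ascii") + b":")
            digest.update(str(value).encode("utf-8", "replace") + b"\n")
    return digest.hexdigest()
```

Because headers such as Status: are left out, a flag change written by another
program does not alter the fingerprint, so the message can still be matched.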
Post by Mark Crispin
Post by Timo Sirainen
Also as long as Dovecot is the only thing modifying the mbox file, state
is shared via index files, so 1) check always succeeds and the
performance stays good.
Those three things: being able to trust mtime, index files, and the
ability to resynchronize, are the big things in Dovecot. For various
reasons those very things weren't feasible in the day (now nearly 20 years
ago) when UW IMAP's mbox code was first written; and when it became
feasible it was done in more modern formats (first mbx, then mix).
The annoying thing with Dovecot's mbox optimizations is that they're pretty complex and I'm sure there are bugs there. Also, it's sometimes difficult to figure out whether something is a bug or just a side effect of some other software modifying the mbox, possibly with incompatible locking rules, etc. So I'm kind of hoping people would stop using mbox. :) Or maybe I could at least simplify the code.
Mark Crispin
2010-05-04 04:09:04 UTC
Permalink
Post by Timo Sirainen
This mtime
flushing actually works pretty easily in all modern OSes: just open and
close the file and then stat/fstat.
Yup. You just said it: "modern OSes"...

I remember an OS which would not update mtime if ANY local agent had the
file open. So, no matter how many times you did an open/close/stat, you
would get the same out-of-date data. That, interacting with NFS attribute
caching, made things quite painful.

Perhaps this is now just a sad memory and no longer needs to be worried
about.
Post by Timo Sirainen
Yeah, I actually also fallback to MD5 of a few specific headers if
X-UID: headers haven't been written to disk yet.
That may work as long as Received: headers are included.

Also, these days, MD5 is not patent encumbered nor is it under any
export restrictions. That wasn't the case back then...

UW IMAP has no such thing as "X-UID headers haven't been written to disk
yet" for existing messages. That state only exists with new mail, and the
first thing UW IMAP does is write those X-UID headers. Safer, but slower.
Post by Timo Sirainen
The annoying thing with Dovecot's mbox optimizations is that they're
pretty complex and I'm sure there are bugs there. Also it's difficult to
sometimes figure out if something is a bug or just a side effect of some
other software modifying the mbox, possibly with incompatible locking
rules, etc..
Yeah, and when I was supporting 80,000 people using that format over NFS I
did not want to take that risk.
Post by Timo Sirainen
So I'm kind of hoping people would stop using mbox. :)
You and me both! ;)

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
Linda Walsh
2010-05-04 05:51:09 UTC
Permalink
Post by Mark Crispin
Post by Timo Sirainen
So I'm kind of hoping people would stop using mbox. :)
You and me both! ;)
----
Grr.... you fraggle-robbin $#@!... My antique perl code's been doing fastidious
locking since ...well a long time! plblblbl... I can't help that I like the
compactness of having a bunch of messages in 1 file. I have between 70-80
'active' (meaning they get incoming messages), and maybe 120 folders like this
(single file, multi-message) overall, with file sizes ranging up to 20-30MB as
norm, maybe 60-80MB in archives, vast majority under 10MB, but message totals?
Gads...at 1000-2000 messages/day, my local file count would be extreme. So
with large disk systems, I optimize for large files where xfs does better, but
small files and large number, and my filesystem would suffer (Actually that
1000-2000 count has probably dropped since I fell off of lkml again...;) ).

Given the slowness of today's disks in seeking, it's a good tradeoff, one that
may not be necessary, after the next DOJ anti-trust lawsuit against
solid-state drive manufacturers -- probably not till 2012-2013 at the rate
they move (unless some non-colluders enter the market place and force prices
down significantly before then)... ;^). With solid-state disks as fast as
today's hard disks and seek speeds 100x-1000x faster, all benchmarks are off,
though still xfs does show lowest SYSTEM cpu usage in comparable benchmarks of
any fs.

But with solid state the differences may be down in the noise level.

(just had to speak up for the mbox'ers-who-follow locking-club .... ;) )
-l
Paul Vixie
2010-05-04 07:10:52 UTC
Permalink
Date: Mon, 3 May 2010 10:17:06 -0700 (PDT)
Post by Timo Sirainen
1) If mbox file's mtime and size match
Perhaps that works better today than 20 years ago. Back then, you could
not trust mtime to reflect reality in any reasonable way particularly
when NFS was involved. The most common circumstance is that mtime simply
wasn't updated. This happened even with local files. The explanation
that I got at the time was that it was somehow "inefficient" to keep the
mtime reliably up to date.
nfs file attribute changes are always at least three seconds out of date
as witnessed between a process running on a client and process running on
the server and possibly much longer between two processes running on two
clients, because of the way the caching/pipelining works. this got a LOT
better with the nqlease stuff back in 1993 but i don't know if that's in
every nfs implementation even today.
Post by Timo Sirainen
3) If file size decreases, assume expunged messages -> re-read entire
mbox file.
This is reasonable if you have UIDs in the file (which of course is the
case today) since that allows you to resynchronize nicely. In fact,
that's exactly how mix resynchronization works.
i note that there are no X-UID headers in MH. how much would it help
uw-imap's MH performance/correctness if these headers were added by inc(1)
and other file-writing functions in MH/NMH?
Post by Timo Sirainen
It should work pretty well as long as fcntl locking is used, because it
reliably also clears NFS caches (hoping of course that nfs.lockd itself
doesn't break).
That's a big hope; and in my experience a futile one. I used to be able
to tell when SUN broke my test for NFS (and thus not use fcntl locks)
when I would get reports of cluster-wide hangs on Solaris boxes.
these days almost nobody still accesses the system mailbox by NFS, nor
access user mailboxes on NFS from more than one client at the same time.
so, dovecot's assumptions are pretty reasonable. compile-time options
in uw-imapd that changed its assumptions in this way would be popular.
Mark Crispin
2010-05-04 19:03:19 UTC
Permalink
Post by Paul Vixie
nfs file attribute changes are always at least three seconds out of date
as witnessed between a process running on a client and process running on
the server and possibly much longer between two processes running on two
clients, because of the way the caching/pipelining works.
The problem is even worse than the delay. Updates over NFS occur out of
order; or at least once upon a time they did. This would result in the
data and the inode being completely inconsistent with each other. I
forget what it was that would provoke NFS into this behavior, but I ran up
against it all the time.

Once this state occurred for that file, it seemed that nothing short of
swamping the buffer cache would clear it. Not even the normal NFS
open/close/stat trick was good enough.

It was quite a shock for me, coming from an environment in which even
network filesystems were guaranteed to maintain full synchronization.
Post by Paul Vixie
i note that there are no X-UID headers in MH. how much would it help
uw-imap's MH performance/correctness if these headers were added by inc(1)
and other file-writing functions in MH/NMH?
It would help correctness, as it would remedy the problem caused by
mh compact. UIDs can't be renumbered, but that's what compact does to the
file numbers. UW IMAP uses the file numbers as non-persistent UIDs, which
unlike persistent UIDs are useless for synchronization.

Unfortunately, it would greatly hurt performance. It would require that
all the files be read at open time in order to get the UIDs. A
synchronization step would also do the same thing.

A better implementation would use an index file that maps between a UID
and a device/inode number of the file. To open and synchronize, you
stat() all the files and then correlate that with the index to build a
map. This would also identify newly-added and expunged messages.

All this requires is an atomic snapshot of the directory. But, as the
more honest maildir developers will tell you, that's the rub; there's a
timing race that can occur with file renames while you are reading a
directory...
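
A sketch of the index idea, assuming a hypothetical in-memory index of the
form {(device, inode): uid}. It makes no attempt to solve the directory
snapshot race mentioned above, and the function name and return shape are
invented for this example.

```python
import os


def sync_index(folder, index, next_uid):
    """Correlate a directory scan with a {(dev, ino): uid} index.

    Returns (new_index, next_uid, added_uids, expunged_uids).
    """
    seen = {}
    for entry in os.scandir(folder):
        if entry.is_file():
            st = entry.stat()
            seen[(st.st_dev, st.st_ino)] = entry.name
    new_index, added = {}, []
    # Sort by file name only to make UID assignment deterministic here.
    for key in sorted(seen, key=lambda k: seen[k]):
        if key in index:
            new_index[key] = index[key]    # known message keeps its UID
        else:
            new_index[key] = next_uid      # new message gets a fresh UID
            added.append(next_uid)
            next_uid += 1
    expunged = sorted(uid for key, uid in index.items() if key not in seen)
    return new_index, next_uid, added, expunged
```

Since a rename keeps the inode, renumbering the files in a folder leaves the
UID mapping intact in this scheme, with no need to read any message file.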

In my opinion, it's better to use data formats designed for the task at
hand, rather than ever-escalating steps to get legacy formats to do what
they were never designed nor intended to do.

Remember, when I first designed IMAP, a big criticism was that it was
"impossible" for more than one agent to consume a mailbox at the same
time, yet this funny IMAP protocol was claiming to offer that service.
Post by Paul Vixie
these days almost nobody still accesses the system mailbox by NFS, nor
access user mailboxes on NFS from more than one client at the same time.
So the world today has finally accepted my advice from 20+ years ago.
I'm surprised, though; since "do everything via NFS" was the SUN corporate
religion (maybe Oracle has disestablished it). I recognized the absurdity
of layering a NAS (IMAP) on top of a NAS (NFS) early on, but SUN (and IBM)
insisted for a long time that the right way to do IMAP was to have a
cluster of IMAP servers consuming an NFS server.

Maybe in another 20+ years people will accept my advice on how to do IMAP
clients. The designers of webmails already follow those principles.
Post by Paul Vixie
so, dovecot's assumptions are pretty reasonable. compile-time options
in uw-imapd that changed its assumptions in this way would be popular.
UW IMAP is a dead project. If I ever do anything like that, it would be
in Panda IMAP.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
Timo Sirainen
2010-05-04 19:41:35 UTC
Permalink
Post by Mark Crispin
A better implementation would use an index file that maps between a UID
and a device/inode number of the file. To open and synchronize, you
stat() all the files and then correlate that with the index to build a
map. This would also identify newly-added and expunged messages.
It wouldn't be enough to identify message with device/inode, because
inodes get reused. So if message A is expunged and a new message B is
saved (both externally to IMAP server's knowledge), IMAP server might
now think that A still exists, except now it has B's contents.
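
The failure can be shown with a toy model. Real inode reuse is up to the
filesystem, so the device and inode numbers here are simply made-up values
and correlate() is a hypothetical stand-in for the server's lookup.

```python
def correlate(index, scan):
    """Map each scanned (dev, ino) key to a UID, allocating UIDs for new keys."""
    next_uid = max(index.values(), default=0) + 1
    result = {}
    for key in scan:
        if key in index:
            result[key] = index[key]
        else:
            result[key] = next_uid
            next_uid += 1
    return result


# Message A was indexed as UID 1 on (made-up) device 5, inode 1001.
index = {(5, 1001): 1}
# A is deleted and B is saved externally; the filesystem reuses inode 1001.
scan_after = [(5, 1001)]
# The correlation cannot tell the difference: B inherits A's UID.
assert correlate(index, scan_after) == {(5, 1001): 1}
```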
Paul Vixie
2010-05-05 09:56:30 UTC
Permalink
Date: Tue, 4 May 2010 12:03:19 -0700 (PDT)
...
A better implementation would use an index file that maps between a UID
and a device/inode number of the file. To open and synchronize, you
stat() all the files and then correlate that with the index to build a
map. This would also identify newly-added and expunged messages.
i'll see if i can break stiction on a real API and real indexing for NMH.
...
In my opinion, it's better to use data formats designed for the task at
hand, rather than ever-escalating steps to get legacy formats to do what
they were never designed nor intended to do.
of course. but my primary mail interface is emacs mh-e and i'm not going
to abandon it, nor the many filters and cronjobs and tools i've based on MH,
just to support my secondary need to open attachment-containing messages in
an IMAP client. i fully understand that i do not represent a growing segment
of the mail market. as before, i'm thankful that uw-imap supports MH at all.
Post by Paul Vixie
these days almost nobody still accesses the system mailbox by NFS, nor
access user mailboxes on NFS from more than one client at the same time.
So the world today has finally accepted my advice from 20+ years ago.
indirectly. NFS isn't a growing market segment. most new mailboxes are
IMAP-only and there are fewer and fewer accessors of /var/mail/$username (or
even UNIX systems containing such files) every year.
I'm surprised, though; since "do everything via NFS" was the SUN corporate
religion (maybe Oracle has disestablished it). I recognized the absurdity
of layering a NAS (IMAP) on top of a NAS (NFS) early on, but SUN (and IBM)
insisted for a long time that the right way to do IMAP was to have a
cluster of IMAP servers consuming an NFS server.
whatever sold the most iron was the corporate mantra of that moment. the
people who say "just use Exchange" today are cut from that same cloth.
Post by Paul Vixie
so, dovecot's assumptions are pretty reasonable. compile-time options
in uw-imapd that changed its assumptions in this way would be popular.
UW IMAP is a dead project. If I ever do anything like that, it would be
in Panda IMAP.
yes, that's a separate problem. i may try to add MH support to Dovecot so
that i won't have to maintain a fork of the uw-imap abandonware nor use a
non-open codebase (Panda). in the shorter term i need to consider whether
to add indexing and a real API to NMH so that any of this becomes possible.
Mark Crispin
2010-05-05 18:44:26 UTC
Permalink
Post by Paul Vixie
of course. but my primary mail interface is emacs mh-e and i'm not going
to abandon it, nor the many filters and cronjobs and tools i've based on MH,
just to support my secondary need to open attachment-containing messages in
an IMAP client. i fully understand that i do not represent a growing segment
of the mail market. as before, i'm thankful that uw-imap supports MH at all.
Have you thought about making mh be an IMAP client? You might need to
have some sort of proxy daemon to keep state, since IIRC mh is actually
a set of programs invoked from the shell. I don't know if your other
tools use the mh programs, or if they separately know about the mh layout.

With mh-e, you ought to be able to make it babble IMAP protocol and keep
your interface.

Doing that would make it possible to move your mail to any IMAP provider;
which you may not want to do but at least you have the option.
Post by Paul Vixie
whatever sold the most iron was the corporate mantra of that moment. the
people who say "just use Exchange" today are cut from that same cloth.
Yeah. There's a lot of that! ;)
Post by Paul Vixie
i may try to add MH support to Dovecot so
that i won't have to maintain a fork of the uw-imap abandonware nor use a
non-open codebase (Panda).
Another alternative is to join the re-alpine project on sourceforge. UW
IMAP is part of re-alpine, so technically there already is an open
codebase fork. I don't think that the re-alpine people have done much, if
anything, in the IMAP part; so they would welcome your contributions.

I haven't decided about opening Panda IMAP. The issue is, as it always
has been, funding to support any work other than my own personal use.
There's only one task remaining that I personally need in Panda IMAP;
anything else that I do is at someone else's request.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
Andrew Laurence
2010-05-05 20:42:36 UTC
Permalink
Post by Mark Crispin
Post by Paul Vixie
of course. but my primary mail interface is emacs mh-e and i'm not going
to abandon it, nor the many filters and cronjobs and tools i've based on MH,
just to support my secondary need to open attachment-containing messages in
an IMAP client. i fully understand that i do not represent a growing segment
of the mail market. as before, i'm thankful that uw-imap supports MH at all.
Have you thought about making mh be an IMAP client? You might need to
have some sort of proxy daemon to keep state, since IIRC mh is actually
a set of programs invoked from the shell. I don't know if your other
tools use the mh programs, or if they separately know about the mh layout.
Some years ago, the mh maintainer asked me about making mh be an IMAP client. I think he noodled on it a bit, but I don't think anything came of it. I'll ask him.
--
Andrew Laurence
***@uci.edu
Paul Vixie
2010-05-05 20:50:16 UTC
Permalink
Date: Wed, 5 May 2010 11:44:26 -0700 (PDT)
Have you thought about making mh be an IMAP client? You might need to
have some sort of proxy daemon to keep state, since IIRC mh is actually a
set of programs invoked from the shell. I don't know if your other tools
use the mh programs, or if they separately know about the mh layout.
i have indeed considered teaching the mh interface to use imap as a mail
store rather than using the file system. if i add an API layer inside MH
then i'll certainly be thinking along those lines. damnably, and as you
say, there'd have to be a lot of state to preserve the insanity of "message
numbers". luckily all of my tools including mh-e just use the MH command
set, nothing makes any assumptions about the file system layout.
With mh-e, you ought to be able to make it babble IMAP protocol and keep
your interface.
well, sure, but mh-e is only a small part of my mail UI. and besides which
there are better emacs modules for speaking IMAP, if i just wanted that.
i may try to add MH support to Dovecot so that i won't have to maintain
a fork of the uw-imap abandonware nor use a non-open codebase (Panda).
Another alternative is to join the re-alpine project on sourceforge. UW
IMAP is part of re-alpine, so technically there already is an open
codebase fork. I don't think that the re-alpine people have done much, if
anything, in the IMAP part; so they would welcome your contributions.
i had no idea. thanks.
I haven't decided about opening Panda IMAP. The issue is, as it always
has been, funding to support any work other than my own personal use.
There's only one task remaining that I personally need in Panda IMAP;
anything else that I do is at someone else's request.
i regret that i am not part of an empire who can afford to hire you just to
work on open source software. brian reid of DEC WRL deserves huge thanks
for hiring me and then letting me work on BIND after UCB abandoned it. we
need more empires in which people like yourself can hide while making stuff.
Yiorgos Adamopoulos
2010-05-05 21:23:14 UTC
Permalink
Post by Paul Vixie
i regret that i am not part of an empire who can afford to hire you just to
work on open source software. brian reid of DEC WRL deserves huge thanks
for hiring me and then letting me work on BIND after UCB abandoned it. we
need more empires in which people like yourself can hide while making stuff.
That is why people like you and me can donate to Mark (I did last
week) in order for him to continue working on stuff that makes our
systems tick. Enough people can form a "Panda-IMAP empire" :)
--
http://gr.linkedin.com/in/yiorgos
Mark Crispin
2010-05-05 21:58:26 UTC
Permalink
Post by Yiorgos Adamopoulos
That is why people like you and me can donate to Mark (I did last
week) in order for him to continue working on stuff that makes our
systems tick. Enough people can form a "Panda-IMAP empire" :)
And thank you! The donations do help keep Panda IMAP alive, and keep me
still on this mailing list; I would have dropped out a while ago
otherwise. And I am, slowly, chugging away at some feature additions
(most notably fast mix delivery).

Sadly, though, it's a long way from forming an empire... ;)

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
Paul Vixie
2010-05-05 22:48:48 UTC
Permalink
Date: Thu, 6 May 2010 00:23:14 +0300
That is why people like you and me can donate to Mark (I did last week)
in order for him to continue working on stuff that makes our systems
tick. Enough people can form a "Panda-IMAP empire" :)
it would take a lot of us before it became a compelling amount of money.
meanwhile i'd be using non-opensource software in my infrastructure (which
i just won't do.)
Mark Crispin
2010-05-05 21:49:56 UTC
Permalink
Post by Paul Vixie
damnably, and as you
say, there'd have to be a lot of state to preserve the insanity of "message
numbers".
IMAP is a stateful protocol. There is nothing insane about message
numbers in a stateful message access protocol; this is the entire
mechanism upon which state revolves.

If you don't want state, then hack HTTP to export messages the way that it
already exports HTML documents.
Post by Paul Vixie
i regret that i am not part of an empire who can afford to hire you just to
work on open source software. brian reid of DEC WRL deserves huge thanks
for hiring me and then letting me work on BIND after UCB abandoned it. we
need more empires in which people like yourself can hide while making stuff.
Sadly, such beneficent empires are few and far between these days.

I am being paid today to work on cool email stuff, but it's not open
source.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
Paul Vixie
2010-05-05 22:56:51 UTC
Permalink
Date: Wed, 5 May 2010 14:49:56 -0700 (PDT)
damnably, and as you say, there'd have to be a lot of state to preserve
the insanity of "message numbers".
IMAP is a stateful protocol. There is nothing insane about message
numbers in a stateful message access protocol; this is the entire
mechanism upon which state revolves.
i must have spoken improperly. if i say "scan" and learn thereby about
messages 1,3,5,20 and then one month later after every computer has been
power cycled four times i say "show 5" i want the same message. this is
insane but i want it anyway. imap's statefulness isn't nearly persistent
enough for me. a "show" command that used MH as a mail store would have
to have the same behaviour, but it can be local to the MH host rather
than relying on IMAP extensions. (so, some other MH-over-IMAP client
could see different message numbers for the same underlying messages.)
Mark Crispin
2010-05-06 00:43:49 UTC
Permalink
Post by Paul Vixie
i must have spoken improperly. if i say "scan" and learn thereby about
messages 1,3,5,20 and then one month later after every computer has been
power cycled four times i say "show 5" i want the same message.
Oh. In that case, what you want are UIDs.
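
For readers following along, the difference between message sequence numbers and UIDs can be shown with a toy model. This is only a sketch in Python, not c-client code; the Mailbox class and its method names are made up for illustration. Sequence numbers are positions in the mailbox as it stands now, so they shift when a message is expunged, while a UID stays attached to its message. (In real IMAP a UID is only meaningful together with the mailbox's UIDVALIDITY value, which this sketch leaves out.)

```python
class Mailbox:
    """Toy model of an IMAP mailbox (hypothetical, for illustration only)."""

    def __init__(self):
        self.uids = []      # UIDs of the messages currently present, ascending
        self.bodies = {}    # uid -> message text
        self.uidnext = 1    # next UID to hand out; a UID is never reused

    def append(self, text):
        uid = self.uidnext
        self.uidnext += 1
        self.uids.append(uid)
        self.bodies[uid] = text
        return uid

    def expunge(self, uid):
        # Removing a message shifts the sequence numbers of everything after it.
        self.uids.remove(uid)
        del self.bodies[uid]

    def fetch_by_seq(self, seq):
        # Sequence numbers are 1-based positions in the current mailbox.
        return self.bodies[self.uids[seq - 1]]

    def fetch_by_uid(self, uid):
        return self.bodies[uid]


box = Mailbox()
for text in ["alpha", "bravo", "charlie", "delta", "echo"]:
    box.append(text)

box.expunge(2)                    # remove "bravo"
print(box.fetch_by_uid(5))        # still "echo"
print(box.fetch_by_seq(4))        # "echo" is now the 4th message, not the 5th
```

After the expunge, asking for sequence number 5 fails because only four messages remain, whereas UID 5 still names the same message.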
Post by Paul Vixie
imap's statefulness isn't nearly persistent
enough for me.
Actually, it would be if the mh code could implement UIDs correctly. The
problem is that the mh code uses the filename numbers as the UID; but
then has to account for the mh compact command, which renumbers all the
files.

If you never use the compact command, then IMAP UIDs would be just what
you need. Otherwise, you need to have some other means to tie a permanent
UID to a particular message, while preserving the IMAP requirement of
being strictly ascending in the mailbox.
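
One possible shape for such "other means" is a side index that maps permanent UIDs to the current file numbers and is updated whenever the files are renumbered. The following is a minimal Python sketch under stated assumptions, not how the existing mh code works: the MhUidIndex class, its methods, and the idea of keeping the index alongside the folder are all hypothetical. It also assumes a newly added message always receives a file number higher than every existing one; without that, the strictly ascending requirement would be violated.

```python
class MhUidIndex:
    """Hypothetical side index tying permanent UIDs to MH file numbers."""

    def __init__(self):
        self.uid_to_file = {}   # uid -> current MH file number
        self.uidnext = 1        # next UID to assign; never reused

    def add_message(self, file_number):
        # Assumption: file_number is higher than all existing file numbers.
        uid = self.uidnext
        self.uidnext += 1
        self.uid_to_file[uid] = file_number
        return uid

    def remove_message(self, uid):
        del self.uid_to_file[uid]

    def _uids_in_file_order(self):
        return sorted(self.uid_to_file, key=self.uid_to_file.get)

    def pack(self):
        # Renumber the files 1..N in their existing order; the UIDs stay put.
        for new_number, uid in enumerate(self._uids_in_file_order(), start=1):
            self.uid_to_file[uid] = new_number

    def is_strictly_ascending(self):
        # IMAP requires UIDs to ascend in mailbox (here: file number) order.
        ordered = self._uids_in_file_order()
        return all(a < b for a, b in zip(ordered, ordered[1:]))


index = MhUidIndex()
for file_number in (1, 3, 5, 20):
    index.add_message(file_number)   # assigns UIDs 1, 2, 3, 4

index.remove_message(2)              # the message in file 3 is deleted
index.pack()                         # files 1, 5, 20 become 1, 2, 3
print(index.uid_to_file)             # {1: 1, 3: 2, 4: 3}
print(index.is_strictly_ascending()) # True
```

Because the renumbering preserves the order of the files, the UIDs remain strictly ascending after it, and UID 3 still refers to the message that used to live in file 5.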

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.