Statement:
> This is a short summary of the discussion of the Pros and Cons of Queues on the GDF.
> I present it in chronological order, sorted into Pro and Contra, with a set of proposed solutions in between. You can read most of the mails at gdf-queue-disscussion.txt
In the following, I restate my conclusion about which points really matter:
Main solution: Having the Queue only for large files or different Queues for small and large files (the first short, the second long).
Example: 0-3MB, 3-15MB, 15+MB (upper-value non-inclusive)
Pro: Fairness: Battering the sharer with requests doesn't help. Everybody has to wait the same amount of time.
Pro: Who requests first, downloads first.
Pro: Assurance: You get the file if you wait long enough.
Pro: Without Queues it is hard to get files at all, especially large ones.
Contra: Waiting time: You always have to wait for your files (User Experience).
Contra: Those who can't stay online long enough will never get their files.
Contra: Those who have to pay per hour will leave if the waiting gets too long.
Contra: For smaller files it works quite well without Queues and mostly doesn't with them (my experience).
Contra: A Queue creates traffic, thus reduces supply.
Pro: Higher UpTime: People stay online for longer periods, because they know they will get their file eventually => More Sources.
Pro: Short Term Remanence: A shutdown or crash with reboot doesn't kill a dl for good.
Pro: User Opinion: Queued looks better than busy.
Contra: Waiting time for each download isn't a good User Experience.
Contra: It helps the servent that uses it only as long as it is the only one using it. As soon as everyone uses it, the advantage fades. So once one servent implements it, all the others are forced to integrate it too.
Contra: Entry barrier for new programmers.
Now you have 3 Pros, 4 Cons.
Having the Queue only for large files, or different Queues for small and large files (the first short, the second long), cancels out the User Experience and waiting-time Contras, because you always have to wait for big files anyway, and a bit more waiting doesn't really matter.
Then you have the opposite: 5 Pros, 2 Cons.
This sounds like the best idea I read in the whole thread.
If you manage to make it an easy-to-implement system (so no one fears new competitors), Queuing doesn't sound so bad.
And don't forget to tackle the problem of constantly dying downloads first. The best network is worth nothing when the downloads aren't reliable (and put me back into the Queue after 200kB of 1.3MB, after I waited in the Queue for a flat 30min, aka 25c).
-----
Summary:
This is a short summary of the Discussion of the Pros and Cons of Queues on the GDF.
I present it in chronological order, sorted into Pro and Contra, with a set of proposed solutions in between.
Personal statement: I am against Queues, because I have a pay-per-hour line (50c/h). I can't wait long while my prog does nothing. As it is now, I finish far more uploads than downloads; if this increases even more (and gets more expensive), I'm going to become frustrated. It is about 20 uploads against 3 to 5 downloads at the moment, each time I run it. I could live with Queues for large files.
I made some comments in there as I copied the arguments in. They begin with ">>>>>" and end with "-Draketo <<<<<".
Please excuse the sometimes less-than-orderly structure of the chronological argumentation. Arguments were often repeated.
Besides: I can't believe that I really read all those messages during the last few weeks, now that I have put them all in one txt-doc... humans produce a hell of a lot of text when discussing.
----------
Pro and Contra:
Pro:
> Fairness: Battering the sharer with requests doesn't help. Everybody has to wait the same amount of time.
> Assurance: You get the file if you wait long enough.
> Higher UpTime: People stay online for longer periods, because they know they will get their file eventually => More Sources.
> Short Term Remanence: A shutdown or crash with reboot doesn't kill a dl for good.
> Who requests first, downloads first.
> User Opinion: Queued looks better than busy.
> Without Queues it is hard to get files at all, especially large ones.
> Passive Queues hold up even when the uploader is firewalled.
Contra:
> Waiting time: You have to wait for your files.
> Those who can't stay online long enough will never get their files.
> Battering for the Queue instead of the dl => No real change.
> It helps the servent that uses it only as long as it is the only one using it. As soon as everyone uses it, the advantage fades. So once one servent implements it, all the others are forced to integrate it too. => More coding work for all programmers => High barrier for new programmers.
> Entry barrier for new programmers.
> Doesn't increase the supply.
> When I launch GTKG with a ~1250 slot queue, it takes a good 20 minutes for the queue to purge itself and allow slot #1251 to become slot #1. During that time, GTKG allows no uploads to proceed.
> Queues can't be long enough: Evidence: From our beta test network of 158 nodes, there are currently 245 active downloads (receiving data or queued) and 5,384 waiting downloads (not active transferring or queued).
> A Queue creates traffic like a very slow download, thus reduces supply.
----------
Proposed Solutions:
> PARQ (read the Specs in the files section).
> What we need to do is make files easier to obtain. This can be done
with a combination of the following:
- Prefer uploading a complete file to one connection instead of
uploading several pieces to several connections. Shareaza has a
rotating slot scheme that gives each connection a subset of data
before disconnecting it. This is flawed - it doesn't encourage the
shortest time transfer of complete files.
- Prefer uploads to faster connections over slow ones. Some servents
have small, medium, and large file queues. This is an interesting idea,
but it would probably be better to have queues based on connection
speed. The principle is that reserving slots for very fast
connections will reduce the time required to propagate a complete
file (assuming the other end is sharing)
- Partial file sharing, with immediate sharing of in progress
downloads. Upload servers insert the IP and port of downloaders in
progress into a separate PFS mesh. This is identical to the scheme
used by BitTorrent - downloaders immediately share verifiable chunks
as they are transferred, and the upload server propagates their
information in a mesh.
- Full HTTP pipelining on the download side. BearShare currently
supports full pipelining on the upload side, and there remains the
work of implementing it on the download side. If all servents fully
pipeline requests on keepalive connections, there would be a 5% to
10% improvement in transfer times (sometimes more if the host is near
its send bandwidth limit).
- Forced sharing of the last N downloads. After a servent hashes a
new download, it can keep the file handle open and return query hits
for the file, and share the file. This prevents the user from
deleting the file or moving it out of a shared directory. These files
would be shared regardless of the user setting for sharing. There
would be a limited number of slots, and a (small) bandwidth limit on
these forced shares. This directly attacks the freeloading problem,
and creates many new upload slots on the network. Forced shares would
be canceled when the application exits.
> A Queue for large files, no queue for smaller ones or different Queues for different File sizes.
> >>>>> Just for User Experience: Always keep one slot without a Queue, so I have the chance to get some files quickly. For example, someone may just want to extract a few samples of a certain artist to see if a CD is worth purchasing. Preferably a slot for small files only. -Draketo <<<<<
> >>>>> Some solutions I see while reading everything again *sigh*: Use different queues, and have the queues for rare files tell the servent not to ask every 60s, but every 10 min. -Draketo <<<<<
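The size-split solution proposed above can be sketched in a few lines. This is a hypothetical illustration, not code from any servent; the bucket boundaries are the 0-3MB / 3-15MB / 15+MB example from the statement, with the upper bound non-inclusive.

```python
from collections import deque

MB = 1024 * 1024
# Bucket boundaries follow the example in the statement; upper bound is exclusive.
BUCKETS = [("small", 3 * MB), ("medium", 15 * MB), ("large", None)]

queues = {name: deque() for name, _ in BUCKETS}

def enqueue(request_id, file_size):
    """Append the request to the queue matching its file size; return the queue name."""
    for name, upper in BUCKETS:
        if upper is None or file_size < upper:
            queues[name].append(request_id)
            return name
```

The small-file queue can then be served with short waits while the large-file queue stays long, which is exactly the trade-off argued for in the statement.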
----------
> It helps the servent, which uses it, as long as it is the only one, which uses it. As soon as everyone uses it, the advantage fades.
-----
> Complex to implement.
-----
> Remanence: the queue must survive a servent shutdown, either on the
client side or on the server side, provided the servent comes back
"quickly".
1) Why "must [the queue] survive a servent shutdown" of more than 5 minutes?
> Remanence allows the client or server to become unavailable for a short period of time (max. 5 minutes) and yet not perturb the queue. This property is valuable for modem users (this includes DSL users), who can experience a connection drop, and for servent developers, whose servent can crash in the middle of an experiment.
> That is a nice feature of this scheme, one that is worthy of further exploration.
However, I again state that this should probably only cover drops of under 5 minutes, long enough to reboot and continue.
> Servents do crash. On occasion due to bugs, or they are shut down by their
user. Sometimes, even the X server crashes, and that takes down the servent
as well.
2) Where did this requirement come from?
3) Why is it really desirable? From an end user standpoint?
> You get the guarantee that if you were queued on a server, you can get back
your entry some day. If you found a rare file, then this is a precious
feature.
---
Abuse:
However, you lose the "cost proportional to consumption" property that prevents massive abuse. How can you prevent a client from taking 1000s and 1000s of queue slots now that it wouldn't pay any such price? If you can't, how is this any better?
---
Long Queues don't help at all:
> Here was the analogy I used to play Devil's advocate for queues:
Picture a street, Main Street, with 5 clubs on it.
There are several people who want to get into these clubs, but the clubs can't satisfy the number of requests to enter. Hence, there are some people inside, and several people waiting near the front door. Some of those waiting people are well-behaved, and return every minute or so to see if someone left. Others are not so well-behaved: they try every 10 seconds or so. Not so often that I have them arrested for harassment (blocked), but just on the other side of 'hammering' behavior.

So, for example, there are 10 "waiters". The good ones re-check every minute, so in 10 minutes' time, any given "Good Waiter" will check 10 times. The "Evil Waiter" will check every 10 seconds, so in 10 minutes' time he will have made 60 requests. Since there is little or no correlation between these requests and the availability of a free space (someone left the club), it is essentially a random game: the first person to make a request once a condition is met wins. Since the "Evil Waiter" made 6 times as many requests, we would expect him to be 6 times as likely to get the next available opening.
Obviously, this is not the behavior we want to encourage.
So, introduce Queueing. Now, right outside the front door, we add a vestibule. We require that anyone in the vestibule be well-behaved, be a VIP card holder (they must support queuing), and wait their turn to go through the front door.
This is great right now: since all Bears are VIPs (support queuing), you will always be able to slip into the vestibule if there is an open space in it. As more clients get VIP cards (support queuing), the situation changes fundamentally. Now, all clients inside the vestibule are well-behaved.
But: what about right outside the front door to THAT vestibule? Who gets the first open queue/vestibule slot when one becomes available? Well, the first person who requests it, right?
So really, all we have done with Queuing (once more clients support it) is move the congestion (and unfair behavior) from the front door of the club to the front door of the vestibule. The overall situation is unchanged:
Before, p = person:
[Club] p, p, p, p, [Front Door]pppppppppppp
Now, p = person:
[Club] p, p, p, p, [Vestibule] p, p, p, p, [Front Door]pppppppppppp
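The quoted analogy can be checked with a toy simulation (entirely my own illustration; the 10-minute window, poll periods, and request model are invented). A slot frees at a random moment, and whichever waiter's next poll lands first wins it:

```python
import random

def simulate(openings, periods, seed=42):
    """Count which waiter grabs each freed slot, given per-waiter poll periods (seconds)."""
    rng = random.Random(seed)
    wins = {name: 0 for name in periods}
    for _ in range(openings):
        t_open = rng.uniform(0, 600)  # a slot frees at a random time in a 10-min window
        # Each waiter polls at phase + k*period; find their first poll after t_open.
        next_poll = {}
        for name, period in periods.items():
            phase = rng.uniform(0, period)  # random phase: polls are uncorrelated with openings
            next_poll[name] = t_open + period - ((t_open - phase) % period)
        wins[min(next_poll, key=next_poll.get)] += 1  # earliest poll after the opening wins
    return wins

wins = simulate(10_000, {"good (60s)": 60.0, "evil (10s)": 10.0})
```

With these numbers the "Evil Waiter" takes roughly nine out of ten openings, which is precisely the "random game" unfairness that the vestibule merely relocates.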
---
:Each client knows (for itself) whether it still wants the file from that source,
:why not let them make the decision themselves? Then, only the 250 who still want
:a file from me would contact me, maybe get queued (again), etc... However, so
:would the "new" 250-500 people who want that file from me, and I am not wasting
:my time trying to send it to people who don't want it anymore.
You're forgetting about the dynamic IP problem. Perhaps the IP of the server
changed, and they don't know where it is now. Whereas the IP of the client
did not change yet, so the server can contact back.
Active queueing is FIFO only during the lifetime of your connection. My IP
address changes every 24 hours, so if I don't get a chance to get an upload
slot in 24 hours, then I lose my slot even though I waited for that time.
-----
In fact, the whole concept of remote queuing is flawed and we should
get rid of it. The problem is that if only one vendor supports it, it
gives upload slots to that vendor only, causing disparate sharing.
Remote queuing only pushes the "busy" problem to the end of the
queue, rather than at the last upload slot. It is merely a placebo
that makes users feel comfortable that they are at least waiting in
line.
eMule is a perfect example of how miserably queuing fails. In fact,
GTK-Gnutella cribs a page from eMule's less-than-stellar playbook by
advocating VERY LARGE queue sizes (5000? madness?).
-----
> We must eliminate PARQ, or specifically, the part where it connects
back to you, IMMEDIATELY!
There is currently a very big problem with uploads, in that servers
are getting overwhelmed with outgoing bandwidth limits trying to
respond to so many incoming requests (even if it is just to Busy
them).
You MUST AVOID making this problem worse by having the upload side
periodically contact the downloader (wasting his bandwidth and yours)
unsolicited. The "passive" queuing with periodic re-contacting is
absolutely a BAD IDEA.
> I don't see it that way.
PARQ will only attempt to recontact two times, and, if ignored, will
stop bugging the host for that queued entry. The message is SO SMALL
that it is far less work than rejecting an incoming Gnutella connection.
So please, give me a good reason. Just because you disagree does not
mean you're right.
> When I launch GTKG with a ~1250 slot queue, it takes a good 20 minutes for the queue to purge itself and allow slot #1251 to become slot #1. During that time, GTKG allows no uploads to proceed.
-----
> How long you can stay online is a big factor for these large files
with queues. The more limited the sources the more value I see for
queues to exist. Before queues I would never attempt to download a
single sourced very large 1GB+ size file. And the large 400MB+ files
with multi sources have a very high rate of success with queues. Many
queues are quick for these files as so many clients lose connectivity
(or are swarming) and I'm quite happy to be next in line to start my
download.
-----
> Also, remember this: since very few queue-compatible vendors exist,
any non-queue-capable host that requests a file from a queue-
capable one will get a BUSY.
Result: Once a single downloader is occupying a queue slot, and if
there are enough file requests to always keep at least a single
downloader in queue, then ONLY those few (2) vendors which support
queueing will EVER be able to successfully download. No other vendors
will be able to transfer until all queue-enabled vendors are
satisfied. This is a very important observation to understand the
real reason everything seems better now. No Morpheus client will Ever
be able to get a file from a queue-enabled source until all
BearShare/Shareaza client requests have been satisfied (made active).
If there is even a single queued BearShare/Shareaza requestor, all
other clients are put on hold until this VIP client's request has
been satisfied.
So: right now, you see a huge improvement. However, as more and more
vendors support queuing, you will see the same type of competition
for queue slots as there used to be for download slots.
> Right. It is true that the fact that you do not support PARQ gives
a small advantage to servents that support it. But it's not my fault.
> This is the unfortunate situation created by Active Queueing, yes.
I used this very same argument against Active Queueing when it was
published, but was not heard.
But PARQ is really different, in that it allows passive queueing.
Even if you don't understand queuing on the client side, you'll get
queued nonetheless, and in a fair manner, provided you abide by
HTTP rules, process the Retry-After header correctly, and don't change
your IP address. That is all that is required!
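What "abide by HTTP rules" amounts to on the download side can be sketched in a few lines (my own illustration, not PARQ reference code; only the delta-seconds form of Retry-After is handled):

```python
def next_poll_time(now, headers, default_delay=600):
    """Earliest moment the downloader may recontact the server, honoring Retry-After."""
    try:
        delay = int(headers.get("Retry-After", default_delay))
    except (TypeError, ValueError):
        delay = default_delay  # the HTTP-date form of Retry-After is not handled in this sketch
    return now + max(0, delay)
```

A well-behaved servent simply waits until that time instead of hammering; as stated above, one that ignores the header risks punishment by the server.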
-----
> :Remote queuing only pushes the "busy" problem to the end of the
:queue, rather than at the last upload slot. It is merely a placebo
:that makes users feel comfortable that they are at least waiting in
:line.
> No, it is a fair queueing system. It avoids having people hammer in
order to get the chance to grab an upload slot that was just freed.
Quite the contrary: if you don't abide by the rules set by the
server, via the Retry-After header, you'll be punished.
> :So really, all we have done with Queuing (when more clients support
:it) is move the congestion (and unfair behavior) from the front door
:of the club, to the front door of the vestibule. The overall
:situation is unchanged:
> It cannot be proven that this reduces busy signals. It only pushes
the problem to the end of the queue, instead of the end of the active
upload slots (the "vestibule problem" according to Dave).
> This is why PARQ defines arbitrarily large queues. In effect, it makes
the Vestibule so big that in practice, it never fills up.
> I've heard bigger loads, but this one is pretty large. You are
kidding of course, right?
-----
> :Now, there are other factors to consider, such as user opinion
:(Queued 11/12) looks better than 'Busy', even if there was a chance
:for you to slip in to the Busy slot as soon as it opened, faster than
:waiting for 11 transfers to complete.
Well, the only advantage here is that, hopefully, if you're not a
brain-dead servent, you'll set a larger retry delay for a queue entry.
So the requester will not hammer you once every minute but probably
once every 10 minutes. You've divided the bandwidth necessary to reply
to those requests by an order of magnitude.
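The order-of-magnitude claim is easy to verify with back-of-envelope numbers (the request size is an assumption for illustration; only the ratio matters):

```python
REQUEST_BYTES = 400  # assumed size of one HTTP re-request (illustrative only)

def poll_bytes_per_hour(retry_delay_s, request_bytes=REQUEST_BYTES):
    """Traffic one waiting downloader generates by re-asking every retry_delay_s seconds."""
    return (3600 / retry_delay_s) * request_bytes

busy_traffic = poll_bytes_per_hour(60)     # retried every minute while "busy"
queued_traffic = poll_bytes_per_hour(600)  # retried every 10 minutes while queued
```

Ten times fewer requests means ten times less reply bandwidth on the server, whatever the actual request size is.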
-----
> Hi, I'd like to contribute my opinion on PARQ. I think it does something to improve downloads. You're right, the current (active) queuing system only pushes the problem (lack of slots) to the end of the queue. A passive queue would indeed reduce busy signals and reduce connection overhead. Maintaining an active queue slot requires an active connection, and you have to re-ask every 90 seconds or whatever. With PARQ the servent can tell you to ask again in several hours if the queue is very long. In my eyes this reduces network traffic a lot.
> Also note that just because you're passively queued does not mean you don't
have to periodically refresh your entry. If you stop doing that within the
defined lifetime of the entry, the server sends you a QUEUE message (if you
advertised PARQ when requesting). Failure to respond means your entry is
flagged as "dead" and will be reclaimed in a FIFO order when the queue is full
and we need some room.
> Also, without queues and with the current system, it is more or less coincidence whether you get a queue/upload slot or not. With PARQ you will be rewarded for waiting longer. But for this to work, a client which implements PARQ should also rotate upload slots, so that you have a chance to reach the top of the queue. The current queuing system is not suitable for large files. If you have many large files, your upload slots will fill up, and the user who was lucky enough to be the first to ask for a slot can use it until he has finished downloading. However, for large files and most connections this will take several hours.
Also, the idea of having a queue that survives a disconnect is a good one. 24-hour disconnects are very common. And for downloading large files, you have to wait long; that's a fact. Whether you spend this time asking every 60 seconds for an upload/queue slot, or waiting in a PARQ queue asking every 60 minutes for a slot update, doesn't matter.
-----
> Of course PARQ isn't some magic stuff that doubles upload capacity of the network, but it adds more fairness for uploads.
> Exactly my point... What, exactly, is this "fairness"? Can you give me a good mechanism for measuring/quantifying it? If not, this point isn't really valid, as it results in a "user-placebo" which, admittedly, may be better than nothing.
However, be very careful when making claims like "Queues lead to fairness" and the like... Without being able to quantify this you will have a hard time demonstrating anything more than placebo effect.
> >>>>> Comment while parsing: Users stay online longer, so placebo might help. -Draketo <<<<<
-----
> A problem is that you might have to wait really long for small files. A solution to that problem is having two queues: one for large and one for small files.
> See, this just illustrates my point that queues don't really "SOLVE" any problem! =)
It's all mental! =)
> I think queues help to manage downloading of large files. A year or so ago, when there were no queues, I used gnutella to download small mp3 files. The queue-less system worked those days, because mp3s didn't take that long to transfer, so upload slots were freed quite often. Now that I download large files, the situation is different. If an upload slot is taken by a large-file upload, it will take really long until the slot is freed again. I think we need queues to ensure that you can download large files. And I think it is better to do it in an organized manner rather than leaving it up to coincidence whether you can download or not.
> Let's examine something here... DisneyLand, say, since it was mentioned in-office on Friday as an example of "why queuing works"...
Tons of (long) queues there.
It is my belief that they work *only* because a ***Very High Penalty*** can be imposed for 'misbehavior' (leaving the line results in an hour or more of lost time)
It is really a benefit/punishment system, right?
IF: You stand quietly in line for 2 hours, and don't disturb others or make a scene, I will let you ride a ride.
IF: You wander out of line, looking for a shorter line or a different ride, then I can (and will) remove you from the line you were residing in. Therefore, the *maximum resource consumption PER PERSON* (number of line spots being occupied) can be easily controlled.
That is not the case here. I can easily take up 19 queue spots in 19 queues, without any penalty to me. I can grab a 20th spot with still no penalty. With PARQ, I could (conceivably) take up 1000 slots... Where is the benefit here? Where does the advantage come in? What, exactly, is being solved/accomplished by this queue, besides arbitrarily adding to the expected file completion time?
No one has yet given a good, quantifiable answer to this question; just stuff like "It seems more fair", or "Queues make downloads better", or other such propaganda-ish statements, without anything to back it up.
> The advantage is that if you have waited long enough you can start riding all the time. You might have to wait longer at first because there are many people in many queues, but after some time you can start the first ride, and after you have finished that ride you can go on to the next ride almost immediately. Since having a computer wait for something is by far not as annoying as waiting in a queue in Disneyland, I don't see a problem with that.
> Hmm, I wasn't much pro PARQ, but this example now convinces me that
PARQ is good. Have you seen the FastPass they use? Well, that's a
form of passive queue: you don't have to wait in line, but your
place in line is being kept. In the end it allows everyone to wait the
same amount of time to get in.
> >>>>> Parser: The disadvantage is: When you can't stay online 24/7 you might never even get a download started. -Draketo <<<<<
> I don't understand what you mean.
If you grab a slot, you have to maintain it by re-requesting within the
specified lifetime. Otherwise you run the chance of losing it.
If you reserve 1000 slots in 1000 queues and don't refresh them, they'll die
when they expire and some room is needed in the queue. So you will have
accomplished nothing, and penalized no one because of that. You will have
just lost bandwidth requesting them in the first place.
The queue is large, but if you don't use your slots, they'll be reclaimed.
If you use them however, then what's the problem? To request 1000 slots, you
will need to allocate the RAM necessary, and make the periodic requests.
There is a limit to the number of requests a servent can make per second,
due to bandwidth constraints.
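The refresh-or-expire behavior described above can be sketched as follows. This is my own reading of the idea, not the PARQ implementation; the class name, capacity, and lifetime are invented:

```python
from collections import OrderedDict

class PassiveQueue:
    """Toy server-side queue: entries expire unless refreshed within `lifetime` seconds."""

    def __init__(self, capacity, lifetime):
        self.capacity = capacity
        self.lifetime = lifetime
        self.entries = OrderedDict()  # host -> last refresh time, oldest entry first

    def enqueue(self, host, now):
        """Add or refresh a host; returns False only if the queue is full of live entries."""
        if host in self.entries:
            self.entries[host] = now  # refresh; the position in line is kept
            return True
        if len(self.entries) >= self.capacity:
            self._reclaim_one_dead(now)
        if len(self.entries) >= self.capacity:
            return False
        self.entries[host] = now
        return True

    def _reclaim_one_dead(self, now):
        """Reclaim the first (FIFO) entry whose holder stopped refreshing it."""
        for host, last in self.entries.items():
            if now - last > self.lifetime:
                del self.entries[host]
                return  # return right away: we deleted during iteration
```

Hoarding slots without refreshing them therefore accomplishes nothing: the dead entries are reclaimed, oldest first, as soon as room is needed.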
-----
> Again, the fundamental questions are these; and we cannot progress until they are answered:
(1) WHAT, EXACTLY, are you trying to accomplish with a queuing system?
> The goals are stated in the PARQ specs.
But generally speaking, we want to avoid hosts hammering back every 60
seconds or so to get a slot, and we don't want to give a slot by chance
to someone just requesting now, whereas someone has been patiently requesting
the file for days, but just happens to not be requesting right now.
This is the fundamental goal of any queueing system. PARQ has others, but
the above one is present, and is called "Fairness".
(2) HOW, EXACTLY, is the queuing systems today (active/PARQ/whatever) accomplishing that?
> As far as the main goal stated above is concerned, both Active Queuing
and PARQ satisfy that goal.
Of course, PARQ has other goals, in particular "Legacy" and "Remanence".
(3) WHAT, EXACTLY, is the benefit of queuing system?
> Benefits for the server: less hammering, since retry time can be controlled.
Benefits for the client: once in line, one knows that there is no point
in hammering, and retrying at the times specified by the server will allow
correct scheduling of the request when the time comes.
-----
> 1) Do you want a system where the wait time for files is expected to be equally horrible and lengthy, or one where half your files finish rapidly and the other half take a while?
> I would prefer the latter, but my suspicion is that that's not a likely
scenario in a big network with enough users and efficient source
exchange. Look at the ed2k example. I see the probable scenario more
like:
1) Wait for files for lengthy times, or
2) Get busy responses 99+% of the time and get more and more
frustrated.
> 1(ff)) I like having a decent chance for some files to finish rapidly. I don't like knowing ahead of time that all (or most) files have a known multi-hour wait time. The DisneyLand approach has no appeal to me. Ontario Place (where some rides had LOOOONG lines and others had none) was MUCH more appealing to me (at least, back when I went there) than a place where all rides have long lines. There I can skip the LOOOOnger ones and go to the rapid, fast-paced ones... Just a thought...
> I think that queues simply get the chaos out of the equation. Once you
have queues long enough to accommodate all (or most) requests, you can
tweak them to optimize what you want: be it having queues for
different file sizes, or making queues not fifo (for example
prioritizing less demanded files (likely to be rare), clients who have
downloaded less from you, or whatever).
Without queues, or with too-short ones, you basically have a situation
where neither you nor your clients have control over the situation, and
what's worse: abusers (hammering clients) really do have an advantage.
This last point is enough for me to prefer queues.
2) Let's consider the average user, not even we the developers... Is a system where all (or most) files have much lengthier wait times (of hours+), rather than having a decent chance for half the files to finish FAST and the other half to finish Slooooowly, really that desirable??? Seriously consider this long and hard, before simply assuming it to be the case... =)
At least for (2), I PERSONALLY don't think it is more desirable. You all are, of course, welcome to disagree with me. =)
Would anyone disagree that file transfer times (for not MASSIVELY rare files) assume a more or less bell-curve distribution, exclusive of things like size, etc.?
In that some people (now, or prior, without queues) would get them fast, and others would get them slowly.
Your (pl.) arguments earlier suggest that we are "squishing" this bell curve towards the average wait-time point.
This may not be as good an idea as it seems. The possibility of being on the "fast tail" of the curve is fairly appealing...
> [The Fairness Argument, you already read.]
---
> Once you have queues long enough to accommodate all (or most) requests
> You can forget about this pipe dream right now.
From our beta test network of 158 nodes, there are currently 245
active downloads (receiving data or queued) and 5,384 waiting
downloads (not active transferring or queued).
It is obvious that there will never be a sufficient number of queue
slots.
> That may be true, but if a client has say 5000 slots and it gives you a busy signal, you know that it is useless to retry this host in the next 24 hours or so. However I think 5000 slots will take quite some time to fill up.
> However, isn't that dependent on what files you are sharing? What's
the setup of your test? Is everyone sharing the same files, all requesting the
same? Or what?
In my gnutella servent I have never seen more than 70 clients queued.
So in my case I've never run out of queue slots.
> I don't understand. Is that network closed?
> No the network isn't closed, but we can only count leaves/Ultrapeers,
and get download statistics, from BearShare 4.3.0 servents.
-----
> Before there were any queues at all, I found it VERY difficult to
actually download files. The mechanism where whoever happened to
submit a download request at just the right moment was the winner
just didn't work very well. You had a chance of getting your
download right away, but you also had a chance of pounding that
host for days and still not getting the download. With queues,
if I'm willing to wait around long enough and the host is up long
enough, I know I will eventually get my download. Call it the
placebo effect if you want, but to me this is a much better
system.
While it is certainly possible to implement queues in such a way
that everyone has to wait for hours and hours before they can
download anything, it doesn't have to be done that way. The
client I use allows the user to define as many queues as he wants
and to set up rules for how transfer requests are assigned to
those queues. With a single queue it's easy for small file
requests to get stuck behind many large file requests and end up
waiting literally days to start the download. With the multiple
queue capability, I've defined one queue for very small files,
one for medium size files (which I've defined to be between 256kB
and 15MB), and one for large files. This has worked out very
well. There is almost never anyone in line for small files. The
medium size queue seldom has more than a dozen or so requests.
The large queue sometimes gets a bit long, but this doesn't
seem to be a huge problem as people downloading giant files are
conditioned to long waits.
You can certainly come up with theoretical arguments why queues
might not work, but, in practice, they seem to work fairly well.
> That is every bit as subjective as my (equally valid) comment that they do not work.
> Even the eDonkey queues with thousands of entries work reasonably
well, at least from the perspective of actually being able to
download files. (BTW, the only reason the eDonkey queues need to
be that long is due to the way the clients queue up requests.
Unlike the gnutella clients, the eDonkey clients queue up a
download request with EVERY client they know to have the desired
file. Even with this aggressive queueing behavior and the alleged
million eDonkey users, I've never had an eDonkey queue with more
than 500 clients in line.)
> However, if we (Gnutella) cannot satisfy the rate of (content) demand now, how is indefinitely queuing demand up going to help?
> Because, in reality, demand does NOT queue up indefinitely. Downloads are
completed by other clients. Users get tired of waiting and log
off. Whatever the reason, my observations are that a queue of sufficient,
non-infinite size can handle the demands on a single client.
> What's so great about not queuing? If you queue, you stabilize the
situation. You are establishing an order. If you don't queue, you are
allowing randomness to enter the game.
I have given my analysis of the situation. Accept or disprove, that
is up to you. As far as I am concerned (and my vendor-body agrees
with me, for the most part), queues are *only* a placebo.
So, to be clear, you are saying that if nobody implemented queues,
we would be in *exactly* the same situation as with queues?
Please confirm that. That's what I understand from your statement.
> Given that we all know queues work quite well in the physical
world and also have extensive application in the digital world, I
respectfully suggest that the burden of proof that queues do NOT
work should lie with those trying to argue the negative.
Hand-waving arguments and simplistic real world analogies should
not be taken as proof.
> I truly hope that you are not referring to my post. That was the result of observation and repeated hypothesis, disproval, rinse, repeat. It was also highly influenced by watching IRL people here on Lincoln Road (popular social spot) crowding in, on, and around night clubs and other social hotspots.
My post was very well grounded in reality, I believe. It was not a "hand-waving argument", certainly.
I have given my analysis of the situation. Accept or disprove, that is up to you. As far as I am concerned (and my vendor-body agrees with me, for the most part), queues are *only* a placebo.
Ideologies of "fairness" aside, they do not solve any technical problems on their own merit.
> I was referring to something quoted by Vinnie in one of his replies and, I
believe, attributed to you. I'm sorry, but I don't believe any analogy to
night clubs and social hotspots is appropriate to the topic at hand. They
are particularly inappropriate when you represent the queue by the
vestibule of a building. Vestibules are typically small and are
necessarily limited in size.
> Like so-called "Active Queuing"?
> >>>>> This does not apply to PARQ, as far as I understand it. -Draketo <<<<<
> Upload queues can be as large as is
needed.
> Ahh, I see you haven't bothered to FINISH reading the arguments against queuing, specifically the one explaining why a large, durable, so-called "FAIR" queue isn't going to work either.
> Your analogy certainly doesn't forbid the use of a large queue,
but it does frame the discussion with a prejudice toward queues that are
fairly small. A more appropriate analogy would be the post office or, as
you mention later on, the DMV, where there are limited opportunities for
service and an apparently unlimited opportunity to wait in line.
> The most attractive part of PARQ (or any queuing system I've seen to date) is resumption of transfer after small (<5 min) downtime.
> I agree here, that's a need for current queuing systems. Note that
ed2k has partly advanced in this aspect. I'm not sure if its queues
can survive an IP change, but being UDP based they certainly resist
connection drops.
> Currently, all the major players seem to
be supporting upload queues.
> Citing "All major players support queues" as evidence that "queuing is good, or even works correctly" is nonsense, as it misses the **OBVIOUS** independent correlation... (Follow my links in original post to BearShare labs if interested)
> And if you had taken the time to read my message with the goal of
understanding what I was trying to say rather than looking for phrases you
could quote out of context with the goal of making me look foolish, you
would have realized the sentence above about all the major players
supporting queues was part of a larger construct reporting that, despite
what you seem to be contending to the contrary, my queues have not filled
to (current) capacity even though all the major clients now support queueing.
> For me, it is self-evident and intuitive that when demand is high, queues
are an improvement over no queues. For you, it seems not to be the
case, if I understand correctly. I've read the thread mentioned at
BearShare and I don't remember any hard proof there either, just the same
belief that large queues are a hassle.
How is it that queues are great (read: necessary) in many domains (CPU
task scheduling, real life) to prevent undesired behavior (starvation,
indefinite preemption), and not for Gnutella? Do you think that these
misbehaviors are not undesired in a p2p network? It's a placebo to
prevent them? There *is* a difference between queues and no queues.
And for me that difference is certainly no worse than no queues. I
have not seen any advantage of not having queues exposed, except that
"waiting is bad, I prefer no wait", as if removing queues would somehow
remove the wait.
You want Gnutella to be the best p2p network and are saying that its
users are better off *not knowing at all* whether their uptime is going to
help them? That they must try indefinitely requesting to a client instead
of knowing that they may be near to start or far away from start? I
can't believe you consider that placebo. That's a real advantage.
For me, eliminating indeterminate order of upload is far from being a
placebo.
> I've even seen some Morpheus
clients in my upload queue recently. With almost all of the
gnutella world supporting upload queues, I haven't seen the size
of my queues exploding and the file transfers are happening in an
orderly manner. The real world evidence indicates that queues
work. Where's the proof that they don't?
> And here you are wrong again! Except, that not only are you wrong, you are **FABRICATING EVIDENCE**! You are stating OPINION as if it were FACT!
> Where exactly am I wrong? Where exactly is there OPINION in what was
quoted above? I said I've seen Morpheus clients in my upload queue. This
is a fact. I have seen them. I said almost all the gnutella world is
supporting upload queues. BearShare, LimeWire, Shareaza, Gnucleus,
gtk-gnutella, and, apparently now Morpheus, support upload queues. I think
that's most of the gnutella world. (My apologies if I've left someone out.)
I said I haven't seen the size of my queues exploding. Again, this is an
observed fact on my system. I said file transfers are happening in an
orderly manner. Again, this is an observed fact on my system. Where am I
wrong in any of these statements?
You accused me of fabricating evidence. Please show me exactly where I did
this. If you can't, I expect an apology.
> WE have real data, hard and fast numbers that we are steadily collecting, not just "opinion" about the state of things. Vinnie gave you a teeny sample of that, with Downloads/Uploads info. Queues aren't filling? Yeah right! Get real....
> If you set your queue sizes small enough, sure they'll fill up. From what
you've said here and your use of the vestibule to represent the queue in
your analogy, I can't help but think you're getting stuck on the idea that
queues have to be some relatively small size. They don't. They clearly
have to be bounded, and a rational discussion of reasonable, practical
bounds would probably be a good idea, but they do not have to be small.
> >>>>> He referred to more than 5,000 queue-entries filled. -Draketo <<<<<
> If you refer to the 15x network that Vinnie mentioned yesterday, I
questioned some things to really understand how queues were filling,
and that was not answered. I have not seen my queues filled, and
that's also fact and not fabrication.
> As to the question of whether I'd prefer long waits or a small
chance of getting some of my files right away (the 50/50 chance
suggested with the question is VERY optimistic in my experience),
I'll go with the long wait with a much higher assurance of
actually getting something. Just one user's opinion.
> There is a plethora of cases where queues do not really "work" as far as I
am concerned.
Do you consider the DMV to be "working" when it takes you 4 hours to
register your car? Four hours of wasted time, where you have to sit there
doing **NOTHING ELSE**??
Of course not! It's broken!
> I submit that there's nothing whatsoever wrong with the queue at the
DMV. The queue works just fine. You position yourself at the end of the
queue. You move in a more or less orderly manner toward the front of the
queue. You eventually make it to the top of the queue and get served. The
queue has done exactly what it's supposed to do.
What's broken at the DMV is that demand far exceeds supply. The queue
simply brings order to what would otherwise be chaos in that
situation. Would you prefer that it was a free-for-all and whoever managed
to push and shove their way to the counter was the one to get served?
> I should also add that this doesn't have to be an all-or-nothing
situation. I recently experimented with a "queue" that I dubbed the Hit &
Run queue. This queue was set up to allow one upload slot and zero entries
in the queue. The idea is to provide some of that old
survival-of-the-fittest-and-luckiest aspect of the pre-queue days, while
still allowing for the more genteel approach of waiting on line like most
civilized people prefer. It's still too early to tell whether this "queue"
will do anything useful.
> >>>>> The problem there is bureaucracy, I'd think. Too many resources wasted on things that could be done more efficiently. IMO this is a valid statement here, assuming that clients should be able to run in the background. -Draketo <<<<<
> Gnutella, and file sharing in general, shares at least that characteristic
with the DMV. Demand far exceeds supply. So, there is a choice to be
made. Would you prefer an environment where the strong or the lucky win
out, or one where there's at least an attempt to bring some order to the chaos?
> Actually, you are wrong about this. This kind of queue (post office, etc) has other factors working in its advantage that cause it to be workable. In fact, most IRL queues (for people) have the EXACT same "benefit" that lets them work properly. ( Check out my more exact analogy, in the paper on LONG queues, with TCP buffering, for an exact analogy. )
Even the vestibule scenario had this problem in the analogy, but it was only to help you visualize that the underlying problem wasn't addressed by a queue.
Specifically, they can enforce upon you the following condition(s), which cannot easily be transferred to Gnutella:
1) The act of offering you a queue slot (a place in line) places an immediate, unshakable BAN on other queues. This prevents you from continuing to pile up outstanding "Consume Resource" commands.
2) At 6:00, the post office closes its door (NO MORE DEMAND!!! =) yet the employees will stick around to service the remaining queued Demand. This does *not* transfer to gnutella in any easy way that I can see, unless you force the user to remain online after they would wish to leave. Actually, we are discussing potential solutions to this problem. Hopefully we will stumble onto something really useful.
There are also a (3) and a (4) that are directly responsible for why this analogy doesn't work (except as a concept analogy for the underlying problem, which your post office argument didn't change, and even reinforces: the same problem of supply and demand!)
-----
> Since we all seem to agree that queues are somewhat necessary, we
must not forget that the vast majority of users of Gnutella use
ADSL home connections, and that almost all those connections are
disconnected every 24 hours, and change their IP.
> >>>>> Not all agreed. This discussion wasn't halfways over. -Draketo <<<<<
-----
What we need to do is make files easier to obtain. This can be done
with a combination of the following:
- Prefer uploading a complete file to one connection instead of
uploading several pieces to several connections. Shareaza has a
rotating slot scheme that gives each connection a subset of data
before disconnecting it. This is flawed - it doesn't encourage the
shortest time transfer of complete files.
> Of course this system only works if the client also offers Partial File Sharing. The rotating slot system makes sure that more people get parts of the file and by doing this many upload slots are created. I think this does much to help the distribution of large files.
> I am convinced there is a place for this approach. It should not be
applied to everything of course (and Shareaza doesn't), but there
are some situations where it can be very beneficial -- specifically
when uploading partially downloaded files. Here the objective is
not to get complete files out there (which would not be possible
anyway -- you don't have the complete file locally), but to maximise
the efficient spread of the file data. I think BitTorrent does
pretty well in this respect, and it does it by stopping and starting
a bunch of active upload connections based on certain criteria. A
linear queue is a simpler case of that, but the concept of giving
each host a high-speed turn to grab a verifyable chunk they can
reshare then moving on to the next host is the same. Letting hosts
leech away at a file for as long as they like is not efficient when
content is dissipating quickly from a small number of complete
sources to a large number of incomplete sources.
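The rotating-slot idea discussed above can be sketched minimally as round-robin chunk turns. This is neither Shareaza's nor BitTorrent's actual code; the chunk size and names are illustrative, and it only shows the concept of giving each waiting host a turn at a reshareable chunk rather than letting one host hold a slot indefinitely:

```python
from collections import deque

CHUNK_SIZE = 512 * 1024  # bytes served per turn; value is illustrative

def rotate_uploads(hosts, file_size):
    """Round-robin: give each waiting host one chunk per turn, then move
    to the next host, until every host has received the whole file."""
    remaining = {h: file_size for h in hosts}
    order = deque(hosts)
    schedule = []
    while order:
        host = order.popleft()
        sent = min(CHUNK_SIZE, remaining[host])
        remaining[host] -= sent
        schedule.append((host, sent))
        if remaining[host] > 0:
            order.append(host)  # host rejoins the line for its next turn
    return schedule
```

Combined with partial file sharing, each served chunk can immediately become a new source, which is the efficiency argument made above.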
> - Prefer uploads to faster connections over slow ones. Some servents
have small, medium, and large file queues. This is an interesting idea,
but it would probably be better to have queues based on connection
speed. The principle is that reserving slots for very fast
connections will reduce the time required to propagate a complete
file (assuming the other end is sharing)
> How would you determine the connection speed of the other end? I'd rather prefer uploading to clients that support PFS. And it is a good idea to prefer uploading files that you are currently downloading. eMule has put this idea a bit further by giving priorities to your files depending on how widespread they are. And it seems to work (at least for me).
> There are many ways, here is one: queue a few connections, and give
each of them a couple of minutes of transfer before queueing them
again. After several connections have received transfer, favor the
faster one, and put the other connections into a medium or slow queue.
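The trial-transfer proposal above could be sketched like this. The trial length ("a couple of minutes") is from the post; the thresholds and queue names are purely illustrative assumptions:

```python
TRIAL_SECONDS = 120          # "a couple of minutes" of trial transfer
FAST_THRESHOLD = 50 * 1024   # bytes/s; illustrative cut-off
SLOW_THRESHOLD = 5 * 1024    # bytes/s; illustrative cut-off

def classify(bytes_transferred, seconds=TRIAL_SECONDS):
    """After a trial transfer, sort the connection into a speed queue."""
    rate = bytes_transferred / seconds
    if rate >= FAST_THRESHOLD:
        return "fast"
    if rate >= SLOW_THRESHOLD:
        return "medium"
    return "slow"
```

As the reply below notes, the measured rate can be distorted by whatever else the remote host is doing, so any such classification is only an estimate.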
> I agree, although the metric is similar. What we're really trying
to do there is categorise based on the expected time to complete the
file. We don't want transfers that could complete very quickly
having to wait behind transfers that will take hours or days -- at
least in the case where you get unrestricted access at the head of
the queue. Ignoring all other factors, categorising based on file
size is a reasonable start, and seems to work pretty effectively in
Shareaza. Taking into account connection speed as well would
improve it. The concept of files can be disregarded entirely and
behaviour determined purely on transfer speed, but this does present
some challenges in an environment where many files are being offered.
> There are many ways, here is one: queue a few connections, and give
each of them a couple of minutes of transfer before queueing them
again. After several connections have received transfer, favor the
faster one, and put the other connections into a medium or slow queue.
I doubt that this will work in the real Gnutella world. There are too many factors that have an impact on transfer speed. What if the user starts uploading to his FTP server? Or if he has a fast download going? Both will have an impact on his upload speed and therefore any measure of the upload speed will be inaccurate.
> - Partial file sharing, with immediate sharing of in progress
downloads. Upload servers insert the IP and port of downloaders in
progress into a separate PFS mesh. This is identical to the scheme
used by BitTorrent - downloaders immediately share verifiable chunks
as they are transferred, and the upload server propagates their
information in a mesh.
> In the system used by Shareaza it is the responsibility of the
downloader to insert itself into the source list, which it does as
soon as it has received enough file data. This gives a bit more
control to make sure the downloader is only listed once it has
enough of the file to reliably share, and if it knows it is
contactable.
I also use a single source mesh rather than a dual system... we had a
debate about that before that should be easy enough to dig up. In
summary I felt that is the way to go; others disagreed.
> - Full HTTP pipelining on the download side. BearShare currently
supports full pipelining on the upload side, and there remains the
work of implementing it on the download side. If all servents fully
pipeline requests on keepalive connections, there would be a 5% to
10% improvement in transfer times (sometimes more if the host is near
its send bandwidth limit).
> That's pretty much the story in Shareaza's HTTP implementation as
well, and I agree that pipelining requests is a good move. Shareaza
to Shareaza transfers in the other two transfer protocols it
supports at the moment (both of which use pipelining) are noticeably
faster than those which use HTTP, as a result of this. HTTP can
benefit similarly, although its overhead is still going to be higher
due to the size of the requests and responses. I've been
giving some thought to abbreviated requests and responses that omit
all but the most critical of headers -- Shareaza already omits
several headers on header blocks after the first; however, even more
can be omitted when the same file is being requested (perhaps even
the URN, although that's starting to get very messy).
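The pipelining being discussed can be illustrated by building several range requests and sending them back-to-back on one keep-alive connection, instead of waiting for each response before sending the next request. The host, path, and ranges below are hypothetical; this only shows the request side:

```python
def pipelined_requests(host, path, ranges):
    """Build several HTTP/1.1 range requests suitable for sending
    back-to-back on a single keep-alive connection."""
    reqs = []
    for start, end in ranges:
        reqs.append(
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"Range: bytes={start}-{end}\r\n"
            f"Connection: keep-alive\r\n\r\n"
        )
    return "".join(reqs).encode()  # one write instead of N round trips
```

The saving comes from eliminating one round-trip delay per request, which is where the quoted 5% to 10% improvement would come from.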
> - Forced sharing of the last N downloads. After a servent hashes a
new download, it can keep the file handle open and return query hits
for the file, and share the file. This prevents the user from
deleting the file or moving it out of a shared directory. These files
would be shared regardless of the user setting for sharing. There
would be a limited number of slots, and a (small) bandwidth limit on
these forced shares. This directly attacks the freeloading problem,
and creates many new upload slots on the network. Forced shares would
be canceled when the application exits.
> Well, this would be even stronger than forced PFS. I like the idea, but I'm sure others won't.
> It would be stronger in some ways, yes, but PFS and forced sharing of
the last N downloads are complementary. I agree, it is a sensitive
subject. On the other hand, if it was OK for a user to download the
file, I think it is morally acceptible to force them to share it, for
at least a little while, and with an upper limit on resource usage.
> I'm not absolutely sure about this, but I think this will only work under MS
OS (at least in Java). But I like the idea.
-----
> The bigger problem is that files take so long to retrieve. PARQ (or
remote queues) does nothing to shorten the time required to obtain a
file. All it does is give the user a placebo under the guise of
a "fair" upload system.
> I don't think it is a placebo. It reduces file requests and it makes sure that you can download after you wait for some time, which is not guaranteed without queues. Without queues you have to be aggressive to get what you want, and we don't want to be aggressive on the Gnutella network, do we?
> Untrue. If the average queue wait time is longer than the average
uptime, you will lose the queue position when the servent goes
offline (yes, this happens).
> It depends on the queue. PARQ does not.
> Why do you have to be aggressive?
Because then you'll have a higher chance to get an upload / queue slot. You're right that with the current active queue system the problem is just moved to the end of the queue; that's why I think PARQ is a good idea.
> When you use words like "guarantee",
what I am hearing is that you are trying to fulfill a psychological
need (i.e. you want more certainty that you will get the file, even
if you have to wait a long time).
Your claim that queueing reduces file requests is completely false.
> Vinnie, it is completely true with PARQ. With a large queue, there is
no busy result in practice. And if 4000 entries per queue is not
enough, let's grow the limit.
> You are a victim of selective observation - you are ignoring the
download requests that receive a Busy result because the queue on the
other end is full. You are only paying attention to the sources which
go into a queued state. This is human nature - you WANT to think
queues help, so you only look at the cases where you feel they are
working.
> Why do you have to be aggressive? It is easy to return Busy to IP
addresses which re-request too frequently. Also note, that the
Vestibule Problem STILL EXISTS! Queuing doesn't remove incentive for
aggression, it just moves it from the end of the upload slots to the
end of the queue (i.e. servents will hammer a host whose queue is
full so they can sneak into the end of the line before someone else).
-----
>>>>> Follows a long text -Draketo <<<<<
> This statement does nothing to discount my theory as to **WHY** they
(queues) *kinda* work in real life.
> *kinda*? How could one buy at the market without queues? With an
axe? Is there a better solution not being implemented only out of pure
malice by the sellers? Do our waits amuse them? (Note I'm enjoying this
conversation, not saying you're a fool or something. I feel a real
attraction to someday understanding why queue detractors still exist ;-)
I need to understand what you mean by real life queues not
working. I think they work. I don't like the wait, but I don't see a
better solution.
> There is a plethora of cases where queues do not really "work" as
far as I am concerned. Do you consider the DMV to be "working" when
it takes you 4 hours to register your car? Four hours of wasted
time, where you have to sit there doing **NOTHING ELSE**??
Well, for me an appointment given by the doctor is in essence a queue in
which you don't need to be present, so you can do other things until the
time arrives. Better still if it's my computer that does the waiting. I see
that as a strong similarity with p2p queues, where you simply note
from time to time that you're still alive. I don't keep looking at my
screen to see how I advance in queues.
The main change in the point of view is that these four hours of waiting
are not wasted, as you say. In fact, they are what buys you the download,
as opposed to *maybe never downloading at all*.
> I almost *NEVER* have these problems... Is it just me? Or does
Gnutella just magically WORK for me, wherever I happen to be trying
from?
Certainly your experience differs from mine. Pre-queue times... uh.
My base point is that between chaos and no chaos, I prefer no chaos.
Please convince me that queues in reality don't remove disorder.
You can say: servers go offline, the mean uptime is bla; the queues
will be never long enough... but that's only the *worst* case. What if
people get used to long uptimes or technology allows it? What if you
are sharing a rare file and can guarantee a client their order of
upload? You still maintain that's no real advantage?
-----
> PREMISE 1: A queue slot is intended as a low bandwidth, low resource-consuming route which leads monotonically toward an active transfer slot.
NOTES:
1) A queued client which finishes the queued file via other means will remove themselves from a queue.
2) Even if, the majority of the time, a queued downloader sits idle and periodically creates little bandwidth "spikes" as he (or we) checks back in, this can be represented by an even smaller average bandwidth...
2a) Example: 10 queued downloaders each sit idle for 9 seconds, and each transmit 1000 bytes in the 10th. This is (for this argument) equivalent to each of those 10 downloaders using 100 Bytes/second each, or a really slow download speed.
> You would be right if there were only active queues. But PARQ also has passive queues, which do not require an active connection and are therefore not to be considered a slow upload slot.
Imagine this: A passive queue slot is like having an appointment with a doctor. An active queue slot is like sitting in the waiting room. Of course it is useless to have the waiting room full of people.
> 3) Bandwidth devoted to queue "re-requests" does not contribute towards the file download directly. Instead, it is used for AltLocs, headers, queue status updates, etc.
4) A QUEUE slot is **NOTHING MORE** than a glorified, slow, upload slot, because:
4a) In both QUEUESLOT and FILESLOT cases, a downloader may stop downloading early if the file is completed elsewhere
4b) The desired intention in each case is to transfer file bytes
4c) Blocking points: (Queue Retry time == Download RetryAfter time)
4d) In a fully working case, after discrete time N, a queue slot will "magically become" an upload slot, causing the file to begin to transfer.
4e) The ideal queue case from (4d) is N=0, or no wait time.
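The arithmetic in note 2a above can be written out explicitly; the numbers (10 downloaders, 1000 bytes per check-in, 10-second interval) are the ones given in the note:

```python
def queue_overhead(downloaders, bytes_per_checkin, interval_seconds):
    """Average bandwidth consumed by queued re-requests (note 2a above):
    per-downloader rate, and the total across all queued downloaders."""
    per_downloader = bytes_per_checkin / interval_seconds
    return per_downloader, downloaders * per_downloader
```

For the note's example this gives 100 bytes/second per queued downloader, i.e. each queue slot really does behave like a very slow upload slot.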
PREMISE 2: On GNUTELLA, as a network, there is a severe discrepancy between bandwidth supply and bandwidth demand. Demand must be much higher than supply, or we wouldn't be having this discussion...
NOTES:
There are several reasons for this:
5) Human Nature: As long as some people want to CONSUME but not CONTRIBUTE this will occur
6) ISP technology: ADSL has downloads CONSUMPTION potential 3x-5x greater than the uploads CONTRIBUTION potential
7) Program Uptime Characteristics: People tend to close the Gnutella program when the DOWNLOADS complete, not the UPLOADS
PREMISE 3: Adding QUEUE slots to the network is ****STRICTLY EQUIVALENT**** to adding upload slots, in the general case. Any distinction between the two is artificial/semantic in nature. They both accomplish the same thing (and, for instance, in BearShare code, they are actually the same implementation!)
See NOTES 5-7 for why
******************
Now, the important part:
REASONING, FROM A TO B:
1) Inherently in today's GNUTELLA, content and resource DEMAND exceeds content and resource SUPPLY.
2) This inherent problem DOES NOT EXIST SOLELY as a "software" problem; there are real world, physical reasons for this to be the case.
3) Adding an upload slot (SLOW QUEUE slot or FAST UPLOAD slot) for a person who is **ALREADY UPLOADING** does NOT add content/resources to the network.
4) MASSIVELY adding queue/upload slots to the network has this as the ONLY MAJOR, PROVABLE effect: Average Resource consumed/available (PER SLOT) is going to go DOWN!!! Average transfer time per UPLOAD slot is going to go UP.
CONCLUSION: The underlying problem which is CAUSING busies, slow/failed downloads, etc, is NOT ADDRESSED AT ALL by a queuing system. Therefore, queuing systems (even PARQ) are at best nominal in any *improvements* to the network. More likely, they are **COMPLETELY** placebo in nature.
> And again I must say:
*) Removing the possibility of infinite starvation for a client,
caused by other, allied or not, clients, is a gain that justifies by
far the incurred penalty of maintaining queues. That's a *real* and
not *nominal* improvement for the network *users*.
So work aimed at improving queues' "wasted" resources (like PARQ) is
a real technical improvement, and since I don't foresee a near future
where everyone has unlimited upload capacity, some solution must be
provided, and for now no other than queues is in use (that I know of).
I must admit: when I talk about large queues, I'm thinking about the
optimal, ideal, not resource-costly queues, because I see that as a
goal. Your post addresses these real-implementation concerns. You
don't mention, however, my concerns about starvation. Do I understand that
you dismiss them as placebo, or as a "fairness", non-technological problem? I
don't think a p2p system should forget about that. The ultimate
purpose of a p2p network is to serve its clients' requests, and
you're saying that providing no theoretical guarantee at all, even if the
system sometimes fails, is OK?
I'm equally interested in the real costs of queues, however, because I
think they must be addressed as an obligatory step to better p2p
experience.
I would better understand it if you complained about making queues
larger but not more efficient; that's certainly a bad recipe, but
that's precisely what PARQ addresses.
> NOTHING that I said is different with PARQ:
1) It does NOT add resources
2) It IS an upload slot, whether a connection is maintained or not. In fact, it is EXACTLY like a pre-KeepAlive upload slot. Being "passive" doesn't make it different.
3) Your comparison with the doctor analogy is a good starting point... If we assume that the **RATE OF DEMAND** for the doctor *CAN* be satisfied by said doctor by making appointments, without external supply being added (more doctors), then his *****RATE OF SUPPLY***** exceeds his *****RATE OF DEMAND*****
THIS IS THE KEY POINT! It has NOTHING to do with active/passive/whatever crap.
> From the above, I gather that your problem is not with queueing, but
with the rate of supply on Gnutella being less than the rate of demand.
I could not agree more. We need more people that share than we need
leechers. We need to increase the supply.
But stop shooting at queueing systems, and at PARQ in particular. I
know PARQ is not going to solve the supply shortage, but it solves
many problems:
1) fairness, hammering,
> Easily solved by keeping an IP address history, and returning Busy
to nodes which request too often. No changes required in other
vendor's code, and no protocol documentation needed.
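The IP-history approach suggested above can be sketched in a few lines. The retry interval is an assumption on my part; the post does not give a concrete value:

```python
MIN_RETRY = 60  # seconds between requests; illustrative value

last_seen = {}  # ip -> time of that IP's previous request

def handle_request(ip, now):
    """Return 503 (Busy) to IPs that re-request too frequently,
    200 otherwise; no change is needed in other vendors' code."""
    prev = last_seen.get(ip)
    last_seen[ip] = now  # a rejected request still counts as hammering
    if prev is not None and now - prev < MIN_RETRY:
        return 503
    return 200
```

Note that refreshing the timestamp even on rejection means a host that keeps hammering never gets through until it backs off, which is the intended deterrent.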
> Right for excessive hammering. But that does not handle "fairness".
Although life is not necessarily fair, where is it written that the
computer world has to mimic every aspect of real life?
I don't care whether real life is fair or not, this is not the debate.
I'm maybe an idealist (well, I write open-source software after all),
but why can't we strive to add fairness in the computer world, if we can?
> 2) IP changes and
> Is this really a problem Raphael? I wonder how FastTrack deals
with it (hint: it doesn't).
> Well, it is for me you see, when my ISP disconnects me every 24 hours,
and my IP changes. There are times where I cannot move to slot #1
during the time I'm actively queued. And even when I can, maybe I'll
have only 2 hours before the forced disconnection.
So for ME it is important.
What FastTrack does is irrelevant.
> 3) other brutal
disconnects.
> Another marginal problem. But BearShare solves this by keeping the
slot reserved for 15 seconds after the host disconnects abnormally
(in BearShare, these uploads turn red and show "Reserved" for the
status). Again, no changes required in other vendor's code, and no
protocol documentation needed. As long as servents immediately re-
connect after losing a download source, this change is backward
compatible.
> Hmm... 15 seconds is way too short. GTKG will wait ~60 seconds in
case the connection crashes, to let the other end restart the servent,
in case it is the one who crashed. And to avoid hammering if the
remote servent explicitly killed the upload.
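The reserved-slot behaviour being debated (15 seconds in BearShare, about 60 in GTKG) might look like the following sketch. The data structures are my assumptions, and 60 seconds is simply the value GTKG reportedly uses.

```python
# Grace period in seconds during which a dropped downloader keeps its slot.
# BearShare reportedly uses ~15 s, GTKG ~60 s; 60 is assumed here.
GRACE = 60.0

reserved = {}  # ip -> time at which that host's connection dropped


def on_abnormal_disconnect(ip, now):
    """Mark the slot of an abnormally disconnected host as 'Reserved'."""
    reserved[ip] = now


def try_claim_slot(ip, now):
    """Return True if this host may take the slot right now."""
    if ip in reserved and now - reserved[ip] <= GRACE:
        del reserved[ip]
        return True  # the same host reconnected within the grace period
    # Drop reservations that have expired so they stop blocking the slot.
    for host, dropped_at in list(reserved.items()):
        if now - dropped_at > GRACE:
            del reserved[host]
    # Other hosts get the slot only once no reservation is pending.
    return len(reserved) == 0
```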
> That's a key argument. Adding yet another queue system will not actually solve the problem.
It seems more urgent to use other systems to meet the demand with increased supply:
- Push proxies: more hosts to download from, with a better distribution of the pushed content.
- Passive content caches in proxies: less drain on pushing servents that can upload their content to more locations
- Fragmented contents: a better and faster distribution to more hosts, in order to free upload slots faster
The PARQ proposal is still valuable to let the new system work more smoothly and to limit congestion on the few available supplies.
> It is precisely BECAUSE the doctor's RATE OF SUPPLY meets or exceeds the RATE OF DEMAND that the queuing system (appointments) works! This property also requires that he KEEPS THE DOCTOR'S OFFICE OPEN until all patients are satisfied. If your doctor (or bank) made you leave at 5pm because you were still IN LINE, WAITING, then you could NOT make the claim that the queuing system works. Even more in the doctor's favor: while you are waiting in the waiting room of Doctor XYZ, you CANNOT also be in the waiting room of Doctor ABC.
This is yet another example of why a queue may be appropriate in *real-world cases*: the entire system is DESIGNED to limit DEMAND to the available SUPPLY, and then to enforce that SUPPLY lasts until all of a given DEMAND is met. This CANNOT BE ENFORCED in Gnutella! As long as I can sit in Raphael's queue AND Vinnie's queue AND Mario's queue AND Philippe's queue, etc., this WILL NOT work, because RATE OF DEMAND has not been curbed in any way, or even placated!
This doctor's-office (real-world) case is NOT THE CASE in Gnutella. RATE OF DEMAND is whole magnitudes larger than RATE OF SUPPLY, and queuing (yes, even PASSIVE queuing) does NOT change this!
Passive != NonUpload... Passive MAYBE == less resource intensive, but it is still an upload slot, plain and simple. If it ISN'T one, because it is passive, as you claim, what is the FUNDAMENTAL difference?
So far, the only arguments have been "Ahh, but this is passive, so it isn't the same" without detailing WHY it isn't...
One more unread GDF mail has appeared in my inbox though, so who knows... =)
> I believe that to be the fundamental problem here; one which (obviously) people think queuing is intended to solve, which it isn't and won't. Simple as that.
I *DO* believe that queuing is useful because:
1) It provides a nice placebo effect to user ( Queued 118/220 looks nicer than "Busy" )
2)
3) For legitimate clients, certain types of behaviors can be enforced for this specific (queued) connection.
I *DO NOT* believe that queuing is useful because:
1) It makes files more reliably obtainable. This claim is closely tied to the fundamental problem above, which is the root CAUSE of files being hard to find. Since queues do not address this, they cannot hope to solve that problem.
2) Average wait time for a given transfer slot goes up.
3) It provides an entry barrier into the Gnutella Developer game... AKA in order to effectively compete with established vendors, a new player must now code an entire Gnutella client **AND a download queuing system**. Without this additional step, now, his client is useless, as it will ALWAYS do worse than the QUEUE enabled systems.
> This is solved by PARQ, you only have to obey the retry-after
header.
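Obeying the Retry-After header on the download side can be as simple as this sketch (my illustration; the 60-second fallback is an assumption, and the HTTP-date form of Retry-After is ignored here):

```python
def next_retry_time(now, headers, default_delay=60):
    """Earliest moment we may re-contact a host that queued or refused us.

    headers: dict of lowercased HTTP headers from the 503/queued reply.
    default_delay: assumed fallback when no Retry-After header is present.
    """
    try:
        delay = int(headers.get("retry-after", default_delay))
    except ValueError:
        delay = default_delay  # unparsable header: fall back, don't hammer
    return now + max(delay, 1)  # never retry instantly, even on a bogus value
```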
> What about the upload side? What about accepting QUEUE connect-backs?
> You should not get QUEUE callbacks if you did not advertise PARQ. It
was a bug in the early GTKG implementation that it happened.
The only downside to that is that you might not be able to contact the
server if you don't handle QUEUE, either because its IP has changed or
because it is firewalled, and the push route was long lost.
> You missed the point, which is that a full implementation of PARQ (or
remote queueing for that matter) is a non trivial exercise for a new
servent developer.
> 4) The "FAIRNESS" of file transfers: This whine is brought up here REPEATEDLY in reference to queues. ITS NOT FAIR! Why should person A get a file before me when I asked an hour ago? ITS NOT FAIR!!!! **BOO HOO HOO**
> I don't get your point here. Person A should get the file before you only if he asked more than an hour ago. This is possible with PARQ.
Of course PARQ doesn't solve the problem that there are too few resources on Gnutella, but there is not that much we can do about it, and I agree queues don't help (much) to solve this problem.
> Why not have fairness as an objective? I would not want to use an
unfair p2p system. Why don't you remove the upload queue in BearShare?
You are forced to maintain it for downloading, but nobody (you say)
has the right to be queued in your servent. Make the default BearShare
queue length 0. Maybe others will follow your example when they see
the advantages.
> Why have it? An objective for its own sake is worthless.
No one has yet provided a good, realistic reason why it is necessary. Remember, fairness doesn't change the SUPPLY RATE. All it ensures (apparently, by this definition) is that EVERYBODY'S transfers suck equally.
Now, I can tolerate the few transfers I make which NEVER COMPLETE, but ONLY because of the few (or several) that are MASSIVELY SUCCESSFUL.
Taking away the good (massively successful) to eliminate the bad (never complete), and replacing them all with something like "all transfers take hours, days, weeks, or months minimum", is NOT appealing to me!
> My 8 year old brother whines and complains about this "Its Not Fair" junk...
My 18 year old brother realizes this is TheWayThingsWork(tm) and tries to make the most of it, using the "Its Not Fair" principle to his advantage, when he can.
> Okay, you recommend I try to abuse Gnutella as long as I can hide it
from you? Are you simply making a "golpe de efecto", or do you really
believe I should abuse everything I can? Because I see much unfairness
daily, but I don't try to promote it. Quite the contrary.
-----
There is always the danger of "solving" problems that are
irrelevant, and burdening ourselves with overengineered solutions.
I must agree with Vinnie about these comments. I've mentioned PARQ in
my posts because it is in the domain of improving queues, not as the
holy grail. However, things that can be solved locally (like slot
resistance to drops) don't need a new specification, and I don't like
the QUEUE callback.
-----
> Problems solved by QUEUE type system:
1) User opinion is slightly increased by QUEUE as opposed to BUSY
2) Some behaviors can be controlled or enforced even across vendors (Play nice, or no queue for you!)
3) "Fairness" - which may or may not be desirable
Problems solved by PARQ queue system specifically
4) Short downtime not catastrophic
5) Graceful(?) handling of DHCP IP address changes as far as transfers are concerned
6) One (or a few) issues with firewalled hosts and "ActiveQueueSystem"
Does this justify:
A) Massive discussion that may not get anywhere (Is the "FAIRNESS" property needed, etc.)?
B) Time spent to implement and test *another* queue system?
Remember that massive adoption of PARQ will act as an additional entry barrier to anyone who wants to write a client, without providing a HUGE benefit (which I would say is needed) to justify such a barrier.
> Sorry, yes.
You're right, PARQ is not trivial to implement, and it is about an order
of magnitude more complex than simple "Active Queueing" is.
Does that mean it's useless? I would not go that far.
> No, PARQ is not useless, much the same way that a Busy signal from a
host which doesn't support queueing is not useless (i.e. almost
useless).
> PARQ solves some problems, but I understand it can appear more as a band aid
right now. But as Gnutella improves, it will become an asset.
> Personally, I would say a resounding NO: the time is not justified.
Better to solve the *REAL* underlying problems, where ROI is going to be orders of magnitude better.
> OK. Don't implement PARQ. Active queueing is enough for you.
Otherwise, I agree on the need to solve other real problems, as mentioned
by Vinnie and others. It's just that GTKG now happens to have PARQ in,
so it's one less thing we have to bother about.
-----
>>>>> A long one from dave again (I don't take the time to cut THIS up, too much work. -Draketo <<<<<
[QUOTE]
Why not have fairness as an objective?
[/QUOTE]
Why have it? An objective for its own sake is worthless.
No one has yet provided a good, realistic reason why it is necessary. Remember, fairness doesn't change the SUPPLY RATE. All it ensures (apparently, by this definition) is that EVERYBODY'S transfers suck equally.
Now, I can tolerate the few transfers I make which NEVER COMPLETE, but ONLY because of the few (or several) that are MASSIVELY SUCCESSFUL.
Taking away the good (massively successful) to eliminate the bad (never complete), and replacing them all with something like "all transfers take hours, days, weeks, or months minimum", is NOT appealing to me!
I will make a more formal mathematical argument later, when I am "off the clock".
[QUOTE]
I would not want to use an unfair p2p system.
[/QUOTE]
So don't. No one is forcing you to. Just don't come complaining back when the ETA for an average file begins MONOTONICALLY GROWING until it reaches the BREAKING POINT (average user's patience limit to wait for an arbitrary file to download has been reached) which will prompt him to LEAVE THE NETWORK. Yes, I am saying this with a straight face, and no I am not paranoid.
Again, this is a DIRECT RESULT of the fact that we have DEMAND exceeding SUPPLY.
Maybe I can make this EVEN SIMPLER, using an example that NO ONE on this panel can POSSIBLY claim isn't appropriate, and everyone has experience with!
(Hopefully this example is SO simple and accurate that it cannot be reasonably questioned.)
Pretend that your (BUFFERED!) outgoing TCP/IP pipeline speed is 100KBps.
Pretend that your TCP stack has an internal 100KB buffer...
I would like to send data out at 200KBps.
This is **THE MOST PERFECT ANALOGY** I can come up with here...
SUPPLY: 100KBps & 100KB buffer
DEMAND: 200KBps
Now, in the first 1/10th of a second, I want to send 1/10th of 200KB, or 20KB. In this time, only 10KB can physically be sent. The other 10KB is put in a buffer (active QUEUE slot) waiting to be serviced...
DISCRETE TIME STEP 1/10:
Sent: 10KB, Buffered: 10KB, Lost: NONE
DISCRETE TIME STEP 2/10:
Sent: 20KB, Buffered: 20KB, Lost: NONE
DISCRETE TIME STEP 3/10:
Sent: 30KB, Buffered: 30KB, Lost: NONE
...And so on...
DISCRETE TIME STEP 10/10 (1 second):
Sent: 100KB, Buffered: 100KB, Lost: NONE <== At this point, our outgoing buffer is full
DISCRETE TIME STEP 11/10:
Sent: 110KB, Buffered: 100KB, Lost: 10KB <== We begin losing packets and have to retry later (BUSY file xfer)
DISCRETE TIME STEP 12/10:
Sent: 120KB, Buffered: 100KB, Lost: 20KB <== And here it becomes obvious that the situation is getting worse, not better... BUFFERING (DATA QUEUE slots) did NOT help at all! Instead, now there is a DELAY on EVERY BYTE i want to send! It must wait for the other 100KB in buffer to send first...
In response to the buffer filling up, we may (somewhat reasonably, apparently, based upon comments here) assume that the "PROPER" response is to increase the buffer (MORE QUEUE SLOTS! So many, in fact, that "IN PRACTICE THEY NEVER FILL UP").
So we start over, with a MUCH bigger buffer (like PARQs MUCH bigger QUEUE)...
Same starting conditions, buffer size of 1GB...
DISCRETE TIME STEP 1/10:
Sent: 10KB, Buffered: 10KB, Lost: NONE
DISCRETE TIME STEP 2/10:
Sent: 20KB, Buffered: 20KB, Lost: NONE
...So far, the same...
DISCRETE TIME STEP 10/10:
Sent: 100KB, Buffered: 100KB, Lost: NONE <==Still the same as before...
DISCRETE TIME STEP 11/10:
Sent: 110KB, Buffered: 110KB, Lost: NONE
DISCRETE TIME STEP 12/10:
Sent: 120KB, Buffered: 120KB, Lost: NONE
DISCRETE TIME STEP 200/10: (20 seconds later)
Sent: 2000KB, Buffered: 2000KB, Lost: NONE <==Still none lost...
BUT:
Notice, now, that the **AVERAGE WAIT TIME TO SEND A BYTE OF DATA** has begun to **MONOTONICALLY INCREASE**!!!
It also will not stop increasing until we fill our *much expanded* buffer. Every additional byte sent at this point COMPOUNDS the problem of ever-increasing wait times!!! EGADS!
This as a DIRECT RESULT of the "FAIRNESS" desire.
So, since "ARBITRARY FILE" wait times will ONLY increase with this "FAIRNESS" system, waits get longer and longer and longer and...
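The two buffer runs above can be reproduced with a short simulation. This is my own sketch of the analogy, using integer KB per 0.1-second step (supply 100KBps, demand 200KBps) so the arithmetic stays exact:

```python
def simulate(buffer_cap_kb, steps, supply_per_step=10, demand_per_step=20):
    """One step = 0.1 s; supply 100KBps -> 10KB/step, demand 200KBps -> 20KB/step."""
    sent = buffered = lost = 0
    for _ in range(steps):
        buffered += demand_per_step           # new demand arrives
        sent_now = min(supply_per_step, buffered)
        buffered -= sent_now                  # the pipe drains what it can
        sent += sent_now
        if buffered > buffer_cap_kb:          # overflow: data is lost (Busy)
            lost += buffered - buffer_cap_kb
            buffered = buffer_cap_kb
    wait_s = buffered / 100                   # seconds to drain the backlog
    return sent, buffered, lost, wait_s
```

With a 100KB buffer and 12 steps this returns (120, 100, 20, 1.0), the lossy small-buffer run; with a huge buffer and 200 steps it returns (2000, 2000, 0, 20.0): nothing is lost, but every new byte now waits 20 seconds behind the backlog, which is the monotonically growing delay.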
When does the madness stop?
Glad you asked, here's when:
Take all users of Gnutella, U.
Ask each of them how many minutes M at most they would be willing to wait for an average file to download, before assuming that "This program/network/whatever sucks!" and go to Kazaa.
Then SIGMA(M)/U = average user "FedUp with delay" time T.
When the NETWORK-WIDE delay for an ARBITRARY file exceeds T, then we reach the natural conclusion to this runaway scenario.
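The breaking-point condition above (T = SIGMA(M)/U) is just an average compared against the network-wide delay; as a minimal sketch of my own:

```python
def breaking_point_reached(patience_minutes, network_delay_minutes):
    """patience_minutes: each user's maximum acceptable wait M, in minutes.

    T = SIGMA(M)/U; the runaway scenario ends once the network-wide
    delay for an arbitrary file exceeds T and users start leaving.
    """
    t = sum(patience_minutes) / len(patience_minutes)
    return network_delay_minutes > t
```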
At this point, a steady stream of users will leave, REDUCING (hopefully) only the DEMAND without removing the SUPPLY to the network.
Then, either:
1) Steady state of new users, people leaving, and the *ABSOLUTE WORST POSSIBLE WAIT TIME* for files
OR
2) Users who leave are the SUPPLY, not the DEMAND, and the network dies.
Are my conclusions Draconian? YES
Are they Justified? ***YES***
*Quod Erat Demonstrandum*
[QUOTE]
Why you don't remove the upload queue in Bearshare?
You are forced to maintain it for downloading, but nobody (you say)
has the right to be queued in your servant. Do the default Bearshare
queue of 0 length. Maybe others will follow your example when they see
the advantages.
My 18 year old brother realizes this is TheWayThingsWork(tm) and
tries to make the most of it, using the "Its Not Fair" principle to
his advantage, when he can.
Okay, you recommend I try to abuse Gnutella as long as I can hide it
from you? Are you simply making a "golpe de efecto" or you really
believe I should abuse everything I can? Because I see many unfairness
daily, but also don't try to promote it. Quite the contrary.
[/QUOTE]
I am not saying that you should "abuse abuse abuse, as long as I cannot tell"...
I am saying that QUEUING combined with this "FAIRNESS" concept (as defined previously) is INHERENTLY DETRIMENTAL TO GNUTELLA'S HEALTH.
[QUOTE]
I got news for ya: LIFE ISNT FAIR! NOR SHOULD IT BE!
I'm glad to be in another neighborhood ;-)
[/QUOTE]
So it (life) is fair where you live?
Be prepared for the massive migration of people into your locale... =)
[QUOTE from Raph]
But stop shooting at queueing systems, and to PARQ in particular. I
know PARQ is not going to solve the supply shortage, but it solves
many problems: fairness, hammering, IP changes and other brutal
disconnects.
[/QUOTE]
NO! As long as I feel that the natural conclusion of "FAIR QUEUES" is the rapid destruction of Gnutella, I am going to rant and rave as much as necessary to prevent this!
[QUOTE from ]
So, to be clear, you are saying that if nobody implemented queues,
we would be in *exactly* the same situation as with queues?
Please confirm that. That's what I understand from your phrase.
[/QUOTE]
I believe that the concepts of "FAIRNESS", "FIFO ORDER OF TRANSFERS", "TRANSFER/QUEUE PERSISTENCE ACROSS LAUNCHES", and the protection of the long-term health of Gnutella are MUTUALLY EXCLUSIVE!
-----
> I believe that the concepts of "FAIRNESS", "FIFO ORDER OF
TRANSFERS", "TRANSFER/QUEUE PERSISTENCE ACROSS
LAUNCHES", and the protection of the long-term health of Gnutella
are MUTUALLY EXCLUSIVE!
> Although I agree with your argument that demand exceeding supply is the real problem, and one that PARQ will not solve, I still think that PARQ will have a positive impact once chunked transfers are more widely used, as no one will be allowed to monopolize an upload slot for an extended time (with the current high risk that the downloading user ultimately does not reshare the content he downloaded).
Let's implement a scheme in which all downloading users will share the fragments of files they are currently downloading, by caching contents and acting as proxies at least as long as they are downloading the file, and we will greatly increase the supply.
Then PARQ will become interesting to load balance the many downloaders of fragments, with also the help of the mesh to find new locations.
But please, Dave, stop shouting (with uppercase)! Gnutella can improve simultaneously in several distinct areas, but each improvement alone will not resolve all the problems. If distinct vendors want to work on some area, they can do so according to their own measurements of the network state and their development strategy.
As long as each improvement is discussed so as to be interoperable and not cause too many compatibility problems with others (including those that will implement the feature one day, but not immediately), there's a place for everyone, and for distinct approaches and experiments, something that a completely unified network using the same program would not allow...
-----
> So what do you suggest we do?
Go back to a system where everyone has to retry exactly every 60 seconds,
both to avoid being banned and to avoid seeing the slots it wants taken
by someone else?
How do you retry when the host is firewalled and your push route is lost?
How do you retry when the server's IP changes and your Gnutella horizon
does not let you see the new server? Will you requery every time a source
dies from your set of known IPs or do you trust the download mesh?
It's good to voice criticism, and I wish you'd voiced those criticisms
back when I published the PARQ specs, almost a year ago. But you understand
that failing to propose an alternative is not going to buy my vote here,
don't you?
It's true that slot #68 in my large queue has an ETA of 14 days, at my
current upload rate. What would the ETA of that host be without queues,
knowing that I have only 3 slots and get many concurrent requests?
If you show me a better system, I'll buy it immediately.
> I will hold you to that. =)
I am working on this problem, rest assured.
> For some definition of "better" that I will unfortunately not make explicit right now, because I
don't know yet what it means.
> Ack... the only thing worse than a moving target is one which isn't yet defined... heh
But, I expect that that which would seem "Better" to me/Joe average-user/you/whoever is fundamentally the same, so no problem.
-----
> Instead of real world examples or fantasy Post Offices
> You can make fun of the "concept analogies" I used, or you can extract from them the CONCEPT I was trying to use to illustrate that the underlying problem is being overlooked.
Even if you don't find that useful or helpful at all, how do you answer (what I consider) the most convincing evidence of all, which is the completely equivalent outgoing data buffering case?
If you want to try to poke holes at something, use THAT one, not the "fantasy post office", as you say.
It is intended to be the "equivalent" case, and I believe that it is completely analogous and will stand up to all your scrutiny. Maybe not, though.
> try considering eMule again. I have heard many people say that it doesn't
work. Well here's the first thing you should learn from this
1,000,000 user network - it DOES work.
> Everybody seems sure that there is a magic bullet for PFS and
BitTorrent's Tracker protocols may be it, but until someone has
transformed Gnutella to use these new protocols we're stuck with pure
P2P intelligence and at the moment the most intelligent thing to do
with high demand is PARQ.
-----
> Even if you don't find that useful or helpful at all, how do you answer (what I consider) the most convincing evidence of all, which is the completely equivalent outgoing data buffering case?
> OK so why is PARQ "fairer," well, when a peer decides to dedicate a
few hours, or days, to downloading a file I think it is "just" that
his FIFO (or equivalent "fair") rights are respected no matter how
long the queue is. You are only depriving the rest of your users of
these rights if you shorten your queues.
The extrapolation you neglect to include in your example for Active
Queuing (the current system):
DISCRETE TIME STEP 200/10: (20 seconds later)
Sent: 2000KB, Buffered: 100KB, Lost: 1900KB <== ~50% LOST! (as
expected from the initial specs :-/)..
..which means there is a massive amount of unmanaged (Lost!) demand
flying round the network, which brings me to the second point I made,
efficiency.
PARQ should be able to precisely schedule upload slots (mosteo's
determinism again), especially with time based slot rotation. The
best Active Queuing can do in a congested situation is back off
exponentially. This would undoubtedly be less efficient than a
specifically timed or passively triggered mechanism and it does not
have PARQ's benefits: "Legacy" and "Remanence".
-----
> I think I see your fear though, that if people see how long they have
to wait, they won't wait. eMule is testimony to the opposite. Really -
'nuff said.
> >>>>> AFAIK eMule is mostly used for large files, where waiting doesn't hurt. -Draketo <<<<<
-----
> "This is **THE MOST PERFECT ANALOGY** I can come up with here...
SUPPLY: 100KBps & 100KB buffer
DEMAND: 200KBps"
I quote the above simply to provide reference. It is not intended to
represent the entirety of the analogy, which was quite lengthy and
detailed. What follows addresses the entire analysis, not just the portion
quoted above.
I believe the fundamental problem with this analysis is an assumption of
infinite demand for the resources of every user who is sharing files. No
one is going to deny that, overall, demand is much greater than
supply. That's a given. But it does not necessarily follow that demand
exceeds supply at every given point in time or for every user. There will
be points in time where, for any single user, supply will exceed demand and
the queue will shrink. There may even be rare points in time where the
queue will empty.
> Say PARQ is implemented by all. Users will be able to participate in
more queues at once, maybe even one for every download they attempt, and
there are some who like to attempt every result they receive and
then cancel most when one gets going. Then the download mesh would
add many more sources to each attempt, and all of those would be
attempted and queued. Because of this we would then see queue
lengths >> number of users, maybe thousands of times longer.
> If you do a broad search and get lots of results they bring back lots
of *different* hosts, so the peer that attempts every result will be
taking (possibly) thousands of queue places but it would only swell
each queue by one or two or whatever the value to which max uploads
per unique host is set. It would take thousands of users to behave
this way for queue lengths to grow by thousands.
> But this is not all bad: users cancel nearly all their (queued)
downloads as others complete. The 'long' queues advance more quickly
than regular queues, so wait times are not many months longer than
before. Hoorah! Nothing is achieved except that we now use less bandwidth
and more memory.
> >>>>> But then I'd have to micro manage every DL or harm the network by downloading for unnecessarily long times. -Draketo <<<<<<
> On my part I like the benefits of a passive system but not the
ridiculously long queues. I wouldn't want to see 'Queued 1261/2000'
but I would like to see 'Queued 12/12', I could live with
seeing 'Queued'. How about the servent only show the position when
less than 20?
> Now *that* is placebo ;-)
> Exactly mos'. LOL. I'd rather have the real and realistic benefits of
PARQ than a wing and a prayer.
> I have exactly the same opinion. I, as a user, would prefer to see my rank decrease progressively and rapidly from 1261/2000 to 1240/2000 etc... instead of waiting many hours for the message "Queued" to change into something more informational.
In fact, even if I can't wait that long, I can still see how the ranking evolves, and expect a reasonable response at some expected time.
However, a ranking system will prove beneficial only if servents also implement partial file sharing while they are downloading a file, so that they also contribute to the number of alternate sources for the expected file.
There are several ways to manage queues: one could manage a priority queue per file, so that all demanded files are uploaded and shared, or a unique upload queue for all files.
> >>>>> Look farther down. I separated this. -Draketo <<<<<
> The analogy also assumes a constant, inelastic demand. (I don't want to
pick on the analogy too much, but I do think it's warranted to point out
where the analogy differs from the real situation.) Demand is not
constant, it varies by time of day and probably a dozen other variables,
and, while I have no definite proof of this, common sense tells me demand
is elastic. (If the line at the post office is too long today, I'll try
again tomorrow morning. If I can't download that hot new file tonight,
I'll try again tomorrow.)
> Very good point Jay, **THE MOST PERFECT ANALOGY** does indeed imply
that queues only ever fill up. As you say, we all agree that queuing
will not solve the demand/supply problem and Dave's analogy clearly
shows us that you have to wait longer the further back you are in the
queue, but as I said in my post PARQ does indeed provide fairness in
congested circumstances, using long queues, fairness in times of
network frailty, using persistent queues and improved determinism
where you could be left with none using the current system.
> >>>>> Doesn't PARQ always give everybody the place in the Queue, which he/she had before? What should stop me then from staying in Line forever, even if my IP should change? Doesn't it make demand inelastic with that? These are just the questions bugging me at the moment of inserting this. -Draketo <<<<<
> If this was not the case, if the assumptions about demand in the analysis
were correct, everyone's queues would already be steadily growing. Not
just slowly changing average load as more clients offer queueing, but
steadily and noticeably growing to the maximum allowed queue size. After a
week or two of being online 24/7, anyone offering reasonably popular files
should see their queues filled to capacity all the time. I haven't seen
this. I haven't heard of anyone else complaining about it. I haven't
noticed any increase in the number of people complaining that it's hard to
get downloads to complete.
I've been sharing files for the last month or so on the eDonkey network
which, I'm told, has in excess of one million users, a large percentage of
whom stay online for extended periods of time. If demand were truly
constant, inelastic, and essentially infinite, I should be seeing my
eDonkey queue grow continually larger. I currently have it set to allow
2000 entries in the queue. The most I've ever seen in the queue is a bit
over 500. The majority of the time it hovers between 150 and 200
entries. At this moment there are 165 entries in that queue. (And, bear
in mind, this is a network where the clients queue up requests with EVERY
node that has the desired file. It's just about the worst case for queue
abuse.) My eDonkey queue is not growing out of control. The queues I
have that are used by gnutella almost never grow beyond 20 entries.
-----
There are several ways to manage queues: one could manage a priority queue per file, so that all demanded files are uploaded and shared, or a unique upload queue for all files.
The first approach would allow rare files that are also more rarely demanded to be served faster, greatly enhancing the distribution of the newest files that don't have a lot of locations, or sometimes no other known location at all.
This would create separate queues for each shared file, and each shared file would then be queued individually, independently of the number of people requesting it. So the indication "Queued X/Y" could be an estimate based on the total of shared files currently requested (independently of the number of people requesting each of them) and the number of people in the current file's queue.
I forgot to mention other possible approaches when managing per-file queues: in order to maximize the number of alternate locations and not favor only the most popular files, one could design each queue so that it tracks the number of alternate locations currently known.
When an upload finishes, a random selector could switch between two approaches for selecting the next file to handle: it could be the next per-file queue in the global queue, or the per-file queue that has the lowest number of alt-locations (this implies two parallel global queues, with one entry in each per demanded file). When the next file to serve is determined, its per-file queue is rotated in the two global queues, and these are reordered and rotated according to their respective policies. The next client to serve is then the first one in the selected per-file queue.
This approach would still allow serving the most demanded files (those that have the longest per-file queues) without sacrificing too much of the diversity of files available on the network (a bad effect of a single global queue, because a single file among the total shared on a host can monopolize all the upload resources if it's too "popular", too new, and thus too widely demanded, and that makes too many QueryHits unusable if the files are not "popular", i.e. demanded).
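The two-queue selection scheme described above might be sketched as follows. This is only my reading of the proposal, not an existing implementation; the class, the `boost_rare` probability, and the data structures are all assumptions.

```python
import random
from collections import deque


class PerFileQueues:
    """Per-file upload queues with a random choice between round-robin
    rotation and boosting the file with the fewest known alt-locations."""

    def __init__(self, boost_rare=0.5):
        self.rotation = deque()   # per-file queues in round-robin order
        self.waiters = {}         # file -> deque of queued downloaders
        self.alt_locs = {}        # file -> number of known alternate sources
        self.boost_rare = boost_rare  # probability of serving the rarest file

    def enqueue(self, file, downloader, alt_locations):
        if file not in self.waiters:
            self.waiters[file] = deque()
            self.rotation.append(file)
        self.waiters[file].append(downloader)
        self.alt_locs[file] = alt_locations

    def next_upload(self):
        """Pick the next (file, downloader) pair to serve, or None."""
        if not self.rotation:
            return None
        if random.random() < self.boost_rare:
            # Boost the least-replicated file to spread rare content.
            file = min(self.rotation, key=lambda f: self.alt_locs[f])
            self.rotation.remove(file)
        else:
            # Plain round-robin over the per-file queues.
            file = self.rotation.popleft()
        downloader = self.waiters[file].popleft()
        if self.waiters[file]:
            self.rotation.append(file)  # rotate to the back of the global queue
        else:
            del self.waiters[file]
        return file, downloader
```

Setting `boost_rare` to 0 recovers a pure round-robin over per-file queues; setting it to 1 always serves the file with the fewest known sources.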
> I agree that there are many ways to configure queues. I have had
great *fun* with Shareaza's new queues editor - it is a good feeling
for a user to be able to share files the way *they* want to.
I personally hope Shareaza implements PARQ, because the popularity of
my larger files means that its ED2K queuing implementation is the
only protocol able to efficiently queue the number of people that
want the files, which obviously means that the files are unfortunately
not shared on the Gnutella networks.
The holy grail of distributing popular files without forgetting the
rare-un's is IMHO impossible, but I do think there could be better
more efficient PFS systems that might give more capacity to less
popular files.
Implementing PARQ will certainly help distribution of less popular
files because the long queues will ensure your request for the rare
file is not forgotten.
> I don't think so: if PARQ is deployed in a way that all upload servers now have very long queues, it will be nearly impossible to find alternate sources for a rare file before a long time has passed.
> Why do you think so?
Since the downloader retries every now and then (20 minutes is the maximum
retry delay in the current GTKG implementation), it can exchange mesh
info with the server on a regular basis.
How is this making it nearly impossible to find alt sources?
> We can penalize the most popular files a bit, since they also get the most demand and certainly have more potential sources in the mesh. But rare files with unique locations will be impossible to transfer, because they are immediately pushed far back in the queues by the most demanded files. So if the unique source disconnects, the rare unique file will never be uploaded anywhere else, and the downtime for these files will be much higher than for files that are requested and uploaded often. Also, it's impossible to predict the effect of being in a long queue that contains lots of requests for large files, if my request for the rare file is immediately placed at the end of the queue. This file will not be duplicated for a long time.
The most demanded files will not have this problem: their "downtime" is much smoother. A request for those files can stay longer in a queue but still benefits from more alt-locations, some of which have shorter queues. If one uploader disconnects, the rest of the mesh remains to continue the download, and the total time to wait for those files can be much shorter, as one can swarm these sources by placing requests in multiple queues.
This means that the rate of successful transfers for rare files will be small or nil, and things may get even worse when servents start uploading only small fragments: for a rare file, each fragment will need to be requeued. If a fragment has been transferred, of course it can be shared, but the mesh will not provide alternate locations for the rest of the file, notably if the file is large.
I think it's best to boost the least popular files a little. This will not affect the most demanded ones much, but will help provide more diversity in the content available on the Gnet. I would not like to use Gnutella only to be offered the same set of popular files that I have seen many times or may already have: if I browse a host to look for other interesting files on the same subject, I will find rare ones, but my requests for them will be placed at the same position as requests for the most demanded files with the largest number of alt-locations.
So I think that the number of alt-locations discovered for each file should be considered, to help boost the files that are less distributed on the Gnet (because downloaders won't be able to wait in multiple queues to download them by fragments).
> How can a queuing system (with limited short queues in active
queuing, or infinite queues in PARQ) support this similar policy?
By using per-file queues in which downloaders can take a ticket for
that file, and for which the uploading server will collect
alternative locations. The more alt-locations are available, the less
priority is given to that file. I don't want to promote an
all-or-nothing system, but just say that per-file queues can help
service rare files better without affecting much the service for
popular files.
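The per-file queues with alt-location-based priority could be sketched like this. This is a hypothetical illustration under the assumptions above, not any servent's actual code; all names are invented:

```python
from collections import deque

class PerFileQueues:
    """Per-file FIFO queues where the next upload slot goes to the file
    with the fewest known alt-locations, i.e. the rarest file."""

    def __init__(self):
        self.queues = {}    # file_id -> deque of waiting downloaders
        self.alt_locs = {}  # file_id -> number of known alternate sources

    def enqueue(self, file_id, downloader, alt_location_count):
        """Take a ticket for this file; alt-location counts come from
        the mesh info the uploading server collects."""
        self.queues.setdefault(file_id, deque()).append(downloader)
        self.alt_locs[file_id] = alt_location_count  # refreshed from the mesh

    def next_upload(self):
        """Pick the non-empty queue whose file has the fewest alt-locations;
        within a file's queue, service is first come, first served."""
        candidates = [f for f, q in self.queues.items() if q]
        if not candidates:
            return None
        rarest = min(candidates, key=lambda f: self.alt_locs[f])
        return rarest, self.queues[rarest].popleft()
```

Note that popular files are never starved outright: they are only served after the rarer file's head-of-queue request, which matches the "boost a little, not all-or-nothing" intent.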
If I'm understanding correctly, you are proposing a two level system:
One with a queue per file (usually fifo?) and a priority queue of
queues, which determines from what queue the next starting upload must
be selected.
If that's the case, I would note that it's theoretically equivalent
to having a single priority queue where slots with the same priority
are treated in FIFO order. Maybe this vision could be more appealing
to some. However, your model seems cleaner to me from a conceptual
perspective; from a practical (implementation) one, the second may be
easier.
Another problem that multiple queues have and I've not seen discussed
(but which affects for example the latest Shareaza queuing system) is
when a request can satisfy the entry condition of two or more queues
in the same server.
When you split the requests by file size, if they don't overlap, this
problem doesn't arise (use however [x,y) intervals :). But let's say
you have two queues, one prioritized by "rareness" and another by fifo
policy.
In that case, instead of choosing only one queue by some reason
(precedence, shortness), I think that what must be done is:
*) Enqueue each request in every queue that satisfies the entry
condition.
*) When a request reaches a head and is ready to upload, remove it
from the other queues.
This ensures optimal wait time for clients, and that no queue can be
emptied while another still has candidate clients.
This is my theoretical view; maybe it's an implementation detail (you
could review the queues periodically and requeue clients if other
queues are shorter), but it is at least a real problem not to forget
about.
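The two rules above (enqueue in every queue whose entry condition matches, remove from the other queues the moment one queue starts the upload) might look like this; the class and its entry-condition predicates are hypothetical:

```python
class MultiQueueServer:
    """Sketch: a request is enqueued in every queue whose entry condition
    it satisfies; when one queue promotes it to an upload slot, it is
    purged from all the others."""

    def __init__(self, queues):
        # queues: name -> (entry-condition predicate, list used as a FIFO)
        self.queues = queues

    def enqueue(self, request):
        for condition, fifo in self.queues.values():
            if condition(request):
                fifo.append(request)

    def start_upload(self, queue_name):
        """Pop the head of one queue and remove that request everywhere else."""
        _, fifo = self.queues[queue_name]
        if not fifo:
            return None
        request = fifo.pop(0)
        for name, (_, other) in self.queues.items():
            if name != queue_name and request in other:
                other.remove(request)
        return request
```

With two size-based queues using [x, y) intervals the same request never lands in both; with a rareness queue plus a FIFO queue it does, and the removal step keeps the bookkeeping consistent.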
I'm still thinking about the default queues I want to define in my
server, but for now I am considering these (please explain the
problems you find in them):
*) a fifo queue.
*) a shorter file first queue.
*) a rarest file first queue.
Also, they preempt after 15 minutes of upload or 15 MB sent. This seems
a reasonable default for everything music-sized and below, leaving
preemption to matter only for really big files. The multiple-queuing is
in place (yes, queues are three times larger, but that only takes a bit
more memory, and they also shorten three times as fast :)
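The proposed defaults can be expressed compactly. The request fields (`arrival`, `file_size`, `alt_locations`) are invented for illustration; only the three orderings and the 15-minute/15 MB preemption rule come from the post:

```python
# Preemption rule: rotate an active upload back into its queue after
# 15 minutes of upload or 15 MB sent, whichever comes first.
PREEMPT_SECONDS = 15 * 60
PREEMPT_BYTES = 15 * 1024 * 1024

def should_preempt(elapsed_seconds, bytes_sent):
    return elapsed_seconds >= PREEMPT_SECONDS or bytes_sent >= PREEMPT_BYTES

# The three proposed queues as ordering rules (lower key = served earlier):
def fifo_key(req):           return req["arrival"]        # first come, first served
def shortest_first_key(req): return req["file_size"]      # smaller files first
def rarest_first_key(req):   return req["alt_locations"]  # fewest sources first
```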
I see the problem of the "live" (uploading) slots. What's the usual
number for a 56k modem user? If it's <3 then some other defaults must
be provided (or take Philippe's approach and establish a priority for
the ready queues). What does Gtk-Gnutella do? A minimum of one live
slot per queue, or not?
Kind regards.
-----
> >>>>> Follows another analogy. -Draketo <<<<<
I'd like to make an analogy:
Consider a large marketing company where business workers all have a lot of documents of various sizes to duplicate. The company has a limited set of photocopiers for duplicating these documents.
When someone wants their photocopies, they take a ticket in front of the copier they want to use. However, copiers have different capabilities, and for some large or colored documents only one model can satisfy the demand, while for basic documents such as letters all models are usable.
People that have letters to duplicate will take a ticket in each queue. The number of tickets available is assumed to be infinite (this is the case for PARQ, which can support extremely long queues), but one cannot take a ticket for a queue while already holding a ticket for another document. But to keep the queues from stalling for too long, a policy limits the number of pages that can be duplicated at once: when this quota is exceeded, the ticket is spent and gives the right to take a new ticket for the same machine.
Now what will happen? If there is no other policy, the people that have letters to duplicate can use all copiers and satisfy their demand in a reasonable time (as if they had waited only once in only one queue) even if they have lots of copies to create.
But those people that have large documents to duplicate will only be able to wait at the same single queue for the specific copier. If the number of pages to create is above the threshold, once he has got his quota, he must wait a second time for the same duration in the same queue to continue his task.
The total time to wait then becomes N times longer for that person who can only wait at a single queue. Suppose that copier runs out of paper and needs to be put offline until someone services it; the worker will also need to wait for this maintenance to complete. On the other side, other workers with letters can simply go to all the other copiers and have their work completed without much more delay.
Now add a policy, and give some priority on the specific model to those workers that have large or color documents to duplicate. They can now regain some advantage, which compensates for the fact that there's no other machine able to process the document. The impact is that they will experience a shorter queue on a single machine than people who can attend several queues simultaneously and process their copies in parallel on several machines when they reach the first rank on each of them.
The total productivity of the workers, in terms of pages duplicated per hour, becomes more fair, and each one can do their job in time and send the letters to their customers...
How can a queuing system (with limited short queues in active queuing, or infinite queues in PARQ) support this similar policy? By using per-file queues in which downloaders can take a ticket for that file, and for which the uploading server will collect alternative locations. The more alt-locations are available, the less priority is given to that file. I don't want to promote an all-or-nothing system, but just say that per-file queues can help service rare files better without affecting much the service for popular files.
Of course this model will probably not work with active queueing (because one cannot have a lot of per-file queues) but will prove useful when PARQ is used in addition to PFS and an upload policy that limits the size of the fragments delivered to each client.
> I'd like to avoid more analogies. Like most metaphors they break down
at some point and although they are useful for making complex
situations more understandable I think the long queue reality is
simple enough to deal with directly.
Having said that I agree with most of the logic behind your previous
post, but as I stated I am certainly not sure (is that a Rumsfeld-ism?
LOL) of any solutions to the rare file problem, and especially the
large rare file problem you mentioned - that's a doozy! :)
I certainly do not want to stop you coming up with ideas though
Philippe and I feel sure that a long queue gives you more information
to work with re. optimizing for rare files than a short queue does.
I am certain of PARQ's benefits now though, and I hope its powerful,
scalable queuing becomes another corner-stone of the Gnutella network.
-----
I've been following this discussion for a while now, and I want to add a new
point in this discussion: Getting a file instantly.
For small files, I like about the Gnutella Network, that I can simply
click them, and they are downloaded instantly, or not at all. There is
no waiting included.
For a large file I'd already have to wait some time for the download to
finish, so a queue doesn't bother me.
So I like the idea of a queue for big files, and some simple upload
slots for smaller ones. Maybe 1/3 of the slots for larger files using
PARQ or something similar (at least one) and the rest as simple upload
slots (at least one).
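The proposed slot split can be sketched as follows; the function name and the rounding are assumptions, the post only says "maybe 1/3" and "at least one" of each:

```python
def split_slots(total_slots):
    """Split upload slots roughly 1/3 queued (PARQ-style, for large files)
    and the rest as plain instant slots (for small files), with at least
    one of each kind."""
    queued = max(1, total_slots // 3)
    instant = max(1, total_slots - queued)
    return queued, instant
```

For very small totals the "at least one of each" rule wins over the exact ratio, which matches the post's intent.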
To reduce the demand a bit, you could reduce the number of simultaneous
downloads, or let the user do that. I am on a modem, so if 20 dls
start at once, I get none of them (all of them break -> hosts appear to
be offline, but aren't). The largest number that still works is three,
and with far too high a failure rate, so for me I'd need it at two.
So simply check only three files-to-be-downloaded at once by default
(maybe let the user specify this), and don't check any more of them
when there are already three dls running. Simply queue them on the
requester's side, so no upload slots are wasted on very slow dls.
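The requester-side queue described here might be sketched as follows (hypothetical names, assuming downloads are tracked as opaque objects):

```python
from collections import deque

class RequesterSideQueue:
    """Probe and run at most `max_active` downloads at once (default 3,
    user-adjustable) and keep the rest waiting locally, so remote upload
    slots are not wasted on transfers a slow link cannot sustain."""

    def __init__(self, max_active=3):
        self.max_active = max_active
        self.active = set()
        self.waiting = deque()

    def add(self, download):
        """Queue a new download locally; it starts only if a slot is free."""
        self.waiting.append(download)
        self._fill()

    def finished(self, download):
        """A download completed or failed; promote the next waiting one."""
        self.active.discard(download)
        self._fill()

    def _fill(self):
        # Promote waiting downloads while there is a free active slot.
        while self.waiting and len(self.active) < self.max_active:
            self.active.add(self.waiting.popleft())
```

On a modem, `max_active=2` would match the failure rates reported above.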
-----
> >>>>> Another long one from dave, which I don't have the patience to fully process. -Draketo <<<<<
> [Quote]
Any "arguments" yourself and sidekick Dave have put up have been squashed.
[/Quote]
Sidekick? Jackass.
I don't see anything that has been posted to counter my statements.
Your personal opinion aside, my technical statements stand unattacked, and are even agreed/conceded to (some LimeWire guys) or partially so (Raph).
Don't give me this crap.
For the record, here are *YOUR* contributions to the discussion.
[Quote]
I am certain of PARQ's benefits now though, and I hope its powerful,
scalable queuing becomes another corner-stone of the Gnutella network.
[/Quote]
Great technical argument there. Glad you are "certain" about this.
[Quote]
Very good point Jay, **THE MOST PERFECT ANALOGY** does indeed imply
that queues only ever fill up. As you say, we all agree that queuing
will not solve the demand/supply problem and Dave's analogy clearly
shows us that you have to wait longer the further back you are in the
queue
[/Quote]
No, it doesn't. Had you read the post, and the analogy attached to it, you would see that there is a very real drain on the queues, and that there is a limit placed on overall queue growth. You probably didn't read it clearly, though.
(*1*)
But wait, you DID read it:
[Quote]
The extrapolation you neglect to include in your example for Active
Queuing (the current system):
DISCRETE TIME STEP 2000/10: (200 1/10 seconds later)
Sent: 2000KB, Buffered: 100KB, Lost: 1900KB <== ~50% LOST! (as
expected from initial specs :-/)..
..which means there is a massive amount of unmanaged (Lost!) demand
flying round the network, which brings me to the second point I made,
efficiency.
[/Quote]
Which, while correctly fitting the FORM of my analogy, is completely irrelevant to it.
> Read mosteo's posts, man. The relevant point I was making was that in
congested situations whether one gets a download is left to chance,
because the requests are lost. A significant amount in your
analogy.
> Recall, this was to illustrate the problem with essentially infinite, durable queues.
Active queuing is not this.
So, what was your excuse for the butchering in (*1*)? Who knows?
[Quote]
I love your pretty stars Dave, they make me smile.
[/Quote]
Yes, they make me smile as well.
[Quote]
Instead of real world examples or fantasy Post Offices try
considering eMule again. I have heard many people say that it doesn't
work. Well here's the first thing you should learn from this
1,000,000 user network - it DOES work.
[/Quote]
So, you say it works. The users say it doesn't.
I have good evidence and reasoning to indicate that, in fact, it is not a Good Thing (TM).
You are entitled to your opinion.
> Well, the users do say a lot of contradictory things. I must have a
word with them. I dare say any sample you take of 1,000,000 will
never be enough to judge.
I go only on my own experience which hopefully means that the people
with similar tastes are experiencing the same. The evidence is on my
hard drive - evidence, such a crude word :). I have no special set-up
and popular tastes so I say eMule Works Like a Charm (TM) and has
done for years (hence the 1,000,000 users).
> Well, KaZaA works like a "charm" and they don't have queues.
> I guess the final "vote" will be users, and which system they are happier with.
If, in fact, the sampling of eMule users I have heard from is grossly non-representative, then you would have nothing to worry about.
[Quote]
May the best man 'win', of course :)
[/Quote]
Of course....
OK, so I see no "evidence" from you, that is real or otherwise, that would disprove ANY of the statements I made earlier.
So much for being "generally debunked."
> Nice to see you finally read my post, how about you do the other
users the courtesy of the same and you will be, although there isn't
much to debunk, queuing seems a relatively simple matter.
> Um, this is quite hard for me to parse, but I think you are saying that if I read everyone's posts, then my argument is "debunked"?
Well, I have and did read all posts at the time they were posted, and I disagree still with your assessment.
-----
GDFers,
As much as I love a little fighting match every now and then, let's try to
stay a happy family and not fight needlessly. A lot of people still think
that Gnutella is a crappy protocol so I think we have more to do.
Openness isn't worth anything if we can't work together well. We have to
bring people around to the benefits of an open protocol. To do that, it has
to be clearly better.
> Well said, greg =)
> Greg, I notice that no LimeWire representatives have commented on
remote queues.
Dave has given me pretty convincing arguments that remote queues
don't do a whole lot to relieve the demand, or increase supply.
What is your take on the matter? My objection to adding new queuing
systems, or playing around with the one that we have, is based on
Dave's "vestibule problem" observation, and our data in the labs.
> I'm not sure what Greg's take on it is, but some of us were just
talking the other day about how remote queues don't seem to have had
much of an impact at all. Now that I think about it, it makes sense
that they wouldn't have much of an effect. They're better than
getting a busy, but not much. Not that I think we shouldn't use them,
but I agree that it's not a very productive avenue to continue
pursuing, as it will likely not lead to many more improvements. Push
proxies and other optimizations (PFSP and DHTs) are far more important
in my view.
> I'm somewhat loath to try and wade back through your discussion (wasn't here
at the time I believe). If there is a summary of the pros and cons, I would
love to see it.
I agree with Adam that PFSP and DHT (-like) features are probably more
important. However, there is something to be said for queuing as well. I'm
not sure what the cons you mentioned were other than perhaps too many
connection attempts. The one major PRO that Mark likes over here is that
queues encourage people to stick around. Being in position 708 might not be
too useful but number 20 might. Staying on for 20 hours could be useful
(although most users can't or won't stay on that long).
I like queues because they are a good way of discouraging greedy clients
from continuously sending out broadcast queries. If most resources are
locked up in a queue, then you can't benefit by spamming more hosts or
repeatedly trying to connect every minute or every few seconds. However, we
can't have every client generating a lot of connection attempts for
maintaining queue slots either.
-----
> Well to be honest when Dave told me that remote queues didn't do much
good I of course thought he was hallucinating or had no idea about
what he was talking about in general.
Since then, after looking at the data, the behavior of upload servers
whose queues are full, and the arguments presented against remote
queues, I now have to agree.
However, I will say this, that some of the arguments against remote
queues don't apply when there are queues for different file sizes. I
would be interested in an analysis of queuing behavior on the network
when small/large (or more) queues exist.