GOSSiP Frequently Asked Questions

1. GOSSiP Basics
   1.1 What is GOSSiP?
2. GOSSiP Concepts
   2.1 What is a GOSSiP node?
3. GOSSiP Concerns
   3.1 What data is stored by a GOSSiP node?
4. GOSSiP Involvement
   4.1 Where can I learn more?

1. GOSSiP Basics

1.1 What is GOSSiP?
GOSSiP (Gossip Optimization for Selective Spam Prevention) is a distributed, peer-to-peer reputation management system. It tracks the behavior of email senders and shares senders' reputations among participating mail servers. Mail servers can then use these reputations as part of a comprehensive program to combat unwanted email.

1.2 Why should I use it?
GOSSiP gives you additional information about email senders based on their observed behavior. Because GOSSiP has no central authority, you never need to question the motivations of whoever runs a centralized service: the opinions on which you base your decisions come from peers you select yourself.

1.3 What does GOSSiP protect against?
Your mailserver can use GOSSiP reports to decide whether to accept or reject an incoming email. The reports are based on the opinions of your peers, along with your own accumulated experience with the sender. These opinions reflect whether you and your peers consider the sender a reputable or disreputable source of email. In this initial version of GOSSiP, reputation is based on a sender's propensity to send spam, as observed by you and your peers.

1.4 What doesn't GOSSiP protect against?
GOSSiP isn't designed to be an accreditation or authentication system. It can't tell whether an identity claimed by an email sender is actually being used by the individual associated with that identity. Nonetheless, GOSSiP can distinguish fairly accurately between "real" identities and "forged" identities (see 3.6).

1.5 How does GOSSiP work?
When your mailserver receives an incoming email, it passes the sender's identity information to your GOSSiP node.
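As 3.6 explains, an identity is the right-hand side of the sender's address combined with the IP address the mail comes from. The following is a minimal Python sketch of how a node might build that identity from what the mailserver passes in. The `Identity` class and `identity_from_envelope` function are hypothetical names, not part of any published GOSSiP interface, and lowercasing the domain is an assumed normalization.

```python
from dataclasses import dataclass
import ipaddress


@dataclass(frozen=True)
class Identity:
    """Hypothetical GOSSiP identity: sender domain plus sending IP address."""
    domain: str
    ip: str


def identity_from_envelope(sender_address: str, client_ip: str) -> Identity:
    """Build an identity from the envelope sender and the connecting IP.

    Only the right-hand side of the address is kept, so the full
    email address never needs to be stored (see 3.1).
    """
    local, sep, domain = sender_address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a usable email address: {sender_address!r}")
    # Normalize both parts so equivalent inputs map to the same identity
    # (ipaddress raises ValueError for a malformed IP).
    ip = str(ipaddress.ip_address(client_ip.strip()))
    return Identity(domain=domain.strip().lower(), ip=ip)


print(identity_from_envelope("alice@Example.ORG", "192.0.2.10"))
# Identity(domain='example.org', ip='192.0.2.10')
```

Two messages claiming the same domain but arriving from different IP addresses produce two distinct identities, which is what keeps forged mail from being scored together with a domain's legitimate traffic.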
Your GOSSiP node consults its own experience with that identity and queries its peers about it. The peers ask their own peers, and so on, up to a limited depth. Each peer aggregates the responses it receives with its own stored experience and shares the result with your GOSSiP node. Your GOSSiP node decides how much it trusts the information from each of these peers, then combines all of this information into a single score reflecting the reputation of the identity in question. Based on this score, your GOSSiP node recommends to your mailserver whether the mail connection should be allowed to proceed.

2. GOSSiP Concepts

2.1 What is a GOSSiP node?
A GOSSiP node is the basic unit of the distributed GOSSiP architecture. It is a host running the GOSSiP software, capable of sharing information with other nodes and with one or more mailservers. The node must also be capable of receiving feedback (see 2.8).

2.2 What is an identity?
Identities uniquely identify email senders. An identity is the combination of the right-hand side (domain part) of the sender's email address and the IP address from which the mail is sent. Each identity has an associated reputation score, observed behavior history, and confidence rating.

2.3 What is a reputation score?
A reputation score is a single value that reflects a node's own experience with an identity as well as that of the node's peers (and, recursively, their peers). The query propagation mechanism ensures that queries reach an adequate number of nodes, and increases the probability that the responses include nodes with significant experience with the identity.

2.4 What is the query propagation mechanism?
When a GOSSiP node sends out a query about an identity's reputation, the query goes to the node's immediate peers. Each query carries a predefined Time-To-Live (TTL) value that determines how far it propagates before nodes stop passing it on to their peers. Each query also contains a Unique Message Identification String (UMIS).

2.5 How are reputation scores combined and computed?
When a node receives reputation scores and confidence ratings from its peers, it must combine these values, taking into account how much it trusts each peer (see 2.9), with its own observed data. The result is a new reputation score and confidence rating, which the node either passes back to the requesting node or uses to make a recommendation to a mailserver.

2.6 What is the confidence rating?
The confidence rating measures how confident a GOSSiP node is in the reputation score it reports for an identity. When it is based on the node's own data, the confidence rating reflects how much experience the node has had with the identity in question.

2.7 What is the observed behavior history?
The observed behavior history for an identity is a first-in, first-out (FIFO) cache reflecting the previous N evaluations of that identity's email, as determined by the node's feedback mechanism. N is a fixed size set when the node is deployed. Each cache entry has a value of 0, 1, or null (no data). Each interaction with an identity creates an entry in this cache, replacing a null value with either a 0 or a 1.

2.8 What is the feedback mechanism?
The feedback mechanism is a crucial part of the GOSSiP architecture. A GOSSiP node accepts feedback on recently received mail and stores it by inserting an entry into the sending identity's observed behavior history.

2.9 What is peer trust?
When a node receives feedback for a message ID, it examines the cached responses its peers provided for that message. If a peer disagreed with the feedback (for example, suggesting the identity was a reputable source of email when the message was in fact spam) while the node's own opinion of the identity agreed with the feedback, the peer's trust score is decreased. If a peer agreed with both the node's own opinion and the feedback, the peer's trust score is increased. If the feedback disagreed with both the node's own opinion and the peer's opinion, the peer's trust score is left unchanged.

3. GOSSiP Concerns

3.1 What data is stored by a GOSSiP node?
A node stores the addresses of its peers, along with the unique identifying strings associated with recent messages and the peers' responses for those messages. It also stores the identities with which it has experience, along with its own recent history for each identity. Full email addresses are not stored.

3.2 What happens when peers don't agree about an identity's reputation?
When peers disagree about the reputation of a given identity, the node attempts to find consensus among their responses. Obvious outliers are ignored, and the adjusted reports are averaged as described in 2.5. In the unlikely event that the peers' reports are evenly split between opposite opinions, the result is a reputation near zero, which the node interprets as "proceed with caution".

3.3 What happens when a GOSSiP node doesn't agree with its peers?
When a node disagrees with its peers, the discrepancy is resolved as described in 2.5.

3.4 Is it possible to overcome a poor reputation?
Yes, it is possible to build a good reputation starting from a poor one. One of the underlying principles of GOSSiP is that reputations are earned through consistent behavior. If an identity has acquired a poor reputation, it can improve that reputation by repeatedly and consistently adhering to the standards and mores of the community in which it wishes to participate. In most communities, this means "don't spam us." A single conforming message, or a small handful of them, isn't enough to improve a truly poor reputation; to overcome a very poor reputation, the identity must demonstrate a sea change in its behavior.

3.5 Is it possible to ruin a good reputation?
Yes. As in 3.4, consistent behavior determines an identity's overall reputation. If an identity with a good reputation begins exhibiting a prolonged pattern of undesirable behavior, its reputation will start to decline.
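To make this gradual decline concrete, here is a minimal Python sketch that models the observed behavior history of 2.7 as a fixed-size FIFO of 1, 0, or null entries and derives a purely local score from it, ignoring peers entirely. The class name `BehaviorHistory`, the encoding (1 = legitimate, 0 = spam), the score formula (mean of +1/-1 over recorded entries, so near zero means "proceed with caution" as in 3.2), and the confidence formula (fraction of the N slots holding data) are all assumptions for illustration; the FAQ does not specify them.

```python
from collections import deque
from typing import List


class BehaviorHistory:
    """Hypothetical observed behavior history (2.7) for a single identity.

    Assumed encoding: 1 = message judged legitimate, 0 = message judged spam,
    None = no data. The size N is fixed when the node is deployed.
    """

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("history size must be positive")
        # Start with N null entries; maxlen makes the deque discard the oldest entry.
        self._entries = deque([None] * size, maxlen=size)

    def record(self, verdict: int) -> None:
        """Store one feedback verdict (0 or 1), displacing the oldest entry."""
        if verdict not in (0, 1):
            raise ValueError("verdict must be 0 or 1")
        self._entries.append(verdict)

    def _known(self) -> List[int]:
        return [e for e in self._entries if e is not None]

    def local_score(self) -> float:
        """Assumed score in [-1, 1]: mean of +1 (legitimate) / -1 (spam)."""
        known = self._known()
        if not known:
            return 0.0  # no data yet: neutral score
        return sum(1 if e == 1 else -1 for e in known) / len(known)

    def confidence(self) -> float:
        """Assumed confidence in [0, 1]: fraction of the N slots holding data."""
        return len(self._known()) / len(self._entries)


history = BehaviorHistory(size=20)
for _ in range(20):
    history.record(1)  # a long run of legitimate mail
print(history.local_score(), history.confidence())  # 1.0 1.0
for _ in range(3):
    history.record(0)  # a few spam messages
print(history.local_score())  # 0.7
for _ in range(17):
    history.record(0)  # a sustained spam run fills the whole window
print(history.local_score())  # -1.0
```

With N = 20, three spam verdicts after twenty legitimate ones only pull the local score from 1.0 to 0.7, while a sustained run of spam drives it to -1.0. A smaller N would make the decline faster, which is one way a node's configuration could affect how quickly a reputation falls.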
How quickly the reputation declines depends on each node's configuration as well as on that node's prior history with the identity. Nonetheless, a few nonconforming emails won't destroy an otherwise pristine reputation. It would take a sustained spam run from a single identity (note that an identity is not the same as an email address) to adversely affect a strongly positive reputation.

3.6 Can forged addresses affect legitimate users' reputations?
In a word, no. As defined in 2.2, an identity is the combination of the right-hand side (domain part) of an email address and the IP address from which the mail is sent. Legitimate mail for a domain tends to come from a small number of IP addresses, whereas forged addresses tend to be used from a wide variety of sources, often in small batches for a given address. GOSSiP therefore classifies these forged identities separately from the identities of legitimate users sending through their normal channels.

3.7 What about malicious feedback?
As GOSSiP currently stands, the feedback mechanism is designed to receive information from a trusted, automated source: typically, the spam filter that works in conjunction with the mailserver to which the GOSSiP node is tied. Since no human feedback is accepted, the risk of retaliatory feedback or other malicious attempts to game the feedback system is minimized.

3.8 What about erroneous feedback?
Because the feedback comes from your own automated spam filters, you already need to monitor and correct those filters. If filter errors go uncorrected, they will propagate to your GOSSiP node and affect the reputations of the identities your filters misjudge. However, the chances of these erroneous reputations propagating beyond your own node are low. GOSSiP is designed to work in conjunction with your spam-filtering regimen, not as a substitute for or an override to it.

3.9 Are there any privacy issues?
One known issue is that, when a query's TTL is 0, the query propagation mechanism allows a GOSSiP node to respond with its own stored data without first combining it with its peers' data. At worst, this lets the querying node learn that the responding node has seen a particular domain name/IP address combination in the fairly recent past.

3.10 How does a GOSSiP node distinguish an identity whose reputation is near zero because it is new from one whose reputation is near zero because of a history of inconsistent behavior?
As described in 2.6, the reputation score is mediated by the confidence rating, which distinguishes between these two cases: a new identity has little recorded experience behind it and therefore a low confidence rating, whereas an identity with a history of inconsistent behavior has a near-zero score backed by more experience, and so a higher confidence rating.

4. GOSSiP Involvement

4.1 Where can I learn more?
The official website for the GOSSiP project is http://www.sufficiently-advanced.net/. There you will find a draft specification of the GOSSiP architecture, along with links to the mailing list archives and subscription information.

4.2 Where can I get code?
As of 7/23/2004, the code is not yet ready for release. If you're interested in assisting with the development of GOSSiP, you are strongly encouraged to subscribe to the mailing list and join in.

4.3 Is there a mailing list?
Yes. See 4.1 for links to the archives and subscription information.