GOSSiP Frequently Asked Questions

1. GOSSiP Basics

1.1 What is GOSSiP?
1.2 Why should I use it?
1.3 What does GOSSiP protect against?
1.4 What doesn't GOSSiP protect against?
1.5 How does GOSSiP work?

2. GOSSiP Concepts
2.1 What is a GOSSiP node?
2.2 What is an identity?
2.3 What is a reputation score?
2.4 What is the query propagation mechanism?
2.5 How are reputation scores combined and computed?
2.6 What is the confidence rating?
2.7 What is the observed behavior history?
2.8 What is the feedback mechanism?
2.9 What is peer trust?

3. GOSSiP Concerns
3.1 What data is stored by a GOSSiP node?
3.2 What happens when peers don't agree about an identity's reputation?
3.3 What happens when a GOSSiP node doesn't agree with its peers?
3.4 Is it possible to overcome a poor reputation?
3.5 Is it possible to ruin a good reputation?
3.6 Can forged addresses affect legitimate users' reputations?
3.7 What about malicious feedback?
3.8 What about erroneous feedback?
3.9 Are there any privacy issues?
3.10 How does a GOSSiP node distinguish between an identity that has a near-zero reputation because it's a new identity versus a near-zero reputation due to a history of inconsistent behavior?

4. GOSSiP Involvement
4.1 Where can I learn more?
4.2 Where can I get code?
4.3 Is there a mailing list?

1. GOSSiP Basics
1.1 What is GOSSiP?

GOSSiP (Gossip Optimization for Selective Spam Prevention) is a distributed, peer-to-peer reputation management system. It tracks the behavior of e-mail senders and shares senders' reputations among participating mail servers. These reputations may then be used by mail servers as part of a comprehensive program to combat unwanted e-mail.

1.2 Why should I use it?

GOSSiP gives you additional information about e-mail senders based on their observed behavior. Because GOSSiP has no central authority, this information does not depend on trusting the motivations of one: the opinions on which you base your decisions come from peers you select.

1.3 What does GOSSiP protect against?

GOSSiP reports may be used by your mailserver to decide whether to accept or reject an incoming email. The reports are based on the opinions of your peers, along with your own accumulated experience with the sender. These opinions reflect whether you and your peers consider the sender to be a reputable or disreputable source of email. In this initial instantiation of GOSSiP, reputation is based on a sender's propensity to send spam, as observed by yourself and your peers.

1.4 What doesn't GOSSiP protect against?

GOSSiP isn't designed to be an accreditation or authentication system. It can't verify that the person using an email sender's identity is the individual associated with that identity. Nonetheless, GOSSiP can distinguish fairly accurately between "real" identities and "forged" identities (see 3.6).

GOSSiP isn't an anti-spam, anti-virus, or anti-*anything* system. GOSSiP's purpose is to observe and track behavior for various identities, form opinions about this behavior, and share these opinions with others. Currently, these opinions are based on spamming tendencies, but the GOSSiP architecture can be adapted to track a wide variety of behaviors.

1.5 How does GOSSiP work?

Your GOSSiP node receives identity information from your mailserver when an incoming email arrives. The node consults its own experience with that identity and queries its peers about it. The peers ask their peers, and so on, to a limited depth (see 2.4). Each peer aggregates these responses with its own stored experience and shares the result with your GOSSiP node. Your node decides how much it trusts the information from each peer, then combines all of this information into a single score reflecting the reputation of the identity in question. Based on this score, your GOSSiP node recommends to your mailserver whether the mail connection should be allowed to proceed.

If the mail is allowed through, the GOSSiP node will accept feedback about it for a limited time (see 2.8). The node uses this feedback to evaluate both its own information and the information supplied by its peers.

2. GOSSiP Concepts
2.1 What is a GOSSiP node?

A GOSSiP node is the basic unit of the distributed GOSSiP architecture. It is a host running the GOSSiP software, capable of sharing information with other nodes and with one or more mailservers. The node must also be capable of receiving feedback information.

A GOSSiP node is usually separate from your mailserver, but this is not a requirement.

2.2 What is an identity?

Identities uniquely identify email senders. Each identity has an associated reputation score, observed behavior history, and confidence rating.

An identity is the pairing of a sender's IP address with the right-hand side (the domain part) of the RFC 2821 "MAIL FROM" address.
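As a minimal sketch, assuming the mailserver hands the node the SMTP client IP address and the bare MAIL FROM address, an identity could be built in Python as shown below. The names `Identity` and `identity_from_envelope` are illustrative assumptions, as is lower-casing the domain. None of them come from any GOSSiP specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """Hypothetical GOSSiP identity: sender IP paired with the MAIL FROM domain."""
    ip: str
    domain: str


def identity_from_envelope(client_ip: str, mail_from: str) -> Identity:
    """Build an identity from the SMTP client IP and the MAIL FROM address.

    Only the right-hand side (after the last '@') is kept, so the full
    address is never stored. Lower-casing the domain is an assumption.
    """
    address = mail_from.strip().strip("<>")
    _, sep, domain = address.rpartition("@")
    if not sep or not domain:
        raise ValueError(f"no domain in MAIL FROM address: {mail_from!r}")
    return Identity(ip=client_ip, domain=domain.lower())


# Same domain and IP -> same identity; same domain from another IP -> different.
a = identity_from_envelope("192.0.2.10", "<alice@Example.com>")
b = identity_from_envelope("192.0.2.10", "bob@example.com")
c = identity_from_envelope("203.0.113.7", "alice@example.com")
print(a == b, a == c)  # True False
```

Two senders at the same domain and IP address map to the same identity. The same domain seen from a different IP address yields a distinct identity, which is the property that 3.6 relies on.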

2.3 What is a reputation score?

A reputation score is a single value that reflects a node's own experience with an identity as well as that of the node's peers (and, recursively, their peers). A query propagation mechanism ensures that queries are addressed by an adequate number of nodes, and increases the probability that responses include nodes with significant experience with the identity.

Reputation scores also reflect a node's trust in other nodes based on previous experience with those nodes.

Finally, reputation scores incorporate the confidence reporting nodes have in their data.

2.4 What is the query propagation mechanism?

When a GOSSiP node sends out a query about an identity's reputation, the query is sent to the node's immediate peers. Each query has a predefined value that determines how far a query should propagate before nodes stop passing it on to their peers. This value is called the Time-To-Live (TTL) value. Each query also contains a Unique Message Identification String (UMIS).

When a node receives a query, it decrements the TTL value. It then compares the UMIS against a cache of recently-received UMIS values. If the UMIS has been seen recently, the node does not incorporate its own data into the query response. Instead, it merely acts as a relay for the request.

If the TTL is not 0, the node passes on the request to its peers and waits for a response. If the TTL is 0 and there was no UMIS cache match, the node responds with its own reputation score for the identity, along with its confidence rating for that score.

When a node passes on a query, it waits for each peer to respond, either with a reputation score and confidence rating, or with a message indicating that the request has already been seen. If no response is seen from a peer, the node waits for a timer to expire.

Once one of these conditions has been met for each peer, the node combines all peer reports with its own data (unless the UMIS cache match succeeded) to arrive at a new reputation score and confidence rating. These values are then passed back to the peer that submitted the original request.
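The propagation rules above can be simulated with a small synchronous Python sketch. Everything in it is hypothetical: the `Node` and `Report` classes, the `ALREADY_SEEN` marker, and modeling a peer whose timer expires as an offline node that returns `None`. The simple confidence-weighted `combine` function is also a placeholder; it stands in for the fuller combination procedure of 2.5.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Union

ALREADY_SEEN = "already-seen"  # hypothetical "request already seen" reply


@dataclass
class Report:
    score: float       # reputation score (range assumed to be [-1, 1])
    confidence: float  # confidence rating in [0, 1]


def combine(reports: List[Report]) -> Optional[Report]:
    """Placeholder for 2.5: confidence-weighted mean of the scores."""
    total = sum(r.confidence for r in reports)
    if total == 0:
        return None
    score = sum(r.score * r.confidence for r in reports) / total
    return Report(score, total / len(reports))


@dataclass
class Node:
    name: str
    own: Optional[Report] = None                       # own opinion, if any
    peers: List["Node"] = field(default_factory=list)
    seen: Set[str] = field(default_factory=set)        # recent UMIS values
    online: bool = True                                # False: never answers

    def _gather(self, umis: str, ttl: int, include_own: bool) -> Optional[Report]:
        reports = []
        for peer in self.peers:
            reply = peer.handle_query(umis, ttl)
            if isinstance(reply, Report):  # skip ALREADY_SEEN and timeouts (None)
                reports.append(reply)
        if include_own and self.own is not None:
            reports.append(self.own)
        return combine(reports)

    def originate(self, umis: str, ttl: int) -> Optional[Report]:
        """Start a query here and send it to the immediate peers."""
        self.seen.add(umis)
        return self._gather(umis, ttl, include_own=True)

    def handle_query(self, umis: str, ttl: int) -> Union[Report, str, None]:
        if not self.online:
            return None                # stands in for the caller's timer expiring
        ttl -= 1                       # decrement on receipt
        repeat = umis in self.seen     # UMIS cache check
        self.seen.add(umis)
        if ttl <= 0:
            # Leaf: answer with own data, or report that the request was seen.
            return ALREADY_SEEN if repeat else self.own
        # Relay onward; a repeated request is relayed without own data.
        return self._gather(umis, ttl, include_own=not repeat)


# Usage: diamond A -> {B, C} -> D, so D receives the query twice.
d = Node("D", own=Report(-1.0, 1.0))
b = Node("B", own=Report(1.0, 1.0), peers=[d])
c = Node("C", own=Report(1.0, 1.0), peers=[d])
a = Node("A", peers=[b, c])
result = a.originate("umis-1", ttl=2)
print(result.score)  # 0.5: D is counted once, via B
```

Each node records the UMIS when it first receives it, so a node reachable along two paths contributes its own data only once. The second request gets an "already seen" reply.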

2.5 How are reputation scores combined and computed?

When a node receives reputation scores and confidence ratings from each of its peers, it needs to combine these values with its own observed data. The result is a new reputation score and confidence rating, which the node either passes along to the requesting node or uses to make a recommendation to a mailserver.

This process includes several steps. First, reputation scores from the peers are compared, and obvious outliers are tossed out. Next, each peer's reputation score is examined with respect to the peer's confidence in that reputation, and the reputation is adjusted to reflect that confidence.

Peer reports are then adjusted according to the node's trust rating for each of the peers. These adjusted reputation scores are averaged, along with the node's own reputation score for that identity, adjusted according to its own confidence in that score.

This averaged score is what is reported to the requesting peer, or the mailserver.
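The four steps could look like the Python sketch below. It rests on several assumptions that the FAQ does not settle:

- scores lie in [-1, 1];
- "adjusting" a score means multiplying it by the peer's confidence and by the node's trust in that peer;
- an "obvious outlier" is any peer score farther than a hypothetical `outlier_cutoff` from the median peer score.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, Optional


@dataclass
class PeerReport:
    score: float       # reputation, assumed to lie in [-1, 1]
    confidence: float  # confidence rating in [0, 1]


def combine_reports(
    peer_reports: Dict[str, PeerReport],
    trust: Dict[str, float],
    own: Optional[PeerReport] = None,
    outlier_cutoff: float = 1.0,
) -> Optional[float]:
    """Combine peer reports with the node's own opinion (sketch of 2.5)."""
    adjusted = []
    if peer_reports:
        # Step 1: drop "obvious outliers", here defined (hypothetically) as
        # scores farther than outlier_cutoff from the median peer score.
        mid = median(r.score for r in peer_reports.values())
        for peer, r in peer_reports.items():
            if abs(r.score - mid) > outlier_cutoff:
                continue
            value = r.score * r.confidence  # Step 2: weight by peer confidence
            value *= trust.get(peer, 0.0)   # Step 3: weight by trust (unknown: 0)
            adjusted.append(value)
    if own is not None:
        adjusted.append(own.score * own.confidence)  # own score, own confidence
    if not adjusted:
        return None
    return sum(adjusted) / len(adjusted)    # Step 4: average


reports = {
    "peer-a": PeerReport(0.8, 1.0),
    "peer-b": PeerReport(0.6, 0.5),
    "peer-c": PeerReport(-1.0, 1.0),  # 1.6 from the median (0.6): discarded
}
trust = {"peer-a": 1.0, "peer-b": 1.0, "peer-c": 1.0}
print(combine_reports(reports, trust, own=PeerReport(1.0, 0.5)))  # about 0.533
```

One consequence of the multiplicative choice is that a low-confidence report is pulled toward zero rather than dropped outright.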

2.6 What is the confidence rating?

The confidence rating is a measure of how confident a GOSSiP node is in the reputation score it is reporting for an identity. The confidence rating based on a node's own data reflects the amount of experience the node has had with the identity in question.

When a node needs to determine its confidence in its own data for an identity, it refers to the identity's observed behavior history and computes the ratio of non-null entries to the size of the history cache. For example, if the observed behavior history cache holds 10 entries and the node has only 3 observations in it, the ratio is 3/10, or 0.3. This value becomes the confidence rating for that identity when the node determines its own opinion of that identity's reputation.
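The ratio can be written as a formula. Here $n$ is the number of non-null entries in the identity's observed behavior history and $N$ is the cache size; this notation is introduced only for illustration. The node's confidence in its own data is then:

```latex
c_{\text{own}} = \frac{n}{N}, \qquad 0 \le c_{\text{own}} \le 1,
\qquad \text{e.g. } N = 10,\; n = 3 \;\Rightarrow\; c_{\text{own}} = \frac{3}{10} = 0.3
```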

The internal confidence rating just described is only reported to a peer when it is responding to a query and the query's TTL is 0. Otherwise, the confidence rating the node reports reflects not only that node's internal confidence rating, but its trust in the peers whose data has been incorporated in the confidence rating it is reporting.

2.7 What is the observed behavior history?

The observed behavior history for an identity is a first-in, first-out (FIFO) cache of binary observations reflecting the previous N evaluations of that identity's email, as determined by the node's feedback mechanism. N is a fixed size determined when the node is deployed. Each cache entry has a value of 0, 1, or null (no data). Each interaction with an identity adds a 0 or a 1 to this cache. Until the cache is full, the new entry replaces a null value; after that, each new entry displaces the oldest one.

When a confidence rating needs to be computed for an identity, the node examines the observed behavior history and counts the non-null entries. A full cache results in a 100% confidence rating. A cache of size 100 that contains only 25 non-null entries results in a confidence rating of 25%.
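A fixed-size FIFO with null slots maps naturally onto Python's `collections.deque` with `maxlen`, which discards the oldest item on each append. The sketch below is illustrative only. The class name `BehaviorHistory` is hypothetical. The FAQ does not say whether 0 or 1 denotes spam, so the encoding is left to the caller.

```python
from collections import deque
from typing import Deque, Optional


class BehaviorHistory:
    """Fixed-size FIFO of observations (0, 1, or None) for one identity."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("history size must be positive")
        # Start with N null slots; appending to a full deque drops the oldest.
        self._entries: Deque[Optional[int]] = deque([None] * size, maxlen=size)

    def record(self, observation: int) -> None:
        """Add one 0/1 observation, displacing the oldest entry."""
        if observation not in (0, 1):
            raise ValueError("observation must be 0 or 1")
        self._entries.append(observation)

    def confidence(self) -> float:
        """Fraction of non-null entries, as described in 2.6 and 2.7."""
        filled = sum(1 for e in self._entries if e is not None)
        return filled / self._entries.maxlen


h = BehaviorHistory(10)
for outcome in (1, 1, 0):
    h.record(outcome)
print(h.confidence())  # 0.3, matching the example in 2.6
```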

2.8 What is the feedback mechanism?

The feedback mechanism is a crucial part of the GOSSiP architecture. A GOSSiP node accepts feedback on recently-received mail, and stores this feedback by inserting an entry into the observed behavior history.

Each query initiated by a mailserver includes the unique message-ID string from the email's Message-ID header. This string is stored in a first-in, first-out cache of fixed size. Once an email has passed through the system, an entity can submit feedback to the GOSSiP node about the email referenced by this Message-ID string. If the string is still in the node's Message-ID cache, the node accepts the feedback and makes an entry for the associated identity in its observed behavior history. In this instantiation of GOSSiP, the feedback is a simple spam or nonspam opinion.

In this instantiation of GOSSiP, the source of the feedback is the mailserver's spam filtering software. Feedback is automatically submitted to the GOSSiP node for each message processed.
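The Python sketch below shows one way to do this bookkeeping, assuming feedback arrives as a Message-ID plus a spam/nonspam flag. The following are all assumptions for illustration:

- the class and method names;
- the use of an `OrderedDict` as the fixed-size FIFO;
- the plain list standing in for the observed behavior history;
- the encoding 1 = nonspam, 0 = spam.

```python
from collections import OrderedDict
from typing import Dict, Hashable, List


class FeedbackCache:
    """Fixed-size FIFO of recent Message-IDs and the identities they belong to."""

    def __init__(self, size: int) -> None:
        self.size = size
        self._pending: "OrderedDict[str, Hashable]" = OrderedDict()
        # Simplified stand-in for the per-identity observed behavior history.
        self.history: Dict[Hashable, List[int]] = {}

    def remember(self, message_id: str, identity: Hashable) -> None:
        """Record the Message-ID sent with a mailserver query."""
        self._pending[message_id] = identity
        while len(self._pending) > self.size:
            self._pending.popitem(last=False)  # evict the oldest Message-ID

    def submit(self, message_id: str, is_spam: bool) -> bool:
        """Accept feedback only while the Message-ID is still cached."""
        if message_id not in self._pending:
            return False
        identity = self._pending[message_id]
        # Assumed encoding: 0 = spam, 1 = nonspam.
        self.history.setdefault(identity, []).append(0 if is_spam else 1)
        return True


cache = FeedbackCache(size=2)
cache.remember("<m1@example.com>", ("192.0.2.1", "example.com"))
cache.remember("<m2@example.com>", ("192.0.2.1", "example.com"))
cache.remember("<m3@example.net>", ("198.51.100.5", "example.net"))
print(cache.submit("<m1@example.com>", is_spam=True))  # False: m1 was evicted
print(cache.submit("<m3@example.net>", is_spam=True))  # True
```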

2.9 What is peer trust?

When a node receives feedback for a Message-ID, it examines the cached responses its peers provided for that message. Suppose a peer disagreed with the feedback (e.g., suggesting the identity was a reputable source of email when the message was in fact spam) while the node's own opinion agreed with it; in that case the peer's trust score is decreased. If a peer agreed with both the node's own opinion and the feedback, the peer's trust score is increased. If the feedback disagreed with both the node's own opinion and a peer's opinion, the peer's trust score is left unchanged.
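The three stated rules can be written as a small Python function. A few things here are assumptions rather than part of the described design:

- the step size;
- clamping trust to [0, 1];
- the one case the FAQ leaves open, where the node's own opinion was wrong but the peer's was right. This sketch leaves trust unchanged in that case.

```python
def update_trust(trust: float, own_view: bool, peer_view: bool,
                 feedback: bool, step: float = 0.05) -> float:
    """Adjust the node's trust in one peer after feedback (sketch of 2.9).

    Each boolean is True if that party judged the message reputable
    (nonspam). The step size and clamping to [0, 1] are assumptions.
    """
    own_right = own_view == feedback
    peer_right = peer_view == feedback
    if own_right and not peer_right:
        trust -= step  # peer disagreed with feedback the node agreed with
    elif own_right and peer_right:
        trust += step  # peer agreed with both the node and the feedback
    # Both wrong: unchanged, as the FAQ states. Node wrong but peer right:
    # not specified by the FAQ; this sketch also leaves trust unchanged.
    return min(1.0, max(0.0, trust))


print(update_trust(0.5, own_view=True, peer_view=True, feedback=True, step=0.25))  # 0.75
```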

3. GOSSiP Concerns
3.1 What data is stored by a GOSSiP node?

A node stores the addresses of its peers, along with the unique identifying strings associated with recent messages and the peers' responses for those messages. The node also stores the identities with which it has experience, along with its own recent history with each identity. Full email addresses are not stored by a node.

3.2 What happens when peers don't agree about an identity's reputation?

When peers disagree about the reputation of a given identity, the node attempts to find consensus among the peers' responses. Obvious outliers are ignored, and the adjusted reports are averaged as described in 2.5. In the unlikely event that the peers are evenly split between opposite opinions, the result is a reputation near zero, which the node interprets as "proceed with caution".
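The Python sketch below illustrates the evenly split case. It averages two pairs of opposite peer reports and maps the result to a recommendation. The [-1, 1] score range, the `caution_band` threshold and the three labels are hypothetical choices. The FAQ only says that a near-zero result means "proceed with caution".

```python
def recommend(score: float, caution_band: float = 0.2) -> str:
    """Map a combined reputation score to a mailserver recommendation.

    Assumes scores in [-1, 1], positive meaning reputable; the band width
    and the labels are hypothetical.
    """
    if score > caution_band:
        return "accept"
    if score < -caution_band:
        return "reject"
    return "proceed with caution"


# Peers evenly split between opposite opinions average to zero.
split = [0.8, -0.8, 0.8, -0.8]
combined = sum(split) / len(split)
print(combined, recommend(combined))  # 0.0 proceed with caution
```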

3.3 What happens when a GOSSiP node doesn't agree with its peers?

When a node disagrees with its peers, the discrepancy is resolved as described in 2.5.

3.4 Is it possible to overcome a poor reputation?

It is in fact possible to build a good reputation starting from a poor one. One of the underlying principles of GOSSiP is that reputations are earned through consistent behavior. If an identity has acquired a poor reputation, it can improve that reputation by repeatedly and consistently adhering to the standards and mores of the community in which it wishes to participate. In most communities, this means "don't spam us." A single conforming message, or a small handful of them, isn't enough to improve a truly poor reputation; to overcome a very poor reputation, the identity would need to demonstrate a sea change in its behavior.

3.5 Is it possible to ruin a good reputation?

As in 3.4, consistent behavior determines an identity's overall reputation. If an identity with a good reputation begins exhibiting a prolonged pattern of undesirable behavior, its reputation will start to decline. How quickly it declines depends on the individual node's configuration and on that node's prior history with the identity. Nonetheless, a few nonconforming emails won't destroy an otherwise pristine reputation. It would take a sustained spam run from a single identity (note that an identity is not the same as an email address) to adversely affect a strongly positive reputation.

3.6 Can forged addresses affect legitimate users' reputations?

In a word, no. As defined in 2.2, an identity combines the right-hand side of an email address with the IP address from which the mail is sent. Legitimate identities tend to come from a small number of IP addresses. Forged addresses, by contrast, tend to come from a wide variety of sources, often in small batches for a given address. GOSSiP would therefore classify these forged identities separately from legitimate email sent by legitimate users through their normal channels.

3.7 What about malicious feedback?

As GOSSiP currently stands, the feedback mechanism is designed to receive information from a trusted, automated source -- typically, the spam filter that works in conjunction with the mailserver to which the GOSSiP node is tied. Since no human feedback is accepted, the possibility of retaliatory feedback or other malicious attempts to game the feedback system is minimized.

3.8 What about erroneous feedback?

Since the feedback comes from your own automated spam filters, you already need to monitor and correct those filters. If the filters remain uncorrected, their errors will propagate to your GOSSiP node and to the reputations of the identities they misclassify. However, these erroneous reputations are unlikely to propagate beyond your single node. GOSSiP is designed to work in conjunction with your spam-filtering regimen, not to replace or override it.

3.9 Are there any privacy issues?

One known issue is that the query propagation mechanism will allow a GOSSiP node to respond with its own stored data without first combining it with its peers' data when the query TTL is 0. At worst, this allows the querying node to know that a particular domain name-IP address combination has been seen by that node in the fairly recent past.

3.10 How does a GOSSiP node distinguish between an identity that has a near-zero reputation because it's a new identity versus a near-zero reputation due to a history of inconsistent behavior?

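As described in 2.6, the reputation score is mediated by the confidence rating, and the confidence rating distinguishes between these two cases. A new identity has few entries in its observed behavior history and therefore a low confidence rating. An identity with a history of inconsistent behavior has many entries and therefore a high confidence rating, even though its score is near zero.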
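The Python sketch below shows how a node might report the two cases separately. The function name and both thresholds are hypothetical: `zero_band` defines "near zero" and `low_confidence` defines "little experience". The point is only that the confidence rating, not the score, tells the cases apart.

```python
def explain_near_zero(score: float, confidence: float,
                      zero_band: float = 0.1, low_confidence: float = 0.3) -> str:
    """Tell apart the two near-zero cases from 3.10 (thresholds hypothetical)."""
    if abs(score) > zero_band:
        return "clear opinion"
    if confidence < low_confidence:
        return "new or little-seen identity"
    return "inconsistent behavior"


print(explain_near_zero(0.0, confidence=0.1))   # new or little-seen identity
print(explain_near_zero(0.05, confidence=0.9))  # inconsistent behavior
```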

4. GOSSiP Involvement
4.1 Where can I learn more?

The official website for the GOSSiP project is at http://www.sufficiently-advanced.net/. There you will find a draft specification of the GOSSiP architecture along with links to the mailing list archives and subscription mechanism.

4.2 Where can I get code?

As of 7/23/2004, code is not yet ready to be released. If you're interested in assisting the development of GOSSiP, you are strongly encouraged to subscribe to the mailing list and join in.

4.3 Is there a mailing list?

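Yes. Links to the mailing list archives and the subscription mechanism are available on the project website listed in 4.1.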