Consider large (100,000+ vertices, say) graphs, which we think of as representing some population with edges representing some form of symmetric relation. They might be the Friend graph of Facebook, mathematicians with the collaboration relation, or a large computer network.
These networks have the property that they are neither highly structured, nor totally random. No information about other edges on the graph can tell me for certain whether a given pair of vertices is connected. That said, if a given pair of vertices have many common neighbors, then it is considerably more likely that they are connected by an edge (so it is not entirely random). I've seen some lectures on graphs like this, and I understand they are a productive area of research (see, for instance, Kleinberg or Lovasz).
I am curious about the following phenomenon (my description is vague, but part of my question is asking for a good definition). These networks tend to have subsets (which I will call 'clumps') which are significantly more connected to each other than to the average vertex in the graph. Consider a college in Facebook or a research group in mathematics, for example. If the graphs were small enough to draw in a reasonable way, such clumps would be obvious to the naked eye. For very large graphs, this is impractical; so instead, I ask,
1) What is a graph-theoretic way to characterize these clumps?
Clearly, there won't be a yes-no criterion, but I am hoping for some quantity that measures how much a given subset is a clump. This should also factor in the statistical significance of the clump. Very small subsets which are highly connected will happen even in totally random graphs, whereas a large subset which is even moderately well-connected is unlikely in a random graph, and would be interesting to find.
2) Given a graph (and a definition of a clump), how does one find the clumps?
Is there a definition and an algorithm so robust that it can take networks like Facebook or the collaboration graph, and return the clumps that we know are there, like colleges or research discplines?
Oh, and I am not looking for the Szemeredi partition of a graph, which has some similarities to the kinds of partitions I am looking for, but is explicitly a partition of the graph into similar sized chunks. The clumps in a graph don't have to be the same size, disjoint, or contain every element.